SQL and NoSQL Data Model Optimization in Write-Optimized Databases
Though SQL and NoSQL data models are usually very different, write-optimized databases offer some similar benefits for both. I'll describe some of the background and implementation of write-optimized data structures, and we'll discuss some strategies for choosing data models to get the best performance out of them.
Data modeling for database performance requires some knowledge of the underlying data structures. Most databases rely on B-trees, where storing data is about as expensive as getting it back again, and this leads developers and DBAs to make certain choices about how to model their data. SQL and NoSQL data models are usually very different, but most existing strategies are built on assumptions that the underlying data structure’s performance will be that of a B-tree.
Write-optimized data structures, such as LSM-trees (used in Cassandra, LevelDB, and RocksDB) and fractal trees (used in TokuDB and TokuMX), have very different performance characteristics from B-trees. In particular, they make storing data far less expensive than looking it up, and they often provide significant compression advantages. Because of this, they are best served by different data modeling strategies, and they admit new models that B-tree databases can’t support well.
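As a rough illustration of that write/read asymmetry, here is a toy LSM-style store in Python. It is only a sketch under simplifying assumptions: everything lives in memory, there is no compaction or Bloom filter, and the names (`ToyLSM`, `memtable_limit`, `probes`) are hypothetical. It is not how TokuDB, TokuMX, or any of the other databases above are implemented.

```python
import bisect


class ToyLSM:
    """Toy LSM-style store: writes land in a buffer, reads may probe many runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}   # newest writes, held in memory
        self.runs = []       # immutable sorted runs, newest first
        self.probes = 0      # number of runs searched by get()

    def put(self, key, value):
        # A write only touches the in-memory buffer; no existing data is read.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Turn the buffer into one sorted run (on disk: a sequential write).
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # A read checks the buffer, then each run from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            self.probes += 1
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


store = ToyLSM(memtable_limit=2)
for i in range(6):
    store.put(i, i * 10)   # six writes, flushed as three sorted runs

oldest = store.get(0)      # the oldest key lives in the last run searched
```

In this sketch each `put` does a constant amount of in-memory work, while a `get` for an old or missing key has to search every run. Real systems narrow that gap with compaction and filters, but the basic shape (cheap writes, comparatively expensive lookups) is what the data modeling strategies in this talk build on.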
In this talk, I’ll describe some of the theoretical background on these types of data structures and the specific implementation used by TokuDB and TokuMX (but algorithms classes won’t be a prerequisite). We’ll then discuss some strategies for creating better SQL and NoSQL data models using write-optimized databases and I’ll give some real-world examples.
Databases, performance, data modeling, SQL, NoSQL, MySQL, MongoDB, TokuDB, TokuMX
I gave a data structures talk at Percona Live a while back: https://www.youtube.com/watch?v=q6BnG74FZMQ. Slides from a recent lightning talk are at http://www.slideshare.net/leifwalsh/introducing-tokumx-the-performance-engine-for-mongodb-nycrb-20131210. Most of my other speaking experience is in the classroom.
Leif Walsh is a software engineer at Tokutek, working on TokuDB and TokuMX. He has worked on performance-critical software at Google and Microsoft, and helped start RethinkDB. Leif studied math and computer science at Stony Brook University.