You might think the details of how your database operates internally are arcane knowledge for ops witches, but no! Understanding how it performs writes, performs reads, and stores data can help you design data models that better support your query patterns and dramatically improve your application's performance. In this talk, we'll use Cassandra to model some data, using composite columns to aggregate it according to our expected query patterns. We'll explore how composite columns let us filter data by taking advantage of the fact that columns are stored in sorted order on disk. We'll look at how the data is represented to us and how that differs from how it is stored internally. Then we'll dig into how Cassandra performs writes and reads on a single node, covering the commit log, memtable, and SSTables. Finally, we'll conclude with a war story about how switching compaction strategies cut our query times in half.
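To make the storage ideas above concrete, here is a toy, single-node sketch in Python. It is not Cassandra's implementation or API: `ToyNode`, `write`, `flush`, and `read_slice` are hypothetical names, and real Cassandra adds bloom filters, indexes, tombstones, and compaction on top of this. Assumptions: each write gets a monotonically increasing timestamp, the memtable flushes after a fixed number of cells, and composite column names are modeled as Python tuples so they sort component by component.

```python
import bisect
from dataclasses import dataclass, field


# Toy single-node model; ToyNode and its methods are hypothetical, not Cassandra APIs.
@dataclass
class ToyNode:
    flush_threshold: int = 3
    commit_log: list = field(default_factory=list)  # append-only log of writes
    memtable: dict = field(default_factory=dict)    # (row_key, column) -> (timestamp, value)
    sstables: list = field(default_factory=list)    # immutable sorted lists, oldest first
    clock: int = 0                                  # assumed monotonically increasing timestamps

    def write(self, row_key, column, value):
        self.clock += 1
        # 1. Append to the commit log first, for durability.
        self.commit_log.append((row_key, column, self.clock, value))
        # 2. Then update the in-memory memtable.
        self.memtable[(row_key, column)] = (self.clock, value)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. Write the memtable out as an immutable SSTable sorted by (row_key, column).
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        # Toy simplification: flushed writes no longer need commit-log replay.
        self.commit_log = []

    def _keep_newest(self, merged, column, ts, value):
        if column not in merged or ts > merged[column][0]:
            merged[column] = (ts, value)

    def read_slice(self, row_key, start, end):
        """Columns of row_key with start <= column < end; the newest timestamp wins."""
        merged = {}
        for sstable in self.sstables:
            keys = [key for key, _ in sstable]  # rebuilt per read; fine for a toy
            lo = bisect.bisect_left(keys, (row_key, start))
            hi = bisect.bisect_left(keys, (row_key, end))
            for (_, column), (ts, value) in sstable[lo:hi]:
                self._keep_newest(merged, column, ts, value)
        for (rk, column), (ts, value) in self.memtable.items():
            if rk == row_key and start <= column < end:
                self._keep_newest(merged, column, ts, value)
        return [(column, value) for column, (_, value) in sorted(merged.items())]


node = ToyNode(flush_threshold=3)
node.write("amy", ("2014-04-30", "t1"), 5)
node.write("amy", ("2014-05-02", "t2"), 10)
node.write("amy", ("2014-05-15", "t3"), 20)  # third cell triggers a flush
node.write("amy", ("2014-05-02", "t2"), 12)  # newer value, still in the memtable
node.write("bob", ("2014-05-03", "t4"), 7)

# All of amy's May columns: a contiguous range of the sorted composite names.
print(node.read_slice("amy", ("2014-05",), ("2014-06",)))
# [(('2014-05-02', 't2'), 12), (('2014-05-15', 't3'), 20)]
```

Because each SSTable in this sketch is sorted by (row key, composite column), a slice over a column prefix is two binary searches plus a contiguous scan, and the read merges the memtable and SSTables by timestamp. That is the intuition behind modeling composite columns around the queries you expect to run.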
Amy Hanlon
Website: http://www.mathamy.com/
Twitter: @amygdalama
Biography
I’m a software engineer on Venmo’s Scaling team in New York City.
I’m also a Recurse Center alum, where I compiled a Harry Potter-themed Python interpreter and converted a picture of my cat to sound.
Previously, I studied pure math and did some data analysis and machine learning in Austin, TX.