immutability - Using Cassandra to store immutable data?


We're investigating options to store and read a lot of immutable data (events), and I'd like feedback on whether Cassandra would be a good fit.

Requirements:

  1. We need to store 10 events per second (but the rate will increase). Each event is small, about 1 KB.
  2. An important requirement is that we need to be able to replay the events in order. It is fine to read the data in insertion order (like a table scan), so an explicit sort might not be necessary.

Querying the data in any other way is not a prime concern, and since Cassandra is a schema db I don't suppose it's possible when the events come in many different forms? Would Cassandra be a good fit for this? If so, is there anything one should be aware of?

I had the exact same requirements for a "project" (rather a tool) a year ago. I used Cassandra and didn't regret it. In general it fits well. You can fit quite a lot of data in a Cassandra cluster, the performance is impressive (although it might need tweaking), and the natural ordering is a nice thing to have.

Rather than listing the benefits of using it, I'll concentrate on possible pitfalls you might not consider before starting.

You have to think about your schema. The data is naturally ordered within one row (a partition, in current Cassandra terms) by the clustering key, in your case the timestamp. However, you cannot order data between different rows. The results might come back ordered after a query, but that is not guaranteed in any way, so don't count on it. I believe there was a kind of way to write such a query before 2.1 (using ORDER BY, disabling paging and allowing filtering), but it introduced bad performance and I don't think it is possible now. You should order data between rows on the querying side.
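Ordering between rows on the querying side usually comes down to a k-way merge: each row already comes back sorted by its clustering key, so the client only has to interleave the streams. Below is a minimal Python sketch, assuming each row has already been fetched as a sequence of `(timestamp, variable, value)` tuples sorted by timestamp. The tuple layout and the `replay_in_order` name are illustrative assumptions, not part of any Cassandra driver.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# Hypothetical event layout: (timestamp, variable name, value).
Event = Tuple[int, str, float]


def replay_in_order(*rows: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge rows that are each already sorted by timestamp."""
    # heapq.merge holds one pending item per input, so memory grows with
    # the number of rows being merged, not with the number of events.
    return heapq.merge(*rows, key=lambda event: event[0])


# Two rows as they might come back from two separate queries.
temperature = [(1, "temperature", 20.5), (4, "temperature", 20.7)]
pressure = [(2, "pressure", 1013.0), (3, "pressure", 1012.8)]

for event in replay_in_order(temperature, pressure):
    print(event)
```

Because the merge is lazy, it should work just as well with any iterable that yields events in timestamp order, not only with fully loaded lists.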

This might be an issue if you have multiple variable types (such as temperature and pressure) that have to be replayed at the same time, and you put them in different rows. Either you have rows for the different variable types and re-sort on the querying side, or you put all variable types in one row, in which case filtering a subset becomes the issue to solve.

Row length is limited to 2 billion elements, and although that seems like a lot, it is not unreachable with time series data. Because you don't want to get anywhere near 2 billion, keep it lower, in the hundreds of millions at most. If you add a parameter on which to split the rows (some increasing index, or rounding by day/month/year), you will have to implement that in your query logic as well.
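Splitting rows by a time bucket might be sketched as follows. This is only a sketch under assumptions: a daily bucket, a `YYYY-MM-DD` string used as part of the row key, and helper names (`bucket_for`, `buckets_between`, `events_per_bucket`) that are made up for illustration. The point is that the same bucketing rule has to be applied both when writing an event and when deciding which rows a replay has to read.

```python
from datetime import datetime, timedelta, timezone
from typing import List

SECONDS_PER_DAY = 24 * 60 * 60


def bucket_for(ts: datetime) -> str:
    """Return the daily bucket (hypothetical row-key component) for an event.

    Expects a timezone-aware datetime; buckets are computed in UTC.
    """
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def buckets_between(start: datetime, end: datetime) -> List[str]:
    """List, in order, every daily bucket a replay from start to end must read."""
    first = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    days = (last - first).days
    return [(first + timedelta(days=n)).isoformat() for n in range(days + 1)]


def events_per_bucket(events_per_second: int) -> int:
    """Rough number of elements one daily row holds at a constant rate."""
    return events_per_second * SECONDS_PER_DAY


start = datetime(2015, 3, 30, 22, 0, tzinfo=timezone.utc)
end = datetime(2015, 4, 1, 3, 0, tzinfo=timezone.utc)

print(bucket_for(start))            # 2015-03-30
print(buckets_between(start, end))  # ['2015-03-30', '2015-03-31', '2015-04-01']
print(events_per_bucket(10))        # 864000
```

At the 10 events per second from the question, a daily row would hold 864,000 elements, and a 30-day bucket about 26 million, both well below the limits discussed above. A replay then reads the buckets one after another in the order returned by `buckets_between`.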

Experiment with your queries on a dummy example first. You cannot arbitrarily use <, > or = in queries. CQL has specific rules for filtering and for which clauses you can use.

All in all, these things might seem important, but they are not much of a hassle once you know Cassandra a bit. I'm just underlining them to give you a heads up. If something does not seem logical at first, fall back on understanding why, and on the whole theory of data distribution and ring topology.

Don't expect too much from collections within columns; their length is limited to ~65000 elements.

Don't fall for the misconception that batched statements are faster (this one is a classic :) )

