Monday, July 30, 2012

Wabash Wisdom

At a fortunate moment of indecision, I decided to go all the way to Udupi, a restaurant 50 miles away from my dear WestLala, lured by the prospect of gobbling idlis and vadas (yes, you read the verbs right). As an after-effect of the above decision, I was forced to go for a jog in order to fit myself into my T-shirts again. In a certain sense, I'm happy that I decided to jog, since jogging along the northern parts of Wabash Heritage trail was one of my to-do things at West Lafayette. Just in case you are wondering, yes, I'm still looking for the second item in that list.  But let that be. For, as a fall out of my expedition today, I learnt many a thing that I'm postulating here as the dozen pearls of jogging wisdom

I learnt 
  • that Rajinikanth is right, as always. He once said "vazhkai oru vattam da" (roughly translates to 'circle of life')
  • that it takes at least 7 miles of running for your stomach to feel normal after hogging 11 medu vadas.
  • that less traversed under-the-bridge pathways are a popular destination for men with OAB, no matter what country you are in.
  • that 'The Road not taken' is an inspired poetry is a lie. Robert Frost had once taken a stroll along the N. 9th Street, Lafayette.
  • that all roads in West Lafayette lead to the Happy Hollow park.
  • why India does not compete in track and field events - we are just not meant to be athletes.
  • It is final.  The winner in the category "best music track for inspired running" is Bhag bhag DK Bose. Sanjay Subrahmanyan's Upacaharamu cheseva comes a distant second.
  • that fitness Apps in iPhone are not to be trusted for predicting distances. Ever. 
  • what Jenny meant when she said "Run Forrest, run."
  • that it is a good idea to take bath after running 9 miles . Apparently you turn into an effective rodent repellent after 7 miles.
  • deer in west lafayette is not a fable. I saw two by the US-52.
  • the meaning of life


WestLalaa rhyming pet name for West Lafayette, influenced by hours of watching innumerable tamizh songs with completely worthless lyrics.  
Bhag bhag DK Bose - hindi equivalent for encouraging a certain gentleman who goes by the name of D.K.Bose, to run. Really. Even Mr.Aamir Khan said so.
Upacaharamu cheseva - a carnatic composition by Saint Thyagaraja in the raga Bhairavi.
Rajinikanth - "      " intentionally left blank, undefined - the inventors of Boson particle at CERN are still working on this
Medu vadas - fried salted spiced lentil donuts. Salt - no sweet,.
US-52 - a very popular and landmark destination connecting the historic towns of Lafayette and West Lafayette across the might Wabash. All visitors of the Walmart definitely pass by this historic location.

A note to the posterity - this is the route that I ran today. May you find peace and prosperity and enough oil to light your lamps!

View Larger Map

Wednesday, June 20, 2012

Understanding Cassandra's consistency and conflicts

After a long time, here comes another technical entry into my blog. I have been playing around with Cassandra trying to understand it as a system and one of the things that had often come up in many forums is the difficulty in understanding Cassandra's consistency. In this post, I hope to consolidate what I inferred from various forums, solutions and problems extended by many others who have been working with Cassandra. I intend this post to be useful to someone who has already had a flavor of what Cassandra is about and is familiar with the fundamental concepts of Distributed Systems and data stores. If not, I strongly recommend skimming through some of the basics here before proceeding further with this post.

A different ACID

Consistency in Cassandra doesn't directly relate to the 'C' in ACID. Consistency in traditional database systems refers to transactional consistency which ensures the correctness of the state for a given DB transaction. Whereas, in Cassandra, consistency of data across its replicas. In fact, it would be simpler to view this as the consistency of data that can be observed by an external client or client side consistency.  The consistency that is observed within Cassandra cluster (which might be different from that observed by an external client) can be defined as server side consistency. Cassandra does not provide transactional consistency (across multiple reads/writes) which is traded off for the higher speed and scalability. If essential, transactions and transactional consistency has to be implemented by the client. 

No rollbacks! 

An important implication of the above fact is that a Cassandra cluster (or simply cluster) could have partial writes (or writes in progress) but would not provide a roll-back mechanism for any (potentially) failed operations. For example, consider a Cassandra cluster with 3 nodes (N1, N2, N3), a replication factor (RF) of 3 and Read-Write Consistency Level (CL) of 2. Consider a write to X  is initiated on nodes N1,N2 and node N1 fails while the write is in progress. The write to N2 would succeed, a timeout is reported to client, but the write on N2 is not rolled-back as would have been the case with traditional databases. Fixing this inconsistency by retrying the failed operation or any corrective mechanism is the responsibility of the client. I believe, the only case of a true failure reported by Cassandra is when not enough nodes in the cluster are live for a given operation.  

Eventually consistent

"So what you are saying is, Cassandra cannot provide any consistency guarantees what-so-ever." No - this is a common misconception of many people that I have observed in many a user forum. Cassandra is eventually consistent. Huh!? Okay. Let me put it this way. Cassandra can be as consistent as you want it to be. The condition for strong consistency is 

R + W > N, where 
N - Number of replicas
W - Number of nodes that need to agree for a successful write
R - Number of nodes that need to agree for a successful read

And if R + W <= N, we say that the cluster is configured to have weak consistency or is eventually consistent. For example, consider the Figure 1. The system has a Read consistency (R CL) of quorum and a Write consistency (W CL) of ANY ( at least 1) and is therefore said to be eventually consistent. Since writes can succeed with just one node,  (W3) write 3 to N1 at time T0 and  (W5) write 5 to N2 at time T1 can happen independently to the same variable X. However, we can see from right hand side of the figure that a read  (R CL = quorum) at time T2 can result in different values depending on the set of nodes (N1, N2) or (N1,N3) which are chosen to serve the read request. 

Okay, I see that there can be inconsistency. But will it always remain so? And the answer is No - thanks to the read repairs that happens on the background. In both the cases illustrated here the read repairs (shown in purple) will ensure that subsequent reads will have converged on the same value for X. And this is why we say that Cassandra is eventually consistent.

Strong Consistency

Alright, now what would happen if I were to have R + W > N. Let us consider the extreme case where W CL = ANY and R CL = ALL shown in Figure 2. In this case, for the read to succeed, all replicas need to be in agreement and therefore have to be consistent before we respond back to the client.

Conflict resolution

Hold on, how did you decide that 5 and not 3 is the correct value? I didn't, Cassandra did. To resolve conflicts, all columns in Cassandra has a time stamp associated with it. Since T1 > T0 in our example, 5 becomes a later write and is therefore assumed to be correct. It is therefore evident that the nodes in the Cassandra cluster need to be synchronized in their measure of time to be semantically correct.

I thought Cassandra used vector clocks, no? After going through a number of threads and forums, I realized that this is not true. Vector clocks and version vectors are popular methods used for conflict identification. However, Cassandra already employs a per column time stamp for resolving conflicts thereby obviating the need for a causal ordering that is provided by the vector clocks.

Okay, I have synchronized my clocks. But what if I have a truly concurrent write with the same time stamp? In the unlikely case that you precisely end up with two time stamps that match in its microsecond, you might end up with a bad version but Cassandra ensures that ties are consistently broken by comparing the byte values.

Partial writes

Done. I did my math and made my cluster strongly consistent. Am I safe? The answer is both yes and no. This subtle but interesting scenario comes up when we have failures, which brings in the notion of partial writes. Consider Figure 3 which is the same example as shown before, but at time T1, the node N2 is disconnected from the cluster momentarily due to which the W5 gets timed out. This would mean that the a value of 5 is written to N2 but the write operation is not successful yet as it could not meet the required consistency level. Thereby N2 has the updated value while N1 and N3 have the older values. At a later time T2 (when N2 is back in cluster), the read can give different results based on the nodes which serve the reads.

Wait! Are you saying that we do not have strong consistency? No. We are still running strongly consistent, but there is a little non-determinism in the system. Consider Case 1 where the read goes to (N1,N3) and the value of 3 is returned after nodes N1 and N3 arrive at a consensus. It should be noted that the W5 is still in flight or in a timeout which is being handled by the client. In other words, the write is still in progress. Therefore it is semantically correct that the previous value 3 be returned.  In case 2, the read goes to (N1, N2). Here N2 has a more recent value (time stamp, remember?) and a consensus is reached with N1 before the value 5 is  returned. Now the subtlety - in this process, the W5 which was still in progress gets completed. Since W5 is now complete, the value returned (5),  is semantically the correct one. The read repairs shown by the purple lines happen asynchronously in both cases. So, despite the non-determinism we see in such cases, the semantics of consistency is still maintained and the conditions for strong consistency described earlier hold! 

To summarize, consistency in Cassandra is (a) different from that of transactional consistency (b) can be eventually or strongly consistent. A cliched conclusion to the post, I know, but I hope to have discussed some of the finer aspects that help understand what consistency means in Cassandra. Though some of these design decisions made in Cassandra incur additional effort for the developer, it keeps Cassandra simple and focused on its primary purpose - store and deliver data at blinding speeds. And trust me, it does that! 

Friday, February 17, 2012

The art of bus-riding

Today morning, a fellow student accidentally held on to my backpack to steady himself when the bus turned a corner. He looked up at me with a worried expression on his face and said apologetically "I'm so sorry, I didn't mean to.. I, I, I... was out of balance.." I looked up at him and smiled.

In the state where I hail from, school kids have a free-ride in government run buses - provided they are able to produce their
bus-pass within a micro-second to the bus-overseer (colloquially: 'kandectaar annan' - big brother conductor). As I was to learn in my undergraduate courses later, this was the speed at which early microprocessors used to operate. Therefore, keeping it folded in the wallet or inside the backpack is both infeasible and unacceptable. I'll avoid getting into the intricacies here, since carrying backpacks while riding a bus requires metier and is a topic requiring an intense treatment on its own. Losing the bus-pass is an expensive incident (as you don't get a replacement for that year) that earns severe rebuke from all affected parties. An exceptionally unfortunate situation is when teh kid realizes that he had lost his bus-pass when asked to produce on-board the bus. Elder co-passengers suddenly realize their social responsibility in educating the already shattered kid on the "art of carefulness".

Often, what clothing to wear when you ride the bus is a great dilemma, especially for the youth. In fact, it puts your morality, up-bringing and social-status at stake and determines where you end up in the social chart of your co-passengers. For instance, a young looking fellow by default is a good-for-nothing vagabond who apparently can have no respectable social status what-so-ever. A fellow with a neatly pressed shirt is tolerable and anyone with a ripped jean has no salvation, ever.

To top it up, bus-riding has a bunch of innate protocols associated with it. Any person who is occupying a double-seater is always required to offer the seat when he sees a couple standing. The decision to offer a seat when you see a couple is intricately complex and a direct confrontation of your morality. The worst part is, the correctness of your decision is never influenced by the ensuing behaviour of the couple. If they are married and abide by the socially accepted protocol of having non-mushy conversations, then your decision to offer the seat makes you earn a scornful look from your co-passengers. In case, the couple seem happy enough, you earn disdain. And if they turn out to be unmarried or just friends -  congratulations, you just got promoted to being an uncouth fellow and an object of sheer contempt for having allowed them to sit together.

With a billion people, buses are the life-line of public transportation in India and it is natural to expect them to be crowded. Bus-riding without jostling or stepping on someone is therefore a rare event and if you manage to get out of the bus after accidentally step on someone without having to put your all your known ancestors to shame, you should definitely buy yourself a lottery ticket that day. As a kid carrying a heavy backpack, my life was always in an unstable equilibrium when riding a bus. Enriching experiences have trained me not to worry too much about what goes on around you when in a bus.

But today, I looked up at him and smiled.

PS. Just in case you were wondering where I managed click this photograph, it was taken at our own Chennai a few years ago when I had been there visiting my friends.