Wednesday, May 29, 2013

The functional illiteracy called Geography

Years ago, when I moved out of India, I was amazed to find that most conversations in the western world begin with the weather, something we never encounter in the subcontinent. If I were to say this now in India - "Looks like it is going to rain today" - the most compassionate remark I could hope to get is "Are you crazy? Of course it will rain - it is monsoon now!" And there's good reason for it too. Unlike in the west, the weather in India is totally predictable. During summer it is hot, during winter it is pleasant (cold in the northern parts), and during the monsoon it rains. There you go. As simple as it comes.

The obvious reason for the weather being such hot conversational material (despite the numerous weather apps on the smartphones that people check N times a day) is that it works as a wonderfully polite, safe, universally approved substitute for social awkwardness. Elevators, subway trains, birthday parties, weddings, B-DUBS, you name it. Any imaginable awkward conversation can be handled effortlessly by progressively suggesting weather alternatives - starting off with mild excitement, moving to reticent amusement or disinterested observation, then on to affable annoyance, before finally settling on explicit bitching. It works - always.

Extending this thought, I have been consciously experimenting with geography as a possible conversational substitute for weather. Here are a few of the priceless nuggets that came up in these conversations over the years, the ones I managed to remember or document. I have tried to present them verbatim to retain their natural conversational flair.

Wait! Korea isn't an island? I'd always thought it was.
Yeah. Too bad.

I need to take a vacation in the Andamans. That'd be cool. The Arabian Sea, I've heard, is beautiful.
You mean the Bay of Bengal, right?
Same difference.

Really? Hmm. What is the capital of Assam?
Guwahati.
Nope. It's Dispur.
Dude! You can't make stuff up. There's even an IIT there!

Japan to the United States is a long flight. 
Why?
What why? They are at the far ends of the map - that's why.

Which direction does the moon rise?
West, of course. Wait - that's opposite to the Sun, right?

So, if I need to get to Alaska, I have to catch a flight or take a ship, right?
Or you could drive.
How? Isn't Alaska north-west of Canada?
Yup. But Canada has roads too.

In the month of January, watching a game of cricket played between South Africa and India...
Why does the commentator keep talking about summer season games? Isn't that like another 5 months away? Brr...rrrr...

If I go across Antarctica, I should reach the Arctic Circle, right? No?

Do you know how many states India has now?
Now, now. You're just getting political.

You know, the Musi river flows right beside the Osmania campus...
Hehe! Yeah, right.

The spread of South Indian culture northwards would have happened if not for the Vindhya ranges...
I don't think so. I remember reading that Agasthya had pushed those mountains back into the earth a long, long time ago.

Malaysians are good badminton players.
Of course, they come from the Chinese sub-continent.
You mean, South-East Asian.
No, no. Chinese.

You know, India is divided roughly into two halves by the Tropic of Cancer.
The Tropic of what?

There are many more, but despite being free, online acreage is to be used judiciously. Jokes aside - the one thing I want you to take away is the value of questioning information that was fed to us as facts. The sun rising in the east is a fact - very few question why, and fewer still understand it. Humanity evolves and solves bigger and smarter problems, but it remains to ponder what we traded off along the way. A common example is the GPS navigator, an epitome of human innovation, replacing human directional skills. Good? Bad? Your call.

May our kids be able to spell Geography.

InvalidKeyPair.NotFound EC2 Boto package



It is fair to say that I have been acquainted with Amazon AWS for the past couple of years for my research requirements. Actually, it is more than fair - it has been my bread-and-butter environment for scripting experiments on cloud platforms and wide-area storage networks. And boto has been my go-to API for AWS tasks, given that most of my experiments evolve from ad-hoc Python scripts that do a bit of this and that.

And yesterday, I started receiving an "InvalidKeyPair" exception from the API out of the blue (not entirely true, since I had made a host of changes to my scripts), suggesting that my key pair didn't exist. A simple ls said otherwise - I found the key pair file well and truly alive, with the right set of permissions as well.

Source:

EC2Connection.run_instances(image_id='ami-09b11233', key_name='awskeys.pem', instance_type='m1.small', security_groups=['default'])

Exception:

boto.exception.EC2ResponseError: EC2ResponseError: 400 Bad Request
InvalidKeyPair.NotFound. The key pair 'awskeys.pem' does not exist

After almost running out of hair to pull and Googling every abstract keyword I could think of, here is the solution:

EC2Connection.run_instances(image_id='ami-09b11233', key_name='awskeys', instance_type='m1.small', security_groups=['default'])


Apparently, the key pair name is just "awskeys", not "awskeys.pem". I haven't yet figured out why this exception wasn't raised until now, nor traced back when this started becoming an issue, but I wanted to make a note right away in case it is useful to someone else too.
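To avoid tripping over this again, a small helper can turn the local key file path into the name EC2 expects. This is a minimal sketch, assuming the private-key file was saved under the key pair's own name (as with the 'awskeys.pem' downloaded for key pair 'awskeys'); EC2 knows nothing about the local file name, and `key_pair_name` is a hypothetical helper, not part of boto.

```python
import os


def key_pair_name(pem_path):
    """Return the EC2 key pair name to pass as key_name.

    Assumption: the private-key file is named after its key pair
    (e.g. 'awskeys.pem' for the key pair 'awskeys').
    """
    base = os.path.basename(pem_path)
    name, ext = os.path.splitext(base)
    # Drop a trailing .pem; anything else is passed through unchanged.
    return name if ext.lower() == ".pem" else base


# run_instances wants the key pair *name*, not the file.
print(key_pair_name("~/.ssh/awskeys.pem"))  # awskeys
```

In boto 2, `EC2Connection.get_key_pair(name)` returns None when no key pair with that name exists, which makes for a quick sanity check before launching instances.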

Issued in public interest to all AWS-dependent developers. May you always hit the right price for your spot instances. Amen!

~Titan.

Saturday, February 09, 2013

Eight years of writing

Yesterday, I opened my inbox to a pleasant shock and surprise. I had received a note from the Google Analytics service saying that my blog had seen its 10,000th visitor. Looking back, is it an achievement? Well, yes and no. No - because I didn't expect that many people to read what I wrote (and I'm thankful to them all!). Yes - because I see the transition from Titan, the kid who was awed by his exposure to a virtual world, to the Titan who is now realizing the value, virtue and vices of his virtual entity. And this blog has been one of the catalysts. As we say in Tamizh movies, some elderly gentleman from a bygone era must surely have said this (and I have no clue who he is): "A man's writing is a reflection of himself." Actually, it doesn't matter if no one ever said it. It is at least largely true of most people that I have seen and known.

I happened to re-read my initial posts on this blog, which were written back in 2005. So many things have changed since then; so many things haven't. The most obvious is the content of the blog itself: the transition from half-broken language and muddled usage of words to a cleaner, less arbitrary diction (I recommend taking a look at my initial posts; I can assure you a 30-minute episode's worth of laughter in them). But more interestingly, I found a significant change in the clarity of thought and in the structure of what I write. I do think (hope) there is still a fair bit of humor in my writing, but now I also see the link between the words and the thoughts behind them. I'm guessing this also means I have changed over time as a person. Maybe, maybe not.

Some of the complaints I have been receiving from my readers (yes, they do exist, and they do reach out to me by email too!) are about the mix of technical (I mean, why not! I'm still a technology person!) and non-technical posts on my blog. Some have told me outright that those posts were boring, long, completely out of the blue and terribly uninteresting. Guilty as charged - but those posts are as much a part of me as my other posts are. And I'm sure you all know what I mean.

My writing has gained me many friends. When I was a kid, I remember the good old Amar Chitra Katha comics having a section called "Pen Pals". I used to wonder what it meant. And now, I know. Some of my blogger friends (so called because I came to know them through their blogs) have been known to me for almost 8 years now, and I haven't met quite a few of them in person at all. But we have read each other's write-ups so many times over the years that, I think, if we happened to meet in the future, our conversation would likely begin with a simple "Hey, what's up?".

A common impediment that keeps many people from writing in a public forum is the fear of saying something silly and having friends, relatives or prospective employers see it. I have seen many articles that spread the notion that being restrained, or distancing your virtual avatar from yourself, is a must. To this, I ask: if you are silly and write something silly, and a person whom you never meet in your life reads it, how does it matter? And if you do happen to meet him in the future, wouldn't he then realize that you are silly anyway? I have found great satisfaction in my writing, and I write because it has helped me evolve. It has taught me to introspect and constantly become a better person. I know many friends who would have loved to write, but haven't yet done so for fear of being judged. To them, I say - do write. Not for me, not for others, but for your own joy.

If that gentleman from the Tamizh movies of a bygone era happens to stop by now and give a talk about the virtues of writing, I'd pick up a coffee, switch off my cell phone and listen.

Monday, July 30, 2012

Wabash Wisdom


At a fortunate moment of indecision, I decided to go all the way to Udupi, a restaurant 50 miles away from my dear WestLala, lured by the prospect of gobbling idlis and vadas (yes, you read the verbs right). As an after-effect of that decision, I was forced to go for a jog in order to fit into my T-shirts again. In a certain sense, I'm happy that I did, since jogging along the northern parts of the Wabash Heritage Trail was one of the things on my to-do list for West Lafayette. Just in case you are wondering, yes, I'm still looking for the second item on that list. But let that be. For, as a fallout of my expedition today, I learnt many a thing, which I'm postulating here as the dozen pearls of jogging wisdom:


I learnt 
  • that Rajinikanth is right, as always. He once said "vazhkai oru vattam da" (roughly translates to 'circle of life')
  • that it takes at least 7 miles of running for your stomach to feel normal after hogging 11 medu vadas.
  • that less traversed under-the-bridge pathways are a popular destination for men with OAB, no matter what country you are in.
  • that 'The Road Not Taken' being inspired poetry is a lie. Robert Frost had once taken a stroll along N. 9th Street, Lafayette.
  • that all roads in West Lafayette lead to the Happy Hollow park.
  • why India does not compete in track and field events - we are just not meant to be athletes.
  • that it is final: the winner in the category "best music track for inspired running" is Bhag bhag DK Bose. Sanjay Subrahmanyan's Upacharamu cheseva comes a distant second.
  • that fitness apps on the iPhone are not to be trusted for predicting distances. Ever.
  • what Jenny meant when she said "Run Forrest, run."
  • that it is a good idea to take a bath after running 9 miles. Apparently you turn into an effective rodent repellent after 7 miles.
  • that deer in West Lafayette are not a fable. I saw two by US-52.
  • the meaning of life

Glossary:

WestLala - a rhyming pet name for West Lafayette, influenced by hours of watching innumerable Tamizh songs with completely worthless lyrics.
Bhag bhag DK Bose - Hindi equivalent for encouraging a certain gentleman who goes by the name of D.K. Bose to run. Really. Even Mr. Aamir Khan said so.
Upacharamu cheseva - a Carnatic composition by Saint Thyagaraja in the raga Bhairavi.
Rajinikanth - "      " intentionally left blank; undefined. The discoverers of the boson at CERN are still working on this.
Medu vadas - fried, salted, spiced lentil donuts. Salty, not sweet.
US-52 - a very popular landmark destination connecting the historic towns of Lafayette and West Lafayette across the mighty Wabash. All visitors to the Walmart definitely pass by this historic location.

A note to posterity - this is the route that I ran today. May you find peace and prosperity and enough oil to light your lamps!

 

Wednesday, June 20, 2012

Understanding Cassandra's consistency and conflicts

After a long time, here comes another technical entry on my blog. I have been playing around with Cassandra, trying to understand it as a system, and one of the things that often comes up in many forums is the difficulty of understanding Cassandra's consistency. In this post, I hope to consolidate what I inferred from various forums and from the solutions and problems shared by many others who have been working with Cassandra. I intend this post to be useful to someone who already has a flavor of what Cassandra is about and is familiar with the fundamental concepts of distributed systems and data stores. If not, I strongly recommend skimming through some of the basics here before proceeding further with this post.

A different ACID

Consistency in Cassandra doesn't directly relate to the 'C' in ACID. Consistency in traditional database systems refers to transactional consistency, which ensures the correctness of the state for a given DB transaction. In Cassandra, by contrast, consistency refers to the consistency of data across its replicas. In fact, it would be simpler to view this as the consistency of data that can be observed by an external client, or client-side consistency. The consistency that is observed within the Cassandra cluster (which might be different from that observed by an external client) can be defined as server-side consistency. Cassandra does not provide transactional consistency (across multiple reads/writes); this is traded off for higher speed and scalability. If essential, transactions and transactional consistency have to be implemented by the client.

No rollbacks! 

An important implication of the above fact is that a Cassandra cluster (or simply cluster) can have partial writes (or writes in progress) but does not provide a rollback mechanism for any (potentially) failed operations. For example, consider a Cassandra cluster with 3 nodes (N1, N2, N3), a replication factor (RF) of 3 and a read/write consistency level (CL) of 2. Suppose a write to X is initiated on nodes N1 and N2, and node N1 fails while the write is in progress. The write to N2 succeeds and a timeout is reported to the client, but the write on N2 is not rolled back as it would be in a traditional database. Fixing this inconsistency, by retrying the failed operation or through any other corrective mechanism, is the responsibility of the client. I believe the only case of a true failure reported by Cassandra is when not enough nodes in the cluster are live for a given operation.
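The example above can be sketched as a toy model. `Replica` and `coordinate_write` are hypothetical names, not Cassandra's API, and the model is simplified: a real coordinator sends the write to all RF replicas and times out waiting for acknowledgements. What it does show is the point of this section: a replica that applied the write keeps it even though the client is told the write failed.

```python
class Replica:
    """A toy replica holding one value; 'up' simulates node availability."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.value = None

    def write(self, value):
        if not self.up:
            return False  # the node failed mid-write: no acknowledgement
        self.value = value
        return True


def coordinate_write(replicas, value, w):
    """Send a write to every replica and require w acks; never roll back."""
    acks = sum(replica.write(value) for replica in replicas)
    if acks < w:
        # Replicas that did apply the write keep it: there is no undo step.
        raise TimeoutError(f"only {acks} of {w} required acks")
    return acks


# The post's example: the write goes to N1 and N2 with CL = 2, N1 dies.
n1, n2 = Replica("N1", up=False), Replica("N2")
try:
    coordinate_write([n1, n2], 5, w=2)
except TimeoutError as err:
    print("client sees a timeout:", err)  # only 1 of 2 required acks
print("N2 still holds", n2.value)  # N2 still holds 5
```

Retrying the write, or otherwise cleaning up N2, is left to the caller, which mirrors the client's responsibility described above.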

Eventually consistent

"So what you are saying is, Cassandra cannot provide any consistency guarantees what-so-ever." No - this is a common misconception of many people that I have observed in many a user forum. Cassandra is eventually consistent. Huh!? Okay. Let me put it this way. Cassandra can be as consistent as you want it to be. The condition for strong consistency is 

R + W > N, where 
N - Number of replicas
W - Number of nodes that need to agree for a successful write
R - Number of nodes that need to agree for a successful read

And if R + W <= N, we say that the cluster is configured to have weak consistency, or is eventually consistent. For example, consider Figure 1. The system has a read consistency level (R CL) of quorum and a write consistency level (W CL) of ANY (at least 1), and is therefore said to be eventually consistent. Since writes can succeed with just one node, write 3 (W3) to N1 at time T0 and write 5 (W5) to N2 at time T1 can happen independently to the same variable X. However, we can see from the right-hand side of the figure that a read (R CL = quorum) at time T2 can return different values depending on which set of nodes, (N1, N2) or (N1, N3), is chosen to serve the read request.
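The arithmetic and the Figure 1 scenario can be sketched in a few lines. This is a toy model with hypothetical helper names: each replica stores a (timestamp, value) tuple, or None if it never saw a write, and a read returns the value with the newest timestamp among the nodes it consults. The write level is modelled as one replica; strictly, ANY can be satisfied by a hinted handoff with no replica written at all, which is even weaker.

```python
def is_strongly_consistent(n, r, w):
    """The condition from the text: R + W > N."""
    return r + w > n


def quorum(n):
    """A majority of n replicas."""
    return n // 2 + 1


def read(replicas, nodes):
    """Return the newest value among the consulted nodes (None if empty)."""
    versions = [replicas[node] for node in nodes if replicas[node] is not None]
    return max(versions)[1] if versions else None


N = 3
R, W = quorum(N), 1  # read at quorum, write acknowledged by one replica
print(is_strongly_consistent(N, R, W))  # False: 2 + 1 is not > 3

# W3 lands only on N1 at T0, W5 only on N2 at T1 (timestamps 0 and 1).
replicas = {"N1": (0, 3), "N2": (1, 5), "N3": None}
for nodes in (("N1", "N2"), ("N1", "N3")):
    print(nodes, "->", read(replicas, nodes))  # 5, then 3
```

Two quorum reads at the same instant disagree, which is exactly the inconsistency shown on the right-hand side of Figure 1.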



Okay, I see that there can be inconsistency. But will it always remain so? The answer is no - thanks to the read repairs that happen in the background. In both the cases illustrated here, the read repairs (shown in purple) ensure that subsequent reads converge on the same value for X. And this is why we say that Cassandra is eventually consistent.
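Read repair can be sketched as "return the newest version, then overwrite the stale copies". This is a simplified, synchronous toy (the helper name is hypothetical). Real Cassandra does the repair in the background, and how many replicas it checks depends on configuration. Versions are (timestamp, value) tuples as before.

```python
def read_with_repair(replicas, nodes):
    """Toy read repair over versions stored as (timestamp, value) tuples.

    Returns the newest value among the consulted nodes and overwrites any
    consulted replica that holds an older (or no) version.
    """
    present = [replicas[node] for node in nodes if replicas[node] is not None]
    if not present:
        return None
    newest = max(present)
    for node in nodes:
        if replicas[node] != newest:
            replicas[node] = newest  # repair: push the newest version
    return newest[1]


# Figure 1 state: W3 (T0) on N1 only, W5 (T1) on N2 only, nothing on N3.
replicas = {"N1": (0, 3), "N2": (1, 5), "N3": None}
print(read_with_repair(replicas, ("N1", "N2")))  # 5, and N1 is repaired
print(read_with_repair(replicas, ("N1", "N3")))  # 5, and N3 is repaired
print(replicas)  # every replica now holds (1, 5)
```

Once the repairs have run, any pair of nodes returns the same value, which is the convergence the paragraph describes.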

Strong Consistency

Alright, now what would happen if I were to have R + W > N? Let us consider the extreme case where W CL = ANY and R CL = ALL, shown in Figure 2. In this case, for the read to succeed, all replicas need to be in agreement and therefore have to be consistent before we respond to the client.
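Why R + W > N is the right condition can be checked by brute force: it holds exactly when every possible set of R replicas read overlaps every possible set of W replicas written, so at least one replica in any read has seen the latest successful write. The helper below is a hypothetical illustration, again modelling the write level as a count of replicas (ANY treated as one replica, as in the text).

```python
from itertools import combinations


def every_read_sees_every_write(n, r, w):
    """Exhaustively check that each r-node read set overlaps each
    w-node write set among n replicas (true iff r + w > n)."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)  # an empty set means no overlap
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


for r, w in [(3, 1), (2, 2), (2, 1), (1, 1)]:
    print(r, w, every_read_sees_every_write(3, r, w), r + w > 3)
```

The two printed columns agree for every pair: R = ALL with W = 1 (the Figure 2 case) and quorum/quorum overlap, while the Figure 1 configuration (2, 1) does not.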


Conflict resolution

Hold on, how did you decide that 5, and not 3, is the correct value? I didn't; Cassandra did. To resolve conflicts, every column in Cassandra has a timestamp associated with it. Since T1 > T0 in our example, 5 is the later write and is therefore assumed to be correct. It is therefore evident that the nodes in the Cassandra cluster need to be synchronized in their measure of time for this to be semantically correct.


I thought Cassandra used vector clocks, no? After going through a number of threads and forums, I realized that this is not true. Vector clocks and version vectors are popular methods used for conflict identification. However, Cassandra already employs a per-column timestamp for resolving conflicts, thereby obviating the need for the causal ordering that vector clocks provide.
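The difference can be seen in a small sketch. Below is a generic vector clock comparison (not Cassandra code, since Cassandra does not use them): each clock is a {node: counter} dict. For W3 and W5 from Figure 1, coordinated by different nodes that never saw each other's write, a vector clock can only report "concurrent" and leave the resolution to someone else, whereas a per-column timestamp simply orders them.

```python
def compare_vector_clocks(a, b):
    """Compare two vector clocks given as {node: counter} dicts.

    Returns 'before', 'after', 'equal' or 'concurrent'.
    """
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


# W3 coordinated by N1 and W5 by N2, neither having seen the other.
w3_clock, w5_clock = {"N1": 1}, {"N2": 1}
print(compare_vector_clocks(w3_clock, w5_clock))  # concurrent: conflict flagged

# A (timestamp, value) pair simply orders them: T1 > T0, so W5 wins.
print(max([(0, 3), (1, 5)])[1])  # 5
```

This is the trade-off the paragraph points at: timestamps give up detecting true concurrency in exchange for never needing a separate resolution step.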


Okay, I have synchronized my clocks. But what if I have a truly concurrent write with the same timestamp? In the unlikely case that two timestamps match to the microsecond, you might end up with an unintended version, but Cassandra ensures that ties are broken consistently by comparing the byte values.
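The conflict resolution described in this section (latest timestamp wins, ties broken on the bytes) can be sketched as a single comparison. This is a minimal sketch, assuming that on an exact tie the greater byte value wins, which is my understanding of Cassandra's column reconciliation; deletions (tombstones) are ignored, and `reconcile` is a hypothetical name. The timestamps below are made-up microsecond values.

```python
def reconcile(a, b):
    """Pick the winner of two column versions, each (timestamp, value_bytes).

    The later timestamp wins; on an exact tie the greater byte string wins,
    so every replica resolves the tie the same way, whatever the arrival
    order.
    """
    return max(a, b, key=lambda version: (version[0], version[1]))


w3, w5 = (1000, b"3"), (2000, b"5")
print(reconcile(w3, w5)[1])  # b'5': the later write wins

tie_a, tie_b = (2000, b"apple"), (2000, b"banana")
print(reconcile(tie_a, tie_b)[1], reconcile(tie_b, tie_a)[1])  # b'banana' b'banana'
```

The tie-break does not pick the "right" write, only a deterministic one, which is why the text calls the result possibly unintended yet consistent across replicas.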

Partial writes

Done. I did my math and made my cluster strongly consistent. Am I safe? The answer is both yes and no. This subtle but interesting scenario comes up when we have failures, which bring in the notion of partial writes. Consider Figure 3, which is the same example as shown before, except that at time T1 the node N2 is momentarily disconnected from the cluster, due to which W5 times out. This means that the value 5 is written to N2, but the write operation has not succeeded yet, as it could not meet the required consistency level. As a result, N2 has the updated value while N1 and N3 have the older value. At a later time T2 (when N2 is back in the cluster), the read can give different results based on the nodes that serve it.


Wait! Are you saying that we do not have strong consistency? No. We are still strongly consistent, but there is a little non-determinism in the system. Consider Case 1, where the read goes to (N1, N3) and the value 3 is returned after nodes N1 and N3 arrive at a consensus. Note that W5 is still in flight, or in a timeout that is being handled by the client. In other words, the write is still in progress, so it is semantically correct that the previous value 3 be returned. In Case 2, the read goes to (N1, N2). Here N2 has a more recent value (timestamp, remember?) and a consensus is reached with N1 before the value 5 is returned. Now the subtlety: in this process, W5, which was still in progress, gets completed. Since W5 is now complete, the value returned (5) is semantically the correct one. The read repairs shown by the purple lines happen asynchronously in both cases. So, despite the non-determinism we see in such cases, the semantics of consistency are still maintained and the conditions for strong consistency described earlier hold!
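Both cases can be replayed in a toy model. This sketch assumes quorum reads and writes on three replicas (so W = 2), versions stored as (timestamp, value) tuples, and that the consulted replicas are brought up to date before the read returns, which is how I read the "consensus" step in Case 2. `quorum_read` and `written_on` are hypothetical helpers.

```python
def quorum_read(replicas, nodes):
    """Return the newest value among the consulted nodes and bring
    those nodes up to date before returning (toy model)."""
    newest = max(replicas[node] for node in nodes)
    for node in nodes:
        replicas[node] = newest  # repair the consulted replicas
    return newest[1]


def written_on(replicas, version):
    """Count the replicas that hold a given version."""
    return sum(1 for v in replicas.values() if v == version)


W = 2  # quorum for three replicas
W3, W5 = (0, 3), (1, 5)

# Case 1: W5 reached only N2 before timing out; the read goes to N1, N3.
case1 = {"N1": W3, "N2": W5, "N3": W3}
print(quorum_read(case1, ("N1", "N3")))  # 3: W5 is still in progress
print(written_on(case1, W5) >= W)  # False

# Case 2: the same state, but the read goes to N1 and N2.
case2 = {"N1": W3, "N2": W5, "N3": W3}
print(quorum_read(case2, ("N1", "N2")))  # 5
print(written_on(case2, W5) >= W)  # True: the read completed W5
```

In each case the value returned matches whether W5 has reached a quorum at that moment, which is the sense in which the post says strong consistency still holds.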

To summarize, consistency in Cassandra (a) is different from transactional consistency and (b) can be either eventual or strong. A clichéd conclusion to the post, I know, but I hope I have discussed some of the finer aspects that help in understanding what consistency means in Cassandra. Though some of the design decisions made in Cassandra incur additional effort for the developer, they keep Cassandra simple and focused on its primary purpose: storing and delivering data at blinding speeds. And trust me, it does that!