Wednesday, February 26, 2014

Energy approach to life, the universe, and everything

I recently read Scott Adams's new book, "How to Fail at Almost Everything and Still Win Big". I enjoyed the book a lot; Scott discusses many intriguing ideas as usual. The main idea in the book is that you need systems instead of goals. Motivation is ephemeral, but systems/processes are persistent. As Jeff Bezos says, "Good intentions don't work, good systems work!"

Another main idea in the book is the importance of monitoring and managing your energy for being successful. The book advises you to "watch what you eat, exercise, match mental states to activity and attack tasks when you have the appropriate energy level".

I think this energy approach to personal productivity and life is promising. So I wanted to collect my thoughts on this and see what I can add on this topic.

Things that increase energy and decrease energy

It isn't the mountains ahead to climb that wears you down.
It is the pebble in your shoe.
– Muhammad Ali
This quote from Muhammad Ali is extremely insightful, and nicely summarizes everything that needs to be said on this topic. Here are my observations on what energizes me and what drains my energy.

| Increases energy | Decreases energy |
|------------------|------------------|
| small victories | big victories |
| tea/healthy food | eating a lot |
| happy mood | bad mood |
| waking up early | waking up late |
| reading a good book | browsing junk/news |

Conserving energy

To keep your energy level high, you need to find tricks to conserve energy. Instilling useful "habits" is a great trick to conserve energy. When you make something a habit, you don't need to waste energy on remembering to do it or, more importantly, on finding the willpower to do it. Habits make inertia work for you. The key to instilling habits is to start with baby steps. See Dr. Fogg's Tiny Habits to learn more.

Keeping things simple in your life helps conserve energy. Being an organized person (Getting Things Done) avoids ongoing chaos in your life, and conserves energy. I use Emacs Org-Mode to track all my tasks so I can clear my mind. Even cleaning/organizing your room or office can give you a noticeable boost in energy.

On this tangent, being a principled/virtuous/religious person can help a lot for success. Once you make your decision and commit to it, there are fewer temptations to fight; you can ignore a lot of distractions because they are out of bounds (designated as sin) for you. This is a very pragmatic (very Benjamin Franklin) way to approach the issue of character/virtue/religion, but there it is.

Growing your energy potential

Maintaining high energy levels is good, but you know what is better: Growing your energy potential.

The Antifragile book tells us that organic things grow by exploring and occasionally pushing/straining their limits. Related to this topic is the "growth mindset" concept, put forth by the Stanford psychologist Carol Dweck (I highly recommend her book). The growth mindset describes an antifragile approach to personality and life. Growth-mindset people like challenges and failures, because these make them learn, improve, and grow.

Following this line of thought, in order to grow your energy potential, you need to strain it and totally deplete it from time to time. Occasionally, you should take on more than you can handle, exercise vigorously and push your physical limits, pull some all-nighters working on a project, and fail at some of your projects. Pushing your limits and failing is good for you. It will make you grow. If nothing else, it will teach you some humility and throw you out of the fixed mindset you may have acquired (e.g., I know it all, I have it all, I am naturally successful). You need some occasional resets to be able to experience the beginner's mind.

Focusing/intensifying your energy

It is also important to learn how to control and focus your energy to get things done. I doubt there is an easy or universal way to achieve this.

I think intensifying your emotions helps focus your energy. Being curious and asking questions focuses your energy, because you will really want to get answers to your unresolved questions. Being very determined and motivated (you need to find ways to motivate yourself) will help in focusing your energy. Finally, even anger helps. I occasionally argue and pick fights with papers/books, in order to focus my energy to better understand them.

Wednesday, February 12, 2014

Consistency-Based Service Level Agreements for Cloud Storage

This paper is from Microsoft Research and appeared in SOSP'13. This paper is more of a position and vision paper. The paper introduces a consistency-based SLA concept, which can provide a win-win arrangement for cloud providers and developers.


For performing reads from key-value stores, you currently have two options. You can do strongly-consistent reads by increasing the size of your read replica quorum, but this increases response latency, and you don't get the flexibility to fall back to a quick and dirty (eventually-consistent) read when a strongly-consistent read would take a long time to respond. Or you can go with best-effort reads (which are eventually-consistent) from the key-value store because you insist on low-latency answers.

(Actually there is another option: You can use a strongly consistent multiversion data store like BigTable or Spanner, and relax it by reading slightly stale data and get flexibility. I will revisit this option in the discussion.)

| Rank | Consistency | Latency | Utility |
|------|-------------|---------|---------|
| 1    | strong      | 150 ms  | 1.0     |
| 2    | eventual    | 150 ms  | 0.5     |
| 3    | strong      | 500 ms  | 0.25    |

Enter the consistency-based SLA concept. The SLA acts as an interface between the application developer and the innards of the cloud. The developer provides a wishlist for their get (i.e., read) operations from the key-value store as above. Here the developer says "I want a reply in under 150 ms and prefer strongly consistent data but will accept any data; if no data can be obtained quickly then I am willing to wait up to 500 ms for up-to-date data." The cloud-provider backend keeps track of which of these reads is currently feasible for that location, and satisfies the highest-ranked one it can, in order to give the best utility to the developer.
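To make the idea concrete, here is a minimal sketch (in Python, with hypothetical names; this is not the paper's actual API) of an SLA as a ranked list of subSLAs, and a selection function that picks the highest-ranked one the backend can currently satisfy:

```python
from dataclasses import dataclass

@dataclass
class SubSLA:
    consistency: str   # e.g., "strong" or "eventual"
    latency_ms: int    # acceptable response latency bound
    utility: float     # value to the developer if this subSLA is met

def choose_subsla(sla, feasible):
    """Return the highest-ranked subSLA the cloud can currently satisfy.

    `feasible(consistency, latency_ms)` reports whether a read with the
    given consistency can be answered within the latency bound right now.
    """
    for sub in sla:  # the list is ordered by developer preference (rank)
        if feasible(sub.consistency, sub.latency_ms):
            return sub
    return None  # no subSLA can be met; fall back to best effort

# The example SLA from the table above:
sla = [
    SubSLA("strong", 150, 1.0),
    SubSLA("eventual", 150, 0.5),
    SubSLA("strong", 500, 0.25),
]

# Suppose strong reads currently take ~300 ms and eventual reads ~50 ms
# from this client's location (made-up numbers for illustration):
def feasible(consistency, latency_ms):
    estimated = 300 if consistency == "strong" else 50
    return estimated <= latency_ms

chosen = choose_subsla(sla, feasible)  # rank 1 fails, rank 2 is satisfied
```

Under these conditions the strong/150 ms subSLA is infeasible, so the eventual/150 ms one (utility 0.5) is chosen; if strong reads sped up, the same SLA would automatically upgrade the reads.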

Using such an SLA makes good business sense. With this SLA the developers put their money where their mouth is: they agree to pay more for better utility provided to them. The cloud providers can use the SLAs to prioritize the read requests; they can give higher priority to higher-paying (higher-utility) customers that require consistency.

To illustrate, Figure 3 shows some read latencies at a given point from given locations. The developer does not have access to all per region or per client latencies like this, but in the SLA she can state her ranked preferences for latency and consistency of the reads she thinks would make most sense for her application, and through this interface she has access to dynamic tuning of performance of her application.

Pileus Architecture:

To showcase the SLA, the authors developed a replicated key-value store called Pileus. Pileus is a type of cloud formation: a cap cloud. (Get it? A "CAP" cloud.) Pileus dynamically selects which servers to access in order to deliver the best service given the current configuration and system conditions.

Some storage nodes are designated as primary nodes, which hold the master data, while others are secondary nodes. All Puts (i.e., writes) in Pileus are performed and strictly ordered at a primary site. Secondary nodes eventually receive from the primary site all the updated objects along with their update timestamps. Since all Put operations are assigned increasing update timestamps from the primary site and the asynchronous replication protocol transfers updated objects in timestamp order, at any point in time each secondary node has received a prefix of the overall sequence of Put operations.

When selecting the node to which a Get operation should be sent, the desired consistency guarantee, the previous object versions that have been read or written in the current session, and the key being read together determine the minimum acceptable read timestamp. The minimum acceptable read timestamp indicates how far a secondary node can lag behind the primary and still answer the given Get operation with the desired consistency. The Pileus client library makes this decision.
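A rough sketch of this node-selection logic might look as follows (illustrative only; the function names, guarantees, and timestamp scheme are my simplification, not Pileus's actual code):

```python
def min_acceptable_ts(guarantee, primary_ts, session_ts):
    """Map a consistency guarantee to the oldest 'high timestamp' a node
    may have and still serve this Get. `session_ts` is the latest
    timestamp previously read or written in this session."""
    if guarantee == "strong":
        return primary_ts          # must be fully caught up with the primary
    if guarantee == "read-my-writes":
        return session_ts          # must include this session's updates
    return 0                       # eventual: any node will do

def pick_node(nodes, guarantee, primary_ts, session_ts):
    """Among nodes fresh enough for the guarantee, pick the lowest-latency
    one. `nodes` maps name -> (high_timestamp, round_trip_ms)."""
    need = min_acceptable_ts(guarantee, primary_ts, session_ts)
    eligible = [(rtt, name) for name, (ts, rtt) in nodes.items() if ts >= need]
    return min(eligible)[1] if eligible else None

# Made-up snapshot: the primary is caught up but far away; the
# secondaries lag behind it by varying amounts but are nearby.
nodes = {
    "primary":    (100, 200),
    "secondary1": (90, 20),
    "secondary2": (60, 15),
}
```

With this snapshot, a strong Get must go to the primary (200 ms away), a read-my-writes Get with session timestamp 85 can be served by the fresher secondary (20 ms), and an eventual Get goes to the nearest node (15 ms): exactly the latency/consistency trade-off the SLA ranks over.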

By forcing all the writes to be performed at a single primary site, this architecture limits the problem space and simplifies ensuring consistency for reads across the consistency spectrum. But it also limits read performance (except for eventually-consistent reads). Moreover, with this setup you don't get to specify latency bounds for writes.

Evaluation results show that consistency-based SLAs can indeed improve application-specific levels of service (i.e., utility).


Q: How rich is the class of applications that benefit from this SLA?

A: I am still confused about this. It sounds like this can be applicable to a large class of applications, but sometimes I revert to thinking maybe not that big.

For latency-favoring (eventual-consistency-happy) applications there are existing solutions: DynamoDB and several other key-value stores. And the target applications are those that tolerate relaxed consistency but nevertheless benefit from improved consistency. It may seem that these are already served to some extent by the eventually-consistent key-value stores. They are just best effort: you don't know what you get, but fresher, more consistent data improves service the same as in Pileus. Pileus gives you tuned performance, but maybe you could have gotten that performance by probabilistic means as well. (Peter Bailis has very nice work on probabilistically bounded staleness, which is a related approach here.)

For consistency-favoring applications, there are existing solutions like Bigtable, Spanner. And you can still do a quick dirty read from Spanner, by giving a slightly past read timestamp. This works because Spanner is a multiversion key-value store. But I guess you still need to manage when you would want to revert to the quick dirty reads.

Q: How does Pileus change the application code?

A: Yes, we learn from the API whether we got back a consistent read or not, but reacting to the type of read may pollute the program with a lot of branches and checks. Maybe programming-languages people have an answer to that. Still, I guess this way is better than monitoring latencies yourself and implementing this tuning in your application.

Saturday, February 8, 2014

The scalable commutativity rule: Designing scalable software for multicore processors

This paper was one of the best papers at SOSP'13. The paper is open access, since SOSP paid the open-access fees for every paper that appeared in the conference.

The scalable commutativity rule says that "Whenever interface operations commute, they can be implemented in a way that scales". In other words, whenever several operations commute at the interface-level (i.e., there's no way to distinguish their execution order using the interface), for those operations we can have implementations whose memory accesses are conflict-free.

(Scalable commutativity sounds like a misnomer, shouldn't the theorem be called "commutative scalability" theorem? The end goal is scalability, and commutativity is a tool for it. Thus "commutative" should qualify "scalability", not the other way around.)

I found several nice generalizable insights in the paper.

The first one is using commutativity as a detectable/measurable and controllable witness for concurrency. Concurrency is an abstract and specification-dependent concept. If you disregard safety (which is defined by the specification), you can boost concurrency easily by parallelizing aggressively, and you end up with a highly scalable system full of race conditions and incorrect/unsafe operations. It is hard to measure "safe concurrency" and boost it. By way of contrast, commutativity is well defined and easier to measure and control. If the outcome doesn't change when you change the order of operations, then the order is not important, which means you don't need to lock anything and you can find a lock-free/wait-free/coordination-free implementation.
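As a toy illustration of commutativity being "detectable" (my own sketch, much cruder than the paper's SIM commutativity definition), we can check whether two operations commute by running them in both orders and comparing the observable results and final state:

```python
def commute(state, op_a, op_b):
    """Check whether op_a and op_b commute starting from `state`.
    Each op takes a state dict and returns (result, new_state)."""
    r1a, s1 = op_a(dict(state))
    r1b, s1 = op_b(s1)
    r2b, s2 = op_b(dict(state))
    r2a, s2 = op_a(s2)
    # Commute: neither the results nor the final state reveal the order.
    return (r1a, r1b) == (r2a, r2b) and s1 == s2

# Two increments on different counters commute...
inc_x = lambda s: (None, {**s, "x": s["x"] + 1})
inc_y = lambda s: (None, {**s, "y": s["y"] + 1})
# ...but reading x does not commute with incrementing x, because the
# read's result exposes which operation ran first.
read_x = lambda s: (s["x"], s)
```

Here `commute({"x": 0, "y": 0}, inc_x, inc_y)` holds, while `commute({"x": 0, "y": 0}, read_x, inc_x)` does not, which is exactly the kind of interface-level distinguishability the rule cares about.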

The second insight is to pose the rule as a "possibility result" to guide the implementation process. It is clear that the paper is motivated by practical problems and from the ground up. These guys are Linux kernel hackers; they are trying to modify Linux to run on multicore machines in a more efficient/scalable manner. As part of their work to make Linux more scalable, they work in the trenches and develop a lot of tricks and hacks. But they didn't know when to quit and when to persist: "Before the rule, we tried to determine if these operations could scale by analyzing all of the implementations we could think of. This process was difficult, unguided, and itself did not scale to complex interfaces, which motivated our goal of reasoning about scalability in terms of interfaces." When operations commute at the specification level, this rule tells them that they should persist, because scalable implementations exist. When operations don't commute, this rule tells them to quit, because a scalable implementation may not exist with this specification.

Finally, this rule can be used to guide the design as well. When a scalable implementation may not be available with the current specifications of the operations (i.e., the operations do not commute at the interface level), the developers may consider changing the operations slightly to find a design where operations commute (and where scalable implementations are thus guaranteed to exist). To make the operations commute, the operations may be designed to use different parameters or relax the guarantees on the returned result. For example, in the case of a file open operation, don't return the smallest unused file descriptor (FD), but return any unused FD. If you change the specification this way, then you prevent the contention on the smallest unused FD. In a sense this is moving away from a queue semantic (which is order sensitive) to a set semantic (which is order insensitive, a.k.a. commutative). The essence of this trick is to modify operations so that they commute and thus admit conflict-free implementations.
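The FD example can be sketched as follows (a simplified illustration in Python; the class and method names are mine, and a real kernel allocator would of course use per-core data structures rather than a shared set):

```python
class FDTable:
    def __init__(self, size=1024):
        self.used = set()
        self.size = size

    def alloc_lowest(self):
        """POSIX-style semantics: return the lowest unused FD.
        Two concurrent calls do not commute: their results reveal
        which one ran first, so they must contend on the low FDs."""
        for fd in range(self.size):
            if fd not in self.used:
                self.used.add(fd)
                return fd
        raise OSError("out of file descriptors")

    def alloc_any(self, hint):
        """Relaxed semantics: return any unused FD, here searched from a
        per-caller hint (e.g., the start of a per-core FD region).
        Calls with disjoint regions commute, so a conflict-free
        implementation becomes possible."""
        fd = hint
        while fd in self.used:
            fd += 1
        self.used.add(fd)
        return fd
```

With `alloc_lowest`, two opens always race for the same FD; with `alloc_any` and distinct hints, their results no longer depend on execution order, which is the set-semantic relaxation described above.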

The paper is a tour de force. The authors developed a tool called COMMUTER that accepts high-level interface models and generates tests of operations that commute and hence could scale. The tool uses a combination of symbolic and concolic execution, and generates test cases for an arbitrary implementation based on a model of that implementation's interface.

"First, ANALYZER takes a symbolic model of an interface and computes precise conditions under which that interface’s operations commute. Second, TESTGEN takes these conditions and generates concrete test cases of sets of operations that commute according to the interface model, and thus should have a conflict-free implementation according to the commutativity rule. Third, MTRACE checks whether a particular implementation is conflict-free for each test case."

The evaluation of the paper is also exhaustive. They modeled several POSIX file system and virtual memory calls in COMMUTER, then used this both to evaluate Linux's scalability and to develop a scalable file and virtual memory system for their sv6 research kernel. I loved Figure 6, it shows a lot of things without overwhelming the viewer.

To show how the results in Figure 6 translate to scalability on real hardware they evaluated with some microbenchmarks and a mail server application.


Question: The scalable commutativity rule says: if interface operations commute, then they can be implemented in a way that scales. Is there a reason why you don't claim the inverse: if interface operations don't commute, then they cannot be implemented in a way that scales? If the inverse were true, this rule would have stronger implications for interface design.

A: The paper avoids this question and doesn't have an answer to it. My guess is that the inverse would indicate only a relative unscalability, not an absolute one, and it is hard to quantify relative unscalability. But the original theorem is clean; it provides an absolute result: an implementation with conflict-free memory accesses, i.e., full scalability, is possible.

Question: Is this rule applicable to general distributed systems, and not just multicore systems?

A: The paper claims this is the case in the related work but doesn't elaborate further. Yes, this should work, but there are assumptions: the operation should have invocation and response parts, and the high-level specification of the operations should make clear what each operation does, so you can perform the commutativity tests.

Question: What are the tradeoffs when coming up with variants of operations to make them commute?

A: Instead of one big operation, you may divide it into two smaller operations to improve commutativity. But did this make the developers' job harder? Well, maybe, but it is worth it if you made the operations conflict-free on multicores. Also, instead of crossing the kernel boundary once, you may now cross it twice, because you have replaced one operation with two. So you pay overhead, but that is OK if you have improved the commutativity and scalability of the operations.


Martin Vechev emailed me about my first question, and he had this to say: "We had an earlier POPL'11 paper that shows one part of the inverse question you are asking: that if you have strong non-commutativity you need expensive instructions:
It was also profiled in Linux Weekly:"

Sunday, February 2, 2014

Black sheep

"Publish or Perish" has become a motto for academics, but it is hard not to get depressed about this rat race in academia. I see researchers who have published an implausible amount of papers in good venues, but they still remain obscure. With all those papers, they haven't even made a splash in their field of study. I then ask myself what chance I stand for making a big enough contribution to be useful. Then, I proceed to have my semi-annual existential crisis, and question the point of publishing and my career in academia. I recover in a couple days generally, because I like working on problems and I am curious about stuff. But, these questions loom: Am I wasting my time? Am I irrelevant with my publications? How should I allocate/manage my time to be useful?

Publish less? 

It is simplistic to say "It is the quality that counts. If you publish less you will publish better quality work and get recognition". Experience refutes this. If you strive to publish less, and hold yourself to imaginary/unattainable standards, it will actually harm your impact. Feynman talks about the "Nobel Prize effect" in his memoir. Hamming describes the Nobel Prize effect as follows:

The day the prize was announced we all assembled in Arnold Auditorium; all three winners got up and made speeches. The third one, Brattain, practically with tears in his eyes, said, "I know about this Nobel-Prize effect and I am not going to let it affect me; I am going to remain good old Walter Brattain." Well I said to myself, "That is nice." But in a few weeks I saw it was affecting him. Now he could only work on great problems.
When you are famous it is hard to work on small problems. This is what did Shannon in. After information theory, what do you do for an encore? The great scientists often make this error. They fail to continue to plant the little acorns from which the mighty oak trees grow. They try to get the big thing right off. And that isn't the way things go. So that is another reason why you find that when you get early recognition it seems to sterilize you. In fact I will give you my favorite quotation of many years. The Institute for Advanced Study in Princeton, in my opinion, has ruined more good scientists than any institution has created, judged by what they did before they came and judged by what they did after. Not that they weren't good afterwards, but they were superb before they got there and were only good afterwards.

Baa baa black sheep

So publishing less is bad, and publishing more does not guarantee you make an impact. Then what is a good heuristic to adopt to be useful and to have an impact?

I suggest that the rule is to "be daring, original, and bold". We certainly need more of that in academia. Academia moves more like a herd; there are flocks here and there mowing the grass together. And staying with the herd is a conservative strategy. That way you avoid becoming an outlier, and it is easier to publish and get funded because you don't need to justify/defend your research direction; it is already accepted as a safe research direction by your community. (NSF will ask to see intellectual novelty in proposals, but the NSF panel reviewers will be unhappy if a proposal is out in left field and is attempting to break new ground. They will find a way to reject the proposal unless a panelist champions it and challenges the other reviewers about their concocted reasons for rejecting. As a result, it is rare to see a proposal that suggests truly original/interesting ideas and directions.)

To break new ground, we need more mavericks that leave the herd and explore new territory in the jungle. Looking at the most influential names in my field of study, distributed systems, I see that Lamport, Dijkstra, Lynch, Liskov were all black sheep.

Of course, when you are daring and original, you will be wrong half (?) the time. But being wrong teaches you something, and this insight will help you to have a breakthrough eventually. This is the growth mindset thinking. (I am reading the book "Mindset: The New Psychology of Success", and it is awesome. I recommend it for everyone.)

Moreover, when you are onto something really good, others will not get or accept what you are doing for a while. That's the price you pay for daring to be original. You can read several accounts of amusing paper rejections Lamport received here. However, I still think it is better to be ignored temporarily than remain unoriginal permanently. If you are not original, you are not truly contributing. Being daring and original is the only way you have for making a genuine contribution and having impact.
Don't worry about people stealing an idea. If it's original, you will have to ram it down their throats.
Howard H. Aiken

The interesting thing is, to be the black sheep, you don't need to put on an act. If you have a no-bullshit attitude about research and don't take fashionable research ideas/directions at face value, you will soon become the black sheep. But, as Feynman used to say, "What do you care what other people think?" Ignore everybody; work on what you think is the right thing.

PS: Coincidentally, Daniel Lemire has recently discussed the black sheep phenomenon in "To be smarter, try being crazier?"
