Recently, a colleague emailed me the following:
Since you have been reading so much about clouds, CAP, and presumably lots of consensus things, you can answer better the question of algorithm scalability. How scalable are the popular algorithms? Can they do a reasonable job of consensus with 100,000 processes? Is this even a reasonable question? What are the fundamental problems, the algorithms or the lower level communication issues?
These are actually the right kind of questions to ask probing for deeper CS concepts. Here are my preliminary answers to these questions.
From what I read, consensus with 100K processes is really out of question. Paxos consensus was deployed on 5 nodes for GFS and similar systems: Zookeper, Megastore, etc. As another example Sinfonia's participant nodes in a transaction is also around limited to 5-10.
So what is wrong with algorithms, why are they unscalable? I guess one obstacle against scalability is the "online" processing requirement, and most algorithms are inherently limited because of that. When you accept offline processing, as in mapreduce tasks, then you can afford more scalability.
I think a more fundamental problem for scalability is the synchronization requirements in these algorithms. Synchronization is a deal breaker for scalability. The Ladis'08 summary has a very nice discussion related to this (see pages 4-5).
Our traditional complexity measures are for time complexity and message complexity. Time complexity is not much relevant for scalability, but message complexity may give some indication of synchronization complexity. A node sending a message to all other nodes is very bad for scalability (which is the case for both Paxos and Sinfonia). Workflows on the other hand have very little message complexity. They store data to disk (rather than requiring synchronizing several processes with messages), and any process may later visit this data if they need it. This decouples data from computation and enables more processing to be added as needed for scalability. Workflows tend to be more data-centric whereas algorithms tend to be more computation-centric. Here is a brief discussion of the workflow idea.
How does one formally define "synchronization point" that is not specific to a particular system/algorithm? I don't know of a good answer, but I think it is closer to a snapshot. Bloom also identifies synchronization points as culprits, and defines synchronization points as operations that are classified as non-monotonous in the Bloom declarative language.