Comments on Metadata: Modular Composition of Coordination Services<br /><br />2016-07-01<br /><br />Thanks for your interest in our paper! We’re excited to see the multi-DC deployment question addressed in more research, and we look forward to your paper.<br /><br />><i>Need for serialization</i><br />We tried to provide the semantics of ZooKeeper across data centers. For some applications ZooKeeper’s semantics may not be necessary, but there are others where clients should see each other’s updates in a consistent manner even if they are in different data centers, which is what our solution provides.<br /><br />><i>What if both syncs occur at the same time?</i><br />Each sync is invoked independently by its respective client, and in this example each is sent to a different ZooKeeper instance. If both syncs occur after the writes complete, both reads will see the written value, due to linearizability.<br /><br />><i>Performance bug in ZooKeeper</i><br />We explain this briefly at the end of Section 4.1.1, but perhaps we should have elaborated more. When ZooKeeper is deployed across a WAN in the “usual way”, i.e., voters in one DC and an observer in another, all traffic from the remote DC goes through the observer. Observers experienced the performance issue we found, and hence the improvement of our solution over ZooKeeper was extremely high. We didn’t want to present numbers boosted by what we perceived to be a bug in ZooKeeper, so we first fixed the bug and then compared our solution to ZooKeeper, showing that the improvement is still 7x.
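The sync-before-read pattern behind that answer can be illustrated with a toy model (this is an illustrative sketch, not ZooNet’s implementation; `ToyEnsemble`, `write`, `sync`, and `read` are hypothetical names): each ensemble commits writes to a log immediately, a reader’s local view may lag, and `sync()` forces the view up to date before the read, so a read issued after both writes complete observes them.

```python
# Toy model of the sync-then-read pattern (illustrative only, not ZooNet code).
# Writes commit to a log right away, but the local view served to reads may
# lag until sync() catches it up -- mimicking a stale observer/replica.
class ToyEnsemble:
    def __init__(self):
        self.log = []        # committed (key, value) updates, in order
        self.applied = 0     # how many log entries the view reflects
        self.view = {}       # possibly stale local view served to reads

    def write(self, key, value):
        self.log.append((key, value))   # committed, but not yet applied

    def sync(self):
        # Bring the local view up to date with everything committed so far.
        for key, value in self.log[self.applied:]:
            self.view[key] = value
        self.applied = len(self.log)

    def read(self, key):
        return self.view.get(key)

dc1, dc2 = ToyEnsemble(), ToyEnsemble()
dc1.write("/x", 1)
dc2.write("/y", 2)
print(dc1.read("/x"))                  # None: read without sync can be stale
dc1.sync(); dc2.sync()                 # each client syncs independently
print(dc1.read("/x"), dc2.read("/y"))  # 1 2: both reads see the writes
```

The order of the two syncs doesn’t matter here: each one only has to flush the updates already committed to its own ensemble, which is why concurrent syncs from different clients are unproblematic.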
<br /><br />><i>2-site ZooKeeper deployment.</i><br />We also evaluated the solution with 5 DCs, and the improvement was even better. You’re right that this isn’t in the paper.<br /><br />><i>Reads are asynchronously pipelined to compensate for the latency introduced by the sync operation.</i><br />Asynchronous operations are the usual way ZooKeeper is used; there is no reason to wait for completion after each op unless the application requires it. Nevertheless, a large part of the evaluation focuses on the overhead of syncs.<br /><br />><i>Benchmark the throughput when the system is saturated.</i><br />This is the way ZooKeeper was benchmarked in the original ZooKeeper paper, which allowed us to make a direct comparison.<br /><br />><i>ZooNet does not support transactions and watches.</i><br />The only form of transaction in ZooKeeper currently is the multi update operation, and indeed we don’t support multi updates that mix updates to different data centers. It would be interesting to solve this, particularly if there’s a compelling use case.<br /><br />Regarding watches, each client maintains a session with all the ZooKeeper clusters it operates on, including watches.<br /><br />Finally, to simplify adoption, we wanted a solution that’s applicable to ZooKeeper with minimal changes. Our solution doesn’t require any server-side changes to ZooKeeper (besides the isolation fix, which we already contributed to ZooKeeper).<br /><br />— Anonymous
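The pipelining point above can be sketched as follows. This is a hedged illustration with a simulated per-request WAN latency (`remote_read` and the 50 ms `LATENCY` are invented stand-ins, not the ZooNet client): issuing all reads up front and collecting completions later hides most of the round-trip cost, whereas waiting for each op pays it in full.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.05  # simulated WAN round-trip per request (hypothetical value)

def remote_read(key):
    # Stand-in for a ZooKeeper read issued over the WAN.
    time.sleep(LATENCY)
    return f"value-of-{key}"

keys = [f"/node{i}" for i in range(20)]

# Sequential: wait for each op to complete before issuing the next.
start = time.time()
sequential_results = [remote_read(k) for k in keys]
sequential_elapsed = time.time() - start

# Pipelined: issue all ops at once, then wait for the completions.
start = time.time()
with ThreadPoolExecutor(max_workers=len(keys)) as pool:
    pipelined_results = list(pool.map(remote_read, keys))
pipelined_elapsed = time.time() - start

assert sequential_results == pipelined_results  # same answers, less waiting
print(f"sequential: {sequential_elapsed:.2f}s, pipelined: {pipelined_elapsed:.2f}s")
```

With 20 reads, the sequential loop costs roughly 20 round-trips while the pipelined version costs roughly one, which is why a sync inserted before a batch of pipelined reads adds far less latency than it would in a strictly synchronous client.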