Adapting TPC-C Benchmark to Measure Performance of Multi-Document Transactions in MongoDB

This paper appeared in VLDB 2019.

Benchmarks are a necessary evil for database evaluation. They often focus on narrow aspects and specific workloads, which can paint a misleading picture for broader real-world applications and workloads. Still, for a quick comparative performance snapshot, they remain a crucial tool.

Popular benchmarks like YCSB, designed for simple key-value operations, fall short of exercising MongoDB's features, including secondary indexes, flexible queries, complex aggregations, and even multi-statement multi-document ACID transactions (available since version 4.0).

Standard RDBMS benchmarks haven't been a good fit for MongoDB either, since they require a normalized relational schema and SQL operations. Consider TPC-C, which simulates a commerce system with five types of transactions involving customers, orders, warehouses, districts, stock, and items, represented in nine normalized tables. TPC-C prescribes a specific relational schema and specific SQL statements.

Adapting TPC-C to MongoDB demands a delicate balancing act. While mimicking the familiar TPC-C workload and adhering to its ACID requirements is essential for maintaining the benchmark's value for those accustomed to it, significant modifications are necessary to account for MongoDB's unique data structures and query capabilities. This paper provides such an approach, creating a performance test suite that incorporates MongoDB best practices while remaining consistent with TPC-C's core principles.

To build this benchmark, the paper leverages Andy Pavlo's 2011 PyTPCC repository, a Python-based framework for running TPC-C benchmarks on NoSQL systems. While PyTPCC included an initial driver implementation for MongoDB, it lacked support for transactions since it was written for the NoSQL systems of 2011. This paper addresses this gap by adding transaction capability to PyTPCC. The modified benchmark, available at https://github.com/mongodb-labs/py-tpcc, enables a detailed evaluation of MongoDB multi-document transactions over a single replicaset deployment.


Background

Please note that this evaluation focuses only on transactions over a single replicaset deployment using MongoDB 4.0 in 2019. In a previous post, we reviewed the basics of transaction implementation across three different MongoDB deployments: single node WiredTiger, replicaset, and sharded cluster deployments. While MongoDB now supports general multi-document transactions across sharded deployments, that topic is not covered in this paper.

The MongoDB query language (MQL) does not map directly to SQL, but it supports a similar set of CRUD operations, as shown in Table 1. MongoDB supports primary as well as secondary indexes and can speed up queries by looking up documents through indexes, even answering a query entirely from the index when it covers all requested fields (covered index queries). Indexes can be created on regular fields as well as on fields embedded inside arrays.
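To give a concrete feel for the mapping, here is a small sketch (mine, not the paper's Table 1) of MQL counterparts to a few SQL operations, using pymongo with illustrative TPC-C-style collection and field names:

```python
# Illustrative MQL equivalents of SQL operations (names are my own, not the paper's).
from pymongo import MongoClient, ASCENDING

db = MongoClient("mongodb://localhost:27017")["tpcc"]

# CREATE INDEX ... ON customer (c_w_id, c_d_id, c_last)   -- a secondary index
db.customer.create_index([("c_w_id", ASCENDING),
                          ("c_d_id", ASCENDING),
                          ("c_last", ASCENDING)])

# SELECT c_id, c_first FROM customer WHERE c_w_id=1 AND c_d_id=2 AND c_last='SMITH'
cursor = db.customer.find({"c_w_id": 1, "c_d_id": 2, "c_last": "SMITH"},
                          {"c_id": 1, "c_first": 1, "_id": 0})  # projection limits returned fields

# UPDATE customer SET c_balance = c_balance - 10 WHERE c_w_id=1 AND c_d_id=2 AND c_id=42
db.customer.update_one({"c_w_id": 1, "c_d_id": 2, "c_id": 42},
                       {"$inc": {"c_balance": -10}})
```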

MongoDB transactions provide the ACID guarantees that TPC-C requires for correctness. Specifically, they provide snapshot isolation: a single snapshot of the data is used for the duration of the transaction. A snapshot is a point-in-time view of the data at a distinct cluster time, maintained via a cluster-wide logical clock. Once a transaction begins with a snapshot at a given cluster time, no writes outside of that transaction's context occurring after that cluster time will be visible within the transaction. However, a transaction can see its own writes that occur after the snapshot's cluster time, providing the "read your own writes" guarantee. Once a transaction starts, its snapshot view of the data is preserved until it either commits or aborts. When a transaction commits, all data changes made in the transaction are saved and made visible outside the transaction. When a transaction aborts, all its changes are discarded without ever becoming visible.

Within MongoDB transactions, readConcern is always set to "snapshot". Multi-document transactions in MongoDB are committed with "majority" writeConcern, which means a majority of the replicaset nodes (two out of three in this deployment) must durably record all operations before the commit is acknowledged.
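Here is a minimal sketch (my own, not the PyTPCC driver code) of how such a transaction is issued from pymongo with these readConcern/writeConcern settings; the collection and field names are illustrative.

```python
# A hedged sketch of a multi-document transaction with snapshot reads and
# majority-acknowledged commit. Collection/field names are illustrative.
from pymongo import MongoClient
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client["tpcc"]

with client.start_session() as session:
    with session.start_transaction(read_concern=ReadConcern("snapshot"),
                                   write_concern=WriteConcern("majority")):
        # All reads in this transaction see the same snapshot; writes become
        # visible outside the transaction only at commit.
        db.district.update_one({"d_w_id": 1, "d_id": 2},
                               {"$inc": {"d_next_o_id": 1}},
                               session=session)
        order = db.orders.find_one({"o_w_id": 1, "o_d_id": 2, "o_id": 3001},
                                   session=session)
    # Leaving the inner with-block commits the transaction; an exception aborts it.
```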

As we discussed in our previous post on MongoDB transactions, while they are nominally OCC, thanks to the underlying WiredTiger holding the lock on first access, they are less prone to aborting than pure OCC transactions. An in-progress transaction blocks later writes (be they from other transactions or single writes) instead of getting aborted by them. In other words, transactions immediately obtain locks on documents being written, or abort if the lock cannot be obtained. This ensures that when two transactions attempt to write to the same document, the second one fails immediately, at which point it can choose to retry as appropriate for the application.
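Such a write conflict surfaces to the driver as an error carrying the TransientTransactionError label, and the application decides whether to retry. A minimal retry sketch (again my own, with illustrative collection and field names) could look like this:

```python
# A hedged retry loop: retry the whole transaction when a write conflict is
# reported via the TransientTransactionError label.
from pymongo import MongoClient
from pymongo.errors import OperationFailure

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client["tpcc"]

def decrement_stock_with_retry(max_attempts=5):
    for _ in range(max_attempts):
        with client.start_session() as session:
            try:
                with session.start_transaction():
                    # If another transaction already wrote this stock document,
                    # this update fails immediately with a write conflict.
                    db.stock.update_one({"s_w_id": 1, "s_i_id": 17},
                                        {"$inc": {"s_quantity": -5}},
                                        session=session)
                return  # committed
            except OperationFailure as exc:
                if exc.has_error_label("TransientTransactionError"):
                    continue  # retry the whole transaction
                raise
    raise RuntimeError("gave up after repeated write conflicts")
```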


Evaluation

The replicaset deployment used the default database configuration provided by the MongoDB Atlas cloud offering. Performance is reported for an M60 Atlas replica set with writeConcern "majority" for durability, readConcern "snapshot" for most transactions, and a committed-reads equivalent (readConcern "majority" with causal consistency) for the STOCK LEVEL transaction. Figure 1 shows transactions-per-minute-C (tpmC) values for varying numbers of warehouses and client thread counts. Using more warehouses results in reduced throughput, I think due to the need to coordinate transactions across more documents. And remember, we don't get to reap the benefits of sharding in the face of increased warehouses, since this deployment is a single replicaset.

The original PyTPCC benchmark provided a normalized option which mirrored the RDBMS schema exactly, and a denormalized option which embedded all customer information (orders, order lines, history) into the customer document. This specific denormalization, however, is identified as an antipattern, as it leads to unbounded document growth and performance degradation. Following recommended MongoDB schema practices, the evaluation adopted a modified denormalized schema, maintaining a normalized structure for most data and only embedding order lines within their respective order documents. This aligns with common document database best practices, since order lines are frequently accessed together with their orders, which justifies the embedding. Order lines within an order are fixed in number, preventing unbounded growth. Interestingly, this optimized denormalized schema resulted in a smaller data footprint than the fully normalized one, because redundant information in order lines was eliminated. The results in Figure 2 further highlight the benefits of this modified denormalization. The performance win comes from reducing the number of round trips to the database: you can think of it as pre-joining order lines into the orders table.
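To make the modified schema concrete, here is a hedged illustration (not the paper's exact field layout) of an order document with its order lines embedded as an array, while the other TPC-C tables remain separate collections:

```python
# Illustrative order document for the modified denormalized schema.
order_doc = {
    "o_id": 3001, "o_w_id": 1, "o_d_id": 2, "o_c_id": 42,
    "o_entry_d": "2019-06-01T12:00:00Z",
    "o_carrier_id": None,
    "o_ol_cnt": 2,
    "order_lines": [   # fixed count per order, so no unbounded document growth
        {"ol_number": 1, "ol_i_id": 101, "ol_supply_w_id": 1,
         "ol_quantity": 5, "ol_amount": 49.95},
        {"ol_number": 2, "ol_i_id": 205, "ol_supply_w_id": 1,
         "ol_quantity": 3, "ol_amount": 12.30},
    ],
}
# A single insert_one(order_doc) writes the order and its lines together,
# saving the extra round trips a separate ORDER-LINE collection would need.
```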


Several areas presented opportunities for further latency reduction. Streamlining queries and requesting only the necessary fields reduces data transfer and processing time. Inspecting the logs showed that transaction retries stemmed from performing extensive operations before encountering write conflicts. Re-ordering write operations to expose write conflicts as early in the transaction as possible, and moving such writes before reads where possible, helped address these inefficiencies. Several transactions followed a pattern of selecting and then updating the same record. Using MongoDB's findAndModify operation reduced those two database interactions to one and significantly improved performance, as shown in Figure 3.
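Here is a sketch of that select-then-update collapse using pymongo's find_one_and_update (the driver-level form of findAndModify); the field names and values are illustrative, not the paper's code.

```python
# Collapsing a read-then-write pair into one atomic findAndModify call.
from pymongo import MongoClient, ReturnDocument

db = MongoClient("mongodb://localhost:27017")["tpcc"]

# Before: two round trips -- read the district document, then update it.
# district = db.district.find_one({"d_w_id": 1, "d_id": 2})
# db.district.update_one({"d_w_id": 1, "d_id": 2}, {"$inc": {"d_next_o_id": 1}})

# After: one round trip that atomically updates the document and returns the
# pre-update value of d_next_o_id for use in the rest of the transaction.
district = db.district.find_one_and_update(
    {"d_w_id": 1, "d_id": 2},
    {"$inc": {"d_next_o_id": 1}},
    return_document=ReturnDocument.BEFORE)
```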




