Solving the “One Concurrency Control Does Not Fit All” Problem for OLTP Databases

By Dixin Tang and Aaron J. Elmore, University of Chicago

In this post, we present a new transactional database system that adaptively changes data organization and concurrency control protocols in response to workload changes.

With the increasing memory sizes of modern servers, many OLTP databases can reside entirely in the main memory of a single machine. With disk stalls eliminated and the number of CPU cores in a single server growing, concurrency control is now one of the major performance bottlenecks for OLTP database systems.

Recent efforts focus on optimizing concurrency control protocols to achieve higher throughput than ever. However, these protocols are typically optimized for specific workload characteristics, such as being read-heavy or highly skewed, and can suffer when faced with diverse workloads, a situation that can arise due to shifting or unknown access patterns. Therefore, the performance of a database using a single static concurrency control protocol may suffer in the presence of dynamic, unknown, or mixed workloads, leaving many OLTP applications with reduced throughput or increased transaction latency. We have thus created Adaptive Concurrency Control (ACC), a main-memory database system for many-core machines that addresses this performance problem, and we share results from some early experiments.

At CIDR 2017, we presented our paper “Adaptive Concurrency Control: Despite the Looking Glass, One Concurrency Control Does Not Fit All,” our vision for addressing this problem. Here, we propose ACC as a holistic solution for varied and dynamic workloads for main memory OLTP databases. ACC automatically partitions the database into clusters and assigns CPU cores for each cluster according to its load statistics; for each cluster, ACC adaptively chooses a concurrency control protocol according to its workload characteristics.

ACC Overview

Figure 1. Sample Configuration of Adaptive Concurrency Control (ACC)

ACC partitions database records and their primary indices into clusters to minimize cross-cluster transactions. For each cluster, we collect workload statistics to determine the number of cores and assign a specific protocol to the cluster. Learned models, composed of cascading binary classifiers trained offline, predict the optimal protocol for each cluster. ACC currently supports three candidate protocols: OCC based on Silo, 2PL based on VLL, and a partition-based concurrency control protocol (PartCC) from H-Store.

Figure 1 shows a sample configuration of ACC. A sample database is partitioned into four clusters: the first cluster is assigned OCC and two cores, the second and third clusters are each assigned PartCC and one core, and the last cluster is assigned 2PL and three cores.

To route record accesses (i.e., reading or writing a record via its primary key) to the correct cluster, all cores share a cluster routing table that maps primary keys to cluster numbers. Secondary indices map secondary keys to primary keys; if they are aligned with the partitioning of the primary indices, we partition them in the same way. Otherwise, we let them be shared by all cores and assign a dedicated protocol (e.g., OCC) to process any access to them.
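
To make the idea concrete, below is a minimal sketch of such a routing table in Go (the language of Doppel, on which our prototype is built). The names RoutingTable and Route, as well as the protocol constants, are illustrative placeholders rather than ACC's actual code.

```go
// Minimal sketch of a shared cluster routing table: primary keys map to
// cluster IDs, and each cluster is assigned one concurrency control protocol.
// All names here are illustrative, not ACC's actual code.
package main

import "fmt"

type Protocol string

const (
	OCC    Protocol = "OCC"    // based on Silo
	TwoPL  Protocol = "2PL"    // based on VLL
	PartCC Protocol = "PartCC" // from H-Store
)

type RoutingTable struct {
	keyToCluster    map[uint64]int   // primary key -> cluster ID
	clusterProtocol map[int]Protocol // cluster ID -> assigned protocol
}

// Route returns the cluster and protocol responsible for a primary-key access.
func (rt *RoutingTable) Route(primaryKey uint64) (int, Protocol) {
	cluster := rt.keyToCluster[primaryKey]
	return cluster, rt.clusterProtocol[cluster]
}

func main() {
	rt := &RoutingTable{
		keyToCluster:    map[uint64]int{1: 0, 2: 0, 3: 1},
		clusterProtocol: map[int]Protocol{0: OCC, 1: TwoPL},
	}
	c, p := rt.Route(3)
	fmt.Printf("key 3 -> cluster %d, protocol %s\n", c, p)
}
```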

Dataset Clustering

Dataset clustering in ACC requires partitioning both database records and CPU cores at the same time. We propose a Partition-Merge method to generate cluster layouts and assign cores. In the partition step, we use existing partitioning algorithms to partition the records into N clusters, where N is the number of available cores. Next, we merge clusters to ensure that (i) the percentage of transaction operations accessing a single cluster (denoted as utilization) is higher than a threshold (i.e., maintaining load balance) and (ii) the cost of cross-cluster access is lower than a threshold (i.e., minimizing cross-cluster transactions).

Specifically, we first compute the utilization of each cluster and then repeat the following step: among the clusters whose utilization is lower than the threshold, we merge the two with the lowest utilization. The merge process ends when no cluster, or only one cluster, with utilization below the threshold remains; if one such cluster remains, we merge it with the cluster having the highest utilization. Next, we compute the cross-cluster access cost for each pair of clusters, defined as the ratio of cross-cluster accesses to the total record co-accesses within the two clusters. We repeatedly merge all pairs of clusters whose cross-cluster access cost is higher than the predefined threshold, which completes the merge step. Finally, we assign each cluster a number of cores proportional to its utilization. A sketch of the utilization-driven part of this merge step follows.
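
The sketch below illustrates only the utilization-driven merging under the rules just described; the Cluster type and the simple bookkeeping are assumptions for illustration, and the subsequent cross-cluster-cost merge and proportional core assignment are omitted.

```go
// Sketch of the utilization-driven part of the Partition-Merge step.
// All names are illustrative, not ACC's actual code.
package main

import (
	"fmt"
	"sort"
)

type Cluster struct {
	ID          int
	Utilization float64 // fraction of transaction operations hitting this cluster
}

// mergeByUtilization repeatedly merges the two lowest-utilization clusters
// while more than one cluster sits below the threshold; a single remaining
// low-utilization cluster is merged into the busiest one.
func mergeByUtilization(clusters []Cluster, threshold float64) []Cluster {
	for {
		// Sort ascending by utilization so the coldest clusters come first.
		sort.Slice(clusters, func(i, j int) bool {
			return clusters[i].Utilization < clusters[j].Utilization
		})
		below := 0
		for _, c := range clusters {
			if c.Utilization < threshold {
				below++
			}
		}
		if below > 1 {
			// Merge the two coldest clusters into one and continue.
			merged := Cluster{
				ID:          clusters[0].ID,
				Utilization: clusters[0].Utilization + clusters[1].Utilization,
			}
			clusters = append([]Cluster{merged}, clusters[2:]...)
			continue
		}
		if below == 1 && len(clusters) > 1 {
			// Merge the last cold cluster into the hottest cluster.
			hot := len(clusters) - 1
			clusters[hot].Utilization += clusters[0].Utilization
			clusters = clusters[1:]
		}
		return clusters
	}
}

func main() {
	clusters := []Cluster{{0, 0.05}, {1, 0.10}, {2, 0.35}, {3, 0.50}}
	fmt.Println(mergeByUtilization(clusters, 0.25))
}
```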

Workload Modeling

Figure 2: Basic idea of record contention extraction.

We model a workload by extracting features that affect the comparative performance of the candidate protocols. For the current three candidate protocols, we design and track four key features:

  • the ratio of read operations per transaction (ReadRatio),
  • the average number of operations issued by a transaction (TransLen),
  • the probability of concurrent transactions reading/writing the same record (RecContention), and
  • the cost of cross-cluster transactions (CrossCost).

The first three features model record conflicts of concurrent transactions in a workload and are critical factors for the performance of PartCC, OCC, and 2PL. The fourth feature determines the applicability of PartCC.
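
As a rough illustration, these four features could be packaged per cluster as follows; the field names and the counters used to compute them are hypothetical stand-ins for the statistics ACC actually collects.

```go
// Sketch of the four per-cluster workload features, computed from simple
// counters a worker could maintain over one reporting period.
// Names are illustrative, not the system's actual code.
package main

import "fmt"

// WorkloadFeatures summarizes one cluster's workload for protocol selection.
type WorkloadFeatures struct {
	ReadRatio     float64 // reads / (reads + writes)
	TransLen      float64 // average operations per transaction
	RecContention float64 // estimated probability of concurrent access to the same record
	CrossCost     float64 // relative cost of cross-cluster transactions
}

func main() {
	// Hypothetical counters collected over one reporting period.
	var (
		reads, writes      = 9000.0, 1000.0
		transactions       = 500.0
		contentionEstimate = 0.02 // e.g., from mark/detect sampling (see below)
		crossClusterCost   = 0.10
	)
	f := WorkloadFeatures{
		ReadRatio:     reads / (reads + writes),
		TransLen:      (reads + writes) / transactions,
		RecContention: contentionEstimate,
		CrossCost:     crossClusterCost,
	}
	fmt.Printf("%+v\n", f)
}
```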

ACC uses a central coordinator to manage the protocol selection process. Each transaction worker concurrently collects workload statistics and periodically reports them to the coordinator. The coordinator computes the features of the current workload for each cluster and uses the predictive model to determine which protocol that cluster should use.
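
A minimal sketch of how such a cascade of binary decisions might look is shown below. In ACC the classifiers are trained offline; the hand-written thresholds here are placeholders for illustration only, and the real model also takes TransLen into account.

```go
// Sketch of cascading binary decisions for per-cluster protocol selection.
// The thresholds below are illustrative placeholders, not the learned model.
package main

import "fmt"

type Protocol string

const (
	OCC    Protocol = "OCC"
	TwoPL  Protocol = "2PL"
	PartCC Protocol = "PartCC"
)

type WorkloadFeatures struct {
	ReadRatio, TransLen, RecContention, CrossCost float64
}

// selectProtocol cascades two binary decisions: first, is the cluster
// partitionable enough for PartCC; second, is contention low enough for OCC.
// (The real classifiers would also consider TransLen.)
func selectProtocol(f WorkloadFeatures) Protocol {
	if f.CrossCost < 0.05 && f.RecContention > 0.1 {
		return PartCC // well-partitionable, high-conflict clusters
	}
	if f.RecContention < 0.05 || f.ReadRatio > 0.9 {
		return OCC // low-conflict or read-heavy clusters validate cheaply
	}
	return TwoPL // contended, non-partitionable clusters fall back to locking
}

func main() {
	f := WorkloadFeatures{ReadRatio: 0.5, TransLen: 10, RecContention: 0.2, CrossCost: 0.3}
	fmt.Println(selectProtocol(f)) // 2PL for this contended, non-partitionable cluster
}
```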

Centralized feature computation may become a bottleneck at high core counts, which delays the response to workload variation. Therefore, we move part of the feature computation into the concurrent workers. We propose a uniform feature extraction scheme for all workers, even though they may be executing different protocols. The basic idea is to emulate a uniform and separate extraction process across all workers using sampled transactions or operations. For example, Figure 2 shows our basic idea for extracting RecContention. Each worker samples operations and repeats mark and detect phases to estimate contention: in the mark phase, workers mark the records accessed by the sampled operations, and in the detect phase, they check how many other workers have marked those records.
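
The sketch below captures the mark/detect idea with a shared per-record mark word in which each worker sets its own bit; the data layout and names are assumptions for illustration, not the prototype's implementation.

```go
// Sketch of mark/detect contention estimation: each worker sets its bit on
// records touched by sampled operations (mark), then counts how many other
// workers marked the same records (detect). Names are illustrative.
package main

import (
	"fmt"
	"math/bits"
	"sync/atomic"
)

// marks holds one mark word per record; bit w is set if worker w sampled it.
type marks struct {
	words []uint64
}

// Mark sets this worker's bit on a sampled record (mark phase).
func (m *marks) Mark(record int, worker uint) {
	for {
		old := atomic.LoadUint64(&m.words[record])
		if atomic.CompareAndSwapUint64(&m.words[record], old, old|1<<worker) {
			return
		}
	}
}

// Detect counts how many other workers marked the record (detect phase).
func (m *marks) Detect(record int, worker uint) int {
	w := atomic.LoadUint64(&m.words[record])
	return bits.OnesCount64(w &^ (1 << worker))
}

func main() {
	m := &marks{words: make([]uint64, 100)}
	m.Mark(42, 0)                // worker 0 samples record 42
	m.Mark(42, 3)                // worker 3 samples the same record
	fmt.Println(m.Detect(42, 0)) // worker 0 sees 1 other worker: a contention signal
}
```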

Mixed Concurrency Control

Figure 3: Mixing PartCC, 2PL, and OCC

Mixing concurrency control protocols can exploit the strengths of each candidate protocol, but can suffer from the overhead of coordinating conflicts across protocols. Such overhead arises when a single record can be accessed by different protocols. Therefore, we propose data-oriented mixed concurrency control (DomCC), which lets a single protocol process all concurrent read/write operations to a given portion of the records (a cluster). For example, if a transaction dispatched to a cluster running OCC accesses a record in a cluster managed by 2PL, DomCC uses 2PL rather than OCC to process this record access.

Figure 3 shows how DomCC mixes PartCC, 2PL, and OCC. After a transaction is issued, DomCC enters the Preprocess phase, where it checks all the clusters the transaction will access and acquires the partition locks of any clusters managed by PartCC. DomCC then begins the Execution phase, where it executes the transaction logic. When the transaction accesses a record, DomCC uses the protocol managing that record to process the access. For example, if OCC manages a record, the transaction reads the timestamp and the value of the record using OCC's logic; if 2PL manages the record, the transaction acquires a lock before reading or writing it. Next, DomCC executes OCC's validation phase to validate the records managed by OCC. Finally, in the Commit phase, DomCC applies all writes and releases locks: partition locks for PartCC, record locks for 2PL, and write locks for OCC.
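
The following sketch summarizes this phase flow; the types and helper functions are empty stand-ins for the per-protocol logic described above, not ACC's actual code.

```go
// Sketch of DomCC's phase flow: each record access is handled by the protocol
// that owns its cluster. Helpers are stand-ins for the real protocol logic.
package main

import "fmt"

type Protocol int

const (
	OCC Protocol = iota
	TwoPL
	PartCC
)

type Cluster struct{ Proto Protocol }

type Access struct {
	Cluster *Cluster
	Key     uint64
	Write   bool
}

type Txn struct{ Accesses []Access }

// Stand-ins for the per-protocol logic sketched in the text.
func lockPartition(c *Cluster)          {}
func readTimestampAndValue(a Access)    {}
func acquireRecordLock(a Access)        {}
func accessDirectly(a Access)           {}
func validateOCCRecords(t *Txn) bool    { return true }
func abort(t *Txn)                      {}
func applyWritesAndReleaseLocks(t *Txn) {}

// runDomCC walks a transaction through DomCC's four phases.
func runDomCC(t *Txn) bool {
	// 1. Preprocess: acquire partition locks for every PartCC cluster touched.
	for _, a := range t.Accesses {
		if a.Cluster.Proto == PartCC {
			lockPartition(a.Cluster)
		}
	}
	// 2. Execution: process each record with the protocol owning its cluster.
	for _, a := range t.Accesses {
		switch a.Cluster.Proto {
		case OCC:
			readTimestampAndValue(a) // OCC: track read/write sets and timestamps
		case TwoPL:
			acquireRecordLock(a) // 2PL: lock the record before reading or writing
		case PartCC:
			accessDirectly(a) // PartCC: the partition lock is already held
		}
	}
	// 3. Validation: OCC validates the records it manages.
	if !validateOCCRecords(t) {
		abort(t)
		return false
	}
	// 4. Commit: apply writes and release partition, record, and write locks.
	applyWritesAndReleaseLocks(t)
	return true
}

func main() {
	occ, tpl := &Cluster{Proto: OCC}, &Cluster{Proto: TwoPL}
	t := &Txn{Accesses: []Access{{occ, 1, false}, {tpl, 2, true}}}
	fmt.Println("committed:", runDomCC(t))
}
```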

Experiments

We have developed a prototype of several ACC components based on Doppel, an open-source main-memory OLTP database. Transaction workers are extended to extract features from the current workload and report them to a central coordinator. The coordinator automatically selects and switches protocols online. Our prototype currently supports mixing PartCC of H-Store, OCC from Silo, and SS2PL No-Wait based on VLL. We compare ACC to single protocols and a hybrid approach of 2PL and OCC (denoted as Hybrid), which uses locks to protect highly conflicted records and uses validation for the other records. We run TPC-C with 32 warehouses to test the benefits of mixed concurrency control and the adaptivity of ACC.

All experiments are run on a single server with four NUMA nodes, each of which has an 8-core Intel Xeon E7-4830 processor (2.13 GHz), 64 GB of DRAM and 24 MB of shared L3 cache, yielding 32 physical cores and 256 GB of DRAM in total. We initially partition TPC-C by warehouse ID.

Mixed Concurrency Control

Figure 4. Test of mixed concurrency control

We first evaluate the effect of mixed concurrency control under workloads that mix partitionable and non-partitionable access patterns. Specifically, we use a workload of 100% NewOrder transactions in which 16 warehouses have 100% cross-partition transactions and the other 16 warehouses are well-partitionable (i.e., have no cross-partition transactions). In this case, ACC merges the first 16 warehouses into a single cluster and makes each of the other warehouses its own cluster. Starting from this workload, for the 16 well-partitionable warehouses we increase the percentage of cross-cluster transactions from 0% to 50%, introducing transactions that span multiple clusters and potentially multiple protocols.

At first, ACC mixes PartCC and 2PL, using PartCC to process the well-partitionable portion of the workload and 2PL for the highly conflicted, non-partitionable rest. As more cross-cluster transactions are introduced, partition conflicts increase; thus, ACC adopts 2PL for the whole system. Note that although ACC uses 2PL, it achieves slightly higher throughput than plain 2PL because it leverages the partial partitionability to avoid some conflicts; for example, lock contention across cores is reduced by the partitioning.

Adaptivity

Figure 5: Adaptivity Test of ACC

We then evaluate ACC's adaptivity in selecting a single protocol in response to workload variation. If PartCC is used, the whole store is partitioned such that each CPU core owns a partition (i.e., a cluster). If OCC or 2PL is used, the partitioned stores are merged back into a single cluster (i.e., a shared store). We use three workloads, each lasting 15 seconds: the first is non-partitionable and low-conflict; the workload then switches to being partitionable and high-conflict; finally, it changes back to being non-partitionable. For the workloads with low and high record conflicts, we use read-only transactions (e.g., OrderStatus in TPC-C) and write-intensive transactions (e.g., a NewOrder and Payment mix in TPC-C), respectively. To generate non-partitionable workloads, we use 100% cross-partition transactions and a skewed partition access distribution (Zipf with theta = 0.3). For all tests, the record access distribution is Zipf with theta = 1.5.

Figure 5 shows how overall throughput evolves for TPC-C. We see that ACC adaptively chooses the optimal protocol for all three workloads. Specifically, ACC starts with OCC. When the workload becomes well-partitionable and develops high record conflicts, ACC switches from OCC to PartCC, which requires partitioning the whole store first; ACC continues to process transaction requests during partitioning and switches to PartCC once it is done. When more partition conflicts are later introduced in the workload, ACC merges the partitioned store and switches from PartCC to 2PL. The dip in ACC's performance during workload shifts is due to lag while still using the prior protocol and a short period in which workers reorganize clusters and indices. The throughput improvement of ACC over PartCC, OCC, and 2PL can be up to 4.5x, 3.5x, and 2.3x, respectively, and ACC achieves at least 95% of the throughput of the optimal protocol while using that protocol, demonstrating that ACC's overhead is minimal.

Conclusion

We introduce Adaptive Concurrency Control (ACC), a main-memory database system for many-core machines that supports clustering a data store and dynamically selecting concurrency control protocols for each data cluster. We have outlined the design of data clustering, workload modeling, and mixing protocols, and have evaluated a prototype demonstrating preliminary results. We believe our initial results are promising. Look for future publications from our group that explore issues related to adaptively mixing concurrency control within a single system!
