OLTP Database Systems for Non-Volatile Memory

By Joy Arulraj, Justin Debrabant, Andrew Pavlo, Michael Stonebraker, Stan Zdonik, and Subramanya Dulloor

In this collaboration between Brown, CMU, MIT CSAIL, and Intel Labs, we explore two possible use cases of Non-Volatile Memory (NVM) for on-line transaction processing (OLTP) DBMSs.

Evaluating software systems on NVM is challenging due to the lack of real hardware. In this study, we use an NVM hardware emulator developed by Intel Labs that uses special CPU microcode to emulate NVM latency and a programmable bandwidth-throttling feature in the memory controller to emulate NVM bandwidth. Using a hardware-based NVM emulator allows us to run existing DBMSs in an NVM environment without changing any source code.

The emulator supports two interfaces for applications accessing NVM. Applications can allocate and access memory using the libnuma library or tools such as numactl; we refer to this as the NUMA interface. Applications can also use the regular POSIX file system interface to allocate and access memory; we refer to this as the PMFS interface, since it is implemented by PMFS, a file system optimized for persistent memory.
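The PMFS interface in particular needs nothing beyond standard file and memory-mapping calls. The sketch below (in Python for brevity; the study's DBMSs are C/C++) memory-maps a file and performs byte-addressable loads and stores against it. A temporary directory stands in for the PMFS mount point, which is a placeholder here; on the emulator the file would live under the actual PMFS mount.

```python
import mmap
import os
import tempfile

# Stand-in for a PMFS mount point; the real path on the emulator
# would be wherever PMFS is mounted (hypothetical in this sketch).
pmfs_dir = tempfile.mkdtemp()
path = os.path.join(pmfs_dir, "table.dat")

# "Allocate" a 4 KB region of NVM through the file system interface.
size = 4096
with open(path, "wb") as f:
    f.truncate(size)

# Memory-map the file: subsequent loads/stores go straight to the
# (emulated) NVM region, byte-addressably, with no read()/write() calls.
fd = os.open(path, os.O_RDWR)
buf = mmap.mmap(fd, size)
buf[0:5] = b"tuple"     # byte-addressable store
buf.flush()             # msync: force the store to durable media
data = bytes(buf[0:5])

buf.close()
os.close(fd)
os.unlink(path)
```

On a real PMFS mount, `flush()` (i.e., `msync`) is what makes the store durable; with a disk-backed file system the same code works but pays block-I/O costs.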

NVM-only Architecture

In this architecture, the DBMS uses NVM exclusively for its storage. We compare a memory-oriented DBMS with a disk-oriented DBMS when both are running entirely on NVM storage using the emulator’s NUMA interface. This architecture is illustrated in Figure 1. For the former, we use the H-Store DBMS, while for the latter we use MySQL with the InnoDB storage engine.

Figure 1: NVM-only architecture (1) Memory-oriented system and (2) Disk-oriented system

Memory-oriented System: We use the NUMA interface of the emulator to ensure that all in-memory data is stored on NVM. This data includes the database’s tuples, indices, views, and other elements. This also means that the DBMS is not aware that writes to the in-memory partitions are potentially durable. H-Store uses a logical logging scheme for recovery where the log only contains a record of the high-level operations that each transaction executed.
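To make the logical-logging idea concrete, here is a minimal sketch. This is not H-Store's actual implementation; the `deposit` procedure, the record format, and the dispatch table are all hypothetical. The key property it illustrates is that each record names a high-level operation and its parameters, and recovery simply re-executes those operations in commit order.

```python
import json

class CommandLog:
    """Logical (command) log: one compact record per transaction,
    naming the stored procedure and its parameters rather than the
    physical bytes the transaction modified."""
    def __init__(self):
        self.records = []

    def append(self, txn_id, procedure, params):
        # A real DBMS would serialize this record to the log device
        # and fsync before acknowledging the commit.
        self.records.append(json.dumps(
            {"txn": txn_id, "proc": procedure, "params": params}))

    def replay(self, dispatch):
        # Recovery: re-execute each procedure in commit order.
        for rec in self.records:
            r = json.loads(rec)
            dispatch[r["proc"]](*r["params"])

# Hypothetical stored procedure updating an account balance.
balances = {"acct1": 100}
def deposit(acct, amount):
    balances[acct] += amount

log = CommandLog()
log.append(1, "deposit", ["acct1", 50])
deposit("acct1", 50)              # execute the transaction itself

# Simulate a crash: reset state to the last snapshot, then replay.
balances = {"acct1": 100}
log.replay({"deposit": deposit})  # balances["acct1"] is 150 again
```

Because each record is a single high-level operation rather than a set of page-level changes, the log stays small, which is why its write latency is so sensitive to the underlying storage.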

Disk-oriented System: In this type of system, the internal data is divided into in-memory and disk-resident components. The DBMS maintains a buffer pool in memory to store copies of pages retrieved from the database’s primary storage location on disk. We use the emulator’s NUMA interface to store the DBMS’s buffer pool in the byte-addressable NVM storage, while its data files and logs are stored in NVM through the PMFS interface.

NVM+DRAM Architecture

In this second use case, the DBMS relies on both DRAM and NVM to satisfy its storage requirements. This architecture is shown in Figure 2. If the entire dataset cannot fit in DRAM, the question arises of how to split data between the two storage layers. Because of DRAM's latency advantage over NVM, one strategy is to keep hot data in DRAM and cold data in NVM. One way to achieve this is to use a buffer pool to cache hot data, as is common in traditional disk-oriented DBMSs. In this case there are two copies of cached data: one persistent copy on disk and another cached in the DRAM-based buffer pool. Pages are swapped in and out of the buffer pool, and writes must be persisted to disk to keep the two copies consistent. Because NVM is persistent, the disk in this architecture can simply be replaced with NVM.
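The two-copy buffer-pool design described above can be sketched in a few lines. This is an illustrative model only, not MySQL's implementation: a plain dict stands in for the NVM-resident data files, and eviction is simple LRU with write-back of dirty pages.

```python
from collections import OrderedDict

class BufferPool:
    """Two-copy design: `nvm` holds the persistent copy of every page,
    while the pool caches hot pages in (simulated) DRAM."""
    def __init__(self, nvm, capacity):
        self.nvm = nvm
        self.capacity = capacity
        self.pool = OrderedDict()       # page_id -> (data, dirty)

    def read(self, pid):
        data, _dirty = self._fetch(pid)
        return data

    def write(self, pid, data):
        self._fetch(pid)
        self.pool[pid] = (data, True)   # DRAM copy now newer than NVM copy

    def _fetch(self, pid):
        if pid in self.pool:
            self.pool.move_to_end(pid)  # LRU touch
        else:
            if len(self.pool) >= self.capacity:
                old, (odata, odirty) = self.pool.popitem(last=False)
                if odirty:
                    self.nvm[old] = odata  # write back: re-sync the two copies
            self.pool[pid] = (self.nvm[pid], False)
        return self.pool[pid]

nvm = {i: f"page{i}" for i in range(4)}   # persistent copies on "NVM"
bp = BufferPool(nvm, capacity=2)
bp.write(0, "page0'")   # dirty the DRAM copy of page 0
bp.read(1)
bp.read(2)              # evicts page 0, writing it back to NVM
```

The bookkeeping in `_fetch` — copying pages in, tracking dirty bits, writing back on eviction — is exactly the overhead that the anti-caching design below avoids by keeping only one copy.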

Figure 2: NVM+DRAM Architecture (a) Anti-caching system (b) Disk-oriented system

Anti-caching System: Another approach is anti-caching. Data is again spread across both memory and disk, with hot data residing in memory and cold data evicted to disk, but exactly one copy of the data exists at any point in time: a tuple is either in memory or in the anti-cache on disk. For this study, we extend H-Store's anti-caching implementation so that cold data is stored in an NVM-optimized hash table rather than on disk.
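The single-copy invariant can be sketched as follows. This is illustrative only, not H-Store's implementation: the real anti-cache tracks evicted tuples with tombstones and fetches evicted blocks asynchronously, while here a dict stands in for the NVM-resident hash table and un-eviction is synchronous.

```python
from collections import OrderedDict

class AntiCache:
    """Single-copy design: a tuple lives either in the DRAM table or in
    the NVM anti-cache hash table, never in both at once."""
    def __init__(self, capacity):
        self.hot = OrderedDict()   # DRAM-resident tuples, in LRU order
        self.cold = {}             # stand-in for the NVM hash table
        self.capacity = capacity

    def put(self, key, tup):
        self.hot[key] = tup
        self.hot.move_to_end(key)
        if len(self.hot) > self.capacity:
            k, v = self.hot.popitem(last=False)  # evict coldest tuple
            self.cold[k] = v                     # *move* (not copy) to NVM

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        # Un-evict: move the tuple back to DRAM, removing it from
        # the anti-cache so the single-copy invariant holds.
        tup = self.cold.pop(key)
        self.put(key, tup)
        return tup

ac = AntiCache(capacity=2)
for i in range(3):
    ac.put(i, f"tuple{i}")   # tuple 0 is evicted to the anti-cache
ac.get(0)                    # un-evicted; some other tuple is evicted instead
```

Because tuples move rather than being copied, there is no write-back or dirty-page tracking; the cost shifts to fetching evicted tuples on access, which is why skew matters so much for this design.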

Disk-oriented System: We also configured MySQL to run on the hybrid NVM+DRAM architecture. We allow the buffer pool to remain in DRAM and store the data and log files using the PMFS interface. The main difference between this configuration and the NVM-only MySQL configuration is that all main memory accesses in this configuration go to DRAM instead of NVM.

Experimental Evaluation

All experiments were conducted on the NVM emulator described above. For each system, we evaluate the benchmarks at two different NVM latencies: 2X and 8X the base DRAM latency of approximately 90 ns, which we consider to represent the best-case and worst-case NVM latencies, respectively. We used the YCSB benchmark in our evaluation, driven by H-Store's internal benchmarking framework for the H-Store and anti-caching trials and by the OLTP-Bench framework for the MySQL trials.

NVM-only Architecture

We first consider the impact of NVM latency on the throughput of the memory-oriented and disk-oriented systems. The results for the read-heavy workload, shown in Figure 3b, indicate that increasing NVM latency decreases the throughput of both H-Store and MySQL. However, there is no significant impact on H-Store's performance in the read-only workload shown in Figure 3a, which indicates that NVM latency mainly affects the performance of the logging mechanism.

Figure 3: NVM-only Architecture – YCSB (a) Read-only (b) Read-heavy (c) Write-heavy

We now consider the impact of skew on throughput. Interestingly, the throughput of these systems varies differently with skew. The impact of skew on H-Store's performance is more pronounced in the read-heavy workload shown in Figure 3b: throughput drops as skew is reduced. We attribute this drop to more widespread tuple accesses, which increase cache misses and thus accesses to NVM. In contrast, the disk-oriented system performs poorly on high-skew workloads, but its throughput improves as skew decreases. This is because a disk-oriented system uses locks to allow transactions to execute concurrently; if a high percentage of the transactions access the same tuples, the resulting lock contention becomes a bottleneck.

NVM+DRAM Architecture

We use the same YCSB skew settings and workload mixes described above, but the data size is fixed at 8X the available memory, with the remainder of the data residing in NVM. The results are shown in Figure 4, and there are several interesting conclusions to draw from them. The first is that the throughput of the two systems trends differently as skew changes. For the read-heavy workload, the anti-caching system has a 13X throughput advantage over the disk-oriented system when skew is high, but only a 1.3X advantage when skew is low, with similar trends for the other workload mixes. The anti-caching system performs best when skew is high, since it fetches fewer blocks and restarts fewer transactions. In contrast, the disk-oriented system performs poorly on high-skew workloads due to lock contention, but its throughput increases as skew decreases.

Figure 4: NVM+DRAM Architecture – YCSB (a) Read-only (b) Read-heavy (c) Write-heavy.

Another interesting result is that neither system exhibits a major change in performance across the different NVM latencies. This is significant, as it implies that at these latencies neither system is bottlenecked by NVM I/O. Instead, the decrease in performance is due to the overhead of moving data between DRAM and NVM. For the disk-oriented system, that overhead mostly comes from the buffer pool, while in the anti-caching system it comes from restarting transactions when evicted data is asynchronously fetched.


Conclusion

We explored two possible architectures for using non-volatile memory: NVM-only and NVM+DRAM. Our analysis shows that memory-oriented systems are better suited to take advantage of NVM and outperform their disk-oriented counterparts. However, in both architectures, the throughput of the memory-oriented systems decreases as workload skew is reduced, while the throughput of the disk-oriented systems increases. We therefore conclude that the ideal system for both architectures will possess features of both memory-oriented and disk-oriented systems.

The project team on this work includes Justin Debrabant, Joy Arulraj, Michael Stonebraker, Stan Zdonik, Andrew Pavlo, and Subramanya Dulloor.

