In a collaboration between Carnegie Mellon University and Intel Labs, we explore the changes required in future database management systems to fully leverage the unique characteristics of non-volatile memory (NVM) technologies. The results of this work are described in a paper, “Let’s Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems,” to be presented at SIGMOD ’15 (May 31 to June 4, 2015, Melbourne, Australia).
These new NVM devices are almost as fast as DRAM, but all writes to NVM are potentially persistent even after power loss. However, existing DBMSs are unable to take full advantage of this technology because their internal architectures are predicated on the assumption that memory is volatile. With NVM, many of the components of legacy DBMSs are unnecessary and in fact will degrade the performance of data-intensive applications.
To better understand these issues, we implemented three engines in a modular DBMS testbed that are based on different storage management architectures: (1) in-place updates, (2) copy-on-write updates, and (3) log-structured updates. We then developed NVM-aware variants of these architectures that leverage the persistence and byte-addressability properties of NVM in their storage and recovery methods.
NVM storage devices are currently prohibitively expensive and only support small capacities. For this reason, we use an NVM hardware emulator developed by Intel Labs in this paper. The emulator supports tunable read latencies and read/write bandwidths. This enables us to evaluate multiple hardware profiles that are not specific to a particular NVM technology.
We developed a lightweight DBMS to evaluate different storage architecture designs for OLTP workloads. We did not use an existing DBMS, as that would require significant changes to incorporate the storage engines into a single system. Although some DBMSs support a pluggable storage engine back-end (e.g., MySQL), modifying them to support NVM would still require significant changes. We also did not want to taint our measurements with features not relevant to our evaluation.
We implemented three storage engines that use different approaches for supporting durable updates to a database: (1) in-place (InP) updates engine, (2) copy-on-write (CoW) updates engine, and (3) log-structured (Log) updates engine. Each engine also supports both primary and secondary indexes. For each engine, we describe in the paper how it applies changes made by transactions to the database and how it ensures durability after a crash. All of these engines are based on the architectures found in state-of-the-art DBMSs. That is, they treat memory obtained through the allocator interface as volatile memory and do not exploit NVM’s persistence.
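The three update strategies named above can be contrasted with a toy key-value sketch. This is not the testbed's code: the class names are hypothetical, everything lives in ordinary Python objects, and durability and indexes are left out. It only illustrates the general idea behind each approach: overwrite the existing slot, build a modified copy and then switch a single reference to it, or append a new version to a log and resolve reads from the newest entry.

```python
class InPlaceStore:
    """In-place updates: the new value overwrites the existing slot."""
    def __init__(self):
        self.slots = {}

    def update(self, key, value):
        self.slots[key] = value

    def read(self, key):
        return self.slots[key]


class CopyOnWriteStore:
    """Copy-on-write updates: never modify the current version.

    A real engine would copy only the affected pages; this sketch
    copies the whole dictionary to keep the idea visible.
    """
    def __init__(self):
        self.current = {}

    def update(self, key, value):
        dirty = dict(self.current)   # private copy of the current version
        dirty[key] = value
        self.current = dirty         # switch readers to the new version

    def read(self, key):
        return self.current[key]


class LogStructuredStore:
    """Log-structured updates: append every change, newest entry wins."""
    def __init__(self):
        self.log = []

    def update(self, key, value):
        self.log.append((key, value))

    def read(self, key):
        for logged_key, value in reversed(self.log):
            if logged_key == key:
                return value
        raise KeyError(key)
```

A reader that grabbed `CopyOnWriteStore.current` before an update keeps seeing the old version, whereas the log-structured store retains both versions until some form of compaction (not shown) discards the stale one.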
All of the engines described above are derived from existing DBMS architectures that are predicated on a two-tier storage hierarchy comprising volatile DRAM and a non-volatile HDD/SSD. These storage devices have distinct hardware constraints and performance properties. First, the read and write latency of non-volatile storage is several orders of magnitude higher than that of DRAM. Second, the DBMS accesses data on non-volatile storage at block granularity, while with DRAM it accesses data at byte granularity. Third, the performance gap between sequential and random accesses is greater for non-volatile storage than for DRAM.
The traditional engines were designed to account for and reduce the impact of these differences. For example, they maintain two layouts of tuples depending on the storage device. Tuples stored in memory can contain non-inlined fields because DRAM is byte-addressable and handles random accesses efficiently. In contrast, fields in tuples stored on durable storage are inlined to avoid random accesses, which are more expensive there. To amortize the overhead of accessing durable storage, these engines batch writes and flush them in a deferred manner.
Many of these techniques, however, are unnecessary in a system with an NVM-only storage hierarchy. One of the main problems with the traditional in-place updates (InP) engine described above is that it has a high rate of data duplication. When a transaction inserts a tuple, the engine records the tuple’s contents in the write-ahead log (WAL) and then again in the table storage area. The InP engine’s logging infrastructure also assumes that the system’s durable storage device has orders of magnitude higher write latency than DRAM. It therefore batches multiple log records and flushes them periodically to the WAL using sequential writes. This approach, however, increases the mean response latency because transactions need to wait for the group-commit operation.
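The duplication and group-commit behavior described above can be made concrete with a minimal sketch. The class `TraditionalInPEngine`, its fields, and the batch size of four are all hypothetical names and values chosen for illustration; the sketch models only how many bytes get written and when log records become durable, not real storage I/O.

```python
from dataclasses import dataclass, field


@dataclass
class TraditionalInPEngine:
    group_size: int = 4                          # assumed batch size for group commit
    table: dict = field(default_factory=dict)    # table storage area
    wal: list = field(default_factory=list)      # log records already flushed
    pending: list = field(default_factory=list)  # log records waiting for a flush
    bytes_written: int = 0                       # tuple bytes written overall

    def insert(self, key, payload: bytes):
        # Copy 1: the full tuple contents go into a log record.
        self.pending.append((key, payload))
        # Copy 2: the same contents go into the table storage area.
        self.table[key] = payload
        self.bytes_written += len(payload)
        # Transactions wait here until the batch is full.
        if len(self.pending) >= self.group_size:
            self.flush()

    def flush(self):
        # Group commit: write all batched log records in one pass.
        for key, payload in self.pending:
            self.wal.append((key, payload))
            self.bytes_written += len(payload)
        self.pending.clear()
```

With 100-byte tuples, four inserts write 800 bytes in total, and the first three transactions are not durable until the fourth one triggers the flush.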
Given this, we designed the NVM-optimized InP engine to avoid these issues. Now, when a transaction inserts a tuple, rather than copying the tuple to the WAL, the NVM-InP engine only records a non-volatile pointer to the tuple in the WAL. This is sufficient because both the pointer and the tuple referred to by the pointer are stored on NVM. Thus, the engine can use the pointer to access the tuple after the system restarts without needing to re-apply changes in the WAL. It also stores indexes as non-volatile B+trees that can be accessed immediately when the system restarts without rebuilding. More details on the NVM-optimized copy-on-write (CoW) engine and the log-structured (Log) updates engine can be found in the paper.
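A minimal sketch of the pointer-in-the-WAL idea follows. It is a toy model rather than the paper's implementation: `NVMRegion` is a hypothetical stand-in for memory that survives a restart, a plain dictionary stands in for the non-volatile B+tree, and the 8-byte pointer size is an assumption. Index writes are not counted in the byte total.

```python
POINTER_SIZE = 8  # assumed size of a non-volatile pointer, in bytes


class NVMRegion:
    """Toy stand-in for NVM: everything stored here survives a 'restart'."""
    def __init__(self):
        self.tuples = {}        # pointer -> tuple bytes
        self.wal = []           # log entries that hold pointers only
        self.index = {}         # key -> pointer (stand-in for a non-volatile B+tree)
        self.next_ptr = 0
        self.bytes_written = 0  # tuple and log bytes only


class NVMInPEngine:
    def __init__(self, nvm: NVMRegion):
        # Startup does no log replay and no index rebuild.
        self.nvm = nvm

    def insert(self, key, payload: bytes) -> int:
        nvm = self.nvm
        ptr = nvm.next_ptr
        nvm.next_ptr += 1
        nvm.tuples[ptr] = payload          # the tuple is written once
        nvm.bytes_written += len(payload)
        nvm.wal.append(("insert", ptr))    # the log records only the pointer
        nvm.bytes_written += POINTER_SIZE
        nvm.index[key] = ptr
        return ptr

    def read(self, key) -> bytes:
        return self.nvm.tuples[self.nvm.index[key]]
```

Simulating a restart amounts to constructing a new `NVMInPEngine` over the same `NVMRegion`: reads work immediately through the surviving index and pointers. For four 100-byte tuples this model writes 432 bytes, compared with 800 bytes when each tuple is copied into both the log and the table.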
In this section, we present our analysis of the six different storage engine implementations. Our DBMS testbed allows us to evaluate the throughput, the number of reads/writes to the NVM device, the storage footprint, and the time that it takes to recover the database after restarting. We begin with an analysis of the impact of NVM’s latency on the performance of the storage engines. To obtain insights that are applicable to various NVM technologies, we run the YCSB and TPC-C benchmarks under three latency configurations on the emulator: (1) the default DRAM latency configuration (160 ns), (2) a low NVM latency configuration that is 2× higher than DRAM latency (320 ns), and (3) a high NVM latency configuration that is 8× higher than DRAM latency (1280 ns).
Our analysis shows that the NVM access latency has the most impact on the runtime performance of the engines, more so than the amount of skew or the number of modifications to the database in the workload. This difference due to latency is more pronounced with the NVM-aware variants; their absolute throughput is better than that of the traditional engines, but longer latencies cause their performance to drop more significantly. This is because they are no longer bottlenecked by heavyweight durability mechanisms.
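The three latency configurations, and the reason an engine with less durability overhead is more sensitive to them, can be illustrated with a toy cost model. Only the 160 ns base latency and the 2× and 8× multipliers come from the text above. The cost model itself (transaction time equals a fixed overhead plus the number of memory accesses times the access latency) and every overhead value passed to it are assumptions made purely for illustration, not measurements.

```python
DRAM_LATENCY_NS = 160
LATENCY_MULTIPLIERS = {"dram": 1, "low_nvm": 2, "high_nvm": 8}


def latency_ns(profile: str) -> int:
    """Access latency of a configuration, derived from the DRAM baseline."""
    return DRAM_LATENCY_NS * LATENCY_MULTIPLIERS[profile]


def relative_throughput(fixed_overhead_ns: float, accesses_per_txn: int, profile: str) -> float:
    """Throughput at `profile` divided by throughput at DRAM latency.

    Assumed model: txn time = fixed overhead + accesses * access latency.
    """
    base = fixed_overhead_ns + accesses_per_txn * latency_ns("dram")
    cost = fixed_overhead_ns + accesses_per_txn * latency_ns(profile)
    return base / cost


# Hypothetical engines: one dominated by durability overhead, one not.
heavy = relative_throughput(100_000, 10, "high_nvm")
light = relative_throughput(1_000, 10, "high_nvm")
print(f"heavy overhead keeps {heavy:.0%}, light overhead keeps {light:.0%}")
```

Under this model, the engine with the large fixed overhead retains most of its DRAM-latency throughput at the 8× setting, while the lightweight engine loses most of it, even though the lightweight engine remains faster in absolute terms.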
The NVM-aware engines also perform fewer store operations, which will help extend NVM device lifetimes. We attribute this to the reduction in redundant data that the engines store when a transaction modifies the database. Using the allocator interface with non-volatile pointers for internal data structures also allows them to have a smaller storage footprint. This in turn avoids polluting the CPU’s caches with unnecessary copying and transformation operations. It also improves the recovery times of the engines that use a WAL since they no longer record redo information.
Overall, we find that the NVM-InP engine performs the best across a wide set of workload mixtures and skew settings for all NVM latency configurations. The NVM-CoW engine did not perform as well for write-intensive workloads, but may be a better fit for DBMSs that support non-blocking read-only transactions. For the NVM-Log engine, many of its design assumptions are a poor fit for a single-tier storage hierarchy. The engine is essentially performing in-place updates like the NVM-InP engine, but with the additional overhead of maintaining its legacy components.
This paper explored the fundamentals of storage and recovery methods in OLTP DBMSs running on an NVM-only storage hierarchy. We implemented three storage engines in a modular DBMS testbed with different architectures: (1) in-place (InP) updates, (2) copy-on-write (CoW) updates, and (3) log-structured (Log) updates. We then developed optimized variants of each of these engines that better make use of NVM’s characteristics.
Our experimental analysis with two different OLTP workloads showed that our NVM-aware engines outperform the traditional engines by up to 5.5× while reducing the number of writes to the storage device by more than half on write-intensive workloads. We also demonstrated that our NVM-aware recovery protocols allow these engines to recover almost instantaneously after the DBMS restarts.
We found that the NVM access latency has the most impact on the runtime performance of the engines, more so than the workload skew or the number of modifications to the database in the workload. Our evaluation showed that the NVM-aware in-place updates engine achieved the best throughput among all the engines with the least amount of wear on the NVM devices.