Scientific data management made fast and easy

What Is TileDB

TileDB is a new system for the efficient management of scientific data, that is, massive quantities of data naturally represented as multi-dimensional arrays. Examples include graphs, genome sequences, matrices, and geo-tagged data. TileDB is MIT-licensed, open-source software written in C++ for Ubuntu Linux, CentOS Linux, and Mac OS X. The current release consists of the TileDB storage manager module, exposed as a C library, which makes it easy for programmers to write diverse, complex, parallel scientific data analytics applications. The Tutorials page explains TileDB's internal mechanics and its library in detail, and the Coming up page previews our future development plans.

Why Use TileDB

TileDB addresses two important problems that existing array data management solutions do not serve well: sparsity (arrays in which many elements are zero or empty) and updates. TileDB uses flexible tiling to efficiently capture both dense and sparse arrays, and it introduces a novel batch-write technique to manage updates. Together, these features deliver significant performance gains over competing solutions.
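The tiling and batch-write ideas above can be sketched in a few lines of C, the language of TileDB's library interface. This is a hypothetical illustration, not TileDB's actual data structures or C API: the names `TiledArray2D`, `tile_id`, `Cell`, `Batch`, and `read_cell` are invented for this example. It assumes a 2D integer domain split into fixed-size rectangular tiles laid out in row-major order, sparse cells stored with explicit coordinates, and updates appended as immutable batches that are resolved at read time, newest batch first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch only; not TileDB's real internals or C API. */

/* Space tiling for a 2D array: cells are grouped into fixed-size
   rectangular tiles, and tiles are ordered row-major. */
typedef struct {
    int64_t rows, cols;           /* domain: [0, rows) x [0, cols) */
    int64_t tile_rows, tile_cols; /* tile extents */
} TiledArray2D;

/* Returns the row-major index of the tile that contains cell (r, c). */
int64_t tile_id(const TiledArray2D *a, int64_t r, int64_t c) {
    int64_t tiles_per_row = (a->cols + a->tile_cols - 1) / a->tile_cols;
    return (r / a->tile_rows) * tiles_per_row + (c / a->tile_cols);
}

/* Sparse storage: only non-empty cells are kept, with explicit coordinates. */
typedef struct {
    int64_t r, c;
    double value;
} Cell;

/* A write batch: an immutable group of cells written together.
   Updates never modify an earlier batch; they append a new one. */
typedef struct {
    const Cell *cells;
    size_t n;
} Batch;

/* Looks up cell (r, c) by scanning batches from newest to oldest, so the
   most recent write wins. Returns 1 and stores the value in *out if the
   cell was written, or 0 if the cell is empty. */
int read_cell(const Batch *batches, size_t n_batches,
              int64_t r, int64_t c, double *out) {
    for (size_t b = n_batches; b-- > 0;) {
        for (size_t i = 0; i < batches[b].n; i++) {
            if (batches[b].cells[i].r == r && batches[b].cells[i].c == c) {
                *out = batches[b].cells[i].value;
                return 1;
            }
        }
    }
    return 0;
}
```

For a 10x10 array with 4x4 tiles, cell (5, 9) falls in tile 5. If a first batch writes (2, 3) = 2.0 and a later batch writes (2, 3) = 5.0, `read_cell` returns 5.0 even though neither batch was modified in place. A real storage manager would keep each batch sorted and indexed rather than scanning it linearly; the linear scan here only keeps the sketch short.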

Who Is Using TileDB

TileDB is actively used in genomics. It is a core component of GenomicsDB, developed by the Intel Health and Life Sciences group. The Broad Institute uses GenomicsDB to store and process thousands of whole-exome and whole-genome sequences, amounting to many terabytes of data.

Get Started

Get the current source code from the TileDB GitHub repo and view the release history. Before using the software, we strongly recommend reading the detailed tutorials on TileDB's internal mechanics and on the usage of its C API.
