Accelerated Linear Algebra on Big Data

By Jack Dongarra, University of Tennessee Knoxville and Innovative Computing Laboratory

Big Data often brings massive amounts of computation. For example, gene correlations may be analyzed with the Singular Value Decomposition (SVD), as is done in the GenMark benchmark. The SVD is a robust method that works even in the presence of errors in the input data, but the trade-off is a large computational cost: O(n^3). For large gene sequence collections, n can reach tens of thousands, if not hundreds of thousands.
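To get a feel for what cubic growth means at these sizes, here is a rough back-of-the-envelope sketch in Python. The function names are hypothetical. The leading constant (8/3, the textbook leading term for reducing a square matrix to bidiagonal form) and the sustained rate of 100 Gflop/s are assumptions chosen for illustration, not measurements from any benchmark.

```python
# Back-of-the-envelope cost model for a dense SVD with problem size n.
# ASSUMPTION: the leading constant and the sustained rate below are
# illustrative values only; real costs depend on the algorithm variant,
# on whether singular vectors are computed, and on the hardware.

def svd_flops(n, constant=8.0 / 3.0):
    """Estimated floating-point operations, growing as n**3."""
    return constant * float(n) ** 3


def estimated_hours(n, sustained_gflops):
    """Hours needed at a hypothetical sustained rate given in Gflop/s."""
    seconds = svd_flops(n) / (sustained_gflops * 1e9)
    return seconds / 3600.0


if __name__ == "__main__":
    rate = 100.0  # hypothetical sustained Gflop/s
    for n in (10_000, 100_000):
        print(f"n = {n:>7,}: {svd_flops(n):.2e} flops, "
              f"about {estimated_hours(n, rate):.2f} h at {rate:.0f} Gflop/s")
```

Whatever the true constant is, the scaling is what matters: going from n in the tens of thousands to n in the hundreds of thousands multiplies the work by roughly a thousand.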

One way to address this is to throw hardware at the problem in the form of a large multicore cluster. This solution has two drawbacks. First, multicore CPUs have seen only modest increases in performance, which means only a slow increase in computational power over the coming years. Second, compensating with more nodes runs into the other problem: a distributed-memory cluster loses about half of its peak performance to moving data between the nodes. A well-established library that computes the SVD on clusters is ScaLAPACK. We would like instead to reap the benefits of recent progress in accelerated hardware.

Such an opportunity is offered by Intel's coprocessor card, the Xeon Phi (formerly Many Integrated Core, or MIC). These cards offer a many-fold improvement in performance and memory bandwidth over server-grade multicore processors. At the same time, they require half the energy of commodity CPUs or less, which allows greater density in the server room. The remaining piece of the puzzle is the software, which is provided by the MAGMA library and its support for Xeon Phi [1]. Thanks to its design, MAGMA can handle multiple Xeon Phi devices connected to a single host server. We are in the process of porting modern SVD algorithms from the PLASMA library; the port will enable high utilization of both the server's multicore CPUs for the latency-bound portions of the software and multiple Xeon Phi cards for the throughput-oriented sections.

References

[1] Jack Dongarra, Mark Gates, Azzam Haidar, Yulu Jia, Khairul Kabir, Piotr Luszczek, and Stanimire Tomov. “Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi.” In Proceedings of PPAM 2013, Warsaw, Poland.
