Matrixmap : programming abstraction and implementation of matrix computation for big data applications

Huangfu, Yaguang

Author:	Huangfu, Yaguang
Title:	Matrixmap : programming abstraction and implementation of matrix computation for big data applications
Advisors:	Cao, Jiannong (COMP)
Degree:	M.Phil.
Year:	2016
Subject:	Parallel programming (Computer science) Matrices. Computer algorithms. Hong Kong Polytechnic University -- Dissertations
Department:	Department of Computing
Pages:	x, 71 pages : color illustrations
Language:	English
Abstract:	Big data refers to information that exceeds the processing capacity of conventional database systems and is characterized by its volume, velocity and variety. Big data programming requires parallel programming systems to implement parallel programming models to scale up with flexibility. A parallel programming model is an abstraction which expresses the application logic, defines how to load data into data structures, and perform parallel operations on the structure. The computation core of many big data applications can be expressed as general matrix computations, including linear algebra operations and irregular matrix operations. Many common machine learning algorithms and graph algorithms can be implemented by matrix operations. However, existing parallel programming systems do not provide programming abstraction and efficient implementation for general matrix computations. For example, Data-Parallel programming systems such as Spark are inefficient to support matrix operations. Graph-Parallel programming systems such as GraphLab are for graph algorithms but do not support matrix operations. Large-scale matrix computation systems such as MadLINQ are specified for linear algebra operations, but do not support irregular matrix operations. In this thesis, we describe the design and implementation of MatrixMap, a unified and efficient data-parallel programming framework for general matrix computations. MatrixMap provides powerful yet simple abstraction, consisting of a distributed in-memory data structure called bulk key matrix and a computation interface defined by matrix patterns. Users can easily load data into bulk key matrices and program algorithms into parallel matrix patterns. Bulk key matrix is the fundamental data structure of MatrixMap, a scalable and constant distributed shared memory data structure, which stores vector-oriented data indexed by key and can keep data across matrix patterns. Matrix patterns can be programmed by user-defined lambda function. Mathematical matrix is the special case with key and value in number. We implement MatrixMap on a shared nothing cluster with multi-cores support. The BSP model is used to compute each pattern and to form an asynchronous computation pipeline of getting, computing and saving data. Furthermore, we leverage sparse matrices and BLAS (Basic Linear Algebra Subprograms) to speed up in-memory matrix computations. MatrixMap outperforms current state-of-the-art systems by employing three key techniques: matrix patterns with lambda functions for irregular and linear algebra matrix operations, asynchronous computation pipeline with optimized data shuffling strategies for specific matrix patterns, and in-memory data structure reusing data in iterations. Moreover, it can automatically handle the parallelization and distribute execution of programs on a large cluster. Based on MatrixMap, many example applications have been implemented and tested. The experiment results show that MatrixMap can be 12 times faster than Spark.
Rights:	All rights reserved
Access:	open access

Files in This Item:

File	Description	Size	Format
b29254796.pdf	For All Users	564.87 kB	Adobe PDF	View/Open

Copyright Undertaking

As a bona fide Library user, I declare that:

I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.

Show full item record

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/8730