Author:  Huangfu, Yaguang 
Title:  Matrixmap : programming abstraction and implementation of matrix computation for big data applications 
Degree:  M.Phil. 
Year:  2016 
Subject:  Parallel programming (Computer science); Matrices; Computer algorithms; Hong Kong Polytechnic University -- Dissertations
Department:  Dept. of Computing 
Pages:  x, 71 pages : color illustrations 
Language:  English 
InnoPac Record:  http://library.polyu.edu.hk/record=b2925479 
URI:  http://theses.lib.polyu.edu.hk/handle/200/8730 
Abstract:  Big data refers to information that exceeds the processing capacity of conventional database systems and is characterized by its volume, velocity and variety. Big data programming requires parallel programming systems that implement parallel programming models in order to scale flexibly. A parallel programming model is an abstraction that expresses the application logic and defines how to load data into data structures and how to perform parallel operations on them. The computation core of many big data applications can be expressed as general matrix computations, including linear algebra operations and irregular matrix operations. Many common machine learning algorithms and graph algorithms can be implemented with matrix operations. However, existing parallel programming systems provide neither a programming abstraction nor an efficient implementation for general matrix computations. For example, data-parallel programming systems such as Spark support matrix operations inefficiently. Graph-parallel programming systems such as GraphLab target graph algorithms but do not support matrix operations. Large-scale matrix computation systems such as MadLINQ are specialized for linear algebra operations but do not support irregular matrix operations. In this thesis, we describe the design and implementation of MatrixMap, a unified and efficient data-parallel programming framework for general matrix computations. MatrixMap provides a powerful yet simple abstraction, consisting of a distributed in-memory data structure called the bulk key matrix and a computation interface defined by matrix patterns. Users can easily load data into bulk key matrices and program algorithms as parallel matrix patterns. The bulk key matrix is the fundamental data structure of MatrixMap: a scalable and constant distributed shared-memory data structure that stores vector-oriented data indexed by key and can keep data across matrix patterns. Matrix patterns are programmed with user-defined lambda functions. A mathematical matrix is the special case in which keys and values are numbers. We implement MatrixMap on a shared-nothing cluster with multicore support. The BSP model is used to compute each pattern and to form an asynchronous computation pipeline of getting, computing and saving data. Furthermore, we leverage sparse matrices and BLAS (Basic Linear Algebra Subprograms) to speed up in-memory matrix computations. MatrixMap outperforms current state-of-the-art systems by employing three key techniques: matrix patterns with lambda functions for irregular and linear algebra matrix operations, an asynchronous computation pipeline with optimized data shuffling strategies for specific matrix patterns, and an in-memory data structure that reuses data across iterations. Moreover, it automatically handles the parallelization and distributed execution of programs on a large cluster. Based on MatrixMap, many example applications have been implemented and tested. The experimental results show that MatrixMap can be 12 times faster than Spark.
Files          Size     Format
b29254796.pdf  578.4Kb