Parallel Processing & Distributed Systems - Chapter 13: Matrix Multiplication - Thoai Nam
Outline
Sequential matrix multiplication
Algorithms for processor arrays
– Matrix multiplication on 2-D mesh SIMD model
– Matrix multiplication on hypercube SIMD model
Matrix multiplication on UMA
multiprocessors
Matrix multiplication on multicomputers
Sequential matrix multiplication
Algorithms for processor arrays
– Matrix multiplication on 2-D mesh SIMD model
– Matrix multiplication on hypercube SIMD model
Matrix multiplication on UMA
multiprocessors
Matrix multiplication on multicomputers
Bạn đang xem 20 trang mẫu của tài liệu "Parallel Processing & Distributed Systems - Chapter 13: Matrix Multiplication - Thoai Nam", để tải tài liệu gốc về máy hãy click vào nút Download ở trên.
File đính kèm:
- parallel_processing_distributed_systems_chapter_13_matrix_mu.pdf
Nội dung text: Parallel Processing & Distributed Systems - Chapter 13: Matrix Multiplication - Thoai Nam
- Matrix Multiplication Thoai Nam
- Sequential Matrix Multiplication Global a[0 l-1,0 m-1], b[0 m-1][0 n-1], {Matrices to be multiplied} c[0 l-1,0 n-1], {Product matrix} t, {Accumulates dot product} i, j, k; Begin for i:=0 to l-1 do for j:=0 to n-1 do t:=0; for k:=0 to m-1 do t:=t+a[i][k]*b[k][j]; endfor k; c[i][j]:=t; endfor j; endfor i; End. Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM -3-
- Matrix Multiplication on 2D-Mesh SIMD Model Gentleman(1978) has shown that multiplication of two n*n matrices on the 2-D mesh SIMD model requires 0(n) routing steps We will consider a multiplication algorithm on a 2- D mesh SIMD model with wraparound connections Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM -5-
- Matrix Multiplication on 2D-Mesh SIMD Model (cont’d) Major phases b0,3 b0,2 b1,3 b0,1 b1,2 b2,3 a0,0 a0,1 a0,2 a0,3 a0,0 a0,1 a0,2 a0,3 b0,0 b0,1 b0,2 b0,3 b0,0 b1,1 b2,2 b3,3 a1,0 a1,1 a1,2 a1,3 a a a a1,0 1,1 1,2 1,3 b1,0 b1,1 b1,2 b1,3 b1,0 b2,1 b3,2 a2,0 a2,1 a2,2 a2,3 a a a2,0 a2,1 2,2 2,3 b2,0 b2,1 b2,2 b2,3 b2,0 b3,1 a3,0 a3,1 a3,2 a3,3 a3,0 a3,1 a3,2 a3,3 b3,0 b3,1 b3,2 b3,3 b3,0 (b) Staggering all A’s elements (a) Initial distribution in row i to the left by i positions of matrices A and B and all B’s elements in col j upwards by i positions Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM -7-
- Matrix Multiplication on 2D-Mesh SIMD Model (cont’d) The rest steps of the algorithm from the viewpoint of processor P(1,2) b b2,2 3,2 a a a a a1,1 a1,2 a1,3 a1,0 1,2 1,3 1,0 1,1 b b3,2 0,2 b b0,2 1,2 b b1,2 2,2 (a) First scalar multiplication step (b) Second scalar multiplication step after elements of A are cycled to the left and elements of B are cycled upwards Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM -9-
- Matrix Multiplication on 2D-Mesh SIMD Model (cont’d) Detailed Algorithm Global n, {Dimension of matrices} k ; Local a, b, c; Begin for k:=1 to n-1 do Stagger 2 matrices forall P(i,j) where 1 ≤ i,j < n do if i ≥ k then a:= fromleft(a); a[0 n-1,0 n-1] and b[0 n-1,0 n-1] if j ≥ k then b:=fromdown(b); end forall; endfor k; Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM -11-
- Matrix Multiplication on 2D-Mesh SIMD Model (cont’d) Can we implement the above mentioned algorithm on a 2-D mesh SIMD model without wrapparound connection? Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM -13-
- Matrix Multiplication Algorithm for UMA Multiprocessors Algorithm using p processors Global n, {Dimension of matrices} a[0 n-1,0 n-1], b[0 n-1,0 n-1]; {Two input matrices} c[0 n-1,0 n-1]; {Product matrix} Local i,j,k,t; Begin forall Pm where 1 ≤ m ≤ p do for i:=m to n step p do for j:= 1 to n to t:=0; for k:=1 to n do t:=t+a[i,k]*b[k,j]; endfor j; c[i][j]:=t; endfor i; end forall; End. Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM -15-
- Matrix Multiplication Algorithm for Multicomputers We will study 2 algorithms on multicomputers – Row-Column-Oriented Algorithm – Block-Oriented Algorithm Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM -17-
- Row-Column-Oriented Algorithm (cont’d) Why do we have to organize processes as a ring and make them use B’s rows in turn? Design strategy 7: – Eliminate contention for shared resources by changing the order of data access Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM -19-
- Row-Column-Oriented Algorithm (cont’d) Example: Use 4 processes to mutliply two matrices A4*4 and B4*4 A B C A B C 2nd iteration A B C A B C Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM -21-
- Row-Column-Oriented Algorithm (cont’d) Example: Use 4 processes to mutliply two matrices A4*4 and B4*4 A B C A B C 4th iteration (the last) A B C A B C Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM -23-