Parallel Processing & Distributed Systems - Chapter 11: Programming with Shared Memory - Nguyễn Quang Hùng


  1. Programming with Shared Memory Nguyễn Quang Hùng
  2. Introduction
    • This section focuses on programming shared-memory systems (e.g. SMP architectures).
    • The programming discussion mainly covers:
      • Multi-processes: Unix/Linux fork(), wait(), ...
      • Multithreads: IEEE Pthreads, Java Thread
  3. Shared memory multiprocessor system
    • Based on the SMP architecture.
    • Any memory location is accessible by any of the processors.
    • A single address space exists, meaning that each memory location is given a unique address within a single range of addresses.
    • Generally, shared-memory programming is more convenient, although it does require access to shared data to be controlled by the programmer (using critical sections: semaphores, locks, monitors, ...).
  4. Shared memory multiprocessor using a crossbar switch
  5. Several alternatives for programming shared memory multiprocessors
    • Using library routines with an existing sequential programming language:
      • Multiprocess programming: fork(), execv()
      • Multithread programming: IEEE Pthreads library, Java Thread
    • Using a completely new programming language for parallel programming (not popular): High Performance Fortran, Fortran M, Compositional C++, ...
    • Modifying the syntax of an existing sequential programming language to create a parallel programming language.
    • Using an existing sequential programming language supplemented with compiler directives for specifying parallelism: OpenMP (see the sketch below).
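    As an illustration of the compiler-directive approach, here is a minimal OpenMP sketch in C. It is not taken from the slides; the array name and size are arbitrary, and it assumes an OpenMP-capable compiler (e.g. gcc -fopenmp).

      #include <stdio.h>
      #include <omp.h>

      #define N 1000

      int main(void) {
          static int a[N];
          long sum = 0;
          int i;

          /* The directive asks the compiler to split the loop iterations
             across the available threads and to combine the per-thread
             partial sums with a reduction. */
          #pragma omp parallel for reduction(+:sum)
          for (i = 0; i < N; i++) {
              a[i] = i + 1;
              sum += a[i];
          }

          printf("sum = %ld\n", sum);
          return 0;
      }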
  6. FORK-JOIN construct (figure: a main program issues FORK to spawn processes, possibly nested; each spawned process eventually reaches a JOIN back with its parent)
  7. UNIX System Calls (2)
    • SPMD model: master-workers model.

      pid = fork();
      if (pid == 0) {
          /* Code to be executed by slave process */
      } else {
          /* Code to be executed by master process */
      }
      if (pid == 0) exit(0); else wait(0);
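    A complete, compilable version of this pattern (my sketch, not from the slides; the printed messages are arbitrary):

      #include <stdio.h>
      #include <stdlib.h>
      #include <sys/wait.h>
      #include <unistd.h>

      int main(void) {
          pid_t pid = fork();               /* spawn the slave process */

          if (pid < 0) {                    /* fork failed */
              perror("fork");
              exit(1);
          }

          if (pid == 0) {
              printf("slave:  pid %d\n", (int) getpid());
              exit(0);                      /* slave terminates here */
          } else {
              printf("master: spawned slave %d\n", (int) pid);
              wait(NULL);                   /* master waits for the slave */
          }
          return 0;
      }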
  8. IEEE Pthreads (1)
    • IEEE Portable Operating System Interface, POSIX, sec. 1003.1 standard.
    • Executing a Pthread thread (figure): the main program calls pthread_create(&thread1, NULL, proc1, &arg); the new thread runs proc1(&arg) and returns a status; the main program later calls pthread_join(thread1, &status) to wait for it.
  9. The pthread_join() function
    • #include <pthread.h>
    • void pthread_exit(void *retval);
    • int pthread_join(pthread_t threadid, void **retval);
    • The function pthread_join() suspends the calling thread until the thread specified by threadid terminates. The other thread's return value is stored into the location pointed to by retval if retval is not NULL.
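    A minimal create/join example (my sketch; the function and variable names are arbitrary):

      #include <pthread.h>
      #include <stdio.h>

      /* Thread body: receives a pointer argument and returns a pointer result. */
      void *proc1(void *arg) {
          int value = *(int *) arg;
          printf("thread received %d\n", value);
          pthread_exit((void *) 0);
      }

      int main(void) {
          pthread_t thread1;
          void *status;
          int arg = 42;

          pthread_create(&thread1, NULL, proc1, &arg);  /* spawn the thread */
          pthread_join(thread1, &status);               /* wait for it to finish */

          printf("thread finished with status %ld\n", (long) status);
          return 0;
      }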
  10. Pthread detached threads (figure: the main program calls pthread_create() several times with an attribute parameter specifying a detached thread; each detached thread terminates on its own and is never joined)
  11. Thread cancellation
    • #include <pthread.h>
    • int pthread_cancel(pthread_t thread);
    • int pthread_setcancelstate(int state, int *oldstate);
    • int pthread_setcanceltype(int type, int *oldtype);
    • void pthread_testcancel(void);
    • The pthread_cancel() function allows the current thread to cancel another thread, identified by thread.
    • Cancellation is the mechanism by which a thread can terminate the execution of another thread. More precisely, a thread can send a cancellation request to another thread. Depending on its settings, the target thread can then either ignore the request, honor it immediately, or defer it until it reaches a cancellation point.
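    A small illustration of cancelling a worker (my sketch, not from the slides; the loop body is arbitrary, and it relies on the fact that sleep() is a cancellation point and that cancellation is enabled and deferred by default):

      #include <pthread.h>
      #include <stdio.h>
      #include <unistd.h>

      /* Worker loops forever; sleep() is a cancellation point, so a pending
         cancellation request is honored there. */
      void *worker(void *arg) {
          (void) arg;
          for (;;) {
              printf("working...\n");
              sleep(1);
          }
          return NULL;
      }

      int main(void) {
          pthread_t tid;
          pthread_create(&tid, NULL, worker, NULL);

          sleep(3);                   /* let the worker run for a while */
          pthread_cancel(tid);        /* send a cancellation request */
          pthread_join(tid, NULL);    /* wait until the worker is really gone */

          printf("worker cancelled\n");
          return 0;
      }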
  12. Thread pools
    • Master-Workers model: a master thread controls a collection of worker threads.
      • Dynamic thread pools.
      • Static thread pools.
    • Threads can communicate through shared locations or signals.
  13. Statement execution order (2)
    • If two processes were to print messages, for example, the messages could appear in different orders depending upon the scheduling of the processes calling the print routine.
    • Worse, the individual characters of each message could be interleaved if the machine instructions of the instances of the print routine were interleaved.
  14. Thread-safe routines
    • Routines are thread safe if they can be called from multiple threads simultaneously and always produce correct results.
    • Standard I/O is thread safe: printf() prints messages without interleaving the characters.
    • NOT thread-safe functions:
      • System routines that return the time may not be thread safe.
      • Routines that access shared data may require special care to be made thread safe.
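    As an illustration of the time-routine caveat (my example, not from the slides): the classic localtime() returns a pointer to a static buffer shared by all callers, while the reentrant localtime_r() writes into a caller-supplied buffer and is safe to call concurrently.

      #include <stdio.h>
      #include <time.h>

      void print_time_unsafe(void) {
          time_t now = time(NULL);
          struct tm *t = localtime(&now);   /* static buffer: another thread may
                                               overwrite it before we read it */
          printf("%02d:%02d:%02d\n", t->tm_hour, t->tm_min, t->tm_sec);
      }

      void print_time_safe(void) {
          time_t now = time(NULL);
          struct tm t;
          localtime_r(&now, &t);            /* result stored in our own buffer */
          printf("%02d:%02d:%02d\n", t.tm_hour, t.tm_min, t.tm_sec);
      }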
  15. SHARING DATA
    • Every processor/thread can directly access shared variables and data structures, rather than having to pass data in messages.
    • Solutions for critical sections:
      • Lock
      • Mutex
      • Semaphore
      • Condition variables
      • Monitor
  16. Accessing shared data
    • Accessing shared data needs careful control.
    • Consider two processes, each of which is to add one to a shared data item, x. It is necessary for the contents of location x to be read, x + 1 computed, and the result written back to the location. With time running downward, the instruction x = x + 1; may be interleaved as:

        Process 1          Process 2
        read x
                           read x
        compute x + 1
                           compute x + 1
        write to x
                           write to x

      With this interleaving, both processes write back the same value, so x is incremented only once instead of twice.
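    The usual Pthreads remedy is to guard the read-compute-write sequence with a mutex; a minimal sketch (mine, not from the slides):

      #include <pthread.h>
      #include <stdio.h>

      int x = 0;                                    /* shared data item */
      pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;

      /* Each thread adds one to x many times; the mutex makes the
         read-compute-write sequence atomic with respect to other threads. */
      void *adder(void *arg) {
          (void) arg;
          for (int i = 0; i < 100000; i++) {
              pthread_mutex_lock(&x_lock);
              x = x + 1;                            /* critical section */
              pthread_mutex_unlock(&x_lock);
          }
          return NULL;
      }

      int main(void) {
          pthread_t t1, t2;
          pthread_create(&t1, NULL, adder, NULL);
          pthread_create(&t2, NULL, adder, NULL);
          pthread_join(t1, NULL);
          pthread_join(t2, NULL);
          printf("x = %d\n", x);                    /* always 200000 with the lock */
          return 0;
      }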
  17. Critical section
    • A mechanism for ensuring that only one process accesses a particular resource at a time is to establish sections of code involving the resource as so-called critical sections, and to arrange that only one such critical section is executed at a time.
    • This mechanism is known as mutual exclusion.
    • The concept also appears in operating systems.
  18. Control of critical sections through busy waiting

      Process 1                      Process 2
      while (lock == 1)              while (lock == 1)
          do_nothing;                    do_nothing;
      lock = 1;                      lock = 1;
      /* critical section */         /* critical section */
      lock = 0;                      lock = 0;
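    Note that reading lock and then setting it in two separate steps is itself a race; real spin locks use an atomic read-modify-write instruction. A sketch using C11 atomics (my addition, not from the slides):

      #include <stdatomic.h>

      atomic_flag lock = ATOMIC_FLAG_INIT;

      void spin_lock(void) {
          /* Atomically set the flag and obtain its previous value;
             keep spinning while another thread already holds it. */
          while (atomic_flag_test_and_set(&lock))
              ;   /* busy wait (do nothing) */
      }

      void spin_unlock(void) {
          atomic_flag_clear(&lock);
      }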
  19. IEEE Pthreads example
    • Calculating the sum of an array a[].
    • N threads are created, each taking numbers from the list to add to their partial sums. When all numbers have been taken, the threads add their partial results to a shared location sum.
    • The shared location global_index is used by each thread to select the next element of a[].
    • After global_index is read, it is incremented in preparation for the next element to be read. The result location, sum, also needs to be shared and its access protected by a lock.
  20. IEEE Pthreads example (3)

      #include <stdio.h>
      #include <pthread.h>

      #define ARRAY_SIZE 1000
      #define NUM_THREADS 10

      // Global variables, shared data
      int a[ARRAY_SIZE];
      int global_index = 0;
      int sum = 0;
      pthread_mutex_t mutex1;                    // mutually exclusive lock variable
      pthread_t worker_threads[NUM_THREADS];
  21. IEEE Pthreads example (5)

      void master() {
          int i;
          // Initialize mutex
          pthread_mutex_init(&mutex1, NULL);
          init_data();
          create_workers(NUM_THREADS);
          // Join threads
          for (i = 0; i < NUM_THREADS; i++) {
              if (pthread_join(worker_threads[i], NULL) != 0) {
                  perror("PThread join fails");
              }
          }
          printf("The sum of 1 to %i is %d \n", ARRAY_SIZE, sum);
      }
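    Slides (2) and (4) of this example, which define init_data(), create_workers() and the worker routine, are not included in this excerpt. A minimal worker sketch consistent with the shared variables declared above (my reconstruction, not the original slide code) could be:

      void *slave(void *ignored) {
          int local_index, partial_sum = 0;

          do {
              // Take the next index under the lock, then advance it.
              pthread_mutex_lock(&mutex1);
              local_index = global_index;
              global_index++;
              pthread_mutex_unlock(&mutex1);

              if (local_index < ARRAY_SIZE)
                  partial_sum += a[local_index];
          } while (local_index < ARRAY_SIZE);

          // Add the partial result to the shared sum, again under the lock.
          pthread_mutex_lock(&mutex1);
          sum += partial_sum;
          pthread_mutex_unlock(&mutex1);

          return NULL;
      }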
  22. Java multithread programming
    • A class extends the java.lang.Thread class.
    • A class implements the java.lang.Runnable interface.

      // A sample Runner class
      public class Runner extends Thread {
          String name;

          public Runner(String name) {
              this.name = name;
          }

          public void run() {
              int N = 10;
              for (int i = 0; i < N; i++) {
                  System.out.println("Hello World!");
                  System.out.println("I am " + this.name + " runner at " + i + " km.");
                  try {
                      Thread.sleep(100);   // the slide's thread.delay(100)
                  } catch (InterruptedException e) { }
              }
          }

          public static void main(String[] args) {
              Runner hung = new Runner("Hung");
              Runner minh = new Runner("Minh");
              Runner ken = new Runner("Ken");
              hung.start();
              minh.start();
              ken.start();
          } // End main thread
      }
  23. Language Constructs for Parallelism - Shared Data
    • Shared data: shared-memory variables might be declared as shared with, say,

        shared int x;

    • par construct:

        par {
            S1;
            S2;
            .
            .
            Sn;
        }

        par {
            proc1();
            proc2();
        }
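    The par construct is not standard C; on a Pthreads system the effect of par { proc1(); proc2(); } could be obtained roughly as follows (my sketch, not from the slides):

      #include <pthread.h>

      void proc1(void);
      void proc2(void);

      /* Trampolines so the procedures match the pthread start-routine signature. */
      static void *run_proc1(void *arg) { (void) arg; proc1(); return NULL; }
      static void *run_proc2(void *arg) { (void) arg; proc2(); return NULL; }

      void par_proc1_proc2(void) {
          pthread_t t1, t2;
          pthread_create(&t1, NULL, run_proc1, NULL);   /* both bodies run in parallel */
          pthread_create(&t2, NULL, run_proc2, NULL);
          pthread_join(t1, NULL);                       /* par finishes when both finish */
          pthread_join(t2, NULL);
      }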
  24. Dependency analysis
    • To identify which processes could be executed together.
    • Example: in the code

        forall (i = 0; i < 5; i++)
            a[i] = 0;

      one can see immediately that every instance of the body is independent of the other instances, so all instances can be executed simultaneously.
    • However, it may not always be that obvious. A parallelizing compiler needs an algorithmic way of recognizing dependencies.
  25. Example
    • Example: suppose the two statements are (in C)

        a = x + y;
        b = x + z;

    • We have
        I1 = (x, y)   O1 = (a)
        I2 = (x, z)   O2 = (b)
    • and the conditions
        I1 ∩ O2 = ∅
        I2 ∩ O1 = ∅
        O1 ∩ O2 = ∅
      are satisfied. Hence, the statements a = x + y and b = x + z can be executed simultaneously.
  26. Shared Memory Programming Performance Issues
    • Shared data in systems with caches:
      • Cache coherence protocols.
      • False sharing. Solution: the compiler can alter the layout of the data stored in main memory, separating data altered by only one processor into different blocks (cache lines); see the padding sketch below.
    • High-performance programs should have as few critical sections as possible, because their use can serialize the code.
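    To illustrate the false-sharing remedy in source code rather than in the compiler (my sketch): per-thread counters can be padded so that each one occupies its own cache line. The 64-byte line size is an assumption that matches most current x86 processors.

      #define CACHE_LINE_SIZE 64        /* assumed cache-line size in bytes */
      #define NUM_THREADS 10

      /* Without padding, adjacent counters share a cache line and every
         increment by one thread invalidates that line in the other caches. */
      struct padded_counter {
          long value;
          char pad[CACHE_LINE_SIZE - sizeof(long)];
      };

      struct padded_counter counters[NUM_THREADS];

      void count_event(int thread_id) {
          counters[thread_id].value++;   /* touches only this thread's line */
      }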
  27. Sequential consistency (2) (figure: processors executing programs that access a shared memory)