High Performance Computing 25 SP Programming SMP Platform
Sharing an address space simplifies programming.
Shared Address Space Programming Paradigm
Shared-address-space paradigms vary in their mechanisms for data sharing, their concurrency models, and their support for synchronization.
- Process-based models
- Lightweight processes and threads
Thread
A thread is a single stream of control in the flow of a program.
Logical memory model of a thread:
All memory is globally accessible to every thread, and threads are invoked as function calls.
Benefits of threads:
- Creating a new thread takes less time than creating a new process.
- Terminating a thread takes less time than terminating a process.
Taxonomy of threads:
- User-Level Thread (ULT): The kernel is not aware that the threads exist. All thread management is done by the application through a thread library. Thread switching does not require kernel-mode privileges, and scheduling is application specific.
- Kernel-Level Thread (KLT): All thread management is done by the kernel. There is no thread library, only an API to the kernel thread facility. Switching between threads requires the kernel, and scheduling is done on a per-thread basis.
- Combined ULT/KLT approach: combines the advantages of both.
Pthreads Programming
Potential problems with threads:
- Conflicting access to shared memory.
- Race conditions
- Starvation
- Priority inversion
- Deadlock
Mutual Exclusion
Mutex locks are used to implement critical sections and atomic operations.
A mutex lock has two states: locked and unlocked. At any point in time, only one thread can hold a given mutex lock.
Producer-Consumer Work Queues
The producer thread creates tasks and inserts them into a work queue.
The consumer threads pick up tasks from the work queue and execute them.
Locks represent serialization points since critical sections must be executed by threads one after the other.
Important: Minimize the size of critical sections.
Condition Variables for Synchronization
The pthread_mutex_trylock function alleviates idling time, because it returns immediately instead of blocking when the lock is held, but it introduces the overhead of polling for the availability of locks.
Condition variables provide an interrupt-driven mechanism rather than a polled one: the availability of the resource is signaled to the waiting threads.
A condition variable is a data object used for synchronizing threads: it allows a thread to block itself until specified data reaches a predefined state.
A thread performing a condition wait is not runnable and therefore uses no CPU cycles, whereas a thread polling for a mutex lock (for example, by repeatedly calling pthread_mutex_trylock) keeps consuming CPU cycles.
Common errors: one cannot assume any particular order of execution among threads; any required ordering must be established explicitly with mutexes, condition variables, and joins.
MPI Programming
Low cost message passing architecture.
Mapping of MPI Processes:
MPI views processes as arranged in a one-dimensional topology (ranks 0 to p-1). Many parallel programs, however, arrange processes in higher-dimensional topologies such as a 2-D grid, so each MPI process must be mapped to a position in the higher-dimensional topology.
Non-blocking Send and Receive:
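MPI_Isend and MPI_Irecv start a send or receive and return immediately, before the operation has completed. They allocate a request object and return a handle to it in their request argument; this handle is later passed to MPI_Test or MPI_Wait to check for or wait on completion.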
OpenMP
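A standard for directive-based parallel programming.
It provides thread-based parallelism, and the parallelism is explicit: the programmer marks the regions to run in parallel with directives.
It uses the fork-join model: a master thread forks a team of threads at the start of a parallel region, and the team joins back into the master thread at the end of the region.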
2021 - 2025 © Ricardo Ren