High Performance Computing 25 SP Programming SMP Platform

Sharing an address space simplifies parallel programming.

Shared Address Space Programming Paradigm

Paradigms vary in their mechanisms for data sharing, concurrency models, and support for synchronization.

  • Process-based models.
  • Lightweight processes and threads.

Thread

A thread is a single stream of control in the flow of a program.

Logical memory model of a thread:

All memory is globally accessible to every thread, and threads are invoked as function calls.

[Figure: logical memory model of threads]
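For instance, a minimal Pthreads sketch in which each thread begins execution at a function (the worker function and thread count below are illustrative, not from the lecture):

```c
#include <pthread.h>
#include <stdio.h>

/* Each thread starts execution at a function with this signature. */
void *worker(void *arg) {
    int id = *(int *)arg;
    printf("hello from thread %d\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    int ids[4];

    /* Create four threads; each is "invoked" like a function call on worker(). */
    for (int i = 0; i < 4; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    /* Wait for all threads to finish. */
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
```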

Benefits of threads:

  • It takes less time to create a new thread than a new process.
  • It takes less time to terminate a thread than a process.

The taxonomy of threads:

  • User-Level Thread (ULT): the kernel is not aware of the existence of threads; all thread management is done by the application using a thread library. Thread switching does not require kernel-mode privileges, and scheduling is application-specific.
  • Kernel-Level Thread (KLT): all thread management is done by the kernel; there is no thread library, only an API to the kernel's thread facility. Switching between threads requires the kernel, which schedules on a per-thread basis.
  • Combined ULT and KLT approaches, which combine the best of both.

[Figure: user-level, kernel-level, and combined thread models]

PThread Programming

Potential problems with threads:

  • Conflicting accesses to shared memory.
  • Race conditions (illustrated by the sketch after this list).
  • Starvation.
  • Priority inversion.
  • Deadlock.
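For example, a race condition arises when two threads increment a shared counter without synchronization; the minimal sketch below (variable names are illustrative) typically prints a total smaller than expected because the increment is not atomic:

```c
#include <pthread.h>
#include <stdio.h>

long counter = 0;               /* shared, unprotected */

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++)
        counter++;              /* non-atomic read-modify-write: a race */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expected 2000000)\n", counter);
    return 0;
}
```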

Mutual Exclusion

Mutex locks are used to implement critical sections and atomic operations.

A mutex has two states, locked and unlocked; at any point in time, only one thread can hold (lock) a mutex.
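A minimal sketch of protecting the shared counter from the previous example with a mutex (names are carried over for illustration):

```c
#include <pthread.h>
#include <stdio.h>

long counter = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);   /* only one thread may hold the lock */
        counter++;                   /* critical section */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);  /* now reliably 2000000 */
    return 0;
}
```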

Producer-Consumer Work Queues

The producer creates tasks and inserts them into a work-queue.

The consumer threads pick up tasks from the task queue and execute them.

Locks represent serialization points since critical sections must be executed by threads one after the other.

Important: Minimize the size of critical sections.
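A sketch of a consumer that keeps the critical section small by holding the lock only while taking a task off the queue, not while executing it (the queue layout and the process() function are illustrative assumptions):

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTASKS 8

int task_queue[NTASKS];
int head = 0;                                   /* next task to hand out */
pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for real work; done entirely outside the critical section. */
static void process(int task) { usleep(1000 * task); }

void *consumer(void *arg) {
    for (;;) {
        int task = -1;
        pthread_mutex_lock(&queue_lock);        /* critical section: queue update only */
        if (head < NTASKS)
            task = task_queue[head++];
        pthread_mutex_unlock(&queue_lock);

        if (task < 0)
            break;                              /* queue exhausted */
        process(task);                          /* long-running work outside the lock */
    }
    return NULL;
}

int main(void) {
    for (int i = 0; i < NTASKS; i++)
        task_queue[i] = i + 1;                  /* producer fills the queue */

    pthread_t c[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&c[i], NULL, consumer, NULL);
    for (int i = 0; i < 2; i++)
        pthread_join(c[i], NULL);
    printf("all tasks done\n");
    return 0;
}
```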

Condition Variables for Synchronization

pthread_mutex_trylock alleviates idling time but introduces the overhead of polling for the availability of the lock.
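A sketch of this polling pattern (the surrounding function is illustrative):

```c
#include <pthread.h>
#include <errno.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void poll_for_lock(void) {
    /* Spin until the mutex becomes available; avoids blocking but burns CPU. */
    while (pthread_mutex_trylock(&lock) == EBUSY) {
        /* could do other useful work here instead of spinning */
    }
    /* ... critical section ... */
    pthread_mutex_unlock(&lock);
}
```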

Condition variables provide an interrupt-driven mechanism, as opposed to a polled one: the waiting thread is signaled when the condition becomes available.

A condition variable is a data object used for synchronizing threads: a thread blocks on it until specified data reaches a predefined state.

When a thread performs a condition wait, it is not runnable and uses no CPU cycles, whereas polling on a mutex lock consumes CPU cycles.
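A minimal sketch of a condition wait on a shared flag (the flag and variable names are illustrative); pthread_cond_wait atomically releases the mutex and puts the thread to sleep until it is signaled:

```c
#include <pthread.h>

pthread_mutex_t mtx  = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
int ready = 0;                       /* the "specified data" being waited on */

/* Waiting thread: blocks without consuming CPU until ready becomes nonzero. */
void *waiter(void *arg) {
    pthread_mutex_lock(&mtx);
    while (!ready)                          /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond, &mtx);     /* releases mtx while sleeping */
    /* ... proceed now that the predefined state has been reached ... */
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Signaling thread: changes the state and wakes the waiter. */
void *signaler(void *arg) {
    pthread_mutex_lock(&mtx);
    ready = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mtx);
    return NULL;
}
```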

Common errors: one cannot assume any order of execution across threads; ordering must be established explicitly with mutexes, condition variables, and joins.

MPI Programming

MPI is a standard for programming message-passing architectures, which can be built at low cost from commodity nodes connected by a network.

[Figure: message-passing architecture]

Mapping of MPI Processes:

MPI views the processes as a one-dimensional topology, but in parallel programs processes are often arranged in higher-dimensional topologies (e.g., a 2D or 3D grid), so each MPI process must be mapped to a process in the higher-dimensional topology.
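One way to express such a mapping is with MPI's Cartesian topology routines; the sketch below arranges the processes of MPI_COMM_WORLD into a 2D grid (the choice of two dimensions is illustrative):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int dims[2] = {0, 0};            /* let MPI choose a balanced 2D factorization */
    int periods[2] = {0, 0};         /* non-periodic in both dimensions */
    int nprocs, rank, coords[2];

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 2, dims);

    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);  /* reorder = 1 */

    MPI_Comm_rank(cart, &rank);
    MPI_Cart_coords(cart, rank, 2, coords);
    printf("rank %d -> grid position (%d, %d)\n", rank, coords[0], coords[1]);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```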

Non-blocking Send and Receive:

MPI_Isend and MPI_Irecv allocate a request object and return a pointer to it (an MPI_Request handle); the request is later completed with MPI_Wait or tested with MPI_Test.
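A sketch of a non-blocking exchange between two ranks, where the request handles returned by MPI_Isend and MPI_Irecv are later completed with MPI_Waitall (the message tag and buffers are illustrative; run with at least two processes):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank < 2) {                          /* only the first two ranks exchange */
        int partner = 1 - rank;
        int sendbuf = rank, recvbuf = -1;
        MPI_Request reqs[2];

        /* Both calls return immediately with a request handle. */
        MPI_Isend(&sendbuf, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&recvbuf, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[1]);

        /* Computation could overlap with communication here. */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        printf("rank %d received %d\n", rank, recvbuf);
    }

    MPI_Finalize();
    return 0;
}
```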

OpenMP

A standard for directive-based parallel programming.

OpenMP provides thread-based, explicit parallelism.

It uses the fork-join model:

[Figure: OpenMP fork-join execution model]
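A minimal sketch of fork-join execution in OpenMP: the master thread forks a team at the parallel directive, and the threads join at the end of the region (the loop and array are illustrative):

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    double a[1000];

    /* Fork: a team of threads executes the parallel region. */
    #pragma omp parallel
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());

        /* Work-sharing: loop iterations are divided among the team. */
        #pragma omp for
        for (int i = 0; i < 1000; i++)
            a[i] = 2.0 * i;
    }   /* Join: implicit barrier; only the master thread continues. */

    printf("a[999] = %f\n", a[999]);
    return 0;
}
```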
