High Performance Computing 25 SP Programming SMP Platform

Sharing an address space simplifies parallel programming.

Shared Address Space Programming Paradigm

Paradigms vary in their mechanisms for data sharing, concurrency models, and support for synchronization.

  • Process-based models.
  • Lightweight processes and threads.

Thread

A thread is a single stream of control in the flow of a program.

Logical memory model of a thread:

All memory is globally accessible to every thread, and threads are invoked as function calls.

[Figure: logical memory model of threads]
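For instance, a minimal Pthreads sketch in which each thread begins execution at a function (the worker function and thread count below are illustrative, not from the lecture):

```c
#include <pthread.h>
#include <stdio.h>

/* Each thread starts execution at a function with this signature. */
void *worker(void *arg) {
    int id = *(int *)arg;
    printf("hello from thread %d\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    int ids[4];

    /* Create four threads; each is "invoked" like a function call on worker(). */
    for (int i = 0; i < 4; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    /* Wait for all threads to finish. */
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
```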

Benefits of threads:

  • It takes less time to create a new thread than a new process.
  • It takes less time to terminate a thread than a process.

The taxonomy of threads:

  • User-Level Thread (ULT): the kernel is not aware of the existence of threads; all thread management is done by the application using a thread library. Thread switching does not require kernel-mode privileges, and scheduling is application-specific.
  • Kernel-Level Thread (KLT): all thread management is done by the kernel; there is no thread library, only an API to the kernel's thread facility. Switching between threads requires the kernel, which schedules on a per-thread basis.
  • Combined ULT and KLT approaches, which combine the best of both.

[Figure: user-level, kernel-level, and combined thread models]

PThread Programming

Potential problems with threads:

  • Conflicting accesses to shared memory.
  • Race conditions (illustrated by the sketch after this list).
  • Starvation.
  • Priority inversion.
  • Deadlock.
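For example, a race condition arises when two threads increment a shared counter without synchronization; the minimal sketch below (variable names are illustrative) typically prints a total smaller than expected because the increment is not atomic:

```c
#include <pthread.h>
#include <stdio.h>

long counter = 0;               /* shared, unprotected */

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++)
        counter++;              /* non-atomic read-modify-write: a race */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expected 2000000)\n", counter);
    return 0;
}
```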

Mutual Exclusion

Mutex locks are used to implement critical sections and atomic operations.

A mutex has two states, locked and unlocked; at any point in time, only one thread can hold (lock) a mutex.
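A minimal sketch of protecting the shared counter from the previous example with a mutex (names are carried over for illustration):

```c
#include <pthread.h>
#include <stdio.h>

long counter = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);   /* only one thread may hold the lock */
        counter++;                   /* critical section */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);  /* now reliably 2000000 */
    return 0;
}
```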

Producer-Consumer Work Queues

The producer creates tasks and inserts them into a work-queue.

The consumer threads pick up tasks from the task queue and execute them.

Locks represent serialization points since critical sections must be executed by threads one after the other.

Important: Minimize the size of critical sections.
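A sketch of a consumer that keeps the critical section small by holding the lock only while taking a task off the queue, not while executing it (the queue layout and the process() function are illustrative assumptions):

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTASKS 8

int task_queue[NTASKS];
int head = 0;                                   /* next task to hand out */
pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for real work; done entirely outside the critical section. */
static void process(int task) { usleep(1000 * task); }

void *consumer(void *arg) {
    for (;;) {
        int task = -1;
        pthread_mutex_lock(&queue_lock);        /* critical section: queue update only */
        if (head < NTASKS)
            task = task_queue[head++];
        pthread_mutex_unlock(&queue_lock);

        if (task < 0)
            break;                              /* queue exhausted */
        process(task);                          /* long-running work outside the lock */
    }
    return NULL;
}

int main(void) {
    for (int i = 0; i < NTASKS; i++)
        task_queue[i] = i + 1;                  /* producer fills the queue */

    pthread_t c[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&c[i], NULL, consumer, NULL);
    for (int i = 0; i < 2; i++)
        pthread_join(c[i], NULL);
    printf("all tasks done\n");
    return 0;
}
```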

Condition Variables for Synchronization

pthread_mutex_trylock alleviates idling time but introduces the overhead of polling for the availability of the lock.
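A sketch of this polling pattern (the surrounding function is illustrative):

```c
#include <pthread.h>
#include <errno.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void poll_for_lock(void) {
    /* Spin until the mutex becomes available; avoids blocking but burns CPU. */
    while (pthread_mutex_trylock(&lock) == EBUSY) {
        /* could do other useful work here instead of spinning */
    }
    /* ... critical section ... */
    pthread_mutex_unlock(&lock);
}
```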

Condition variables provide an interrupt-driven mechanism, as opposed to a polled one: the waiting thread is signaled when the condition becomes available.

A condition variable is a data object used for synchronizing threads: a thread blocks on it until specified data reaches a predefined state.

When a thread performs a condition wait, it is not runnable and uses no CPU cycles, whereas polling on a mutex lock consumes CPU cycles.
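A minimal sketch of a condition wait on a shared flag (the flag and variable names are illustrative); pthread_cond_wait atomically releases the mutex and puts the thread to sleep until it is signaled:

```c
#include <pthread.h>

pthread_mutex_t mtx  = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
int ready = 0;                       /* the "specified data" being waited on */

/* Waiting thread: blocks without consuming CPU until ready becomes nonzero. */
void *waiter(void *arg) {
    pthread_mutex_lock(&mtx);
    while (!ready)                          /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond, &mtx);     /* releases mtx while sleeping */
    /* ... proceed now that the predefined state has been reached ... */
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Signaling thread: changes the state and wakes the waiter. */
void *signaler(void *arg) {
    pthread_mutex_lock(&mtx);
    ready = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mtx);
    return NULL;
}
```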

Common errors: one cannot assume any order of execution across threads; ordering must be established explicitly with mutexes, condition variables, and joins.

MPI Programming

MPI is a standard for programming message-passing architectures, which can be built at low cost from commodity nodes connected by a network.

[Figure: message-passing architecture]

Mapping of MPI Processes:

MPI views the processes as a one-dimensional topology, but in parallel programs processes are often arranged in higher-dimensional topologies (e.g., a 2D or 3D grid), so each MPI process must be mapped to a process in the higher-dimensional topology.
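One way to express such a mapping is with MPI's Cartesian topology routines; the sketch below arranges the processes of MPI_COMM_WORLD into a 2D grid (the choice of two dimensions is illustrative):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int dims[2] = {0, 0};            /* let MPI choose a balanced 2D factorization */
    int periods[2] = {0, 0};         /* non-periodic in both dimensions */
    int nprocs, rank, coords[2];

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 2, dims);

    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);  /* reorder = 1 */

    MPI_Comm_rank(cart, &rank);
    MPI_Cart_coords(cart, rank, 2, coords);
    printf("rank %d -> grid position (%d, %d)\n", rank, coords[0], coords[1]);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```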

Non-blocking Send and Receive:

MPI_Isend and MPI_Irecv allocate a request object and return a pointer to it (an MPI_Request handle); the request is later completed with MPI_Wait or tested with MPI_Test.
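A sketch of a non-blocking exchange between two ranks, where the request handles returned by MPI_Isend and MPI_Irecv are later completed with MPI_Waitall (the message tag and buffers are illustrative; run with at least two processes):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank < 2) {                          /* only the first two ranks exchange */
        int partner = 1 - rank;
        int sendbuf = rank, recvbuf = -1;
        MPI_Request reqs[2];

        /* Both calls return immediately with a request handle. */
        MPI_Isend(&sendbuf, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&recvbuf, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[1]);

        /* Computation could overlap with communication here. */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        printf("rank %d received %d\n", rank, recvbuf);
    }

    MPI_Finalize();
    return 0;
}
```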

OpenMP

A standard for directive-based parallel programming.

OpenMP provides thread-based, explicit parallelism.

It uses the fork-join model:

[Figure: OpenMP fork-join execution model]
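A minimal sketch of fork-join execution in OpenMP: the master thread forks a team at the parallel directive, and the threads join at the end of the region (the loop and array are illustrative):

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    double a[1000];

    /* Fork: a team of threads executes the parallel region. */
    #pragma omp parallel
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());

        /* Work-sharing: loop iterations are divided among the team. */
        #pragma omp for
        for (int i = 0; i < 1000; i++)
            a[i] = 2.0 * i;
    }   /* Join: implicit barrier; only the master thread continues. */

    printf("a[999] = %f\n", a[999]);
    return 0;
}
```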
