High Performance Computing 25 SP CPU Architecture


How to use the newly available transistors?

Parallelism:

Instruction Level Parallelism (ILP):

  • Implicit/transparent to users/programmers.
  • Instruction pipelining.
  • Superscalar execution.
  • Out of order execution.
  • Register renaming.
  • Speculative execution.
  • Branch prediction.

Task Level Parallelism (TLP):

  • Explicit to users/programmers.
  • Multiple threads or processes executed simultaneously.
  • Multi-core processors.

Data Parallelism:

  • Vector processors and SIMD.

Von Neumann Architecture: the stored-program concept, in which instructions and data reside in the same memory. Three components: the processor, the memory, and the data path connecting them.

Bandwidth: the gravity of modern computer systems.

Instruction Pipelining

Divide incoming instructions into a series of sequential steps, each performed by a different processor unit, so that every part of the processor stays busy.

Superscalar execution can execute more than one instruction during a clock cycle.

Out-of-order execution: instructions are executed as soon as their operands are available, rather than strictly in program order.

Very long instruction word (VLIW): allows programs to explicitly specify which instructions should execute at the same time.

EPIC: Explicitly Parallel Instruction Computing.

Both approaches move the complexity of instruction scheduling from the CPU hardware to the compiler, which must:

  • Check dependencies between instructions.
  • Assign instructions to the functional units.
  • Determine when instructions can be initiated, and place them together into a single long word.


Comparisons between different architectures (figure omitted).

Multi-Core Processor Gala

Symmetric multiprocessing (SMP): a multiprocessor computer hardware and software architecture.

Two or more identical processors are connected to a single shared main memory and have full access to all input and output devices.

Current trend: computer clusters, i.e. SMP computers connected by a network.

Multithreading: exploiting thread-level parallelism.

Multithreading allows multiple threads to share the functional units of a single processor in an overlapping fashion, duplicating only the private state (such as registers and the program counter). A thread switch should therefore be much more efficient than a process switch.

Hardware approaches to multithreading:

Fine-grained multithreading:

  • Switches between threads on each clock cycle.
  • Hides the throughput losses that arise from both short and long stalls.
  • Disadvantage: slows down the execution of an individual thread.

Coarse-grained multithreading:

  • Switches threads only on costly stalls.
  • Limited in its ability to overcome throughput losses, since short stalls are not hidden.

Simultaneous multithreading (SMT):

  • A variation on fine-grained multithreading that issues instructions from multiple threads in the same clock cycle, filling the issue slots of a multiple-issue processor.


Data Parallelism: Vector Processors

Provides high-level operations that work on vectors.

The vector length also varies depending on the hardware.

SIMD, and its generalization in the vector-processing approach, improve efficiency by performing the same operation on multiple data elements.

Author: Ricardo Ren
Copyright: unless otherwise stated, all posts on this blog are licensed under CC BY-NC-SA 4.0. Readers are welcome to repost without asking permission, but please credit "Reposted from Ricardo's Blog".
