Wednesday, May 27, 2020

Multi-Processing


  • OpenACC is a directive-based programming model designed to provide a simple yet powerful approach to accelerators without significant programming effort. With OpenACC, a single version of the source code delivers performance portability across platforms.


The NVIDIA HPC SDK™ with OpenACC offers scientists and researchers a quick path to accelerated computing with less programming effort. By inserting compiler “hints” or directives into your C11, C++17, or Fortran 2003 code, the NVIDIA OpenACC compiler can offload your code and run it on both the GPU and the CPU, as sketched below.

https://developer.nvidia.com/openacc
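
As a hedged illustration of the directive approach (not code from the NVIDIA page above), the SAXPY loop below is offloaded with a single OpenACC pragma; without an OpenACC compiler the pragma is ignored and the loop runs serially on the CPU:

    #include <stdlib.h>

    /* Minimal OpenACC sketch: offload a SAXPY loop to an accelerator.
       copyin/copy tell the compiler which arrays to move to and from
       the device; the loop itself is unchanged. */
    void saxpy(int n, float a, float *restrict x, float *restrict y)
    {
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    int main(void)
    {
        int n = 1 << 20;
        float *x = malloc(n * sizeof *x);
        float *y = malloc(n * sizeof *y);
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        saxpy(n, 3.0f, x, y);   /* every y[i] is 5.0f afterwards */

        free(x);
        free(y);
        return 0;
    }

Compiled with an OpenACC compiler (for example nvc -acc from the NVIDIA HPC SDK), the loop runs on the GPU; compiled without it, the same source stays a plain serial loop, which is the single-source portability point the excerpt makes.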


  • OpenACC is a user-driven, directive-based, performance-portable parallel programming model. It is designed for scientists and engineers interested in porting their codes to a wide variety of heterogeneous HPC hardware platforms and architectures with significantly less programming effort than a low-level model requires. The OpenACC specification supports the C, C++, and Fortran programming languages and multiple hardware architectures, including x86 and POWER CPUs and NVIDIA GPUs.

https://www.openacc.org/


  • OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran. OpenMP is designed for multi-processor/multi-core, shared-memory machines. The underlying architecture can be shared memory, either UMA or NUMA; a minimal example follows the link below.

http://hpc.mediawiki.hull.ac.uk/Programming/OpenMP
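
As a minimal sketch of that shared-memory model (not taken from the page above), the loop below sums an array with a team of threads that all read the same memory; the reduction clause combines the per-thread partial sums:

    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N];
        double sum = 0.0;

        for (int i = 0; i < N; ++i)
            a[i] = 1.0;

        /* Every thread works on the same shared array 'a'; each keeps a
           private partial sum that OpenMP adds together at the end. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; ++i)
            sum += a[i];

        printf("sum = %.0f (expected %d)\n", sum, N);
        return 0;
    }

Built with, say, gcc -fopenmp sum.c the directive is honored; without the flag it is ignored and the program runs serially.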


  • Sequential Program

When you run a sequential program, its instructions execute on one core while the other cores sit idle. That wastes the available resources; we want all cores to be used to execute the program.
What is OpenMP?
De facto standard API for writing shared-memory parallel applications in C, C++, and Fortran
The OpenMP API consists of (one of each is shown in the sketch after this list):
Compiler Directives
Runtime subroutines/functions
Environment variables
https://people.math.umass.edu/~johnston/PHI_WG_2014/OpenMPSlides_tamu_sc.pdf
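
To make those three pieces concrete, here is a small sketch (not taken from the slides) that uses one of each: a compiler directive, runtime functions, and an environment variable read at launch time:

    #include <stdio.h>
    #include <omp.h>   /* declarations for the runtime functions */

    int main(void)
    {
        /* Compiler directive: fork a team of threads for this region. */
        #pragma omp parallel
        {
            /* Runtime functions: query this thread's id and the team size. */
            int tid = omp_get_thread_num();
            int nth = omp_get_num_threads();
            printf("Hello from thread %d of %d\n", tid, nth);
        }
        return 0;
    }

The third piece, environment variables, controls the run without recompiling: for example OMP_NUM_THREADS=4 ./a.out requests a team of four threads.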


  • Memory Models

Parallel computing is about data processing. In practice, memory models determine how we write parallel programs.

Two types: shared memory model and distributed memory model (see the sketch below).
Shared memory: all CPUs have access to the (shared) memory.
Distributed memory: each CPU has its own (local) memory, invisible to other CPUs.
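
A brief sketch of how the difference shows up in code: under the shared-memory model (OpenMP here) every thread writes directly into the same array, whereas under the distributed-memory model each process would own its own copy and the pieces would have to be exchanged as explicit messages (as MPI does):

    #include <stdio.h>

    #define N 8

    int main(void)
    {
        int a[N] = {0};

        /* Shared memory: the array 'a' is visible to every thread,
           so each thread writes its own elements directly. */
        #pragma omp parallel for shared(a)
        for (int i = 0; i < N; ++i)
            a[i] = i * i;

        /* After the parallel region the main thread sees all updates,
           because everyone was writing into the same memory. In a
           distributed-memory program each process would hold its own
           'a' and the results would have to be gathered explicitly. */
        for (int i = 0; i < N; ++i)
            printf("a[%d] = %d\n", i, a[i]);

        return 0;
    }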


  • Hybrid Model

Shared-memory style within a node
Distributed-memory style across nodes
https://idre.ucla.edu/sites/default/files/intro-openmp-2013-02-11.pdf
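
A minimal hybrid sketch, assuming both an MPI library and an OpenMP-capable compiler are available: MPI provides the distributed-memory layer (processes across nodes), and OpenMP provides the shared-memory layer (threads inside each process):

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv)
    {
        int provided, rank, size;

        /* Distributed-memory layer: MPI processes, typically one per node. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Shared-memory layer: OpenMP threads inside each MPI process. */
        #pragma omp parallel
        printf("rank %d of %d, thread %d of %d\n",
               rank, size, omp_get_thread_num(), omp_get_num_threads());

        MPI_Finalize();
        return 0;
    }

Launched with something like mpirun -np 2 ./a.out and OMP_NUM_THREADS=4, this gives 2 processes with 4 threads each, which is exactly the hybrid picture above.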


  • Advantages of OpenMP

Simple programming model: data decomposition and communication are handled by compiler directives
Single source code for serial and parallel codes
No major rewrite of the serial code
Portable implementation
Progressive parallelization: start from the most critical or time-consuming part of the code
OpenMP vs. MPI
OpenMP Basic Syntax
Loop Parallelism
Threads share the work in loop parallelism. For example, with a 1000-iteration loop and 4 threads under the default “static” scheduling, in Fortran: thread 1 has i=1-250, thread 2 has i=251-500, and so on.
Loop Parallelism: ordered and collapse (a C sketch of static scheduling and collapse follows the link below)
https://www.nersc.gov/assets/Uploads/XE62011OpenMP.pdf
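
The slide example above is in Fortran; a hedged C translation of the same ideas, showing the default-style static schedule and the collapse clause on a nested loop:

    #include <stdio.h>

    #define N 1000
    #define M 100

    int main(void)
    {
        static double a[N], b[N][M];

        /* schedule(static): iterations are split into contiguous chunks,
           e.g. with 4 threads, thread 0 gets i = 0..249, thread 1 gets
           i = 250..499, and so on (the C analogue of the Fortran example). */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N; ++i)
            a[i] = 2.0 * i;

        /* collapse(2): merge the two perfectly nested loops into a single
           N*M iteration space, so the threads have more work to share. */
        #pragma omp parallel for collapse(2)
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < M; ++j)
                b[i][j] = a[i] + j;

        printf("b[%d][%d] = %f\n", N - 1, M - 1, b[N - 1][M - 1]);
        return 0;
    }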
  • HPCaaS

Designed for speed and simplicity, HPCaaS from Rescale on IBM Cloud™ enables you to execute your HPC jobs along with the associated data in a few easy clicks. You configure the workflow and job execution environment (for example, compute cores, memory and GPU options) and execute and monitor the work directly from the easy-to-use portal.
https://www.ibm.com/cloud/hpcaas-from-rescale