Berkeley UPC - Unified Parallel C

UPC Language Features
Explicitly Parallel Execution Model
The UPC execution model is similar to the one used by the message-passing
style of programming (MPI or PVM). This model, called Single Program Multiple
Data (SPMD), is explicitly parallel. In UPC terms, the execution vehicle for
a program is called a thread. The language defines a private variable -
MYTHREAD - that can be used to distinguish between the threads of a UPC
program. A UPC program runs in its entirety, independently, on each thread.
The language does not define any correspondence between a UPC thread and its
OS-level counterparts, nor does it define any mapping to physical CPUs.
Because of this, UPC threads can be implemented either as full-fledged OS
processes or as threads (user or kernel level). On a parallel system, a UPC
program operating on shared data will typically run at least one UPC thread
per available physical processor.
To represent the amount of parallelism available to a program (i.e., the
number of UPC threads), the language introduces a new variable - THREADS.
Its value can be set in two ways: 1) at compile time, for cases where the
amount of parallelism is known a priori, and 2) at run time, when the
program is launched.
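To make the model concrete, here is a minimal SPMD-style UPC program (a
sketch; MYTHREAD, THREADS, upc_barrier, and the upc.h header are all defined
by the language):

    #include <upc.h>
    #include <stdio.h>

    int main(void) {
        /* Every thread runs the entire program. */
        printf("Hello from thread %d of %d\n", MYTHREAD, THREADS);
        upc_barrier;   /* wait until all threads have printed */
        return 0;
    }

Run with four threads, for example, this prints four lines, one per thread;
the barrier is not required here but shows the typical SPMD synchronization
point.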
For more details see the UPC language specification.
Shared Address Space
UPC divides the data available to a thread into shared data and private
data. This distinction is made through a new C type qualifier: shared. Data
qualified as shared is accessible from within any UPC thread, i.e. the same
address on each thread refers to the same physical memory location. Data
that lacks the shared qualifier is thread-private, i.e. the same address on
each thread refers to distinct physical memory locations. At the language
level, there is no syntactic difference between accesses to a shared
variable and accesses to a private variable.
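For example (a sketch; the variable names are illustrative, while shared,
MYTHREAD, and THREADS are defined by the language):

    #include <upc.h>

    shared int counter;   /* one shared instance, accessible from every thread */
    int local_value;      /* one private instance per thread */

    int main(void) {
        local_value = MYTHREAD;   /* each thread writes its own private copy */
        if (MYTHREAD == 0)
            counter = THREADS;    /* thread 0 writes the shared variable */
        upc_barrier;
        return 0;
    }

Here counter names a single memory location visible to all threads, while
local_value names a distinct location on each thread.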
The language also defines a "physical" association between shared data items
and UPC threads. This association, called affinity in UPC terms, indicates
that a particular thread "owns" a particular data item. From the
implementation point of view, affinity translates into storing the data in
the physical memory of the processor where the owning UPC thread runs.
When defining affinity, the language distinguishes between scalar data and
array data. All scalar data (of any primitive C type, pointer type, or
user-defined aggregate type) has affinity with thread 0. For arrays, the
language allows three affinity granularities (illustrated in the sketch
after this list):
- cyclic (per element) - successive elements of the array have affinity
with successive threads.
- blocked-cyclic (user-defined) - the array is divided into blocks of a
user-defined size and the blocks are distributed cyclically among threads.
- blocked (run-time) - each thread has affinity to one contiguous part of
the array, whose size is chosen so that the array is "evenly" distributed
among threads.
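In code, the three granularities correspond to layout qualifiers on shared
array declarations (a sketch; the array names are illustrative):

    #include <upc.h>

    shared     int a[8*THREADS];  /* cyclic (default): a[i] on thread i % THREADS */
    shared [4] int b[8*THREADS];  /* blocked-cyclic: blocks of 4, round-robin */
    shared [*] int c[8*THREADS];  /* blocked: one contiguous chunk of 8 per thread */

    int main(void) { return 0; }  /* the declarations above are the point here */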
To accommodate data affinity, UPC defines rules for arithmetic on pointers
to shared data items and provides language-level primitives to inspect the
affinity of a shared data item.
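One such primitive is upc_threadof, which returns the thread that a shared
object has affinity with (a sketch):

    #include <upc.h>
    #include <stdio.h>

    shared int v[10*THREADS];   /* default cyclic layout */

    int main(void) {
        if (MYTHREAD == 0)
            /* With the cyclic layout, &v[i] has affinity with thread i % THREADS. */
            printf("v[3] has affinity with thread %d\n",
                   (int)upc_threadof(&v[3]));
        return 0;
    }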
For more details see the UPC language specification.
Synchronization Primitives
UPC makes no implicit assumptions about the interaction between threads.
All thread interaction is explicitly managed by the programmer through the
primitives provided by the language: locks, barriers, and memory fences.
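A minimal sketch of a lock-protected update followed by a barrier
(upc_all_lock_alloc, upc_lock, upc_unlock, upc_lock_free, and upc_barrier
are the standard primitives; sum is an illustrative variable):

    #include <upc.h>

    shared int sum;   /* has affinity with thread 0 */

    int main(void) {
        /* Collective call: every thread receives a pointer to the same lock. */
        upc_lock_t *lock = upc_all_lock_alloc();

        upc_lock(lock);     /* enter the critical section */
        sum += MYTHREAD;    /* serialized update of shared data */
        upc_unlock(lock);

        upc_barrier;        /* all updates are complete past this point */
        if (MYTHREAD == 0)
            upc_lock_free(lock);
        return 0;
    }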
For more details see the UPC language specification.
Memory Consistency Model
To define the interaction between memory accesses to shared data, UPC
provides two user-controlled consistency models. Each memory reference in a
program can be annotated as either strict or relaxed. Under the "strict"
model, the program executes under sequential consistency (Lamport): all
threads observe the strict references issued by a given thread in that
thread's program order, relative to all other accesses. Under the "relaxed"
model, only the issuing thread is guaranteed to observe its own shared
references in program order.
The language allows the user to specify the access consistency on a
per-variable basis or, at coarser granularity, on a per-statement basis.
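For example, the strict qualifier can express a classic flag/data handoff;
this sketch assumes the program runs with at least two threads, and the
variable names are illustrative:

    #include <upc_relaxed.h>   /* relaxed is the default consistency for this file */

    shared int data;
    strict shared int flag;    /* accesses to flag are sequentially consistent */

    int main(void) {
        if (MYTHREAD == 0) {
            data = 42;         /* relaxed write */
            flag = 1;          /* strict write: data is visible before flag */
        } else if (MYTHREAD == 1) {
            while (flag == 0)  /* strict read: spin until thread 0 sets the flag */
                ;
            /* data == 42 is now guaranteed to be visible here */
        }
        return 0;
    }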
For more details see the UPC language specification.
Parallel Utility Libraries
UPC provides a number of utility libraries that encapsulate functionality commonly
required for writing parallel applications, and provide standardized interfaces
to capabilities frequently found in modern HPC hardware. These include:
- Collective Operations (e.g. broadcast, scatter, gather, exchange,
reductions, and scans; see the sketch after this list)
- Atomic Memory Operations
- Blocking and Non-Blocking Data Copy
- Parallel File System I/O
- High-Performance Wall-Clock Timers
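As one example from the collectives library, a sketch of a broadcast
(upc_all_broadcast and the synchronization flags are declared in
upc_collective.h; NELEMS and the array names are illustrative):

    #include <upc.h>
    #include <upc_collective.h>

    #define NELEMS 16
    shared [NELEMS] int dst[NELEMS*THREADS];  /* one block per thread */
    shared []       int src[NELEMS];          /* single block, affinity with thread 0 */

    int main(void) {
        if (MYTHREAD == 0)
            for (int i = 0; i < NELEMS; i++)
                src[i] = i;                   /* thread 0 fills the source block */

        /* Copy src into every thread's block of dst; the flags request
           barrier-like synchronization on entry to and exit from the call. */
        upc_all_broadcast(dst, src, NELEMS * sizeof(int),
                          UPC_IN_ALLSYNC | UPC_OUT_ALLSYNC);
        return 0;
    }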
For more details, see the UPC Required Library Specification
and the UPC Optional Library Specification.