NPACI Technology-Roadmap White Paper

The Planguages: an Approach to Parallel Technical Computing

1. Overview

The Planguages are parallel programming languages which extend Fortran and C for scientific and engineering applications. The aim is to

accelerate parallel software development,
improve the readability of parallel programs, and
achieve performance with independence from platform- and network-specific details.

The model is one of explicit parallelism with distributed memory.

The Planguages provide concise mathematical expressions for accessing off-process data in explicitly parallel codes. The translators for Pfortran and PC compile the notation into Fortran77 and C along with calls to the target data-movement system, such as MPI or PVM. In this way code is portable, and many low-level, error-prone details, such as managing message-passing tags, are handled automatically by the translators. Porting Planguage codes often amounts to little more than retranslating with the Pfortran and PC translators and then compiling with the native Fortran and C compilers. Legacy codes pose little difficulty for the Planguages because parallelizing them requires only low-impact modifications.

The supersets of Fortran77 and C comprising Pfortran and PC add a small number of carefully chosen constructs, making for a slim but expressive set of functionality. The additional operators are @ and {}. For example, a=b@q assigns "b" at process "q" to "a" at each local process. The global reduction a=+{b} sums "b" across all processes, assigning the result to "a" at all processes, whereas a=MIN{b} assigns the minimum value of "b" over all processes to "a", and a=F{b} uses a user-defined function or subroutine, F, to perform the reduction. In this way, conglomerates of send and receive subroutine calls, taking MPI as an example, can be expressed concisely as a single statement that is closer to the logical intent of the program.
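The meaning of these operators can be illustrated with a small single-process simulation. This is only a sketch of the semantics described above, not translator output: the list b stands in for the per-process copies of a variable, and the names fetch_from and reduce_all, as well as the particular choice of F, are hypothetical.

```python
from functools import reduce


def fetch_from(b, q):
    """Simulate a = b@q: every process receives the value of b held by process q."""
    return [b[q] for _ in b]


def reduce_all(b, op):
    """Simulate a = op{b}: combine b across all processes; every process gets the result."""
    combined = reduce(op, b)
    return [combined for _ in b]


# b[p] is the value of variable b on simulated process p (four processes).
b = [3.0, 1.0, 4.0, 1.5]

a_fetch = fetch_from(b, 2)                                  # a = b@2
a_sum = reduce_all(b, lambda x, y: x + y)                   # a = +{b}
a_min = reduce_all(b, min)                                  # a = MIN{b}
a_user = reduce_all(b, lambda x, y: max(abs(x), abs(y)))    # a = F{b}, F is hypothetical
```

In each case the result is a list with one entry per simulated process, reflecting that the assigned variable "a" holds the same value at all processes after the statement.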

Several computational chemistry programs parallelized with Pfortran are now in production use on massively parallel computers. These are described in section 3.

2. Planguage Components

The translators for Pfortran and PC, pfc and pcc, have undergone a period of hardening during 1998. The base language definition described in the "Pfortran Reference Manual" (available from the Planguage web site) is fully implemented; the translators are robust and have been aggressively tested.

2.1. The Language and Translators

The Planguages take a minimal approach to extending the sequential languages they are based on. This has resulted in a well-thought-out set of primitives. The project continues in this vein, weighing new syntax and functionality carefully. Currently, all language features are supported in the translators.

We are currently considering language extensions in the same parsimonious spirit. We have encountered applications in image processing and in computational chemistry where a mixed memory model is useful, that is, a model supporting both non-replicated shared memory and distributed memory. (The current Planguage model is distributed memory.) At the moment we are considering syntactic avenues for expressing shared-memory data structures, while exploring the associated implementation issues.

There are some Planguage features which can benefit from data- and control-flow analysis of the program. For example, the Planguages permit the use of the locally defined variable myProc, the logical process identifier, as source and destination in off-process data access statements. At present, the translators require that the variable name "myProc" appear explicitly on the line containing the @ sign. For example,

a@myProc = b@f(myProc)

is supported, whereas the following is not:

p = myProc
a@p = b@f(myProc)

More generally, variables that are not uniformly defined across processes cannot be used in control flow that involves interprocess data movement statements (uniform variables necessarily have the same value at all processes).
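As an illustration of the supported form a@myProc = b@f(myProc), the following single-process simulation shows the intended effect under the reading that each process reads "b" from the process selected by f. It is a sketch only: the function name shifted_fetch and the ring-shift choice of f are hypothetical and not taken from the translators.

```python
def shifted_fetch(b, f):
    """Simulate a@myProc = b@f(myProc) on every process.

    b[p] is the value of b on simulated process p; f maps a process id
    to the id of the process it reads from.
    """
    return [b[f(my_proc)] for my_proc in range(len(b))]


b = [10, 11, 12, 13]
n_procs = len(b)

# Hypothetical f: each process reads from its right-hand neighbour on a ring.
a = shifted_fetch(b, lambda my_proc: (my_proc + 1) % n_procs)
```

Here a[p] is the value that process p ends up with, so a ring shift of the data is expressed by a single statement.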

The condition described above rarely limits the expression of algorithms; however, erroneous programs can arise when programmers simply overlook the uniformity requirement. On the other hand, for algorithms with a task-parallel flavor, it is sometimes useful to be able to conduct sound communication among processes executing in portions of code that are not executed by all processes (a scenario where non-uniformity can arise).
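The hazard can be made concrete with a small simulation, assuming that a statement such as a=+{b} expects every process to take part. The helper names is_uniform and participants are hypothetical; each list holds one guard value per simulated process.

```python
def is_uniform(values):
    """True when a variable has the same value on every simulated process."""
    return all(v == values[0] for v in values)


def participants(guard):
    """Ids of the processes whose guard is true, i.e. those that would
    reach a data-movement statement placed inside the guarded branch."""
    return [p for p, g in enumerate(guard) if g]


n_procs = 4

# A guard computed from a value that is the same everywhere is uniform.
uniform_guard = [step_count > 0 for step_count in [5, 5, 5, 5]]

# A guard computed from myProc differs between processes.
nonuniform_guard = [my_proc % 2 == 0 for my_proc in range(n_procs)]

reaching = participants(nonuniform_guard)
```

With the non-uniform guard only processes 0 and 2 reach the guarded statement, so a reduction placed there would be missing the contributions of the other processes.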

Code generation optimizations and more liberal placement of data-movement expressions can be accomplished with data- and control-flow analysis. We consider this the next major development phase of the translators and regard it as a stage 5, technology transfer milestone (see section 6).

In summary, the Planguages consist of a functional set of features which has been found quite adequate for expressing the parallelism in a number of production parallel codes. These have been implemented by the present translators with the caveats above regarding variable uniformity. The project is investigating additional language features which are likely to be useful in parallel code development: non-replicated shared-memory data structures are one area requiring language support; another is input and output. We are actively researching both of these.

2.2. The Runtime Libraries and Data Movement Algorithms

The Planguage translators process Planguage source code that specifies off-process data accesses and translate it into inline data-movement code. The inline code can range from the full data-exchange algorithm to a single call to a library function.

To a large extent the data-movement algorithm can be developed as a kernel performing some exchange possibly along with some combine, for example, a reduction algorithm. The algorithms currently generated are robust, general algorithms. In cases where the user would like to implement specialized exchange algorithms, the Planguage API to the runtime library can be used, but this is generally discouraged. Thus, where the Planguage primitives are not adequate to express the algorithm (unusual, but possible) it is straightforward to escape from the Planguage notation, but with a potential portability penalty.
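The exchange-plus-combine structure of such a kernel can be sketched with a simulated recursive-doubling reduction. This is one well-known pattern chosen for illustration; the source does not state which algorithms the translators actually generate. The sketch assumes a power-of-two process count and a commutative, associative combine operation, and the function name is hypothetical.

```python
def allreduce_recursive_doubling(values, combine):
    """Simulated all-process reduction for a power-of-two number of processes.

    In each round, process p exchanges its partial result with partner
    p XOR distance and then combines the two partial results.
    """
    n_procs = len(values)
    if n_procs == 0 or n_procs & (n_procs - 1):
        raise ValueError("this sketch assumes a power-of-two process count")
    partial = list(values)
    distance = 1
    while distance < n_procs:
        # Exchange step: every process reads its partner's partial result.
        received = [partial[p ^ distance] for p in range(n_procs)]
        # Combine step: fold the received value into the local one.
        partial = [combine(partial[p], received[p]) for p in range(n_procs)]
        distance *= 2
    return partial


b = [3, 1, 4, 1, 5, 9, 2, 6]
total = allreduce_recursive_doubling(b, lambda x, y: x + y)
smallest = allreduce_recursive_doubling(b, min)
```

After log2(P) rounds every simulated process holds the combined value, which corresponds to the behavior described for a=+{b} and a=MIN{b}.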

To take the runtime library into the next stage (see section 6), we are increasing the breadth of algorithms generated by the Planguages to perform collective data movement and combines.

3. Selected Applications Developed with Pfortran

The several production-level scientific applications described in this section have been developed with Pfortran. Other projects include parallel SOR and graduate-student class projects.

3.1. Quantum Classical Molecular Dynamics

Chemical reactions involving bond formation and breaking are outside the purview of classical molecular dynamics simulations. Yet, to model all atoms quantum mechanically in large, biochemical systems is computationally prohibitive. The Quantum Classical Molecular Dynamics Code addresses this problem by treating a part of the modeled system quantum mechanically and the rest using classical molecular dynamics. The principles behind the QCMD code and an overview of the parallelization strategy using Pfortran are discussed in the references cited below.

References
P. Bala, P. Grochowski, B. Lesyng, and J. A. McCammon, "Quantum-Classical Molecular Dynamics and Its Computer Implementation," Computers & Chemistry, 1995.

P. Bala, T. Clark, P. Grochowski, B. Lesyng, K. Nowinski and J. A. McCammon, "Advanced simulations and visualization of enzymatic reactions using a combined Quantum-Classical Molecular Dynamics code," in the proceedings of the Applied Parallel Computing, 4th International Workshop, PARA'98, 1998, in Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, volume 1541, edited by B. Kaagstrom, J. Dongarra, E. Elmroth and J. Waniewski, pages 409-416, Springer-Verlag Berlin.

3.2. Molecular Dynamics

EulerGROMOS, the spatial-decomposition version of the molecular dynamics program GROMOS, was developed using Pfortran. EulerGROMOS was released in Spring 1994. Since then, the program has been used in the simulation of the acetylcholinesterase dimer in water with approximately 130,000 atoms, which at the time represented the largest ever molecular dynamics simulation of a biological system. The computational chemistry work and computer science aspects of the project are summarized in the following references.

References
Terry Clark, Reinhard v. Hanxleden, J. Andrew McCammon and L. Ridgway Scott, "Parallelization using decomposition for Molecular Dynamics," Proceedings of the Scalable High-Performance Computing Conference, Knoxville, Tennessee, May, 1994, pages 95-102, published by the IEEE Computer Society Press.

Stanislaw Wlodek, Terry Clark, L. Ridgway Scott, and J. Andrew McCammon, "Molecular Dynamics of Acetylcholinesterase Dimer Complexed with Tacrine," Journal of the American Chemical Society, volume 119, pages 9513-9522, 1997.

3.3. Brownian Dynamics

Reaction kinetics of diffusing substrates with enzymes can be modeled with a combination of Brownian dynamics and electrostatics. The electrostatics typically solve Poisson's equation for the molecular assembly's charge distribution, with solvent modeled implicitly (solvent is usually water). The University of Houston Brownian Dynamics program (UHBD) takes this approach. UHBD was developed by the J. Andrew McCammon group. This program was subsequently parallelized for distributed memory computers using Pfortran, and for the Kendall Square Research KSR1 using compiler directives.

References
B. Bagheri, A. Ilin and L. R. Scott, "Parallelizing UHBD for the iPSC-860," Proceedings of the Intel Supercomputer Users' Group 1993 Annual Users' Conference, pages 295-299, St. Louis, MO, October, 1993.

B. Bagheri, A. Ilin and L. R. Scott, "A Comparison of Shared and Distributed Memory Scalable Parallel Processors: 1. KSR Shared Memory," in the Proceedings of the Scalable High-Performance Computing Conference, pages 9-16, May, 1994. Knoxville, Tennessee, published by IEEE Computer Society Press.

4. Sites where Planguage Translators are Installed

The Planguage translators are installed at

Copernicus University, Torun, Poland
Department of Computer Science, University of Chicago
High Performance Computing Center, University of Houston
SDSC, University of California, San Diego
Wright Patterson Air Force Base, Material Science Laboratory, Dayton, Ohio

5. Documentation

Several documents are available for Pfortran and PC:

The PC Reference Manual (14 pages)
The Pfortran Reference Manual (94 pages)
The Pfortran Users Guide (23 pages).

These can be found on the web site at http://www.hpc.uh.edu/planguages/ and with the Planguage distribution.

In addition, Scott, Clark and Bagheri are completing a book on parallel computing which uses the Planguage model for algorithm development. The book is going to publishers in 1999.

6. NPACI Technology Roadmap Production Stages

Given the NPACI stage definitions, it appears the Planguage project is in stage 2, and moving into stage 3.

Stage 1: See installation sites and applications above.

Stage 2: Early Deployment

This is the currently established stage.

Stage 3: Pre-production

Milestones indicative of this stage:

a. Feedback from NPACI installation and users.

b. Further tune communication algorithms for NPACI platforms. (See section 2.2.)

Stage 4: Production

Milestone indicative of this stage: further distribution.

Stage 5: Technology Transfer

Milestones to reach this stage comprise the following functionality:

a. Translators with data- and control-flow analysis.

b. Local and global variable analysis, or uniformity (section 2.1).

c. Incorporate non-replicated shared-memory (section 2.1).

d. Basic I/O support.

e. Subgroup support (section 2.1).

page updated January 22, 1999