Priority Project "POMPA"
Performance On Massively Parallel Architectures

Last updated: 29 Aug 2018

Project leader: Xavier Lapillonne (MeteoSwiss); project led by Oliver Fuhrer before 2015

Project resources

Project duration:      September 2010 – July 2018

FTEs (plan/used):      2.63/2.17    COSMO year 2016-2017
                       1.03/1.01    COSMO year 2017-2018

Introduction

  1. Description
  2. Information
  3. Motivation
  4. Actions proposed
  5. Project tasks
  6. Links to other projects or work packages
  7. Risks
  8. Participants
  9. References

Description

Efficient use of high performance computing (HPC) systems is a key to enable higher resolution, larger computational domains, more complex physical parametrizations, or more ensemble members. Future HPC systems are projected to achieve higher performance via a massive increase in processor cores accompanied by a stagnant or decreasing clock frequency. It is the goal of this project to prepare the COSMO model code for these emerging massively parallel systems.

Information

Apart from the information that can be found on these (rather static) pages, please take a look at the Wiki pages and the hpcforge.com project.

Motivation

The huge computational demands of numerical weather prediction at very high resolutions or for large ensembles can only be met by high performance computers. In spite of the ever-increasing peak performance of these systems according to Moore's Law, the underlying architecture of high performance computers has changed drastically over the past years. Due to power constraints, the clock frequency no longer increases (and sometimes even decreases). Instead, performance gains are achieved by a massive increase in the number of processing units (cores). Several such cores reside on a single chip and share part of the memory hierarchy. Several chips may reside on a node and share off-chip memory. Several nodes are clustered together into a large high performance computer.
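
To exploit such a hierarchy, codes typically combine message passing between nodes with shared-memory threading within a node. The following minimal C++ sketch illustrates this hybrid MPI + OpenMP scheme; it is an illustration only and not COSMO code (COSMO itself is written in Fortran), and the decomposition it hints at is schematic:

    #include <mpi.h>
    #include <omp.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        // One MPI rank per node (or per NUMA domain): subdomains of the
        // horizontal grid are distributed and halos exchanged via messages.
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank, nranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        // Within a rank, OpenMP threads share the on-node memory hierarchy
        // (caches, NUMA domains) and work concurrently on the subdomain.
        #pragma omp parallel
        {
            std::printf("rank %d of %d, thread %d of %d\n",
                        rank, nranks,
                        omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }

Built with, e.g., mpicxx -fopenmp, this prints one line per (rank, thread) pair; in a real model each such pair would instead work on its share of the computational grid.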

In order to efficiently utilize such a system, software must be able to cope with its massive concurrency and be aware of the complex memory hierarchy (caches, non-uniform memory access (NUMA) domains, system memory). The situation with the current COSMO code is the following:

As stated in section 10 of the COSMO Science Plan, "if this subject is not tackled within the next 3-4 years, the COSMO model will not be able to use new computer hardware efficiently in the future". In this project we propose to address these issues in order to ensure generic and portable efficiency of the COSMO model on the clustered, massively parallel high performance computers of the future. Similar efforts have already started or are planned for other model systems (HIRLAM, WRF, UM, ECMWF IFS (George et al. 1999)).

Within the High Performance and High Productivity Computing (HP2C) initiative in Switzerland, a project proposal ("Regional Climate and Weather Modelling on the Next Generations High-Performance Computers") addressing this challenge has been accepted and started in June 2010. This presents a unique opportunity to address performance on massively parallel architectures within the framework of a COSMO priority project, in order to profit from the synergy with the HP2C activities and to ensure that results flow into the operational applications of the COSMO model.

Actions proposed

Project tasks

Work has been separated into these distinct tasks:

Links to other projects or work packages

A large part of the resources for this project (e.g. Tasks 3/4/6 in part, Tasks 5/7 entirely) will be contributed by the project "Regional Climate and Weather Modeling on the Next Generations High-Performance Computers: Towards Cloud-Resolving Simulations" within the High Performance and High Productivity Computing initiative (see www.hp2c.ch/projects/cclm).

The DWD is developing a unified physics package for ICON and COSMO which will introduce blocked data structures into all physical parametrization calls. This work will need to be coordinated with PP POMPA.
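
For illustration, "blocked" here means grouping the grid columns into fixed-length chunks so that each chunk fits into cache and the innermost loop runs over contiguous memory. The C++ sketch below shows the idea; the block length NPROMA and the chosen layout are assumptions modelled on ICON-style blocking, not the actual interface of the unified physics package:

    #include <vector>
    #include <algorithm>

    constexpr int NPROMA = 32;  // block length (machine-dependent tuning parameter)
    constexpr int NLEV   = 60;  // number of vertical levels

    // One block of columns, stored so that the column index is innermost:
    // at(lev, jc) and at(lev, jc + 1) are adjacent in memory, which makes
    // the inner loop cache-friendly and vectorizable.
    struct Block {
        std::vector<double> t;  // e.g. temperature, size NLEV * NPROMA
        Block() : t(NLEV * NPROMA, 0.0) {}
        double& at(int lev, int jc) { return t[lev * NPROMA + jc]; }
    };

    // A parametrization operates on one block at a time.
    void physics_on_block(Block& b, int ncols) {
        for (int lev = 0; lev < NLEV; ++lev)
            for (int jc = 0; jc < ncols; ++jc)  // contiguous inner loop
                b.at(lev, jc) += 0.1;           // placeholder physics update
    }

    int main() {
        const int ncols_total = 1000;  // columns handled by this process
        const int nblocks = (ncols_total + NPROMA - 1) / NPROMA;
        std::vector<Block> blocks(nblocks);
        for (int ib = 0; ib < nblocks; ++ib)
            physics_on_block(blocks[ib],
                             std::min(NPROMA, ncols_total - ib * NPROMA));
        return 0;
    }

With such a layout the performance of the physics becomes largely independent of the total domain size, at the cost of an extra blocking loop around every parametrization call.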

The PP Conservative Dynamical Core (CDC) will deliver a new implementation of the dynamical core. Since both PPs work on the dynamical core, an early dialogue between the two projects has to be established.

A pre-proposal has been submitted to the G8 call "Software towards Exascale Computing for Global Scale Issues" which would encompass a more aggressive porting of COSMO to GPUs than Task 6, in collaboration with the Georgia Institute of Technology (Atlanta, USA).

Risks

The work of Tasks 5 and 6 may entail rewriting parts of the COSMO code. These tasks are a high-risk, high-potential endeavour within the HP2C initiative. It will be a challenge of this project to leverage the results of these tasks in the official COSMO code. This risk is considered substantial.

The PP CDC will deliver a recommendation for a new dynamical core for COSMO. Activities to implement this new dynamical core will start at the beginning of 2011 at the earliest. The performance bottlenecks and scaling properties of the new dynamical core might differ from those of the current one, and there is a risk that results achieved in this project will not carry over to the new dynamical core. An early and regular exchange of information between the two projects should keep this risk under control.

Participants

MeteoSwiss oliver.fuhrer@meteoswiss.ch, xavier.lapillonne@meteoswiss.ch
DWD ulrich.schaettler@dwd.de
CSCS wsawyer@cscs.ch, cordery@cscs.ch, nstring@cscs.ch, jgp@cscs.ch, thomas.schulthess@cscs.ch
CASPUR lanucara@caspur.it
USAM ferri@meteoam.it, cheloni@meteoam.it
SCS tgysi@scs.ch, men.muheim@scs.ch
C2SM anne.roches@env.ethz.ch, isabelle.bey@env.ethz.ch

References

Christen, M., O. Schenk, P. Messmer, E. Neufeld, H. Burkhart, 2009: Accelerating stencil-based computations by increased temporal locality on modern multi- and many-core architectures. Proceedings of the 2009 IPDPS Conference.
George, D. D., G. Mozdzynski, D. Salmond, 1999: Implementation and performance of OpenMP in ECMWF's IFS code. Proceedings of the Fifth European SGI/Cray MPP Workshop.
Linford, J., J. Michalakes, A. Sandu, M. Vachharajani, 2009: Multi-core acceleration of chemical kinetics for simulation and prediction. Proceedings of the 2009 ACM/IEEE Conference on Supercomputing (SC'09), ACM.
Michalakes, J., J. Dudhia, D. Gill, T. Henderson, J. Klemp, W. Skamarock, W. Wang, 2005: The Weather Research and Forecast Model: software architecture and performance. Proceedings of the Eleventh ECMWF Workshop on the Use of High Performance Computing in Meteorology, Eds. W. Zwieflhofer and G. Mozdzynski, World Scientific, 156-168.
Michalakes, J., J. Hacker, R. Loft, M. O. McCracken, A. Snavely, N. J. Wright, T. Spelce, B. Gorda, R. Walkup, 2008: WRF nature run. Journal of Physics: Conference Series, 125, 012022.
Michalakes, J., M. Vachharajani, 2008: GPU acceleration of numerical weather prediction. Proceedings of the 2008 IEEE International Parallel & Distributed Processing Symposium, 2308-2314.
Micikevicius, P., 2009: 3D finite difference computation on GPUs using CUDA. Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units, ACM International Conference Proceeding Series, Vol. 383, 79-84.
Schättler, U., E. Krenzien, 1997: The parallel 'Deutschland-Modell' – a message passing version for distributed memory computers. Parallel Computing, 23 (14), 2215-2226.
Schättler, U., G. Doms, J. Steppeler, 2000: Requirements and problems in parallel model development at DWD. Scientific Programming, 8 (1), 13-22.
Shimokawabe, T., T. Aoki, J. Ishida, C. Muroi, 2010: GPU acceleration of the meso-scale atmospheric model ASUCA. 12th International Specialist Meeting on Next Generation Models on Climate Change and Sustainability for High Performance Computing Facilities, Ibaraki, Japan.