CMIP6 Participation Guidance for Modelers

Karl E. Taylor, Paul J. Durack, Mark Elkington, Eric Guilyardi, David Hassell, Michael Lautenschlager and Martina Stockhause

Document overview:

  1. Requirements and expectations
  2. Experiment design
  3. Forcing data sets
  4. Model output fields
  5. Model output requirements
  6. Software for preparing/checking output
  7. Archiving/publishing output
  8. Documentation process
  9. CMIP6 organization and governance

1. Requirements and expectations

Those groups who plan to participate in CMIP6 should (in roughly this order, although model documentation should be provided as early as possible):

Data managers can also register errata using the ES-DOC Errata Command Line Client.

Further information about the service is available in the Errata Service Documentation.

2. Experiment design

The CMIP6 protocol and experiments are described in a special issue of Geoscientific Model Development, with an overview of the overall design and scientific strategy provided in the lead article of that issue by Eyring et al. (2016).

3. Forcing data sets

In CMIP6 all models should adopt the same forcing datasets (and boundary conditions). Experts contacted by the CMIP Panel have prepared the forcing datasets, and a new “input4MIPs” activity has been initiated by PCMDI to encourage adherence to many of the same data standards imposed on obs4MIPs data and CMIP data. These datasets are being collected into a curated archive at PCMDI. All conforming datasets can be downloaded via the Earth System Grid Federation’s input4MIPs CoG. Any dataset not yet conforming to the input4MIPs specifications can be obtained from the individual preparing the dataset, as indicated in the input4MIPs summary sheet.

The input4MIPs summary sheet separately lists the CMIP6 datasets needed for the DECK and historical simulations and the datasets needed for the CMIP6-endorsed MIP experiments. The summary provides contact information, documentation of the data, and citation requirements. Included in the collection are, for example, datasets specifying emissions and concentrations of various atmospheric species, sea surface temperatures and sea ice (for AMIP), solar variability, and land cover characteristics. The current version of the official CMIP Panel forcing dataset collection is 6.2. Users of these datasets should consult the input4MIPs summary sheet before configuring and beginning any new simulation to ensure that they are using the latest versions available.

Some of the endorsed-MIP forcing datasets are still in preparation, but should be available soon. Any changes made to a released dataset will be documented in the summary.

4. Model output fields

The CMIP6 Data Request defines the variables that should be archived for each experiment and specifies the time intervals for which they should be reported. It provides much of the variable-specific metadata that should be stored along with the data. It also provides tools for estimating the data storage requirements for CMIP6.
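As a rough illustration of the arithmetic behind a storage estimate, the sketch below computes the uncompressed size of a single variable on a regular grid. It is not the Data Request tooling; the function name, the grid dimensions, and the assumption of 4-byte values with no compression are all hypothetical choices made for this example.

```python
def estimate_bytes(n_lon, n_lat, n_levels, n_times, bytes_per_value=4):
    """Uncompressed size in bytes of one variable on a regular grid.

    Assumes every value occupies `bytes_per_value` bytes (4 for single
    precision) and ignores metadata, coordinates, and compression.
    """
    return n_lon * n_lat * n_levels * n_times * bytes_per_value


# Hypothetical case: a monthly 3-D field on a 1x1 degree grid with
# 19 vertical levels, reported for 165 years.
monthly_3d = estimate_bytes(n_lon=360, n_lat=180, n_levels=19, n_times=165 * 12)
print(f"{monthly_3d / 1e9:.2f} GB")
```

Summing such estimates over every requested variable and experiment gives a first-order upper bound; actual archive volumes depend on compression and on the grids each model uses.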

Additional information about the data request is available at https://cmip6dr.github.io/Data_Request_Home

5. Model output requirements

CMIP6 model output requirements are similar to those in CMIP5, but changes have been made to accommodate the more complex structure of CMIP6 and its data request. Some changes will make it easier for users to find the data they need and will enable new services to be established providing, for example, model and experiment documentation and citation information.

As in CMIP5, all CMIP6 output will be stored in netCDF files with one variable stored per file. The requested output fields can be determined as described above, and as in CMIP5, the data must be “cmorized” (i.e., written in conformance with all the CMIP standards). The CMIP standards build on the CF-conventions, which define metadata that provide a description of the variables and their spatial and temporal properties. This facilitates analysis of the data by users who can read and interpret data from all models in the same way.

As described in section 6, it is recommended, but not required, that the CMOR software library be used to rewrite model output in conformance with the standards. In any case, to ensure that a critical subset of the requirements has been met, a CMIP data checker (“PrePARE”) will be applied before data are placed in the CMIP6 data archive.

The CMIP6 data requirements are defined and discussed in the following documents:

Additional metadata requirements are imposed on a variable by variable basis as specified in the CMIP6 Data Request. Many of these are recognized by CMOR (through input via the CMIP6 CMOR Tables), which will ensure compliance.

Note that in the above, controlled vocabularies (CVs) play a key role in ensuring uniformity in the description of data sets across all models. For all but variable-specific information, reference CVs are being maintained by PCMDI, and all quality assurance checks will be performed against them. These CVs will be relied on in constructing file names and directory structures, and they will enable faceted searches of the CMIP6 archive as called for in the search requirements document. Additional, variable-specific CVs are part of the CMIP6 Data Request. The CVs are structured in a way that makes clear the relationships between certain items appearing in separate CVs. For example, the CV for model names (“source_id”) indicates which institutions are authorized to run each model, and the complete list of institutions is recorded in a CV for “institution_id”.
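The relationship between the “source_id” and “institution_id” CVs can be sketched as a simple cross-check. The example below is a minimal illustration only: the dictionary layout, the function name, and all model and institution names are hypothetical placeholders, not entries or structures taken from the reference CVs maintained by PCMDI.

```python
# Hypothetical excerpts of two controlled vocabularies. The names are
# placeholders and do not come from the reference CVs.
INSTITUTION_ID_CV = {
    "INST-A": "Example Institute A",
    "INST-B": "Example Institute B",
}
SOURCE_ID_CV = {
    "MODEL-1": {"institution_id": ["INST-A"]},
    "MODEL-2": {"institution_id": ["INST-A", "INST-B"]},
}


def check_source_and_institution(source_id, institution_id):
    """Return a list of problems; an empty list means the pair is consistent."""
    problems = []
    if institution_id not in INSTITUTION_ID_CV:
        problems.append(f"unknown institution_id: {institution_id}")
    entry = SOURCE_ID_CV.get(source_id)
    if entry is None:
        problems.append(f"unknown source_id: {source_id}")
    elif institution_id not in entry["institution_id"]:
        problems.append(f"{institution_id} is not authorized to run {source_id}")
    return problems


print(check_source_and_institution("MODEL-1", "INST-A"))
print(check_source_and_institution("MODEL-1", "INST-B"))
```

Returning a list of problems rather than raising on the first one lets a checker report every inconsistency in a file at once.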

As indicated in the guidance specifications for output grids, weights should be provided to regrid all output to a few standard grids (e.g., 1x1 degree). All regridding information (weights, lats, lons, etc.) should be stored consistent with a standard format approved by the WIP. Specifications for the required standard format will be forthcoming.
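Until the standard format is specified, the general idea of regridding with precomputed weights can be illustrated as a sparse weighted sum: each destination cell is a weighted combination of source cells. The sketch below assumes a hypothetical representation of the weights as (destination index, source index, weight) triplets over flattened grids; it does not reflect the format that the WIP will approve.

```python
def apply_weights(source_values, weights, n_dest):
    """Regrid a flattened field using sparse weights.

    `weights` is an iterable of (dest_index, source_index, weight)
    triplets; each destination value is the weighted sum of the
    source values that contribute to it.
    """
    dest = [0.0] * n_dest
    for dest_index, source_index, weight in weights:
        dest[dest_index] += weight * source_values[source_index]
    return dest


# Hypothetical example: four source cells averaged pairwise into two
# destination cells.
source = [1.0, 3.0, 10.0, 20.0]
weights = [(0, 0, 0.5), (0, 1, 0.5), (1, 2, 0.5), (1, 3, 0.5)]
coarse = apply_weights(source, weights, n_dest=2)
print(coarse)
```

Because the weights depend only on the two grids, they can be computed once and reused for every variable and time step reported on the same native grid.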

CMIP6 output requirements that are critical for successful ingestion and access via ESGF will be enforced when publication of the data is initiated. The success of CMIP6 depends on making sure that even the requirements that cannot be checked by ESGF are met. This is the responsibility of anyone preparing model output for CMIP6. A minimum set of requirements for publication of CMIP6 data will be met if a dataset passes the checks performed by the PrePARE software package described in the next section.

6. Software for preparing/checking output

To facilitate the production of model output files that meet the CMIP6 technical standards, a software library called “CMOR” (Climate Model Output Rewriter) has been developed, and version 3 (CMOR3) is now available; consult the installation instructions before installing it. This package was first used in CMIP3 and has been generalized and improved for each new CMIP phase. Use of CMOR is not mandatory, but past experience suggests that many common errors in model output files can be avoided by its use.

For those not using CMOR, some checks for compliance with CMIP specifications can be performed using a new code developed in support of CMIP6: the Pre-Publication Attribute Reviewer for ESGF (PrePARE). For information about tests performed by PrePARE, view the design requirements. PrePARE is included as part of the CMOR software suite and all files produced by CMOR are effectively checked by PrePARE, but PrePARE can be invoked without using CMOR to write the output.

In addition to PrePARE, tests for file compliance with the CF-conventions can be made using a tool called the CF-checker. Both PrePARE and the CF-checker will be run as part of the ESGF publication job stream, and only files passing all tests will be published and made available for download.

Note that if data are written using CMOR, additional checks will be performed that will, for example:

Additional codes useful in preparing model output for CMIP6 include:

7. Archiving/publishing output

The Earth System Grid Federation (ESGF) will facilitate the global distribution of CMIP6 output.

For CMIP6, the original copies of data will be available through the data nodes, many of which will be installed and maintained by the modeling centers themselves. Certain ESGF data nodes (known as “Tier 1 nodes”) will serve as the primary access points to the data. A searchable record of the model output (the access method and metadata) will be “published” to these nodes, and replicas of the data will also be hosted on them.

As part of “publication”, certain conformance checks are performed, metadata are recorded in a catalog where they can be accessed by the other data nodes, and versioning is managed. The data provider (modeling center) will need to coordinate closely with the ESGF data manager(s) of a specific ESGF data node site. Here is a summary of the main steps and requirements in the procedure:

8. Documentation process

Given the wide variety of users and the need for traceability, the CMIP6 results will be fully documented and made accessible via the ES-DOC viewer and comparator interface (https://search.es-doc.org). Each CMIP6 model output file will include a global attribute called “further_info_url”, which will link to a signpost web page providing simulation/ensemble information, model configuration details, current contact details, data citation details, etc. Specifically, ES-DOC will include documentation of:
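A data provider may want to confirm that the “further_info_url” attribute described above is present and well formed before publication. The sketch below is a minimal illustration: it assumes the global attributes have already been read into a plain dictionary, the function name and example URL are hypothetical, and it checks only that the value is an http(s) URL with a host, not that it follows any CMIP6-specific pattern or resolves to a page.

```python
from urllib.parse import urlparse


def has_valid_further_info_url(global_attributes):
    """Return True if `further_info_url` is present and looks like a URL.

    `global_attributes` is assumed to be a dict of a file's global
    attributes, however they were read from the netCDF file.
    """
    url = global_attributes.get("further_info_url", "")
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Hypothetical attribute sets; the URL is a placeholder.
print(has_valid_further_info_url({"further_info_url": "https://example.org/some/page"}))
print(has_valid_further_info_url({"title": "no link recorded"}))
```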

9. CMIP6 organization and governance

The CMIP Panel, which is a standing subcommittee of the WCRP’s Working Group on Coupled Modelling (WGCM), provides overall guidance and oversight of CMIP activities. Notably, it determines which MIPs will participate in each phase of CMIP using the established selection criteria listed in Table 1 of Eyring et al. (2016). On its webpages the CMIP Panel provides additional information that may be of interest to CMIP6 participants, but only the CMIP6 Guide (this document) provides definitive documentation of CMIP6 technical requirements.

The endorsed MIPs are managed by independent committees, but acceptance of endorsement obligates them to follow CMIP’s technical requirements. Thus across all MIPs, the modeling groups can prepare their model output following a common procedure.

The CMIP Panel has delegated responsibility for most of the technical requirements of CMIP to the WGCM Infrastructure Panel (WIP). The mission, rationale and Terms of Reference for the panel can be found here. The WIP has drafted a number of position papers summarizing CMIP6 requirements and specifications. Among these are the CMIP6 reference specifications for global attributes, filenames, directory structure and Data Reference Syntax (DRS). The WIP has also set up a CMIP Data Node Operations Team (CDNOT) to interface with data node managers responsible for serving CMIP6 data. This team provides a direct link from the panels establishing data node requirements to those implementing the requirements. Section 7 provides further information about data node operational requirements.

Information is under preparation describing the governance of the following:

Document version: 19 October 2022