Document overview:
Those groups who plan to participate in CMIP6 should (in roughly this order, although model documentation should be provided as early as possible):
Indicate your intention to participate by registering your institution and model following the instructions on the WCRP-CMIP GitHub site. You will not be able to publish your model output (on ESGF) without first registering your institution and model. (To do this, anyone without a GitHub account will have to create one.) The currently registered institutions are listed in a “json” file and can be displayed in table form; the currently registered models are likewise listed in a “json” file and can be displayed in a table.
Request an account and then register contact information for the person(s) responsible for entering and maintaining CMIP6 model output citation information in the citation GUI (Documentation of GUI). This data reference information should be provided before the data are published in the ESGF. Data references that are generated during the publication step will be used by web-based services being developed and maintained at DKRZ to ensure that data produced by your center are properly cited. Data users will be able to access citation information by 1) following the URL stored as a global attribute (further_info_url) in each netCDF file, or 2) following links to each dataset displayed by the ESGF search service.
To request an account, provide the following to Martina Stockhause (stockhause@dkrz.de):
As an example of information that will be recoverable through the citation service consider the input4MIPs data set which has been recorded at the citation service at https://doi.org/10.22033/ESGF/input4MIPs.2204.
If you are not yet included in the CMIP6-MODELGROUPS-SCI mail list, register your scientific contact with CMIP Panel Chair, Veronika Eyring (veronika.eyring@dlr.de)
Indicate your intention to participate in “endorsed MIPs” by signing up for the endorsed-MIP mailing lists of interest (click on each MIP of interest in the list) and also registering the information in the activity_participation field of your source_id (see first bullet above)
Perform required DECK, historical, and selected endorsed-MIP experiments, using the required, standard forcing datasets
Save all requested model output
Provide all required model documentation, including forcing information and a description of ensemble variants
Prepare and make available model output according to CMIP6 specifications (see sections 5, 6, and 7 below)
Correct published data when errors are discovered. This should be performed using the ES-DOC Errata Service. When an error is discovered, an ESGF data manager can use the webforms to clearly and concisely document the issue. Through the PID integration, this errata service will include all the datasets/files affected when documentation is completed correctly.
Data managers can also register errata using the ES-DOC Errata Command Line Client if they wish to do so.
Further information about the service is available in the Errata Service Documentation.
The CMIP6 protocol and experiments are described in a special issue of Geoscientific Model Development with an overview of the overall design and scientific strategy provided in the lead article of that issue by Eyring et al. (2016)
Each model participating in CMIP6 must contribute results from the four DECK experiments (piControl, AMIP, abrupt4xCO2, and 1pctCO2) and the CMIP6 historical simulation. See Eyring et al. (2016), where the experiment protocol is documented. These experiments are considered to define the ongoing (slowly evolving) “CMIP Activity” and are directly overseen by the CMIP Panel.
In addition to the DECK and historical simulations, each modeling group may choose to contribute to any CMIP6 endorsed MIPs of interest, but for each MIP component, results must be provided from the full subset of “tier 1” experiments. See the GMD Special CMIP6 Issue for descriptions of each MIP and its experiment specifications. Each endorsed MIP is managed by an independent committee. The MIPs are identified as separate “CMIP6 Activities”, but their coordination and their endorsement as part of CMIP6 is the responsibility of the CMIP Panel. The process by which MIP activities become endorsed is described here and the criteria for endorsement are listed in Table 1 of Eyring et al. (2016). The official names of the currently endorsed CMIP6 MIPs are recorded in a “json” file
When called for by the experiment protocol, standard forcing data sets should be used. Any deviation from the standard forcing must be clearly documented.
Further documentation about CMIP6 experiments will be available from ES-DOC, and the reference controlled vocabularies used to define and identify these experiments are available in a “json” file and can be displayed in table form
In CMIP6 all models should adopt the same forcing datasets (and boundary conditions). Experts contacted by the CMIP Panel have prepared the forcing datasets, and a new “input4MIPs” activity has been initiated by PCMDI to encourage adherence to many of the same data standards imposed on obs4MIPs data and CMIP data. These datasets are being collected into a curated archive at PCMDI. All conforming datasets can be downloaded via the Earth System Grid Federation’s input4MIPs CoG. Any dataset not yet conforming to the input4MIPs specifications can be obtained from the individual preparing the dataset, as indicated in the input4MIPs summary sheet.
The input4MIPs summary sheet separately lists the CMIP6 datasets needed for the DECK and historical simulations and the datasets needed for the CMIP6-endorsed MIP experiments. The summary provides contact information, documentation of the data, and citation requirements. Included in the collection are, for example, datasets specifying emissions and concentrations of various atmospheric species, sea surface temperatures and sea ice (for AMIP), solar variability, and land cover characteristics. The current version of the official CMIP Panel forcing dataset collection is 6.2. Users of these datasets should consult the input4MIPs summary sheet before configuring and beginning any new simulation to ensure that they are using the latest versions available.
Some of the endorsed-MIP forcing datasets are still in preparation, but should be available soon. Any changes made to a released dataset will be documented in the summary.
The CMIP6 Data Request defines the variables that should be archived for each experiment and specifies the time intervals for which they should be reported. It provides much of the variable-specific metadata that should be stored along with the data. It also provides tools for estimating the data storage requirements for CMIP6.
Additional information about the data request is available at https://cmip6dr.github.io/Data_Request_Home
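As a rough illustration of the arithmetic behind a storage estimate, the following sketch computes the uncompressed size of a single output variable. It is not the Data Request's own tooling: the function name, the example grid dimensions, and the 4-byte word size are all assumptions chosen for illustration.

```python
def estimate_volume_bytes(n_lon, n_lat, n_levels, n_times, bytes_per_value=4):
    """Uncompressed size of one variable.

    Size = horizontal grid points x vertical levels x time samples x word size.
    All arguments are hypothetical inputs, not values from the Data Request.
    """
    return n_lon * n_lat * n_levels * n_times * bytes_per_value


# Hypothetical case: a monthly-mean 3-D field on a 1x1 degree grid,
# 19 vertical levels, 165 simulated years (12 samples per year).
monthly_3d = estimate_volume_bytes(360, 180, 19, 165 * 12)
print(f"{monthly_3d / 1e9:.2f} GB")  # prints "9.75 GB"
```

Summing such estimates over all requested variables and experiments gives an upper bound; netCDF compression will usually reduce the actual volume.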
CMIP6 model output requirements are similar to those in CMIP5, but changes have been made to accommodate the more complex structure of CMIP6 and its data request. Some changes will make it easier for users to find the data they need and will enable new services to be established providing, for example, model and experiment documentation and citation information.
As in CMIP5, all CMIP6 output will be stored in netCDF files with one variable stored per file. The requested output fields can be determined as described above, and as in CMIP5, the data must be “cmorized” (i.e., written in conformance with all the CMIP standards). The CMIP standards build on the CF-conventions, which define metadata that provide a description of the variables and their spatial and temporal properties. This facilitates analysis of the data by users who can read and interpret data from all models in the same way.
As described in section 6, it is recommended, but not required, that the CMOR software library be used to rewrite model output in conformance with the standards. In any case, to ensure that a critical subset of the requirements has been met, a CMIP data checker (“PrePARE”) will be applied before data are placed in the CMIP6 data archive.
The CMIP6 data requirements are defined and discussed in the following documents:
Additional metadata requirements are imposed on a variable-by-variable basis as specified in the CMIP6 Data Request. Many of these are recognized by CMOR (through input via the CMIP6 CMOR Tables), which will ensure compliance.
Note that in the above, controlled vocabularies (CVs) play a key role in ensuring uniformity in the description of data sets across all models. For all but variable-specific information, reference CVs are being maintained by PCMDI, and all quality assurance checks will be performed against them. These CVs will be relied on in constructing file names and directory structures, and they will enable faceted searches of the CMIP6 archive as called for in the search requirements document. Additional, variable-specific CVs are part of the CMIP6 Data Request. The CVs are structured in a way that makes clear the relationships between certain items appearing in separate CVs. For example, the CV for model names (“source_id”) indicates which institutions are authorized to run each model, and the complete list of institutions is recorded in a CV for “institution_id”.
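The cross-reference between the “source_id” and “institution_id” CVs can be sketched as a simple consistency check. This is a minimal illustration only: the dictionary layout is an assumption about how such CVs might be loaded into memory, the model and institution names are invented, and the function check_registration is hypothetical rather than part of CMOR or PrePARE.

```python
# Hypothetical in-memory CVs; the entries are invented for illustration
# and do not correspond to real registered models or institutions.
SOURCE_ID_CV = {
    "MODEL-A": {"institution_id": ["INST-1"]},
    "MODEL-B": {"institution_id": ["INST-1", "INST-2"]},
}
INSTITUTION_ID_CV = {
    "INST-1": "First Example Institute",
    "INST-2": "Second Example Institute",
}


def check_registration(source_id, institution_id):
    """Return a list of problems; an empty list means the pair is consistent."""
    problems = []
    if institution_id not in INSTITUTION_ID_CV:
        problems.append(f"unknown institution_id: {institution_id}")
    entry = SOURCE_ID_CV.get(source_id)
    if entry is None:
        problems.append(f"unknown source_id: {source_id}")
    elif institution_id not in entry["institution_id"]:
        problems.append(f"{institution_id} is not authorized to run {source_id}")
    return problems


print(check_registration("MODEL-B", "INST-2"))  # consistent pair
print(check_registration("MODEL-A", "INST-2"))  # registered, but not authorized
```

Returning a list of problems rather than raising on the first failure lets a data provider see every inconsistency in one pass.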
As indicated in the guidance specifications for output grids, weights should be provided to regrid all output to a few standard grids (e.g., 1x1 degree). All regridding information (weights, lats, lons, etc.) should be stored consistent with a standard format approved by the WIP. Specifications for the required standard format will be forthcoming.
CMIP6 output requirements that are critical for successful ingestion and access via ESGF will be enforced when publication of the data is initiated. The success of CMIP6 depends on making sure that even the requirements that cannot be checked by ESGF are met. This is the responsibility of anyone preparing model output for CMIP6. A minimum set of requirements for publication of CMIP6 data will be met if a dataset passes the checks performed by the PrePARE software package described in the next section.
To facilitate the production of model output files that meet the CMIP6 technical standards, a software library called “CMOR” (Climate Model Output Rewriter) has been developed and version 3 (CMOR3) is now available at this site, but read the installation instructions available here. This package was first used in CMIP3 and has been generalized and improved for each new CMIP phase. Use of CMOR is not mandatory, but past experience suggests that many common errors in model output files can be avoided by its use.
For those not using CMOR, some checks for compliance with CMIP specifications can be performed using a new code developed in support of CMIP6: the Pre-Publication Attribute Reviewer for ESGF (PrePARE). For information about tests performed by PrePARE, view the design requirements. PrePARE is included as part of the CMOR software suite and all files produced by CMOR are effectively checked by PrePARE, but PrePARE can be invoked without using CMOR to write the output.
In addition to PrePARE, tests for file compliance with the CF-conventions can be made using a tool called the CF-checker. Both PrePARE and the CF-checker will be run as part of the ESGF publication job stream, and only files passing all tests will be published and made available for download.
Note that if data are written using CMOR, additional checks will be performed that will, for example:
Additional codes useful in preparing model output for CMIP6 include:
Code to calculate nominal_resolution: For the common case of a regular spherical coordinate (latitude x longitude) global grid, the nominal_resolution can be calculated using a formula given in Appendix 2 of the CMIP6 netCDF global attributes document. For other grids, the nominal_resolution can be calculated with the following code:
conda install -c pcmdi nominal_resolution
The Earth System Grid Federation (ESGF) will facilitate the global distribution of CMIP6 output.
For CMIP6, the original copies of data will be available through the data nodes, many of which will be installed and maintained by the modeling centers themselves. Certain ESGF data nodes (known as “Tier 1 nodes”) will serve as the primary access points to the data. A searchable record of model output (the access method and metadata) will be “published” to these nodes, and replicas of the data will also be hosted on them.
As part of “publication”, certain conformance checks are performed, metadata are recorded in a catalog where they can be accessed by the other data nodes, and versioning is managed. The data provider (modeling center) will need to closely coordinate and cooperate with the ESGF data manager(s) of a specific ESGF data node site. Here is a summary of the main steps and requirements in the procedure:
Given the wide variety of users and the need for traceability, the CMIP6 results will be fully documented and made accessible via the ES-DOC viewer and comparator interface (https://search.es-doc.org). Each CMIP6 model output file will include a global attribute called “further_info_url” which will link to a signpost web page which will provide simulation/ensemble information, model configuration details, current contact details, data citation details etc. Specifically, ES-DOC will include documentation of:
The CMIP Panel, which is a standing subcommittee of the WCRP’s Working Group on Coupled Modelling (WGCM), provides overall guidance and oversight of CMIP activities. Notably, it determines which MIPs will participate in each phase of CMIP using the established selection criteria listed in Table 1 of Eyring et al. (2016). On its webpages the CMIP Panel provides additional information that may be of interest to CMIP6 participants, but only the CMIP6 Guide (this document) provides definitive documentation of CMIP6 technical requirements.
The endorsed MIPs are managed by independent committees, but acceptance of endorsement obligates them to follow CMIP’s technical requirements. Thus across all MIPs, the modeling groups can prepare their model output following a common procedure.
The CMIP Panel has delegated responsibility for most of the technical requirements of CMIP to the WGCM Infrastructure Panel (WIP). The mission, rationale and Terms of Reference for the panel can be found here. The WIP has drafted a number of position papers summarizing CMIP6 requirements and specifications. Among these are the CMIP6 reference specifications for global attributes, filenames, directory structure and Data Reference Syntax (DRS). The WIP has also set up a CMIP Data Node Operations Team (CDNOT) to interface with data node managers responsible for serving CMIP6 data. This team provides a direct link from the panels establishing data node requirements to those implementing the requirements. Section 7 provides further information about data node operational requirements.
Information is under preparation describing the governance of the following: