Parallel Coordinate Plot: Mean Climate, comparing CMIP5 & CMIP6 models
Generate a static image of Parallel coordinate plot for PMP mean climate metrics obtained from CMIP5 and CMIP6 models, and compare multi-model averaged statistics from each group.
Written by Jiwoo Lee (LLNL/PCMDI)
Last update: September 2022
1. Read data from JSON files
Input data for parallel coordinate plot is expected as a set a (stacked or list of) 2-d numpy array(s) with list of strings for x and y axes labels.
1.1 Download PMP output JSON files for CMIP models
[1]:
import glob
import os
import numpy as np
import requests
import pandas as pd
from pcmdi_metrics.graphics import download_archived_results
PMP output files downloadable from the PMP results archive.
[2]:
vars = ['pr', 'prw', 'psl', 'rlds', 'rltcre', 'rlus', 'rlut', 'rlutcs', 'rsds', 'rsdscs', 'rsdt', 'rstcre', 'rsut', 'rsutcs', 'sfcWind',
'ta-200', 'ta-850', 'tas', 'tauu', 'ts', 'ua-200', 'ua-850', 'va-200', 'va-850', 'zg-500']
[3]:
json_dir = './json_files'
[4]:
mip = "cmip5"
exp = "historical"
data_version = "v20200429"
[5]:
for var in vars:
path = "metrics_results/mean_climate/"+mip+"/"+exp+"/"+data_version+"/"+var+"."+mip+"."+exp+".regrid2.2p5x2p5."+data_version+".json"
download_archived_results(path, json_dir)
metrics_results/mean_climate/cmip5/historical/v20200429/prw.cmip5.historical.regrid2.2p5x2p5.v20200429.json not exist in https://github.com/PCMDI/pcmdi_metrics_results_archive/tree/main/
metrics_results/mean_climate/cmip5/historical/v20200429/rsdscs.cmip5.historical.regrid2.2p5x2p5.v20200429.json not exist in https://github.com/PCMDI/pcmdi_metrics_results_archive/tree/main/
metrics_results/mean_climate/cmip5/historical/v20200429/ts.cmip5.historical.regrid2.2p5x2p5.v20200429.json not exist in https://github.com/PCMDI/pcmdi_metrics_results_archive/tree/main/
[6]:
mip = "cmip6"
exp = "historical"
data_version = "v20210811"
[7]:
for var in vars:
path = "metrics_results/mean_climate/"+mip+"/"+exp+"/"+data_version+"/"+var+"."+mip+"."+exp+".regrid2.2p5x2p5."+data_version+".json"
download_archived_results(path, json_dir)
Check JSON files
[8]:
json_list_1 = sorted(glob.glob(os.path.join(json_dir, '*.cmip5.' + exp + '*' + '.json')))
json_list_2 = sorted(glob.glob(os.path.join(json_dir, '*.cmip6.' + exp + '*' + '.json')))
[9]:
print('CMIP5 JSON files:')
for i, json_file in enumerate(json_list_1):
print(i+1, json_file.split('/')[-1])
print('CMIP6 JSON files:')
for i, json_file in enumerate(json_list_2):
print(i+1, json_file.split('/')[-1])
CMIP5 JSON files:
1 pr.cmip5.historical.regrid2.2p5x2p5.v20200429.json
2 psl.cmip5.historical.regrid2.2p5x2p5.v20200429.json
3 rlds.cmip5.historical.regrid2.2p5x2p5.v20200429.json
4 rltcre.cmip5.historical.regrid2.2p5x2p5.v20200429.json
5 rlus.cmip5.historical.regrid2.2p5x2p5.v20200429.json
6 rlut.cmip5.historical.regrid2.2p5x2p5.v20200429.json
7 rlutcs.cmip5.historical.regrid2.2p5x2p5.v20200429.json
8 rsds.cmip5.historical.regrid2.2p5x2p5.v20200429.json
9 rsdt.cmip5.historical.regrid2.2p5x2p5.v20200429.json
10 rstcre.cmip5.historical.regrid2.2p5x2p5.v20200429.json
11 rsut.cmip5.historical.regrid2.2p5x2p5.v20200429.json
12 rsutcs.cmip5.historical.regrid2.2p5x2p5.v20200429.json
13 sfcWind.cmip5.historical.regrid2.2p5x2p5.v20200429.json
14 ta-200.cmip5.historical.regrid2.2p5x2p5.v20200429.json
15 ta-850.cmip5.historical.regrid2.2p5x2p5.v20200429.json
16 tas.cmip5.historical.regrid2.2p5x2p5.v20200429.json
17 tauu.cmip5.historical.regrid2.2p5x2p5.v20200429.json
18 ua-200.cmip5.historical.regrid2.2p5x2p5.v20200429.json
19 ua-850.cmip5.historical.regrid2.2p5x2p5.v20200429.json
20 va-200.cmip5.historical.regrid2.2p5x2p5.v20200429.json
21 va-850.cmip5.historical.regrid2.2p5x2p5.v20200429.json
22 zg-500.cmip5.historical.regrid2.2p5x2p5.v20200429.json
CMIP6 JSON files:
1 pr.cmip6.historical.regrid2.2p5x2p5.v20210811.json
2 prw.cmip6.historical.regrid2.2p5x2p5.v20210811.json
3 psl.cmip6.historical.regrid2.2p5x2p5.v20210811.json
4 rlds.cmip6.historical.regrid2.2p5x2p5.v20210811.json
5 rltcre.cmip6.historical.regrid2.2p5x2p5.v20210811.json
6 rlus.cmip6.historical.regrid2.2p5x2p5.v20210811.json
7 rlut.cmip6.historical.regrid2.2p5x2p5.v20210811.json
8 rlutcs.cmip6.historical.regrid2.2p5x2p5.v20210811.json
9 rsds.cmip6.historical.regrid2.2p5x2p5.v20210811.json
10 rsdscs.cmip6.historical.regrid2.2p5x2p5.v20210811.json
11 rsdt.cmip6.historical.regrid2.2p5x2p5.v20210811.json
12 rstcre.cmip6.historical.regrid2.2p5x2p5.v20210811.json
13 rsut.cmip6.historical.regrid2.2p5x2p5.v20210811.json
14 rsutcs.cmip6.historical.regrid2.2p5x2p5.v20210811.json
15 sfcWind.cmip6.historical.regrid2.2p5x2p5.v20210811.json
16 ta-200.cmip6.historical.regrid2.2p5x2p5.v20210811.json
17 ta-850.cmip6.historical.regrid2.2p5x2p5.v20210811.json
18 tas.cmip6.historical.regrid2.2p5x2p5.v20210811.json
19 tauu.cmip6.historical.regrid2.2p5x2p5.v20210811.json
20 ts.cmip6.historical.regrid2.2p5x2p5.v20210811.json
21 ua-200.cmip6.historical.regrid2.2p5x2p5.v20210811.json
22 ua-850.cmip6.historical.regrid2.2p5x2p5.v20210811.json
23 va-200.cmip6.historical.regrid2.2p5x2p5.v20210811.json
24 va-850.cmip6.historical.regrid2.2p5x2p5.v20210811.json
25 zg-500.cmip6.historical.regrid2.2p5x2p5.v20210811.json
1.2 Extract data from JSON files
Use Metrics
class (that use read_mean_clim_json_files
function underneath) to extract data from the above JSON files.
Parameters
json_list
: list of string, where each element is for path/file for PMP output JSON files
Returned object includes
df_dict
: dictionary that has[stat][season][region]
hierarchy structure storing pandas dataframe for metric numbers (Rows: models, Columns: variables (i.e., 2d array)var_list
: list of string, all variables from JSON filesvar_unit_list
: list of string, all variables and its units from JSON filesvar_ref_dict
: dictonary for reference dataset used for each variableregions
: list of string, regionsstats
: list of string, statistics
[10]:
from pcmdi_metrics.graphics import Metrics
[11]:
library_cmip5 = Metrics(json_list_1, mip="cmip5")
[12]:
library_cmip6 = Metrics(json_list_2, mip="cmip6")
[13]:
season = 'ann'
stat = 'rms_xyt'
region = 'global'
Select only certain models (optional)
[14]:
selected_models = ['ACCESS', 'BCC', 'CESM', 'Can', 'FGOALS', 'FIO-ESM', 'GFDL', 'IPSL', 'MIROC', 'MPI', 'MRI', 'NorESM']
def selected_models_only(df, selected_models):
# Selected models only
model_names = df['model'].tolist()
for model_name in model_names:
drop_model = True
for keyword in selected_models:
if keyword in model_name:
drop_model = False
break
if drop_model:
df.drop(df.loc[df['model']==model_name].index, inplace=True)
df.reset_index(drop=True, inplace=True)
return df
# Selected models only
library_cmip5.df_dict[stat][season][region] = selected_models_only(library_cmip5.df_dict[stat][season][region], selected_models)
library_cmip6.df_dict[stat][season][region] = selected_models_only(library_cmip6.df_dict[stat][season][region], selected_models)
Merge data
[15]:
# merge dataframes
combined = library_cmip5.merge(library_cmip6)
Add rows of multi-model average statistics for each CMIP group
[16]:
# mean value of statistics from multi models in each CMIP
combined.df_dict[stat][season][region].loc['CMIP5 mean'] = library_cmip5.df_dict[stat][season][region].mean(numeric_only=True, skipna=True)
combined.df_dict[stat][season][region].loc['CMIP6 mean'] = library_cmip6.df_dict[stat][season][region].mean(numeric_only=True, skipna=True)
combined.df_dict[stat][season][region].at['CMIP5 mean', 'model'] = 'CMIP5 mean'
combined.df_dict[stat][season][region].at['CMIP6 mean', 'model'] = 'CMIP6 mean'
Customize variables to show
[17]:
var_list = sorted(combined.var_list)
# temporary
var_list.remove('sfcWind')
var_list.remove('ta-850')
var_list.remove('ua-850')
var_list.remove('ua-200')
var_list.remove('va-850')
var_list.remove('va-200')
var_list.remove('tauu')
print('var_list:', var_list)
var_list: ['pr', 'prw', 'psl', 'rlds', 'rltcre', 'rlus', 'rlut', 'rlutcs', 'rsds', 'rsdscs', 'rsdt', 'rstcre', 'rsut', 'rsutcs', 'ta-200', 'tas', 'ts', 'zg-500']
Getting ready for plotting…
[18]:
data = combined.df_dict[stat][season][region][var_list].to_numpy()
model_names = combined.df_dict[stat][season][region]['model'].tolist()
metric_names = var_list
models_to_highlight = ['CMIP5 mean', 'CMIP6 mean']
print('data.shape:', data.shape)
print('len(metric_names): ', len(metric_names))
print('len(model_names): ', len(model_names))
data.shape: (46, 18)
len(metric_names): 18
len(model_names): 46
Add unit info
[19]:
units_all = 'prw [kg m-2], pr [mm d-1], psl [Pa], rlds [W m-2], rsdscs [W m-2], rltcre [W m-2], rlus [W m-2], rlut [W m-2], rlutcs [W m-2], rsds [W m-2], rsdt [W m-2], rstcre [W m-2], rsus [W m-2], rsut [W m-2], rsutcs [W m-2], sfcWind [m s-1], zg-500 [m], ta-200 [K], ta-850 [K], tas [K], ts [K], ua-200 [m s-1], ua-850 [m s-1], uas [m s-1], va-200 [m s-1], va-850 [m s-1], vas [m s-1], tauu [Pa]'
units_all.split(', ')
var_unit_list = []
for var in var_list:
found = False
for var_units in units_all.split(', '):
tmp1 = var_units.split(' [')[0]
#print(var, tmp1)
if tmp1 == var:
unit = '[' + var_units.split(' [')[1]
var_unit_list.append(var + '\n' + unit)
found = True
break
if found is False:
print(var, 'not found')
print('var_list:', var_list)
print('var_unit_list:', var_unit_list)
metric_names = var_unit_list
var_list: ['pr', 'prw', 'psl', 'rlds', 'rltcre', 'rlus', 'rlut', 'rlutcs', 'rsds', 'rsdscs', 'rsdt', 'rstcre', 'rsut', 'rsutcs', 'ta-200', 'tas', 'ts', 'zg-500']
var_unit_list: ['pr\n[mm d-1]', 'prw\n[kg m-2]', 'psl\n[Pa]', 'rlds\n[W m-2]', 'rltcre\n[W m-2]', 'rlus\n[W m-2]', 'rlut\n[W m-2]', 'rlutcs\n[W m-2]', 'rsds\n[W m-2]', 'rsdscs\n[W m-2]', 'rsdt\n[W m-2]', 'rstcre\n[W m-2]', 'rsut\n[W m-2]', 'rsutcs\n[W m-2]', 'ta-200\n[K]', 'tas\n[K]', 'ts\n[K]', 'zg-500\n[m]']
2. Plot
[20]:
from pcmdi_metrics.graphics import parallel_coordinate_plot
Parameters
data
: 2-d numpy array for metricsmetric_names
: list, names of metrics for individual vertical axes (axis=1)model_names
: list, name of models for markers/lines (axis=0)models_to_highlight
: list, default=None, List of models to highlight as linesfig
:matplotlib.figure
instance to which the parallel coordinate plot is plotted. If not provided, use current axes or create a new one. Optional.ax
:matplotlib.axes.Axes
instance to which the parallel coordinate plot is plotted. If not provided, use current axes or create a new one. Optional.figsize
: tuple (two numbers), default=(15,5), image sizeshow_boxplot
: bool, default=False, show box and wiskers plotshow_violin
: bool, default=False, show violin plotviolin_colors
: tuple or list containing two strings for colors of violin. Default=(“lightgrey”, “pink”)title
: string, default=None, plot titleidentify_all_models
: bool, default=True. Show and identify all models using markersxtick_labelsize
: number, fontsize for x-axis tick labels (optional)ytick_labelsize
: number, fontsize for x-axis tick labels (optional)colormap
: string, default=’viridis’, matplotlib colormapnum_color
: integer, default=20, how many color to use.legend_off
: bool, default=False, turn off legendlogo_rect
: sequence of float. The dimensions [left, bottom, width, height] of the new Axes. All quantities are in fractions of figure width and height. Optional.logo_off
: bool, default=False, turn off PMP logomodel_names2
: list of string, should be a subset ofmodel_names
. If given, violin plot will be split into 2 groups. Optional.group1_name
: string, needed for violin plot legend if splited to two groups, for the 1st group. Default is ‘group1’.group2_name
: string, needed for violin plot legend if splited to two groups, for the 2nd group. Default is ‘group2’.comparing_models
: tuple or list containing two strings for models to compare with colors filled between the two lines.fill_between_lines
: bool, default=False, fill color between lines for models in comparing_modelsfill_between_lines_colors
: tuple or list containing two strings for colors filled between the two lines. Default=(‘green’, ‘red’)
Return
fig
: matplotlib component for figureax
: matplotlib component for axis
[21]:
cmip6_models = library_cmip6.df_dict[stat][season][region]['model'].tolist() + ['CMIP6 mean']
cmip6_models
[21]:
['ACCESS-CM2',
'ACCESS-ESM1-5',
'BCC-CSM2-MR',
'BCC-ESM1',
'CESM2',
'CESM2-FV2',
'CESM2-WACCM',
'CESM2-WACCM-FV2',
'CanESM5',
'FGOALS-f3-L',
'FGOALS-g3',
'FIO-ESM-2-0',
'GFDL-CM4',
'GFDL-ESM4',
'IPSL-CM6A-LR',
'MIROC6',
'MPI-ESM-1-2-HAM',
'MPI-ESM1-2-HR',
'MPI-ESM1-2-LR',
'MRI-ESM2-0',
'NorESM2-MM',
'CMIP6 mean']
[22]:
title = 'Mean Climate: Spatio-temporal performance (' + stat.upper() + '), ' + season.upper() + ', ' + region.title()
fig, ax = parallel_coordinate_plot(data, metric_names, model_names, models_to_highlight,
title=title,
figsize=(21, 7),
colormap='tab20',
xtick_labelsize=10,
logo_rect=[0.8, 0.8, 0.15, 0.15],
show_violin=True,
model_names2=cmip6_models,
group1_name='CMIP5 models',
group2_name='CMIP6 models',
show_boxplot=False,
)
# Add Watermark
ax.text(0.5, 0.5, 'Example', transform=ax.transAxes,
fontsize=100, color='black', alpha=0.2,
ha='center', va='center', rotation=25)
Passed a quick QC
Models in the second group: ['ACCESS-CM2', 'ACCESS-ESM1-5', 'BCC-CSM2-MR', 'BCC-ESM1', 'CESM2', 'CESM2-FV2', 'CESM2-WACCM', 'CESM2-WACCM-FV2', 'CanESM5', 'FGOALS-f3-L', 'FGOALS-g3', 'FIO-ESM-2-0', 'GFDL-CM4', 'GFDL-ESM4', 'IPSL-CM6A-LR', 'MIROC6', 'MPI-ESM-1-2-HAM', 'MPI-ESM1-2-HR', 'MPI-ESM1-2-LR', 'MRI-ESM2-0', 'NorESM2-MM', 'CMIP6 mean']
data.shape: (46, 18)
data.shape: (46, 18)
[22]:
Text(0.5, 0.5, 'Example')
[23]:
# Save figure as an image file
fig.savefig('mean_clim_parallel_coordinate_plot_cmip56.png', facecolor='w', bbox_inches='tight')
[24]:
title = 'Mean Climate: Spatio-temporal performance (' + stat.upper() + '), ' + season.upper() + ', ' + region.title()
fig, ax = parallel_coordinate_plot(data, metric_names, model_names, models_to_highlight,
title=title,
figsize=(21, 7),
colormap='tab20',
xtick_labelsize=10,
logo_rect=[0.8, 0.8, 0.15, 0.15],
show_violin=False,
model_names2=cmip6_models,
group1_name='CMIP5 models',
group2_name='CMIP6 models',
show_boxplot=False,
comparing_models=('CMIP5 mean', 'CMIP6 mean'),
fill_between_lines=True,
fill_between_lines_colors=('lightgreen', 'pink')
)
# Add Watermark
ax.text(0.5, 0.5, 'Example', transform=ax.transAxes,
fontsize=100, color='black', alpha=0.2,
ha='center', va='center', rotation=25)
Passed a quick QC
Models in the second group: ['ACCESS-CM2', 'ACCESS-ESM1-5', 'BCC-CSM2-MR', 'BCC-ESM1', 'CESM2', 'CESM2-FV2', 'CESM2-WACCM', 'CESM2-WACCM-FV2', 'CanESM5', 'FGOALS-f3-L', 'FGOALS-g3', 'FIO-ESM-2-0', 'GFDL-CM4', 'GFDL-ESM4', 'IPSL-CM6A-LR', 'MIROC6', 'MPI-ESM-1-2-HAM', 'MPI-ESM1-2-HR', 'MPI-ESM1-2-LR', 'MRI-ESM2-0', 'NorESM2-MM', 'CMIP6 mean']
data.shape: (46, 18)
data.shape: (46, 18)
[24]:
Text(0.5, 0.5, 'Example')