Parallel Coordinate Plot: Mean Climate

  • Generate a static image of Parallel coordinate plot using PMP, for mean climate metrics.

  • Author: Jiwoo Lee (2021-07)

  • Last update: 2024-03

1. Read data from JSON files

Input data for parallel coordinate plot is expected as a set a (stacked or list of) 2-d numpy array(s) with list of strings for x and y axes labels.

1.1 Provide PMP output JSON files

[1]:
import glob
import os
import numpy as np
import requests

PMP output files downloadable from the PMP results archive.

[2]:
vars = ['pr', 'prw', 'psl', 'rlds', 'rltcre', 'rlus', 'rlut', 'rlutcs', 'rsds', 'rsdscs', 'rsdt', 'rstcre', 'rsut', 'rsutcs', 'sfcWind',
        'ta-200', 'ta-850', 'tas', 'tauu', 'ts', 'ua-200', 'ua-850', 'va-200', 'va-850', 'zg-500']

mip = "cmip6"
exp = "historical"
data_version = "v20230823"
json_dir = './json_files'

os.makedirs(json_dir, exist_ok=True)

for var in vars:
    url = "https://raw.githubusercontent.com/PCMDI/pcmdi_metrics_results_archive/main/" + \
          "metrics_results/mean_climate/"+mip+"/"+exp+"/"+data_version+"/"+var+"."+mip+"."+exp+".regrid2.2p5x2p5."+data_version+".json"
    r = requests.get(url, allow_redirects=True)
    filename = os.path.join(json_dir, url.split('/')[-1])
    with open(filename, 'wb') as file:
        file.write(r.content)
    print('Download completed:', filename)
Download completed: ./json_files/pr.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/prw.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/psl.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/rlds.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/rltcre.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/rlus.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/rlut.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/rlutcs.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/rsds.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/rsdscs.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/rsdt.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/rstcre.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/rsut.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/rsutcs.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/sfcWind.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/ta-200.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/ta-850.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/tas.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/tauu.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/ts.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/ua-200.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/ua-850.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/va-200.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/va-850.cmip6.historical.regrid2.2p5x2p5.v20230823.json
Download completed: ./json_files/zg-500.cmip6.historical.regrid2.2p5x2p5.v20230823.json

Uncompress PMP output archive file

Check JSON files

[3]:
json_list = sorted(glob.glob(os.path.join(json_dir, '*' + mip + '*' + data_version + '.json')))
for json_file in json_list:
    print(json_file.split('/')[-1])
pr.cmip6.historical.regrid2.2p5x2p5.v20230823.json
prw.cmip6.historical.regrid2.2p5x2p5.v20230823.json
psl.cmip6.historical.regrid2.2p5x2p5.v20230823.json
rlds.cmip6.historical.regrid2.2p5x2p5.v20230823.json
rltcre.cmip6.historical.regrid2.2p5x2p5.v20230823.json
rlus.cmip6.historical.regrid2.2p5x2p5.v20230823.json
rlut.cmip6.historical.regrid2.2p5x2p5.v20230823.json
rlutcs.cmip6.historical.regrid2.2p5x2p5.v20230823.json
rsds.cmip6.historical.regrid2.2p5x2p5.v20230823.json
rsdscs.cmip6.historical.regrid2.2p5x2p5.v20230823.json
rsdt.cmip6.historical.regrid2.2p5x2p5.v20230823.json
rstcre.cmip6.historical.regrid2.2p5x2p5.v20230823.json
rsut.cmip6.historical.regrid2.2p5x2p5.v20230823.json
rsutcs.cmip6.historical.regrid2.2p5x2p5.v20230823.json
sfcWind.cmip6.historical.regrid2.2p5x2p5.v20230823.json
ta-200.cmip6.historical.regrid2.2p5x2p5.v20230823.json
ta-850.cmip6.historical.regrid2.2p5x2p5.v20230823.json
tas.cmip6.historical.regrid2.2p5x2p5.v20230823.json
tauu.cmip6.historical.regrid2.2p5x2p5.v20230823.json
ts.cmip6.historical.regrid2.2p5x2p5.v20230823.json
ua-200.cmip6.historical.regrid2.2p5x2p5.v20230823.json
ua-850.cmip6.historical.regrid2.2p5x2p5.v20230823.json
va-200.cmip6.historical.regrid2.2p5x2p5.v20230823.json
va-850.cmip6.historical.regrid2.2p5x2p5.v20230823.json
zg-500.cmip6.historical.regrid2.2p5x2p5.v20230823.json

1.2 Extract data from JSON files

Use Metrics class (that use read_mean_clim_json_files function underneath) to extract data from the above JSON files.

Parameters

  • json_list: list of string, where each element is for path/file for PMP output JSON files

Returned object includes

  • df_dict: dictionary that has [stat][season][region] hierarchy structure storing pandas dataframe for metric numbers (Rows: models, Columns: variables (i.e., 2d array)

  • var_list: list of string, all variables from JSON files

  • var_unit_list: list of string, all variables and its units from JSON files

  • var_ref_dict: dictonary for reference dataset used for each variable

  • regions: list of string, regions

  • stats: list of string, statistics

[4]:
from pcmdi_metrics.graphics import Metrics
[5]:
library = Metrics(json_list)
[6]:
df_dict = library.df_dict
var_list = library.var_list
var_unit_list = library.var_unit_list
regions = library.regions
stats = library.stats
[7]:
print('var_list:', var_list)
print('var_unit_list:', var_unit_list)
print("len(var_list:", len(var_list))
print('regions:', regions)
print('stats:', stats)
var_list: ['pr', 'prw', 'psl', 'rlds', 'rltcre', 'rlus', 'rlut', 'rlutcs', 'rsds', 'rsdscs', 'rsdt', 'rstcre', 'rsut', 'rsutcs', 'sfcWind', 'ta-200', 'ta-850', 'tas', 'tauu', 'ts', 'ua-200', 'ua-850', 'va-200', 'va-850', 'zg-500']
var_unit_list: ['pr [kg m-2 s-1]', 'prw [kg m-2]', 'psl [Pa]', 'rlds [W m-2]', 'rltcre [W m-2]', 'rlus [W m-2]', 'rlut [W m-2]', 'rlutcs [W m-2]', 'rsds [W m-2]', 'rsdscs [W m-2]', 'rsdt [W m-2]', 'rstcre [W m-2]', 'rsut [W m-2]', 'rsutcs [W m-2]', 'sfcWind [m s-1]', 'ta-200 [K]', 'ta-850 [K]', 'tas [K]', 'tauu [Pa]', 'ts [K]', 'ua-200 [m s-1]', 'ua-850 [m s-1]', 'va-200 [m s-1]', 'va-850 [m s-1]', 'zg-500 [m]']
len(var_list: 25
regions: ['SHEX_ocean', 'NHEX_land', 'ocean', 'NHEX', 'TROPICS', 'ocean_SHEX', 'land_NHEX', 'SHEX', 'SHEX_land', 'ocean_50S50N', 'global', 'land', 'NHEX_ocean']
stats: ['bias_xy', 'cor_xy', 'mae_xy', 'mean-obs_xy', 'mean_xy', 'rms_devzm', 'rms_xy', 'rms_xyt', 'rms_y', 'rmsc_xy', 'std-obs_xy', 'std-obs_xy_devzm', 'std-obs_xyt', 'std_xy', 'std_xy_devzm', 'std_xyt']
[8]:
df_dict['rms_xyt']['ann']['global']
[8]:
model run model_run pr prw psl rlds rltcre rlus rlut ... ta-200 ta-850 tas tauu ts ua-200 ua-850 va-200 va-850 zg-500
0 ACCESS-CM2 r1i1p1f1 ACCESS-CM2_r1i1p1f1 1.949 126.951 267.209 13.464 9.509 10.893 12.794 ... 2.803 1.616 2.237 0.038 2.368 4.969 1.557 2.097 0.961 26.936
1 ACCESS-ESM1-5 r1i1p1f1 ACCESS-ESM1-5_r1i1p1f1 1.911 127.011 263.105 10.953 8.150 10.449 12.022 ... 2.383 1.294 1.931 0.035 2.049 4.467 1.624 2.151 0.989 27.823
2 AWI-CM-1-1-MR r1i1p1f1 AWI-CM-1-1-MR_r1i1p1f1 1.758 127.072 223.345 11.130 8.484 8.574 9.879 ... 2.048 1.182 1.425 0.029 1.567 3.282 1.422 1.986 0.888 21.041
3 AWI-ESM-1-1-LR r1i1p1f1 AWI-ESM-1-1-LR_r1i1p1f1 2.021 127.125 249.276 14.488 9.693 12.189 12.794 ... 3.767 2.116 1.915 0.034 2.220 4.433 1.825 2.295 1.061 NaN
4 BCC-CSM2-MR r1i1p1f1 BCC-CSM2-MR_r1i1p1f1 1.863 127.155 316.509 13.636 8.158 11.373 11.124 ... 2.457 1.818 2.520 0.037 2.340 4.853 1.974 2.215 1.146 30.647
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
60 NorESM2-MM r1i1p1f1 NorESM2-MM_r1i1p1f1 1.204 127.056 223.097 11.446 6.701 9.952 8.217 ... NaN NaN 1.759 0.030 1.967 3.538 1.467 NaN 0.931 21.945
61 SAM0-UNICON r1i1p1f1 SAM0-UNICON_r1i1p1f1 1.612 126.982 226.741 13.173 9.219 10.983 11.228 ... 3.753 1.444 2.368 0.035 2.512 3.689 1.729 2.170 0.952 24.147
62 TaiESM1 r1i1p1f1 TaiESM1_r1i1p1f1 1.562 126.969 233.954 11.386 8.617 9.574 9.703 ... NaN NaN 2.027 0.040 2.207 NaN 1.618 NaN 0.895 NaN
63 UKESM1-0-LL r1i1p1f2 UKESM1-0-LL_r1i1p1f2 1.749 126.845 256.131 13.240 8.769 10.714 10.896 ... NaN NaN 2.237 0.033 2.346 3.757 1.404 1.905 0.882 NaN
64 UKESM1-1-LL r1i1p1f2 UKESM1-1-LL_r1i1p1f2 1.770 126.928 256.194 10.833 8.865 9.740 10.636 ... 2.333 1.145 1.766 0.033 2.064 3.909 1.407 1.920 0.886 NaN

65 rows × 28 columns

[9]:
# Simple re-order variables
if 'zg-500' in var_list and 'sfcWind' in var_list:
    var_list.remove('zg-500')
    idx_sfcWind = var_list.index('sfcWind')
    var_list.insert(idx_sfcWind+1, 'zg-500')

if 'ta-850' in var_list:
    var_list.remove('ta-850')

print("var_list:", var_list)
print("len(var_list:", len(var_list))
var_list: ['pr', 'prw', 'psl', 'rlds', 'rltcre', 'rlus', 'rlut', 'rlutcs', 'rsds', 'rsdscs', 'rsdt', 'rstcre', 'rsut', 'rsutcs', 'sfcWind', 'zg-500', 'ta-200', 'tas', 'tauu', 'ts', 'ua-200', 'ua-850', 'va-200', 'va-850']
len(var_list: 24
[10]:
data = df_dict['rms_xyt']['ann']['global'][var_list].to_numpy()
model_names = df_dict['rms_xyt']['ann']['global']['model'].tolist()
#metric_names = ['\n['.join(var_unit.split(' [')) for var_unit in var_unit_list]
metric_names = var_list
models_to_highlight = ['E3SM-1-0', 'E3SM-1-1', 'E3SM-1-1-ECA', 'E3SM-2-0']
print('data.shape:', data.shape)
print('len(metric_names): ', len(metric_names))
print('len(model_names): ', len(model_names))
data.shape: (65, 24)
len(metric_names):  24
len(model_names):  65
[11]:
units_all = 'prw [kg m-2], pr [mm d-1], psl [Pa], rlds [W m-2], rsdscs [W m-2], rltcre [W m-2], rlus [W m-2], rlut [W m-2], rlutcs [W m-2], rsds [W m-2], rsdt [W m-2], rstcre [W m-2], rsus [W m-2], rsut [W m-2], rsutcs [W m-2], sfcWind [m s-1], zg-500 [m], ta-200 [K], ta-850 [K], tas [K], ts [K], ua-200 [m s-1], ua-850 [m s-1], uas [m s-1], va-200 [m s-1], va-850 [m s-1], vas [m s-1], tauu [Pa]'
units_all.split(', ')
var_unit_list = []

for var in var_list:
    found = False
    for var_units in units_all.split(', '):
        tmp1 = var_units.split(' [')[0]
        #print(var, tmp1)
        if tmp1 == var:
            unit = '[' + var_units.split(' [')[1]
            var_unit_list.append(var + '\n' + unit)
            found = True
            break
    if found is False:
        print(var, 'not found')

print('var_unit_list:', var_unit_list)

metric_names = var_unit_list
var_unit_list: ['pr\n[mm d-1]', 'prw\n[kg m-2]', 'psl\n[Pa]', 'rlds\n[W m-2]', 'rltcre\n[W m-2]', 'rlus\n[W m-2]', 'rlut\n[W m-2]', 'rlutcs\n[W m-2]', 'rsds\n[W m-2]', 'rsdscs\n[W m-2]', 'rsdt\n[W m-2]', 'rstcre\n[W m-2]', 'rsut\n[W m-2]', 'rsutcs\n[W m-2]', 'sfcWind\n[m s-1]', 'zg-500\n[m]', 'ta-200\n[K]', 'tas\n[K]', 'tauu\n[Pa]', 'ts\n[K]', 'ua-200\n[m s-1]', 'ua-850\n[m s-1]', 'va-200\n[m s-1]', 'va-850\n[m s-1]']
[12]:
df_dict['rms_xyt']['ann']['global'][var_list].columns
[12]:
Index(['pr', 'prw', 'psl', 'rlds', 'rltcre', 'rlus', 'rlut', 'rlutcs', 'rsds',
       'rsdscs', 'rsdt', 'rstcre', 'rsut', 'rsutcs', 'sfcWind', 'zg-500',
       'ta-200', 'tas', 'tauu', 'ts', 'ua-200', 'ua-850', 'va-200', 'va-850'],
      dtype='object')

2. Plot

[13]:
from pcmdi_metrics.graphics import parallel_coordinate_plot

Parameters

  • data: 2-d numpy array for metrics

  • metric_names: list, names of metrics for individual vertical axes (axis=1)

  • model_names: list, name of models for markers/lines (axis=0)

  • models_to_highlight: list, default=None, List of models to highlight as lines or marker

  • models_to_highlight_by_line: bool, default=True, highlight as lines. If False, as marker

  • models_to_highlight_colors: list, default=None, List of colors for models to highlight as lines

  • models_to_highlight_labels: list, default=None, List of string labels for models to highlight as lines

  • models_to_highlight_markers: list, matplotlib markers for models to highlight if as marker

  • models_to_highlight_markers_size: float, size of matplotlib markers for models to highlight if as marker

  • fig: matplotlib.figure instance to which the parallel coordinate plot is plotted. If not provided, use current axes or create a new one. Optional.

  • ax: matplotlib.axes.Axes instance to which the parallel coordinate plot is plotted. If not provided, use current axes or create a new one. Optional.

  • figsize: tuple (two numbers), default=(15,5), image size

  • show_boxplot: bool, default=False, show box and wiskers plot

  • show_violin: bool, default=False, show violin plot

  • violin_colors: tuple or list containing two strings for colors of violin. Default=(“lightgrey”, “pink”)

  • violin_label: string to label the violin plot, when violin plot is not splited. Default is None.

  • title: string, default=None, plot title

  • identify_all_models: bool, default=True. Show and identify all models using markers

  • xtick_labelsize: number, fontsize for x-axis tick labels (optional)

  • ytick_labelsize: number, fontsize for x-axis tick labels (optional)

  • colormap: string, default=’viridis’, matplotlib colormap

  • num_color: integer, default=20, how many color to use.

  • legend_off: bool, default=False, turn off legend

  • legend_ncol: integer, default=6, number of columns for legend text

  • legend_bbox_to_anchor: tuple, defulat=(0.5, -0.14), set legend box location

  • legend_loc: string, default=”upper center”, set legend box location

  • legend_fontsize: float, default=8, legend font size

  • logo_rect: sequence of float. The dimensions [left, bottom, width, height] of the new Axes. All quantities are in fractions of figure width and height. Optional.

  • logo_off: bool, default=False, turn off PMP logo

  • model_names2: list of string, should be a subset of model_names. If given, violin plot will be split into 2 groups. Optional.

  • group1_name: string, needed for violin plot legend if splited to two groups, for the 1st group. Default is ‘group1’.

  • group2_name: string, needed for violin plot legend if splited to two groups, for the 2nd group. Default is ‘group2’.

  • comparing_models: tuple or list containing two strings for models to compare with colors filled between the two lines.

  • fill_between_lines: bool, default=False, fill color between lines for models in comparing_models

  • fill_between_lines_colors: tuple or list containing two strings of colors for filled between the two lines. Default=(‘red’, ‘green’)

  • arrow_between_lines: bool, default=False, place arrows between two lines for models in comparing_models

  • arrow_between_lines_colors: tuple or list containing two strings of colors for arrow between the two lines. Default=(‘red’, ‘green’)

  • arrow_alpha: float, default=1, transparency of arrow (faction between 0 to 1)

  • vertical_center: string (“median”, “mean”)/float/integer, default=None, adjust range of vertical axis to set center of vertical axis as median, mean, or given number

  • vertical_center_line: bool, default=False, show median as line

  • vertical_center_line_label: str, default=None, label in legend for the horizontal vertical center line. If not given, it will be automatically assigned. It can be turned off by “off”

  • ymax: int or float, default=None, specify value of vertical axis top

  • ymin: int or float, default=None, specify value of vertical axis bottom

Return

  • fig: matplotlib component for figure

  • ax: matplotlib component for axis

[14]:
fig, ax = parallel_coordinate_plot(data, metric_names, model_names, models_to_highlight=models_to_highlight,
                                   title='Mean Climate: RMS_XYT, ANN, Global',
                                   figsize=(21, 7),
                                   colormap='tab20',
                                   xtick_labelsize=10,
                                   logo_rect=[0.8, 0.8, 0.15, 0.15])

fig.text(0.99, -0.45, 'Data version\n'+data_version, transform=ax.transAxes,
         fontsize=12, color='black', alpha=0.6, ha='right', va='bottom',)

# Save figure as an image file
fig.savefig('mean_clim_parallel_coordinate_plot_'+data_version+'.png', facecolor='w', bbox_inches='tight')

# Add Watermark
ax.text(0.5, 0.5, 'Example', transform=ax.transAxes,
        fontsize=100, color='black', alpha=0.2,
        ha='center', va='center', rotation=25)

# Save figure as an image file
fig.savefig('mean_clim_parallel_coordinate_plot_example.png', facecolor='w', bbox_inches='tight')
Passed a quick QC
data.shape: (65, 24)
data.shape: (65, 24)
../_images/examples_parallel_coordinate_plot_mean_clim_20_1.png