Classification Algorithm for Empirical Orthogonal Functions of Arctic Atmospheric Variability
Classifies model Empirical Orthogonal Functions (EOFs 2–4 of JFM sea-level pressure) as East Atlantic (EA) or Scandinavian (SCA) patterns using three independent methods and a consensus score.
The original code was developed at Los Alamos National Laboratory (LANL) as a contribution to PMP, the PCMDI Metrics Package (Lawrence Livermore National Laboratory; LLNL).
Contributions
Original developers
Martin Velez Pardo (Los Alamos National Laboratory, EES-14)
Alexandra Jonko (Los Alamos National Laboratory, EES-17)
PMP Implementation
Jiwoo Lee (Lawrence Livermore National Laboratory)
Kristin Chang (Lawrence Livermore National Laboratory)
Overview
The script compares each model’s EOF2, EOF3, and EOF4 against observational ground-truth control patterns (EA and SCA) derived from reanalysis data. Three methods independently classify each EOF, and a consensus layer aggregates their results.
Methods
1. Subspace projection (shift-tolerant)
Projects the EA and SCA control patterns onto the model’s EOF subspace (span{EOF2, EOF3, EOF4}) via weighted least squares, allowing for small phase shifts in latitude and longitude. For each EOF, the relative coefficient dominance across both fits determines whether it more closely represents EA or SCA.
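The weighted least-squares step described above can be sketched as follows. This is a minimal illustration on synthetic fields, not the PMP implementation: the cos(latitude) weights, the helper name `project_onto_subspace`, and the squared-coefficient "dominance" measure are assumptions made for the example, and the latitude/longitude shift search is omitted.

```python
import numpy as np

def project_onto_subspace(control, eofs, lat):
    """Weighted least-squares fit of `control` (lat x lon) by a list of EOFs.

    Returns one coefficient per EOF and the fraction of the weighted
    (uncentred) variance of the control captured by the fit.
    """
    # Assumed area weights: cos(latitude), broadcast across longitude.
    w = np.clip(np.cos(np.deg2rad(lat)), 0.0, None)[:, None] * np.ones_like(control)
    sw = np.sqrt(w).ravel()
    # Columns of A are the weighted, flattened EOF patterns.
    A = np.stack([e.ravel() * sw for e in eofs], axis=1)
    b = control.ravel() * sw
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    explained = 1.0 - np.sum((b - A @ coeffs) ** 2) / np.sum(b ** 2)
    return coeffs, explained

# Synthetic demonstration: three random "EOFs" and a control built from them.
lat = np.linspace(20.0, 90.0, 15)
lon = np.linspace(0.0, 357.5, 24)
rng = np.random.default_rng(0)
eofs = [rng.standard_normal((lat.size, lon.size)) for _ in range(3)]
ea_control = 0.9 * eofs[1] + 0.1 * eofs[2]  # mostly the second pattern

coeffs, explained = project_onto_subspace(ea_control, eofs, lat)
# Illustrative dominance: share of each EOF in the squared coefficients.
dominance = coeffs**2 / np.sum(coeffs**2)
```

In this toy case the second pattern dominates the fit, which is the kind of signal the method uses to decide which EOF best represents a control.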
2. Correlation + geographic tests
Scores each EOF on a 100-point scale combining four diagnostics: weighted pattern correlation with each control (0–40 points), correlation with the orthogonalised component unique to each pattern (0–40 points), and two geographic tests based on regional pressure anomalies over Greenland, Ireland, and Scandinavia (0–10 points each).
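The two correlation diagnostics can be illustrated with a short sketch. It assumes cos(latitude) weights and a weighted Gram-Schmidt step for the "unique" component; the function names are hypothetical, and the conversion of correlations into points is deliberately left out because the source does not specify it.

```python
import numpy as np

def weighted_corr(a, b, w):
    """Centred pattern correlation of two fields under weights w."""
    w = w / w.sum()
    a_c = a - np.sum(w * a)
    b_c = b - np.sum(w * b)
    cov = np.sum(w * a_c * b_c)
    return cov / np.sqrt(np.sum(w * a_c**2) * np.sum(w * b_c**2))

def unique_component(target, other, w):
    """Part of `target` orthogonal to `other` under the weighted inner product."""
    beta = np.sum(w * target * other) / np.sum(w * other * other)
    return target - beta * other

# Synthetic fields standing in for the EA/SCA controls and one model EOF.
lat = np.linspace(20.0, 87.5, 10)
w = np.cos(np.deg2rad(lat))[:, None] * np.ones((lat.size, 12))
rng = np.random.default_rng(1)
ea, sca = rng.standard_normal((2, lat.size, 12))
eof = -(ea + 0.2 * rng.standard_normal(ea.shape))  # sign-flipped, noisy EA

ea_unique = unique_component(ea, sca, w)
# EOF signs are arbitrary, so compare absolute correlations.
r_ea = abs(weighted_corr(eof, ea, w))
r_sca = abs(weighted_corr(eof, sca, w))
r_ea_unique = abs(weighted_corr(eof, ea_unique, w))
```

Correlating against `ea_unique` rather than `ea` removes the part of the EA control that it shares with SCA, so an EOF that resembles both controls equally is not rewarded twice.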
3. K-means clustering
Clusters all EOFs across models into three groups using area-weighted, sign-oriented, row-normalised features. Clusters are labelled EA, SCA, or OTHER by correlation with the controls. Each EOF receives a soft membership score indicating its affinity to each cluster type. Pre-computed cluster centers are saved to a JSON file so that subsequent runs classify new models without needing the full ensemble.
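As a rough sketch of how saved centers could be applied to a new model, the example below loads centers from JSON, row-normalises the features, and turns distances into soft memberships. The JSON layout, the softmax form of the membership score, and the `temperature` parameter are all assumptions made for illustration; the source does not state how the script computes membership or structures its centers file.

```python
import json
import numpy as np

def row_normalise(X):
    """Scale each feature row to unit Euclidean norm."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def soft_membership(X, centers, temperature=0.1):
    """Softmax over negative squared distances to each center (assumed form)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Hypothetical centers payload: {"labels": [...], "centers": [[...], ...]}
saved = json.dumps({"labels": ["EA", "SCA", "OTHER"],
                    "centers": np.eye(3).tolist()})
payload = json.loads(saved)
centers = np.asarray(payload["centers"])
labels = payload["labels"]

# Two toy feature rows standing in for flattened, weighted EOF patterns.
X = row_normalise(np.array([[0.9, 0.1, 0.0], [0.1, 0.2, 0.9]]))
membership = soft_membership(X, centers)
assigned = [labels[i] for i in membership.argmax(axis=1)]
```

Applying stored centers this way needs only the new model's features, which is why a saved file lets later runs skip retraining on the full ensemble.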
Consensus
For each EOF, the three methods each produce a label (EA or SCA) and a confidence score in [0, 1]. The consensus layer combines these into a single classification:
Label: simple majority vote (2/3 or 3/3 agree). If no majority exists, the label is UNCLASSIFIED.
Confidence: (fraction of methods agreeing) × (mean confidence of agreeing methods).
Quality: derived from the consensus confidence:
| Quality | Confidence range | Warning? |
|---|---|---|
| robust | ≥ 0.7 | no |
| medium | ≥ 0.5 and < 0.7 | no |
| weak | ≥ 0.3 and < 0.5 | WARNING |
| very weak | < 0.3 | WARNING |
Because model EOFs may span a subspace that mixes the reference patterns, multiple EOFs can receive the same label when their projections are not cleanly separable.
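The consensus rules above can be written out in a few lines. This is a sketch of the stated rules rather than the package's own code: the function names, the rounding to three decimals, and the confidence of 0.0 assigned to UNCLASSIFIED results are assumptions.

```python
from collections import Counter

def quality(conf):
    """Map a consensus confidence to the quality levels listed above."""
    if conf >= 0.7:
        return "robust"
    if conf >= 0.5:
        return "medium"
    if conf >= 0.3:
        return "weak"
    return "very weak"

def consensus(results):
    """Combine per-method results given as {method: (label, confidence)}."""
    votes = Counter(label for label, _ in results.values())
    label, count = votes.most_common(1)[0]
    if 2 * count <= len(results):  # no strict majority
        # Assumption: an unclassified EOF carries zero confidence.
        return {"label": "UNCLASSIFIED", "confidence": 0.0, "quality": quality(0.0)}
    agreeing = [c for lab, c in results.values() if lab == label]
    # (fraction of methods agreeing) x (mean confidence of agreeing methods)
    conf = round((count / len(results)) * (sum(agreeing) / len(agreeing)), 3)
    return {"label": label, "confidence": conf, "quality": quality(conf)}

# All three methods agree, but one of them only weakly.
unanimous = consensus({
    "subspace": ("EA", 0.797),
    "correlation": ("EA", 0.588),
    "kmeans": ("EA", 0.162),
})

# Two methods agree; the dissenting method's confidence is ignored.
split = consensus({
    "subspace": ("EA", 0.9),
    "correlation": ("EA", 0.8),
    "kmeans": ("SCA", 0.99),
})
```

Note that a 2/3 vote caps the consensus confidence at about 0.667, so a split decision can never be rated robust under these thresholds.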
Inputs
The script requires three types of input files, all in netCDF format:
Model EOF files: one file per EOF mode (2, 3, 4) per model, each containing a 2-D (lat × lon) spatial pattern. File paths are discovered via glob patterns set in ``EOF_GLOBS``.
Control patterns: EA and SCA reference patterns from a reanalysis (e.g., 20CR-V2). Paths are set in ``EA_CTRL_FILE`` and ``SCA_CTRL_FILE``.
K-means centers file (JSON, optional on first run): pre-computed cluster centers. If the file does not exist on first run, the script trains from the full ensemble and saves the centers for future use. Training requires scikit-learn (https://scikit-learn.org/) to be installed in your environment; see the scikit-learn website for installation instructions.
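To make the glob-based discovery concrete, here is a small sketch that groups files by model and EOF mode. The helper `discover_eofs` is hypothetical, and it assumes file names follow the layout of the sample data (mode first, model name in the fifth underscore-separated field); the format the script itself expects for its glob patterns is not shown in this document.

```python
import glob
import os
import re
import tempfile
from collections import defaultdict

def discover_eofs(pattern):
    """Group files matching `pattern` as {model: {mode: path}}."""
    found = defaultdict(dict)
    for path in sorted(glob.glob(pattern)):
        parts = os.path.basename(path).split("_")
        match = re.fullmatch(r"EOF([234])", parts[0])
        if match is None or len(parts) < 5:
            continue  # skip files that do not follow the assumed naming
        found[parts[4]][int(match.group(1))] = path
    return dict(found)

# Demonstration with empty placeholder files named like the sample data.
with tempfile.TemporaryDirectory() as tmp:
    for mode in (2, 3, 4):
        name = (f"EOF{mode}_psl_JFM_cmip6_Example-CMCC-ESM2_"
                "piControl_r1i1p1f1_mo_atm_1850-2349.nc")
        open(os.path.join(tmp, name), "w").close()
    eof_files = discover_eofs(os.path.join(tmp, "EOF*_psl_JFM_*.nc"))
    modes_found = sorted(eof_files["Example-CMCC-ESM2"])
```

A check like this before running the classification helps confirm that each model has all three modes available.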
[1]:
# Get the sample input data from https://doi.org/10.5281/zenodo.19666396
# 1. Download the file and save it as data.tar.gz
!curl -L "https://zenodo.org/records/19666396/files/data.tar.gz?download=1" -o data.tar.gz
# 2. Extract the contents
!tar -xvzf data.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3378k 100 3378k 0 0 1945k 0 0:00:01 0:00:01 --:--:-- 1944k
data/
data/example_eofs/
data/kmeans_centers_20CR-V2_CMIP6.json
data/SCA_psl_EOF3_JFM_obs_1969-2012.nc
data/EA_psl_EOF2_JFM_obs_1969-2012.nc
data/example_eofs/EOF4_psl_JFM_cmip6_Example-CMCC-ESM2_piControl_r1i1p1f1_mo_atm_1850-2349.nc
data/example_eofs/EOF2_psl_JFM_cmip6_Example-CMCC-ESM2_piControl_r1i1p1f1_mo_atm_1850-2349.nc
data/example_eofs/EOF3_psl_JFM_cmip6_Example-CMCC-ESM2_piControl_r1i1p1f1_mo_atm_1850-2349.nc
Example Usage
[2]:
from pcmdi_metrics.variability_mode.eof_classification import eof_classification
[3]:
eof_classification(
    ea_ctrl_file="data/EA_psl_EOF2_JFM_obs_1969-2012.nc",
    sca_ctrl_file="data/SCA_psl_EOF3_JFM_obs_1969-2012.nc",
    kmeans_centers_file="data/kmeans_centers_20CR-V2_CMIP6.json",
    output_root="eof_classification",
)
============================================================
Controls : 20CR-V2
EA : data/EA_psl_EOF2_JFM_obs_1969-2012.nc (×+1)
SCA : data/SCA_psl_EOF3_JFM_obs_1969-2012.nc (×-1)
Domain: lat=(20.0, 90.0), lon=ALL
K-means: data/kmeans_centers_20CR-V2_CMIP6.json (apply saved)
============================================================
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: Running with EXAMPLE data (single CMIP6 model).
Results are for demonstration only, not your own models.
To use your own data, set MODEL_EOF_DIR or pass eof_globs=...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[kmeans] loaded centers from data/kmeans_centers_20CR-V2_CMIP6.json (reanalysis=20CR-V2, cmip=CMIP6)
[output] wrote: eof_classification_consensus.tsv
[output] wrote: eof_classification_consensus.txt
[3]:
{'Example-CMCC-ESM2': {2: {'label': 'SCA',
'confidence': np.float64(0.905),
'quality': 'robust',
'methods': {'subspace': {'label': 'SCA', 'confidence': np.float64(0.992)},
'correlation': {'label': 'SCA', 'confidence': 0.725},
'kmeans': {'label': 'SCA', 'confidence': 0.999}}},
3: {'label': 'EA',
'confidence': np.float64(0.8),
'quality': 'robust',
'methods': {'subspace': {'label': 'EA', 'confidence': np.float64(0.763)},
'correlation': {'label': 'EA', 'confidence': 0.651},
'kmeans': {'label': 'EA', 'confidence': 0.987}}},
4: {'label': 'EA',
'confidence': np.float64(0.516),
'quality': 'medium',
'methods': {'subspace': {'label': 'EA', 'confidence': np.float64(0.797)},
'correlation': {'label': 'EA', 'confidence': 0.588},
'kmeans': {'label': 'EA', 'confidence': 0.162}}}}}
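Because several EOFs can share a label, a caller may want to pick one EOF per pattern from the returned dictionary. The sketch below keeps, for each label, the EOF with the highest consensus confidence. This selection rule and the helper name `best_eof_per_pattern` are illustrative choices, not part of the package; the dictionary reproduces the consensus fields of the output above with the per-method details omitted.

```python
# Consensus fields copied from the example output (per-method details omitted).
result = {
    "Example-CMCC-ESM2": {
        2: {"label": "SCA", "confidence": 0.905, "quality": "robust"},
        3: {"label": "EA", "confidence": 0.8, "quality": "robust"},
        4: {"label": "EA", "confidence": 0.516, "quality": "medium"},
    }
}

def best_eof_per_pattern(model_result):
    """Return {label: (eof_mode, confidence)} keeping the most confident EOF."""
    best = {}
    for mode, info in model_result.items():
        label = info["label"]
        if label == "UNCLASSIFIED":
            continue
        conf = float(info["confidence"])  # also handles np.float64 values
        if label not in best or conf > best[label][1]:
            best[label] = (mode, conf)
    return best

selection = {model: best_eof_per_pattern(eofs) for model, eofs in result.items()}
```

For the example model this selects EOF2 for SCA and EOF3 for EA, leaving EOF4 as the weaker of the two EA candidates.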
Output
Two files are produced in the current directory:
``eof_classification_consensus.tsv`` — tab-separated, machine-readable.
``eof_classification_consensus.txt`` — fixed-width, human-readable.
Each row represents one EOF of one model and contains:
| Column | Description |
|---|---|
| Model | Model name |
| EOF | EOF mode (EOF2, EOF3, or EOF4) |
| Subspace_label | Subspace method classification (EA / SCA / NA) |
| Subspace_conf | Subspace confidence [0, 1] |
| Correlation_label | Correlation method classification |
| Correlation_conf | Correlation confidence [0, 1] |
| Kmeans_label | K-means classification |
| Kmeans_conf | K-means confidence [0, 1] |
| Consensus_label | Majority-vote label (EA / SCA / UNCLASSIFIED) |
| Consensus_conf | Consensus confidence [0, 1] |
| Quality | robust / medium / weak / very weak |
| Warning | WARNING flag if quality is weak or very weak |
[4]:
import pandas as pd
from IPython.display import display, Markdown

# Display the note stored in the leading '#' comment lines of the file
with open('eof_classification_consensus.tsv') as f:
    for line in f:
        if line.startswith('#'):
            display(Markdown(line.strip('# \n')))
        else:
            break

# Display the table, skipping the comment lines
df = pd.read_csv('eof_classification_consensus.tsv', sep='\t', comment='#')
df['Warning'] = df['Warning'].fillna('')
display(df)
NOTE: Confidence scores range from 0 to 1; values closer to 1 indicate stronger resemblance to the assigned pattern. The consensus label is by simple majority vote. The consensus confidence = (fraction of methods agreeing) x (mean confidence of agreeing methods). Quality levels: robust >= 0.7, medium >= 0.5, weak >= 0.3, very weak < 0.3. Because model EOFs may span a subspace that mixes the reference patterns, multiple EOFs can receive the same label when their projections are not cleanly separable.
| | Model | EOF | Subspace_label | Subspace_conf | Correlation_label | Correlation_conf | Kmeans_label | Kmeans_conf | Consensus_label | Consensus_conf | Quality | Warning |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Example-CMCC-ESM2 | EOF2 | SCA | 0.99 | SCA | 0.72 | SCA | 1.00 | SCA | 0.91 | robust | |
| 1 | Example-CMCC-ESM2 | EOF3 | EA | 0.76 | EA | 0.65 | EA | 0.99 | EA | 0.80 | robust | |
| 2 | Example-CMCC-ESM2 | EOF4 | EA | 0.80 | EA | 0.59 | EA | 0.16 | EA | 0.52 | medium | |