Classification Algorithm for Empirical Orthogonal Functions of Arctic Atmospheric Variability

Classifies model Empirical Orthogonal Functions (EOFs 2–4 of January–February–March, JFM, sea-level pressure) as East Atlantic (EA) or Scandinavian (SCA) patterns using three independent methods and a consensus score.

The original code was developed at Los Alamos National Laboratory (LANL) as a contribution to PMP, the PCMDI Metrics Package (Lawrence Livermore National Laboratory; LLNL).

Contributions

Original developers

  • Martin Velez Pardo (Los Alamos National Laboratory, EES-14)

  • Alexandra Jonko (Los Alamos National Laboratory, EES-17)

PMP Implementation

  • Jiwoo Lee (Lawrence Livermore National Laboratory)

  • Kristin Chang (Lawrence Livermore National Laboratory)

Overview

The script compares each model’s EOF2, EOF3, and EOF4 against observational ground-truth control patterns (EA and SCA) derived from reanalysis data. Three methods independently classify each EOF, and a consensus layer aggregates their results.

Methods

1. Subspace projection (shift-tolerant)

Projects the EA and SCA control patterns onto the model’s EOF subspace (span{EOF2, EOF3, EOF4}) via weighted least squares, allowing for small phase shifts in latitude and longitude. For each EOF, the relative coefficient dominance across both fits determines whether it more closely represents EA or SCA.
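
The weighted least-squares step of this method can be illustrated with a small NumPy sketch. This is not the PMP implementation: the function name `project_onto_subspace`, the sqrt(cos(latitude)) weighting, and the synthetic fields are assumptions made for illustration, and the search over latitude/longitude shifts is omitted.

```python
import numpy as np


def project_onto_subspace(control, eofs, lat):
    """Weighted least-squares fit: control ≈ sum_k coeffs[k] * eofs[k]."""
    # Assumed weighting: sqrt(cos(latitude)), so that ordinary least squares
    # on the scaled fields minimises the area-weighted squared misfit.
    w = np.sqrt(np.clip(np.cos(np.deg2rad(lat)), 0.0, None))[:, None]
    A = np.stack([(e * w).ravel() for e in eofs], axis=1)  # (npoints, n_eofs)
    b = (control * w).ravel()
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    explained = 1.0 - np.sum((b - A @ coeffs) ** 2) / np.sum(b ** 2)
    return coeffs, explained


# Synthetic stand-ins for EOF2, EOF3, EOF4 on a 20N-90N grid (not real data).
lat = np.linspace(20.0, 90.0, 15)
lon = np.deg2rad(np.arange(0.0, 360.0, 10.0))
LON, LAT = np.meshgrid(lon, np.deg2rad(lat))
eofs = np.stack([
    np.sin(LON) * np.cos(LAT),
    np.cos(LON) * np.cos(LAT),
    np.sin(2 * LON) * np.cos(LAT),
])

# A synthetic control pattern that lies mostly along the first EOF.
control = 0.8 * eofs[0] + 0.2 * eofs[2]

coeffs, explained = project_onto_subspace(control, eofs, lat)
# Relative dominance of each EOF in the fit (one possible definition).
dominance = np.abs(coeffs) / np.abs(coeffs).sum()
print(np.round(coeffs, 3), round(float(explained), 3))
print("dominant EOF index:", int(np.argmax(dominance)))
```

In the method described above, a fit of this kind would be made once for the EA control and once for the SCA control, and the coefficients from the two fits compared for each EOF.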

2. Correlation + geographic tests

Scores each EOF on a 100-point scale combining four diagnostics: weighted pattern correlation with each control (0–40 points), correlation with the orthogonalised component unique to each pattern (0–40 points), and two geographic tests based on regional pressure anomalies over Greenland, Ireland, and Scandinavia (0–10 points each).
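
The two correlation diagnostics can be sketched as below. The cos(latitude) weighting, the centred form of the correlation, and the Gram-Schmidt reading of "orthogonalised component" are assumptions, as are all function names; the conversion from correlations to points and the geographic tests are not reproduced here. Absolute values are taken because the sign of an EOF is arbitrary.

```python
import numpy as np


def area_weights(lat, shape):
    """cos(latitude) weights broadcast to a (lat, lon) field (assumed weighting)."""
    w = np.clip(np.cos(np.deg2rad(lat)), 0.0, None)[:, None]
    return np.broadcast_to(w, shape)


def weighted_corr(a, b, w):
    """Centred, area-weighted pattern correlation between two fields."""
    a = a - np.average(a, weights=w)
    b = b - np.average(b, weights=w)
    return np.sum(w * a * b) / np.sqrt(np.sum(w * a * a) * np.sum(w * b * b))


def unique_component(a, b, w):
    """Part of `a` left after removing its weighted projection onto `b`."""
    a = a - np.average(a, weights=w)
    b = b - np.average(b, weights=w)
    return a - (np.sum(w * a * b) / np.sum(w * b * b)) * b


lat = np.linspace(20.0, 87.5, 28)
lon = np.deg2rad(np.arange(0.0, 360.0, 10.0))
LON, LAT = np.meshgrid(lon, np.deg2rad(lat))
w = area_weights(lat, LON.shape)

# Synthetic, partly overlapping "controls" and a model EOF with flipped sign.
ea_ctrl = np.sin(LON) * np.cos(LAT)
sca_ctrl = np.cos(LON) * np.cos(LAT) + 0.5 * ea_ctrl
model_eof = -ea_ctrl

ea_unique = unique_component(ea_ctrl, sca_ctrl, w)
r_ea = abs(weighted_corr(model_eof, ea_ctrl, w))
r_sca = abs(weighted_corr(model_eof, sca_ctrl, w))
r_ea_unique = abs(weighted_corr(model_eof, ea_unique, w))
print(round(r_ea, 3), round(r_sca, 3), round(r_ea_unique, 3))
```

Correlating against the unique component helps separate the two controls when they share structure: the part of EA that also projects onto SCA no longer contributes to the EA score.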

3. K-means clustering

Clusters all EOFs across models into three groups using area-weighted, sign-oriented, row-normalised features. Clusters are labelled EA, SCA, or OTHER by correlation with the controls. Each EOF receives a soft membership score indicating its affinity to each cluster type. Pre-computed cluster centers are saved to a JSON file so that subsequent runs classify new models without needing the full ensemble.
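
Applying saved centers to a new EOF might look like the sketch below. Everything specific here is hypothetical: the JSON layout (a mapping from cluster label to a flat feature vector), the softmax-over-distance membership score, the `temperature` parameter, and the function names. The script's actual file format and scoring formula are not described in this document, and sign orientation is left out.

```python
import json

import numpy as np


def make_feature(field, lat):
    """Area-weight a (lat, lon) field, flatten it, and scale it to unit length."""
    w = np.sqrt(np.clip(np.cos(np.deg2rad(lat)), 0.0, None))[:, None]
    v = (field * w).ravel()
    return v / np.linalg.norm(v)


def soft_membership(feature, centers, temperature=0.1):
    """Hypothetical soft score: softmax over negative distances to each center."""
    labels = list(centers)
    d = np.array([np.linalg.norm(feature - np.asarray(centers[k])) for k in labels])
    z = -d / temperature
    p = np.exp(z - z.max())  # subtract the max for numerical stability
    p /= p.sum()
    return dict(zip(labels, p))


lat = np.linspace(20.0, 87.5, 10)
lon = np.deg2rad(np.arange(0.0, 360.0, 20.0))
LON, LAT = np.meshgrid(lon, np.deg2rad(lat))
patterns = {
    "EA": np.sin(LON) * np.cos(LAT),
    "SCA": np.cos(LON) * np.cos(LAT),
    "OTHER": np.sin(2 * LON) * np.cos(LAT),
}

# Round-trip through JSON to mimic saving and reloading the centers.
centers_json = json.dumps({k: make_feature(v, lat).tolist() for k, v in patterns.items()})
centers = json.loads(centers_json)

# A new "model EOF" that resembles the EA center with some contamination.
new_eof = patterns["EA"] + 0.2 * patterns["OTHER"]
membership = soft_membership(make_feature(new_eof, lat), centers)
best = max(membership, key=membership.get)
print(best, {k: round(float(v), 3) for k, v in membership.items()})
```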

Consensus

For each EOF, the three methods each produce a label (EA or SCA) and a confidence score in [0, 1]. The consensus layer combines these into a single classification:

  • Label: simple majority vote (2/3 or 3/3 agree). If no majority exists, the label is UNCLASSIFIED.

  • Confidence: (fraction of methods agreeing) × (mean confidence of agreeing methods).

  • Quality: derived from the consensus confidence:

    Quality      Confidence range    Warning?
    robust       ≥ 0.7               no
    medium       0.5 – 0.7           no
    weak         0.3 – 0.5           WARNING
    very weak    < 0.3               WARNING

Because model EOFs may span a subspace that mixes the reference patterns, multiple EOFs can receive the same label when their projections are not cleanly separable.
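
The consensus rule described in this section can be written out as a short sketch. The function names are illustrative, and two details are assumptions because the text does not specify them: only EA and SCA votes are counted toward the majority, and an UNCLASSIFIED result is given a confidence of 0.0.

```python
from collections import Counter


def quality_from_confidence(conf):
    """Map a consensus confidence to the quality levels listed above."""
    if conf >= 0.7:
        return "robust"
    if conf >= 0.5:
        return "medium"
    if conf >= 0.3:
        return "weak"
    return "very weak"


def consensus(method_results):
    """method_results maps method name -> (label, confidence in [0, 1])."""
    n_methods = len(method_results)
    # Assumption: only EA/SCA votes count toward the majority.
    votes = Counter(lab for lab, _ in method_results.values() if lab in ("EA", "SCA"))
    label, count = (votes.most_common(1)[0] if votes else (None, 0))
    if 2 * count <= n_methods:
        # Assumption: confidence is reported as 0.0 when no majority exists.
        return {"label": "UNCLASSIFIED", "confidence": 0.0,
                "quality": "very weak", "warning": True}
    agreeing = [c for lab, c in method_results.values() if lab == label]
    conf = (count / n_methods) * (sum(agreeing) / len(agreeing))
    quality = quality_from_confidence(conf)
    return {"label": label, "confidence": round(conf, 3),
            "quality": quality, "warning": quality in ("weak", "very weak")}


# Unanimous vote with one low-confidence method.
unanimous = consensus({
    "subspace": ("EA", 0.797),
    "correlation": ("EA", 0.588),
    "kmeans": ("EA", 0.162),
})
print(unanimous["label"], unanimous["confidence"], unanimous["quality"])
# EA 0.516 medium

# Two methods against one: the fraction agreeing (2/3) scales the confidence.
split = consensus({
    "subspace": ("SCA", 0.9),
    "correlation": ("SCA", 0.7),
    "kmeans": ("EA", 0.6),
})
print(split["label"], split["confidence"], split["quality"])
# SCA 0.533 medium
```

The first call shows how a single low-confidence method pulls a unanimous vote down from what two strong methods alone would suggest; the second shows that a 2-of-3 vote is capped at two thirds of the agreeing methods' mean confidence.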

Inputs

The script requires three types of input files, all in netCDF format:

  1. Model EOF files — one file per EOF mode (2, 3, 4) per model, containing a 2-D (lat × lon) spatial pattern. File paths are discovered via glob patterns set in EOF_GLOBS.

  2. Control patterns — EA and SCA reference patterns from a reanalysis (e.g., 20CR-V2). Paths set in EA_CTRL_FILE and SCA_CTRL_FILE.

  3. K-means centers file (JSON, optional on first run) — pre-computed cluster centers. If the file does not exist on first run, the script trains from the full ensemble and saves the centers for future use. This step requires `scikit-learn <https://scikit-learn.org/>`__ to be installed in your environment; see the scikit-learn documentation for installation instructions.
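
For the first input type, the sketch below shows one way to group EOF files by model and mode once they have been matched by a glob-style pattern. The pattern string and the assumption that the model name is the fifth underscore-separated field are illustrative, inferred from the sample file names only; the actual format of EOF_GLOBS is not shown in this document. `fnmatch` is used on a list of paths so that the example runs without the data on disk.

```python
from collections import defaultdict
from fnmatch import fnmatch
from pathlib import PurePosixPath

# File names as they appear in the sample data archive.
paths = [
    "data/example_eofs/EOF4_psl_JFM_cmip6_Example-CMCC-ESM2_piControl_r1i1p1f1_mo_atm_1850-2349.nc",
    "data/example_eofs/EOF2_psl_JFM_cmip6_Example-CMCC-ESM2_piControl_r1i1p1f1_mo_atm_1850-2349.nc",
    "data/example_eofs/EOF3_psl_JFM_cmip6_Example-CMCC-ESM2_piControl_r1i1p1f1_mo_atm_1850-2349.nc",
]

# Hypothetical glob-style pattern selecting EOF modes 2, 3 and 4.
pattern = "data/example_eofs/EOF[234]_psl_JFM_*.nc"


def group_by_model(paths, pattern):
    """Return {model: {mode: path}} for every path matching the pattern."""
    grouped = defaultdict(dict)
    for p in paths:
        if not fnmatch(p, pattern):
            continue
        parts = PurePosixPath(p).name.split("_")
        mode = int(parts[0][3:])  # "EOF4" -> 4
        model = parts[4]          # assumed position of the model name
        grouped[model][mode] = p
    return dict(grouped)


grouped = group_by_model(paths, pattern)
for model, modes in grouped.items():
    print(model, sorted(modes))
```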

[1]:
# Get the sample input data from https://doi.org/10.5281/zenodo.19666396

# 1. Download the file and save it as data.tar.gz
!curl -L "https://zenodo.org/records/19666396/files/data.tar.gz?download=1" -o data.tar.gz

# 2. Extract the contents
!tar -xvzf data.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3378k  100 3378k    0     0  1945k      0  0:00:01  0:00:01 --:--:-- 1944k
data/
data/example_eofs/
data/kmeans_centers_20CR-V2_CMIP6.json
data/SCA_psl_EOF3_JFM_obs_1969-2012.nc
data/EA_psl_EOF2_JFM_obs_1969-2012.nc
data/example_eofs/EOF4_psl_JFM_cmip6_Example-CMCC-ESM2_piControl_r1i1p1f1_mo_atm_1850-2349.nc
data/example_eofs/EOF2_psl_JFM_cmip6_Example-CMCC-ESM2_piControl_r1i1p1f1_mo_atm_1850-2349.nc
data/example_eofs/EOF3_psl_JFM_cmip6_Example-CMCC-ESM2_piControl_r1i1p1f1_mo_atm_1850-2349.nc

Example Usage

[2]:
from pcmdi_metrics.variability_mode.eof_classification import eof_classification
[3]:
eof_classification(
    ea_ctrl_file="data/EA_psl_EOF2_JFM_obs_1969-2012.nc",
    sca_ctrl_file="data/SCA_psl_EOF3_JFM_obs_1969-2012.nc",
    kmeans_centers_file="data/kmeans_centers_20CR-V2_CMIP6.json",
    output_root="eof_classification"
)
============================================================
Controls : 20CR-V2
EA  : data/EA_psl_EOF2_JFM_obs_1969-2012.nc  (×+1)
SCA : data/SCA_psl_EOF3_JFM_obs_1969-2012.nc (×-1)
Domain: lat=(20.0, 90.0), lon=ALL
K-means: data/kmeans_centers_20CR-V2_CMIP6.json (apply saved)
============================================================

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: Running with EXAMPLE data (single CMIP6 model).
Results are for demonstration only, not your own models.
To use your own data, set MODEL_EOF_DIR or pass eof_globs=...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

[kmeans] loaded centers from data/kmeans_centers_20CR-V2_CMIP6.json (reanalysis=20CR-V2, cmip=CMIP6)
[output] wrote: eof_classification_consensus.tsv
[output] wrote: eof_classification_consensus.txt
[3]:
{'Example-CMCC-ESM2': {2: {'label': 'SCA',
   'confidence': np.float64(0.905),
   'quality': 'robust',
   'methods': {'subspace': {'label': 'SCA', 'confidence': np.float64(0.992)},
    'correlation': {'label': 'SCA', 'confidence': 0.725},
    'kmeans': {'label': 'SCA', 'confidence': 0.999}}},
  3: {'label': 'EA',
   'confidence': np.float64(0.8),
   'quality': 'robust',
   'methods': {'subspace': {'label': 'EA', 'confidence': np.float64(0.763)},
    'correlation': {'label': 'EA', 'confidence': 0.651},
    'kmeans': {'label': 'EA', 'confidence': 0.987}}},
  4: {'label': 'EA',
   'confidence': np.float64(0.516),
   'quality': 'medium',
   'methods': {'subspace': {'label': 'EA', 'confidence': np.float64(0.797)},
    'correlation': {'label': 'EA', 'confidence': 0.588},
    'kmeans': {'label': 'EA', 'confidence': 0.162}}}}}
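
In the result above, EOF3 and EOF4 are both labelled EA. One possible way to post-process the returned dictionary is to keep, for each label, the EOF with the highest consensus confidence. The helper below is a hypothetical user-side sketch, not part of PMP, and the dictionary literal copies only the consensus fields from the output shown.

```python
results = {
    "Example-CMCC-ESM2": {
        2: {"label": "SCA", "confidence": 0.905, "quality": "robust"},
        3: {"label": "EA", "confidence": 0.8, "quality": "robust"},
        4: {"label": "EA", "confidence": 0.516, "quality": "medium"},
    }
}


def best_eof_per_label(model_result):
    """For each label, return (EOF mode, confidence) of the strongest match."""
    best = {}
    for mode, info in model_result.items():
        label = info["label"]
        if label == "UNCLASSIFIED":
            continue
        conf = float(info["confidence"])  # also accepts np.float64 values
        if label not in best or conf > best[label][1]:
            best[label] = (mode, conf)
    return best


for model, model_result in results.items():
    print(model, best_eof_per_label(model_result))
# Example-CMCC-ESM2 {'SCA': (2, 0.905), 'EA': (3, 0.8)}
```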

Output

Two files are produced in the current directory:

  • ``eof_classification_consensus.tsv`` — tab-separated, machine-readable.

  • ``eof_classification_consensus.txt`` — fixed-width, human-readable.

Each row represents one EOF of one model and contains:

Column               Description
Model                Model name
EOF                  EOF mode (EOF2, EOF3, or EOF4)
Subspace_label       Subspace method classification (EA / SCA / NA)
Subspace_conf        Subspace confidence [0, 1]
Correlation_label    Correlation method classification
Correlation_conf     Correlation confidence [0, 1]
Kmeans_label         K-means classification
Kmeans_conf          K-means confidence [0, 1]
Consensus_label      Majority-vote label (EA / SCA / UNCLASSIFIED)
Consensus_conf       Consensus confidence [0, 1]
Quality              robust / medium / weak / very weak
Warning              WARNING flag if quality is weak or very weak

[4]:
import pandas as pd
from IPython.display import display, Markdown

# Display note
with open('eof_classification_consensus.tsv') as f:
    for line in f:
        if line.startswith('#'):
            display(Markdown(line.strip('# \n')))
        else:
            break

# Display the table
df = pd.read_csv('eof_classification_consensus.tsv', sep='\t', comment='#')
df['Warning'] = df['Warning'].fillna('')
display(df)

NOTE: Confidence scores range from 0 to 1; values closer to 1 indicate stronger resemblance to the assigned pattern. The consensus label is by simple majority vote. The consensus confidence = (fraction of methods agreeing) x (mean confidence of agreeing methods). Quality levels: robust >= 0.7, medium >= 0.5, weak >= 0.3, very weak < 0.3. Because model EOFs may span a subspace that mixes the reference patterns, multiple EOFs can receive the same label when their projections are not cleanly separable.

   Model              EOF   Subspace_label  Subspace_conf  Correlation_label  Correlation_conf  Kmeans_label  Kmeans_conf  Consensus_label  Consensus_conf  Quality  Warning
0  Example-CMCC-ESM2  EOF2  SCA             0.99           SCA                0.72              SCA           1.00         SCA              0.91            robust
1  Example-CMCC-ESM2  EOF3  EA              0.76           EA                 0.65              EA            0.99         EA               0.80            robust
2  Example-CMCC-ESM2  EOF4  EA              0.80           EA                 0.59              EA            0.16         EA               0.52            medium