Library Digital Collections

Data from: Towards Implementing AI Post-processing in Weather and Climate: Proposed Actions from the Oxford 2019 Workshop

About this collection


1 digital object.


We present an open-access experimental testbed database of clean weather and climate data to which traditional post-processing methods have been applied, in order to establish benchmarks for the rapid development of new machine learning methods. Descriptions of the clean weather data are included below and can also be found in the article "Towards Implementing AI Post-processing in Weather and Climate: Proposed Actions from the Oxford 2019 Workshop".

1 MJO Ensemble Forecasts

We include a database of climate variability modes identified from six separate operational weather forecast models, covering more than a decade's worth of forecasts. The Madden-Julian Oscillation (MJO; Madden and Julian 1977, 1994), a dominant intraseasonal mode of variability in the Tropics and a significant source of predictability globally on subseasonal timescales, has been identified by applying statistical techniques to forecast variables. We use the zonal winds at 850 hPa and 200 hPa and the outgoing longwave radiation (OLR), from both the forecast models and observations, to diagnose the MJO and evaluate its forecast skill.
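The paragraph does not name the statistical technique. A widely used one, assumed here for illustration, is the combined empirical orthogonal function (EOF) analysis of Wheeler and Hendon (2004), whose two leading principal components are the RMM1 and RMM2 indices; forecast skill is then often summarized with the bivariate correlation between observed and forecast (RMM1, RMM2) pairs at each lead time. The sketch below is simplified (no removal of interannual variability, one scale factor per field) and runs on random arrays in place of real data; in practice forecast anomalies are projected onto the EOFs derived from observations.

```python
import numpy as np


def combined_eofs(olr, u850, u200, n_modes=2):
    """Leading combined EOFs of three (time, lon) fields, e.g. averaged 15S-15N.

    Each field has its time mean removed and is scaled by a single standard
    deviation so that OLR and the two wind fields carry comparable weight.
    Returns (eofs, pcs): eofs has shape (n_modes, 3 * n_lon); pcs has shape
    (n_time, n_modes) and is scaled to unit variance (RMM-like indices).
    """
    fields = []
    for f in (olr, u850, u200):
        anom = f - f.mean(axis=0)          # remove the time mean at each longitude
        fields.append(anom / anom.std())   # one scale factor per field
    x = np.concatenate(fields, axis=1)     # (time, 3 * n_lon)
    # Rows of vt are orthonormal spatial patterns ordered by explained variance.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    eofs = vt[:n_modes]
    pcs = x @ eofs.T
    return eofs, pcs / pcs.std(axis=0)


def bivariate_correlation(obs, fcst):
    """Bivariate correlation between observed and forecast (RMM1, RMM2) series.

    obs, fcst: arrays of shape (n_cases, 2) for a single lead time.
    """
    num = np.sum(obs[:, 0] * fcst[:, 0] + obs[:, 1] * fcst[:, 1])
    den = np.sqrt(np.sum(obs ** 2)) * np.sqrt(np.sum(fcst ** 2))
    return num / den


# Synthetic example: random fields stand in for real OLR and zonal winds.
rng = np.random.default_rng(0)
olr, u850, u200 = (rng.normal(size=(300, 72)) for _ in range(3))
eofs, rmm_obs = combined_eofs(olr, u850, u200)
rmm_fcst = rmm_obs + 0.5 * rng.normal(size=rmm_obs.shape)  # noisy "forecast"
skill = bivariate_correlation(rmm_obs, rmm_fcst)
```

The sign of each EOF from an SVD is arbitrary, so observed and forecast indices must be computed against the same fixed set of EOFs before they are compared.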

2 PNA Ensemble Forecasts

Similarly to the MJO ensemble forecasts, the Pacific/North American (PNA) pattern, a large-scale weather pattern over the North Pacific and North America, has been identified from the geopotential height field (Wallace and Gutzler 1981) in both observations and model forecasts. These datasets are provided as benchmarks for training post-processing algorithms to improve forecasts of these large-scale modes of variability and, concomitantly, the subseasonal forecast skill of related weather patterns.
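For reference, Wallace and Gutzler (1981) defined a point-based PNA index as one quarter of a signed sum of standardized 500-hPa height anomalies at four centres of action. The sketch below computes that classic index using the nearest grid point to each centre; it is an illustration only, and the dataset itself may use a different (for example EOF-based) definition of the pattern. Longitudes are assumed to be in degrees east on a 0-360 grid.

```python
import numpy as np

# Centres of action (latitude N, longitude E) and signs from the
# Wallace and Gutzler (1981) point-based PNA index.
PNA_CENTRES = [
    (20.0, 200.0, +1.0),  # 20N, 160W
    (45.0, 195.0, -1.0),  # 45N, 165W
    (55.0, 245.0, +1.0),  # 55N, 115W
    (30.0, 275.0, -1.0),  # 30N, 85W
]


def pna_index(z500_std_anom, lats, lons):
    """PNA index from standardized 500-hPa height anomalies.

    z500_std_anom: array (time, lat, lon); lons in degrees east (0-360).
    Each centre of action uses the nearest grid point.
    """
    index = np.zeros(z500_std_anom.shape[0])
    for lat, lon, sign in PNA_CENTRES:
        i = np.abs(lats - lat).argmin()
        j = np.abs(lons - lon).argmin()
        index += sign * z500_std_anom[:, i, j]
    return index / 4.0


# Synthetic check: +1 at the positive centres and -1 at the negative ones.
lats = np.arange(0.0, 90.1, 2.5)
lons = np.arange(0.0, 360.0, 2.5)
field = np.zeros((1, lats.size, lons.size))
for lat, lon, sign in PNA_CENTRES:
    field[0, np.abs(lats - lat).argmin(), np.abs(lons - lon).argmin()] = sign
idealized = pna_index(field, lats, lons)
```

An idealized field that matches the pattern exactly gives an index of 1, and the same field with reversed sign gives -1.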

3 Global Forecast System (GFS) Integrated Vapor Transport

GFS predictions at 0.5-degree horizontal resolution on 64 vertical levels, from the daily 0000 and 1200 UTC model initializations, are used to calculate the forecast magnitude of integrated vapor transport (IVT). IVT is a combined momentum and thermodynamic metric that vertically integrates the product of specific humidity and the zonal (u) and meridional (v) wind components from 1000 to 300 hPa. We provide three forecast lead times (1 day, 2 days, and 1 week) from 2006 to 2018. This amounts to ~8,000 fields for each lead time, or ~24,000 forecast fields across all lead times. The region of interest spans coastal North America and the eastern Pacific from 180°W to 110°W longitude and from 10°N to 60°N latitude. IVT from the National Aeronautics and Space Administration's Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) reanalysis is also included and serves as ground truth. MERRA-2 data are on a 0.625° × 0.5° (longitude × latitude) grid and are interpolated to 21 pressure levels between 1000 and 300 hPa for the IVT calculation (Gelaro et al. 2017). For consistency, the GFS predictions are remapped to this grid using first- and second-order conservative remapping. Further details can be found in Chapman et al. (2019).
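The IVT magnitude is commonly written as IVT = (1/g) * sqrt[(∫ q u dp)^2 + (∫ q v dp)^2], with both integrals taken from 1000 to 300 hPa. The sketch below evaluates this with the trapezoidal rule over whatever pressure levels are supplied; g = 9.81 m s^-2 and the trapezoidal scheme are assumptions of this example, not necessarily the exact numerics used to build the dataset.

```python
import numpy as np

G = 9.81  # gravitational acceleration (m s^-2), assumed constant


def _trapezoid(f, p):
    """Trapezoidal integral of f over pressure p along the first (level) axis."""
    dp = np.diff(p).reshape((-1,) + (1,) * (f.ndim - 1))
    return 0.5 * np.sum((f[1:] + f[:-1]) * dp, axis=0)


def ivt_magnitude(q, u, v, p_hpa):
    """IVT magnitude (kg m^-1 s^-1) between 1000 and 300 hPa.

    q: specific humidity (kg/kg); u, v: wind components (m/s); all with
    shape (level, ...). p_hpa: 1-D pressure levels in hPa, any order.
    """
    p = np.asarray(p_hpa, dtype=float) * 100.0  # hPa -> Pa
    order = np.argsort(p)                       # integrate with increasing p
    p, q, u, v = p[order], q[order], u[order], v[order]
    keep = (p >= 30000.0) & (p <= 100000.0)     # 300-1000 hPa layer only
    p, q, u, v = p[keep], q[keep], u[keep], v[keep]
    ivt_u = _trapezoid(q * u, p) / G
    ivt_v = _trapezoid(q * v, p) / G
    return np.hypot(ivt_u, ivt_v)


# Toy column on a 2 x 3 grid with uniform moisture and wind.
levels = np.array([1000.0, 925.0, 850.0, 700.0, 500.0, 300.0])
q = np.full((levels.size, 2, 3), 0.01)
u = np.full_like(q, 10.0)
v = np.full_like(q, 10.0)
ivt = ivt_magnitude(q, u, v, levels)
```

With constant q = 0.01 kg/kg and u = v = 10 m/s over a 700 hPa deep layer, each component equals 0.01 × 10 × 70000 / 9.81 ≈ 714 kg m^-1 s^-1, and the magnitude is √2 times that.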

4 ECMWF Two-Meter Temperature Ensemble over Germany

As an example of short-range forecasts paired with verifying observations, this dataset contains temperature observations at 537 stations over Germany together with predictors derived from the ECMWF ensemble prediction system, from 2007 to 2016. Predictors are the mean and standard deviation of 48-hour-ahead, 50-member ECMWF ensemble forecasts of temperature, optimally interpolated to the station locations. The corresponding observations (valid at 00 UTC) come from surface synoptic observation stations operated by the German Weather Service (DWD). Details, including a list of predictors, are available in Rasp and Lerch (2018).
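Gaussian predictive distributions built from the ensemble mean and spread are commonly scored with the continuous ranked probability score (CRPS), which has a closed form for the normal distribution. The sketch below implements that formula and an EMOS-style mapping N(a + b*m, c + d*s^2), one common parameterization from Gneiting et al. (2005); the coefficients a, b, c, d and the toy values are hypothetical, and fitting them (by minimizing mean CRPS on a training period) is left out.

```python
import math


def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a Gaussian forecast N(mu, sigma**2) for observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def mean_crps(ens_mean, ens_std, obs, a=0.0, b=1.0, c=0.0, d=1.0):
    """Mean CRPS of EMOS-style forecasts N(a + b*m, c + d*s**2).

    The default coefficients score the raw ensemble (mean m, spread s)
    directly; c + d*s**2 must stay positive.
    """
    scores = [
        crps_normal(a + b * m, math.sqrt(c + d * s * s), y)
        for m, s, y in zip(ens_mean, ens_std, obs)
    ]
    return sum(scores) / len(scores)


# Hypothetical toy values in degrees Celsius, not taken from the dataset.
raw_score = mean_crps([1.0, 3.5, -2.0], [0.8, 1.2, 0.5], [1.4, 2.0, -1.1])
```

Lower CRPS is better, and the score has the same units as the observations, which makes raw-ensemble and post-processed benchmarks directly comparable.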

5 UK surface road conditions

This dataset contains numerical weather prediction forecasts from all models in the UK Met Office's Road Surface Temperature (MORST) forecasting system, along with corresponding road network temperature observations from Highways England. Data are provided for four randomly selected sites (locations undisclosed) and span 98 days, from mid-December 2018 to late March 2019, at hourly forecast lead times from 0 to 168 hours. Ground truth is the road surface temperature observed at the road network weather station at the forecast valid time. The dataset covers 2342 forecast hours for each of the four sites. Because many forecasts, issued at different lead times, verify at each hour by the time it is observed, the dataset contains more than 1.34 million forecasts across all lead times.
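Since several forecasts verify at each observed hour, evaluation usually matches each forecast to the observation at its valid time (initialization time plus lead time) and aggregates errors by lead time. The sketch below does this with plain tuples and a dictionary; the record layout and site names are hypothetical and do not reflect the actual structure of the Zarr store, and mean absolute error is just one possible metric.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def mae_by_lead(forecasts, observations):
    """Mean absolute error for each forecast lead time.

    forecasts: iterable of (site, init_time, lead_hours, value) tuples.
    observations: dict mapping (site, valid_time) -> observed value.
    Forecasts without an observation at their valid time are skipped.
    """
    errors = defaultdict(list)
    for site, init_time, lead, value in forecasts:
        valid_time = init_time + timedelta(hours=lead)
        obs = observations.get((site, valid_time))
        if obs is None:
            continue
        errors[lead].append(abs(value - obs))
    return {lead: sum(e) / len(e) for lead, e in sorted(errors.items())}


# Hypothetical toy records: two forecasts from different initializations
# verify at the same hour, which is why forecasts outnumber observations.
noon = datetime(2019, 1, 15, 12)
observations = {("site_a", noon): 1.0}
forecasts = [
    ("site_a", noon - timedelta(hours=1), 1, 1.5),
    ("site_a", noon - timedelta(hours=24), 24, 0.0),
    ("site_a", noon - timedelta(hours=3), 2, 5.0),  # valid 11:00, no observation
]
scores = mae_by_lead(forecasts, observations)
```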

Date Collected
  • 2018 to 2020
Date Issued
  • 2020
Technical Details

The data are stored in Zarr format, and the authors suggest the Zarr and Xarray Python packages for opening and reading the data.



The PNA and MJO datasets were collected and collated by the TIGGE Museum archive, University of Tsukuba, Japan. The TIGGE Museum is operated by Dr. Mio Matsueda to promote the use of TIGGE data. The UK surface road conditions dataset contains Highways England and OS data © Crown copyright and database rights [2018-2019].



Language

  • English

Identifier: William E. Chapman:
