README for High Performance Computing Application I/O Traces

Cite as: Wang, Chen; Snir, Marc; Mohror, Kathryn (2020). High Performance Computing Application I/O Traces. In Lawrence Livermore National Laboratory (LLNL) Open Data Initiative. UC San Diego Library Digital Collections. https://doi.org/10.6075/J0Z899X4

Description: The dataset comprises trace files from high performance computing (HPC) simulations. The trace files contain records of every I/O operation executed by a simulation application run, including I/O operations from HDF5, MPI-IO, and POSIX, and all of the parameters supplied to those operations, e.g., file name, offset, and flags. The traces are generated by executing a simulation application that is linked with the Recorder tracing tool (https://github.com/uiuc-hpc/Recorder). Recorder intercepts the I/O calls made by the application, records the I/O trace record, and then calls the intended I/O routine so that the operation executes (see the note on the tracing mechanism after item 2 below).

Scope And Content: For each application execution, Recorder generates a set of trace files. The files are in a binary format (in this collection, .ift files containing trace records and .mt files containing trace metadata) that is optimized to reduce the overhead of trace collection on the application. The tool recorder2text.c in Recorder/tools can be used to parse the binary traces and generate human-readable text files. There are also scripts in the Recorder/tools/reporter directory that can be used to perform different kinds of analysis on the traces, or that can serve as examples of how to parse and analyze the binary trace data. Detailed documentation on how to read and visualize Recorder traces can be found at: https://github.com/uiuc-hpc/Recorder

Version update: On 2021-02-02, trace files from 4 applications (HACC-IO, MILC-QCD, pF3D-IO, and VPIC-IO) were added to this dataset. In addition, 1024-process runs were added for all 17 applications.

Contents of application trace folders:

1. I/O Traces of FLASH

FLASH4.4
├── 1024p
│   ├── Sedov_2d_ug_fbs
│   └── Sedov_2d_ug_nofbs
└── 64p
    ├── Sedov_2d_ug_fbs
    └── Sedov_2d_ug_nofbs

Sedov_2d_ug_fbs contains the 2D Sedov explosion simulation with fixed block size. The block size is fixed (16x16) for each process. The simulation runs for 100 time steps and checkpoints are written out every 20 time steps. HDF5 is used for writing checkpoints and plot files. Using a fixed block size enables MPI collective I/O.

Sedov_2d_ug_nofbs contains the 2D Sedov explosion simulation with variable block size. The problem size is 512x512. The simulation runs for 100 time steps and checkpoints are written out every 20 time steps. HDF5 is used for writing checkpoints and plot files. This run uses independent MPI-IO.

2. I/O Traces of LAMMPS

LAMMPS-20Mar3
├── 1024p
│   ├── LJ_2D_ADIOS2
│   ├── LJ_2D_HDF5
│   ├── LJ_2D_MPIIO
│   ├── LJ_2D_NetCDF
│   └── LJ_2D_POSIX
└── 64p
    ├── LJ_2D_ADIOS2
    ├── LJ_2D_HDF5
    ├── LJ_2D_MPIIO
    ├── LJ_2D_NetCDF
    └── LJ_2D_POSIX

LJ_2D_HDF5: 2D LJ flow simulation using HDF5 for I/O. The simulation runs 100 steps. Coordinates are written out every 20 steps using HDF5.

LJ_2D_ADIOS2: 2D LJ flow simulation using ADIOS2 for I/O. The simulation runs 100 steps. Coordinates are written out every 20 steps using ADIOS2.

LJ_2D_MPIIO: 2D LJ flow simulation using MPI-IO for I/O. The simulation runs 100 steps. Coordinates are written out every 20 steps using MPI-IO.

LJ_2D_NetCDF: 2D LJ flow simulation using NetCDF for I/O. The simulation runs 100 steps. Coordinates are written out every 20 steps using NetCDF.

LJ_2D_POSIX: 2D LJ flow simulation using POSIX calls for I/O. The simulation runs 100 steps. Coordinates are written out every 20 steps using POSIX calls.
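Note on the tracing mechanism: The interposition pattern described under Description above (intercept an I/O call, record it, then invoke the real call so the operation still executes) can be illustrated with a minimal LD_PRELOAD-style shim for a single POSIX function. This is a generic sketch of the technique only, not Recorder's actual source; the file name and log format are made up for illustration.

    /* io_shim.c -- build:  cc -shared -fPIC io_shim.c -o libio_shim.so -ldl
       run an application under it:  LD_PRELOAD=./libio_shim.so ./app        */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Wrapper for write(): record the call, then forward it to the real write(). */
    ssize_t write(int fd, const void *buf, size_t count)
    {
        static ssize_t (*real_write)(int, const void *, size_t) = NULL;
        if (real_write == NULL)
            real_write = (ssize_t (*)(int, const void *, size_t))
                         dlsym(RTLD_NEXT, "write");

        /* Record the call; the log line is emitted with the real write()
           so that this wrapper is not re-entered. */
        char line[128];
        int n = snprintf(line, sizeof line,
                         "[trace] write(fd=%d, count=%zu)\n", fd, count);
        if (n > 0)
            real_write(STDERR_FILENO, line, (size_t)n);

        /* Execute the intended operation. */
        return real_write(fd, buf, count);
    }

Recorder applies the same record-then-forward idea across the HDF5, MPI-IO, and POSIX interfaces it traces, and stores the records in its binary trace format (.ift/.mt) rather than a text log.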
3. I/O Traces of NWChem

NWChem-6.8.1
├── 1024p
│   └── qmd
└── 64p
    └── h2o_scf

64p/h2o_scf: The water SCF calculation. The simulation runs for 1000 steps and uses only POSIX for I/O.

1024p/qmd: 3-Carboxybenzisoxazole gas-phase dynamics at 500 K. This run uses only POSIX for I/O.

The NWChem simulations made a large number of MPI calls, which produced huge trace files. Due to space limitations, the uploaded traces include only the POSIX calls.

4. I/O Traces of Chombo

Chombo-3.2.7
├── 1024p
│   └── AMRPoisson_HDF5
└── 64p
    └── AMRPoisson_HDF5

Variable-coefficient AMR Poisson solve. The initial grid size is 128x128x128. The solver runs for 100 steps and the final result is saved in an HDF5 file.

5. I/O Traces of Nek5000

Nek5000-v19.0-rc1
├── 1024p
│   └── expansion_POSIX
└── 64p
    └── eddy_uv_POSIX

64p/eddy_uv_POSIX: 2D eddy solutions in a doubly-periodic domain. The simulation runs 100 steps and outputs results every 20 steps. This run uses only POSIX for I/O.

1024p/expansion_POSIX: 3D pipe with sudden expansion. The simulation runs 50 steps and outputs every 25 steps. This run uses only POSIX for I/O.

6. I/O Traces of ParaDiS

ParaDis.v2.5.1.1
├── 1024p
│   ├── Copper_HDF5
│   └── Copper_POSIX
└── 64p
    ├── Copper_HDF5
    └── Copper_POSIX

Copper_HDF5: Simulation of dislocations in a copper sample, using HDF5 for I/O. The simulation runs for 100 steps and outputs plot files, flux data, and checkpoint files every 20 steps.

Copper_POSIX: Simulation of dislocations in a copper sample, using POSIX for I/O. The simulation runs for 100 steps and outputs plot files, flux data, and checkpoint files every 20 steps.

7. I/O Traces of VASP

VASP-5.4.4/
├── 1024p
│   └── H2O
└── 64p
    ├── GaAs
    └── H2O

GaAs: Simulation of elastic properties and energies for zinc-blende GaAs at a given volume and pressure. VASP does not use high-level I/O libraries. Due to space limitations, the traces for this simulation include only POSIX calls.

H2O: Relaxation of an H2O molecule, an example from the VASP wiki (https://www.vasp.at/wiki/index.php/H2O). This simulation made a relatively small number of calls, so MPI calls are included in the trace files.

8. I/O Traces of LBANN

LBANN-1.0.0
├── 1024p
│   └── Autoencoder_cifar10_POSIX
└── 64p
    └── Autoencoder_cifar10_POSIX

Training and testing of an autoencoder on the CIFAR-10 dataset. The training runs for 2 epochs with a mini-batch size of 128. Each process uses only one thread for I/O.

9. I/O Traces of GAMESS

GAMESS-0.92
├── 1024p
│   └── dft_etoh_POSIX
└── 64p
    └── dft_etoh_POSIX

The closed-shell functional test on a C1 conformer of ethyl alcohol. Only POSIX I/O was used. The output files include N/2 DFT grid files (dft-etoh.F22.*) and N/2 AOINT files (dft-etoh.F08.*), where N is the number of processes (either 64 or 1024).

10. I/O Traces of GTC

GTCv0.92
├── 1024p
│   └── gtc_1024p_POSIX
└── 64p
    └── gtc_64p_POSIX

The NERSC-8 base case run with 64 or 1024 MPI processes. Only POSIX I/O was used. The simulation runs 100 steps, with micell = 100 and mecell = 100.

11. I/O Traces of QMCPACK

QMCPACK-3.9.2
├── 1024p
│   └── H2O
└── 64p
    └── H2O

A simple example of molecular H2O, which runs VMC followed by DMC. For both VMC and DMC, we run 100 warm-up steps. For DMC, we run a total of 100 steps and output every 20 steps.
12. I/O Traces of ENZO

enzo-dev-20200723/
├── 1024p
│   └── CollapseTestNonCosmological_HDF5
└── 64p
    └── CollapseTestNonCosmological_HDF5

Traces from a non-cosmological collapse test. This simulation runs 5 steps (36 cycles) and outputs data at every step. For each data dump step, a new directory named DD000* is created, and each rank writes to an independent restart file using HDF5.

13. I/O Traces of MILC-QCD

MILC-QCD-7.8.1
├── 1024p
│   └── clover_dynamical
│       ├── save_parallel
│       └── save_serial
└── 64p
    └── clover_dynamical
        ├── save_parallel
        └── save_serial

A built-in example from the MILC-QCD code base (https://github.com/milc-qcd/milc_qcd/tree/master/clover_dynamical). For the 64-process run, the mesh size was set to 32x32x32x16, whereas for the 1024-process run it was 64x64x64x16. In "save_parallel", the output file was written by all processes collaboratively; in "save_serial", only rank 0 wrote the output file.

14. I/O Traces of MACSio

MACSio
├── 1024p
│   ├── Ale3d_silo
│   └── Ares_silo
└── 64p
    └── Ale3d_silo

64p/Ale3d_silo: Simulation of the I/O behaviour of ALE3D, with 5 dump steps. Important MACSio command line parameters: --part_dim 3 --vars_per_part 50 --part_size 100K --part_type unstructured --avg_num_parts 1

1024p/Ale3d_silo: Simulation of the I/O behaviour of ALE3D, with 2 dump steps. Important MACSio command line parameters: --avg_num_parts 1 --part_size 100K --part_type unstructured --part_dim 3 --vars_per_part 50 --num_dumps 2 --parallel_file_mode MIF 8

1024p/Ares_silo: Simulation of the I/O behaviour of Ares, with 2 dump steps. Important MACSio command line parameters: --interface silo --avg_num_parts 4 --part_size 100K --part_type rectilinear --part_dim 2 --vars_per_part 200 --num_dumps 2 --parallel_file_mode MIF 8

15. I/O Traces of pF3D-IO

pF3D-IO/
├── 1024p
│   └── pF3D
└── 64p
    └── pF3D

This I/O benchmark mimics the commands issued when pF3D writes a checkpoint dump. pF3D is normally run with one MPI process per core and has been run with over 3 million processes. The total output of our run was about 2 GB per process.

16. I/O Traces of VPIC-IO

VPIC-IO-0.1/
├── 1024p
│   ├── nonuni
│   └── uni
└── 64p
    ├── nonuni
    └── uni

VPIC-IO is part of the parallel I/O kernels project (https://code.lbl.gov/projects/piok/). It simulates the I/O behaviour of the VPIC code. VPIC-IO uses the H5Part API to create a file, write eight variables, and close the file.

uni: Uses H5Part calls, which internally call HDF5, to perform the file writes. "uni" means that each MPI process writes the same number of particles.

nonuni: Uses H5Part calls, which internally call HDF5, to perform the file writes. "nonuni" means that each MPI process writes a different number of particles.

17. I/O Traces of HACC-IO

HACC-IO-1.0/
├── 1024p
│   ├── MPI-Indep
│   ├── MPI-Shared
│   └── POSIX
└── 64p
    ├── MPI-Indep
    ├── MPI-Shared
    └── POSIX

The HACC-IO benchmark (https://github.com/glennklockwood/hacc-io). Each rank wrote and then read numparticles * 38 bytes worth of data. Either MPI-IO or POSIX can be used for I/O.
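For reference, a shared-file MPI-IO write of the size described above (numparticles * 38 bytes per rank, placed at a rank-based offset) looks roughly like the following minimal sketch. This is illustrative only and is not the benchmark's source; the file name and particle count are placeholders, and the collective MPI_File_write_at_all could equally be the independent MPI_File_write_at.

    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        long long numparticles = 1000;            /* placeholder particle count */
        long long nbytes = numparticles * 38;     /* 38 bytes per particle, as noted above */

        char *buf = malloc((size_t)nbytes);
        memset(buf, rank & 0xFF, (size_t)nbytes); /* dummy payload */

        /* Every rank writes its block into one shared file at its own offset. */
        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "hacc_io_shared.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_Offset offset = (MPI_Offset)rank * nbytes;
        MPI_File_write_at_all(fh, offset, buf, (int)nbytes, MPI_BYTE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        free(buf);
        MPI_Finalize();
        return 0;
    }

A POSIX variant would instead use open() and pwrite() on either a shared file or one file per rank.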