Dataset Title:
Computational Fluid Dynamics Simulation Data of Spatial Deposition

Please cite as:
Fernandez-Godino, M. Giselle; Lucas, Donald D.; Gunawardena, Nipun (2023). Computational Fluid Dynamics Simulation Data of Spatial Deposition. In Lawrence Livermore National Laboratory (LLNL) Open Data Initiative. UC San Diego Library Digital Collections. https://doi.org/10.6075/J0D50N50

Corresponding author:
M. Giselle Fernández-Godino
fernandez48@llnl.gov

Primary associated publication:
Fernández-Godino, M.G., Lucas, D.D. & Kong, Q. Predicting wind-driven spatial deposition through simulated color images using deep autoencoders. Sci Rep 13, 1394 (2023). https://doi.org/10.1038/s41598-023-28590-4

Description of contents:
The dataset consists of two folders: "train" holds the files used for training, and "test" holds the files used for testing.
There are 16,000 simulations in total, divided into 15,000 training cases and 1,000 test cases. 
The file “inputs_15k_train.npy” contains an array of shape (15000,4), where the rows correspond to the first 15,000 simulations and the columns are s_x,s_y,w_u,w_v (source location in x, source location in y, wind velocity component in x, and wind velocity component in y). Similarly, the file “inputs_1k_test.npy” contains an array of shape (1000,4), where the rows correspond to the last 1,000 simulations and the columns are also s_x,s_y,w_u,w_v.
The file “RGB_deposition_15k_train.npy” contains an array of shape (15000,1000,1000,3), with axes corresponding to (training case, height, width, RGB). Similarly, the file “RGB_deposition_1k_test.npy” contains an array of shape (1000,1000,1000,3), with axes corresponding to (test case, height, width, RGB).
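
As a minimal sketch of reading one split, the snippet below memory-maps the arrays with NumPy rather than loading them fully. If the images are stored as 8-bit integers (an assumption; the dtype is not stated here), the training image array alone would occupy about 45 GB. The function name load_split and the example path are hypothetical.

```python
from pathlib import Path

import numpy as np


def load_split(root, split):
    """Memory-map the inputs and RGB images of one split ("train" or "test").

    `root` is the directory that holds the "train" and "test" folders.
    """
    tag = {"train": "15k_train", "test": "1k_test"}[split]
    folder = Path(root) / split
    # mmap_mode="r" maps the file read-only; data is read from disk on access.
    inputs = np.load(folder / f"inputs_{tag}.npy", mmap_mode="r")
    images = np.load(folder / f"RGB_deposition_{tag}.npy", mmap_mode="r")
    if inputs.shape[0] != images.shape[0]:
        raise ValueError("inputs and images have different numbers of cases")
    return inputs, images


# Example usage (the path is hypothetical):
# inputs, images = load_split("path/to/dataset", "test")
# s_x, s_y, w_u, w_v = inputs[0]         # four scalar inputs of the first test case
# first_image = np.asarray(images[0])    # reads only this one image into memory
```

Indexing a memory-mapped array reads only the requested cases, which makes it practical to draw mini-batches without holding a whole split in memory.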


Methods:
The physics problem behind this dataset is the two-dimensional spatial pattern formed by a pollutant that is released into the atmosphere and dispersed for up to an hour while depositing onto the surface. The release location (s_x,s_y) can be anywhere in a two-dimensional domain of 5000 m × 5000 m. The release is initialized from a small bubble that is centered five meters above the surface, has a radius of five meters, and has internal momentum that causes it to expand radially and rise to a height of about 100 meters within the first minute of simulation time. As a simplification, the same bubble source was used for all simulations, so only the (s_x,s_y) coordinates of the source are relevant. All the realizations used unit mass releases, and the resulting deposition patterns can be scaled proportionately for other mass amounts. The simulated data represent the cumulative mass deposited on the surface over one hour. The pollutant is blown in a direction controlled by the large-scale atmospheric inflow winds, expressed as wind speed (w_s), which varies from 0.5 to 15 m/s, and wind direction (w_d), which can be anywhere in the interval [0,360) degrees following standard mathematical convention. The files “inputs_15k_train.npy” and “inputs_1k_test.npy”, however, include w_u = w_s cos(w_d) and w_v = w_s sin(w_d), the wind velocity components projected onto the x and y axes. We assume that the spatial patterns were collected by a hypothetical imaging device that records the magnitude of the logarithm of deposition as a red, green, and blue (RGB) color image with channels containing integer values ranging from 0 to 255. The goal is to predict a deposition image given its associated release location and wind velocity (four scalar quantities). In other words, we are interested in the following mapping: [s_x,s_y,w_u,w_v]→[height×width×RGB channel]. See [1].
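
The relation between wind speed and direction (w_s, w_d) and the stored components (w_u, w_v) can be sketched as follows. This is an illustrative helper, not code shipped with the dataset; the function names are hypothetical, and the direction is taken in degrees measured counterclockwise from the x axis, which is how the standard mathematical convention is read here.

```python
import numpy as np


def wind_to_components(w_s, w_d_deg):
    """Convert speed (m/s) and direction (degrees) to components (w_u, w_v)."""
    w_d = np.deg2rad(w_d_deg)
    return w_s * np.cos(w_d), w_s * np.sin(w_d)


def components_to_wind(w_u, w_v):
    """Recover speed (m/s) and direction in [0, 360) degrees from (w_u, w_v)."""
    w_s = np.hypot(w_u, w_v)
    w_d_deg = np.rad2deg(np.arctan2(w_v, w_u)) % 360.0
    return w_s, w_d_deg


# A 10 m/s wind at 90 degrees points along +y: w_u is approximately 0, w_v is 10.
w_u, w_v = wind_to_components(10.0, 90.0)
```

Both functions also accept NumPy arrays, so the last two columns of an inputs array can be converted back to speed and direction in one call.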

The data is obtained from simulations and later post-processed to make it suitable for machine learning training. Given large-scale winds as an inflow boundary condition, the CFD code Aeolus [2] uses millions of grid cells to simulate fluid flow and material transport in complex, three-dimensional environments at high resolution, accounting for turbulence from structures, terrain features, and obstacles and predicting deposition on the ground and other surfaces. Megapixel deposition images were obtained by processing the output of Aeolus simulations, which were run on a grid of (x,y,z)=1000×1000×100 cells, each cell representing 5 m × 5 m × 5 m. Within Aeolus, pollutant concentration and deposition values are calculated by releasing and transporting Lagrangian particles of specified masses and sizes within the flow field. Particles that intersect the ground or other surfaces through turbulence or gravitational settling are removed from the atmosphere and recorded as deposition in units of mass per area. The releases were modeled as small, rising bubbles of mass carried by the winds about a minute into the simulations. Note that the actual deposition values are not given in this dataset. The entire dataset, created by running Aeolus multiple times, contains 16,000 deposition images, stored as [number of images, height, width, RGB channels] = [16,000, 1000, 1000, 3]. Each megapixel image shows the spatial deposition pattern of a unique release scenario in Aeolus; the source location and inflow wind, [s_x,s_y,w_u,w_v], were varied using Latin hypercube sampling as the design of experiments. The data can potentially be augmented for different wind directions by rotating the spatial plume pattern. This augmentation is not always possible in practice due to terrain-based asymmetries in transport and dispersion.
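
To illustrate what a Latin hypercube design over the four inputs looks like, here is a minimal NumPy-only sketch. It is not the authors' sampling code. It assumes that the source coordinates run from 0 to 5000 m and that sampling was done in wind speed and direction before conversion to (w_u, w_v); neither detail is stated here. The function name latin_hypercube is hypothetical.

```python
import numpy as np


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points with exactly one point per stratum in each dimension."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)    # shape (n_dims, 2)
    n_dims = bounds.shape[0]
    # One uniform draw inside each of the n_samples equal-width strata of [0, 1).
    strata = np.arange(n_samples)[:, None]
    unit = (strata + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle the strata independently in every dimension.
    for j in range(n_dims):
        unit[:, j] = rng.permutation(unit[:, j])
    low, high = bounds[:, 0], bounds[:, 1]
    return low + unit * (high - low)


# Ranges quoted in the text; sampling in (w_s, w_d) is an assumption.
bounds = [(0.0, 5000.0), (0.0, 5000.0), (0.5, 15.0), (0.0, 360.0)]
design = latin_hypercube(16000, bounds, seed=0)
s_x, s_y, w_s, w_d = design.T
w_u = w_s * np.cos(np.deg2rad(w_d))
w_v = w_s * np.sin(np.deg2rad(w_d))
inputs = np.column_stack([s_x, s_y, w_u, w_v])    # shape (16000, 4)
```

Compared with independent uniform draws, this stratification guarantees that every one of the 16,000 equal-width slices of each input range is sampled exactly once.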
The Python rainbow colormap is used to create the RGB images for training and testing the autoencoder. As previously noted, RGB pixel colors are associated with the logarithm of the deposition values.
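
The conversion from deposition values to RGB pixels can be sketched as below. This is only an illustration under several assumptions: that the colormap is Matplotlib's "rainbow", that the logarithm is base 10, and that each image is normalized by its own minimum and maximum. The color scaling actually used is not described here, and the function name deposition_to_rgb and the floor value are hypothetical.

```python
import matplotlib
import numpy as np


def deposition_to_rgb(deposition, floor=1e-12):
    """Map a 2-D array of non-negative deposition values to a uint8 RGB image."""
    # Clip at a small floor so zero-deposition cells have a finite logarithm.
    log_dep = np.log10(np.maximum(deposition, floor))
    lo, hi = log_dep.min(), log_dep.max()
    # Normalize to [0, 1]; a constant field maps to 0 everywhere.
    scaled = (log_dep - lo) / (hi - lo) if hi > lo else np.zeros_like(log_dep)
    # Calling a colormap on floats in [0, 1] returns RGBA floats, shape (H, W, 4).
    rgba = matplotlib.colormaps["rainbow"](scaled)
    return np.round(rgba[..., :3] * 255).astype(np.uint8)


# Synthetic Gaussian plume, for illustration only (not Aeolus output).
y, x = np.mgrid[0:100, 0:100]
plume = np.exp(-((x - 30) ** 2 + (y - 50) ** 2) / 200.0)
image = deposition_to_rgb(plume)    # shape (100, 100, 3), dtype uint8
```

Because a colormap is not generally invertible after rounding to 8-bit channels, recovering deposition magnitudes from the images would require knowing the exact scaling used when they were created.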

References:
[1] Fernández-Godino, M. G., Lucas, D. D., & Kong, Q., Predicting wind-driven spatial deposition through simulated color images using deep autoencoders, Scientific Reports 2023, 13(1), 1394, https://doi.org/10.1038/s41598-023-28590-4.
[2] Gowardhan, A., D. McGuffin, D. D. Lucas, S. Neuscamman, O. Alvarez, and L. Glascoe, Large Eddy Simulations of Turbulent and Buoyant Flows in Urban and Complex Terrain Areas Using the Aeolus Model, Atmosphere 2021, 12(9), 1107, https://doi.org/10.3390/atmos12091107.

Data dictionary:
Each *.npy file contains a single array. The train and test input arrays have shapes (15000,4) and (1000,4), respectively. The train and test RGB (output) arrays have shapes (15000,1000,1000,3) and (1000,1000,1000,3), respectively.

Technical details:
Python 3.9.15
NumPy 1.23.4

License:
Creative Commons Attribution 4.0 International Public License.