# HDF5 utilities

My own Fortran high-level interface for HDF5.
This library provides a high-level interface to HDF5. It is, of course, based on HDF5's own high-level Lite interface (H5LT), but customized to my own needs.

This is just a short summary to provide a basic understanding of the module. For a more detailed explanation, look at the code comments. If you have Doxygen installed, `make docs` will produce HTML output at `docs/html/index.html`.
| subroutine | inputs | description |
|---|---|---|
| `hdf_open_file` | `file_id`, `filename`, `STATUS`, `ACTION` | Open a file and return its identifier |
| `hdf_close_file` | `file_id` | Close an HDF5 file |
| `hdf_write_dataset` | `loc_id`, `dset_name`, `data` | Write out an array (or scalar) |
| `hdf_read_dataset` | `loc_id`, `dset_name`, `data` | Read in an array (or scalar) |
| `hdf_create_dataset` | `loc_id`, `dset_name`, `dset_dims`, `dset_type` | Create an empty dataset |
| `hdf_write_vector_to_dataset` | `loc_id`, `dset_name`, `offset`, `vector` | Write a vector to the leading edge of a dataset |
| `hdf_read_vector_from_dataset` | `loc_id`, `dset_name`, `offset`, `vector` | Read a 1D array from the leading edge of a dataset |
| `hdf_write_attribute` | `loc_id`, `obj_name`, `attr_name`, `data` | Write out an attribute array (or scalar) |
| `hdf_read_attribute` | `loc_id`, `obj_name`, `attr_name`, `data` | Read in an attribute array (or scalar) |
| `hdf_create_group` | `loc_id`, `group_name` | Create a new group |
| `hdf_open_group` | `loc_id`, `group_name`, `group_id` | Open a group and return its identifier |
| `hdf_close_group` | `group_id` | Close a group by identifier |
| `hdf_get_rank` | `loc_id`, `dset_name`, `rank` | Get the rank of a dataset |
| `hdf_get_dims` | `loc_id`, `dset_name`, `dims` | Get the dimensions of a dataset |
| `hdf_exists` | `loc_id`, `obj_name` | Check whether a location exists |
| `hdf_set_print_messages` | `val_print_messages` | Set the value of `hdf_print_messages` |
Here is a simple example of writing to a new HDF5 file:
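A minimal sketch (array names and values are illustrative; the `STATUS` keyword follows the table above, and `HID_T` is assumed to be made available through the module):

```fortran
program ex_write
  use hdf5_utils
  implicit none
  integer(HID_T) :: file_id
  integer :: ii(4)
  real(8) :: data2(2,3)

  ii = (/1, 2, 3, 4/)
  data2 = reshape((/1.d0, 2.d0, 3.d0, 4.d0, 5.d0, 6.d0/), (/2, 3/))

  ! create a new file and write an integer and a double precision array
  call hdf_open_file(file_id, 'test.h5', STATUS='NEW')
  call hdf_write_dataset(file_id, 'ii', ii)
  call hdf_write_dataset(file_id, 'data2', data2)
  call hdf_close_file(file_id)
end program ex_write
```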
The rank, dimensions, and datatype of the datasets in the HDF5 file correspond to those of the arrays passed in.
Here is a simple example of reading from an HDF5 file:
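A matching sketch for reading back the file written in the previous example:

```fortran
program ex_read
  use hdf5_utils
  implicit none
  integer(HID_T) :: file_id
  integer :: ii(4)
  real(8) :: data2(2,3)

  ! open an existing file read-only and read the two datasets
  call hdf_open_file(file_id, 'test.h5', STATUS='OLD', ACTION='READ')
  call hdf_read_dataset(file_id, 'ii', ii)
  call hdf_read_dataset(file_id, 'data2', data2)
  call hdf_close_file(file_id)
end program ex_read
```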
The rank, dimensions, and datatype of the array should match those of the dataset in the HDF5 file; otherwise the HDF5 library will raise an error.
Note that HDF5 will convert the datatype on reading. Based on the example above, one can read a double precision dataset into an integer, real, or double precision array; note that the integer array will hold the floor of the double/real values.
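A sketch of such a conversion on read (variable names are illustrative; the dataset `data2` follows the writing example above):

```fortran
integer :: idata2(2,3)   ! converted to integer (fractional part dropped)
real(4) :: rdata2(2,3)   ! converted to single precision
real(8) :: ddata2(2,3)   ! read as is

! the same double precision dataset, read into three different types
call hdf_read_dataset(file_id, 'data2', idata2)
call hdf_read_dataset(file_id, 'data2', rdata2)
call hdf_read_dataset(file_id, 'data2', ddata2)
```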
For writing, just convert the data array before writing, using the built-in `int` and `real` functions.
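For instance (a sketch, with `ddata2` as a double precision array and the dataset names purely illustrative):

```fortran
call hdf_write_dataset(file_id, 'idata2', int(ddata2))   ! write as integer
call hdf_write_dataset(file_id, 'rdata2', real(ddata2))  ! write as single precision
```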
An advanced and very useful feature of the HDF5 library is chunking and compression. I have implemented a subset of this that I think is enough for my needs. When writing the dataset, just pass the optional `filter` parameter (options are 'none', 'szip', 'gzip', and 'gzip+shuffle'). There is also the optional `chunks` parameter, which defines the size of the data chunks in the dataset. A great thing is that, as with converting datatypes, no additional care needs to be taken when reading, because the HDF5 library will sort it out for you (see note below):
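A sketch of a chunked, compressed write (the `filter` and `chunks` keywords follow the description above; the array shape matches the GCM example in the note below):

```fortran
real(8) :: gcm_data(90, 45, 103)

! ... fill gcm_data ...

! gzip-compress the dataset, chunked by horizontal slab
call hdf_write_dataset(file_id, 'gcm_data', gcm_data, &
                       filter='gzip', chunks=(/90, 45, 1/))
```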
Note: the efficiency of the compression and the writing/reading speed depend heavily on your data and your use cases. For instance, one of my uses is to write out 3D GCM data and then read in select vertical profiles for an interpolation. When I chunk by vertical profile (1,1,103), the compression is very inefficient. When I chunk by horizontal slab (90,45,1), the compression is about the same as if I chunked the whole set together. So I have to determine for myself the tradeoff between storage and speed. (In my case, I chunk by horizontal slab so that each chunk is below the 1 MB cache size, and read in a whole dataset and do multiple interpolations before moving on to the next dataset.)
Attributes are just like datasets, but they can be attached to datasets, groups, or the file root. The suggested use is for small items like the code version or run parameters:
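A sketch of writing attributes (assuming, as is common in HDF5 wrappers, that an empty `obj_name` refers to `loc_id` itself; the attribute names and values are illustrative, so check the code comments for the exact convention):

```fortran
call hdf_write_attribute(file_id, '', 'version', '1.2')     ! attribute on the file root
call hdf_write_attribute(file_id, 'data2', 'units', 'm/s')  ! attribute on dataset 'data2'
```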
The rank, dimensions, and datatype of the attribute in the HDF5 file correspond to those of the array passed in. Further, there is the `hdf5_utils::hdf_read_attribute` subroutine.
Groups are a nice way to organize datasets and attributes within the HDF5 file. They work like directories/folders on a computer. When using groups, you can access their contents using absolute or relative paths.
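A sketch of both styles (group and dataset names are illustrative):

```fortran
integer(HID_T) :: file_id, group_id
real(8) :: data1(4), data2(4)

data1 = 1.d0
data2 = 2.d0

call hdf_open_file(file_id, 'test.h5', STATUS='NEW')

! absolute path from the file root
call hdf_create_group(file_id, 'group1')
call hdf_write_dataset(file_id, 'group1/data1', data1)

! relative path through a group identifier
call hdf_create_group(file_id, 'group2')
call hdf_open_group(file_id, 'group2', group_id)
call hdf_write_dataset(group_id, 'data2', data2)
call hdf_close_group(group_id)

call hdf_close_file(file_id)
```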
Notice that in the relative path case we can pass the `group_id` to the read/write subroutines instead of the `file_id`. This can be used for convenient access to nested objects.
There are two subroutines, `hdf_get_rank` and `hdf_get_dims`, to get the rank (tensor rank, the number of dimensions) and the dimensions of a dataset. These can be used, for example, to allocate an array before reading in the data.
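A sketch of that pattern, querying the rank and dimensions of the 2D dataset `data2` from the examples above before allocating:

```fortran
integer :: rank
integer, allocatable :: dims(:)
real(8), allocatable :: data(:,:)

! query the shape of the dataset, then size the array to match
call hdf_get_rank(file_id, 'data2', rank)
allocate(dims(rank))
call hdf_get_dims(file_id, 'data2', dims)

allocate(data(dims(1), dims(2)))
call hdf_read_dataset(file_id, 'data2', data)
```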
It is sometimes convenient to write/read only a single column of a multidimensional array.
NOTE: FORTRAN is a column-major language. This means that the elements of each column are stored contiguously in memory, and therefore have the fastest access. The HDF5 library is written in C, which is row-major. Since the FORTRAN library is just a wrapper around the C library, the array written out to the HDF5 file is the transpose of the array we see in FORTRAN. So we are actually reading a 'row' along the last dimension of the dataset in the HDF5 file as a column vector in FORTRAN. Keep this in mind if you access this data from another language, like C or Python.
For instance, if a 2D array is a set of data vectors that we wish to perform a long operation on, we can read in one vector at a time and process it before moving on to the next one.
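A sketch of that pattern (assuming, per the table above, that `offset` selects the position along the trailing dimension; check the code comments for the exact form of the argument):

```fortran
integer :: j
integer :: dims(2)
real(8), allocatable :: vector(:)

call hdf_get_dims(file_id, 'data2', dims)
allocate(vector(dims(1)))

! read one column at a time and process it
do j = 1, dims(2)
   call hdf_read_vector_from_dataset(file_id, 'data2', j, vector)
   ! ... long operation on vector ...
end do
```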
Copyright (c) 2017-2019 Justin Erwin

Copyright (c) 2020 Han Luo