P. Greenfield, M. Droettboom, E. Bray. ASDF: A new data format for astronomy, Astronomy and Computing, Available online 26 June 2015.
We present the case for developing a successor format for the immensely successful FITS format. We first review existing alternative formats and discuss why we do not believe they provide an adequate solution. The proposed format is called the Advanced Scientific Data Format (ASDF) and is based on an existing text format, YAML, that we believe removes most of the current problems with the FITS format. An overview of the capabilities of the new format is given along with specific examples. This format has the advantage that it doesn’t limit the size of attribute names (akin to FITS keyword names) nor place restrictions on the size or type of values attributes have. Hierarchical relationships are explicit in the syntax and require no special conventions. Finally, it is capable of storing binary data within the file in its binary form. At its basic level, the format proposed has much greater applicability than for just astronomical data.
Keywords: FITS; File formats; Standards; World coordinate system
Note that this article appears to be part of a larger data formats issue being put together by Astronomy & Computing. See also Thomas et al 2015, 'Learning from FITS: Limitations in use in modern astronomical research'.
It's excellent that people are thinking about a new common file format for astronomy. FITS is currently running up against multiple usability problems, particularly regarding the expression of metadata and world coordinate systems.
As someone who's interested, but hasn't actually worked in the data storage field, I've long thought that HDF5 would be an ideal off-the-shelf format for astronomy to adopt. The HDF5 group has put a lot of effort into making HDF5 performant, with chunking / distributed data and parallel I/O. These aren't in ASDF, but I think they could be valuable as cameras become larger. HDSv5 is an example of an astronomy data format implemented on top of HDF5.
However, Greenfield et al discourage the adoption of HDF5 for several reasons, of which I highlight a few:
- HDF5 is not formally an archival format. The HDF5 Group's own library is the only software package that actually reads HDF5, so the format is essentially specified by the software. That said, the HDF5 library is essentially BSD licensed, so if you believe in the archivability of open source software, HDF5 isn't a terrible format.
- HDF5 doesn't have human-readable text metadata, in the sense that one can open a FITS header in less.
- Apparently HDF5 is not flexible enough to support the metadata required for arbitrary WCS. Namely, HDF5 doesn't have arbitrary mapping types (like a dictionary you might find in Python).
The ASDF format is fairly clever and practical. It uses YAML for metadata, and they give examples of how complex WCS transformations can be described. A nice feature is that the same WCS can be used to describe multiple images (e.g., an exposure and a noise map). One can also embed 'complex' data like plain text tables in the YAML section.
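To make the shared-WCS idea concrete, here's a rough sketch of my own (not from the paper) using plain Python dictionaries. The key names (`ctype`, `crpix`, and so on) are borrowed from FITS conventions purely for illustration and are not ASDF's actual schema. The point is only that two entries in a metadata tree can refer to one WCS description instead of carrying two copies; YAML itself can express this kind of sharing with anchors and aliases.

```python
# Hypothetical WCS description; keys are illustrative, not ASDF's schema.
wcs = {
    "ctype": ["RA---TAN", "DEC--TAN"],
    "crpix": [512.0, 512.0],
    "crval": [150.1, 2.2],
}

# Both images hold a reference to the *same* mapping object, not a copy.
tree = {
    "exposure": {"shape": [1024, 1024], "wcs": wcs},
    "noise_map": {"shape": [1024, 1024], "wcs": wcs},
}

# Refining the WCS once updates it for every image that references it.
tree["exposure"]["wcs"]["crval"] = [150.2, 2.3]

shared = tree["exposure"]["wcs"] is tree["noise_map"]["wcs"]
noise_crval = tree["noise_map"]["wcs"]["crval"]
```

After the update, `noise_crval` holds the refined values as well, because there is only one WCS object in the tree.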
Following the YAML metadata are zero or more binary data sections for images and tables.
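The "text header first, binary afterwards" layout can be sketched with a toy file format. To be clear, this is my own simplification under stated assumptions: the length-prefixed blocks and the helper names `write_toy_file` / `read_toy_file` are hypothetical, and the real ASDF block format has its own header fields that I'm not reproducing here. It only shows how a human-readable header and raw binary arrays can live in one file.

```python
import io
import struct
from array import array

# Toy delimiter: YAML's document-end marker on its own line.
HEADER_END = b"\n...\n"


def write_toy_file(stream, header_text, blocks):
    """Write a text header, then each block as <8-byte length><raw bytes>."""
    # header_text is assumed to end with a newline and not contain "\n...\n".
    stream.write(header_text.encode("utf-8"))
    stream.write(b"...\n")
    for block in blocks:
        payload = block.tobytes()  # native-endian float64 values
        stream.write(struct.pack(">Q", len(payload)))
        stream.write(payload)


def read_toy_file(stream):
    """Return (header_text, list of float64 arrays)."""
    data = stream.read()
    idx = data.index(HEADER_END)
    header = data[: idx + 1].decode("utf-8")  # keep the header's final newline
    rest = data[idx + len(HEADER_END):]
    blocks, offset = [], 0
    while offset < len(rest):
        (size,) = struct.unpack_from(">Q", rest, offset)
        offset += 8
        values = array("d")
        values.frombytes(rest[offset:offset + size])
        blocks.append(values)
        offset += size
    return header, blocks


# Round trip through an in-memory file.
header = "image:\n  block: 0\n  dtype: float64\n"
buf = io.BytesIO()
write_toy_file(buf, header, [array("d", [1.0, 2.5, 4.0])])
buf.seek(0)
text, blocks = read_toy_file(buf)
```

The header stays readable in a pager, while the pixel values are stored in their binary form rather than being encoded as text.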
There's an implementation of ASDF available in Python on GitHub. NumPy is used to support binary data, jsonschema drives the metadata, and Astropy drives the unit and WCS support.
I need to study the format more deeply, but at first blush it seems smart. My real concern is the implementation. I think we really need a C (or, why not, a Rust or Go) library implementation that has all the performance features of HDF5. This isn't a blocker for ASDF as an archival format, but it feels necessary given the big file sizes in astronomy and given that these features are already available with HDF5.
What does everyone else think?