Access table metadata without loading data #166

Open
LibrEars opened this issue Feb 20, 2023 · 8 comments

@LibrEars

LibrEars commented Feb 20, 2023

Hello =)

Is it possible to access only the metadata of, e.g., QTables stored in the .asdf file format without loading the data itself into memory? I tried to use QTable.read() with a find_table keyword, but tree["data"]["meta"] results in an error, since the metadata alone does not represent a table.

In the asdf documentation they state that "Array data remains unloaded until it is explicitly accessed". Therefore I thought it might be possible.

Here is a try using the example-data (QTable) from #118 :

# For the export of example data see issue 118
import asdf

# Is this only loading the meta-data into memory?
with asdf.open("Nr42_fluxgenerator.asdf") as af:
    meta = af["data"].meta

Or is a simple

meta = QTable.read("Nr42_fluxgenerator.asdf").meta 

already doing the job without reading the table data into memory? Otherwise, would it be useful to be implemented into astropy somehow?

@WilliamJamieson
Contributor

Hello =)

Is it possible to access only the metadata of, e.g., QTables stored in the .asdf file format without loading the data itself into memory?

Yes, and you correctly summarize this below:

In the asdf documentation they state that "Array data remains unloaded until it is explicitly accessed". Therefore I thought it might be possible.

Here is a try using the example-data (QTable) from #118 :

# For the export of example data see issue 118
import asdf

# Is this only loading the meta-data into memory?
with asdf.open("Nr42_fluxgenerator.asdf") as af:
    meta = af["data"].meta

This should enable you to read the metadata while lazy-loading the array data (assuming you aren't using some exotic file storage system).

Or is a simple

meta = QTable.read("Nr42_fluxgenerator.asdf").meta 

already doing the job without reading the table data into memory? Otherwise, would it be useful to be implemented into astropy somehow?

Without a much deeper investigation, I am not sure whether this will also lazy-load the table's arrays when you need them. Just looking at the code in asdf-astropy, I think this should also lazy-load the table, meaning the arrays won't get loaded by this access, but it depends on whether the astropy interface needs to actually access all the data passed to it.

@taldcroft do you know if this call

return column_class(
    data=data,
    name=node["name"],
    description=node.get("description"),
    unit=node.get("unit"),
    meta=node.get("meta"),
)
will perform any accesses to the data passed in or not? If it does not actually access the data in the array itself, then the data in the array will remain on disk until one attempts to actually access values in it.

@perrygreenfield
Member

@LibrEars can you supply the yaml header of this file for us to inspect? (Or the whole file, but it may be too large to easily transport or have proprietary info you don't want to send). This can be done by using the command line utility asdftool explode <filename> which will split the file up into the YAML header and individual array files.
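
If only the text part of the file is needed, it can also be pulled out with a few lines of standard-library Python. This is a minimal sketch, assuming the file follows the usual ASDF layout in which the YAML tree ends with a line containing only `...` before any binary blocks. The function name `read_asdf_yaml_header` and the `max_bytes` guard are made up for this illustration and are not part of asdf or asdftool.

```python
def read_asdf_yaml_header(path, max_bytes=16 * 1024 * 1024):
    """Return the text header of an ASDF file, up to and including the '...' line.

    Hypothetical helper for illustration only; it is not part of the asdf library.
    """
    header = bytearray()
    with open(path, "rb") as fh:
        # Binary mode: lines are split on b"\n". Reading stops at the YAML
        # end-of-document marker, so the binary array blocks are never decoded.
        for line in fh:
            header += line
            if line.rstrip(b"\r\n") == b"...":
                return header.decode("utf-8")
            if len(header) > max_bytes:
                # Assumed safety limit so a file without a marker is not read whole.
                break
    raise ValueError("no YAML document end marker ('...') found")
```

Calling `print(read_asdf_yaml_header("Nr42_fluxgenerator.asdf"))` should then print just the header text. Turning that text into Python objects would still need a YAML parser that tolerates the ASDF tags, which this sketch does not attempt.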

@LibrEars
Author

Thank you for your responses!

It is the same type of file as used for the example in #118 (you could use the code there to generate it). I cannot upload it to GitHub, since .asdf is not supported for upload.

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 2.13.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software: !core/software-1.0.0 {name: asdf, version: 2.13.0}
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://astropy.org/astropy/extensions/astropy-1.0.0
    software: !core/software-1.0.0 {name: asdf-astropy, version: 0.2.2}
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    software: !core/software-1.0.0 {name: asdf-astropy, version: 0.2.2}
data: !<tag:astropy.org:astropy/table/table-1.0.0>
  colnames: [Curren, Fluxgenerated_flux]
  columns:
  - !unit/quantity-1.1.0
    unit: !unit/unit-1.0.0 A
    value: !core/ndarray-1.0.0
      source: 0
      datatype: float64
      byteorder: little
      shape: [20]
  - !unit/quantity-1.1.0
    unit: !unit/unit-1.0.0 flx
    value: !core/ndarray-1.0.0
      source: 1
      datatype: float64
      byteorder: little
      shape: [20]
  meta: {Experimentalist: LibrEars, measurement_type: flux_of_fluxgenerator, nr: 42,
    pix: 7, temperature: 37, time: 'Wed Oct  5 09:59:08 2022', voltage: -2}
  qtable: true
...

@perrygreenfield
Member

Thanks @LibrEars. Can you give an example of the metadata from the table itself? If there is information there that isn't appearing in the above, then I'd say the current implementation can't avoid loading the data. We would have to look at whether it is possible to reorganize the storage of the tables so that it is possible.

@LibrEars
Author

By calling meta = QTable.read("Nr42_fluxgenerator.asdf").meta I only expect to receive the dictionary from the yaml header:

print(meta)

will yield

{'Experimentalist': 'LibrEars', 'measurement_type': 'flux_of_fluxgenerator', 'nr': 42, 'pix': 7, 'voltage': -2, 'time': 'Tue Feb 21 17:41:00 2023', 'temperature': 37}

@perrygreenfield
Member

I'm not sure I understand the requirements entirely. Is it essential that the metadata for the table actually be in the meta attribute of the QTable? Based on the other issue, the focus seemed to be on making the file easy to access for those not wanting to use Python. Those users won't be using QTable. As a workaround, one can create an ASDF node for the table that has both a meta and a qtable attribute, where meta has been populated with the qtable.meta contents (I don't know if you need the metadata attached to the QTable or not; you may, depending on the code using the tables). In saving the table this way, you will have duplicated metadata information. I don't know if that is a problem.

@perrygreenfield
Member

Another workaround is to open the ASDF file using the _force_raw_types=True option. This will return the tree without converting any of the contents to Python objects except the basic ones used by yaml. In that you would be able to access the meta attribute of the data without reading (I think, this would need to be checked) the actual data. The drawback of this is that the ASDF file would have to be reopened should one decide one wants to get the data.

@LibrEars
Author

Thank you, @perrygreenfield, for the response. I have written a Python module that first imports the metadata of all measurements into a table. This makes it possible to filter measurements by specific metadata (such as the measurement type, experimentalist, or experiment settings like temperature) and group them, so that only a subgroup of the data is imported and plotted in a later step. The source will be published soon, and I will link it here as a reference.
Many different file formats have been used in my lab for different measurement types, and the idea is to introduce asdf as the standard format for the future, since it natively handles metadata, physical units, and mathematical operations. Depending on the measurement type, the amount of data can be large, so lazy loading of the metadata in Python would make sense for the first step of importing all metadata.

With #118 I only want to make sure that adopting asdf would not exclude users who wish to use their own software that cannot read the binary part.
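
To make the filtering step described above concrete, here is a rough sketch of what such a metadata index could look like once the `meta` dictionaries have been collected. It is not the module mentioned above: the helper `filter_measurements` and the second index entry are hypothetical and exist only for illustration, while the first entry reuses a few of the metadata values shown earlier in this thread.

```python
def filter_measurements(index, **criteria):
    """Return the file names whose metadata matches every key=value criterion."""
    return [
        name
        for name, meta in index.items()
        if all(meta.get(key) == value for key, value in criteria.items())
    ]


# Index of file name -> metadata dict, as collected from each file's `meta`.
index = {
    "Nr42_fluxgenerator.asdf": {
        "Experimentalist": "LibrEars",
        "measurement_type": "flux_of_fluxgenerator",
        "nr": 42,
        "temperature": 37,
    },
    # Hypothetical second measurement, invented for this example.
    "Nr43_example.asdf": {
        "Experimentalist": "LibrEars",
        "measurement_type": "example_type",
        "nr": 43,
        "temperature": 20,
    },
}

selected = filter_measurements(
    index, measurement_type="flux_of_fluxgenerator", temperature=37
)
print(selected)  # ['Nr42_fluxgenerator.asdf']
```

Only the files in `selected` would then need to be opened again to load their array data.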
