Problems with data export from example file #58

Open
danolson1 opened this issue May 15, 2023 · 2 comments

@danolson1

When I import the EnzymeML_Template_Example.xlsm file into an EnzymeMLDocument using pyenzyme, and then try to look at the data, I get a dataframe with the "absorbance" and "concentration" data concatenated on top of each other. I have a couple of questions about this:

  1. Is this the expected behavior? I would have expected to get a dataframe that roughly corresponded to the input data from the Excel template file. In this case, one column for time, and two columns for the pyruvate (species s0) data, one corresponding to concentration, and one to absorbance.
  2. If you have a datatype for absorbance, is there any place to store information about the absorbance wavelength?
  3. From an EnzymeMLDocument object, how do I find the data type? I would have expected this information to be accessible from the measurement_dict object.

Regards,
Dan

@JR-1991
Member

JR-1991 commented May 16, 2023

Hi Dan! Thanks for opening the issue. Happy to answer your questions:

> Is this the expected behavior? I would have expected to get a dataframe that roughly corresponded to the input data from the Excel template file. In this case, one column for time, and two columns for the pyruvate (species s0) data, one corresponding to concentration, and one to absorbance.

This is expected behavior. It was implemented to support the modeling platforms we communicate with. I am happy to add a flag that disables this behavior and places the species columns side by side instead.

> If you have a datatype for absorbance, is there any place to store information about the absorbance wavelength?

At this point, there is no place in EnzymeML to store the wavelength of an absorbance measurement, but this is work in progress and will be implemented soon.

> From an EnzymeMLDocument object, how do I find the data type? I would have expected this information to be accessible from the measurement_dict object.

The data_type information is tied to the Replicate object, which is a container for the measured values of a species. The Measurement object, on the other hand, represents a set of Replicates and initial concentrations. Hence, you can access the individual data types by getting the replicates. Here is an example that uses the EnzymeML_Template_Example.xlsm spreadsheet:

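```python
import pyenzyme as pe

# Load the document from the spreadsheet template (adjust the path as needed)
enzmldoc = pe.EnzymeMLDocument.fromTemplate("EnzymeML_Template_Example.xlsm")

# Get the measurement with the id "m1"
measurement = enzmldoc.getMeasurement("m1")

# Get the reactant with the id "s0"
s0 = measurement.getReactant("s0")

# Finally, get all replicates and print their data types
for replicate in s0.replicates:
    print(replicate.data_type)

# Out:
#     DataTypes.ABSORPTION
#     DataTypes.CONCENTRATION
```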

Would you prefer that the DataFrame export filter by data type? That way, different data types would not be mixed in one column.
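
Until such a filter exists in the export, the replicates can be filtered by hand. The following is only a rough sketch of the idea, not pyenzyme code: `FakeReplicate` and `filter_replicates` are hypothetical names, and the string labels `"abs"` and `"conc"` are placeholders. With real Replicate objects the comparison would presumably be made against the `DataTypes` members printed above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FakeReplicate:
    """Hypothetical stand-in for a Replicate; only the fields used here."""
    id: str
    data_type: str
    time: List[float]
    data: List[float]


def filter_replicates(replicates: Iterable, keep: str) -> list:
    """Return only the replicates whose data_type equals `keep`."""
    return [rep for rep in replicates if rep.data_type == keep]


# Made-up values, purely for illustration
replicates = [
    FakeReplicate("r0", "abs", [0.0, 1.0], [0.10, 0.20]),
    FakeReplicate("r1", "conc", [0.0, 1.0], [5.0, 4.0]),
]

concentration_only = filter_replicates(replicates, keep="conc")
print([rep.id for rep in concentration_only])
```

The function only relies on a `data_type` attribute, so it should work on any object that exposes one.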

All the best,
Jan

@JR-1991 JR-1991 self-assigned this May 16, 2023
@JR-1991 JR-1991 added the question Further information is requested label May 16, 2023
@danolson1
Author

I think that only concentration data should be exported by default, since that is the data that is most likely to be used by people other than the creator of the EnzymeML document.

Without wavelength information, the absorbance data does not seem particularly useful. I agree that it should be saved for archival purposes, but this is one more reason to exclude it from the default export.

If the absorbance data is exported, it should be exported as a separate column. To me, having one column that contains both the expected concentration data and the unexpected (and differently scaled) absorbance data seems very confusing.
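
To make the proposal concrete, here is a minimal sketch of the layout I have in mind. It is not pyenzyme code: `to_wide`, the flat record tuples, and the `"abs"`/`"conc"` labels are all hypothetical, and it assumes one record per species and data type with a shared time axis.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical flat record: (species_id, data_type, time points, values)
Record = Tuple[str, str, List[float], List[float]]


def to_wide(records: Iterable[Record], include=("conc",)) -> Dict[str, List[float]]:
    """Build one time column plus one column per (species, data type).

    Only data types listed in `include` are exported; by default that is
    concentration only.
    """
    columns: Dict[str, List[float]] = {}
    for species_id, data_type, time, values in records:
        if data_type not in include:
            continue
        if "time" not in columns:
            columns["time"] = list(time)
        elif columns["time"] != list(time):
            raise ValueError("records do not share a common time axis")
        columns[f"{species_id}_{data_type}"] = list(values)
    return columns


# Made-up values, purely for illustration
records = [
    ("s0", "abs", [0.0, 1.0, 2.0], [0.30, 0.25, 0.20]),
    ("s0", "conc", [0.0, 1.0, 2.0], [10.0, 8.0, 6.0]),
]

default_export = to_wide(records)
full_export = to_wide(records, include=("conc", "abs"))
print(list(default_export))
print(list(full_export))
```

The resulting dict can be passed straight to `pandas.DataFrame(...)` to get the side-by-side columns.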
