Processing Synthetic Data with ESGPT #113

Open
sujaybanerjee opened this issue Jun 17, 2024 · 1 comment

@sujaybanerjee

In this section, when I run this code block, I get an error.

import subprocess

command = """
PYTHONPATH=$(pwd):$PYTHONPATH ./scripts/build_dataset.py \
    --config-path="$(pwd)/sample_data/" \
    --config-name=dataset \
    "hydra.searchpath=[$(pwd)/configs]" """

command_out = subprocess.run(command, shell=True, capture_output=True)
print(command_out.stdout.decode())

if command_out.returncode == 1:
    print("Command Errored!")

print(command_out.stderr.decode())

Here is the error message I get:

“$ PYTHONPATH=$(pwd):$PYTHONPATH python3 ./scripts/build_dataset.py --config-path="$(pwd)/sample_data/" --config-name=dataset "hydra.searchpath=[$(pwd)/configs]"
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/user/EventStreamGPT/./scripts/build_dataset.py", line 364, in main
    ESD = Dataset(config=config, input_schema=dataset_schema)
  File "/home/user/EventStreamGPT/EventStream/data/dataset_base.py", line 550, in __init__
    events_df, dynamic_measurements_df = self.build_event_and_measurement_dfs(
  File "/home/user/EventStreamGPT/EventStream/data/dataset_base.py", line 259, in build_event_and_measurement_dfs
    cls._process_events_and_measurements_df(
  File "/home/user/EventStreamGPT/EventStream/data/dataset_polars.py", line 356, in _process_events_and_measurements_df
    if len(df.columns) > 4:
  File "/home/user/.local/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 411, in columns
    return self._ldf.columns()
polars.exceptions.ComputeError: failed to determine supertype of cat and i64

This error occurred with the following context stack:
[1] 'select' failed
[2] 'with_columns' input failed to resolve
[3] 'drop' input failed to resolve
[4] 'with_columns' input failed to resolve
[5] 'drop' input failed to resolve
[6] 'filter' input failed to resolve
[7] 'filter' input failed to resolve
[8] 'with_columns' input failed to resolve
[9] 'drop' input failed to resolve
[10] 'filter' input failed to resolve
[11] 'select' input failed to resolve
[12] 'unique' input failed to resolve
[13] 'with row index' input failed to resolve

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.”

I am using Python 3.10.12 and polars 0.20.26. How can I fix this?
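
The error output above suggests setting HYDRA_FULL_ERROR=1 to get a complete stack trace. A minimal sketch of how the snippet could be rerun with that variable set is shown here. The helper name `run_with_full_hydra_error` is hypothetical and not part of ESGPT or Hydra, and the `echo` command is only a harmless stand-in for the real `build_dataset.py` command string.

```python
import os
import subprocess


def run_with_full_hydra_error(command: str) -> subprocess.CompletedProcess:
    """Run a shell command with HYDRA_FULL_ERROR=1 added to the environment.

    Hypothetical helper for illustration; not part of ESGPT or Hydra.
    """
    env = os.environ.copy()        # keep PATH, PYTHONPATH, and everything else
    env["HYDRA_FULL_ERROR"] = "1"  # the variable named in the Hydra error message
    return subprocess.run(
        command, shell=True, capture_output=True, env=env, text=True
    )


if __name__ == "__main__":
    # Stand-in command; replace it with the build_dataset.py command string.
    result = run_with_full_hydra_error('echo "HYDRA_FULL_ERROR=$HYDRA_FULL_ERROR"')
    print(result.stdout, end="")
    if result.returncode != 0:
        print("Command errored!")
        print(result.stderr)
```

Passing `env=` to `subprocess.run` scopes the variable to that one command, so the rest of the notebook session is unaffected.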

@mmcdermott
Owner

Hi @sujaybanerjee, this is a polars version issue. The main branch of ESGPT is only guaranteed to work with polars up to 0.18.15, as specified in the pyproject.toml file. Can you try this on the dev branch, which supports a much more recent version of polars?
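
For anyone hitting the same error, a quick way to confirm a version mismatch before running the build script could look like the sketch below. The helper names are hypothetical, the 0.18.15 bound is taken from the reply above rather than read from pyproject.toml, and the parser assumes plain `X.Y.Z` release strings with no pre-release suffix.

```python
from __future__ import annotations

from importlib import metadata

# Upper bound quoted in the reply above for the main branch (assumption:
# not read from pyproject.toml here).
MAX_MAIN_BRANCH_POLARS = "0.18.15"


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain 'X.Y.Z' release string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_supported(installed: str, max_supported: str) -> bool:
    """Return True if the installed version does not exceed the supported maximum."""
    return parse_version(installed) <= parse_version(max_supported)


def installed_version(package: str) -> str | None:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("polars")
    if found is None:
        print("polars is not installed in this environment")
    elif is_supported(found, MAX_MAIN_BRANCH_POLARS):
        print(f"polars {found} is within the range quoted for the main branch")
    else:
        print(f"polars {found} is newer than {MAX_MAIN_BRANCH_POLARS}; try the dev branch")
```

With the versions reported in this issue, `is_supported("0.20.26", "0.18.15")` evaluates to `False`, which matches the diagnosis above.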
