Error While Running the Synthesis Command #1

Open
chandinir opened this issue Jul 25, 2023 · 17 comments
@chandinir
Collaborator

Hi! I wanted to flag an error I've been running into while attempting to run the synthesize command ($ censyn --s synthesize.cfg) on my computer. I've attached a few screenshots below that show part of a much larger error message. It seems to be related to non-numeric data being present in the data file given. Let me know if you have any insight into why this could be happening, or if you need any more information from me! Thank you!

[Attached images: Screenshot 2023-07-25 at 2 47 48 PM, Screenshot 2023-07-25 at 2 47 35 PM, Screenshot 2023-07-25 at 2 47 22 PM]

@jjbam
Collaborator

jjbam commented Jul 25, 2023

Hello! Several members of our team are running into the same issue when running the synthesize command.

Best,
James

@rrod515
Contributor

rrod515 commented Jul 26, 2023

Can you paste the full backtrace? We need to see the CenSyn code that is ultimately causing the errors. You may need to set the number of cores in the synthesis config file to 1 to see all the errors.

@jjbam
Collaborator

jjbam commented Jul 26, 2023

[Attached images: Screenshot 2023-07-26 102445, Screenshot 2023-07-26 102531, Screenshot 2023-07-26 102555, Screenshot 2023-07-26 102704, Screenshot 2023-07-26 102717]

jjbam closed this as completed Jul 26, 2023
jjbam reopened this Jul 26, 2023

@rrod515
Contributor

rrod515 commented Jul 26, 2023

Seems like a Pandas issue. What version are you using?

In the future, you can copy and paste the error text directly in here and format it as a code block, like so:

In [2]: print(1/0)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-2-2fc232d1511a> in <module>
----> 1 print(1/0)

ZeroDivisionError: division by zero

@rrod515
Contributor

rrod515 commented Jul 26, 2023

You might also try forgoing the parquet conversion and using CSV as the input (the latest CenSyn version supports this). So change references to .parquet in the synthesis config to .csv.

@chandinir
Collaborator Author

chandinir commented Jul 26, 2023

Hi Rolando! I'm not sure about the other team, but I am using Python 3.8.17. I tried using the CSV as the input, but I am still getting the error mentioned above. Here's the full error:


(censynenv) chandiniramesh@Chandinis-MacBook-Pro-2 censyn % censyn --s synthesize.cfg
2023-07-26 09:27:15,308 MainProcess root INFO     Loading Features file conf/features_PUMS-P.json
/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/jsonschema/validators.py:1296: DeprecationWarning: The metaschema specified by $schema was not found. Using the latest draft to validate, but this will raise an error in the future.
  cls = validator_for(schema)
2023-07-26 09:27:15,369 MainProcess root INFO     Create Features size is 134
2023-07-26 09:27:15,377 MainProcess root INFO     Loading data file data/ss16pil.csv
2023-07-26 09:27:22,672 MainProcess root INFO     Load Data size 126334 rows by 128 columns
Traceback (most recent call last):
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1490, in array_func
    result = self.grouper._cython_operation(
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 959, in _cython_operation
    return cy_op.cython_operation(
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 657, in cython_operation
    return self._cython_op_ndim_compat(
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 497, in _cython_op_ndim_compat
    return self._call_cython_op(
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 541, in _call_cython_op
    func = self._get_cython_function(self.kind, self.how, values.dtype, is_numeric)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 173, in _get_cython_function
    raise NotImplementedError(
NotImplementedError: function is not implemented for this dtype: [how->mean,dtype->object]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/nanops.py", line 1692, in _ensure_numeric
    x = float(x)
ValueError: could not convert string to float

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/nanops.py", line 1696, in _ensure_numeric
    x = complex(x)
ValueError: complex() arg is a malformed string

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/bin/censyn", line 8, in <module>
    sys.exit(command_line_start())
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/__main__.py", line 7, in command_line_start
    cen_process.execute()
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/programs/censyn.py", line 47, in execute
    self._process.execute()
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/synthesis/synthesize.py", line 167, in execute
    self._data_df = self.report.time_function('Generate-data-time',
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/report/report.py", line 297, in time_function
    to_return = function(**kwargs)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/programs/censyn_base.py", line 256, in generate_data
    calc_s = feat.calculate_feature_data(data_df=data_df)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/features/feature.py", line 251, in calculate_feature_data
    return data_calculator.execute(data_df)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/checks/checks_data_calculator.py", line 92, in execute
    cur_result = obj_parser.execute()
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/checks/checks_parser.py", line 64, in execute
    self.parse()
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/checks/checks_parser.py", line 51, in parse
    tree = self._access_func()
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/checks/checks_peg.py", line 821, in _read_NumericExpression
    address3 = self._read_NumericValue()
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/checks/checks_peg.py", line 2864, in _read_NumericValue
    address1 = self._read_AddFactor()
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/checks/checks_peg.py", line 2979, in _read_AddFactor
    address1 = self._read_NumericFactor()
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/checks/checks_peg.py", line 3223, in _read_NumericFactor
    address1 = self._read_NumericFunctions()
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/checks/checks_peg.py", line 3303, in _read_NumericFunctions
    address0 = self._read_GroupByNumericFunc()
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/checks/checks_peg.py", line 5588, in _read_GroupByNumericFunc
    address0 = self._read_GroupByNumericFunc3()
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/checks/checks_peg.py", line 5635, in _read_GroupByNumericFunc3
    address0 = self._read_GroupByMeanFunc()
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/checks/checks_peg.py", line 6231, in _read_GroupByMeanFunc
    address0 = self._actions.groupby_mean_func(self._input, index1, self._offset, elements0)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/checks/checks.py", line 1336, in groupby_mean_func
    self.operation_groupby(elements=elements)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/censyn/checks/checks.py", line 349, in operation_groupby
    gb1 = gb_op()
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1855, in mean
    result = self._cython_agg_general(
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1507, in _cython_agg_general
    new_mgr = data.grouped_reduce(array_func)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 1503, in grouped_reduce
    applied = sb.apply(func)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 329, in apply
    result = func(self.values, **kwargs)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1503, in array_func
    result = self._agg_py_fallback(values, ndim=data.ndim, alt=alt)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1457, in _agg_py_fallback
    res_values = self.grouper.agg_series(ser, alt, preserve_dtype=True)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 994, in agg_series
    result = self._aggregate_series_pure_python(obj, func)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 1015, in _aggregate_series_pure_python
    res = func(group)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1857, in <lambda>
    alt=lambda x: Series(x).mean(numeric_only=numeric_only),
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/generic.py", line 11556, in mean
    return NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/generic.py", line 11201, in mean
    return self._stat_function(
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/generic.py", line 11158, in _stat_function
    return self._reduce(
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/series.py", line 4670, in _reduce
    return op(delegate, skipna=skipna, **kwds)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/nanops.py", line 96, in _f
    return f(*args, **kwargs)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/nanops.py", line 158, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/nanops.py", line 421, in new_func
    result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/nanops.py", line 727, in nanmean
    the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))
  File "/Users/chandiniramesh/opt/anaconda3/envs/censynenv/lib/python3.8/site-packages/pandas/core/nanops.py", line 1699, in _ensure_numeric
    raise TypeError(f"Could not convert {x} to numeric") from err
TypeError: Could not convertto numeric

@rrod515
Contributor

rrod515 commented Jul 26, 2023

What version of the pandas package are you using?

@chandinir
Collaborator Author

Ah, sorry! The pandas version is 2.0.3.

@rrod515
Contributor

rrod515 commented Jul 26, 2023

That's probably the problem. I am on 1.3.3, so we're a major version off. Can you try a conda environment that has python=3.10 and pandas=1.3.3 as I have?

@chandinir
Collaborator Author

What version of numpy do you have? I seem to be running into a problem with that when I designate the pandas version as 1.3.3.

@rrod515
Contributor

rrod515 commented Jul 26, 2023

numpy 1.21.2 pypi_0 pypi

@rrod515
Contributor

rrod515 commented Jul 27, 2023

Was able to recreate with a new environment with pandas=2.0.3, so it's likely a major version issue. Forcing pandas < 2.0.0 in the conda create step should probably fix things.
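
A fail-fast check along these lines can surface the mismatch before synthesis starts. This is only a sketch: `pandas_is_supported` is a hypothetical helper and not part of CenSyn, and the "below 2" bound is the working assumption from this thread rather than a documented requirement.

```python
from importlib import metadata  # standard library since Python 3.8


def pandas_is_supported(version: str, max_major: int = 2) -> bool:
    """Hypothetical helper: True when the major version is below max_major."""
    major = int(version.split(".")[0])
    return major < max_major


try:
    installed = metadata.version("pandas")
except metadata.PackageNotFoundError:
    print("pandas is not installed in this environment")
else:
    # Report the installed version and whether it falls under the assumed bound.
    status = "ok" if pandas_is_supported(installed) else "too new, try pandas<2.0.0"
    print(f"pandas {installed}: {status}")
```

With the versions mentioned above, `pandas_is_supported("1.3.3")` is true and `pandas_is_supported("2.0.3")` is false.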

@rrod515
Contributor

rrod515 commented Jul 27, 2023

The issue stems from "CalculateModel" models that are used to create new variables that are groupby means of other variables, for example:

-            "model_params": {
-                "expr": "mean(groupby(OCCP), 'PINCP')"
-            }

Removing variables from the configs that use these models will move past this particular error.

pandas changed the default argument settings for some of the groupby functions, which may be the source of the error (in particular the numeric_only option for groupby means). Unfortunately, fixing that across the whole package will probably mean digging into the config parsing, in particular the checks/checks_peg.peg file.
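
The groupby-mean behavior can be explored outside CenSyn with a toy frame. This is a sketch under assumptions: the column names OCCP and PINCP are borrowed from the config expression above, NOTE is a made-up string column, and how CenSyn builds the call internally is not shown in this thread. Both options below avoid relying on the default value of `numeric_only`.

```python
import pandas as pd

# Toy data: NOTE is a made-up string column standing in for any
# non-numeric feature that might sit next to the numeric ones.
df = pd.DataFrame(
    {
        "OCCP": [10, 10, 20, 20],
        "PINCP": [100.0, 300.0, 50.0, 150.0],
        "NOTE": ["a", "b", "c", "d"],
    }
)

# Option 1: select the target column before aggregating, so string
# columns never reach mean().
by_column = df.groupby("OCCP")["PINCP"].mean()

# Option 2: ask pandas to skip non-numeric columns explicitly instead of
# depending on the default, which differs between pandas 1.x and 2.x.
numeric_only = df.groupby("OCCP").mean(numeric_only=True)["PINCP"]

print(by_column.to_dict())
print(numeric_only.to_dict())
```

Selecting the column first is the narrower change, since it leaves the rest of the frame untouched.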

After removing the culprit variables, you will still get an error related to pandas no longer accepting sets as column listings. I will be making a pull request that fixes this and allows things to run.
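
For reference, the usual way around the set restriction is to convert the set to a list before handing it to pandas. The snippet below is an illustrative sketch only: the frame and the `wanted` set are made up, and the actual CenSyn call site and the content of the pull request are not shown in this thread.

```python
import pandas as pd

df = pd.DataFrame({"OCCP": [10, 20], "PINCP": [100.0, 50.0], "AGEP": [30, 40]})

# A set of wanted column names, as might result from de-duplicating
# feature dependencies (hypothetical example).
wanted = {"PINCP", "OCCP"}

# Convert to a list before indexing. Sorting gives a deterministic column
# order, which a set does not guarantee.
subset = df[sorted(wanted)]

print(list(subset.columns))
```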

@chandinir
Collaborator Author

I see, thank you Rolando! I was also able to run it using your versions of pandas and numpy, but with Python 3.8.

@rrod515
Contributor

rrod515 commented Jul 27, 2023

Great! Can you see if the changes I made in the pull request still allow things to run on 3.8? If so then I feel fine with merging them into the main branch.

@chandinir
Collaborator Author

Yup! I re-ran it with those changes and it looks like it ran fine with Python 3.8.

@rrod515
Contributor

rrod515 commented Jul 27, 2023

Okay, leaving this open to give others a chance to test.
