fixing various errors on the file classification_with_grn_and_vsn #2011

Closed


Humbulani1234
Contributor

Dataset preparation errors


The structured_data example classification_with_grn_and_vsn.py appears to use the wrong dataset: its data_url, https://archive.ics.uci.edu/static/public/20/census+income.zip, downloads an incorrect dataset. I believe the correct data_url is: https://archive.ics.uci.edu/static/public/117/census+income+kdd.zip

A fix has been added to extract the .tar.gz file created during the keras.utils.get_file download.
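As a rough illustration of the extraction step (not necessarily the code in this PR), the sketch below unpacks any .tar.gz file found in a directory using only the standard library. The helper name unpack_nested_archives is hypothetical, and the assumption that the archive sits directly inside the download directory may not match the actual layout.

```python
import shutil
from pathlib import Path


def unpack_nested_archives(directory):
    """Unpack every .tar.gz file found directly inside `directory`.

    Returns the list of archives that were unpacked.
    Hypothetical helper: the name and layout are assumptions.
    """
    directory = Path(directory)
    unpacked = []
    for archive in sorted(directory.glob("*.tar.gz")):
        # shutil.unpack_archive infers the format from the file extension.
        shutil.unpack_archive(str(archive), extract_dir=str(directory))
        unpacked.append(archive)
    return unpacked
```

Calling it on the directory that keras.utils.get_file populated would leave the extracted data files next to the archive.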

A fix was also added to clean up the directory that the files were extracted to during download, so that the script can be run again without errors.
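A minimal sketch of such a cleanup, assuming the extraction directory is known to the script. The helper name remove_extracted_dir is hypothetical and the PR may do this differently.

```python
import shutil
from pathlib import Path


def remove_extracted_dir(extract_dir):
    """Delete `extract_dir` and its contents; do nothing if it is absent.

    Returns True if a directory was removed, False otherwise.
    """
    extract_dir = Path(extract_dir)
    if extract_dir.is_dir():
        shutil.rmtree(extract_dir)
        return True
    return False
```

Because the function is a no-op when the directory does not exist, it can be called unconditionally before each download.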

Additionally, the original script has the code snippet:

train_data_path = os.path.join(
    os.path.expanduser("~"), ".keras", "datasets", "adult.data"
)
test_data_path = os.path.join(
    os.path.expanduser("~"), ".keras", "datasets", "adult.test"
)

The above snippet doesn't account for the directory created when keras.utils.get_file extracts census+income+kdd.zip, so both train_data_path and test_data_path point to the wrong location. A fix has been added.
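One way to avoid hard-coding the extraction directory is to search for the data files under the Keras cache. This is only a sketch: find_data_file is a hypothetical helper, and the file names in the commented-out usage lines are assumptions about the archive contents rather than something this PR confirms.

```python
import os


def find_data_file(root, filename):
    """Return the path of the first file named `filename` found under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            return os.path.join(dirpath, filename)
    raise FileNotFoundError(f"{filename!r} not found under {root!r}")


# Default Keras cache location, as used by the original snippet.
datasets_dir = os.path.join(os.path.expanduser("~"), ".keras", "datasets")

# Assumed file names; adjust them to what the archive actually contains.
# train_data_path = find_data_file(datasets_dir, "census-income.data")
# test_data_path = find_data_file(datasets_dir, "census-income.test")
```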

Additional training errors


Once the dataset preparation issues above are fixed, the script hits a further error during model training. The error and an attempted solution are detailed below:

2024-12-19 21:02:15.350619: W tensorflow/core/framework/op_kernel.cc:1816] OP_REQUIRES failed at cast_op.cc:122 : UNIMPLEMENTED: Cast string to float is not supported
2024-12-19 21:02:15.350683: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: UNIMPLEMENTED: Cast string to float is not supported
Traceback (most recent call last):
  File "/home/humbulani/tensorflow-env/keras-io-master/examples/structured_data/classification_with_grn_and_vsn.py", line 513, in <module>
    model.fit(
  File "/home/humbulani/tensorflow-env/env/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/humbulani/tensorflow-env/env/lib/python3.11/site-packages/tensorflow/python/framework/ops.py", line 5983, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.UnimplementedError: Exception encountered when calling Functional.call().

{{function_node __wrapped__Cast_device_/job:localhost/replica:0/task:0/device:CPU:0}} Cast string to float is not supported [Op:Cast] name:

Attempted solution:

I believe I have traced the error precisely. Here is the relevant pdb session:

> /home/humbulani/tensorflow-env/env/lib/python3.11/site-packages/keras/src/models/functional.py(245)_convert_inputs_to_tensors()
-> converted = []
(Pdb) p self._inputs
[<KerasTensor shape=(None,), dtype=float32, sparse=False, name=age>, <KerasTensor shape=(None,), dtype=float32, sparse=False, name=capital_gains>, <KerasTensor shape=(None,), dtype=float32, sparse=False, name=capital_losses>, ...]

(Pdb) p flat_inputs
[<tf.Tensor: shape=(265,), dtype=float32, numpy=
array([63., 52.,  2., 45.,  0., 43., 67., 26., 29., 53., 31., 59., 57.,...>, <tf.Tensor: shape=(265,), dtype=string, numpy=
array([b' Not in universe', b' Private', b' Not in universe', b' Private',
       b' Not in universe', b' Private', b' Not in universe',...>...]

The function _convert_inputs_to_tensors creates a zip iterator that pairs flat_inputs with self._inputs by position. As the pdb output above shows, the first element (age) has dtype float32 in both lists. The second element of self._inputs (capital_gains) also has dtype float32, but the second element of flat_inputs is a string tensor. The two lists are therefore out of step, and casting the string tensor to float32 raises the error.
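The position-based pairing described above can be imitated in plain Python. The sketch below does not call Keras. It uses (name, dtype) tuples as stand-ins for the model inputs and a list of dtype strings for the incoming tensors, and find_dtype_mismatches is a hypothetical helper.

```python
def find_dtype_mismatches(expected, actual):
    """Pair inputs by position and report entries whose dtypes differ.

    `expected` is a list of (name, dtype) pairs for the model inputs and
    `actual` is a list of dtypes in the order the tensors arrive.
    """
    mismatches = []
    for position, ((name, want), got) in enumerate(zip(expected, actual)):
        if want != got:
            mismatches.append((position, name, want, got))
    return mismatches


# Stand-ins for the first two entries seen in the pdb session.
model_inputs = [("age", "float32"), ("capital_gains", "float32")]
incoming = ["float32", "string"]
print(find_dtype_mismatches(model_inputs, incoming))
# → [(1, 'capital_gains', 'float32', 'string')]
```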

The main issue is that the inputs passed to Functional.call arrive as an OrderedDict, and in _standardize_inputs the line flat_inputs = tree.flatten(inputs) does not order/sort the OrderedDict by key, as I expected from the documentation of tree.flatten. This contributes to the mismatch between self._inputs (the model's inputs) and flat_inputs. Hence the fix, in the script's process function, converts the features to a plain dict.
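To illustrate the idea of aligning features with the model inputs by name rather than by position, here is a small sketch in plain Python. It is a stricter variant than simply converting the features to a dict, and it is not the code from this PR: order_features and the sample feature names are hypothetical.

```python
from collections import OrderedDict


def order_features(features, input_names):
    """Return a plain dict of `features` ordered like `input_names`."""
    missing = [name for name in input_names if name not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return {name: features[name] for name in input_names}


# Features in file (insertion) order, which differs from the model's order.
features = OrderedDict(
    [("age", [63.0]), ("class_of_worker", ["Private"]), ("capital_gains", [0.0])]
)
input_names = ["age", "capital_gains", "class_of_worker"]
aligned = order_features(features, input_names)
print(list(aligned))
# → ['age', 'capital_gains', 'class_of_worker']
```

With the keys in the same order as the model inputs, a position-based pairing lines each tensor up with the input of the same name.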

A fix is provided in this PR, and I think the behavior of tree.flatten should be assessed and corrected.

Environment


TensorFlow == 2.16.2
Keras == 3.7.0
Python == 3.11.10

@Humbulani1234
Contributor Author

Please find the generated .ipynb and .md files.

@Humbulani1234
Contributor Author

Will resend
