Validation Failure when Schema Contains Column with Empty List of Validations Objects #63

Open
lguntde opened this issue Aug 27, 2021 · 3 comments

lguntde commented Aug 27, 2021

I built a schema based on the list of columns I knew my DataFrame would contain. A number of these don't require validation beyond a check that they exist in the DataFrame (for example, a column containing a comment field of unspecified format). I built a schema in which I specified these columns as follows:

schema = Schema([
    ...,
    Column('name', []),
    ...
])

I then ran schema.validate(df) and received the following error:

Exception has occurred: AttributeError
'str' object has no attribute 'get_errors'

This traces back to:
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.9/site-packages/pandas_schema/column.py", line 27, in
return [error for validation in self.validations for error in validation.get_errors(series, self)]

Since the column instance in question has no validations to iterate over, it makes sense that this would fail.

My suggestion would be to include a check in the code that simply returns [] if no validations are present.

@multimeric
Owner

Seems like a reasonable request, I would accept a PR for this behaviour.

@multimeric
Owner

Hmm, on second thoughts, the issue goes deeper than this, I think. If you had no validations, this should return an empty list anyway. For example:

>>> [b for a in [] for b in a]
[]

However, for this error to occur you must actually have a string or several strings in your validations list. Please look into that in your code and report back here.
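To make the point above concrete, here is a minimal sketch that does not import pandas_schema at all. `MiniColumn` is a hypothetical stand-in that only copies the nested comprehension shown in the traceback, so it illustrates the mechanism rather than the library's actual class:

```python
class MiniColumn:
    """Hypothetical stand-in for a schema column (not the real pandas_schema class)."""

    def __init__(self, name, validations):
        self.name = name
        self.validations = validations

    def validate(self, series):
        # Same shape as the comprehension on line 27 of column.py in the traceback
        return [error
                for validation in self.validations
                for error in validation.get_errors(series, self)]


# An empty validations list is harmless: the outer loop never runs.
empty_result = MiniColumn('name', []).validate(['a', 'b'])
print(empty_result)

# A string in the validations list reproduces the reported AttributeError.
try:
    MiniColumn('name', ['some_validation']).validate(['a', 'b'])
except AttributeError as exc:
    message = str(exc)
    print(message)
```

If this matches what is happening, the thing to look for is a plain string (for example a validation name) ending up where a validation object is expected.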

ajithprabhakar commented Apr 7, 2022

I am also getting a similar error. Is there any plan to fix this issue? My use case is dynamic validation.

We have multiple saved schemas for different files. As files are uploaded, we create a schema dynamically by adding columns based on the schema saved in the DB, and then run the validation.
A CSV is generated with all the validation errors and presented to the user.
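One plausible cause in a setup like this (an assumption, since the schema-building code is not shown) is that validation names loaded from the database are passed along as plain strings instead of being turned into validation objects. The sketch below uses hypothetical names throughout (`NotEmptyValidation`, `VALIDATION_REGISTRY`, `build_validations`, `check_validations`) and simplifies the `column` argument to a string; none of it is part of pandas_schema:

```python
class NotEmptyValidation:
    """Hypothetical validation object exposing get_errors(series, column)."""

    def get_errors(self, series, column):
        # Here `column` is just a column name string, to keep the sketch small
        return [f"{column}: empty value at row {i}"
                for i, value in enumerate(series) if value == ""]


# Hypothetical registry: names as stored in the DB -> validation classes
VALIDATION_REGISTRY = {"not_empty": NotEmptyValidation}


def build_validations(saved_names):
    """Turn saved validation names into validation objects."""
    validations = []
    for name in saved_names:
        if name not in VALIDATION_REGISTRY:
            raise ValueError(f"Unknown validation name: {name!r}")
        validations.append(VALIDATION_REGISTRY[name]())
    return validations


def check_validations(validations):
    """Fail early, with a clear message, if an item is not a validation object."""
    for v in validations:
        if not callable(getattr(v, "get_errors", None)):
            raise TypeError(
                f"Expected a validation object, got {type(v).__name__}: {v!r}")
    return validations


validations = check_validations(build_validations(["not_empty"]))
errors = [e for v in validations for e in v.get_errors(["x", ""], "comment")]
print(errors)
```

Running a check like `check_validations` before constructing each column would surface a stray string at schema-build time rather than deep inside `validate`.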

It would be really great if you could fix this issue ASAP.

Here is the stack trace:

AttributeError Traceback (most recent call last)
<command-3981061697201222> in
----> 1 errors = schema.validate(sourceDf)

/databricks/python/lib/python3.7/site-packages/pandas_schema/schema.py in validate(self, df, columns)
84 # Iterate over each pair of schema columns and data frame series and run validations
85 for series, column in column_pairs:
---> 86 errors += column.validate(series)
87
88 return sorted(errors, key=lambda e: e.row)

/databricks/python/lib/python3.7/site-packages/pandas_schema/column.py in validate(self, series)
25 :return: An iterable of ValidationError instances generated by the validation
26 """
---> 27 return [error for validation in self.validations for error in validation.get_errors(series, self)]

/databricks/python/lib/python3.7/site-packages/pandas_schema/column.py in (.0)
25 :return: An iterable of ValidationError instances generated by the validation
26 """
---> 27 return [error for validation in self.validations for error in validation.get_errors(series, self)]


3 participants