Fix TypeError in write_csv function #69
Labels
bug
Something isn't working
Comments
Does fixing this issue mean we need a new release to get it out?
Yes, but for PyPI that is not too bad.
mdingemanse pushed a commit that referenced this issue on Sep 4, 2024. Its message repeats the issue description and adds "Closes #69".
Bug description:
write_csv raises a TypeError when metadata is provided in dict format.
Solution:
Edited the write_csv function so that it uses a process_element function, which is applied to each element and handles lists, dicts, and strings, or returns the value as is.
Error message:
TypeError Traceback (most recent call last)
in <cell line: 2>()
1 # Save the corpus as a .csv file locally
----> 2 Dutch_corpus.write_csv(path = "Dutch_corpus.csv")
8 frames
/usr/local/lib/python3.10/dist-packages/sktalk/corpus/write/writer.py in (x)
52 norm = pd.json_normalize(data=metadata, sep="_")
53 df = pd.DataFrame(norm)
---> 54 df[:] = np.vectorize(lambda x: ', '.join(
55 x) if isinstance(x, list) else x)(df)
56 return df
TypeError: sequence item 0: expected str instance, dict found
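The last line of the traceback can be reproduced without sktalk or pandas. The sketch below is only an illustration: the `value` list is a hypothetical stand-in for a metadata cell, not data from the corpus in question. It shows that `', '.join` fails as soon as the list it receives holds a dict instead of a string.

```python
# Hypothetical metadata cell: a list whose items are dicts, not strings
value = [{"name": "A"}, {"name": "B"}]

try:
    # Same call the lambda in writer.py makes for list values
    ", ".join(value)
    message = None
except TypeError as err:
    message = str(err)

print(message)  # sequence item 0: expected str instance, dict found
```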
Key Changes:
Added process_element function: This function handles three cases:
List: Joins the elements with ', '.
Dictionary: Converts the dictionary to a JSON string using json.dumps. Alternatively, you could convert the dictionary to a custom string format, e.g., by joining key-value pairs with a colon.
Other types: Returns the value as-is.
Replaced lambda with process_element: np.vectorize now applies this more robust function to each element in the DataFrame.
This approach should resolve the TypeError by correctly handling cases where elements in the DataFrame are dictionaries.
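A minimal sketch of what such a process_element function might look like follows; it is not the code from the actual commit. It assumes one detail the description does not state: non-string list items are serialized with json.dumps before joining, so that a list of dicts (the case in the traceback) no longer fails. The `row` dict and its keys are hypothetical example data.

```python
import json


def process_element(x):
    """Convert one cell value into something that can be written to CSV."""
    if isinstance(x, list):
        # Assumption: serialize non-string items so a list of dicts can be joined
        return ", ".join(
            item if isinstance(item, str) else json.dumps(item) for item in x
        )
    if isinstance(x, dict):
        # Dictionaries become a JSON string
        return json.dumps(x)
    # Any other type is returned as is
    return x


# Hypothetical metadata row covering the three cases
row = {
    "participants": ["A", "B"],
    "speakers": [{"id": 1}, {"id": 2}],
    "source": {"name": "demo"},
    "year": 2024,
}
cleaned = {key: process_element(value) for key, value in row.items()}
print(cleaned)
```

In writer.py, a function like this would take the place of the lambda passed to np.vectorize, as described above.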