Bug description:
write_csv raises a TypeError when metadata is provided in dict format.
Solution:
Edited the write_csv function to use a new "process_element" function that is applied to each element and handles lists, dicts, and strings, or returns the value as is.
Closes #69
Error message:
TypeError Traceback (most recent call last)
in <cell line: 2>()
1 # Save the corpus as a .csv file locally
----> 2 Dutch_corpus.write_csv(path = "Dutch_corpus.csv")
8 frames
/usr/local/lib/python3.10/dist-packages/sktalk/corpus/write/writer.py in (x)
52 norm = pd.json_normalize(data=metadata, sep="_")
53 df = pd.DataFrame(norm)
---> 54 df[:] = np.vectorize(lambda x: ', '.join(
55 x) if isinstance(x, list) else x)(df)
56 return df
TypeError: sequence item 0: expected str instance, dict found
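The failure can be reproduced outside of sktalk. The snippet below is a minimal sketch with a made-up metadata value (the `participants` list is hypothetical, not taken from the Dutch corpus): `str.join` only accepts strings, so a list that holds dicts triggers the same message as in the traceback.

```python
# Hypothetical metadata value: a list whose items are dicts, not strings.
participants = [{"id": "A"}, {"id": "B"}]

try:
    # Same operation as the lambda in writer.py applies to list values.
    ", ".join(participants)
    message = None
except TypeError as err:
    message = str(err)

print(message)
```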
Key Changes:
Added process_element function: This function handles three cases:
List: Joins the elements with ', '.
Dictionary: Converts the dictionary to a JSON string using json.dumps. Alternatively, you could convert the dictionary to a custom string format, e.g., by joining key-value pairs with a colon.
Other types: Returns the value as-is.
Replaced lambda with process_element: The np.vectorize now applies this more robust function to each element in the DataFrame.
This approach should resolve the TypeError by correctly handling cases where elements in the DataFrame are dictionaries.
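For illustration, the change described above could look roughly like the sketch below. It is not the exact code in this PR: the wrapper name `metadata_to_df`, the sample metadata, and the choice to also stringify dicts found inside lists are assumptions made so the example runs on its own. `otypes=[object]` is passed so that `np.vectorize` does not infer a string dtype from the first element and coerce numeric values.

```python
import json

import numpy as np
import pandas as pd


def process_element(x):
    """Turn one cell value into something that can be written to CSV."""
    if isinstance(x, list):
        # Assumption: items that are dicts are serialized before joining,
        # since ', '.join() only accepts strings.
        return ", ".join(
            json.dumps(item) if isinstance(item, dict) else str(item)
            for item in x
        )
    if isinstance(x, dict):
        return json.dumps(x)
    # Strings, numbers, None, etc. are returned as-is.
    return x


def metadata_to_df(metadata):
    """Hypothetical stand-in for the metadata step in writer.py."""
    norm = pd.json_normalize(data=metadata, sep="_")
    df = pd.DataFrame(norm)
    # Apply process_element to every cell; keep results as Python objects.
    flat = np.vectorize(process_element, otypes=[object])(
        df.to_numpy(dtype=object)
    )
    return pd.DataFrame(flat, index=df.index, columns=df.columns)


# Made-up metadata covering the three cases.
example = {
    "name": "Dutch",
    "tags": ["a", "b"],
    "participants": [{"id": "A"}, {"id": "B"}],
    "info": {"year": 2020},
}
result = metadata_to_df(example)
print(result.iloc[0].to_dict())
```

Note that `pd.json_normalize` already flattens nested dicts into columns such as `info_year`, so the dict branch mainly matters for dicts that survive normalization, for example inside lists.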