- updated _metadata_to_df function #70

Merged
merged 3 commits into from
Sep 4, 2024

Conversation

liesenf
Contributor

@liesenf liesenf commented Sep 4, 2024

Bug description:
write_csv encounters TypeError when metadata is provided in dict format.

Solution:
Edited write_csv so that it uses a "process_element" function, which loops over each element and handles lists, dicts, and strings, or returns the value as is.
Closes #69

Error message:

TypeError Traceback (most recent call last)
in <cell line: 2>()
1 # Save the corpus as a .csv file locally
----> 2 Dutch_corpus.write_csv(path = "Dutch_corpus.csv")

8 frames
/usr/local/lib/python3.10/dist-packages/sktalk/corpus/write/writer.py in (x)
52 norm = pd.json_normalize(data=metadata, sep="_")
53 df = pd.DataFrame(norm)
---> 54 df[:] = np.vectorize(lambda x: ', '.join(
55 x) if isinstance(x, list) else x)(df)
56 return df

TypeError: sequence item 0: expected str instance, dict found
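
The failure can be reproduced without sktalk or pandas: str.join only accepts strings, so a list that holds a dict raises the same TypeError. A minimal sketch, using a made-up metadata value rather than the actual corpus metadata:

```python
# Hypothetical metadata value: a list whose items are dicts, not strings
authors = [{"name": "A"}]

try:
    # Same operation the lambda in the traceback applies to list cells
    joined = ", ".join(authors)
except TypeError as err:
    joined = None
    message = str(err)

print(message)
```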

Key Changes:
- Added process_element function. This function handles three cases:
  - List: Joins the elements with ', '.
  - Dictionary: Converts the dictionary to a JSON string using json.dumps. Alternatively, you could convert the dictionary to a custom string format, e.g., by joining key-value pairs with a colon.
  - Other types: Returns the value as-is.
- Replaced lambda with process_element: np.vectorize now applies this more robust function to each element in the DataFrame.

This approach should resolve the TypeError by correctly handling cases where elements in the DataFrame are dictionaries.
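
The three cases described above could look roughly like the sketch below. This is not the merged code: only the name process_element and the use of json.dumps come from the description. Stringifying the items inside a list is an assumption, added so that lists of dicts (the case in the traceback) can be joined, and the sample row is hypothetical.

```python
import json


def process_element(x):
    """Turn one metadata cell into a CSV-friendly value."""
    if isinstance(x, list):
        # Assumption: stringify each item first, so that lists holding
        # dicts no longer break str.join.
        return ", ".join(
            json.dumps(item) if isinstance(item, dict) else str(item)
            for item in x
        )
    if isinstance(x, dict):
        return json.dumps(x)
    # Any other type is returned as-is
    return x


# Hypothetical flattened metadata row, standing in for one DataFrame row
row = {
    "source": "corpus_x",
    "languages": ["nl", "en"],
    "authors": [{"name": "A"}],
    "year": 2024,
}
cleaned = {key: process_element(value) for key, value in row.items()}
```

If the function is applied through np.vectorize as described, passing otypes=[object] is one way to keep the mixed return types (strings, numbers) from being coerced to a single dtype.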

@liesenf liesenf added the bug Something isn't working label Sep 4, 2024

sonarqubecloud bot commented Sep 4, 2024

@mdingemanse

mdingemanse commented Sep 4, 2024

looks like a simple enough change — but your branch is out of date with main. Want to merge/rebase?

/edit I have reviewed & approved, feel free to merge this PR

@mdingemanse mdingemanse merged commit 6fbe8f9 into main Sep 4, 2024
9 checks passed
@bvreede
Contributor

bvreede commented Sep 4, 2024

Great quick fix here!

The addition of a playground/ folder with a copied notebook is less standard though — consider removing this...

@liesenf
Contributor Author

liesenf commented Sep 5, 2024

@bvreede Thanks for spotting - deleted

I was having trouble remembering the setup you introduced a while back, with a development notebook that loads the package from a branch.

Was that in the private repo scikit-talk_benchmarking?

Successfully merging this pull request may close these issues.

Fix TypeError in write_csv function
Streamline sample Colab notebook & avoid (suppress?) errors
3 participants