Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Module to write to csv file #47

Merged
merged 40 commits into from
Dec 20, 2023
Merged

Module to write to csv file #47

merged 40 commits into from
Dec 20, 2023

Conversation

Morrizzzzz
Copy link
Contributor

@Morrizzzzz Morrizzzzz commented Nov 23, 2023

Closes #35

@Morrizzzzz Morrizzzzz marked this pull request as draft November 23, 2023 13:15
@bvreede bvreede mentioned this pull request Dec 1, 2023
6 tasks
@bvreede bvreede marked this pull request as ready for review December 19, 2023 14:14
@bvreede bvreede requested a review from carschno December 19, 2023 14:16
@bvreede
Copy link
Contributor

bvreede commented Dec 19, 2023

@carschno — if you want to try the new functionality (.write_csv()), take a look at the example notebook in the docs/ folder for demo code!

Copy link
Collaborator

@carschno carschno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some minor suggestions.

self,
utterances: list["Utterance"],
metadata: Optional[dict] = None,
suppress_warnings: bool = False # noqa: F821
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the noqa necessary here? I don't see why an F821 (undefined name) would be triggered here.

@property
def metadata_df(self):
"""Return the conversation metadata as a pandas dataframe."""
if not hasattr(self, "_metadata_df"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In such cases, I prefer to set the respective attribute to None in __init__():

self._metadata_df = None

Consequently, this check can be replaced:

Suggested change
if not hasattr(self, "_metadata_df"):
if self._metadata_df is None:

I find this a bit more explicit.

@property
def metadata_df(self):
"""Return the corpus metadata as a pandas dataframe."""
if not hasattr(self, "_metadata_df"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested in conversation.py, I would explicitly set self._metadata_df = None in __init__.py, and replace this check with:

Suggested change
if not hasattr(self, "_metadata_df"):
if self._metadata_df is None:

@property
def utterance_df(self):
"""Return the corpus utterances as a pandas dataframe."""
if not hasattr(self, "_utterance_df"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, preferably check for None and set explicitly in the constructor.

"""
_path = Path(path).with_suffix(".json")

object_dict = self.asdict()

with open(_path, "w", encoding='utf-8') as file:
json.dump(object_dict, file, indent=4)
print("Object saved to", _path)

def write_csv(self, path: str = "./file.csv"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor detail, but I find it slightly confusing that file.csv is the default, whereas that literal filename will never be used.
Given that two files are required, I would perhaps split the argument into two for making their respective purposes explicit: directory and file prefix (and optionally suffix too). With reasonably defaults, this should not add more load to the caller, e.g.:

Suggested change
def write_csv(self, path: str = "./file.csv"):
def write_csv(self, directory: str = ".", prefix: str = "file", suffix: str = ".csv"):

In the implementation, you can now construct the different files like this:

(Path(directory) / "_".join(prefix, specifier)).with_suffix(suffix)

with specifier being "metadata" or "utterances".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this comment! I like to keep the workflow for csv saving and json saving similar, so I'm keeping the argument structure, but I've opted to use the given filename for the utterance db, and add a _metdata suffix only for the metadata csv file.

tests/corpus/writer/test_writer.py Outdated Show resolved Hide resolved
Copy link

@bvreede bvreede merged commit b77821f into main Dec 20, 2023
8 checks passed
@bvreede bvreede deleted the csv_writer_version2 branch December 20, 2023 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add csv reader/parser & writer
3 participants