-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Module to write to csv file #47
Conversation
@carschno — if you want to try the new functionality ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some minor suggestions.
sktalk/corpus/conversation.py
Outdated
self, | ||
utterances: list["Utterance"], | ||
metadata: Optional[dict] = None, | ||
suppress_warnings: bool = False # noqa: F821 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the noqa
necessary here? I don't see why an F821
(undefined name) would be triggered here.
sktalk/corpus/conversation.py
Outdated
@property | ||
def metadata_df(self): | ||
"""Return the conversation metadata as a pandas dataframe.""" | ||
if not hasattr(self, "_metadata_df"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In such cases, I prefer to set the respective attribute to None
in __init__()
:
self._metadata_df = None
Consequently, this check can be replaced:
if not hasattr(self, "_metadata_df"): | |
if self._metadata_df is None: |
I find this a bit more explicit.
sktalk/corpus/corpus.py
Outdated
@property | ||
def metadata_df(self): | ||
"""Return the corpus metadata as a pandas dataframe.""" | ||
if not hasattr(self, "_metadata_df"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As suggested in conversation.py
, I would explicitly set self._metadata_df = None
in __init__.py
, and replace this check with:
if not hasattr(self, "_metadata_df"): | |
if self._metadata_df is None: |
sktalk/corpus/corpus.py
Outdated
@property | ||
def utterance_df(self): | ||
"""Return the corpus utterances as a pandas dataframe.""" | ||
if not hasattr(self, "_utterance_df"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, preferably check for None
and set explicitly in the constructor.
""" | ||
_path = Path(path).with_suffix(".json") | ||
|
||
object_dict = self.asdict() | ||
|
||
with open(_path, "w", encoding='utf-8') as file: | ||
json.dump(object_dict, file, indent=4) | ||
print("Object saved to", _path) | ||
|
||
def write_csv(self, path: str = "./file.csv"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor detail, but I find it slightly confusing that file.csv
is the default, whereas that literal filename will never be used.
Given that two files are required, I would perhaps split the argument into two for making their respective purposes explicit: directory and file prefix (and optionally suffix too). With reasonably defaults, this should not add more load to the caller, e.g.:
def write_csv(self, path: str = "./file.csv"): | |
def write_csv(self, directory: str = ".", prefix: str = "file", suffix: str = ".csv"): |
In the implementation, you can now construct the different files like this:
(Path(directory) / "_".join(prefix, specifier)).with_suffix(suffix)
with specifier
being "metadata"
or "utterances"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this comment! I like to keep the workflow for csv saving and json saving similar, so I'm keeping the argument structure, but I've opted to use the given filename for the utterance db, and add a _metdata
suffix only for the metadata csv file.
Quality Gate passedKudos, no new issues were introduced! 0 New issues |
Closes #35