Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Class version #126

Draft
wants to merge 2 commits into
base: develop
Choose a base branch
from
Draft

WIP: Class version #126

wants to merge 2 commits into from

Conversation

dalonsoa
Copy link
Collaborator

This PR suggest some exploratory work implementing the same functionality (the readers in this case) using classes and subclasses.

The reason for this change is:

  • Reduces some redundancy of the code, in particular when reading headers.
  • Has a slightly better separation of concerns.
  • Enables using the header to validate or do something else with the tabular data more easily in the future, implementing dedicated methods within each subclass.

No need to do a thorough review, just gathering some feedback on the approach, pros, cons and things to consider.

Copy link
Contributor

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I have much to offer, but I think this makes sense and could make things clearer. Is the plan to keep the API the same and have e.g. read_to_dataframe be a wrapper around a DataFrameReader class or whatever, or would users be using DataFrameReader directly?

One small idea for the read_data() method: if you have it returning an Iterable rather than a list then that would mean some implementations could return data one line at a time (e.g. it would work for your ListReader one), which would be better for a really big file.

In terms of the API in general, I personally think it would be nicer to pass in file objects instead of paths (or in addtion to!), which would also mean only taking one pass at the data. This is the same way that e.g. the PyYAML library and others do things. (I suggested it for writers in #28.) Aside from meaning users can pass in other file like objects, it also means the caller is responsible for opening and closing the file, which seems like a better separation of concerns. Just a (somewhat off-topic) thought.

@AdrianDAlessandro
Copy link
Contributor

I think this approach helps with keeping things consistent and not repeating code. There could be a more more functional way to do this, but I think this is maybe clearer than that would be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants