-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add JSONConverter
Component
#8397
Conversation
Pull Request Test Coverage Report for Build 11030215143Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
self, | ||
jq_schema: Optional[str] = None, | ||
content_key: Optional[str] = None, | ||
extra_meta_fields: Optional[List[str]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra_meta_fields: Optional[List[str]] = None, | |
extra_meta_fields: Optional[Set[str]] = None, |
""" | ||
Utility function to extract text and metadata from a JSON file. | ||
""" | ||
file_content = source.data.decode("utf-8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's wrap this in exception handling and handle decoding errors - malformed inputs are a common enough occurrence that it makes sense to rethrow/warn with an expanded error message (w/t the file path).
@component.output_types(documents=List[Document]) | ||
def run( | ||
self, | ||
sources: List[Union[str, Path, ByteStream]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sjrl the strings in here are meant to be paths like other converters. If there needs be in the future we can treat them as JSON strings too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know! I don't think there is any need for us to work directly with JSON strings right now. File paths is fine.
Pls add this to pydocs too 🙏 |
Looks great @silvanocerza! My main wish was that this component could also serve as basically a I think the only request/question I have is it possible to set one content key and then say all other fields should automatically be treated as metadata fields? We often have the case that data is messy (go figure) and not all files we are ingesting have the same metadata keys (but the same content key) and we'd want to make sure we grab all of them and needing to manually specify could be cumbersome. |
Going to push some docstring updates real quick 👀 |
Related Issues
Proposed Changes:
Add a new
JSONConverter
Component.Usage:
How did you test it?
I added unit tests.
Notes for the reviewer
N/A
Checklist
fix:
,feat:
,build:
,chore:
,ci:
,docs:
,style:
,refactor:
,perf:
,test:
.