-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfc: Add import/export feature #9
base: main
Are you sure you want to change the base?
rfc: Add import/export feature #9
Conversation
WalkthroughThis update introduces an Import/Export feature for DataHub. Users can export datasets in CSV format and import CSV files through the UI. The feature adds options in the SearchExtendedMenu dropdown to export individual datasets or all datasets within a container. An export modal collects CSV file details, and metadata is retrieved via GraphQL queries. For importing, the papaparse library processes CSV files and a new GraphQL mutation (upsertDataset) along with related input types manages dataset upsertion. Documentation has been updated accordingly. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant UI
participant GraphQL
participant CSV_Module
User->>UI: Clicks "Export" option in SearchExtendedMenu
UI->>User: Displays export modal for CSV file details
User->>UI: Submits CSV file name and export options
UI->>GraphQL: Fetches dataset metadata via GraphQL queries
GraphQL-->>UI: Returns dataset metadata
UI->>CSV_Module: Generates CSV file with required columns
CSV_Module-->>User: Provides CSV file download
sequenceDiagram
participant User
participant UI
participant Papaparse
participant GraphQL
User->>UI: Selects CSV file for import
UI->>Papaparse: Parses CSV file using papaparse library
Papaparse-->>UI: Returns parsed dataset information
UI->>GraphQL: Sends upsertDataset mutation(s) with dataset details
GraphQL-->>UI: Confirms upsertion of datasets
UI-->>User: Displays import result
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (11)
active/0000-import-export-feature/README.md (11)
1-4
: Metadata and RFC Details UpdateThe header section includes placeholder metadata (start date, RFC PR link, and implementation PR references). Ensure these fields are updated with the actual dates and links once available.
36-46
: CSV Columns Explanation – Typo CorrectionThe explanation of CSV columns is thorough. Note: In line 39, the word “assset” should be corrected to “asset.”
- - asset_type: What type of assset is contained in the row. This is either a dataset or schema field. + - asset_type: What type of asset is contained in the row. This is either a dataset or schema field.
48-52
: Export Button Description – CapitalizationIn line 50, the phrase “using a react effect” should capitalize “React” as it refers to the framework.
- This is done using a react effect, which greys out the button unless the URL of the current page contains the word "container". + This is done using a React effect, which greys out the button unless the URL of the current page contains the word "container".🧰 Tools
🪛 LanguageTool
[grammar] ~50-~50: “React” is a proper noun and needs to be capitalized.
Context: ...cannot be pressed. This is done using a react effect, which greys out the button unle...(A_GOOGLE)
53-57
: Modal Fields ClarityThe description of fields in the export modal is clear. Consider adding a note regarding the assumptions about the container structure and potential limitations for data sources that do not follow it.
70-73
: GraphQL Queries Header ConsistencyEnsure consistent capitalization by using “GraphQL” instead of “GraphQl” (e.g., in line 70).
- #### GraphQl queries + #### GraphQL queries
254-261
: File Input for ImportThe JSX snippet for the file upload input is concise. For improved accessibility, consider adding attributes like
aria-label
or a visually hidden label.
293-301
: Import Field Mappings – Typo CorrectionsThe defaults for fields on import are clearly listed. Important: In lines 295 and 298, the CSV field is referred to as
resrource
which should be corrected toresource
.- - `name`: The name is extracted from the dataset URN stored in the `resrource` CSV field. + - `name`: The name is extracted from the dataset URN stored in the `resource` CSV field.- - `platformUrn`: The platform URN is extracted from the dataset URN stored in the `resrource` CSV field. + - `platformUrn`: The platform URN is extracted from the dataset URN stored in the `resource` CSV field.🧰 Tools
🪛 LanguageTool
[uncategorized] ~295-~295: Loose punctuation mark.
Context: ...d with these values on import: -name
: The name is extracted from the dataset ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~298-~298: Loose punctuation mark.
Context: ...aName: An empty string. -
platformUrn`: The platform URN is extracted from the ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~299-~299: Loose punctuation mark.
Context: ...d in theresrource
CSV field. -type
:SchemaFieldDataType.Null
- `nativeDat...(UNLIKELY_OPENING_PUNCTUATION)
320-323
: Clarification on External ExtensionIn line 322, change “a extension” to “an extension” since “extension” begins with a vowel sound.
- It's also notable that a extension for DataHub does exist which adds very similar functionality... + It's also notable that an extension for DataHub does exist which adds very similar functionality...🧰 Tools
🪛 LanguageTool
[misspelling] ~322-~322: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ...th remediating. It's also notable that a extension for DataHub does exist which ...(EN_A_VS_AN)
328-333
: Future Work and CSV Enhancements – Typo CorrectionIn the future work section, specifically line 330, “uased” should be corrected to “used.” Additionally, consider clarifying how new CSV columns will impact the UX and data integrity.
- ... it could also be uased to store the sub types of datasets. + ... it could also be used to store the sub types of datasets.🧰 Tools
🪛 LanguageTool
[typographical] ~330-~330: The conjunction “so that” does not have a comma in front.
Context: ... corresponding columns to the CSV schema, so that we can populate those fields. Notably, ...(SO_THAT_UNNECESSARY_COMMA)
[misspelling] ~330-~330: This word is normally spelled as one.
Context: ...mn, it could also be uased to store the sub types of datasets. Additionally, the `glossa...(EN_COMPOUNDS_SUB_TYPES)
334-335
: Dataset-level Export RefactoringThere is a spelling mistake in line 334 where “laters” should be “layers.” Additionally, consider varying the phrasing here versus other similar statements in the document to reduce repetition.
- ... designed to only work with data sources with two laters of containers in DataHub. + ... designed to only work with data sources with two layers of containers in DataHub.🧰 Tools
🪛 LanguageTool
[style] ~334-~334: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...low, the dataset-level export will also need to be refactored to be more flexible, as a...(REP_NEED_TO_VB)
336-340
: Unresolved Questions – Style ImprovementThe unresolved questions section raises important points about flexibility and side effects. To improve readability, consider rephrasing some repetitive language (e.g., reducing repeated uses of “as such” and similar constructs).
🧰 Tools
🪛 LanguageTool
[style] ~338-~338: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...and as such, this component will likely need to be refactored to be more flexible. We w...(REP_NEED_TO_VB)
[style] ~338-~338: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...refactored to be more flexible. We will need to determine what shape the component shou...(REP_NEED_TO_VB)
[style] ~340-~340: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...rough GraphQL as a side effect. It will need to be evaluated whether this is an accepta...(REP_NEED_TO_VB)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
active/0000-import-export-feature/README.md
(1 hunks)
🧰 Additional context used
🪛 LanguageTool
active/0000-import-export-feature/README.md
[grammar] ~50-~50: “React” is a proper noun and needs to be capitalized.
Context: ...cannot be pressed. This is done using a react effect, which greys out the button unle...
(A_GOOGLE)
[uncategorized] ~295-~295: Loose punctuation mark.
Context: ...d with these values on import: - name
: The name is extracted from the dataset ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~298-~298: Loose punctuation mark.
Context: ...aName: An empty string. -
platformUrn`: The platform URN is extracted from the ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~299-~299: Loose punctuation mark.
Context: ...d in the resrource
CSV field. - type
: SchemaFieldDataType.Null
- `nativeDat...
(UNLIKELY_OPENING_PUNCTUATION)
[misspelling] ~322-~322: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ...th remediating. It's also notable that a extension for DataHub does exist which ...
(EN_A_VS_AN)
[typographical] ~330-~330: The conjunction “so that” does not have a comma in front.
Context: ... corresponding columns to the CSV schema, so that we can populate those fields. Notably, ...
(SO_THAT_UNNECESSARY_COMMA)
[misspelling] ~330-~330: This word is normally spelled as one.
Context: ...mn, it could also be uased to store the sub types of datasets. Additionally, the `glossa...
(EN_COMPOUNDS_SUB_TYPES)
[style] ~334-~334: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...low, the dataset-level export will also need to be refactored to be more flexible, as a...
(REP_NEED_TO_VB)
[style] ~338-~338: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...and as such, this component will likely need to be refactored to be more flexible. We w...
(REP_NEED_TO_VB)
[style] ~338-~338: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...refactored to be more flexible. We will need to determine what shape the component shou...
(REP_NEED_TO_VB)
[style] ~340-~340: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...rough GraphQL as a side effect. It will need to be evaluated whether this is an accepta...
(REP_NEED_TO_VB)
🔇 Additional comments (14)
active/0000-import-export-feature/README.md (14)
5-9
: Clear Feature SummaryThe feature title and summary clearly state the goal (import/export datasets via CSV) and the current status. Ensure that any further refinements or updates to the implementation are documented in this section.
11-14
: Motivation Section ClarityThe motivation section explains the rationale behind mimicking Collibra’s functionality and how it benefits enterprise users. Consider adding details on expected user benefits or quantitative metrics if available.
15-21
: Requirements Section VerificationThe requirements are explicitly listed for export and import functionality. This clarity is helpful. Ensure that any limitations (for example container-based assumptions) are clearly communicated in both documentation and UI, as noted later.
22-25
: Non-Requirements SectionThe non-requirements section is succinct and clearly specifies what is out of scope (i.e. REST API). No changes needed.
26-28
: Detailed Design IntroductionThe introduction to the detailed design outlines the overall approach for adding three options in the dropdown. This sets the stage well for the detailed sections that follow.
30-34
: CSV Columns DefinitionThe CSV block listing the column names is clear. Verify that the list of columns exactly matches the expected fields during both export and import.
58-68
: Detailed Export ProcessThe step-by-step outline of the export process is detailed and easy to follow. Ensure that error handling (e.g., when no datasets are found) is robust in the implementation.
74-159
: GraphQL Query: getDatasetByUrnThe query is comprehensive and correctly structured to fetch dataset metadata. Verify that client-side code consuming this query handles pagination (50 datasets per query) as described.
160-252
: GraphQL Query: getTableThis complementary query for fetching dataset details looks consistent with the first query. Ensure both queries are maintained together as updates in one may require changes in the other.
262-291
: New GraphQL Mutation and Input TypesThe new
upsertDataset
mutation and its corresponding input types are well defined. Check that the trailing comma inglobalTagUrns: [String],
(line 273) is acceptable per your GraphQL schema style guidelines.
304-309
: User Guide Section (“How we teach this”)This section outlines the essentials for a user guide effectively. Consider detailing common pitfalls or visual guides for using the feature.
310-313
: GraphQL Documentation Update ReminderThe note regarding updating GraphQL documentation is insightful. Keep the documentation in sync with schema changes to avoid confusion.
314-319
: Drawbacks and Alternatives DiscussionThe discussion on drawbacks and alternatives is comprehensive. Consider ensuring consistent styling for “React” across the document when referring to the UI framework.
324-327
: Rollout/Adoption Strategy ClarityThe rollout strategy is concise and confirms that no migration is needed. This clarity is beneficial.
OSS Review Recommendation: Proceed |
rfc: rename file and add PR link
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (7)
active/0009-import-export-feature/README.md (7)
1-4
: Review Header and PR Details:
The header clearly lists the start date, RFC PR URL, and the placeholder for implementation PR(s). Please consider formatting the bare URL on line 2 as a proper markdown link (e.g., using angle brackets or the link text syntax) to adhere to markdownlint guidelines. Also, if an implementation PR exists, including its link would improve clarity.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
2-2: Bare URL used
null(MD034, no-bare-urls)
58-69
: Export Process Steps Overview:
The step-by-step description of the export process is well laid out and easy to follow. It might be beneficial to include notes on error handling or fallback behavior if a step fails (for example, if the GraphQL query returns an unexpected result).
74-253
: GraphQL Queries Implementation:
The two GraphQL queries (getDatasetByUrn and getTable) are well-structured and clearly presented. One point to consider is the performance handling for containers with more than 50 datasets—monitoring and possibly refactoring the pagination and multiple CSV downloads might be useful in the future.
254-261
: File Input for Import:
The hidden file input element is implemented correctly. However, using inline styles (i.e., setting opacity directly) might limit maintainability. Consider moving these styles to an external CSS class, which would promote consistency and easier updates in the future.
293-301
: Field Defaults for Import Details:
The explanation of how missing CSV fields are supplemented with default values (e.g.,version: 0
,schemaName: ""
, etc.) is useful. Please review the punctuation and formatting in this section (specifically around lines 295–299) to remove any loose punctuation and ensure consistency.🧰 Tools
🪛 LanguageTool
[uncategorized] ~295-~295: Loose punctuation mark.
Context: ...d with these values on import: -name
: The name is extracted from the dataset ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~298-~298: Loose punctuation mark.
Context: ...aName: An empty string. -
platformUrn`: The platform URN is extracted from the ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~299-~299: Loose punctuation mark.
Context: ...ed in theresource
CSV field. -type
:SchemaFieldDataType.Null
- `nativeDat...(UNLIKELY_OPENING_PUNCTUATION)
330-330
: Typographical Correction Needed:
There appears to be a typographical error on line 330 where "uased" is used instead of "used." A simple fix is needed here.🧰 Tools
🪛 LanguageTool
[typographical] ~330-~330: The conjunction “so that” does not have a comma in front.
Context: ... corresponding columns to the CSV schema, so that we can populate those fields. Notably, ...(SO_THAT_UNNECESSARY_COMMA)
[misspelling] ~330-~330: This word is normally spelled as one.
Context: ...mn, it could also be uased to store the sub types of datasets. Additionally, the `glossa...(EN_COMPOUNDS_SUB_TYPES)
334-340
: Repetitive Phrasing in Future Work and Unresolved Questions:
The phrases such as “need to be refactored to be more flexible” are repeated across these sections. Consider rephrasing these areas to add variety and enhance readability.🧰 Tools
🪛 LanguageTool
[style] ~334-~334: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...low, the dataset-level export will also need to be refactored to be more flexible, as a...(REP_NEED_TO_VB)
[style] ~338-~338: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...and as such, this component will likely need to be refactored to be more flexible. We w...(REP_NEED_TO_VB)
[style] ~338-~338: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...refactored to be more flexible. We will need to determine what shape the component shou...(REP_NEED_TO_VB)
[style] ~340-~340: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...rough GraphQL as a side effect. It will need to be evaluated whether this is an accepta...(REP_NEED_TO_VB)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
active/0009-import-export-feature/README.md
(1 hunks)
🧰 Additional context used
🪛 LanguageTool
active/0009-import-export-feature/README.md
[uncategorized] ~295-~295: Loose punctuation mark.
Context: ...d with these values on import: - name
: The name is extracted from the dataset ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~298-~298: Loose punctuation mark.
Context: ...aName: An empty string. -
platformUrn`: The platform URN is extracted from the ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~299-~299: Loose punctuation mark.
Context: ...ed in the resource
CSV field. - type
: SchemaFieldDataType.Null
- `nativeDat...
(UNLIKELY_OPENING_PUNCTUATION)
[typographical] ~330-~330: The conjunction “so that” does not have a comma in front.
Context: ... corresponding columns to the CSV schema, so that we can populate those fields. Notably, ...
(SO_THAT_UNNECESSARY_COMMA)
[misspelling] ~330-~330: This word is normally spelled as one.
Context: ...mn, it could also be uased to store the sub types of datasets. Additionally, the `glossa...
(EN_COMPOUNDS_SUB_TYPES)
[style] ~334-~334: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...low, the dataset-level export will also need to be refactored to be more flexible, as a...
(REP_NEED_TO_VB)
[style] ~338-~338: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...and as such, this component will likely need to be refactored to be more flexible. We w...
(REP_NEED_TO_VB)
[style] ~338-~338: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...refactored to be more flexible. We will need to determine what shape the component shou...
(REP_NEED_TO_VB)
[style] ~340-~340: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...rough GraphQL as a side effect. It will need to be evaluated whether this is an accepta...
(REP_NEED_TO_VB)
🪛 markdownlint-cli2 (0.17.2)
active/0009-import-export-feature/README.md
2-2: Bare URL used
null
(MD034, no-bare-urls)
🔇 Additional comments (6)
active/0009-import-export-feature/README.md (6)
32-34
: CSV Columns Listing is Clear:
The CSV header block clearly lists all the column names required for the import/export feature. The format is concise and self-explanatory.
36-47
: Detailed CSV Columns Usage Explanation:
This section thoroughly explains the purpose and formatting of each CSV column. It provides valuable context for both datasets and schema fields. Consider adding examples in future revisions if more clarity is needed for edge cases.
50-57
: Export Modal Usage Details:
The explanation for when and how the export modal is activated is comprehensive. Please ensure that the assumptions about the container hierarchy (e.g., the expectation of a specific number of containers) are validated at runtime to avoid potential issues in diverse environments.
70-73
: GraphQL Queries Section Introduction:
This brief introduction to the GraphQL queries sets the stage nicely for the detailed queries that follow.
262-292
: GraphQL Mutation and Input Types Definitions:
The new GraphQL mutation (upsertDataset) along with its input types (DatasetUpsertInput, SchemaMetadataInput, and SchemaFieldInput) are clearly defined and documented. This helps ensure that the integration with GMS is well understood.
1-340
: Overall Documentation Quality:
This README provides a comprehensive overview of the new Import/Export feature for DataHub, covering motivation, design, detailed implementation, and future considerations. The level of detail is excellent; just ensure minor editorial fixes and consistency improvements are applied.🧰 Tools
🪛 LanguageTool
[uncategorized] ~295-~295: Loose punctuation mark.
Context: ...d with these values on import: -name
: The name is extracted from the dataset ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~298-~298: Loose punctuation mark.
Context: ...aName: An empty string. -
platformUrn`: The platform URN is extracted from the ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~299-~299: Loose punctuation mark.
Context: ...ed in theresource
CSV field. -type
:SchemaFieldDataType.Null
- `nativeDat...(UNLIKELY_OPENING_PUNCTUATION)
[typographical] ~330-~330: The conjunction “so that” does not have a comma in front.
Context: ... corresponding columns to the CSV schema, so that we can populate those fields. Notably, ...(SO_THAT_UNNECESSARY_COMMA)
[misspelling] ~330-~330: This word is normally spelled as one.
Context: ...mn, it could also be uased to store the sub types of datasets. Additionally, the `glossa...(EN_COMPOUNDS_SUB_TYPES)
[style] ~334-~334: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...low, the dataset-level export will also need to be refactored to be more flexible, as a...(REP_NEED_TO_VB)
[style] ~338-~338: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...and as such, this component will likely need to be refactored to be more flexible. We w...(REP_NEED_TO_VB)
[style] ~338-~338: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...refactored to be more flexible. We will need to determine what shape the component shou...(REP_NEED_TO_VB)
[style] ~340-~340: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...rough GraphQL as a side effect. It will need to be evaluated whether this is an accepta...(REP_NEED_TO_VB)
🪛 markdownlint-cli2 (0.17.2)
2-2: Bare URL used
null(MD034, no-bare-urls)
This RFC documents the code changes for the import/export feature, which has been requested by the DataHub team.
Summary by CodeRabbit