[JSON] Support Edit Values for JSON operator #1132

Open
ShihChun-H opened this issue Oct 9, 2024 · 18 comments · May be fixed by instill-ai/pipeline-backend#761
Labels: component, feature (New feature or request), hacktoberfest, hacktoberfest2024 (Component improvement issues for Hacktoberfest 2024), help-wanted (Help from the community is appreciated), improvement (Improvement on existing features), instill core

Comments

@ShihChun-H
Member

ShihChun-H commented Oct 9, 2024

Issue Description

Current State

  • It is very difficult to manipulate JSON data with the JSON operator.

Proposed Change

JSON schema pseudo code

JsonOperator:
  Task: Edit values
  
  Input:
    data: 
      type: object
      description: Original data, which can be a JSON object or array of objects.
    updates: 
      type: array
      description: An array of objects specifying the values to be updated.
      items:
        type: object
        properties:
          field: 
            type: string
            description: The field in the original data whose value needs to be updated, supports nested paths if "supportDotNotation" is true.
          newValue: 
            type: any
            description: The new value that will replace the current value at the specified field.

#   supportDotNotation:
#     type: boolean
#     default: true
#     description: Determines whether to interpret the field as a path using dot notation. If false, the field is treated as a literal key.
    conflictResolution:
      type: string
      enum: [create, skip, error]
      default: skip
      description: Defines how to handle cases where the field does not exist in the data. 
      
  Output:
    data:
      type: object
      description: The modified data with the specified values updated.

Edge Cases and Considerations:

1. Non-Existent Fields:

  • You might decide to skip updates where field does not exist in the data or log a warning for such cases.

2. Type Mismatches:

  • Ensure that the new value is compatible with the existing data structure (e.g., if replacing an array with an object, ensure the change is intended).

Key Features:

  1. field: This specifies the exact location of the value to be edited. With dot notation enabled, it accepts nested paths (e.g., address.city, or contacts.0.value for an array element); otherwise, the whole string is treated as a literal key.
  2. newValue: This is the new value that will replace the current value at the specified field.
  3. conflictResolution Parameter:
    a. create: If the field does not exist, the function creates it.
    b. skip: If the field does not exist, the function skips the update (default behavior).
    c. error: If the field does not exist, the function logs an error or returns an error.
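The field and conflictResolution behaviour above can be sketched in Go roughly as follows. This is only a minimal illustration, not the code from the PR: setPath is a hypothetical helper name, it assumes the data has already been decoded into map[string]any / []any, that "create" turns missing parents into objects, and that array slots are never created.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// setPath replaces the value at a dot-separated path inside JSON-decoded
// data (map[string]any and []any). mode is "create", "skip" or "error" and
// is consulted only when part of the path does not exist.
// Hypothetical helper for illustration; not the component's actual code.
func setPath(data any, path string, newValue any, mode string) error {
	missing := func() error {
		if mode == "error" {
			return fmt.Errorf("field '%s' does not exist", path)
		}
		return nil // "skip" (and any other mode): leave the data untouched
	}
	keys := strings.Split(path, ".")
	cur := data
	for i, key := range keys {
		last := i == len(keys)-1
		switch node := cur.(type) {
		case map[string]any:
			next, ok := node[key]
			if !ok {
				if mode != "create" {
					return missing()
				}
				if !last {
					// Assumption: missing parents are created as objects.
					next = map[string]any{}
					node[key] = next
				}
			}
			if last {
				node[key] = newValue
				return nil
			}
			cur = next
		case []any:
			idx, err := strconv.Atoi(key)
			if err != nil || idx < 0 || idx >= len(node) {
				return missing() // assumption: array slots are never created
			}
			if last {
				node[idx] = newValue
				return nil
			}
			cur = node[idx]
		default:
			return missing() // a scalar cannot be descended into
		}
	}
	return nil
}

func main() {
	data := map[string]any{
		"address":  map[string]any{"city": "Anytown"},
		"contacts": []any{map[string]any{"type": "phone", "value": "555-1234"}},
	}
	fmt.Println(setPath(data, "address.city", "Othertown", "skip"))  // existing field
	fmt.Println(setPath(data, "address.zipcode", "12345", "skip"))   // missing, skipped
	fmt.Println(setPath(data, "address.zipcode", "12345", "error"))  // missing, error
	fmt.Println(setPath(data, "address.zipcode", "12345", "create")) // missing, created
	fmt.Println(data["address"])
}
```

Editing in place keeps the sketch short; a real task would probably work on a copy so that a failed run leaves the input untouched.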

Type Checking:
Before updating a value, the type of the existing field is checked to ensure compatibility.
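One possible reading of this type check, sketched in Go under the assumption that values arrive as decoded by encoding/json (numbers as float64, arrays as []any, objects as map[string]any). The names jsonKind and compatible are made up for this illustration, and treating null as compatible with everything is also an assumption; the issue does not say what should happen on a mismatch.

```go
package main

import "fmt"

// jsonKind names the JSON type of a value as decoded by encoding/json.
func jsonKind(v any) string {
	switch v.(type) {
	case nil:
		return "null"
	case bool:
		return "boolean"
	case float64, int, int64:
		return "number"
	case string:
		return "string"
	case []any:
		return "array"
	case map[string]any:
		return "object"
	default:
		return "unknown"
	}
}

// compatible reports whether newValue has the same JSON type as oldValue.
// Assumption: null on either side is accepted.
func compatible(oldValue, newValue any) bool {
	return oldValue == nil || newValue == nil || jsonKind(oldValue) == jsonKind(newValue)
}

func main() {
	fmt.Println(compatible(float64(30), float64(31))) // number replaced by number
	fmt.Println(compatible(float64(30), "31"))        // number replaced by string
	fmt.Println(compatible([]any{}, map[string]any{})) // array replaced by object
}
```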

Example Usage:

Scenario: Input data as JSON object

// input
{
  "data": {
    "name": "John Doe",
    "age": 30,
    "address": {
      "street": "123 Main St",
      "city": "Anytown",
      "state": "CA"
    },
    "contacts": [
      {
        "type": "email",
        "value": "[email protected]"
      },
      {
        "type": "phone",
        "value": "555-1234"
      }
    ]
  },
  "updates": [
    {"field": "name", "newValue": "Jane Doe"},
    {"field": "address.city", "newValue": "Othertown"},
    {"field": "age", "newValue": 31},
    {"field": "address.zipcode", "newValue": "12345"},
    {"field": "contacts.0.value", "newValue": "[email protected]"}
  ],
//  "supportDotNotation": true,
  "conflictResolution": "skip"
}

Conflict Resolution Scenarios:

1. Skip (Default):

  • Field "name": "John Doe" is updated to "Jane Doe".
  • Field "address.city": "Anytown" is updated to "Othertown".
  • Field "age": 30 is updated to 31.
  • Field "address.zipcode": "zipcode" field does not exist initially. Since conflictResolution is "skip", the update is skipped, and "zipcode" is not added.
  • Field "contacts.0.value": "[email protected]" is updated to "[email protected]".

Final output:

{
  "name": "Jane Doe",
  "age": 31,
  "address": {
    "street": "123 Main St",
    "city": "Othertown",
    "state": "CA"
  },
  "contacts": [
    {
      "type": "email",
      "value": "[email protected]"
    },
    {
      "type": "phone",
      "value": "555-1234"
    }
  ]
}

2. Alternate Scenario with "conflictResolution": "create":
If you had set "conflictResolution": "create", the "zipcode" field would have been created in the "address" object, and the output would look like this:

Final output:

{
  "name": "Jane Doe",
  "age": 31,
  "address": {
    "street": "123 Main St",
    "city": "Othertown",
    "state": "CA",
    "zipcode": "12345"
  },
  "contacts": [
    {
      "type": "email",
      "value": "[email protected]"
    },
    {
      "type": "phone",
      "value": "555-1234"
    }
  ]
}

3. Error:
If conflictResolution is set to "error", the function returns (or logs) an error when it encounters a non-existent field during the update process. The update process stops as soon as the non-existent field is encountered, and the updates listed after it are not applied.
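The stop-at-first-missing-field behaviour could look like the Go sketch below. It is deliberately simplified to top-level keys, the names Update and applyStrict are hypothetical, and it assumes that updates applied before the failure stay applied (the issue does not say whether they should be rolled back).

```go
package main

import "fmt"

// Update mirrors one entry of the "updates" array (field and newValue).
type Update struct {
	Field    string
	NewValue any
}

// applyStrict applies updates in order to top-level keys and stops at the
// first field that does not exist, as the "error" mode describes.
// Assumption: earlier updates are kept, later ones are never applied.
func applyStrict(data map[string]any, updates []Update) error {
	for _, u := range updates {
		if _, ok := data[u.Field]; !ok {
			return fmt.Errorf("field '%s' does not exist", u.Field)
		}
		data[u.Field] = u.NewValue
	}
	return nil
}

func main() {
	data := map[string]any{"name": "John Doe", "age": 30}
	err := applyStrict(data, []Update{
		{"name", "Jane Doe"},
		{"zipcode", "12345"}, // missing: processing stops here
		{"age", 31},          // never reached
	})
	fmt.Println(err)
	fmt.Println(data["name"], data["age"])
}
```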

// In this case, the function returns an error, and the output would look something like this:
{
  "error": "Field 'address.zipcode' does not exist."
}

Scenario: Input Data as an Array of Objects
If the input data is an array of objects, the logic needs to be adapted to handle each object in the array individually. The schema and the function would process each object within the array according to the specified updates and conflictResolution rules.
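A minimal Go sketch of that dispatch, assuming the data is decoded JSON: a single object is edited directly, an array is edited element by element. editAll and the callback signature are illustrative names only, and rejecting arrays that contain non-objects is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// editAll runs edit on data when it is a single object, or on every element
// when it is an array of objects. Anything else is rejected.
func editAll(data any, edit func(obj map[string]any) error) error {
	switch d := data.(type) {
	case map[string]any:
		return edit(d)
	case []any:
		for i, item := range d {
			obj, ok := item.(map[string]any)
			if !ok {
				return fmt.Errorf("element %d is not a JSON object", i)
			}
			if err := edit(obj); err != nil {
				return fmt.Errorf("element %d: %w", i, err)
			}
		}
		return nil
	default:
		return fmt.Errorf("data must be an object or an array of objects, got %T", data)
	}
}

func main() {
	var data any
	raw := `[{"name": "John Doe", "age": 30}, {"name": "Jane Smith", "age": 28}]`
	if err := json.Unmarshal([]byte(raw), &data); err != nil {
		panic(err)
	}
	// The same edit is applied to each object in the array.
	err := editAll(data, func(obj map[string]any) error {
		obj["name"] = "Updated Name"
		return nil
	})
	out, _ := json.Marshal(data)
	fmt.Println(string(out), err)
}
```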

Input Example:

{
  "data": [
    {
      "name": "John Doe",
      "age": 30,
      "address": {
        "street": "123 Main St",
        "city": "Anytown",
        "state": "CA"
      },
      "contacts": [
        {
          "type": "email",
          "value": "[email protected]"
        },
        {
          "type": "phone",
          "value": "555-1234"
        }
      ]
    },
    {
      "name": "Jane Smith",
      "age": 28,
      "address": {
        "street": "456 Oak St",
        "city": "Othertown",
        "state": "NY"
      },
      "contacts": [
        {
          "type": "email",
          "value": "[email protected]"
        }
      ]
    }
  ],
  "updates": [
    {"field": "name", "newValue": "Updated Name"},
    {"field": "address.city", "newValue": "New City"},
    {"field": "contacts.0.value", "newValue": "[email protected]"},
    {"field": "age", "newValue": 29}
  ],
//  "supportDotNotation": true,
  "conflictResolution": "create"
}

Explanation:

  1. Field "name": Updates "name" to "Updated Name" for each object in the array.
  2. Field "address.city": Updates the "city" in the "address" for each object to "New City".
  3. Field "contacts.0.value": Updates the first contact's value to "[email protected]" in each object.
  4. Field "age": Updates the "age" field to 29 for each object.

Output:

{
  "data": [
    {
      "name": "Updated Name",
      "age": 29,
      "address": {
        "street": "123 Main St",
        "city": "New City",
        "state": "CA"
      },
      "contacts": [
        {
          "type": "email",
          "value": "[email protected]"
        },
        {
          "type": "phone",
          "value": "555-1234"
        }
      ]
    },
    {
      "name": "Updated Name",
      "age": 29,
      "address": {
        "street": "456 Oak St",
        "city": "New City",
        "state": "NY"
      },
      "contacts": [
        {
          "type": "email",
          "value": "[email protected]"
        }
      ]
    }
  ]
}

Rules for the Component Hackathon

  • Each issue will only be assigned to one person/team at a time.
  • You can only work on one issue at a time.
  • To express interest in an issue, please comment on it and tag @kuroxx, allowing the Instill AI team to assign it to you.
  • Ensure you address all feedback and suggestions provided by the Instill AI team.
  • If no commits are made within five days, the issue may be reassigned to another contributor.
  • Join our Discord to engage in discussions and seek assistance in #hackathon channel. For technical queries, you can tag @chuang8511.

Component Contribution Guideline | Documentation | Official Go Tutorial

@ShihChun-H ShihChun-H added help-wanted Help from the community is appreciated improvement Improvement on existing features feature New feature or request instill core component hacktoberfest2024 Component improvement issues for Hacktoberfest 2024 labels Oct 9, 2024
@Prototype4988

Hi Team, Is this issue currently open? I would like to work on this.

@ShihChun-H ShihChun-H changed the title [JSON] Support rename fields for JSON operator [JSON] Support Edit Values for JSON operator Oct 10, 2024
@kuroxx
Collaborator

kuroxx commented Oct 10, 2024

hey @Prototype4988 , sounds great! I will assign this issue to you.

@Prototype4988

Prototype4988 commented Oct 14, 2024

Just a few questions.

  1. Is this supposed to be a task added to the JSON operator? If so, could you confirm that the following need to be done:
  • Add the task in definition.json
  • Create the schema for the task in tasks.json
  • Implement the interface for the task in main.go
  2. Is there any way I can check that the task interface in main.go works without having to create a pipeline?

@kuroxx
Collaborator

kuroxx commented Oct 15, 2024

@chuang8511 could you please help answer the technical questions here 🙏

@chuang8511
Member

Is this supposed to be a task that needs to be added in the JSON operator? If it is, could you confirm that the following need to be done

Yes. We have actually already finished tasks.json and definition.json for you, so all you have to do is develop the Go code.

Is there any way I can check the working of the task interface in main.go without having to create a pipeline

Usually, I write test code. You can refer to the main_test.go in one of the other directories. That way, you do not have to create a pipeline and run an end-to-end test in the first place.

@Prototype4988

@chuang8511 I have raised a draft PR. Can you check and let me know what you think of it.

Draft PR Support Edit Values for JSON operator

@kuroxx
Collaborator

kuroxx commented Oct 23, 2024

Hey @Prototype4988 Our team has provided some feedback in your PR. Could you check and update soon, thanks!

Also don't forget to submit your contribution through this form to make it count: https://forms.gle/v3kdkKJKt8ZbSJYH6

@Prototype4988

@chuang8511 Just a quick question

I am having trouble creating a recipe for this component.

I have been trying something like this:

variable:
  test1: # Unique identifier for the variable.
    instill-format: # Format type, e.g., image, string, array:string.
    title: Json Input # Title of this input field.
    description: Test values

instill-format doesn't seem to support an object type.

Do you know of any recipe that takes a JSON value as input that I could refer to?

@chuang8511
Member

It seems there was a bug fix for this a few days ago.
Could you rebuild your instill-core?


Sample

# VDP Version
version: v1beta
variable:
  json-string:
    title: JSON
    instill-format: json

component:
  json-0:
    type: json
    input:
      json: ${variable.json-string}
    condition:
    task: TASK_MARSHAL  

output:
  result:
    title: Result
    value: ${json-0.output.string}

@Prototype4988

Prototype4988 commented Oct 25, 2024

@chuang8511
I am trying something like this. It looks like there is no support for instill-format: array:json.

Is there any workaround you could suggest?

variable:
  json-string:
    title: JSON
    instill-format: json
  updatearray:
    title: Update
    instill-format: array:json
  conflict:
    title: conflict Resolution
    instill-format: string
  supportdot:
    title: support supportDotNotation
    instill-format: boolean

component:
  json-0:
    type: json
    input:
      json: ${variable.json-string}
      updates: ${variable.updatearray}
      conflict-resolution: ${variable.conflict}
      supportDotNotation: ${variable.supportdot}
    task: TASK_EDIT_VALUES

output:
  result:
    title: Result
    value: ${json-0.output.string}

@kuroxx
Collaborator

kuroxx commented Oct 29, 2024

Hey @Prototype4988 could you check the comments left in your PR: instill-ai/pipeline-backend#761 and update accordingly so we can get this closed soon 🙏 thank you & happy hacking 🎃

@Prototype4988

@kuroxx I still have an open question, quoted below:

@chuang8511 I am trying something like this. looks like there is no support for instill-format: array:json.

Any workaround you could suggest.

variable:
  json-string:
    title: JSON
    instill-format: json
  updatearray:
    title: Update
    instill-format: array:json
  conflict:
    title: conflict Resolution
    instill-format: string
  supportdot:
    title: support supportDotNotation
    instill-format: boolean

component:
  json-0:
    type: json
    input:
      json: ${variable.json-string}
      updates: ${variable.updatearray}
      conflict-resolution: ${variable.conflict}
      supportDotNotation: ${variable.supportdot}
    task: TASK_EDIT_VALUES

output:
  result:
    title: Result
    value: ${json-0.output.string}

@chuang8511
Member

Hi @Prototype4988,
sorry for the late reply.

How about trying this?

# VDP Version
version: v1beta
variable:
  json-string:
    title: JSON
    instill-format: string

component:
  json-0:
    type: json
    input:
      string: ${variable.json-string}
    condition:
    task: TASK_UNMARSHAL  

  json-1:
    type: json
    input:
      json: ${json-0.output.json}
    condition:
    task: TASK_MARSHAL  

output:
  json-0:
    title: JSON 0
    value: ${json-0.output.json}
  json-1:
    title: JSON 1
    value: ${json-1.output.string}


@Prototype4988

@chuang8511 It works, but my question is: how do I pass an array of JSON objects as input? It looks like there is no support for instill-format: array:json.

@kuroxx
Collaborator

kuroxx commented Nov 4, 2024

Hi @Prototype4988 how's this going?

I wanted to let you know that we will need a PR by the end of this week (8th Nov) since we are closing this event.

Please submit:

to ensure your contribution is counted!

Alternatively, if you cannot complete this within the time frame but would still like to contribute, you are more than welcome to, but please note that it would not be within the scope of Hacktoberfest 2024.

Thank you and look forward to your contribution! ✨

@Prototype4988

Hi @kuroxx

There is a bug, quoted below, that is blocking development of the component:

@chuang8511 It works but my question is how do i pass array of json as input since it looks like there is no support for instill-format: array:json

I will try to complete it by the end of the week, provided there is a resolution or workaround for the bug.

@chuang8511 do you have any suggestions on the issue?

@kuroxx
Collaborator

kuroxx commented Nov 5, 2024

Hey @Prototype4988

Could you check the documentation about Instill Format here.

In general, you do not need to specify the array, you can just set it as json.

For example:

Input:

[ 
  { "name": "Jane",  "surname": "Doe" },
  { "name": "Bob", "surname": "Woe"}
]

Recipe:

# VDP Version
version: v1beta

variable:
  json-input: 
    title: json-input
    instill-format: json
    description: "Your JSON input."

output:
  json-output:
    title: json-output
    value: ${json-1.output.string}

component:
  json-1:
    type: json
    input:
      json: ${variable.json-input}
    condition:
    task: TASK_MARSHAL  

Result: [screenshot]

You can read about how to use JSON Marshal in the Documentation here too

Let me know if this answers your questions ✨
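On the Go side, the reason a single json-typed input can carry both shapes is that encoding/json decodes into any: a top-level array becomes []any and a top-level object becomes map[string]any. The sketch below only illustrates that with the standard library; decode is a hypothetical helper, and how the component actually receives its input inside Instill Core is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decode parses raw JSON and reports the shape of the top-level value.
func decode(raw string) (any, string, error) {
	var v any
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return nil, "", err
	}
	switch v.(type) {
	case []any:
		return v, "array", nil
	case map[string]any:
		return v, "object", nil
	default:
		return v, "scalar", nil
	}
}

func main() {
	inputs := []string{
		`[{"name": "Jane", "surname": "Doe"}, {"name": "Bob", "surname": "Woe"}]`,
		`{"name": "Jane", "surname": "Doe"}`,
	}
	for _, raw := range inputs {
		_, shape, err := decode(raw)
		fmt.Println(shape, err)
	}
}
```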

@kuroxx
Collaborator

kuroxx commented Nov 8, 2024

Hey @Prototype4988 could you address the comments in your PR today?

If not, then Hacktoberfest 2024 submissions will be closed but we look forward to your PR when it is ready 🙌✨

Update: as discussed in Discord, let's follow up on this after the event 😎

Projects
Status: In Progress

4 participants