How to train NER and Relation Extraction model together #9139

karndeepsingh · 2021-09-04T08:22:41Z

karndeepsingh
Sep 4, 2021

Hi,
I want to train NER and Relation Extraction models together instead of training them separately. Please help me understand the flow from scratch. These are my questions:

How do I prepare a dataset for NER and relations? I am using Label Studio to annotate NER + relations, but I don't know how to convert the output into spaCy's binary format. Here is an example of the output when I annotate in Label Studio.

[
  {
    "id": 6,
    "annotations": [
      {
        "id": 3,
        "completed_by": {
          "id": 2,
          "email": "[email protected]",
          "first_name": "",
          "last_name": ""
        },
        "result": [
          {
            "value": {
              "start": 9,
              "end": 63,
              "text": "Synergy One Lending, Inc. dba Mutual of Omaha Mortgage",
              "labels": [
                "PARTY NAME"
              ]
            },
            "id": "fkVqYYR_P7",
            "from_name": "label",
            "to_name": "text",
            "type": "labels"
          },
          {
            "value": {
              "start": 0,
              "end": 8,
              "text": "BORROWER",
              "labels": [
                "PARTY ROLE"
              ]
            },
            "id": "62m6jvJopr",
            "from_name": "label",
            "to_name": "text",
            "type": "labels"
          },
          {
            "value": {
              "start": 64,
              "end": 102,
              "text": "5716 Corsa Avenuew, Sulle 102 Westlake",
              "labels": [
                "PARTY ADDRESS"
              ]
            },
            "id": "ZMBV98QR7N",
            "from_name": "label",
            "to_name": "text",
            "type": "labels"
          },
          {
            "from_id": "62m6jvJopr",
            "to_id": "fkVqYYR_P7",
            "type": "relation",
            "direction": "right",
            "labels": [
              "ROLE"
            ]
          },
          {
            "from_id": "ZMBV98QR7N",
            "to_id": "fkVqYYR_P7",
            "type": "relation",
            "direction": "right",
            "labels": [
              "ADDRESS"
            ]
          }
        ],
        "was_cancelled": false,
        "ground_truth": false,
        "created_at": "2021-09-04T08:14:02.157201Z",
        "updated_at": "2021-09-04T08:14:02.157201Z",
        "lead_time": 80162.164,
        "prediction": {},
        "result_count": 0,
        "task": 6
      }
    ],
    "predictions": [],
    "file_upload": "New_Text_Document_2_E6Don9E.txt",
    "data": {
      "text": "BORROWER Synergy One Lending, Inc. dba Mutual of Omaha Mortgage 5716 Corsa Avenuew, Sulle 102 Westlake village."
    },
    "meta": {},
    "created_at": "2021-09-03T09:54:27.365638Z",
    "updated_at": "2021-09-03T09:54:27.365638Z",
    "project": 6
  }
]

Above is the output file when I annotate relations. Please tell me which key-value pairs are required to convert it into spaCy's binary format.
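As a rough sketch of which keys matter in an export like the one above: the text lives under `data.text`, entity spans are the `result` items with `"type": "labels"` (their `value.start`, `value.end`, `value.labels` and `id`), and relations are the items with `"type": "relation"` (their `from_id`, `to_id` and `labels`). The helper name `parse_label_studio_task` and the choice to read only the first annotation are assumptions made for illustration, not part of Label Studio or spaCy.

```python
def parse_label_studio_task(task):
    """Return (text, entities, relations) for one exported Label Studio task.

    entities:  list of (start_char, end_char, label)
    relations: list of (head_index, child_index, label), indexing into entities
    """
    text = task["data"]["text"]
    # Assumption: each task has one annotation; only the first is read.
    results = task["annotations"][0]["result"]

    entities = []
    index_by_id = {}  # Label Studio region id -> position in `entities`
    for item in results:
        if item["type"] == "labels":
            value = item["value"]
            index_by_id[item["id"]] = len(entities)
            # Assumption: one label per span, so take the first.
            entities.append((value["start"], value["end"], value["labels"][0]))

    relations = []
    for item in results:
        if item["type"] == "relation":
            head = index_by_id[item["from_id"]]
            child = index_by_id[item["to_id"]]
            for label in item.get("labels", []):
                relations.append((head, child, label))
    return text, entities, relations


# Trimmed-down task with the same shape as the export above.
example_task = {
    "data": {"text": "BORROWER Synergy One Lending, Inc."},
    "annotations": [{"result": [
        {"id": "a", "type": "labels",
         "value": {"start": 9, "end": 34, "labels": ["PARTY NAME"]}},
        {"id": "b", "type": "labels",
         "value": {"start": 0, "end": 8, "labels": ["PARTY ROLE"]}},
        {"type": "relation", "from_id": "b", "to_id": "a", "labels": ["ROLE"]},
    ]}],
}
text, entities, relations = parse_label_studio_task(example_task)
```

The remaining fields in the export (`completed_by`, `lead_time`, timestamps and so on) are not needed for building training data.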

Do I also need to save a separate annotated dataset for NER? Since I want to combine NER + relations, do I need to prepare a separate annotated dataset for NER, or will this relation-annotated dataset also work?
After the annotated dataset is converted to spaCy's binary format, what changes should I make to the files to get NER + relations trained together?

Please help me to understand the flow. It will be a great help.
Thanks

polm · 2021-09-05T06:41:03Z

polm
Sep 5, 2021

Don't use all capitals when creating an issue.

As mentioned in the training data section of the docs, training data in spaCy just consists of making the Doc objects you want to see as output from your model. One thing that's a little tricky is that you can't specify different training data for different components: if you want to train multiple components at the same time, you need to give the same data to all of them. In this case this means that your training data needs both relation annotations and NER annotations on each individual doc. Because training data for relation extraction already includes entity labels, you should just be able to use your relation extraction training data as is for NER too.

I think you've already trained the components separately, but the NER annotations go in doc.ents and the relations go in doc._.rel. See the relation extraction sample project for an example.
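A minimal sketch of putting both kinds of annotation on one Doc, assuming character offsets for entities and relations given as pairs of entity indices. The helper `build_doc` is hypothetical. The `rel` extension name and the `(head_token_start, child_token_start) -> {label: score}` layout follow the relation extraction sample project as best I recall it; as far as I remember, that project's parse script also fills in 0.0 for unannotated entity pairs, which this sketch skips, so check it before relying on this format.

```python
import spacy
from spacy.tokens import Doc, DocBin

# The custom attribute has to be registered before doc._.rel can be set.
if not Doc.has_extension("rel"):
    Doc.set_extension("rel", default={})


def build_doc(nlp, text, entities, relations):
    """entities: (start_char, end_char, label); relations: (head_idx, child_idx, label)."""
    doc = nlp.make_doc(text)
    spans = []
    for start, end, label in entities:
        # char_span returns None when offsets do not fall on token boundaries.
        span = doc.char_span(start, end, label=label)
        if span is None:
            raise ValueError(f"Offsets {start}-{end} do not align with tokens")
        spans.append(span)
    doc.ents = spans  # NER annotations

    rels = {}
    for head_i, child_i, label in relations:
        key = (spans[head_i].start, spans[child_i].start)
        rels.setdefault(key, {})[label] = 1.0
    doc._.rel = rels  # relation annotations
    return doc


nlp = spacy.blank("en")
text = "BORROWER Synergy One Lending, Inc."
entities = [(0, 8, "PARTY ROLE"), (9, 34, "PARTY NAME")]
relations = [(0, 1, "ROLE")]
doc = build_doc(nlp, text, entities, relations)

# store_user_data=True keeps the custom doc._.rel attribute when serializing.
doc_bin = DocBin(store_user_data=True)
doc_bin.add(doc)
# doc_bin.to_disk("train.spacy")  # hypothetical output path
```

Because every Doc carries both `doc.ents` and `doc._.rel`, the same `.spacy` file can serve as training data for both components.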

For some components you can provide special annotations that are to just be ignored, usually by giving a "-" label. This works for NER but I don't think there's anything equivalent for relation extraction. This means you should annotate all your data with both NER and relations.

For the config, you just need to specify NER and relation extraction sections. The components should be in the pipeline in that order. You can see what a pipeline with multiple components looks like by using the Quickstart and specifying NER and textcat, for example. You might also want to experiment with specifying NER as an annotating component, since relation extraction uses the entity annotations.
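A fragment of what the relevant config sections might look like. This is not a complete config: the model and hyperparameter blocks are omitted, and the `relation_extractor` factory name is taken from the relation extraction sample project as I remember it, so it only resolves if that project's custom code is loaded. `annotating_components` is the spaCy training setting (v3.1 and later) that lets a component's predictions be set on the Doc for later components during training.

```ini
[nlp]
lang = "en"
# NER comes before relation extraction, which reads doc.ents.
pipeline = ["tok2vec","ner","relation_extractor"]

[components.ner]
factory = "ner"

[components.relation_extractor]
# Custom factory from the sample project (assumed name).
factory = "relation_extractor"

[training]
# Optional experiment: let the relation extractor see predicted entities.
annotating_components = ["ner"]
```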

0 replies

karndeepsingh · 2021-09-05T07:42:06Z

karndeepsingh
Sep 5, 2021
Author

Hi @polm ,
Thanks for the detailed answer. Does the structure of the annotation output I shared, from labelling a single text in Label Studio, look good to you? If this output structure is fine and no additional changes are required, can I directly convert this annotation JSON file to a .spacy binary file?

I am unable to tell whether this annotation structure is fine or whether it needs changes before I convert this annotated JSON output to the .spacy binary format.

20 replies

Rithik-Alias Oct 27, 2021

Hey @karndeepsingh, did you get the error solved? I am also facing the same issue.

polm Oct 27, 2021

@Rithik-Alias There are a lot of different things happening in this thread, so it's not clear what you're referring to. It would be better for you to open a new Discussion and clearly state what error you're seeing and what caused it.

Rithik-Alias Oct 27, 2021

@polm Thank you for replying to it. I have started a new discussion : #9551

panchalsp Mar 29, 2023

@polm you're referring to this script, right?

Does this script also prepare the data for NER, or is it just for relations? I want to train NER + relations, so how can both annotations be added? Does this script already consider NER + relation annotations? If not, what should I add in order to get a NER + relation binary file generated?

Hey @karndeepsingh @polm Did this script work for both NER + Relation?

Faizan-Siddique Sep 25, 2023

Hi @karndeepsingh, if possible, could you share your script for converting a Label Studio annotation JSON file into spaCy's binary format? Thanks in advance.

karndeepsingh · 2021-09-09T09:52:45Z

karndeepsingh
Sep 9, 2021
Author

@polm How can I visualize the relations and entities using displaCy? I can only see NER being displayed by displaCy, but how can both relations and entities be displayed together?

Thanks

6 replies

karndeepsingh Sep 12, 2021
Author

Okay thanks @polm

Lolologist Sep 16, 2021

@karndeepsingh there are other tools out there that can, though! I think Label Studio is one.

karndeepsingh Sep 17, 2021
Author

@Lolologist @polm Hi, I am using Label Studio to do the annotation. But when I annotate a document it doesn't output "token_start" and "token_end" the way Prodigy does. So I have to calculate "token_start" and "token_end" in code while creating the .spacy binary file. Can you please help me understand the logic for getting token_start and token_end for every document?

Thanks

polm Sep 18, 2021

I asked you to not @ people before and Sofie just told you not to do it in another discussion. Stop doing that.

Lolologist Sep 21, 2021

karndeepsingh, you will likely need to make your own connector that translates between LS and spaCy. I'm afraid that's likely outside the scope of this thread and Explosion's interests, however.
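On the token_start / token_end question above, one possible approach is to map each annotated character span onto token character offsets. The sketch below is plain Python with hypothetical helper names, and it uses whitespace splitting as a stand-in tokenizer. With spaCy you would more likely take the offsets from a Doc (`token.idx` and `token.idx + len(token)`) or call `Doc.char_span` and read the span's `start` and `end`. Treating `token_end` as inclusive is an assumption about the Prodigy format that is worth verifying against your own data.

```python
def whitespace_offsets(text):
    """Stand-in tokenizer: (start_char, end_char) for each whitespace-separated word."""
    offsets, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        offsets.append((start, end))
        pos = end
    return offsets


def char_to_token_span(token_offsets, start_char, end_char):
    """Return (token_start, token_end), both inclusive, or None if misaligned."""
    token_start = token_end = None
    for i, (tok_start, tok_end) in enumerate(token_offsets):
        if tok_start == start_char:
            token_start = i
        if tok_end == end_char:
            token_end = i
    if token_start is None or token_end is None or token_start > token_end:
        return None  # the span does not start or end on a token boundary
    return token_start, token_end


text = "BORROWER Synergy One Lending, Inc."
offsets = whitespace_offsets(text)
role_tokens = char_to_token_span(offsets, 0, 8)    # "BORROWER"
name_tokens = char_to_token_span(offsets, 9, 34)   # "Synergy One Lending, Inc."
bad_tokens = char_to_token_span(offsets, 10, 34)   # starts mid-token
```

Spans that come back as `None` usually point to an annotation that starts or ends inside a token, which is worth logging and fixing in the source data rather than silently dropping.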

Lance-Owen · 2022-10-25T03:02:54Z

Lance-Owen
Oct 25, 2022

@karndeepsingh Hello, friend. I also want to build a combined NER and relation model, but I don't know how to set up the config. Could you share your method or your config file? I would be very grateful.

0 replies
