---

copyright:
  years: 2015, 2017
lastupdated: "2017-12-11"

---

{:shortdesc: .shortdesc} {:new_window: target="_blank"} {:tip: .tip} {:pre: .pre} {:codeblock: .codeblock} {:screen: .screen} {:javascript: .ph data-hd-programlang='javascript'} {:java: .ph data-hd-programlang='java'} {:python: .ph data-hd-programlang='python'} {:swift: .ph data-hd-programlang='swift'}

This documentation is for {{site.data.keyword.knowledgestudiofull}} on {{site.data.keyword.cloud}}. To see the documentation for the previous version of {{site.data.keyword.knowledgestudioshort}} on {{site.data.keyword.IBM_notm}} Marketplace, click this link{: new_window}. {: tip}

# Machine-learning model creation workflow
{: #ml_annotator}

Create a machine-learning annotator that trains a model you can use to identify entities, coreferences, and relationships of interest in new documents. {: shortdesc}

Understand the typical workflow for creating a machine-learning annotator component in {{site.data.keyword.watson}}™ {{site.data.keyword.knowledgestudioshort}}.

All the steps are performed by the project manager, except for the document annotation step, which is performed by the human annotators. Because human annotators are often subject matter experts, they might also be consulted during the creation of workspace resources, such as the type system.

*Figure: the key steps that you perform to create a model.*

**Create a workspace**

See [Creating a workspace](/docs/services/watson-knowledge-studio/create-project.html). A workspace contains the resources that are used to create the annotator component, including:

**Type system**

Import or create the type system, and define the entity types and relation types that human annotators can apply when annotating text. The project manager typically works with subject matter experts for your domain to define the type system. See [Establishing a type system](/docs/services/watson-knowledge-studio/typesystem.html).
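A type system can be imported as a JSON file. The following fragment is a hypothetical sketch of entity and relation type definitions that extends the `Mary is an employee of IBM` example; the field names here are illustrative, and the exact import schema is described in the type system documentation:

```json
{
  "entityTypes": [
    { "label": "PERSON" },
    { "label": "ORGANIZATION" }
  ],
  "relationTypes": [
    { "label": "employedBy" }
  ]
}
```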

**Source documents**

Create a corpus by importing sample documents that are representative of your domain content into the workspace. See [Adding documents for annotation](/docs/services/watson-knowledge-studio/document-for-annotation.html). Partition the corpus into document sets, specify the percentage of documents that are shared among all document sets, and assign the document sets to human annotators. See [Creating and assigning annotation sets](/docs/services/watson-knowledge-studio/documents-for-annotation.html#wks_projdocsets).

**Dictionaries**

Import or create dictionaries for annotating text. You can choose to manually add dictionary entries or import entries from a file, and then edit the entries. See [Creating dictionaries](/docs/services/watson-knowledge-studio/dictionaries.html).
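Dictionary entries can be imported from a CSV file that maps a lemma and part of speech to its surface forms. The following fragment is a hypothetical sketch only; the column names, part-of-speech codes, and surface-form separator shown here are assumptions, so check the dictionary documentation for the exact format that the import expects:

```csv
lemma,poscode,surface
IBM,3,IBM|International Business Machines
```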

**Optional: Pre-annotate documents**

Pre-annotate documents based on the terms in the workspace dictionaries, on mentions of AlchemyLanguage types, or on rules that you define. See [Bootstrapping annotation](/docs/services/watson-knowledge-studio/preannotation.html#wks_preannotate).

**Annotate documents**

  1. The project manager assigns annotation tasks to human annotators, configures the inter-annotator agreement threshold, and provides annotation guidelines for the human annotators to follow. See [Creating an annotation task](/docs/services/watson-knowledge-studio/annotate-documents.html#wks_hatask).

  2. Human annotators use the Ground Truth Editor to manually annotate documents. A human annotator identifies mentions of interest in your domain content and labels them with entity types. The human annotator also identifies relationships between mentions (for example, Mary is an employee of IBM) and how the mentions co-reference the same entity (such as an occurrence of "she" that refers to Mary). Refer to the [User Guide: Ground Truth Editor](/docs/services/watson-knowledge-studio/user-guide.html).
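The inter-annotator agreement threshold mentioned in step 1 measures how consistently two human annotators label the documents they share. As an illustrative sketch only (not the exact statistic that {{site.data.keyword.knowledgestudioshort}} computes), Cohen's kappa over aligned mention labels can be calculated like this:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' aligned label sequences.

    Corrects raw agreement for the agreement expected by chance,
    given each annotator's label frequencies.
    """
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of positions where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)
```

For example, two annotators who agree on three of four entity labels, with the label distributions below, score about 0.64, which would fall short of a strict agreement threshold even though raw agreement is 75%.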

**Adjudicate and promote documents**

Accept or reject the ground truth that was generated by human annotators, and adjudicate any annotation differences to resolve conflicts. Evaluating the accuracy and consistency of the human annotation effort might be the responsibility of a senior human annotator or a user with stronger subject matter experience than the project manager. See [Adjudication](/docs/services/watson-knowledge-studio/build-groundtruth.html#wks_haperform).

**Train the model**

Create the machine-learning annotator component. See [Creating a machine-learning annotator component](/docs/services/watson-knowledge-studio/train-ml.html#wks_madocsets).

**Evaluate the model**

Evaluate the accuracy of the annotator component. See [Evaluating annotations added by the annotator component](/docs/services/watson-knowledge-studio/train-ml.html#wks_matest). Depending on annotator accuracy, you might need to iterate on earlier steps until the model reaches acceptable accuracy. See [Analyzing machine-learning model performance](/docs/services/watson-knowledge-studio/evaluate-ml.html) for ideas about what to update based on common performance issues.
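Accuracy is commonly summarized as precision, recall, and F1 score over the mentions in a held-out test set. The following sketch shows how exact-match scores can be computed from `(start, end, type)` mention tuples; it is illustrative only and is not the scoring code that the product runs:

```python
def score_mentions(gold, predicted):
    """Exact-match precision, recall, and F1 over (start, end, type) tuples.

    A predicted mention counts as correct only if its span offsets and
    entity type all match a gold-standard mention exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Low precision suggests the model over-annotates (tighten guidelines or add negative examples); low recall suggests it misses mentions (add more annotated documents that cover the missing cases).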

**Publish the model**

Export or deploy the model. See [Using the machine-learning model](/docs/services/watson-knowledge-studio/publish-ml.html).
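After you deploy the model to a runtime service such as {{site.data.keyword.watson}} Natural Language Understanding, you call it through that service's `/v1/analyze` endpoint by passing the deployed model ID in the request features. The following sketch builds such a request body; the model ID, URL, and version date are placeholders that you must replace with your own values, and sending the request requires your service credentials:

```python
import json

# Placeholders: substitute the model ID from your deployment and your
# service instance's endpoint URL.
MODEL_ID = "your-deployed-model-id"
NLU_URL = ("https://gateway.watsonplatform.net/natural-language-understanding"
           "/api/v1/analyze?version=2017-02-27")

# Ask the service to run your custom model for both entities and relations.
payload = {
    "text": "Mary is an employee of IBM.",
    "features": {
        "entities": {"model": MODEL_ID},
        "relations": {"model": MODEL_ID},
    },
}
body = json.dumps(payload)
# POST `body` to NLU_URL with your service credentials (for example, with
# the `requests` library) to receive the entities and relations that the
# model finds in the text.
```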