Jinho D. Choi edited this page Dec 31, 2016 · 1 revision

Named Entity Recognition

Your task is to implement a named entity recognizer. You may work in groups of at most two. Submit your work before class on Nov. 11th.

  • Clone the Emory NLP project.
  • Download the benchmark dataset and Brown clusters.
  • Run NERTrain with the default setting in config_train_ner.xml.
  • Improve the named entity recognizer.
  • Download the DBPedia ontology and use it as ambiguity_class in NERFeatureTemplate.
  • Ensure that all output chunks follow the BILOU notation.
  • Evaluate your system in terms of precision, recall, and F1 on both the development and evaluation sets.
  • Write a report (4-8 pages) in the ACL format. Your report must include an abstract, introduction, related work, approach, experiments, and conclusion.
  • Commit all your work to your GitHub repository.
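
The BILOU and evaluation steps in the list above can be sketched as follows. This is a minimal, standalone Python sketch, not part of the Emory NLP code base: the function names bilou_chunks and precision_recall_f1 are hypothetical, and it assumes chunk-level scoring, where a predicted chunk counts as correct only if its span and type both match a gold chunk exactly.

```python
def bilou_chunks(tags):
    """Return the set of (start, end, type) chunks encoded by one sentence's tags.

    Raises ValueError if the sequence violates the BILOU notation.
    """
    chunks = set()
    start = None   # index where the currently open chunk began
    ctype = None   # entity type of the currently open chunk
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix != "O" and not label:
            raise ValueError(f"tag {tag!r} at {i} has no entity type")
        if prefix in ("O", "U", "B"):
            # None of these may appear while a B-... chunk is still open.
            if start is not None:
                raise ValueError(f"chunk opened at {start} is not closed before {i}")
            if prefix == "U":
                chunks.add((i, i, label))
            elif prefix == "B":
                start, ctype = i, label
        elif prefix in ("I", "L"):
            # I- and L- must continue an open chunk of the same type.
            if start is None or label != ctype:
                raise ValueError(f"{tag!r} at {i} does not continue an open chunk")
            if prefix == "L":
                chunks.add((start, i, ctype))
                start = ctype = None
        else:
            raise ValueError(f"unknown tag {tag!r} at {i}")
    if start is not None:
        raise ValueError(f"chunk opened at {start} is never closed")
    return chunks


def precision_recall_f1(gold_sentences, pred_sentences):
    """Chunk-level precision, recall, and F1 over parallel lists of tag sequences."""
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        g, p = bilou_chunks(gold), bilou_chunks(pred)
        correct += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Gold tags from the example sentence in Data Format; the prediction
# mislabels "Britain" as an organization (made up for illustration).
gold = ["O", "U-LOC", "O", "U-LOC", "O", "U-PER", "O", "O", "O"]
pred = ["O", "U-LOC", "O", "U-ORG", "O", "U-PER", "O", "O", "O"]
p, r, f1 = precision_recall_f1([gold], [pred])
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Because bilou_chunks raises on malformed sequences, running it over system output also serves as a check that every chunk follows the BILOU notation.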

Data Format

Only      only      RB   _  O
France    france    NNP  _  U-LOC
and       and       CC   _  O
Britain   britain   NNP  _  U-LOC
backed    back      VBD  _  O
Fischler  fischler  NNP  _  U-PER
's        's        POS  _  O
proposal  proposal  NN   _  O
.         .         .    _  O

Each column represents:

  • 0: word-form.
  • 1: lemma (predicted).
  • 2: POS tag (predicted).
  • 3: extra features (blank).
  • 4: named entity tag in BILOU notation (gold).
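
A reader for this format might look like the sketch below. It is a standalone Python illustration, not the reader used by the Emory NLP project. It assumes that columns are separated by whitespace (the actual files may use tabs) and that sentences are separated by blank lines; the names Token and read_sentences are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    form: str    # column 0: word-form
    lemma: str   # column 1: lemma (predicted)
    pos: str     # column 2: POS tag (predicted)
    feats: str   # column 3: extra features ("_" when blank)
    ner: str     # column 4: named entity tag (gold)


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of tokens per sentence; a blank line ends a sentence."""
    sentence: List[Token] = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            if sentence:
                yield sentence
                sentence = []
            continue
        if len(fields) != 5:
            raise ValueError(f"line {number}: expected 5 columns, got {len(fields)}")
        sentence.append(Token(*fields))
    if sentence:  # the last sentence may not be followed by a blank line
        yield sentence


SAMPLE = """\
Only      only      RB   _  O
France    france    NNP  _  U-LOC
and       and       CC   _  O
Britain   britain   NNP  _  U-LOC
backed    back      VBD  _  O
Fischler  fischler  NNP  _  U-PER
's        's        POS  _  O
proposal  proposal  NN   _  O
.         .         .    _  O
"""

sentences = list(read_sentences(SAMPLE.splitlines()))
entities = [(t.form, t.ner) for t in sentences[0] if t.ner != "O"]
print(entities)
```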

Optimizer

Element            Value
algorithm          perceptron, softmax, adagrad, adagrad-mini-batch, adadelta-mini-batch, adagrad-regression
l1_regularization  L1 regularization, lower-bound (for adagrad*)
learning_rate      Learning rate
max_epochs         Maximum number of epochs
batch_size         Number of sentences per mini-batch
roll_in            Gold label probability, upper-bound
bias               Bias value
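
To give a sense of what learning_rate controls under the adagrad options, here is a sketch of the textbook AdaGrad update in Python. It is not the Emory NLP implementation: the function name adagrad_step and the epsilon constant are assumptions, and L1 regularization, mini-batching, and roll-in are left out.

```python
import math


def adagrad_step(weights, grad_sq_sum, gradient, learning_rate=0.01, epsilon=1e-5):
    """Apply one in-place AdaGrad update.

    weights      current weight vector (list of floats)
    grad_sq_sum  running sum of squared gradients, one entry per weight
    gradient     gradient of the loss with respect to each weight
    """
    for i, g in enumerate(gradient):
        grad_sq_sum[i] += g * g
        # Each weight has its own effective learning rate, which shrinks
        # as that weight accumulates squared gradient.
        weights[i] -= learning_rate * g / (epsilon + math.sqrt(grad_sq_sum[i]))


weights = [0.0, 0.0]
grad_sq_sum = [0.0, 0.0]
for _ in range(3):
    adagrad_step(weights, grad_sq_sum, [1.0, 0.0], learning_rate=0.1)
    print(weights)
```

With a constant gradient, each successive step on the first weight is smaller than the previous one, while the second weight, whose gradient is always zero, never moves.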

Dictionary

Index  Type          DBPedia
0      PERSON        Person, PersonFunction, Mayor, Name
1      NORP          GeopoliticalOrganisation, Legislature, Parliament, PoliticalParty, ReligiousOrganisation, EthnicGroup
2      FACILITY      ArchitecturalStructure, Cemetery, ConcentrationCamp, Garden, HistoricPlace, Mine, Monument, SkiResort, SportFacility, Park, Street
3      ORGANIZATION  GovernmentAgency, Broadcaster, Company, EducationalInstitution, EmployersOrganisation, NonProfitOrganisation, SambaSchool, SportsLeague, SportsTeam, Website
4      GPE           Country, Settlement, State
5      LOCATION      Region, NaturalRegion, HistoricalRegion, Street, Territory, ProtectedArea, SkiArea, Island, NaturalPlace, Continent
6      PRODUCT       Aircraft, Automobile, Locomotive, MilitaryVehicle, Motorcycle, Rocket, Ship, SpaceShuttle, Spacecraft, Train, Device, Drug, Food
7      EVENT         NaturalEvent, Competition, SocietalEvent
8      WORK_OF_ART   Artwork, Cartoon, CollectionOfValuables, Document, Film, Musical, MusicalWork, WrittenWork, TelevisionShow
9      LANGUAGE      Language
10     DATE          TimePeriod
11     MONEY         Currency
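
One way to turn this table into a lookup is sketched below in Python. It is an illustration only: how NERFeatureTemplate actually encodes ambiguity_class is not specified here, so the tuple-of-type-names representation and the names TYPE_TABLE, build_class_index, and ambiguity_class are assumptions. Because Street is listed under both FACILITY and LOCATION, each DBPedia class maps to a set of types rather than a single type.

```python
# (type name, DBPedia ontology classes), in the index order of the table above.
TYPE_TABLE = [
    ("PERSON", ["Person", "PersonFunction", "Mayor", "Name"]),
    ("NORP", ["GeopoliticalOrganisation", "Legislature", "Parliament",
              "PoliticalParty", "ReligiousOrganisation", "EthnicGroup"]),
    ("FACILITY", ["ArchitecturalStructure", "Cemetery", "ConcentrationCamp",
                  "Garden", "HistoricPlace", "Mine", "Monument", "SkiResort",
                  "SportFacility", "Park", "Street"]),
    ("ORGANIZATION", ["GovernmentAgency", "Broadcaster", "Company",
                      "EducationalInstitution", "EmployersOrganisation",
                      "NonProfitOrganisation", "SambaSchool", "SportsLeague",
                      "SportsTeam", "Website"]),
    ("GPE", ["Country", "Settlement", "State"]),
    ("LOCATION", ["Region", "NaturalRegion", "HistoricalRegion", "Street",
                  "Territory", "ProtectedArea", "SkiArea", "Island",
                  "NaturalPlace", "Continent"]),
    ("PRODUCT", ["Aircraft", "Automobile", "Locomotive", "MilitaryVehicle",
                 "Motorcycle", "Rocket", "Ship", "SpaceShuttle", "Spacecraft",
                 "Train", "Device", "Drug", "Food"]),
    ("EVENT", ["NaturalEvent", "Competition", "SocietalEvent"]),
    ("WORK_OF_ART", ["Artwork", "Cartoon", "CollectionOfValuables", "Document",
                     "Film", "Musical", "MusicalWork", "WrittenWork",
                     "TelevisionShow"]),
    ("LANGUAGE", ["Language"]),
    ("DATE", ["TimePeriod"]),
    ("MONEY", ["Currency"]),
]


def build_class_index(table):
    """Map each DBPedia class to the set of type indices that list it."""
    index = {}
    for type_id, (_, classes) in enumerate(table):
        for cls in classes:
            index.setdefault(cls, set()).add(type_id)
    return index


CLASS_INDEX = build_class_index(TYPE_TABLE)


def ambiguity_class(dbpedia_classes):
    """Return the type names compatible with an entry's DBPedia classes, in index order."""
    type_ids = set()
    for cls in dbpedia_classes:
        type_ids |= CLASS_INDEX.get(cls, set())
    return tuple(TYPE_TABLE[i][0] for i in sorted(type_ids))


print(ambiguity_class(["Street"]))
print(ambiguity_class(["Country"]))
```

Classes that do not appear in the table contribute nothing, so an entry with only unlisted classes yields an empty tuple.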

CS571: Natural Language Processing

Emory University
