Annotation examples

This is an assignment from the last time I taught the class. I’m not sure we’ll do it again, but there are good links to example guidelines and papers.

Annotation Efforts

Part of Speech Tagging

Penn Treebank

Discourse Treebank

TimeML

Unified Linguistic Annotation Text Collection: Committed Belief and REFLEX Entity Translation

Language Understanding Corpus, Committed Belief

REFLEX Entity Translation Dev Test

CoNLL 2010: Detecting Uncertain Information and Resolution of In-Sentence Scopes of Hedge Cues

OntoNotes: coreference, named entity, parses, propositions, sense

More links to guidelines and tasks. Some of the data may overlap with the data linked above.


Propbank: Semantic Role Labeling (CoNLL-2005 Shared Task)

  • Propbank
Senseval 3

*Sem 2012: Resolving the Scope and Focus of Negation


*Sem 2013: Semantic Textual Similarity


Possibly interesting options, but I couldn’t find the data. If you can, go for it!

Dependency Treebank


ACE: Automatic Content Extraction

BOLT: Broad Operational Language Translation

Assignment (to be done in pairs) (from 2016)

  • Using the information provided (which is a mix of guidelines and papers), determine:
    the annotation goal
    the annotation task
  • Describe the properties of the corpus. How was it collected? Is it a good example of sampling and balance?
  • Using the text provided (soon), follow the guidelines and annotate the text by hand, individually.
  • In your groups, compare your annotations with the “gold standard” and discuss differences, what was hard, what was underspecified, what was clear.
  • Find at least two research groups that used the annotated corpus (you can each do one) and determine their annotation goal, what algorithms they used, how they evaluated their work (e.g. F-measure, WER, etc.), and what the result was.
  • Present to the class a description of the annotation project, your assessment of the guidelines, and a summary of the research that used the data. You should have roughly one slide per bullet point (but one for each research group you look at).
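
When you compare your annotations with the gold standard, it can help to put numbers on the disagreement. Below is a minimal Python sketch, assuming token-level tags (for example, part-of-speech tags) that are already aligned one-to-one with the gold annotation. The function names and the toy tag sequences are made up for illustration and do not come from any of the corpora above. It computes per-label precision, recall, and F-measure, plus Cohen's kappa as a chance-corrected agreement score.

```python
from collections import Counter


def precision_recall_f1(gold, predicted, label):
    """Precision, recall, and F1 for one label over aligned tag sequences."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def cohen_kappa(a, b):
    """Cohen's kappa for two aligned label sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each annotator's label distribution.
    expected = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data (hypothetical): six tokens, one disagreement on the fifth.
gold = ["DT", "NN", "VBZ", "DT", "JJ", "NN"]
mine = ["DT", "NN", "VBZ", "DT", "NN", "NN"]

p, r, f = precision_recall_f1(gold, mine, "NN")
print(f"NN: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
print(f"kappa={cohen_kappa(gold, mine):.2f}")
```

This only works when both annotations label the same units. For span-based tasks (named entities, hedge scopes, negation scope), you would first have to decide what counts as a match, for example exact span match versus partial overlap, and that choice is itself worth discussing in your presentation.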