Assignments 2017 – CS140a Natural Language Annotation for Machine Learning

Assignment 1: Reading Case Studies from the Handbook of Linguistic Annotation

Each set of annotation papers below covers a particular kind of annotation, but from different perspectives and/or different research groups. Read both papers in the set you have been assigned (see the class email announcement). On Tuesday, January 24, you will be given time in class to discuss the papers in your group and prepare a short presentation (~10 minutes) describing the core ideas of the project, including the following:

Assignment Details

What was the goal of the task?
Which specific aspect of natural language did they try to capture?
What obstacles did they encounter in capturing these phenomena?
How successful was their work, and how can we tell whether it succeeded?
What kinds of applications can use such annotation?

Presentations will be on Friday, January 27. You do not need to cover the details of the annotation or the tools; we will revisit these papers later in the semester, when you will take a deeper dive into the work.

Paper Sets: The papers are available on Latte.

Set 1 PropBank and FrameNet

25. VerbNet/PropBank-based Sense Annotation
Meredith Green, Claire Bonial, Orin Hargraves, Jinying Chen, Lyndsie Clark, and Martha Palmer

27. FrameNet: Frame Semantic Annotation in Practice
Collin Baker

Set 2 Sentiment Analysis

28. MPQA Opinion Corpus
Theresa Wilson, Janyce Wiebe, and Claire Cardie

29. The JDPA Sentiment Corpus for the Automotive Domain
Jason S. Kessler and Nicolas Nicolov

Set 3 Space

36. ISO-Space: Annotating Static and Dynamic Spatial Information
James Pustejovsky

37. Spatial Role Labeling Annotation Scheme
Parisa Kordjamshidi, Martijn van Otterlo, and Marie-Francine Moens

Set 4 Discourse

44. The Penn Discourse Treebank: An Annotated Corpus of Discourse Relations
Rashmi Prasad, Bonnie Webber, and Aravind Joshi

46. Annodis and Related Projects: Case Studies on the Annotation of Discourse Structure
Nicholas Asher, Farah Benamara, Philippe Muller, Stergos Afantenos, and Mai Ho Dac

Set 5 Speech/Dialog

47. NICT Kyoto Dialogue Corpus
Kiyonori Ohtake and Etsuo Mizukami

48. Case Study: The AusTalk Corpus
Steve Cassidy, Dominique Estival, and Felicity Cox

Assignment 2: Annotation of dialog and reviews

Each group will be assigned an annotation effort from the set below. Read the paper and look through the guidelines. We'll provide some data for you to look at before Friday's class. Your goal in class will be to annotate that data following the guidelines and to create a presentation covering the goal of the annotation, a description and assessment of the guidelines (including your experience trying to follow them), and a quick summary of the evaluation results and of the kinds of machine learning algorithms that were tried.

Presentations will be on Tuesday, March 7, and should run about 10-15 minutes. They should cover:
Annotation Goal
Task Description
Description of corpus
Annotation Guidelines and your experience applying them
Results: inter-annotator agreement, evaluation results, and 1-2 examples of ML algorithms used on the data (not all of these will be available for every project; include what you can find)
What you learned that applies to your project
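When your group annotates the sample data in class, it can be useful to measure your own agreement before comparing it with the figures reported in the paper. The sketch below computes Cohen's kappa for two annotators who labeled the same items; it assumes exactly two annotators and categorical labels. Kappa is only one common agreement measure, and the project you read may report another (for example Krippendorff's alpha or F1), so treat this as an illustration rather than the method used in any particular paper. The function name `cohen_kappa` and the example dialog-act labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same, non-empty list of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement: product of each annotator's label proportions, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical dialog-act labels from two group members on five utterances.
annotator_1 = ["inform", "question", "inform", "backchannel", "inform"]
annotator_2 = ["inform", "question", "question", "backchannel", "inform"]
print(cohen_kappa(annotator_1, annotator_2))
```

Here the annotators agree on 4 of 5 items (0.8), while chance agreement from their label distributions is 9/25 = 0.36, giving a kappa of (0.8 - 0.36) / (1 - 0.36) ≈ 0.69. Correcting for chance is the reason to report kappa rather than raw percent agreement: with a skewed label set, two annotators can agree often simply by both choosing the most frequent label.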

Assignment Details

Topic Segmentation
Topic Annotation Guidelines
Paper on topic segmentation
Paper from the AMI corpus work, but it's pretty good.

Annotating dialogs in the AMI meeting corpus (for reference)

AMI Meeting Corpus

Corpus
Overview
Overview paper

Dialog Act
Annotation Guidelines for Dialog Act and Addressee
Paper
Paper

Topic Segmentation
Annotation Guidelines for Topic Segmentation
Paper

Assignment N (maybe): Conference papers on annotation

We will be reading, discussing, and presenting these papers later in the semester.
Assignment Details

Poesio, Massimo, and Ron Artstein. “Anaphoric Annotation in the ARRAU Corpus.” LREC. 2008.

Takala, Pyry, Pekka Malo, Ankur Sinha, and Oskar Ahlgren. “Gold-Standard for Topic-Specific Sentiment Analysis of Economic Texts.” LREC. 2015.

Carlson, Lynn, Daniel Marcu, and Mary Ellen Okurowski. “Building a discourse-tagged corpus in the framework of rhetorical structure theory.” Current and new directions in discourse and dialogue. Springer Netherlands, 2003. 85-112.

Miltsakaki, Eleni, et al. “Annotating discourse connectives and their arguments.” Proceedings of the HLT/NAACL Workshop on Frontiers in Corpus Annotation. 2004.

Pustejovsky, James, Jessica L. Moszkowicz, and Marc Verhagen. “Using ISO-Space for annotating spatial information.” Proceedings of the International Conference on Spatial Information Theory. 2011.

Agarwal, Apoorv, et al. “Sentiment analysis of twitter data.” Proceedings of the workshop on languages in social media. Association for Computational Linguistics, 2011.

Poesio, Massimo. “Discourse annotation and semantic annotation in the GNOME corpus.” Proceedings of the 2004 ACL Workshop on Discourse Annotation. Association for Computational Linguistics, 2004.

Asher, Nicholas, et al. “Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus.”

D’Souza, Jennifer, and Vincent Ng. “Annotating Inter-Sentence Temporal Relations in Clinical Notes.” LREC. 2014.

Snow, Rion, et al. “Cheap and fast—but is it good?: evaluating non-expert annotations for natural language tasks.” Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, 2008.