Finding Relevant Relations in Relevant Documents

Dataset for ECIR'16 Paper


We present here our dataset of query-specific relations as described in:

Michael Schuhmacher, Benjamin Roth, Simone Paolo Ponzetto, and Laura Dietz. Finding Relevant Relations in Relevant Documents. To appear in Proc. of ECIR'16.

Supplementary Information: Q & A

The datasets below contain all the information needed (apart from the TREC data, which have to be obtained from TREC).

In the following we answer some questions regarding the paper and the dataset:

  1. Which queries were used exactly?
    The evaluation dataset contains more than a thousand extracted relations belonging to the following 17 TREC queries: 201, 202, 205, 206, 208, 214, 216, 220, 223, 228, 234, 242, 250, 251, 253, 268, 270.

Evaluation Dataset

Here you find the ground truth dataset for query-specific relation extraction as described in the paper.


Extraction correctness with the columns:


  1. sentence_id (queryid:docid:subj:obj:tokenids:predicate): sentence identifier containing the query number (201), the ClueWeb document id (clueweb12-0908wb-09-14790), the subject entity (Eben_Upton), the object entity (Raspberry_Pi_Foundation), the start/end document token offsets of the subject and object surface forms (881:883:887:890), and the relation predicate (per:employee_or_member_of); see the parsing sketch after this list
  2. text: the sentence text
  3. extraction_correctness_annotation: human ground truth of whether the extraction is correct (1) or not (0)
  4. fact_id: the fact identifier of the corresponding fact relation (also contained in the sentence_id)
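
For illustration, here is a minimal Python sketch for parsing a sentence_id into its parts. The field names and the dataclass are our own; the only assumption taken from the example above is that the first eight colon-separated fields are fixed, while the remainder is the predicate, which itself contains a colon:

    from dataclasses import dataclass

    @dataclass
    class SentenceId:
        query_id: str
        doc_id: str
        subj: str
        obj: str
        subj_start: int
        subj_end: int
        obj_start: int
        obj_end: int
        predicate: str

    def parse_sentence_id(sid):
        # Split from the left into eight fields plus a remainder; the remainder
        # is the predicate, which contains a colon (e.g. per:employee_or_member_of),
        # so it must not be split further.
        qid, docid, subj, obj, s1, s2, o1, o2, pred = sid.split(":", 8)
        return SentenceId(qid, docid, subj, obj,
                          int(s1), int(s2), int(o1), int(o2), pred)

    example = ("201:clueweb12-0908wb-09-14790:Eben_Upton:Raspberry_Pi_Foundation:"
               "881:883:887:890:per:employee_or_member_of")
    print(parse_sentence_id(example))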

Fact relevance with the columns:


  1. qid: the TREC query id
  2. fact: the fact identifier, containing the query number (201), the subject entity (Eben_Upton), the object entity (Raspberry_Pi_Foundation), and the relation predicate (per:employee_or_member_of)
  3. fact_relevance: human ground truth of whether the fact is relevant (1) w.r.t. the query or not (0)
  4. sum_pos_sentence_extraction_annotation: number of positive extraction correctness labels
  5. dbp_relation: predicate between subj and obj as contained in DBpedia (null if none exists)
  6. subj_rel and obj_rel: whether the entity (subj or obj) is relevant w.r.t. the query; downloaded from here
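
As a usage example, the following sketch joins the two files on the fact identifier to select sentences that support query-relevant facts. The file names and the tab-separated layout with a header row are assumptions; adjust them to the actual files in the download:

    import csv

    # Hypothetical file names; both files are assumed to be tab-separated with
    # a header row matching the column names listed above.
    with open("fact_relevance.tsv", newline="", encoding="utf-8") as f:
        facts = {row["fact"]: row for row in csv.DictReader(f, delimiter="\t")}

    with open("extraction_correctness.tsv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            fact = facts.get(row["fact_id"])
            if fact is not None and fact["fact_relevance"] == "1":
                # Sentence extraction that supports a fact judged query-relevant.
                print(fact["qid"], row["sentence_id"],
                      row["extraction_correctness_annotation"])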