We present here our dataset of query-specific relations as described in: Michael Schuhmacher, Benjamin Roth, Simone Paolo Ponzetto, and Laura Dietz. Finding Relevant Relations in Relevant Documents. To appear in Proc. of ECIR'16.
The datasets below contain all the information needed, except for the TREC data, which has to be obtained from TREC. In the following, we answer some questions regarding the paper or dataset.
Supplementary Information: Q & A
- Which queries were used exactly? The evaluation dataset contains more than a thousand extracted relations belonging to the following 17 TREC queries: 201, 202, 205, 206, 208, 214, 216, 220, 223, 228, 234, 242, 250, 251, 253, 268, 270.
Evaluation Dataset
Here you find the ground-truth dataset for query-specific relation extraction, as described in the paper.
Extraction correctness with the columns:
- sentence_id (queryid:docid:subj:obj:tokenids:predicate): sentence identifier containing the query number (201), the ClueWeb document id (clueweb12-0908wb-09-14790), the subject entity (Eben_Upton), the object entity (Raspberry_Pi_Foundation), the start/end document token offsets of the subject and object surface forms (881:883:887:890), and the relation predicate (per:employee_or_member_of)
- text: the sentence text
- extraction_correctness_annotation: human ground-truth label indicating whether the extraction is correct (1) or not (0)
- fact_id: the fact identifier of the corresponding fact relation (also contained in the sentence_id)
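The sentence_id layout above can be split back into its components. The following is a minimal sketch, not the authors' code. It assumes that the components are joined by colons in the order listed, that the four token offsets are ordered as subject start, subject end, object start, object end, and that entity names and document ids contain no colon. The names SentenceId and parse_sentence_id are hypothetical.

```python
from typing import NamedTuple


class SentenceId(NamedTuple):
    """Components of a sentence_id (hypothetical helper type)."""
    query_id: str
    doc_id: str
    subj: str
    obj: str
    subj_start: int
    subj_end: int
    obj_start: int
    obj_end: int
    predicate: str


def parse_sentence_id(sentence_id: str) -> SentenceId:
    """Split a sentence_id of the assumed form
    queryid:docid:subj:obj:start:end:start:end:predicate."""
    parts = sentence_id.split(":")
    if len(parts) < 9:
        raise ValueError(f"unexpected sentence_id layout: {sentence_id!r}")
    query_id, doc_id, subj, obj = parts[:4]
    # Assumption: offsets are subject start/end followed by object start/end.
    offsets = [int(p) for p in parts[4:8]]
    # The predicate itself may contain a colon (e.g. per:employee_or_member_of),
    # so everything after the offsets is rejoined.
    predicate = ":".join(parts[8:])
    return SentenceId(query_id, doc_id, subj, obj, *offsets, predicate)


# Example assembled from the component values given in the column description.
example = parse_sentence_id(
    "201:clueweb12-0908wb-09-14790:Eben_Upton:Raspberry_Pi_Foundation:"
    "881:883:887:890:per:employee_or_member_of"
)
```

If entity names in the actual files can contain colons, this positional split would need to be replaced by a stricter pattern.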
Fact relevance with the columns:
- qid: the TREC query id
- fact: the fact identifier containing the query number (201), the subject entity (Eben_Upton), the object entity (Raspberry_Pi_Foundation), and the relation predicate (per:employee_or_member_of)
- fact_relevance: human ground-truth label indicating whether the fact is relevant (1) w.r.t. the query or not (0)
- sum_pos_sentence_extraction_annotation: number of positive extraction correctness labels
- dbp_relation: predicate between subj and obj as contained in DBpedia (null if none exists)
- subj_rel and obj_rel: whether the entity (subj or obj) is relevant w.r.t. the query; downloaded from here
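As an illustration of how the fact relevance columns can be combined, the sketch below collects, per query, the facts that are judged relevant and have at least one correct extraction. It assumes the rows have already been read into dictionaries keyed by the column names above, with values as strings. The function name and the toy rows (including the placeholder fact ids) are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def relevant_supported_facts(rows):
    """Group facts by query, keeping those with fact_relevance == 1 and
    at least one positive extraction correctness label."""
    by_query = defaultdict(list)
    for row in rows:
        is_relevant = int(row["fact_relevance"]) == 1
        has_correct_extraction = int(row["sum_pos_sentence_extraction_annotation"]) > 0
        if is_relevant and has_correct_extraction:
            by_query[row["qid"]].append(row["fact"])
    return dict(by_query)


# Toy rows with placeholder fact ids; the values are NOT from the dataset.
toy_rows = [
    {"qid": "201", "fact": "fact-A", "fact_relevance": "1",
     "sum_pos_sentence_extraction_annotation": "2"},
    {"qid": "201", "fact": "fact-B", "fact_relevance": "1",
     "sum_pos_sentence_extraction_annotation": "0"},
    {"qid": "202", "fact": "fact-C", "fact_relevance": "0",
     "sum_pos_sentence_extraction_annotation": "3"},
]

selected = relevant_supported_facts(toy_rows)
```

With real data, the rows could come from csv.DictReader once the file's delimiter and header names have been checked against the downloaded files.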