RCD-2020 (Retrieval From Conversational Dialogues)

A FIRE 2020 Shared Task


Task motivation and description


The application of information retrieval (IR) systems in dialogue-based interactive systems has increasingly drawn attention from the research community. IR systems can retrieve relevant information that is used either for system-generated answers or to add more context about a particular topic during an interactive dialogue between two entities. In this task, we explore the utility of IR systems for retrieving more information about entities discussed in interactive dialogues.


The objective of this track, organized as a part of FIRE 2020, is to automatically contextualize two-party or multi-party dialogue systems. To study the problem under a laboratory-based reproducible setting, we propose a number of simplifications. First, since collecting multi-party chat dialogues can lead to individual privacy concerns, we use movie scripts as the source of dialogues containing items requiring contextualization. Using movie scripts as a starting point, the task requires participants to retrieve relevant information from Wikipedia given a dialogue span from a movie script. Specifically, participants identify important or central entities in a span of movie dialogue and then retrieve a list of passages that provide more context or information about the entities in that span. The annotated span of text (i.e. the one requiring contextualization) will not be disclosed to the participants. Instead, they need to estimate the information need from the given conversation. For example, given the excerpt from the movie ‘12 Angry Men’ shown below, a simple approach would be to use the whole excerpt as a query and retrieve a ranked list of documents from a collection. Participants are encouraged to explore different methods to identify or formulate the query from the dialogue before proceeding with the retrieval step.

Participants will be provided with a manually annotated sample of dialogue spans extracted from four movie scripts, along with the entire movie scripts. The collection from which passages are to be retrieved for contextualization is the Wikipedia collection (dump from 2019). Each document in the Wikipedia collection is composed of explicitly marked-up passages (in the form of paragraph tags). The retrievable units in our task are the passages (instead of whole documents).


For the RCD track, we have chosen conversations from movie scripts that constitute situations requiring contextualization. These are long conversations between two or more characters. One such example, containing a span of text that requires contextualization, is the following excerpt from the movie 12 Angry Men.

  • NO2: guilty. I thought it was obvious. I mean nobody proved otherwise.
  • NO8: is on the prosecution. The defendant doesn’t have to open his mouth. That’s in the Constitution. The Fifth Amendment. You’ve heard of it.
  • NO2: I... what I meant... well, anyway, I think he was guilty.
In the above conversation, the goal is to develop a system that identifies Fifth Amendment as the piece of text that may require contextualization, and retrieves a ranked list of Wikipedia passages corresponding to this concept.


There are two separate tasks, and each team can participate in either or both. The first task is entity linking, whereas the second pertains to retrieving relevant information about the identified entities.
The two tasks are as follows.
  • Task 1: Given an excerpt of a dialogue act as shown in the example above, output the span of text indicating a potential information need (requiring contextualization), i.e., for the above example, output the text Fifth Amendment.
  • Task 2: Given an excerpt of a dialogue act (see the above example), return a ranked list of passages containing information on the topic of the information need (requiring contextualization), i.e., for the above example, return passages from Wikipedia that contain information on the Fifth Amendment.

Training and Test Phases
During the training phase, we will release two pieces of information, namely i) a conversation piece from a movie script and ii) the span of text comprising the concept requiring contextualization. During the test phase, we will release only the conversation piece. Participants in Task-1 then need to find the relevant piece of text in a given conversation, e.g. find 'Fifth Amendment' in the text excerpt of the above example. Participants in Task-2 do not need to explicitly find 'Fifth Amendment' in the text. Rather, they need to find documents (Wiki passages) from the given collection that provide information on this particular topic. Although the tasks are independent, we believe that scoring well on Task-1 will benefit effectiveness on Task-2 as well. Because the given conversation context may be quite diverse in terms of topics, identifying a suitable topic could help to construct a well-formed query that retrieves a more focussed list of documents. An overly verbose query, on the other hand (a simple approach could be to use the entire context as the query), may not retrieve relevant documents corresponding to the information need (Fifth Amendment in the example).

Evaluation Metrics

  • Task 1: For task-1 (information extraction), we will measure the overlap between the ground-truth text span (e.g. ‘fifth amendment’) and the predicted text span (e.g. ‘Constitution. The Fifth Amendment’) with the Jaccard coefficient. An exact match leads to a Jaccard coefficient of 1, while false positives or false negatives penalize a predictive approach.
  • Task 2: For task-2 (passage retrieval), we will use mean average precision (MAP) to compare and evaluate different approaches. This metric would favour systems that retrieve a higher number of relevant passages towards the top ranks.
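The two metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the organizers' evaluation script: the tokenization (lowercasing and splitting on non-alphanumeric characters) and the choice to skip queries without any relevant passage when averaging are assumptions, and all function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(truth_span, predicted_span):
    """Jaccard coefficient between the token sets of two text spans."""
    truth, predicted = set(tokenize(truth_span)), set(tokenize(predicted_span))
    if not truth and not predicted:
        return 1.0  # two empty spans are treated as identical
    return len(truth & predicted) / len(truth | predicted)


def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant_ids:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, passage_id in enumerate(ranked_ids, start=1):
        if passage_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(run, qrels):
    """MAP over queries that have at least one relevant passage (assumption).

    run:   dict mapping query id -> ranked list of passage ids
    qrels: dict mapping query id -> set of relevant passage ids
    """
    judged = [qid for qid, relevant in qrels.items() if relevant]
    if not judged:
        return 0.0
    total = sum(average_precision(run.get(qid, []), qrels[qid]) for qid in judged)
    return total / len(judged)
```

Under this tokenization, the example spans from the Task 1 bullet share two of four distinct tokens, so `jaccard("fifth amendment", "Constitution. The Fifth Amendment")` evaluates to 0.5.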


The movie scripts that were chosen to depict situations requiring contextualization involve relatively long conversations between two or more characters. For each movie script, we annotated spans of text that were manually assessed to be indicative of potential contextualization, as shown in the section above. We selected a number of play-style movie scripts for annotation, i.e. the ones which involve long dialogue acts for plot development. In total, we annotated 4 movie scripts.

The text extracted from the movie scripts, along with the BRAT-formatted annotation files, can be found here.

The dataset for this track comprises:

  • Document Collection: The objective in this track (Task-2) is to retrieve a ranked list of Wikipedia passages. We release a pre-processed version of the Wikipedia dump, in which each paragraph of a Wikipedia page is enclosed in separate XML tags. Each tag is assigned a unique identifier. The basic retrieval unit for this ranking task is hence a Wikipedia passage. Note that it is up to you whether to treat the passages as units contained within a document or to treat them independently while ranking them. Each passage has been (and will be) judged independently. You can download our pre-processed Wikipedia collection from this Google Drive link. A sample mavenized Lucene project to help participants get started with indexing and retrieval can be found here.
  • Queries/Topics: A query for the ranking task contains a dialogue piece from a movie. The topic file is similar in structure to a standard TREC query file. Each query (topic) starts with a topic tag. The num tag assigns a unique identifier to the topic. You have to use these ids in the output results file (more on this later). Each topic has a desc tag which contains a list of dialogue turns enclosed within 'p' tags. Each 'p' tag indicates a change of speaker. Some topics have identical description fields because, while annotating, we identified multiple concepts in the same dialogue piece that may require contextualization. In such a case, the topic with the smaller number is associated with the concept that occurs earlier in the text, and the topic with the larger number with the concept that occurs later. In addition to the 'desc' field, the training topics contain a title field which gives the exact span of text representing the information need. We make this information available so that participants get an idea of the types of text spans that typically require contextualization in dialogue streams.
  • Relevance Judgments: For the training set of topics, we release a TREC qrel formatted file with 4 white-space separated columns: the first column denotes the query id, the second is unused, the third is a string denoting the passage (basic retrieval unit) identifier, and the fourth indicates the relevance label (1/0 in our case). The training topics along with the relevance judgments can be downloaded from this link to training data. To obtain the relevance assessments, we constructed a pool of passages by combining the results of a number of standard IR models, such as BM25, LM, DFR etc. Passages in the pool were assessed by the organizers with respect to each input dialogue. For test topics, we will add the documents collected from the participating systems to the pool and re-evaluate the extended pool.
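As a starting point for working with the training data described above, the following Python sketch reads topics and relevance judgments into dictionaries. The tag names (topic, num, title, desc, p) and the four qrel columns come from the description above; everything else is an assumption, including the `<topics>` root element, the requirement that the topic file is well-formed XML, and the made-up sample contents used for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def parse_topics(xml_text):
    """Map topic id -> {"title": span or None, "turns": list of dialogue turns}.

    Assumes the topic file is well-formed XML; the real file may need cleaning.
    """
    root = ET.fromstring(xml_text)
    topics = {}
    for topic in root.iter("topic"):
        num = topic.findtext("num", default="").strip()
        title = topic.findtext("title")  # present only in training topics
        turns = [(p.text or "").strip() for p in topic.iter("p")]
        topics[num] = {"title": title.strip() if title else None, "turns": turns}
    return topics


def parse_qrels(lines):
    """Map query id -> set of passage ids judged relevant (label > 0)."""
    relevant = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _unused, passage_id, label = parts
        if int(label) > 0:
            relevant[qid].add(passage_id)
    return dict(relevant)


# Hypothetical sample contents, for illustration only.
SAMPLE_TOPICS = """<topics>
<topic>
<num>1</num>
<title>Fifth Amendment</title>
<desc>
<p>guilty. I thought it was obvious.</p>
<p>That's in the Constitution. The Fifth Amendment.</p>
</desc>
</topic>
</topics>"""

SAMPLE_QRELS = [
    "1 0 10046153-34 1",
    "1 0 10275774-4 0",
    "2 0 5202223-19 1",
]

if __name__ == "__main__":
    print(parse_topics(SAMPLE_TOPICS))
    print(parse_qrels(SAMPLE_QRELS))
```

With a real qrel file, `parse_qrels` can be given the open file object directly, since it only iterates over lines.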

Run Submission Format

For each query in the test topic file, participants need to submit automatically generated outputs for one or both tasks.
  • Task-1: Participants need to submit a two-column file where each line contains the query id and the predicted text span from the given description, separated by a tab (shown as [\t] below). An example task-1 submission file looks like
        1 [\t] That's in the Constitution. The Fifth Amendment.
        4 [\t] cute little switchknife
  • Task-2: Participants need to submit a standard TREC .res formatted file with six columns. The first column denotes the query id (matching the ids of the test topics), the second is unused, the third is the retrieved document (Wiki passage) identifier, the fourth is the rank of this document (passage), the fifth is the similarity score, and the sixth denotes a run-name to distinguish between different runs (the run-name should be meaningful and representative of the method used to generate the run). An example task-2 submission file looks like
        1	Q0	10046153-34	1 13.23 BM25_termselection
        1	Q0	10275774-4	2 12.58 BM25_termselection
        2	Q0	5202223-19	1 7.64 BM25_termselection
        2	Q0	527390-11	2 7.37 BM25_termselection
Note that since the retrievable units for our track are Wikipedia passages (not documents), we have assigned unique identifiers to the passages in the collection provided. The naming convention for a passage is doc number-passage offset, i.e. two integers separated by a hyphen character (encoded within the pno tags in the collection provided). Task-2 participants are required to print these identifiers in the third column of the run submission file. If other arbitrary identifiers are used, we cannot match the retrieved passages against the ones in the qrel file. A sample XML excerpt for a Wikipedia document, titled Anarchism, is shown below.

For submission, you need to send your runs to rcd2020firetask@gmail.com.
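The submission formats described above can be produced with a small helper such as the following Python sketch. It is only an illustration: the function names are hypothetical, tab-separated output for the Task-2 file is an assumption (the TREC format only requires whitespace-separated columns), and the two-decimal score formatting simply mirrors the example lines.

```python
import re

# Passage ids are two integers separated by a hyphen, e.g. "10046153-34".
PASSAGE_ID = re.compile(r"\d+-\d+")


def format_task1_line(qid, predicted_span):
    """One Task-1 line: query id, a tab, then the predicted text span."""
    return f"{qid}\t{predicted_span}"


def format_task2_lines(qid, scored_passages, run_name):
    """Task-2 lines for one query, ranked by descending similarity score.

    scored_passages: iterable of (passage_id, score) pairs
    """
    ranked = sorted(scored_passages, key=lambda pair: pair[1], reverse=True)
    lines = []
    for rank, (passage_id, score) in enumerate(ranked, start=1):
        if not PASSAGE_ID.fullmatch(passage_id):
            raise ValueError(f"not a valid passage id: {passage_id!r}")
        lines.append(f"{qid}\tQ0\t{passage_id}\t{rank}\t{score:.2f}\t{run_name}")
    return lines


if __name__ == "__main__":
    print(format_task1_line("4", "cute little switchknife"))
    results = [("10275774-4", 12.58), ("10046153-34", 13.23)]
    for line in format_task2_lines("1", results, "BM25_termselection"):
        print(line)
```

Validating the identifier before writing catches the case where a whole-document id is emitted instead of a passage id, which could not be matched against the qrel file.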


Important Dates

  • Training Data Release (Download link) - 16th July, 2020
  • Test Data Release (Download link) - 16th July, 2020
  • Run Submission Deadline - 5th September, 2020
  • Results Declaration - 15th September, 2020
  • Working Note Submission - 8th October, 2020
  • Review Notifications - 25th October, 2020
  • Final Version of Working Note - 5th November, 2020


Results for Task 1

Team Name    Run Name         Weighted BLEU Score
ADAPT        F6_4_model2      0.0583
ADAPT        F7_4_model3      0.1020
ADAPT        F5_0_model1      0.0895
ADAPT        Y7_2_model5_2    0.0018
ADAPT        F8_4_model4      0.0796

Contact Us

Please reach out to the organizers with any questions. You can also email rcd2020firetask@gmail.com.