Collaborative Dialogue in Minecraft

Anjali Narayan-Chen*, Prashant Jayannavar*, Julia Hockenmaier
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), July 2019
*equal contribution

Supplementary Materials
The Minecraft Dialogue Corpus
Code -- data collection and baseline models
NB: Code for data collection will be made available shortly
We wish to develop interactive agents that can communicate with humans to collaboratively solve tasks in grounded scenarios. Since computer games allow us to simulate such tasks without the need for physical robots, we define a Minecraft-based collaborative building task in which one player (A, the Architect) is shown a target structure and needs to instruct the other player (B, the Builder) to build this structure. Both players interact via a chat interface. A can observe B but cannot place blocks. We present the Minecraft Dialogue Corpus, a collection of 509 conversations and game logs. As a first step towards our goal of developing fully interactive agents for this task, we consider the subtask of Architect utterance generation, and show how challenging it is.

The Collaborative Building Task

We define the Collaborative Building Task as a two-player game between an Architect (A) and a Builder (B). A is given a target structure (Target) and has to instruct B via a text chat interface to build a copy of Target in a given build region. A and B can communicate back and forth via chat throughout the game (e.g. to resolve confusion or to correct B's mistakes). B has access to an inventory of 120 blocks in six colors, which it can place and remove. A can observe B and move around in its world, allowing it to provide instructions from varying perspectives. However, A cannot move blocks and remains invisible to B. The task is complete when the structure built by B (Built) matches Target, invariant to translations within the horizontal plane and rotations about the vertical axis. Built also needs to lie completely within the boundaries of the predefined build region.
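The completion criterion above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation. It assumes a structure is a set of (x, y, z, color) tuples with y as the vertical axis, and that rotations are limited to multiples of 90 degrees because blocks sit on a grid. All function names and the build-region bounds are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumed representation: (x, y, z, color), with y as the vertical axis.
Block = Tuple[int, int, int, str]


def rotate_90(blocks: Iterable[Block]) -> Set[Block]:
    """Rotate every block by 90 degrees about the vertical (y) axis."""
    return {(z, y, -x, color) for x, y, z, color in blocks}


def normalize(blocks: Iterable[Block]) -> Set[Block]:
    """Shift horizontally so the smallest x and z are 0; heights are untouched."""
    blocks = set(blocks)
    if not blocks:
        return blocks
    min_x = min(x for x, _, _, _ in blocks)
    min_z = min(z for _, _, z, _ in blocks)
    return {(x - min_x, y, z - min_z, color) for x, y, z, color in blocks}


def structures_match(built: Iterable[Block], target: Iterable[Block]) -> bool:
    """True if built equals target up to horizontal translation and
    a rotation of 0, 90, 180, or 270 degrees about the vertical axis."""
    target_norm = normalize(target)
    candidate = set(built)
    for _ in range(4):
        if normalize(candidate) == target_norm:
            return True
        candidate = rotate_90(candidate)
    return False


def within_region(blocks: Iterable[Block],
                  x_bounds: Tuple[int, int],
                  y_bounds: Tuple[int, int],
                  z_bounds: Tuple[int, int]) -> bool:
    """True if every block lies inside the (hypothetical) inclusive bounds."""
    return all(
        x_bounds[0] <= x <= x_bounds[1]
        and y_bounds[0] <= y <= y_bounds[1]
        and z_bounds[0] <= z <= z_bounds[1]
        for x, y, z, _ in blocks
    )


# Usage: an L-shaped target and a rotated, shifted copy of it.
target = {(0, 0, 0, "red"), (1, 0, 0, "red"), (1, 1, 0, "blue")}
built = {(5, 0, 5, "red"), (5, 0, 4, "red"), (5, 1, 4, "blue")}
print(structures_match(built, target))
```

Heights are deliberately left out of the normalization: the criterion allows translation only within the horizontal plane, so a copy that floats one level higher does not count as a match.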

Dataset Examples

The following sequence shows intermittent screenshots from an instance of the Collaborative Building Task:


We implement this task within Malmo, a Minecraft-based platform for AI research that also provides an API for creating agents. We have built a data collection platform and used it to collect the Minecraft Dialogue Corpus, which consists of 509 human-human written dialogues, screenshots, and complete game logs for this task. For more details on the task, the human-human dialogue dataset, and the baseline models, please refer to our ACL 2019 paper on this work (supplementary here).

The Minecraft Dialogue Corpus

NB: Code and data will be made available shortly
As part of the task, we record the game log as well as screenshots. The human-human dialogue data we collected can be accessed here. There are two zip files:
  1. The Minecraft Dialogue Corpus -- with screenshots
  2. The Minecraft Dialogue Corpus -- no screenshots
The former also contains the screenshots taken during the Collaborative Building Task. If you do not need them, we recommend downloading the zip file without screenshots, as it is significantly smaller.

This document in the same Google Drive folder describes the data we collect and the data format used for our game logs.

This file contains the data splits we use for modeling purposes. The splits are made across target structures and comprise three sets: train (target structures used in training data), test (target structures used in test data) and val (target structures used in validation data). Hence, all of the dialogue data collected for a given target structure goes into the split to which that structure has been assigned.
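Splitting by target structure can be sketched as below. This is a minimal illustration under assumptions: the layout of the splits file and of the dialogue records is not described here, so the `splits` dictionary, the `target` and `id` keys, and the function name are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def assign_dialogues_to_splits(
    splits: Dict[str, List[str]],
    dialogues: List[Dict[str, str]],
) -> Dict[str, List[str]]:
    """Group dialogue ids by the split of their target structure.

    splits:    hypothetical mapping, split name -> target structure names
    dialogues: hypothetical records with "id" and "target" keys
    """
    structure_to_split: Dict[str, str] = {}
    for split_name, structures in splits.items():
        for structure in structures:
            # A structure in two splits would leak data across them.
            if structure in structure_to_split:
                raise ValueError(f"{structure} appears in more than one split")
            structure_to_split[structure] = split_name

    grouped: Dict[str, List[str]] = defaultdict(list)
    for dialogue in dialogues:
        split_name = structure_to_split[dialogue["target"]]
        grouped[split_name].append(dialogue["id"])
    return dict(grouped)


# Usage with made-up structure names and dialogue ids.
splits = {"train": ["s1", "s2"], "val": ["s3"], "test": ["s4"]}
dialogues = [
    {"id": "d1", "target": "s1"},
    {"id": "d2", "target": "s1"},
    {"id": "d3", "target": "s3"},
    {"id": "d4", "target": "s4"},
]
grouped = assign_dialogues_to_splits(splits, dialogues)
print(grouped)
```

Because the lookup goes through the structure rather than the dialogue, two dialogues about the same target can never end up in different splits.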

The Architect utterance generation subtask

Although the Minecraft Dialogue Corpus was motivated by our ultimate goal of building agents that can successfully play an entire collaborative building game as Architect or Builder, we first consider the task of Architect utterance generation: given access to the entire game state context leading up to a certain point in a human-human game at which the human Architect spoke next, we aim to generate a suitable Architect utterance.
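One way to picture this setup is to turn a game log into (context, utterance) pairs, one for each point at which the human Architect spoke. The sketch below is only an illustration: it assumes a game log is a chronological list of event dictionaries with hypothetical `type`, `speaker`, and `text` keys, which is not the corpus's actual log format.

```python
from typing import Any, Dict, List, Tuple

Event = Dict[str, Any]


def architect_samples(events: List[Event]) -> List[Tuple[List[Event], str]]:
    """Return one (context, utterance) pair per Architect chat event.

    The context is every event that precedes the utterance, so a model
    only sees what happened before the Architect spoke.
    """
    samples = []
    for i, event in enumerate(events):
        if event.get("type") == "chat" and event.get("speaker") == "Architect":
            samples.append((events[:i], event["text"]))
    return samples


# Usage with a made-up four-event game log.
log = [
    {"type": "chat", "speaker": "Architect", "text": "place a red block"},
    {"type": "action", "speaker": "Builder", "text": "placed red at (0, 0, 0)"},
    {"type": "chat", "speaker": "Builder", "text": "like this?"},
    {"type": "chat", "speaker": "Architect", "text": "yes, now add blue on top"},
]
samples = architect_samples(log)
print(len(samples))
```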

Our Systems

You will be able to try out the following systems for the Collaborative Building Task:
  1. Data collection (to collect human-human dialogue data)
  2. Creating target structures (to create new target structures for the task)
  3. Baseline models we developed for the Architect utterance generation subtask

Installation Instructions

Instructions for data collection and creating target structures will be made available shortly
  1. Clone the following repo which hosts our machine learning modeling related code for the Architect system:
  2. To run the baseline models for the Architect utterance generation subtask, follow this

Citing our work

If you use this work, please cite:

@inproceedings{narayan-chen-etal-2019-collaborative,
    title = "Collaborative Dialogue in {M}inecraft",
    author = "Narayan-Chen, Anjali and Jayannavar, Prashant and Hockenmaier, Julia",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "",
    pages = "5405--5415",
}


This work was supported by Contract W911NF-15-1-0461 with the US Defense Advanced Research Projects Agency (DARPA) Communicating with Computers Program and the Army Research Office (ARO). Approved for Public Release, Distribution Unlimited. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government.

Last modified: Thu Jul 25 15:46:51 CDT 2019