The Builder Action Prediction (BAP) Task
We define the Builder Action Prediction (BAP) Task as the task of predicting the sequence of actions (block placements and/or removals) that a human Builder performed at a particular point in a human-human game.
Example
The following shows a sample sequence of game states from a human-human game between an Architect (A) and a Builder (B). The game starts with an empty grid and an initial instruction from A (a), which B executes in the first action sequence (b) by placing a single block. In (c), B begins to execute the next instruction, which A gave in (b). However, A interrupts B in (c), which splits B's actions into two distinct action sequences: (b)-(c), a single block placement, and (c)-(h), multiple placements and removals.
Evaluation
To evaluate models for the BAP task, we compare each model's predicted action sequence against the action sequence that the human Builder performed at the same point in the game. Specifically, we compute a micro-averaged F1 score between the net actions in the ground truth (human) sequence and those in the model's predicted sequence.
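The metric described above can be sketched as follows. This is a minimal illustration and not the evaluation code from the repo. It assumes that an action is a hypothetical (kind, position, color) tuple, and that "net actions" means the actions left after a placement and a removal of the same block at the same position cancel each other out. The names `net_actions` and `micro_f1` are ours.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical action format: (kind, (x, y, z), color),
# where kind is "placement" or "removal".
Action = Tuple[str, Tuple[int, int, int], str]


def net_actions(seq: Sequence[Action]) -> List[Action]:
    """Drop pairs of actions that undo each other (assumed definition)."""
    net: List[Action] = []
    for kind, pos, color in seq:
        opposite = "removal" if kind == "placement" else "placement"
        inverse = (opposite, pos, color)
        if inverse in net:
            net.remove(inverse)  # the two actions cancel out
        else:
            net.append((kind, pos, color))
    return net


def micro_f1(gold_seqs: Sequence[Sequence[Action]],
             pred_seqs: Sequence[Sequence[Action]]) -> float:
    """Micro-average: pool the counts over all examples, then compute F1."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g = Counter(net_actions(gold))
        p = Counter(net_actions(pred))
        overlap = sum((g & p).values())  # net actions present in both
        tp += overlap
        fp += sum(p.values()) - overlap
        fn += sum(g.values()) - overlap
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because the counts are pooled before precision and recall are computed, examples with long action sequences weigh more than examples with short ones. A macro average would instead weigh every example equally.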
Data
We use the Minecraft Dialogue Corpus. Our training, test and development splits contain 3709, 1616, and 1331 Builder action sequences respectively. In this work we also propose data augmentation techniques to generate additional synthetic training data. By sampling items from the synthetic data, we increase the training set to 7418 (2x), 14836 (4x) and 22254 (6x) items.
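The augmented set sizes follow from the original training size: 3709 x 2 = 7418, 3709 x 4 = 14836 and 3709 x 6 = 22254. The sketch below shows one way such sets could be assembled. It assumes that the original items are kept and that the remainder is drawn from the synthetic pool without replacement; the actual sampling procedure in the repo may differ. The function name `augment_training_set` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def augment_training_set(original: Sequence[T],
                         synthetic: Sequence[T],
                         factor: int,
                         seed: int = 0) -> List[T]:
    """Return the original items plus enough sampled synthetic items
    to reach factor * len(original) items in total (assumed scheme)."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    needed = (factor - 1) * len(original)
    if needed > len(synthetic):
        raise ValueError("not enough synthetic items to sample from")
    rng = random.Random(seed)  # fixed seed for reproducible samples
    return list(original) + rng.sample(list(synthetic), needed)
```

With 3709 original items, factors of 2, 4 and 6 give 7418, 14836 and 22254 items respectively.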
BAP Models
We developed end‐to‐end neural models for the BAP task. Our best model achieves an F1 of 21.2%. Follow the instructions below to use our data and models.
Installation Instructions
Clone our GitHub repo. It hosts our data items, augmented data items and models. Follow the instructions in the README for setup.
Citing our work
If you use this work, please cite:
@inproceedings{jayannavar-etal-2020-learning,
title = "Learning to execute instructions in a {M}inecraft dialogue",
author = "Jayannavar, Prashant and
Narayan-Chen, Anjali and
Hockenmaier, Julia",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.232",
pages = "2589--2602",
abstract = "The Minecraft Collaborative Building Task is a two-player game in which an Architect (A) instructs a Builder (B) to construct a target structure in a simulated Blocks World Environment. We define the subtask of predicting correct action sequences (block placements and removals) in a given game context, and show that capturing B{'}s past actions as well as B{'}s perspective leads to a significant improvement in performance on this challenging language understanding problem.",
}
Acknowledgements
This work was supported by Contract W911NF-15-1-0461 with the US Defense Advanced Research Projects Agency (DARPA) Communicating with Computers Program and the Army Research Office (ARO). Approved for Public Release, Distribution Unlimited. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government.