This work was done during my internship at Microsoft Research Asia.
Problem
In this work, we focus on the problem of grounded language acquisition. Specifically, given a collection of paired structured data and text, we aim to establish the correspondence between the data and the text. For example, a sports reporter might write:
The Raptors edged out the Wizards 120 - 116.
We also have a table recording the statistics of this game, so we know that the words "Raptors" and "Wizards" are two team names in the table, and that the numbers "120" and "116" are the two teams' total points. More interestingly, people tend to use varied phrases to describe information that is inferred from the data. For instance, the phrase "edged out" above implies that the Raptors beat the Wizards by a very narrow margin, which can be inferred from the point difference.
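To make the example concrete, here is a minimal sketch of how the game table and the sentence could be represented, with a naive exact-string-match aligner. The `Record` type, its field names, and the helper functions are hypothetical names chosen for illustration; they are not the data format used in this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One cell of the (hypothetical) game table."""
    entity: str  # which team or game the cell belongs to
    field: str   # e.g. "name" or "points"
    value: str   # cell content, kept as a string


# Assumed table contents for the example sentence.
records = [
    Record("Raptors", "name", "Raptors"),
    Record("Raptors", "points", "120"),
    Record("Wizards", "name", "Wizards"),
    Record("Wizards", "points", "116"),
]

tokens = "The Raptors edged out the Wizards 120 - 116 .".split()


def align_by_string_match(tokens, records):
    """Map token index -> record whose value is exactly that token."""
    by_value = {r.value: r for r in records}
    return {i: by_value[t] for i, t in enumerate(tokens) if t in by_value}


def derived_margin(records):
    """A fact that is not in the table itself but can be inferred from it."""
    points = [int(r.value) for r in records if r.field == "points"]
    return Record("game", "margin", str(abs(points[0] - points[1])))


alignment = align_by_string_match(tokens, records)
margin = derived_margin(records)
```

String matching recovers the team names and scores, but it leaves "edged out" unaligned: that phrase corresponds to the derived margin record, which never appears verbatim in the text. Aligning phrases like this one is the harder part of the problem.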
Approach
We design a hidden semi-Markov model to address this problem. Since there is no supervision signal other than the statistical co-occurrence of information between data and text, our model is trained in an unsupervised fashion. To address the notorious "garbage collection" issue, we also adopt the posterior regularization technique.
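As a rough illustration of what the semi-Markov part buys, the sketch below runs Viterbi decoding over segments rather than single tokens, so a multi-word phrase can be labeled as one unit. It is not the model from this work: the state names, the `LEXICON`, and all scores are hand-set assumptions, transitions are uniform, and neither unsupervised training nor posterior regularization is implemented.

```python
import math

# Hypothetical states and phrases; in a learned model these scores would
# come from training, not from a hand-written lexicon.
STATES = ["null", "team_name", "points", "narrow_win"]
LEXICON = {
    "team_name": {("Raptors",), ("Wizards",)},
    "points": {("120",), ("116",)},
    "narrow_win": {("edged", "out")},
}


def emit_score(state, segment):
    """Toy log-score of a state emitting a whole segment of tokens."""
    seg = tuple(segment)
    if state == "null":
        return -1.0 * len(seg)  # null absorbs any words at a per-token cost
    return -0.1 if seg in LEXICON[state] else -10.0 * len(seg)


def trans_score(prev_state, state):
    """Uniform transitions, to keep the sketch small."""
    return 0.0


def semi_markov_viterbi(tokens, states, emit_score, trans_score, max_len=3):
    n = len(tokens)
    # best[j][s]: best score for tokens[:j] when the last segment has state s
    best = [{s: -math.inf for s in states} for _ in range(n + 1)]
    back = [{s: None for s in states} for _ in range(n + 1)]
    for j in range(1, n + 1):
        for s in states:
            for length in range(1, min(max_len, j) + 1):
                i = j - length  # the candidate segment is tokens[i:j]
                seg = emit_score(s, tokens[i:j])
                if i == 0:
                    prev, cand = None, seg
                else:
                    prev = max(states, key=lambda p: best[i][p] + trans_score(p, s))
                    cand = best[i][prev] + trans_score(prev, s) + seg
                if cand > best[j][s]:
                    best[j][s] = cand
                    back[j][s] = (i, prev)
    # Follow back-pointers from the best final state to recover the segments.
    s = max(states, key=lambda q: best[n][q])
    j, segments = n, []
    while j > 0:
        i, prev = back[j][s]
        segments.append((s, tokens[i:j]))
        j, s = i, prev
    return segments[::-1]


tokens = "The Raptors edged out the Wizards 120 - 116 .".split()
segments = semi_markov_viterbi(tokens, STATES, emit_score, trans_score)
```

With these toy scores, the decoder groups "edged out" into a single `narrow_win` segment instead of labeling the two words independently, which a token-level hidden Markov model cannot do directly.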
Video
The video presentation (15 min long) for this work, which I delivered at EMNLP 2018, is available on Vimeo.