
Thursday, March 12, 2020
11:30am - 1:00pm
Technology Square Research Building, 1st Floor Ballroom, Atlanta, GA
Free food
GVU Center Brown Bag: David Reitter "Modeling Language in its Visual Context"


Words are always spoken and understood in context. Linguists, psychologists, and computer scientists have recognized this by defining a word's meaning through its aggregate linguistic contexts across large samples. Early examples include Latent Semantic Indexing (Landauer & Dumais, 1997); more recently, computational models such as Word2Vec (Mikolov et al., 2013) and BERT (Devlin et al., 2018) have become highly successful in NLP.

However, when young children learn to speak, they draw on a much richer context than language alone. Their parents expose them to a rich visual world, and that world is also a common context for adult language use. In this talk, I argue that modern models for commonsense reasoning and natural language processing should therefore learn from both visual and linguistic data.

I will present two examples of multimodal neural models that bring together visual and linguistic context. One is a language model that predicts next words better when trained on image embeddings in addition to their captions (Ororbia et al., ACL 2019). The other is a BERT-based model that set a new state of the art in visual commonsense reasoning, choosing answers to questions about movie stills and also giving a reason for each answer (Alberti et al., EMNLP 2019). It is pre-trained on a web corpus of images and their ALT tags (text added for accessibility purposes) before learning to answer questions about the movie stills.

Speaker Bio:

David Reitter is a senior research scientist at Google Research, New York City, where he works on modeling conversational and multimodal interaction using very large-scale data. Until recently, he was an associate professor of information sciences at Penn State. There, his research group carried out NSF-funded research on computational models of human cognition. David did his postdoc in psychology at Carnegie Mellon University and holds a PhD in informatics from the University of Edinburgh (2008).

