Marine Carpuat receives NSF CAREER Award

Marine Carpuat, an assistant professor in the Department of Computer Science and the Institute for Advanced Computer Studies at the University of Maryland, has been named a recipient of the National Science Foundation (NSF) CAREER award for her work entitled “Semantic Divergences Across the Language Barrier.” Carpuat’s research focuses on the problem of translation, which is more complicated than simply translating words or finding cognates between and among languages. Carpuat demonstrates how the act of translating languages also involves rendering cultural perspectives and understanding. Her research introduces computational models and methods to help translate one language into another by building on and connecting seemingly disparate work on machine translation and semantic analysis. She plans to use these models to detect and explain nuanced differences between words and sentences in various languages. Perhaps during the course of her research, Carpuat will determine whether semantic models of language will be able to translate feeling, that which is sometimes untranslatable.

Before she joined Maryland in January of 2015, Carpuat was a Research Scientist at the National Research Council of Canada and a postdoctoral researcher at the Columbia University Center for Computational Learning Systems. She earned a PhD in Computer Science from the Hong Kong University of Science & Technology (HKUST) in 2008. A recipient of the Outstanding Computer Science Professor Award for 2015-16, Carpuat has also served as area chair for Machine Translation and Multilinguality at EMNLP 2017 and on the board of SIGLEX, the ACL Special Interest Group on Lexical Semantics. She is also a member of the Computational Linguistics and Information Processing (CLIP) Lab at the university.

Carpuat kindly agreed to an interview about her work and her thoughts on the problems of translation.

When did you first start thinking about the problem of translation?

I first started thinking about translation as a computer science problem as a PhD student. Getting computers to translate automatically was fascinating to me because it is simultaneously a practical engineering challenge, and a lens to study all aspects of computational modeling of language, including syntax and semantics. At the time, I lived in Hong Kong in a very international and multilingual environment and volunteered as a French-English interpreter. These experiences made me keenly aware of the gap between machine translation as a research problem and the challenges raised by language barriers in the real world.

How do you expect language technology to help translators and language learners convey the complexities of moving from one language (and culture) to another?

When you read the news in different languages, you find the same topics or events discussed from widely different perspectives, and even faithful translations can be hard to understand without the appropriate linguistic and cultural background knowledge. Together with my students, I work on methods for detecting and characterizing differences in the meaning of words, phrases and sentences across languages. Using machine learning techniques, we process huge quantities of text to create representations that let us compare the meaning of two chunks of text. Our goal is to develop representations that tell us not only how similar two words are, but also how they differ, by determining which semantic relation holds between them (as in our recent papers published at NAACL in 2016 [1] and *SEM in 2017 [2]), or by providing salient examples of usage. For instance, the adjective “liberal” in English translates to “libéral” in French, but the two are used very differently: a liberal politician is typically a left-wing politician in English, and a right-wing politician in French. We can detect these differences by modeling patterns of word usage in context in large amounts of text.


What do you think is the biggest barrier to the act of translation?

As humans, we are endlessly creative with our use of language. This makes computational modeling of language and translation challenging, because no matter how much data we train our models on, it is never enough to observe all possible combinations of words that humans might come up with. A key aspect of the NSF-funded project is to better characterize the translation data we have, so that our machine learning models can use it more judiciously. We will build on our recent results [3], which showed that translations that do not fully preserve the meaning of the source text can slow down the training of neural machine translation systems.


Your research is very interdisciplinary as it involves language study, psychology, computer science, and linguistics (and I’m sure that I’ve left out several other fields of study).  Might you want to comment on how your interdisciplinary work has attracted students from diverse backgrounds?

My research is primarily in computer science and is informed by linguistics, but as a member of the CLIP lab and of the Language Science community at UMD, I get to interact on a regular basis with faculty and students from many other disciplines.  All the students I have worked with have a strong Computer Science background, and they are often excited to use their CS skills to address problems that are relevant to other aspects of their lives. In my time at UMD, I have worked with several undergraduate students who double-major in CS and Linguistics, as well as students who have studied abroad or who grew up speaking multiple languages.

The CAREER award supports the early career-development activities of teacher-scholars “who most effectively integrate research and education within the context of the mission of their organization. Such activities should build a firm foundation for a lifetime of integrated contributions to research and education.”
