Launch of the EPSRC Centre for Doctoral Training in Data Science

By EPSRC Centre for Doctoral Training in Data Science

Date and time

Mon, 3 Nov 2014 14:00 - 18:30 GMT

Location

Informatics Forum

University of Edinburgh 10 Crichton St Edinburgh EH8 9AB United Kingdom

Description

Distinguished Lectures to mark the Launch of the EPSRC Centre for Doctoral Training in Data Science

14:00

Registration & Welcome Refreshments

Tea & Coffee Available

14:30

Welcome & Introduction

Professor Chris Williams, Director of the CDT in Data Science, University of Edinburgh

Professor Richard Kenway, Vice Principal of High Performance Computing, University of Edinburgh

15:00

Distinguished Lecture: “At the Intersection of Data Science and Language”

Professor Kathleen R. McKeown, Director of the Institute for Data Sciences and Engineering, Columbia University

16:00

Refreshment Break

Tea & Coffee Available

16:30

Distinguished Lecture: “Bottom-Up Semantics”

Professor Fernando C. N. Pereira, Research Director, Google

17:30

Evening Reception

18:30

For further information about the EPSRC Centre for Doctoral Training in Data Science please visit http://datascience.inf.ed.ac.uk.

Professor Kathleen R. McKeown

Director of the Institute for Data Sciences and Engineering, Columbia University

Abstract: “At the Intersection of Data Science and Language”

Data science holds the promise to solve many of society’s most pressing challenges. But much of the data necessary to solve problems is locked within volumes of text and speech on the web. Thus, in many cases, data science can only succeed if paired with natural language processing. In this talk, I will discuss the data science initiative at Columbia University and research within its New Media Center, where we investigate the analysis of news, Twitter, and online discussion, as well as texts from the digital humanities, the sciences, and other disciplines. I will describe research projects that draw from scientific journals, from historical sources, from online media, and from novels.

Biography:

Kathleen R. McKeown is the Henry and Gertrude Rothschild Professor of Computer Science at Columbia University and also serves as the Director of the Institute for Data Sciences and Engineering. She served as Department Chair from 1998 to 2003 and as Vice Dean for Research for the School of Engineering and Applied Science for two years. McKeown received her Ph.D. in Computer Science from the University of Pennsylvania in 1982 and has been at Columbia since then. Her research interests include text summarization, natural language generation, multi-media explanation, question-answering and multi-lingual applications. In 1985 she received a National Science Foundation Presidential Young Investigator Award, in 1991 she received a National Science Foundation Faculty Award for Women, in 1994 she was selected as a AAAI Fellow, in 2003 she was elected as an ACM Fellow, and in 2012 she was selected as one of the founding Fellows of the Association for Computational Linguistics. In 2010, she received the Anita Borg Institute Women of Vision Award in Innovation for her work on text summarization. McKeown is also active nationally. She has served as President, Vice President and Secretary-Treasurer of the Association for Computational Linguistics. She also served as a board member of the Computing Research Association and as secretary of the board.

Professor Fernando C. N. Pereira

Research Director, Google

Abstract: “Bottom-Up Semantics”

Advances in statistical and machine learning approaches to natural-language analysis have yielded a wealth of methods and applications in information retrieval, speech recognition, machine translation, and information extraction. Yet, even as we enjoy these advances, we recognize that our successes are to a large extent the result of clever exploitation of redundancy in language structure and use, allowing our algorithms to eke out a few useful bits that we can put to work in applications. Because these applications extract only a limited amount of information from the text, finer structures such as word order or syntactic structure could be largely ignored in information retrieval or speech recognition. However, by leaving out those finer details, our language-processing systems have been stuck in an “idiot savant” stage where they can find everything but cannot understand anything. Our main language processing challenge is to create robust, accurate, efficient methods that learn to understand the main entities and concepts discussed in any text, and the main claims made. These will enable our systems to answer questions more precisely, to verify and update knowledge bases, and to trace arguments for and against claims throughout the written record. I will argue with examples from my team’s research that we need deeper levels of linguistic analysis to do this. But I will also argue that it is possible to do much that is useful even with our very partial understanding of linguistic and computational semantics, by taking (again) advantage of distributional regularities and redundancy in large text collections to learn effective analysis and understanding rules.

Biography:

Fernando Pereira is a distinguished researcher at Google, where he leads work on language understanding. His previous positions include chair of the Computer and Information Science department of the University of Pennsylvania, head of the Machine Learning and Information Retrieval department at AT&T Labs, and research and management positions at SRI International. He received a Ph.D. in Artificial Intelligence from the University of Edinburgh in 1982, and he has over 120 research publications on computational linguistics, machine learning, bioinformatics, speech recognition, and logic programming, as well as several patents. He was elected AAAI Fellow in 1991 for contributions to computational linguistics and logic programming, and ACM Fellow in 2010 for contributions to machine-learning models of natural language and biological sequences. He was president of the Association for Computational Linguistics in 1993.

Organised by

EPSRC Centre for Doctoral Training in Data Science

The EPSRC Centre for Doctoral Training in Data Science, hosted by the University of Edinburgh, will train a new generation of data scientists, comprising 50 PhDs over five intake years, with the technical skills and interdisciplinary awareness necessary to become R&D leaders in this emerging area.
