Title: Advancing Computer Vision via Human-Machine Collaboration
Computer vision has made significant progress over the past several decades. However, aside from our roles as researchers and perhaps ground-truth generating minions, humans play a limited role in advancing the state of the art. This seems counter-productive, since humans are often the very system whose performance we aim for machines to replicate (e.g. in semantic image understanding), and are frequently the users of the technology (e.g. in image search). In this talk, I will describe my recent efforts to involve humans in advancing computer vision.
In the first part of my talk, I will present our work on allowing humans and machines to better communicate with each other. We utilize visual attributes as a mode of communication. Visual attributes are mid-level concepts such as "furry" and "metallic" that bridge the gap between low-level image features (e.g. texture) and high-level concepts (e.g. rabbit or car). They are shareable across different but related concepts. Most importantly, visual attributes are both machine detectable and human understandable, making them ideal as a mode of communication between the two. I will present our work on discovering a vocabulary of these attributes in the first place, and on enhancing their communicative power by using them relatively. We utilize attributes in a variety of applications, including improved image search and effective active learning of image classifiers.
In the second part of my talk, I will describe our recently introduced "human-debugging" paradigm, which allows us to identify the aspects of machine vision approaches that most need future research effort. It involves replacing various components of a machine vision pipeline with human subjects and examining the resulting effect on recognition performance. I will present several of our efforts within this framework that address image classification, object recognition and person detection. I will discuss the lessons learned and present subsequent improvements to computer vision algorithms inspired by these findings. Beyond computer vision, human-debugging is also applicable to other areas of AI such as speech recognition and machine translation.
Devi Parikh is a Research Assistant Professor at TTI-Chicago, an academic computer science institute affiliated with the University of Chicago. She received her M.S. and Ph.D. degrees from the Electrical and Computer Engineering department at Carnegie Mellon University in 2007 and 2009, respectively, advised by Tsuhan Chen. She received her B.S. in Electrical and Computer Engineering from Rowan University in 2005.
Her research interests include computer vision, pattern recognition and AI in general. Recently, she has been involved in leveraging human-machine collaboration to build smarter machines. She has also worked on other topics such as ensembles of classifiers, data fusion, inference in probabilistic models, 3D reassembly, barcode segmentation, computational photography, interactive computer vision, contextual reasoning and hierarchical representations of images. She has held visiting positions at Cornell University, the University of Texas at Austin, Microsoft Research and MIT. She is a recipient of the Carnegie Mellon Dean's Fellowship, the National Science Foundation Graduate Research Fellowship, and the 2011 Marr Prize awarded at ICCV.