Hundreds of billions of photographs are uploaded to the web each year. An important step towards automatically analyzing the content of these photographs is building computer vision models that can recognize and localize all the depicted objects. Traditionally, work on scaling up object recognition has focused on algorithmic improvements, e.g., building more efficient or more powerful models. However, I will argue that data plays at least as big a role: effectively collecting and annotating the right data is a critical component of scaling up object detection.
The first part of the talk will be about constructing an object detection dataset (as part of the ImageNet Large Scale Visual Recognition Challenge) that is an order of magnitude larger than previous datasets such as PASCAL VOC. I will discuss some of the decisions we made in designing this benchmark as well as some of our crowd engineering innovations. The availability of this large-scale data gives us as a field an unprecedented opportunity to design algorithms for scalable and diverse object detection. It also allows for thorough analysis to understand the current algorithmic shortcomings and to focus the next round of algorithmic improvements.
In the second part of the talk, I will bring together the insights from large-scale data collection and from recent algorithmic innovations into a principled human-in-the-loop framework for image understanding. This approach can be used both for reducing the cost of large-scale detailed dataset annotation efforts and for effectively understanding a single target image.
Olga Russakovsky (http://cs.cmu.edu/~orussako) is a postdoctoral research fellow at Carnegie Mellon University. She recently completed a PhD in computer science at Stanford University, advised by Prof. Fei-Fei Li. Her research interests are in computer vision and machine learning, specifically focusing on large-scale object detection and recognition. For two years she was the lead organizer of the ImageNet Large Scale Visual Recognition Challenge (http://image-net.org/challenges/LSVRC), which was featured in the New York Times and MIT Technology Review. She has organized multiple workshops and tutorials at premier computer vision conferences, including helping pioneer the “Women in Computer Vision” workshop at CVPR’15. She founded and directs the Stanford AI Laboratory’s outreach camp SAILORS (http://sailors.stanford.edu, featured in Wired), designed to expose high school students in underrepresented populations to the field of AI.
Dr. Arun Ross