Computer Vision for the Audio Domain

Dr. Rahul Sukthankar

Intel Research & Carnegie Mellon

Friday, December 1, 2006
Talk: 11:00 am - 12:00 pm
3105 Engineering

Host: Rong Jin


Recent work in geometry and machine learning has significantly benefited computer vision, particularly in the areas of object recognition and image retrieval. We argue that the computer vision techniques that have succeeded in those settings can be translated effectively to other domains, such as audio. This claim is supported by recent results in music vs. speech classification, structure from sound, robust music identification and sound object recognition. This talk presents an overview of some of these new ideas in computer vision and demonstrates how they map naturally to problems in the audio domain.


Rahul Sukthankar is a principal research scientist at Intel Research Pittsburgh and an adjunct research professor in the Robotics Institute at Carnegie Mellon. He was previously a senior researcher at HP/Compaq's Cambridge Research Lab and a research scientist at Just Research. Rahul received his Ph.D. in robotics from Carnegie Mellon and his B.S.E. summa cum laude in computer science from Princeton. His current research focuses on computer vision and machine learning, particularly in the areas of object recognition, information retrieval and projector-camera systems.