Title: Reliability Engineering of the Google Infrastructure
Speaker: Dr. Zhiwei (Jerry) Cen, Google
Date: October 1, 2010
Time: 11:30 am
Room: 1279 Anthony Hall
Abstract: Google's mission is to organize the world's information and make it universally accessible and useful. A task of that scale requires an enormous amount of computing power behind the scenes. These are not just consumer-level computers running an open-source operating system; they are fully charged computing particles, each serving a different function in a rumbling information-processing fleet. The site reliability engineering department at Google manages the availability, functionality, and serving efficiency of Google's infrastructure and services. In this talk I will give an overview of site reliability engineering at Google, explaining its purpose and describing the challenges it addresses. As a site reliability engineer, I will also share some experience from my daily work and describe how I relate my background in computer networks and distributed systems to an actual industrial network.
Zhiwei (Jerry) Cen is currently a site reliability engineer at Google in Mountain View, California. He received his Ph.D. from Michigan State University in 2007. Before that, he received his B.S. and M.S. degrees in computer science and engineering from Fudan University in Shanghai, China. During his Ph.D. studies, Zhiwei was a member of the ELANS Laboratory under the guidance of Dr. Matt Mutka. His research projects included network quality of service in wired and wireless environments, pervasive computing, and sensor networks. His research results have been published in a range of conferences and journals.