Linear Discriminant Analysis for Dimension Reduction and Visualization of Clustered High Dimensional Data
Haesun Park
Professor and Associate Chairperson
Computational Science and Engineering Division
Georgia Institute of Technology
Friday, April 3
10:20 AM - 11:20 AM
1257 Anthony Hall
Host: Pang-Ning Tan
One of the major challenges in analyzing modern data sets is that they are often massive and high dimensional.
Dimension reduction is imperative for efficient processing of high dimensional data. Numerous dimension reduction methods, such as principal component analysis (PCA) and latent semantic indexing (LSI), have been developed. When the data set is already clustered, Linear Discriminant Analysis (LDA) has been an effective method of choice. We review classical LDA, which is applicable only when the data set is oversampled, that is, when there are more data points than dimensions, and show how it can be generalized so that it applies regardless of the relative sizes of the number of data points and the data dimension. We also present a nonlinear extension of discriminant analysis based on kernel functions. Experimental results from text classification, facial recognition, and fingerprint classification demonstrate the effectiveness of the LDA-based approaches. We show how LDA is further developed into methods for effective 2D visualization of clustered high dimensional data.
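For readers unfamiliar with LDA, the following is a minimal Python sketch of a 2D discriminant projection of clustered data. It is an illustration only, not the speaker's method: it builds the usual within-cluster and between-cluster scatter matrices and, as a simple stand-in for the generalization mentioned above, uses a pseudo-inverse so that the computation does not fail when the within-cluster scatter is singular. The function name `lda_projection`, the variable names, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def lda_projection(X, labels, k=2):
    """Project the rows of X onto k discriminant directions (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-cluster scatter
    Sb = np.zeros((d, d))  # between-cluster scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        Sw += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
    # Assumption: the pseudo-inverse replaces the inverse of Sw, so the sketch
    # still runs when Sw is singular (fewer data points than dimensions).
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    # Keep the k directions with the largest eigenvalues; discard any
    # negligible imaginary parts introduced by the non-symmetric solver.
    order = np.argsort(-evals.real)[:k]
    G = evecs[:, order].real
    return X @ G, G

# Synthetic clustered data (hypothetical): three clusters in five dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0, 0],
                    [4, 0, 0, 0, 0],
                    [0, 4, 0, 0, 0]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 5)) for c in centers])
labels = np.repeat([0, 1, 2], 30)

Y, G = lda_projection(X, labels, k=2)  # Y holds the 2D coordinates for plotting
```

With three clusters the between-cluster scatter has rank at most two, so two discriminant directions are enough to capture the cluster separation, which is what makes a 2D scatter plot of `Y` a natural visualization.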
Prof. Haesun Park received her B.S. degree in Mathematics, summa cum laude, from Seoul National University, Seoul, Korea, in 1981, along with the University President's Medal for the top graduate, and her M.S. and Ph.D. degrees in Computer Science from Cornell University, Ithaca, NY, in 1985 and 1987, respectively. She was on the faculty of the Department of Computer Science and Engineering, University of Minnesota, Twin Cities, from 1987 to 2005. From 2003 to 2005, she served as a program director for the Computing and Communication Foundations Division at the National Science Foundation, Arlington, VA, U.S.A. Since July 2005, she has been a professor in the Computational Science and Engineering Division at the Georgia Institute of Technology, Atlanta, Georgia, where she is currently the associate chair. Her research interests include numerical algorithms, data analysis, bioinformatics, and parallel computing. She has published over 120 refereed research papers in these areas. She is the director of the NSF/DHS FODAVA (Foundations of Data and Visual Analytics) project, whose goal is to create mathematical and computational foundations for data and visual analytics, a newly emerging discipline concerned with the science of analytical reasoning facilitated by data analysis and interactive visualization. Prof. Park has served on numerous conference committees, including as conference co-chair for the SIAM International Conference on Data Mining in 2008 and 2009. She is currently on the editorial boards of BIT Numerical Mathematics, SIAM Journal on Matrix Analysis and Applications, Statistical Analysis and Data Mining, and International Journal of Bioinformatics Research and Applications.