Project: Large-Scale Multi-label Learning (NSF IIS-0643494)

 

PI: Rong Jin (rongjin@cse.msu.edu), Michigan State University

 

Abstract:

Important applications in science and business depend on automatic classification. Multi-label learning refers to the classification problem where each example can be assigned to multiple class labels simultaneously. It has found applications in many different domains, such as natural language processing, computer vision, human-computer interaction, bioinformatics, health care, and physiology. Existing machine learning technologies are unsuitable for large-scale multi-label learning because they can neither handle rare classes effectively nor distinguish classes with similar input patterns.
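To make the definition above concrete: multi-label targets are commonly encoded as a binary indicator matrix in which a row may contain several 1s, whereas in ordinary multi-class classification each row has exactly one. The Python sketch below is illustrative only. The function name `to_indicator_matrix` and the image tags are hypothetical and do not come from the project's software.

```python
from typing import List, Sequence, Set


def to_indicator_matrix(label_sets: Sequence[Set[str]], classes: Sequence[str]) -> List[List[int]]:
    """Encode each example's label set as a 0/1 row; a row may contain several 1s."""
    index = {c: j for j, c in enumerate(classes)}
    matrix = []
    for labels in label_sets:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1  # mark every label assigned to this example
        matrix.append(row)
    return matrix


# Hypothetical image-annotation example: three images, four candidate tags.
classes = ["sky", "water", "boat", "person"]
examples = [{"sky", "water", "boat"}, {"person"}, {"sky", "person"}]
Y = to_indicator_matrix(examples, classes)
print(Y)  # [[1, 1, 1, 0], [0, 0, 0, 1], [1, 0, 0, 1]]
```

The first image carries three labels at once, which is exactly what a single-label classifier cannot represent.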

 

To overcome the limitations of existing approaches, the project will develop a relation propagation framework for multi-label learning that explicitly exploits both the similarity of examples and the correlation among classes. In particular, this project will include research to (1) develop efficient optimization algorithms for the proposed relation propagation framework; (2) develop effective algorithms for learning the similarity of examples and the correlation among classes; (3) develop effective active learning algorithms for multi-label learning; and (4) evaluate the proposed framework for multi-label learning through three real-world applications.
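One way to picture a scheme that uses example similarity and class correlation at the same time is the iteration F <- alpha * S F C + (1 - alpha) * Y. Here Y is the n-by-m matrix of observed labels, S is an n-by-n example similarity matrix, and C is an m-by-m class correlation matrix. Multiplying by C spreads a score to correlated classes, and multiplying by S spreads it to similar examples. This is a hedged sketch of a generic label-propagation iteration under these assumptions. It is not the relation propagation formulation developed in this project, and the function names, `alpha`, and the normalization choices are hypothetical. Normalizing S by rows and C by columns (both assumed nonnegative) keeps every entry of S F C within the largest entry of F, so with 0 <= alpha < 1 the update is a contraction and the iteration converges.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(r) for r in zip(*m)]


def normalize_rows(m: Matrix) -> Matrix:
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else list(row))
    return out


def propagate(S: Matrix, C: Matrix, Y: Matrix, alpha: float = 0.5, iters: int = 100) -> Matrix:
    """Iterate F <- alpha * S_row @ F @ C_col + (1 - alpha) * Y (hypothetical sketch).

    Assumes S (examples x examples) and C (classes x classes) are nonnegative.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    S_row = normalize_rows(S)                        # rows of S sum to 1
    C_col = transpose(normalize_rows(transpose(C)))  # columns of C sum to 1
    F = [list(row) for row in Y]
    for _ in range(iters):
        SFC = matmul(matmul(S_row, F), C_col)  # spread over similar examples and correlated classes
        F = [[alpha * p + (1.0 - alpha) * y for p, y in zip(p_row, y_row)]
             for p_row, y_row in zip(SFC, Y)]
    return F


# Toy data (hypothetical): example 1 is unlabeled but very similar to example 0.
S = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
C = [[1.0, 0.2], [0.2, 1.0]]
Y = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
F = propagate(S, C, Y, alpha=0.5)
print([[round(v, 3) for v in row] for row in F])
```

In this toy call the unlabeled second example ends up scoring higher on class 0 than on class 1, because it is most similar to the first example, which carries class 0.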

 

The project will advance the state of the art in large-scale multi-label learning through the development of the relation propagation framework, which in turn will have a significant impact on a wide range of applications. The research results will also enhance the current machine learning curricula and improve the education of the information technology workforce.

 

Students:

 

  1. Miao Xu (joined the project in Apr. 2012)
  2. Mehrdad Mahdavi (joined the project in Jun. 2011)
  3. Jinfeng Yi (joined the project in Jan. 2010)
  4. Tianbao Yang (joined the project in Aug. 2008)
  5. Liu Yang (http://www.cse.msu.edu/~yangliu1/) (graduated in July 2008)
  6. Feng Kang (graduated in Dec. 2007)

 

Collaborators:

  1. Rahul Sukthankar, Intel Research Pittsburgh. The collaboration with Dr. Sukthankar is focused on visual object recognition.

  2. Anil K. Jain, Dept. of Computer Science and Engineering, Michigan State University. The collaboration with Dr. Jain is focused on applying multi-label learning and distance metric learning techniques to automated image annotation and content-based image retrieval.

  3. Christina Chan, Dept. of Chemical Engineering, Michigan State University. The collaboration with Dr. Chan is focused on applying multi-label learning to the discovery of gene regulatory networks.

  4. Jieping Ye, Dept. of Computer Science and Engineering, Arizona State University. The collaboration with Dr. Ye is focused on developing novel methods for kernel learning and efficient algorithms for multi-label learning. 

  5. Shijun Wang, Diagnostic Radiology Department, Clinical Center, National Institutes of Health. The collaboration with Dr. Wang is focused on distance metric learning.

  6. Steven C. H. Hoi, Division of Information Systems, School of Computer Engineering, Nanyang Technological University. The collaboration with Dr. Hoi is focused on kernel learning and online learning.

 

Project Goal:

To address the fundamental challenges in large-scale multi-label learning, this project will develop a relation propagation framework for multi-label learning that explicitly exploits both the similarity of examples and the correlation among classes. In particular, the objective of this project is to advance the state of the art in large-scale multi-label learning in the following aspects: (1) develop efficient optimization algorithms for the proposed relation propagation framework; (2) develop effective algorithms for learning the similarity of examples and the correlation among classes; (3) develop effective active learning algorithms for multi-label learning; and (4) evaluate the proposed framework for multi-label learning through three real-world applications.

 

Research Challenges:

There are two fundamental challenges in developing algorithms and theories for large-scale multi-label learning:

  1. Rare class classification. Previous studies have shown that large-scale multi-label problems tend to include a number of rare classes that are assigned to only a small fraction of the examples. The research question is therefore how to classify rare classes accurately given their limited number of training examples.

  2. Classes with similar input patterns. As the number of classes grows, different classes become increasingly likely to share similar input patterns, which makes them difficult to distinguish. The open research question is therefore how to tell apart classes that share similar input patterns.
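Both challenges can be made visible with simple statistics on a labeled sample: a rare class shows up as a low label frequency, and classes with similar input patterns show up as class centroids that lie close together in feature space. The sketch below is a hypothetical diagnostic, not a method from this project. It assumes dense numeric feature vectors and a 0/1 label matrix, and the thresholds (`min_fraction`, `threshold`) and function names are arbitrary illustrative choices.

```python
import math
from itertools import combinations
from typing import List, Optional, Tuple

Vector = List[float]


def rare_classes(Y: List[List[int]], min_fraction: float) -> List[int]:
    """Indices of classes assigned to fewer than min_fraction of the examples."""
    n = len(Y)
    counts = [sum(column) for column in zip(*Y)]
    return [j for j, c in enumerate(counts) if c / n < min_fraction]


def class_centroids(X: List[Vector], Y: List[List[int]]) -> List[Optional[Vector]]:
    """Mean feature vector of the examples carrying each class (None if a class is unused)."""
    dim = len(X[0])
    centroids: List[Optional[Vector]] = []
    for j in range(len(Y[0])):
        members = [x for x, y in zip(X, Y) if y[j] == 1]
        if not members:
            centroids.append(None)
            continue
        centroids.append([sum(x[d] for x in members) / len(members) for d in range(dim)])
    return centroids


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def similar_class_pairs(X: List[Vector], Y: List[List[int]], threshold: float) -> List[Tuple[int, int, float]]:
    """Pairs of classes whose centroids have cosine similarity >= threshold."""
    cents = class_centroids(X, Y)
    pairs = []
    for a, b in combinations(range(len(cents)), 2):
        ca, cb = cents[a], cents[b]
        if ca is None or cb is None:
            continue
        s = cosine(ca, cb)
        if s >= threshold:
            pairs.append((a, b, s))
    return pairs


# Toy data (hypothetical): 5 examples with 2 features and 3 classes.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2], [0.95, 0.05]]
Y = [[1, 0, 0], [1, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(rare_classes(Y, min_fraction=0.25))  # -> [1] (class 1 is on 1 of 5 examples)
print(similar_class_pairs(X, Y, threshold=0.95))
```

Here class 1 is flagged as rare, and classes 0 and 2 are flagged as a confusable pair because their examples occupy nearly the same region of the feature space.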

 

Current Results:

  1. Developed active learning algorithms to facilitate distance metric learning and kernel learning.

  2. Developed efficient algorithms for both multiple kernel learning and non-parametric kernel learning.

  3. Developed semi-supervised learning algorithms for multi-label learning.

  4. Developed online learning algorithms for multi-label learning.

  5. Developed generalization theory for distance metric learning.

  6. Developed efficient algorithms to learn distance functions from multi-label and multi-instance data.

  7. Applied the developed learning algorithms to visual object recognition, image annotation, and gene expression pattern prediction.

 

Publications:

 

  1. L. Wu, R. Jin, and A. Jain, Tag Completion for Image Retrieval, to appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2012 (PDF)
  2. T. Yang, R. Jin, and M. Mahdavi, Online Kernel Selection: Algorithms and Evaluations, Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI-12), 2012 (PDF)
  3. L. Zhang, R. Jin, J. Bu, C. Chen, and X. He, Efficient Online Learning for Large-scale Sparse Kernel Logistic Regression, Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI-12), 2012 (PDF)
  4. T. Yang, M. Ji, and R. Jin, A Simple Algorithm for Semi-supervised Learning with Improved Generalization Error Bound, Proceedings of the 29th International Conference on Machine Learning (ICML2012), 2012 (PDF)
  5. S. C. H. Hoi and R. Jin, Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-Based Online Learning, Proceedings of the 29th International Conference on Machine Learning (ICML2012), 2012 (PDF)
  6. S. S. Bucak, R. Jin and A. K. Jain, Multi-label Learning with Incomplete Class Assignments, Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), 2011 (PDF)
  7. H. Valizadegan, R. Jin, and S. Wang, Learning to trade off between exploration and exploitation in multiclass bandit prediction, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2011), 2011 (PDF)
  8. P. Zhao, S. C. H. Hoi, R. Jin, and T. Yang, Proceedings of the 28th International Conference on Machine Learning (ICML2011), 2011 (PDF)
  9. P. Zhao, S. C. H. Hoi, and R. Jin, Double Updating Online Learning, Journal of Machine Learning Research (JMLR), 2011 (PDF) (in press)
  10. K. Huang, R. Jin, Z. Xu, and C. Liu, Robust Metric Learning with Smooth Optimization, Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), 2010 (PDF)
  11. Z. Xu, R. Jin, H. Yang, I. King, and M. Lyu, Simple and Efficient Multiple Kernel Learning By Group Lasso, Proceedings of the 27th International Conference on Machine Learning (ICML), 2010 (PDF)
  12. R. Jin, S. C. H. Hoi, and T. Yang, Online Multiple Kernel Learning: Algorithms and Mistake Bounds, Proceedings of the 21st International Conference on Algorithmic Learning Theory (ALT2010), 390-404, 2010 (PDF)
  13. Z. Xu, R. Jin, S. Zhu, M. Lyu, and I. King, Smooth Optimization for Effective Multiple Kernel Learning, Proceedings of the 24th Conference on Artificial Intelligence (AAAI), 2010 (PDF)
  14. Y. Zhou, R. Jin, and S. C. H. Hoi, Exclusive Lasso for Multi-task Feature Selection, Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), 2010 (PDF)
  15. Z. Xu, I. King, M. Lyu, and R. Jin, Semi-supervised Feature Selection based on Manifold Regularization, IEEE Transactions on Neural Networks, 1033-1047, 2010 (PDF)
  16. T. Yang, R. Jin, and A. K. Jain, Unsupervised Transfer Classification: Application to Text Categorization, Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2010 (PDF)
  17. T. Yang, R. Jin, and A. Jain, Learning from Noisy Side Information by Generalized Maximum Entropy Model, Proceedings of the 27th International Conference on Machine Learning (ICML), 2010 (PDF)
  18. S. C. H. Hoi, R. Jin, and M. R. Lyu, Batch Mode Active Learning with Applications to Text Categorization and Image Retrieval, IEEE Transactions on Knowledge and Data Engineering 21(9): 1233-1248, 2009 (PDF)
  19. S. C. H. Hoi, R. Jin, J. Zhu, and M. R. Lyu, Semi-Supervised SVM Batch Mode Active Learning with Applications to Image Retrieval, ACM Transactions on Information Systems (TOIS) 27(3), July 2009 (PDF)
  20. P. Zhao, S. C. H. Hoi, and R. Jin, DUOL: A Double Updating Approach for Online Learning, Advances in Neural Information Processing Systems 22 (NIPS2009), 2009 (PDF)
  21. L. Wu, R. Jin, S. C. H. Hoi, J. Zhu, and N. Yu, Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering, Advances in Neural Information Processing Systems 22 (NIPS2009), 2009 (PDF)
  22. R. Jin and S. Wang, Regularized Distance Metric Learning: Theory and Algorithm, Advances in Neural Information Processing Systems 22 (NIPS2009), 2009 (PDF)
  23. R. Jin, S. Wang, and Z.-H. Zhou, Learning a Distance Metric from Multi-instance Multi-label Data, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009 (PDF)
  24. S. S. Bucak, P. K. Mallapragada, R. Jin, and A. K. Jain, Efficient Multi-label Ranking for Multi-class Learning: Application to Object Recognition, Proceedings of the 12th IEEE International Conference on Computer Vision (ICCV 2009), 2009 (PDF)
  25. R. Jin, L. Si, and C. Chan, A Bayesian Framework for Knowledge Driven Regression Model in Micro-array Data Analysis, accepted by International Journal of Data Mining and Bioinformatics (IJDMB), 2008 (PDF)
  26. P. K. Mallapragada, R. Jin, A. K. Jain, and Y. Liu, SemiBoost: Boosting for Semi-supervised Learning, accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2008 (PDF)
  27. S. C. H. Hoi, R. Jin, J. Zhu, and M. Lyu, Semi-Supervised SVM Batch Mode Active Learning for Image Retrieval, to appear in the Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008 (PDF)
  28. R. Jin, H. Valizadegan, and H. Li, Ranking Refinement and Its Application to Information Retrieval, Proceedings of the 17th International World Wide Web Conference (WWW 2008), 397-406, 2008 (PDF)
  29. S. C. H. Hoi and R. Jin, Active Kernel Learning, Proceedings of the 25th International Conference on Machine Learning (ICML 2008), 400-407, 2008 (PDF)
  30. Y. Zhou, Z. Li, X. Yang, L. Zhang, S. Srivastava, R. Jin, and C. Chan, Using Knowledge Driven Matrix Factorization to Reconstruct Modular Gene Regulatory Network, Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI 2008), 811-816, 2008 (PDF)
  31. S. C. H. Hoi and R. Jin, Semi-Supervised Ensemble Ranking, Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI 2008), 643-649, 2008 (PDF)
  32. H. Valizadegan, R. Jin, and A. K. Jain, Semi-supervised Boosting for Multi-Class Classification, Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2008), 522-537, 2008 (PDF)
  33. Y. Liu, R. Jin, and A. Jain, BoostCluster: Boosting Clustering by Pairwise Constraints, Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007), 450-459, 2007 (PDF)
  34. L. Yang, R. Jin, and R. Sukthankar, Bayesian Active Distance Metric Learning, Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI 2007), 2007 (PDF)
  35. L. Yang, R. Jin, and R. Sukthankar, Discriminative Cluster Refinement: Improving Object Category Recognition Given Limited Training Data, Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 1-8, 2007 (PDF)

 

Presentations:

  1. A Potential-based Framework for Multi-class Learning with Partial Feedback, Zhejiang University, June, 2010

  2. A Potential-based Framework for Multi-class Learning with Partial Feedback, Nanjing University, June, 2010

  3. A Potential-based Framework for Multi-class Learning with Partial Feedback, Yahoo! Research Lab, November, 2009

  4. Slides for the seminar course "Recent Topics in Biological Networks, Systems Biology and Modeling" (Chemical Engineering and Material Science) (ppt file)

  5. An Extended Level Method for Multiple Kernel Learning, Microsoft Research Asia, December 2008

  6. An Extended Level Method for Multiple Kernel Learning, Dept. of Computer Science, Purdue University, November 2008

  7. An Extended Level Method for Multiple Kernel Learning, Yahoo!, October 2008

  8. Non-parametric Kernel Learning, Dept. of Computer Science and Engineering, University of Michigan, September 2008

  9. An introductory lecture on machine learning for the introductory bioinformatics course (Plant Biology) (ppt file)

  10. Machine Learning for Information Retrieval, tutorial at ACM SIGIR Conference (2006, 2007, & 2008)

  11. Batch Mode Active Learning, Dept. of Computer Science, University of Wisconsin-Madison, March 2008

  12. Boosting Clustering by Pairwise Constraints, Department of Operational Research, University of Montreal, Feb. 2008

  13. Batch Mode Active Learning, Dept. of Computer Science, McGill University, Feb. 2008

  14. Boosting Clustering by Pairwise Constraints, Yahoo!, Aug. 2007

  15. Boosting Clustering by Pairwise Constraints, School of Engineering, University of California Santa Cruz, Jun. 2007

  16. Discriminative Cluster Refinement, NEC America Lab, Jun. 2007

 

Download Software:

 

  1. We have developed a package for multi-label ranking. More information about this package can be found at http://www.cse.msu.edu/~bucakser/other.htm
  2. We have developed a package for distance metric learning that implements a number of state-of-the-art algorithms. More information about this package can be found at http://www.cs.cmu.edu/~liuy/frame_survey_v2.pdf

 

Broader Impacts:

 

We have applied the multi-label learning algorithms developed in this project to visual object recognition, automated image annotation, and gene expression pattern recognition.

 

Point of Contact:  Rong Jin (rongjin@cse.msu.edu)

 

Last update: 01/11/2011