Project:
Large-Scale Multi-label Learning (NSF IIS-0643494)
PI: Rong Jin (rongjin@cse.msu.edu), Michigan State University
Abstract:
Important applications in science and business depend on automatic classification.
Multi-label learning refers to the classification problem in which each example
can be assigned to multiple class labels simultaneously. It has found
applications in many domains, such as natural language processing,
computer vision, human-computer interaction, bioinformatics, health care, and
physiology. Existing machine learning technologies are unsuitable for
large-scale multi-label learning because they can neither classify rare
classes reliably nor distinguish classes with similar input patterns.
To overcome these limitations of existing approaches, the project will develop a
relation propagation framework for multi-label learning that explicitly
exploits both the similarity among examples and the correlation among classes.
In particular, this project will include research to (1) develop efficient
optimization algorithms for the proposed relation propagation framework;
(2) develop effective algorithms for learning the similarity of examples and
the correlation among classes; (3) develop effective active learning
algorithms for multi-label learning; and (4) evaluate the proposed framework
for multi-label learning through three real-world applications.
The project will advance the state of the art of techniques for large-scale
multi-label learning through the development of the relation propagation
framework, which in turn will have a significant impact on a wide range of
applications. The research results will also enhance current machine learning
curricula and improve the education of the information technology workforce.
Students:
- Miao Xu (joined the project in Apr. 2012)
- Mehrdad Mahdavi (joined the project in Jun. 2011)
- Jinfeng Yi (joined the project in Jan. 2010)
- Tianbao Yang (joined the project in Aug. 2008)
- Liu Yang (http://www.cse.msu.edu/~yangliu1/) (graduated in Jul. 2008)
- Feng Kang (graduated in Dec. 2007)
Collaborators:
- Rahul Sukthankar, Intel Research Pittsburgh. The collaboration with
Dr. Sukthankar is focused on visual object recognition.
- Anil K. Jain, Dept. of Computer Science and Engineering, Michigan State
University. The collaboration with Dr. Jain is focused on applying multi-label
learning and distance metric learning techniques to automated image annotation
and content-based image retrieval.
- Christina Chan, Dept. of Chemical Engineering, Michigan State University.
The collaboration with Dr. Chan is focused on applying multi-label learning
to the discovery of gene regulatory networks.
- Jieping Ye, Dept. of Computer Science and Engineering, Arizona State
University. The collaboration with Dr. Ye is focused on developing novel
methods for kernel learning and efficient algorithms for multi-label learning.
- Shijun Wang, Diagnostic Radiology Department, Clinical Center, National
Institutes of Health. The collaboration with Dr. Wang is focused on distance
metric learning.
- Steven C. H. Hoi, Division of Information Systems, School of Computer
Engineering, Nanyang Technological University. The collaboration with
Dr. Hoi is focused on kernel learning and online learning.
Project Goal:
To address the fundamental challenges in large-scale multi-label learning,
this project will develop a relation propagation framework for multi-label
learning that explicitly exploits both the similarity among examples and the
correlation among classes. In particular, the objective of this project is to
advance the state of the art of techniques for large-scale multi-label
learning in the following aspects: (1) developing efficient optimization
algorithms for the proposed relation propagation framework; (2) developing
effective algorithms for learning the similarity of examples and the
correlation among classes; (3) developing effective active learning algorithms
for multi-label learning; and (4) evaluating the proposed framework for
multi-label learning through three real-world applications.
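The relation propagation framework is described here only at a high level. As a rough illustration of what "exploiting example similarity and class correlation simultaneously" can mean, the sketch below uses a generic label-propagation style update, F ← αSFC + (1 − α)Y, where Y holds the known labels, S is a normalized example-similarity matrix, and C is a normalized class-correlation matrix. This update rule, the symmetric normalization, and the names `relation_propagation`, `sym_normalize`, and `alpha` are assumptions made for illustration; they are not the project's actual formulation or optimization algorithm.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b, written without external dependencies."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def sym_normalize(w: Matrix) -> Matrix:
    """Return D^{-1/2} W D^{-1/2}, where D holds the row sums (degrees) of W."""
    deg = [sum(row) for row in w]
    inv_sqrt = [1.0 / d ** 0.5 if d > 0 else 0.0 for d in deg]
    n = len(w)
    return [[inv_sqrt[i] * w[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def relation_propagation(y: Matrix, example_sim: Matrix, class_corr: Matrix,
                         alpha: float = 0.5, max_iters: int = 100,
                         tol: float = 1e-9) -> Matrix:
    """Hypothetical sketch: iterate F <- alpha * S F C + (1 - alpha) * Y.

    y:           n x m matrix of known labels (1.0 = assigned, 0.0 = unknown)
    example_sim: n x n nonnegative symmetric similarity among examples (gives S)
    class_corr:  m x m nonnegative symmetric correlation among classes (gives C)
    alpha:       weight on propagated scores versus known labels, 0 <= alpha < 1
    """
    s = sym_normalize(example_sim)
    c = sym_normalize(class_corr)
    n, m = len(y), len(y[0])
    f = [row[:] for row in y]
    for _ in range(max_iters):
        # Spread scores to similar examples (S F), then to correlated classes (... C).
        sfc = matmul(matmul(s, f), c)
        new_f = [[alpha * sfc[i][j] + (1 - alpha) * y[i][j] for j in range(m)]
                 for i in range(n)]
        delta = max(abs(new_f[i][j] - f[i][j]) for i in range(n) for j in range(m))
        f = new_f
        if delta < tol:  # stop once successive iterates agree
            break
    return f


if __name__ == "__main__":
    # Toy data: 3 examples, 2 classes. Example 2 has no known labels
    # and is similar only to example 0.
    labels = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    sim = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
    corr = [[1.0, 0.0], [0.0, 1.0]]  # identity: classes treated as uncorrelated
    scores = relation_propagation(labels, sim, corr, alpha=0.5)
    print([[round(v, 3) for v in row] for row in scores])
```

With α < 1 and symmetric normalization (which keeps the spectral norms of S and C at most 1), each update is a contraction, so the iteration converges to the fixed point F = αSFC + (1 − α)Y. In the toy run, example 2 inherits a class-0 score of about 0.25 from its only neighbor and no class-1 score; replacing the identity `corr` with a matrix that links the two classes would also give it a nonzero class-1 score.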
Research Challenges:
There are two fundamental challenges in developing algorithms and theories for
large-scale multi-label learning:
- Rare class classification. Previous studies have shown that large-scale
multi-label problems tend to include a number of rare classes that are
assigned to only a small portion of the examples. The research question is
thus how to classify rare classes automatically given their limited number of
training examples.
- Classes with similar input patterns. As the number of classes grows,
different classes become increasingly likely to share similar input patterns,
which makes them difficult to distinguish. The open research question is thus
how to distinguish classes that share similar input patterns.
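To make the rare-class challenge concrete, the sketch below represents multi-label data as one set of labels per example, computes the fraction of examples assigned to each class, and flags classes that fall below a frequency threshold. The function names, the toy image labels, and the threshold values are hypothetical and chosen only for illustration; the project does not define a specific cutoff for "rare".

```python
from collections import Counter
from typing import Dict, List, Set


def class_frequencies(label_sets: List[Set[str]]) -> Dict[str, float]:
    """Return the fraction of examples to which each class label is assigned."""
    n = len(label_sets)
    # An example may carry several labels, so it can add to several counts.
    counts = Counter(label for labels in label_sets for label in labels)
    return {label: count / n for label, count in counts.items()}


def rare_classes(label_sets: List[Set[str]], threshold: float = 0.05) -> List[str]:
    """Return, sorted, the classes assigned to fewer than `threshold` of the examples."""
    freqs = class_frequencies(label_sets)
    return sorted(label for label, freq in freqs.items() if freq < threshold)


if __name__ == "__main__":
    # Hypothetical image-annotation data: one label set per image.
    images = [
        {"sky", "water"},
        {"sky", "tree"},
        {"sky"},
        {"tree", "water"},
        {"sky", "boat"},
    ]
    print(class_frequencies(images))
    print(rare_classes(images, threshold=0.25))  # ['boat']
```

Here "boat" is assigned to one of five images (frequency 0.2), so it is the only class flagged at a threshold of 0.25, and a classifier for it would have a single positive training example.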
Current Results:
- Developed active learning algorithms to facilitate distance metric learning
and kernel learning.
- Developed efficient algorithms for both multiple kernel learning and
non-parametric kernel learning.
- Developed semi-supervised learning algorithms for multi-label learning.
- Developed online learning algorithms for multi-label learning.
- Developed generalization theory for distance metric learning.
- Developed efficient algorithms to learn distance functions from multi-labeled
and multi-instance data.
- Applied the developed learning algorithms to visual object recognition, image
annotation, and gene expression pattern prediction.
Publications:
- L. Wu, R. Jin, and A. Jain, Tag Completion for Image Retrieval, to appear in
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2012 (PDF)
- T. Yang, R. Jin, and M. Mahdavi, Online Kernel Selection: Algorithms and
Evaluations, Proceedings of the Twenty-Sixth AAAI Conference on Artificial
Intelligence (AAAI-12), 2012 (PDF)
- L. Zhang, R. Jin, J. Bu, C. Chen, and X. He, Efficient Online Learning for
Large-scale Sparse Kernel Logistic Regression, Proceedings of the
Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI-12), 2012 (PDF)
- T. Yang, M. Ji, and R. Jin, A Simple Algorithm for Semi-supervised Learning
with Improved Generalization Error Bound, Proceedings of the 29th
International Conference on Machine Learning (ICML2012), 2012 (PDF)
- S. C. H. Hoi and R. Jin, Fast Bounded Online Gradient Descent Algorithms for
Scalable Kernel-Based Online Learning, Proceedings of the 29th International
Conference on Machine Learning (ICML2012), 2012 (PDF)
- S. S. Bucak, R. Jin, and A. K. Jain, Multi-label Learning with Incomplete
Class Assignments, Proceedings of the 24th IEEE Conference on Computer
Vision and Pattern Recognition (CVPR 2011), 2011 (PDF)
- H. Valizadegan, R. Jin, and S. Wang, Learning to trade off between
exploration and exploitation in multiclass bandit prediction, Proceedings of
the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining (KDD 2011), 2011 (PDF)
- P. Zhao, S. C. H. Hoi, R. Jin, and T. Yang, Proceedings of the 28th
International Conference on Machine Learning (ICML2011), 2011 (PDF)
- P. Zhao, S. C. H. Hoi, and R. Jin, Double Updating Online Learning, Journal
of Machine Learning Research (JMLR), 2011 (PDF) (in press)
- K. Huang, R. Jin, Z. Xu, and C. Liu, Robust Metric Learning with Smooth
Optimization, Proceedings of the 26th Conference on Uncertainty in
Artificial Intelligence (UAI), 2010 (PDF)
- Z. Xu, R. Jin, H. Yang, I. King, and M. Lyu, Simple and Efficient Multiple
Kernel Learning By Group Lasso, Proceedings of the 27th International
Conference on Machine Learning (ICML), 2010 (PDF)
- R. Jin, S. C. H. Hoi, and T. Yang, Online Multiple Kernel Learning:
Algorithms and Mistake Bounds, Proceedings of the 21st International
Conference on Algorithmic Learning Theory (ALT2010), 390-404, 2010 (PDF)
- Z. Xu, R. Jin, S. Zhu, M. Lyu, and I. King, Smooth Optimization for
Effective Multiple Kernel Learning, Proceedings of the 24th Conference on
Artificial Intelligence (AAAI), 2010 (PDF)
- Y. Zhou, R. Jin, and S. C. H. Hoi, Exclusive Lasso for Multi-task Feature
Selection, Proceedings of the 14th International Conference on Artificial
Intelligence and Statistics (AISTATS), 2010 (PDF)
- Z. Xu, I. King, M. Lyu, and R. Jin, Semi-supervised Feature Selection based
on Manifold Regularization, IEEE Transactions on Neural Networks, pages
1033-1047, 2010 (PDF)
- T. Yang, R. Jin, and A. K. Jain, Unsupervised Transfer Classification:
Application to Text Categorization, Proceedings of the 16th ACM SIGKDD
Conference on Knowledge Discovery and Data Mining (KDD), 2010 (PDF)
- T. Yang, R. Jin, and A. Jain, Learning from Noisy Side Information by
Generalized Maximum Entropy Model, Proceedings of the 27th International
Conference on Machine Learning (ICML), 2010 (PDF)
- S. C. H. Hoi, R. Jin, M. R. Lyu, Batch Mode Active Learning with
Applications to Text Categorization and Image Retrieval, IEEE Trans. Knowl.
Data Eng. 21(9): 1233-1248, 2009
(PDF)
- S. Hoi, R. Jin, J. Zhu, M. R. Lyu, Semi-Supervised SVM Batch Mode Active
Learning with Applications to Image Retrieval, ACM Transactions on
Information Systems (TOIS) 27(3), July 2009
(PDF)
- P. Zhao, S. C. H. Hoi, R. Jin, DUOL: A Double Updating Approach for Online
Learning, Advances in Neural Information Processing Systems 22 (NIPS2009),
2009 (PDF)
- L. Wu, R. Jin, S. C. H. Hoi, J. Zhu, N. Yu, Learning Bregman Distance
Functions and Its Application for Semi-Supervised Clustering, Advances in
Neural Information Processing Systems 22 (NIPS2009), 2009 (PDF)
- R. Jin, S. Wang, Regularized Distance Metric Learning: Theory and Algorithm,
Advances in Neural Information Processing Systems 22 (NIPS2009), 2009 (PDF)
- R. Jin, S. Wang, and Z.-H. Zhou, Learning a Distance Metric from
Multi-instance Multi-label Data, Proceedings of IEEE Computer Society
Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009
(PDF)
- S. S. Bucak, P. K. Mallapragada, R. Jin, A. K. Jain, Efficient
Multi-label Ranking for Multi-class Learning: Application to Object
Recognition, Proceedings of 12th IEEE International Conference on Computer
Vision (ICCV 2009), 2009
(PDF)
- R. Jin, L. Si, and C. Chan,
A Bayesian Framework for Knowledge Driven Regression Model in Micro-array
Data Analysis, accepted by International Journal of Data Mining and
Bioinformatics (IJDMB), 2008 (PDF)
- P. K. Mallapragada, R. Jin, A. K. Jain, and Y. Liu, SemiBoost: Boosting for
Semi-supervised Learning, accepted by IEEE Transactions on Pattern Analysis
and Machine Intelligence (PAMI), 2008 (PDF)
- S. C. H. Hoi, R. Jin, J. Zhu, and M. Lyu, Semi-Supervised SVM Batch Mode
Active Learning for Image Retrieval, to appear in the Proceedings of the IEEE
Computer Society Conference on Computer Vision and Pattern Recognition
(CVPR 2008), 2008 (PDF)
- R. Jin, H. Valizadegan, and H. Li, Ranking Refinement and Its Application
to Information Retrieval,
Proceedings of 17th International World Wide Web Conference (WWW 2008),
397-406, 2008
(PDF)
- S. C. H. Hoi and R. Jin,
Active Kernel Learning, Proceedings of the 25th International Conference on
Machine Learning (ICML 2008), 400-407, 2008 (PDF)
- Yang Zhou, Zheng Li, Xuerui
Yang, Linxia Zhang, Shireesh Srivastava, Rong Jin, Christina Chan: Using
Knowledge Driven Matrix Factorization to Reconstruct Modular Gene Regulatory
Network. Proceedings of 23rd National Conference on Artificial Intelligence
(AAAI 2008), 811-816, 2008 (PDF)
- S. C. H. Hoi and R. Jin,
Semi-Supervised Ensemble Ranking, Proceedings of 23rd National Conference on
Artificial Intelligence (AAAI 2008), 643-649, 2008 (PDF)
- H. Valizadegan, R. Jin, and
A. K. Jain, Semi-supervised Boosting for Multi-Class Classification,
Proceedings of European Conference on Machine Learning and Principles and
Practice of Knowledge Discovery in Database (ECML/PKDD 2008), 522-537, 2008 (PDF)
- Y. Liu, R. Jin, and A. Jain, BoostCluster: Boosting Clustering by
Pairwise Constraints, Proceedings of the Thirteenth ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD 2007), 450-459,
2007 (PDF)
- L. Yang, R. Jin, and R. Sukthankar, Bayesian Active Distance Metric
Learning, Proceedings of the 23rd Conference on Uncertainty in Artificial
Intelligence (UAI 2007), 2007 (PDF)
- L. Yang, R. Jin, and R. Sukthankar, Discriminative Cluster Refinement:
Improving Object Category Recognition Given Limited Training Data,
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision
and Pattern Recognition (CVPR 2007), 1-8, 2007 (PDF)
Presentations:
- A Potential-based Framework for Multi-class Learning with Partial Feedback,
Zhejiang University, June 2010
- A Potential-based Framework for Multi-class Learning with Partial Feedback,
Nanjing University, June 2010
- A Potential-based Framework for Multi-class Learning with Partial Feedback,
Yahoo! Research Lab, November 2009
- Slides for the seminar course "Recent Topics in Biological Networks,
Systems Biology and Modeling" (Chemical Engineering and Materials Science)
(ppt file)
- An Extended Level Method for Multiple Kernel Learning, Microsoft Research
Asia, December 2008
- An Extended Level Method for Multiple Kernel Learning, Dept. of Computer
Science, Purdue University, November 2008
- An Extended Level Method for Multiple Kernel Learning, Yahoo!, October 2008
- Non-parametric Kernel Learning, Dept. of Computer Science and Engineering,
University of Michigan, September 2008
- An introductory lecture on machine learning for the introductory
bioinformatics course (Plant Biology) (ppt file)
- Machine Learning for Information Retrieval, tutorial at the ACM SIGIR
Conference (2006, 2007, and 2008)
- Batch Mode Active Learning, Dept. of Computer Science, University of
Wisconsin-Madison, March 2008
- Boosting Clustering by Pairwise Constraints, Department of Operational
Research, University of Montreal, Feb. 2008
- Batch Mode Active Learning, Dept. of Computer Science, McGill University,
Feb. 2008
- Boosting Clustering by Pairwise Constraints, Yahoo!, Aug. 2007
- Boosting Clustering by Pairwise Constraints, School of Engineering,
University of California Santa Cruz, Jun. 2007
- Discriminative Cluster Refinement, NEC America Lab, Jun. 2007
Download Software:
- We have developed a package for multi-label ranking. More information about
this package is available at http://www.cse.msu.edu/~bucakser/other.htm
- We have developed a package for distance metric learning that implements a
number of state-of-the-art algorithms. More information about this package is
available at http://www.cs.cmu.edu/~liuy/frame_survey_v2.pdf
Broader Impacts:
We have applied the multi-label learning algorithms developed in this project
to visual object recognition, automated image annotation, and gene expression
pattern recognition.