Mining and Searching Graphs in Biological Databases
Database and Information Systems Research Lab
Place: 1225 Engineering
Host: P.N. Tan
Recent research on pattern discovery has progressed from mining frequent itemsets and sequences to mining structured patterns including trees, lattices, and graphs. As a general data structure, graph can model complicated relations among data with wide applications in bioinformatics. However, mining and searching large graphs in graph databases is challenging due to the presence of an exponential number of frequent subgraphs.
In this talk, we present our recent progress on developing efficient and scalable methods for mining and searching of graphs in large biological databases. We first introduce gSpan, an efficient method for mining all the frequent graph patterns in graph databases, by extension of a depth-first frequent pattern growth method, developed in our previous research. Then we introduce CloseGraph, an efficient method for mining closed frequent graph patterns. A graph g is closed in a database if there exists no proper supergraph of g that has the same support as g. After that, we introduce a graph indexing method, gIndex and a graph approximate searching method, grafil, both taking advantages of frequent graph mining to construct a compact but highly effective graph index and perform similarity search with such indexing structures. These methods not only facilitate mining and querying graph patterns in massive biological datasets but also claim broad applications in other fields, including DB/OS systems and software engineering.
Jiawei Han, Professor,
Department of Computer Science,