Unapparent Information Revelation: A Hybrid Model for Text Mining

Dr. Rohini K. Srihari
SUNY Buffalo

Thursday, April 20, 2006
Time: 4:00pm-5:00pm
Place: 1260 Anthony Hall

Host: A. Jain


It is often the case that a document collection reveals interesting information other than what is explicitly stated. Since these documents are generated by many authors working independently and at various times, interesting links that connect facts, assertions or hypotheses may be missed. We refer to this special case of text mining as unapparent information revelation (UIR). The goal of information analysts is to sift through these extensive document collections and find such links. Currently, they perform this task with limited assistance from tools such as search engines. What is required is a set of automated tools that will expose such links, or at least generate plausible evidence chains. A traditional search query involving, for example, two person names will attempt to find documents mentioning both of these individuals, favouring those where the names appear closer together. Failing this, the search returns documents containing only one of the names. This research focuses on a different interpretation of such a query: what is the best evidence trail across documents that connects these two individuals? We refer to this type of query as a concept chain query. A generalization of this task involves query terms representing general concepts (e.g. airplane crash, fuel contaminants). The UIR model for text mining incorporates (i) a hybrid approach to content representation, combining the robustness of information retrieval models with the granularity provided by information extraction systems, and (ii) algorithms for ranking concept chains. Results based on a counterterrorism corpus will be presented.
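
As a rough illustration of what a concept chain query asks for, the sketch below links concepts that co-occur in a document and looks for the cheapest chain between two of them. It is not the UIR model: the toy corpus, the names build_graph and concept_chain, the co-occurrence graph, and the cost of 1 / (number of supporting documents) per link are all assumptions made for this example, and the concept sets are assumed to have been extracted already.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def build_graph(docs):
    """Link every pair of concepts that co-occur in a document.

    docs maps a document id to the set of concepts found in it.
    graph[a][b] is the set of document ids that mention both a and b.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for doc_id, concepts in docs.items():
        for a, b in combinations(sorted(concepts), 2):
            graph[a][b].add(doc_id)
            graph[b][a].add(doc_id)
    return graph


def concept_chain(graph, source, target):
    """Return the cheapest chain of links from source to target, or None.

    Assumed cost of a link: 1 / (number of documents supporting it),
    so well-attested links are preferred. Uses Dijkstra's algorithm.
    """
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, doc_ids in graph[node].items():
            new_cost = cost + 1.0 / len(doc_ids)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = (node, sorted(doc_ids))
                heapq.heappush(heap, (new_cost, nxt))
    if target not in best:
        return None
    # Walk back from the target to recover the evidence trail.
    chain = []
    node = target
    while node != source:
        parent, doc_ids = prev[node]
        chain.append((parent, node, doc_ids))
        node = parent
    chain.reverse()
    return chain


# Hypothetical corpus: no single document mentions both persons.
docs = {
    "d1": {"Person A", "Flight school"},
    "d2": {"Flight school", "Person B"},
    "d3": {"Person A", "Bank X"},
    "d4": {"Bank X", "Charity Y"},
    "d5": {"Charity Y", "Person B"},
}

graph = build_graph(docs)
for left, right, evidence in concept_chain(graph, "Person A", "Person B"):
    print(left, "->", right, "supported by", evidence)
```

In this toy corpus a conventional two-name search would find no document containing both names, while the chain query returns the two-link trail through "Flight school" in preference to the longer trail through "Bank X" and "Charity Y".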


Rohini Srihari received a B.Math. in Computer Science from the University of Waterloo, Ontario, Canada. She received a Ph.D. in Computer Science from the University at Buffalo in 1992 for a dissertation on using collateral text in interpreting photographs.

Rohini Srihari has founded three companies:
(i) Cymfony Inc. in 1996, where she served as CEO until 2001 and on its board and as Chief Scientist until 2005,
(ii) Cymfony Net Private Limited (Bangalore, India) in 2000, on whose board she presently serves, and
(iii) Janya Inc. in 2005, where she serves as CEO.

Rohini Srihari was named to the Women of Accomplishment Legacy Project that identified "outstanding women of the 20th and 21st century who have touched Western New York with their genius, dedication and humanity and left a lasting legacy for the generations to come."

Dr. Srihari has authored over a hundred papers and two United States patents, and has supervised six completed doctoral dissertations. She has been an invited speaker at several symposia and academic departments. She has served on several recent NSF and NIH review panels. A member of the Association for Computing Machinery (ACM) and the Association for Computational Linguistics (ACL), she is on the advisory/organizing committees of several scientific conferences, including IJCAI 07 (Hyderabad, India), MSPIL-06 (Bombay, India), HLT/NAACL 06 (New York City), ICON 2005 (Kanpur, India), EMNLP 04, HLT/NAACL 04, and SIGIR 04 (Bioinformatics).