Yahoo! Research has launched a new search engine called Correlator. It uses advanced techniques from Natural Language Processing and Computational Linguistics to locate entities within text and to group sentences about these entities from different documents. In his talk at the ECIR 2008 Industry Day, Hugo Zaragoza, who championed the Correlator project at Yahoo! Research Barcelona, described some of the system's underlying approaches and technologies. In a blog post introducing the search engine, he states:
The core of Correlator is a search engine capable of returning not only relevant documents, but also relevant sentences and entities.
Currently, Correlator uses Wikipedia as the underlying document collection. However, the Correlator team contends that this can be extended to other collections and types of documents such as blogs.
I tried Correlator briefly this morning. My first impression is that it does extremely well on many queries; for example, the results for "precision and recall" are good and informative. However, there is room for improvement in identifying relationships between entities. For the query "Tony Blair", when searching for names, the system suggests many entities as 'probably related to Tony Blair', but it does not state the precise nature of each relationship; Cherie Blair, for instance, should be presented as definitely being Tony Blair's wife. The user is left to browse through the various suggested relationships between the named entities. This might, however, be a deliberate design choice, favouring high coverage over high precision.
Relatedly, TREC 2009 will include a new Entity track. One of the currently proposed search tasks is the identification of relationships between entities.
2 comments:
It looks like you have to quote phrases sometimes. Try looking for names related to enterprise search. :-)
Yes indeed. There is definitely scope for improvements when searching names. Even results for names related to information retrieval could be enhanced further.
The interface is awesome though.