Tuesday, August 4, 2009

AcademTech: Faceted People Search

AcademTech is a Computing Science-specific expert search engine based on the Terrier IR Platform. Persons working at Computing Science departments in Scottish Universities are considered as candidate experts by the system. Profiles of their expertise evidence are then mined from their homepages, publicly available digital libraries (e.g. DBLP) and related information found on the Web through Yahoo! BOSS. The ranking of experts is provided by a variant of the Voting Model expert search approach.
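As a rough illustration of how a voting-style expert ranker can work, the sketch below treats each retrieved document as a vote for the candidates whose profiles contain it, and sums the document scores per candidate (a CombSUM-style aggregation). The post does not say which variant of the Voting Model AcademTech uses, so the function name `rank_experts`, the sample data, and the choice of aggregation are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_experts(doc_scores, profiles):
    """Rank candidates by summing the scores of retrieved documents in their profile.

    doc_scores: dict mapping document id -> retrieval score for the query
    profiles:   dict mapping candidate name -> set of document ids (expertise evidence)
    Both structures are hypothetical; they are not AcademTech's actual data model.
    """
    votes = defaultdict(float)
    for candidate, docs in profiles.items():
        for doc_id in docs:
            # Only documents retrieved for the query cast a vote.
            if doc_id in doc_scores:
                votes[candidate] += doc_scores[doc_id]
    # Highest aggregate score first; ties broken by name for determinism.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical document ranking and candidate profiles.
doc_scores = {"d1": 2.0, "d2": 1.5, "d3": 0.5}
profiles = {
    "alice": {"d1", "d3"},
    "bob": {"d2"},
    "carol": {"d4"},  # no retrieved documents, so no votes
}
ranking = rank_experts(doc_scores, profiles)
```

Candidates with no retrieved documents receive no votes and are left out of the ranking. Other aggregations (for example, counting votes instead of summing scores) would slot into the same loop.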

The system is integrated with a novel faceted search interface to allow users to browse and explore the results using a number of categories such as Location or Conference/Journal publications. Each expert in the system has a profile page containing a number of elements including query specific supporting publications, most informative associated terms displayed as a tag cloud, co-authors and web links. Although the system is currently applied in the context of Scottish Computing Science Academia, it can easily be expanded to go beyond its current Scottish scope, cover other academic fields, and people in general.
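The faceted browsing described above can be sketched in a few lines: count how many ranked experts carry each facet value, then narrow the results once a value is selected. The record layout, the facet names, and the helper functions here are hypothetical, not AcademTech's actual implementation.

```python
from collections import Counter


def facet_counts(results, facet):
    """Count how many results carry each value of the given facet."""
    counts = Counter()
    for expert in results:
        for value in expert.get(facet, []):
            counts[value] += 1
    return counts


def filter_by_facet(results, facet, value):
    """Keep only results with the selected facet value, preserving rank order."""
    return [e for e in results if value in e.get(facet, [])]


# Hypothetical ranked results for a query.
results = [
    {"name": "alice", "location": ["Glasgow"], "venue": ["SIGIR", "CIKM"]},
    {"name": "bob", "location": ["Edinburgh"], "venue": ["SIGIR"]},
    {"name": "carol", "location": ["Glasgow"], "venue": ["ECIR"]},
]
location_counts = facet_counts(results, "location")
glasgow_experts = filter_by_facet(results, "location", "Glasgow")
```

Because filtering preserves the original order, the narrowed list stays ranked by expertise.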

I was lucky enough to demo AcademTech at SIGIR 2009 in Boston on July 20th. I spoke to a large number of attendees, and the feedback I received was largely very helpful.

A popular suggestion was to apply AcademTech's core system to biology. This would meet the medical field's need for finding related organisms, diseases, etc. Likely facets in this area would be biological classifications such as species and genus.

Daniel Tunkelang from The Noisy Channel suggested providing facets on the profile page itself, allowing search results to be filtered by features present in a selected expert's page, such as co-authors. This would satisfy a scenario such as "Show me co-authors of this expert who work for the University of Glasgow." Profile facets could also allow the expert's publications list to be filtered by a number of fields, such as co-author location, conference, etc.

Much of the feedback mirrored our intended future work. Name disambiguation is a high-priority update, as a current problem with AcademTech is the publication mismatch that occurs when multiple experts share the same name. In fact, the system is specifically designed to allow for the expansion of facets and for name disambiguation. With a large number of publication collaborators working in industry, a useful move would be to expand the system to accommodate these experts.

[Image: AcademTech SIGIR 2009 poster]

AcademTech is now publicly accessible from http://www.terrier.org/academtech
A description of the system is available in the SIGIR'09 proceedings.

Thank you to all those who spoke to me and gave me some great feedback.

Tuesday, July 21, 2009

SIGIR 2009: Expert Search from Glasgow

A short update from SIGIR09 to announce our recently published work on expert search. This should hopefully be the first in a short series of posts about SIGIR this year.

In On Perfect Document Rankings for Expert Search (Craig Macdonald & Iadh Ounis), we examine the effect of the document ranking on an expert search engine. Intuitively, improving the topical relevance properties of the document ranking should lead to an improvement in the performance of the generated ranking of experts. In this poster, we examine the extreme case, by making the document ranking component perfect with respect to topical relevance.

In Usefulness of Click-through data in Expert Search (Craig Macdonald & Ryen White), we examine how user clicks on an intranet search engine can be used as features by an expert search engine. The proposed techniques are based on the voting techniques from the Voting Model, but examine document clicks instead of weighting model scores. To our knowledge, this is the first work examining how clicks can be integrated into expert search.

Finally, the Voting Model was showcased in Expertise Search in Academia using Facets (Duncan McDougall & Craig Macdonald), which demoed AcademTech, a faceted search interface for expert search in academia.

Thursday, June 4, 2009

CIKM 2011 in Glasgow!

We are delighted that our bid to host the ACM Conference on Information and Knowledge Management (CIKM 2011) in Glasgow has been successful.

After the highly successful ESSIR 2007 and ECIR 2008 events, we are excited at the prospect of hosting the prestigious ACM CIKM Conference in Glasgow in 2011. We look forward to having our colleagues gather in Glasgow, and to surpassing their expectations.

Further information about the conference (dates, venues, etc.) will be available in due course.

CIKM 2009 will be held on November 2-6, 2009, in Hong Kong. Hope to see you there!

Wednesday, April 29, 2009

TREC Blog track 2009

We have just released a draft of the guidelines for the TREC 2009 Blog track.

Compared to previous years, the Blog track 2009 aims to investigate more refined and complex search scenarios. In particular, we propose to run two tasks in TREC 2009:
  • Faceted blog distillation: a more refined version of the blog distillation task that addresses the quality aspect of the retrieved blogs and mimics an exploratory search task. The task can be summarised as "Find me a good blog with a principal, recurring interest in X". We propose several facets for the TREC 2009 blog distillation task, which may be of varying difficulty to identify for the participant systems.
  • Top stories identification: a new pilot task that addresses the news dimension in the blogosphere. Systems are asked to identify the top news stories of a given day, and to provide a list of relevant blog posts discussing each news story. The ranked list of blog posts should be diverse, covering different aspects, perspectives or opinions of the news story.
The new Blogs08 collection, an up-to-date and large sample of the blogosphere from January 2008 to February 2009, will be used for both tasks.

We welcome feedback. Please feel free to post feedback and comments about the proposed tasks for 2009.

Thursday, April 9, 2009

Blogs08 Collection Released

We are pleased to announce that the Blogs08 collection is now ready for distribution. As announced before, Blogs08 is one order of magnitude bigger than Blogs06, and samples the blogosphere from January 2008 to February 2009. The uncompressed permalinks take up approximately 1.3TB; including feeds, the collection amounts to over 2TB of data. As usual, the data is shipped compressed on a SATA hard drive.

The distribution mechanism will be the same as for Blogs06. There is specific information about the size of the collection here, while the instructions for obtaining the collection are here.

If you intend to participate in the TREC 2009 Blog track, please start working on the paperwork right away, so that you can get the collection as soon as possible. Due to the larger size of the collection, we will operate a queuing system for shipping the data. Moreover, if you haven't done so already, please respond to the TREC 2009 Call for Participation.

Blog track co-ordinators are finalising the guidelines for this year's tasks and will continue to update the TREC Blog wiki, the TREC blog track mailing list and this blog.