Wednesday, March 10, 2010

Terrier 3.0 released

Firstly, we have a new website for Terrier: http://terrier.org

Also, we have just released Terrier 3.0!

This is a major update to Terrier, including:
  • support for indexing WARC collections (such as ClueWeb09)
  • improved MapReduce mode indexing
  • improved and more scalable index structures
  • added field-based and proximity term dependence models, such as BM25F, PL2F and Markov Random Fields
  • new Web-based retrieval interface
Fuller changelog at http://terrier.org/docs/current/whats_new.html

If you're looking for our team publications, etc., please see our new team website: http://terrierteam.dcs.gla.ac.uk/

Thanks are due to everyone in the Terrier Team for their hard work to make this release, as well as the contributions and feedback about Terrier from our users and collaborators.

Tuesday, February 23, 2010

TREC Blog Track 2010

The TREC Blog track will be continuing in 2010. In 2009, the Blog track was markedly revamped, addressing more refined Blog search scenarios using the new Blogs08 collection, a large sample of the blogosphere covering the period of 14th January 2008 to 10th February 2009.

A summary of the TREC Blog track 2009 edition was presented by Iadh Ounis at the main TREC conference (Slides). The Blog track 2009 overview paper will be available on the TREC website shortly, once it is updated and reviewed.

The details of the TREC 2010 Blog track are still being finalised by the organisers. However, following the discussions at the TREC 2009 Blog track workshop, here are some salient details (see also the TREC 2009 Wrap-up Slides):

1. Faceted blog search task will run again in 2010: The task addresses the quality aspect of the retrieved blogs. It is a feed search task.
  • We will adopt a two-stage submission procedure: (1) a participating group submits "topically-relevant" blogs for each query; (2) a few standard baselines will be distributed to participants, so that they can re-rank them with respect to various facet inclinations (e.g. opinionated, in-depth, personal).
  • Groups can participate in stage 2 without stage 1, and vice-versa. Stage 1 is akin to an adhoc blog search task.
  • More topics for various facet inclinations.

2. Top news story identification task will run again in 2010: The task addresses the news-related dimension of the blogosphere. In particular, it investigates whether the blogosphere can be used to identify the most important news stories of the day.
  • Real-time news search task rather than retrospective.
  • A much larger and more comprehensive headline sample, provided by a major news organisation.
  • A two-stage submission procedure: (1) Groups submit a ranking of top stories for some days per-category (e.g. sport, politics, business, etc.) (2) We will then select some top relevant stories, for which we will ask the participating groups to identify the related blog posts, in a manner that covers the various/diverse aspects of each story.
  • Groups can participate in stage 2 without stage 1. In that case, it is an adhoc diversity blog post search task, where the headline is the query.
We welcome any feedback and comments on the tasks above to trecblog-organisers (at) dcs.gla.ac.uk

Finally, note that if you wish to participate in TREC 2010, you should answer the TREC 2010 call for participation. We will update the Blog track wiki as things become more refined - keep following the Blog track developments as they happen on our dedicated Wiki web site.

Tuesday, August 4, 2009

AcademTech: Faceted People Search

AcademTech is a Computing Science-specific expert search engine based on the Terrier IR Platform. Persons working at Computing Science departments in Scottish Universities are considered as candidate experts by the system. Profiles of their expertise evidence are then mined from their homepages, publicly available digital libraries (e.g. DBLP) and related information found on the Web through Yahoo! BOSS. The ranking of experts is provided by a variant of the Voting Model expert search approach.
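As a rough illustration of the idea behind the Voting Model: documents retrieved for a query act as votes for the candidates associated with them, and a voting technique aggregates those votes into a ranking of experts. The sketch below shows three well-known techniques (Votes, CombSUM and CombMNZ). It is not AcademTech's code, and which variant AcademTech actually uses is not stated here; the function name, the input dictionaries and the example data are all hypothetical.

```python
from collections import defaultdict


def rank_experts(doc_scores, doc_candidates, technique="CombSUM"):
    """Aggregate a document ranking into a ranking of candidate experts.

    doc_scores: dict mapping doc_id -> retrieval score for the query
        (hypothetical input; any document ranking could supply it).
    doc_candidates: dict mapping doc_id -> candidates associated with
        that document (e.g. its authors).
    technique: "Votes", "CombSUM" or "CombMNZ".
    """
    votes = defaultdict(int)        # number of retrieved documents per candidate
    score_sum = defaultdict(float)  # sum of those documents' scores

    for doc_id, score in doc_scores.items():
        for candidate in doc_candidates.get(doc_id, ()):
            votes[candidate] += 1
            score_sum[candidate] += score

    if technique == "Votes":
        final = {c: float(n) for c, n in votes.items()}
    elif technique == "CombSUM":
        final = dict(score_sum)
    elif technique == "CombMNZ":
        final = {c: votes[c] * score_sum[c] for c in votes}
    else:
        raise ValueError(f"unknown voting technique: {technique}")

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(final.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up example data: one strong document for alice, two weaker ones for bob.
    doc_scores = {"d1": 5.0, "d2": 2.0, "d3": 1.0}
    doc_candidates = {"d1": ["alice"], "d2": ["bob"], "d3": ["bob"]}
    for name in ("Votes", "CombSUM", "CombMNZ"):
        print(name, rank_experts(doc_scores, doc_candidates, name))
```

With this toy data, CombSUM favours the candidate with the single highly scored document, while Votes and CombMNZ favour the candidate with more retrieved documents.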

The system is integrated with a novel faceted search interface to allow users to browse and explore the results using a number of categories such as Location or Conference/Journal publications. Each expert in the system has a profile page containing a number of elements including query-specific supporting publications, most informative associated terms displayed as a tag cloud, co-authors and web links. Although the system is currently applied in the context of Scottish Computing Science Academia, it can easily be expanded beyond its current Scottish scope to cover other academic fields, and people in general.

I was lucky enough to be able to demo AcademTech at SIGIR 2009 in Boston on July 20th. I spoke to a large number of attendees and received a lot of very helpful feedback.

A popular suggestion was to utilize AcademTech's core system in the scope of biology. This would meet the medical field's need for finding related organisms, diseases etc. Possible facets in the area would likely be biological classifications such as species and genus.

Daniel Tunkelang from The Noisy Channel suggested providing profile page-located facets, allowing filtering of search results by features present in a selected expert's page such as co-authors. This would satisfy an example scenario such as "Show me co-authors of this expert who work for the University of Glasgow." Profile facets could also allow the expert's publications list to be filtered by a number of fields such as co-author location, conference etc.
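To make the co-author scenario concrete, here is a minimal sketch of how such a profile facet could filter results. It is only an illustration under assumed data structures: the `Expert` class, its fields and the `coauthors_at` helper are hypothetical and do not reflect how AcademTech stores profiles.

```python
from dataclasses import dataclass, field


@dataclass
class Expert:
    """Hypothetical, simplified expert profile."""
    name: str
    university: str
    coauthors: set = field(default_factory=set)  # names of co-authors


def coauthors_at(experts, selected_name, university):
    """Return the selected expert's co-authors who work at `university`.

    Only co-authors who themselves have a profile in `experts` can be
    filtered by university, so unknown co-authors are skipped.
    """
    by_name = {expert.name: expert for expert in experts}
    selected = by_name[selected_name]
    return sorted(
        name
        for name in selected.coauthors
        if name in by_name and by_name[name].university == university
    )


if __name__ == "__main__":
    # Made-up profiles for illustration only.
    experts = [
        Expert("Expert A", "University of Glasgow", {"Expert B", "Expert C"}),
        Expert("Expert B", "University of Glasgow", {"Expert A"}),
        Expert("Expert C", "University of Edinburgh", {"Expert A"}),
    ]
    print(coauthors_at(experts, "Expert A", "University of Glasgow"))
```

The same pattern would extend to other profile facets (conference, co-author location) by swapping the condition in the filter.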

Much of the feedback mirrored our intended future work. Name disambiguation is a high-priority update, as a current problem with AcademTech is the mismatching of publications when multiple experts share the same name. In fact, the system is specifically designed to allow for expansion of facets, and name disambiguation. With a large number of publication collaborators working in industry, a useful move would be to expand the system to accommodate these experts.

AcademTech is now publicly accessible from http://www.terrier.org/academtech
A description of the system is available in the SIGIR'09 proceedings.

Thank you to all those who spoke to me and gave me some great feedback.

Tuesday, July 21, 2009

SIGIR 2009: Expert Search from Glasgow

A short update from SIGIR09 to announce our recently published work on expert search. This should hopefully be the first in a short series of posts about SIGIR this year.

In On Perfect Document Rankings for Expert Search (Craig Macdonald & Iadh Ounis), we examine the effect of the document ranking on an expert search engine. Intuitively, improving the topical relevance properties of the document ranking usually leads to an improvement in the performance of the generated ranking of experts. In this poster, we examine the extreme case, by making the document ranking component perfect with respect to topical relevance.

In Usefulness of Click-through data in Expert Search (Craig Macdonald & Ryen White), we examine how user clicks on an intranet search engine can be used as features by an expert search engine. The proposed techniques are based on the voting techniques from the Voting Model, but examine document clicks instead of weighting model scores. To our knowledge, this is the first work examining how clicks can be integrated into expert search.

Finally, the Voting Model was showcased in Expertise Search in Academia using Facets (Duncan McDougall & Craig Macdonald), which demonstrated AcademTech, a faceted search interface for expert search in academia.

Thursday, June 4, 2009

CIKM 2011 in Glasgow!

We are delighted that our bid to host the ACM Conference on Information and Knowledge Management (CIKM 2011) in Glasgow has been successful.

After the highly successful ESSIR 2007 and ECIR 2008 events, we are excited at the prospect of hosting the prestigious ACM CIKM Conference in Glasgow in 2011. We look forward to having our colleagues gather in Glasgow, and to surpassing their expectations.

Further information about the conference (dates, venues, etc.) will be available in due course.

CIKM 2009 will be held on November 2-6, 2009, in Hong Kong. Hope to see you there!