TerrierTeam: September 2008

Friday, September 12, 2008

Conference Deadline Traffic Jam

I noticed today that Matthew Hurst has posted the ICWSM 2009 Call for Papers. Unfortunately, the submission deadline is 21st January, a full six weeks later than for ICWSM 2008. Moreover, it falls four days before the SIGIR full paper deadline.

As IR researchers, we have to target certain conferences. While I'd like to have multiple papers ready in advance for several conferences with similar deadlines, various pressures make that impossible (e.g. I'd like a holiday at Christmas!).

The conference deadlines in January and February now look like:

11th January: WWW 2009 Posters due
19th January: SIGIR 2009 Abstracts due
21st January: ICWSM 2009 Papers/Posters/Demos due
25th January: SIGIR 2009 Papers due
9th February: NAACL-HLT 2009 Short Papers due
22nd February: ACL-IJCNLP Papers due
23rd February: SIGIR 2009 Posters due

Happy writing!

Wednesday, September 10, 2008

About Blog Search Tasks

We have been very busy recently with the TREC 2008 Blog track. Now that all runs have been submitted and the relevance assessments are ongoing, it is the time of year when we start planning for the future of the track at TREC 2009! Indeed, TREC operates a policy whereby existing tracks are renewed annually, following the submission of a proposal.

Back in 2006, when we first proposed the Blog track, our aim was to have a long-term objective for the track, recognising that the richness of the blogosphere and its peculiarities will require several years of investigation before reaching a full understanding of the different blog search tasks, and how they should be effectively addressed. In particular, we proposed to adopt an incremental approach, where we begin with basic blog search tasks and progressively move to more complex search scenarios.

In the first three years of the track (2006-2008), we addressed two main blog search tasks:

Opinion finding: involves locating blog posts that express an opinion about a given target.
Blog distillation: involves locating blogs that are principally devoted to a topic X over the timespan of the feed.

The first task tackles an important aspect of blogs, namely their opinionated/subjective nature, and the tendency of bloggers to express views, thoughts and feelings towards named entities. This task helps users to find out what bloggers think about X. The second search task addresses a scenario where the user would like to find a blog to follow or read in their RSS reader. Our main findings and conclusions from the first two years of the Blog track at TREC are summarised in the ICWSM 2008 paper, entitled On the TREC Blog Track. The Blog track 2006 and 2007 overview papers provide further detailed analysis and results.

We are now proposing to move to a second phase of the Blog track, where more refined and complex search scenarios should be investigated. In particular, we are thinking of using a new and larger collection of blogs, with a much longer timespan than the 11-week period covered by the Blog06 collection. This would allow us to investigate another important characteristic of the blogosphere, namely the temporal/chronological aspect of blogging, and various related search tasks such as story identification and tracking.

While we were thinking about such possible future tasks, we came across a position paper by Marti Hearst, Matthew Hurst and Susan Dumais, entitled "What Should Blog Search Look Like?", which will be presented at the forthcoming Search in Social Media (SSM 2008) workshop at CIKM 2008.

In particular, Hearst et al. propose that the blog distillation task should be further refined by taking into account a number of dimensions or attributes such as the authority of the blog, the trustworthiness of its authors, the genre of the blog and its style of writing. For example, a user might be interested in blogs to read about a topic X, but only those where the blogger expresses in-depth viewpoints, backed up by a scientific methodology or evidence. The Cranfield evaluation paradigm adopted by TREC requires deeper thought about how relevance assessments should be conducted in such a scenario.

Unsurprisingly for a strong advocate of the importance of user interfaces and visualisation tools for information retrieval, Hearst, together with her co-authors, proposes a faceted blog search interface to help users explore the attributes of the blogs before choosing those they wish to follow or read, i.e. exploratory search at its best! The conclusion of the paper provides a good summary of Hearst et al.'s views:

For the problem of selecting a blog to read, we propose a faceted interface which highlights different attributes of interest, with a focus on people and on matching the taste preferences of the reader. For the task of “taking the pulse of the blogosphere,” we suggest that blog data be integrated with other social media and that the existing work on tracking trends and aggregating views is heading in the right direction.

As we are trying to wrap up our proposal for TREC 2009, we would like to hear other suggestions and comments about what blog search should look like. Please feel free to post your thoughts as comments on this post, or to email them privately if you so wish.

Monday, September 1, 2008

From Expert Search to Commoditising Workers

While I'm putting the finishing touches to my PhD thesis (titled The Voting Model for Expert and Blog Search), I thought I'd pick up on a recent related article.

An excerpt from The Numerati has been published on BusinessWeek.com. In the excerpt, Stephen Baker interviews the scientist Samer Takriti while he was working at IBM. Samer, who is a specialist in Operations Research, is working on commoditising workers. Similar to how supply chains and production lines have been modelled and improved, Samer believes that people can be assigned to projects using combinations of their availability, their cost, and their skills/expertise. The idea is to optimise the use of co-workers, leading to better productivity within an organisation.

What's really interesting here is that this is a real application of expert search technology, being applied not just to satisfy occasional expertise needs ("I'm stuck, who should I ask for help?"), but in daily use to determine work assignments and to increase productivity. In effect, it is a fusion of search technology with constraint optimisation. Tools like these are likely to become invaluable in assigning jobs in global consultancy companies, where managers are unlikely to know everyone at their disposal. Such tools could even be used to identify the best training path for a co-worker to become skilled and productive in a particular area.
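
To make the idea of combining expertise matching with constraint optimisation a little more concrete, here is a minimal sketch in Python. It is not Takriti's method or anything described in the excerpt: the worker names, skills, costs and project requirements are all made up for illustration, the skill sets stand in for what an expert search system might return, and the brute-force search over assignments is only practical for toy-sized inputs.

```python
from itertools import permutations

# Hypothetical data: every name, skill and cost below is illustrative only.
workers = {
    "alice": {"skills": {"java", "search"}, "cost": 900, "available": True},
    "bob": {"skills": {"search"}, "cost": 500, "available": True},
    "carol": {"skills": {"java", "databases"}, "cost": 700, "available": False},
    "dave": {"skills": {"java", "databases"}, "cost": 600, "available": True},
}

# Each project lists the skills its (single) assignee must have.
projects = {
    "crawler": {"java"},
    "ranking": {"search"},
}


def is_qualified(worker, required_skills):
    """A worker qualifies if available and holding every required skill."""
    return worker["available"] and required_skills <= worker["skills"]


def cheapest_assignment(workers, projects):
    """Try every one-worker-per-project assignment and keep the cheapest valid one."""
    project_names = sorted(projects)
    best, best_cost = None, None
    for chosen in permutations(sorted(workers), len(project_names)):
        pairs = list(zip(project_names, chosen))
        if not all(is_qualified(workers[w], projects[p]) for p, w in pairs):
            continue  # violates an availability or skill constraint
        cost = sum(workers[w]["cost"] for w in chosen)
        if best_cost is None or cost < best_cost:
            best, best_cost = dict(pairs), cost
    return best, best_cost  # (None, None) if no valid assignment exists


assignment, total_cost = cheapest_assignment(workers, projects)
print(assignment, total_cost)
```

In a real setting the hard skill sets would presumably be replaced by graded expertise scores, and the exhaustive search by a proper optimisation solver; the sketch only shows how availability, cost and skills can be combined into one decision.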

Imagine, says Aleksandra Mojsilovic, one of Takriti's close colleagues, that the company has a superior worker named Joe Smith. Management could really benefit from two or three others just like him, or even a dozen. Once the company has built rich mathematical profiles of Smith and his fellow workers, it might be possible to identify at least a few of the experiences or routines that make Joe Smith so good. "If you had the full employment history, you could even compute the steps to become a Joe Smith," she says.

Van drivers have had their routes assigned automatically for many years. Why should consultants at IBM be any different? However, Baker points out that some people may be left out by such systems. His example is a senior consultant passed over because of his high cost, a problem that Takriti counteracts by allowing senior staff members more "time on the bench" than junior staff, because when senior consultants are utilised they get larger cheques. Even so, the concern is the reliance on an expert search system to assign jobs when "expertise relevance" is an even vaguer concept than "document relevance", and expert search systems are not yet (and might never be) as accurate as a travelling salesman solution or a program to optimise a supply chain.

(Via Slashdot)