We have been very busy recently with the TREC 2008 Blog track. Now that all runs have been submitted and the relevance assessments are under way, it is the time of year when we start planning for the future of the track at TREC 2009! Indeed, TREC operates a policy whereby existing tracks are renewed on an annual basis, following the submission of a proposal.
Back in 2006, when we first proposed the Blog track, we set a long-term objective for it, recognising that the richness of the blogosphere and its peculiarities would require several years of investigation before we could fully understand the different blog search tasks and how they should be effectively addressed. In particular, we proposed to adopt an incremental approach, beginning with basic blog search tasks and progressively moving to more complex search scenarios.
In the first three years of the track (2006-2008), we addressed two main blog search tasks:
- Opinion finding: involves locating blog posts that express an opinion about a given target.
- Blog distillation: involves locating blogs that are principally devoted to a topic X over the timespan of the feed.
The first task tackles an important aspect of blogs, namely their opinionated/subjective nature, and the tendency of bloggers to express views, thoughts and feelings towards named entities. This task helps users to find out what bloggers think about X. The second search task addresses a scenario where the user would like to find a blog to follow or read in their RSS reader. Our main findings and conclusions from the first two years of the Blog track at TREC are summarised in the ICWSM 2008 paper, entitled On the TREC Blog Track. The Blog track 2006 and 2007 overview papers provide further detailed analysis and results.
We are now proposing to move to a second phase of the Blog track, in which more refined and complex search scenarios would be investigated. In particular, we are considering using a new and larger collection of blogs, with a much longer timespan than the 11-week period covered by the Blog06 collection. This would allow us to investigate another important characteristic of the blogosphere, namely the temporal/chronological aspect of blogging, and various related search tasks such as story identification and tracking.
In particular, Hearst et al. propose that the blog distillation task should be further refined by taking into account a number of dimensions or attributes such as the authority of the blog, the trustworthiness of its authors, the genre of the blog and its style of writing. For example, a user might be interested in blogs to read about a topic X, but only those where the blogger expresses in-depth viewpoints, backed up by a scientific methodology or evidence. The Cranfield evaluation paradigm adopted by TREC requires deeper thought about how relevance assessments should be conducted in such a scenario.
Unsurprisingly for a strong advocate of the importance of user interfaces and visualisation tools for information retrieval, Hearst, together with her co-authors, proposes a faceted blog search interface to help the user explore the attributes of the blogs before choosing those they wish to follow or read, i.e. exploratory search at its best! The conclusion of the paper provides a good summary of Hearst et al.'s views:
For the problem of selecting a blog to read, we propose a faceted interface which highlights different attributes of interest, with a focus on people and on matching the taste preferences of the reader. For the task of “taking the pulse of the blogosphere,” we suggest that blog data be integrated with other social media and that the existing work on tracking trends and aggregating views is heading in the right direction.
As we are wrapping up our proposal for TREC 2009, we would like to hear other suggestions and comments about what blog search should look like. Please feel free to post your thoughts as comments on this post, or to email them to us privately if you prefer.