Sunday, November 30, 2008
The TREC 2008 Blog track workshop
We just came back from Gaithersburg a few days ago. It was a nice (and cold!) week at the TREC 2008 conference. Besides presenting the main results of our participation in the Blog, Enterprise, and Relevance Feedback tracks, we had fruitful discussions at the Blog track workshop regarding the directions of the track for 2009.
There was a consensus among the attendees that opinion retrieval and polarity detection are still open, relevant problems. While a few groups this year deployed interesting techniques that achieved consistent opinion retrieval performance across several strongly performing baselines, polarity detection approaches looked rather naive. It was suggested that polarity detection be investigated at a finer granularity (e.g., at the sentence rather than the document level). This, however, could cross into the territory of the TAC conference.
Nonetheless, believing that, after three years, the Blog track has contributed a comprehensive experimental setting for those who wish to continue investigating these search scenarios, the organisers decided to discontinue the opinion finding and polarity tasks, at least in their current format. Instead, they propose to investigate the opinionated nature of blogs as one of many interesting facets of a broader search task. This task extends the current blog distillation task by moving beyond topic relevance and introducing additional requirements for qualifying "good" blogs, i.e., blogs that have a recurrent interest in a given topic and that also fulfil a set of predefined "facets". This way, for instance, one could search for humorous blogs about the government, or opinionated blogs about whisky.
Besides this faceted blog distillation task, the workshop attendees considered a second task relevant and worth investigating: tracking stories in the blogosphere. The aim is to investigate how stories emerge and evolve over the time frame of the blog corpus. It was also noted that this task could be linked to a news search task so as to draw a connection between stories published in the blogosphere and in the mainstream media.
As pointed out, however, the 11-week time frame of the Blogs06 collection does not adequately support the story tracking task. Furthermore, the availability of a more representative sample of the blogosphere is an important step towards addressing blog search as a social media problem. For these reasons, a new corpus, much larger and spanning a longer time frame, will be used in 2009.
For those who did not attend the Blog track workshop at TREC, please feel free to post your comments about the proposed tasks for 2009.
Hope you all join us in the TREC 2009 Blog track!