We have just released a draft of the guidelines for the TREC 2009 Blog track.
Compared to previous years, the Blog track 2009 aims to investigate more refined and complex search scenarios. In particular, we propose to run two tasks in TREC 2009:
- Faceted blog distillation: a more refined version of the blog distillation task that addresses the quality aspect of the retrieved blogs and mimics an exploratory search task. The task can be summarised as "Find me a good blog with a principal, recurring interest in X". We propose several facets for the TREC 2009 blog distillation task, which may vary in how difficult they are for participant systems to identify.
- Top stories identification: a new pilot task that addresses the news dimension of the blogosphere. Systems are asked to identify the top news stories of a given day, and to provide a list of relevant blog posts discussing each news story. The ranked list of blog posts should be diverse, covering different aspects, perspectives or opinions on the news story.
The new Blogs08 collection, an up-to-date and large sample of the blogosphere from January 2008 to February 2009, will be used for both tasks.
We welcome feedback. Please feel free to post feedback and comments about the proposed tasks for 2009.
4 comments:
The new tasks sound very interesting. I don't completely understand the evaluation plan for integration of facets with relevance. All of the proposed facets are binary valued. Will the query specify a facet + value a searcher might be interested in? Or just a facet? From the example, it looks like the latter. In that case, will the systems produce two ranked lists for each query -- one for facet value A and one for value B with the expectation that the two lists will be disjoint?
Hello Jon,
Glad that you like the new tasks. We are also trying to make sure that we have realistic queries from a query log (I prefer to use the adjective 'realistic' instead of 'representative').
There is an example topic on the Blog track wiki.
The facets are not necessarily binary. It is possible to have more elaborate facets, e.g. positively opinionated, negatively opinionated, balanced, non-opinionated.
If the facet is binary, say gender, the system will produce two ranked lists for each query.
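For instance, a participant system might serialise the two ranked lists in the usual TREC run format, one per facet value. This is only a minimal sketch of one plausible submission layout; the run tag convention and field details are assumptions, not the official guidelines:

```python
# Hypothetical sketch: grouping a system's scored blogs by facet value and
# emitting one TREC-style ranked list ("topic Q0 docid rank score tag") per
# value. Names and the per-facet run-tag suffix are illustrative assumptions.

def write_runs(topic_id, scored_blogs, run_tag="exampleRun"):
    """scored_blogs: list of (blog_id, score, facet_value) tuples.

    Returns a dict mapping each facet value to its ranked list of run lines.
    """
    by_facet = {}
    for blog_id, score, facet in scored_blogs:
        by_facet.setdefault(facet, []).append((blog_id, score))

    runs = {}
    for facet, docs in by_facet.items():
        # Rank by descending retrieval score within each facet value.
        docs.sort(key=lambda d: d[1], reverse=True)
        runs[facet] = [
            f"{topic_id} Q0 {blog_id} {rank} {score:.4f} {run_tag}-{facet}"
            for rank, (blog_id, score) in enumerate(docs, start=1)
        ]
    return runs
```

Under this layout the two lists for a binary facet are disjoint by construction, since each retrieved blog carries a single predicted facet value.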
Will CMU be participating in the Blog track this year?
I don't know if we're going to participate this year. I've got a busy summer and am getting interested in the relevance feedback track as well, but this does seem like an interesting task.
If a relevant document is returned, but with the wrong facet value, would that count as non-relevant? Will you report facet accuracy as well as relevance?
At least for the example in the guidelines, relevance evaluation is being conflated with facet classification evaluation.
What is the motivation behind conflating these two evaluations, rather than reporting MAP and accuracy? I realize a user would see only a single ranked list, possibly filtered by facet value, but from the participants' perspective we might like to understand whether our system is failing on the relevance ranking or the facet value assignment.
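Concretely, the separation I have in mind might look something like the sketch below: average precision computed from relevance judgments alone, and facet accuracy computed only over judged documents. The judgment structures and function names here are my own assumptions, not anything from the guidelines:

```python
# Hypothetical sketch of reporting relevance and facet assignment separately
# (assumed data shapes: a ranked list of doc ids, a set of relevant ids, and
# dicts mapping doc id -> facet value for system output and gold judgments).

def average_precision(ranked_ids, relevant_ids):
    """Standard AP over a single ranked list; MAP is the mean across topics."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

def facet_accuracy(predicted, gold):
    """Fraction of judged documents whose predicted facet value matches gold."""
    judged = [doc_id for doc_id in predicted if doc_id in gold]
    if not judged:
        return 0.0
    return sum(predicted[d] == gold[d] for d in judged) / len(judged)
```

Reporting both numbers would let a participant see whether a low combined score comes from the relevance ranking or from the facet value assignment.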
Jon,
These are indeed very good points.
We should indeed separate facet classification accuracy from relevance, and I believe this would be feasible. It is pretty much how we addressed the opinion finding task.
The relevance assessments for the blog distillation task this year will be performed by NIST.
We are still working on the details. Your feedback on the issue is very much appreciated.