Sunday, October 19, 2008

TREC Blog track will run in 2009

Following our previous post, I'm pleased to report that the TREC program committee has just accepted our proposal for the Blog track to continue in 2009.

The intention is to use a larger Blog collection and to have at least one search task that goes beyond topical relevance by taking into account a facet representing a required "quality" attribute.

There will be a workshop to discuss the proposed blog search tasks at the TREC 2008 conference on the afternoon of Thursday 20th November 2008.

If you cannot attend TREC and wish to make any comments or suggestions, please feel free to post your thoughts as comments on this post, or to email them to us privately if you prefer.

5 comments:

Jon said...

Great news! Looking forward to seeing you at the workshop.

Iadh Ounis said...

Thanks Jon.

See you at TREC!

Sérgio Nunes said...

This is good news! I look forward to the new blog collection and tasks for 2009.

One topic that I would like to see discussed is the cost of the Blog collection used by TREC in this track. The current cost of the dataset is a significant investment, particularly for small teams.

I recently found out that ICWSM 2009 is teaming up with Spinn3r to make a large blog dataset available for their conference. I think that the Blog Track should consider a similar approach.

Alternatively, finding industry sponsors to pay for and host the custom-built collection might be an option to significantly reduce participation costs.

Thanks for listening.

Iadh Ounis said...

Sergio,

You have raised this point several times in the past, and we appreciate your concern. Over the previous years, we have been working flat out to keep the costs as low as possible, while providing maximum support and promptness.

The rationale for the costs, as well as our position, remains the same, though. Creating and distributing test collections is not as simple as you seem to suggest.

The current distribution mechanism mitigates copyright and other legal issues, while ensuring the long-term availability of the collections.

For example, the ICWSM collections may only be available for a short span of time (e.g. the 2005 one is no longer distributed).

Be assured that the current collection fees are at their lowest possible values. They only contribute towards the costs of preparing and distributing the data. This is not unusual. For example, the Linguistic Data Consortium (LDC), which distributes many collections in IR and related fields, adopts a similar distribution mechanism.

Sérgio Nunes said...

First of all, thanks for the feedback.

I've commented a couple of times on this issue since the cost of the collection is a big investment for very small teams like ours. I know other research groups that share this same concern.

However, I have never underestimated the effort needed to put together such a collection. My comments have always been aimed at shifting the costs to industry, not at eliminating them (which is not realistic). The (significant) work that you are doing in building the collections and organizing the Blog Track is of great value to the research community.