Tuesday, March 26, 2013

Learning to Rank Research Using Terrier - The Role of Features (Part 2)

This is the second post in a two-part series addressing our recent research in learning to rank. While the previous blog post addressed the role of the sample within learning to rank, and its impact on effectiveness and efficiency, in this blog post, I'll be discussing the role of different features within the learning to rank process.


Over the years, various IR approaches have been proposed, and these have naturally come to serve as features within learned models. Features can be calculated on documents in a query independent (e.g. URL length, PageRank) or query dependent (e.g. weighting or proximity models) manner. For instance, the LETOR learning to rank test collections deploy various (query dependent) weighting models calculated on different fields (body, title, anchor text, and URL). We have successfully used the same four fields for representing Web documents in our participations in the TREC Web track.
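As an illustration, a LETOR-style feature vector might be assembled as in the following minimal Python sketch. The `bm25` and `pl2` scoring callables and the document representation are hypothetical stand-ins for illustration, not Terrier's API:

```python
# Hypothetical sketch: assembling a LETOR-style feature vector.
# Query dependent features are computed per field; query independent
# features are computed once per document, regardless of the query.

FIELDS = ["body", "title", "anchor", "url"]

def feature_vector(doc, query, bm25, pl2):
    """Assemble features for one (query, document) pair."""
    features = []
    # Query dependent: one score per weighting model, per field
    for field in FIELDS:
        features.append(bm25(query, doc, field))
        features.append(pl2(query, doc, field))
    # Query independent: depend only on the document
    features.append(doc["pagerank"])
    features.append(len(doc["url"]))
    return features
```

With four fields and two weighting models, this yields eight query dependent features plus the two query independent ones per document.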

The role of different weighting models, particularly those calculated on different fields, within learning to rank intrigued us, and motivated an article, About Learning Models with Multiple Query Dependent Features, that we recently published in ACM Transactions on Information Systems. In particular, some learned models take the form of linear combinations of feature scores, yet Robertson [CIKM 2004] warned against the linear combination of weighting model scores. Nevertheless, the LETOR test collections compute a weighting model on each field separately (e.g. BM25 body, BM25 title, BM25 anchor text). Hence, among other research questions, we re-examined the role of field-based weighting models in the learning to rank era. Our findings on the TREC ClueWeb09 corpus showed that field-based models such as BM25F and PL2F remain important for effective learned models.
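The distinction matters because BM25F does not linearly combine per-field BM25 scores: it combines weighted per-field term frequencies first, and applies the non-linear saturation once. The following single-term sketch illustrates the difference; idf and length normalisation are omitted for brevity, so the numbers are illustrative only:

```python
def bm25_saturation(tf, k1=1.2):
    # BM25 term frequency saturation (idf and length normalisation omitted)
    return tf / (k1 + tf)

def linear_combo_score(tfs, field_weights):
    # Per-field BM25 scores combined linearly, the combination Robertson
    # warned against: each field's tf is saturated separately.
    return sum(w * bm25_saturation(tf) for tf, w in zip(tfs, field_weights))

def bm25f_score(tfs, field_weights):
    # BM25F style: field term frequencies are weighted and summed first,
    # then saturation is applied once to the combined pseudo-frequency.
    pseudo_tf = sum(w * tf for tf, w in zip(tfs, field_weights))
    return bm25_saturation(pseudo_tf)
```

Because saturation is applied after the fields are merged, BM25F lets a strong match in one field (e.g. the title) saturate the term's overall contribution, whereas a linear combination of saturated per-field scores cannot.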

Our TOIS paper also shows how to efficiently calculate multiple query dependent features within an IR system. In particular, as the postings of an inverted index are compressed, it is expensive to calculate additional query dependent features once the top K sample has been identified, due to the cost of decompressing the relevant parts of the inverted index posting lists again. Instead, we show how the postings of documents that are inserted into the top K documents within a document-at-a-time (DAAT) retrieval strategy can be "cloned" and retained decompressed in memory, such that additional query dependent features can be calculated during the feature extraction phase. We call this the fat framework, as it "fattens" the result set with the postings of the query terms for those documents. We have implemented this framework within the Terrier IR platform, as described in our OSIR 2012 paper, and it will be released as part of the next major release of Terrier.
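The idea can be sketched as a toy DAAT loop. Plain Python dictionaries stand in for real compressed posting lists, and `score` is a hypothetical scoring callable; the point is only when postings are cloned and when they are discarded:

```python
import heapq

def daat_with_fat_results(posting_lists, score, K):
    """Toy DAAT retrieval that 'fattens' the result set: when a document
    enters the top-K heap, its (already decompressed) postings are cloned
    and retained, so later feature extraction needs no second pass over
    the compressed inverted index. posting_lists maps term -> {docid: tf}."""
    heap = []  # min-heap of (score, docid)
    fat = {}   # docid -> cloned postings {term: tf}
    docids = set().union(*[set(pl) for pl in posting_lists.values()])
    for docid in sorted(docids):  # visit each document once, in docid order
        postings = {t: pl[docid] for t, pl in posting_lists.items() if docid in pl}
        s = score(postings)
        if len(heap) < K:
            heapq.heappush(heap, (s, docid))
            fat[docid] = dict(postings)  # clone while still decompressed
        elif s > heap[0][0]:
            _, evicted = heapq.heapreplace(heap, (s, docid))
            fat.pop(evicted, None)  # drop postings of the evicted document
            fat[docid] = dict(postings)
    return sorted(heap, reverse=True), fat
```

After retrieval, any further query dependent feature (e.g. a proximity model) can be computed directly from `fat`, without touching the inverted index again.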

Finally, query features are an increasingly important aspect of learning to rank. In contrast to (query independent or query dependent) document features, query features have the same value for each document ranked for a query; in this way, query features can be said to be document independent. Our CIKM 2010 and SIGIR 2011 papers on diversification used query features to decide on the ambiguity of a query, or to decide on the likely type of information need underlying different aspects of an ambiguous query. More generally, query features allow learners based on regression trees (e.g. GBRT, LambdaMART) to customise branches of the learned model for different types of query. For instance, if query length is a query feature, then the learner might recognise that a query dependent proximity (document) feature is important for two-term queries, but not for one-term queries.

In our CIKM 2012 short paper, we recognised the lack of a comprehensive study on the usefulness of query features for learning to rank. Our experiments combined the LambdaMART learning to rank technique with 187 different query features, grouped into four types: pre-retrieval Query Performance Prediction, Query Concept Identification, Query Log Mining, and Query Type Classification. We found that over a quarter of the 187 query features could significantly improve the effectiveness of a learned model. The most promising were the Query Type Classification features that identify the presence of entities in the query, suggesting that such features are useful for triggering sub-trees that promote entity homepages. Overall, we found that query features could be employed to customise learned ranking models for queries with different popularity, length, difficulty, ambiguity, and related entities.
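Mechanically, attaching query features to a sample is straightforward: the same values are replicated across every document ranked for the query, as in this minimal sketch (the feature names are hypothetical):

```python
def add_query_features(doc_features, query_features):
    """Append document independent query features to each document's
    feature vector for a query. A regression-tree learner such as
    LambdaMART can then branch on them, e.g. applying a proximity
    sub-tree only when the query-length feature exceeds 1."""
    return [fv + query_features for fv in doc_features]
```

Since the values are constant within a query, these features carry no ranking signal on their own; their value lies entirely in conditioning the tree splits applied to the document features.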


There remains a great deal of black magic involved in the effective and efficient application of learning to rank within information retrieval. Indeed, my colleagues and I strongly believe that empirically derived best practices are an important part of information retrieval research. This series of blog posts has been aimed at addressing some of the aspects missing in the literature, and at providing insights into our recent research in this area.


This body of research would not have been possible without a number of co-authors and contributors: Iadh Ounis, Rodrygo Santos, Nicola Tonellotto (CNR, Italy) and Ben He (University of Chinese Academy of Sciences).

Key References

About Learning Models with Multiple Query Dependent Features. Craig Macdonald, Rodrygo L.T. Santos, Iadh Ounis and Ben He. ACM Transactions on Information Systems, 2013, in press.

On the Usefulness of Query Features for Learning to Rank. Craig Macdonald, Rodrygo Santos and Iadh Ounis. In Proceedings of CIKM 2012.

Efficient and effective retrieval using selective pruning. Nicola Tonellotto, Craig Macdonald and Iadh Ounis. In Proceedings of WSDM 2013.

The Whens and Hows of Learning to Rank. Craig Macdonald, Rodrygo Santos and Iadh Ounis. Information Retrieval Journal, 2012.

Effect of Dynamic Pruning Safety on Learning to Rank Effectiveness. Craig Macdonald, Nicola Tonellotto, and Iadh Ounis. In Proceedings of SIGIR 2012.
