|Phases of a retrieval system deploying learning to rank, taken from Tonellotto et al., WSDM 2013.|
- Top K Retrieval - a number of top-ranked documents are identified; this set is known as the sample.
- Feature Extraction - various features are calculated for each of the sample documents.
- Learned Model Application - the learned model obtained from a learning to rank technique re-ranks the sample documents to better satisfy the user.
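The three phases above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the system described by Tonellotto et al.: the term-frequency scoring in the first phase, the linear model in the third, and all names (`top_k_retrieval`, `extract_features`, `apply_learned_model`, `rank`, and the toy `index`/`doc_features` structures) are hypothetical stand-ins chosen for brevity.

```python
from typing import Dict, List, Tuple

Index = Dict[str, Dict[str, int]]  # hypothetical toy index: doc_id -> {term: frequency}


def top_k_retrieval(query_terms: List[str], index: Index, k: int) -> List[Tuple[str, float]]:
    """Phase 1: score documents (here, a plain term-frequency sum) and keep the top k as the sample."""
    scores = {}
    for doc_id, term_freqs in index.items():
        score = sum(term_freqs.get(t, 0) for t in query_terms)
        if score > 0:
            scores[doc_id] = float(score)
    # Sort by descending score, breaking ties by doc_id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


def extract_features(sample: List[Tuple[str, float]],
                     doc_features: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Phase 2: build a feature vector for each sample document.

    The first feature is the phase-1 score; the rest are precomputed
    per-document features (assumed to be available in `doc_features`).
    """
    return {doc_id: [score] + doc_features.get(doc_id, []) for doc_id, score in sample}


def apply_learned_model(features: Dict[str, List[float]], weights: List[float]) -> List[str]:
    """Phase 3: re-rank the sample with a learned model (assumed linear here)."""
    final = {d: sum(w * x for w, x in zip(weights, f)) for d, f in features.items()}
    return sorted(final, key=lambda d: (-final[d], d))


def rank(query_terms: List[str], index: Index,
         doc_features: Dict[str, List[float]], weights: List[float], k: int) -> List[str]:
    """Run all three phases; k is the sample size."""
    sample = top_k_retrieval(query_terms, index, k)
    return apply_learned_model(extract_features(sample, doc_features), weights)
```

Note that the sample size `k` bounds what the learned model can do: a document that does not make it into the sample can never be promoted by the re-ranking phase.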
Our article The Whens and Hows of Learning to Rank, in the Information Retrieval Journal, studied the sample size parameter across many topic sets and learning to rank techniques. For the mixed information needs on the TREC ClueWeb09 collection, we found that while a sample size of 20 documents was sufficient for effective performance according to ERR@20, larger sample sizes of thousands of documents were needed for effective NDCG@20. For navigational information needs, predominantly larger sample sizes (up to 5000 documents) were needed. Moreover, the particular document representation used to identify the sample was shown to have an impact on effectiveness: navigational queries were found to be considerably easier (requiring smaller samples) when anchor text was used, but for informational queries the opposite was observed.

In the article, we examined these issues in detail across a number of test collections and learning to rank techniques, and investigated the role of the evaluation measure and its rank cutoff for listwise techniques. For in-depth details and conclusions, see the IR Journal article.
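For readers unfamiliar with the two measures mentioned above, the following is a minimal sketch of ERR@k and NDCG@k using their commonly cited formulations (exponential gain with a log2 rank discount for NDCG, and the cascade-style stopping probability for ERR). The function names and the `max_grade` parameter are assumptions made for this illustration, and the exact evaluation configuration used in the article may differ.

```python
import math
from typing import List, Optional


def dcg_at_k(grades: List[int], k: int) -> float:
    """Discounted cumulative gain over the top k graded relevance labels."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))


def ndcg_at_k(grades: List[int], k: int, all_grades: Optional[List[int]] = None) -> float:
    """NDCG@k: DCG normalised by the DCG of an ideal ordering.

    `all_grades` holds the labels of every judged document for the query;
    if omitted, the ideal ordering is built from the ranked labels alone.
    """
    ideal = sorted(all_grades if all_grades is not None else grades, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(grades, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def err_at_k(grades: List[int], k: int, max_grade: int) -> float:
    """ERR@k: expected reciprocal rank at which the user stops, truncated at rank k."""
    err = 0.0
    p_continue = 1.0  # probability the user reaches the current rank
    for rank, g in enumerate(grades[:k], start=1):
        p_stop = (2 ** g - 1) / (2 ** max_grade)  # chance the document satisfies the user
        err += p_continue * p_stop / rank
        p_continue *= 1.0 - p_stop
    return err
```

In this formulation, documents ranked below a highly relevant one contribute little to ERR, whereas NDCG credits every relevant document within the cutoff.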