<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6043705792807544709</id><updated>2011-10-06T04:27:41.527+01:00</updated><category term='thesis'/><category term='SIGIR 2010'/><category term='CIKM'/><category term='Terrier'/><category term='WWW 2010'/><category term='twitter; CEO; Europe; Interface; Public relations'/><category term='opinion-finding; branding; blogging; TREC Blog track'/><category term='Computational Linguistics'/><category term='Faceted Blog search interface'/><category term='scarab'/><category term='jira'/><category term='forum'/><category term='Raleigh'/><category term='document prior'/><category term='blog search'/><category term='Grid_CLEF'/><category term='TREC Blog track'/><category term='Terrier; Hadoop; information retrieval; indexing'/><category term='Wikipedia'/><category term='phd'/><category term='TREC; Blog Track; Web Track; Microblog Track'/><category term='evaluation'/><category term='information retrieval'/><category term='deadlines'/><category term='Faceted Search'/><category term='training'/><category term='cfp'/><category term='Blogs08 collection'/><category term='NLP'/><category term='Diversity'/><category term='Top authors in IR; TerrierTeam; University of Glasgow'/><category term='ICWSM'/><category term='Entity search'/><category term='expert search; commoditising workers'/><category term='Blog track; TREC; Enterprise track; Relevance feedback track;'/><category term='ECIR 2011; DDR 2011; TREC; NTCIR; diversity; blog; microblog; crowdsourcing'/><category term='Ecir2010; No-shows; Best paper award; BCS; IRSG; KSJ award'/><category term='Sigir'/><category term='blog'/><category term='issues 
tracking'/><category term='Virtual observatory; Astronomy; Semantic web; Information Retrieval; Workshop'/><category term='CLEF'/><category term='mantis'/><category term='TREC Blog track; News search; Faceted search; Adhoc search'/><category term='Blog track; TREC'/><category term='CIKM 2011; Glasgow'/><category term='craig macdonald'/><category term='TREC'/><category term='query logs; data mining'/><category term='Correlator'/><category term='CIKM 2010; Diversity; CIKM 2011; Glasgow'/><category term='#Terrier35; Hadoop; DAAT; Next-generation DFR'/><category term='opinion-finding'/><category term='expert search'/><category term='travel; Blog search'/><category term='social media'/><category term='Blog track; TREC 2009; Search tasks'/><category term='RIAO 2010; Voting Model; News search; Blog search'/><category term='SIGIR 2009'/><title type='text'>TerrierTeam</title><subtitle type='html'>This is the &lt;a href="http://ir.dcs.gla.ac.uk/terrier"&gt;Terrier Team&lt;/a&gt; blog. It is managed by several members of the Terrier Team at the &lt;a href="http://www.gla.ac.uk"&gt;University of Glasgow&lt;/a&gt;. 
It is used to publicise our research projects, to post news about various research work and activities performed by the team, as well as to share information and thoughts on information retrieval and search engines issues.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Terrier Team @ Glasgow</name><uri>http://www.blogger.com/profile/11678159696002044810</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>38</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-6481444127443594931</id><published>2011-06-16T17:41:00.009+01:00</published><updated>2011-06-16T18:31:15.706+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='#Terrier35; Hadoop; DAAT; Next-generation DFR'/><title type='text'>Terrier 3.5 released</title><content type='html'>Today, we are proud to announce a brand new release of &lt;a href="http://terrier.org/"&gt;Terrier&lt;/a&gt;, our state-of-the-art open source information retrieval platform. 
Terrier 3.5 represents a significant update over its previous version (Terrier 3.0), including:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://terrier.org/docs/v3.5/javadoc/org/terrier/matching/daat/package-summary.html"&gt;Document-at-a-time (DAAT)&lt;/a&gt; retrieval for large indices&lt;/li&gt;&lt;li&gt;Refactored &lt;a href="http://terrier.org/docs/v3.5/javadoc/org/terrier/indexing/tokenisation/package-summary.html"&gt;tokenisation&lt;/a&gt; for enhanced &lt;a href="http://terrier.org/docs/v3.5/languages.html"&gt;multi-language support&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Upgraded &lt;a href="http://terrier.org/docs/v3.5/hadoop_configuration.html"&gt;Hadoop support&lt;/a&gt; to version 0.20&lt;/li&gt;&lt;li&gt;&lt;a href="http://terrier.org/docs/v3.5/querylanguage.html"&gt;Synonym support&lt;/a&gt; in query language and retrieval&lt;/li&gt;&lt;li&gt;Out-of-the box indexing support for &lt;a href="http://terrier.org/docs/v3.5/terrier_http.html"&gt;query-biased summaries and improved example web-based interface&lt;/a&gt;&lt;/li&gt;&lt;li&gt;New, 2nd generation &lt;a href="http://terrier.org/docs/v3.5/dfr_description.html"&gt;DFR models&lt;/a&gt; as well as other recent effective information-theoretic models&lt;/li&gt;&lt;li&gt;Fully revised and improved &lt;a href="http://terrier.org/docs/v3.5/"&gt;documentation&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Many more JUnit tests (now 300+)&lt;/li&gt;&lt;/ul&gt;Check out the full &lt;a href="http://terrier.org/docs/current/whats_new.html"&gt;change log&lt;/a&gt; for this release and &lt;a href="http://terrier.org/download"&gt;upgrade to Terrier 3.5&lt;/a&gt;!&lt;br /&gt;&lt;br /&gt;Many thanks to everyone at the &lt;a href="http://terrierteam.dcs.gla.ac.uk/"&gt;TerrierTeam&lt;/a&gt; and all &lt;a href="http://terrier.org/people.html"&gt;Terrier contributors&lt;/a&gt; for their hard work making this release possible!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/6043705792807544709-6481444127443594931?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/6481444127443594931/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=6481444127443594931' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/6481444127443594931'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/6481444127443594931'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2011/06/terrier-35-released.html' title='Terrier 3.5 released'/><author><name>Rodrygo L.T. Santos</name><uri>http://www.blogger.com/profile/09502952528669992135</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://4.bp.blogspot.com/_JtuxhJ3QzZg/STMMxScZpiI/AAAAAAAAAEY/Zi4Nmre6mfk/S220/n767603947_216647_5827.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-6298271730475918136</id><published>2011-04-27T12:42:00.010+01:00</published><updated>2011-04-27T17:46:12.066+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ECIR 2011; DDR 2011; TREC; NTCIR; diversity; blog; microblog; crowdsourcing'/><title type='text'>ECIR 2011 + DDR 2011 in Dublin</title><content type='html'>Last week, a few of us attended &lt;a href="http://www.ecir2011.dcu.ie/"&gt;ECIR 2011&lt;/a&gt;  in Dublin. The conference was a resounding success  both in terms of  its program and organisation.  Compared to last year, the event was very  well attended with about 250 delegates registered to the conference  and/or its satellite events. 
The majority of delegates were from Ireland  and the United Kingdom.&lt;br /&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Workshops&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;The kick-off was on Monday, with a selection of workshops and tutorials at the fabulous &lt;a href="http://www.guinness-storehouse.com/"&gt;Guinness Storehouse&lt;/a&gt;. We attended the &lt;a href="http://www.dcs.gla.ac.uk/workshops/ddr2011/"&gt;Diversity in Document Retrieval (DDR 2011) workshop&lt;/a&gt;, jointly organised by &lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm/"&gt;Craig Macdonald&lt;/a&gt;, &lt;a href="http://web4.cs.ucl.ac.uk/staff/jun.wang/blog/"&gt;Jun Wang&lt;/a&gt;, and &lt;a href="http://plg.uwaterloo.ca/%7Eclaclark/"&gt;Charlie Clarke&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The  DDR workshop was at times a standing-room-only event and appeared to  be the largest workshop of the conference. It was structured around  three broad themes: &lt;span style="font-style: italic; font-weight: bold;"&gt;evaluation&lt;/span&gt;, &lt;span style="font-style: italic; font-weight: bold;"&gt;modelling&lt;/span&gt;, and &lt;span style="font-style: italic; font-weight: bold;"&gt;applications&lt;/span&gt;. Besides good keynotes by &lt;a href="http://research.microsoft.com/en-us/people/tesakai/"&gt;Tetsuya Sakai&lt;/a&gt; and &lt;a href="http://disi.unitn.it/moschitti/"&gt;Alessandro Moschitti&lt;/a&gt;,  the workshop featured technical and position paper presentations, as  well as a poster session and a breakout group discussion on all three  workshop themes. While there was no agreement on a possible "killer  application" for diversity, there was a consensus that diversity is best  described as the &lt;span style="font-style: italic;"&gt;lack of context&lt;/span&gt;. 
In addition, a few key points arose across the boundaries of the tackled themes:&lt;br /&gt;&lt;ul style="text-align: left;"&gt;&lt;li&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Representing diversity&lt;/span&gt;&lt;br /&gt;How  to best represent the possible multiple information needs underlying a  query? Should this representation reflect the interests of the user  population, or should it be itself diverse?&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;Measuring diversity&lt;/span&gt;&lt;br /&gt;What  does diversity mean and how should it be promoted in different  scenarios? The workshop featured some ideas for applications, including  expert search, geographical IR, and graph summarisation.&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;Unifying diversity&lt;/span&gt;&lt;br /&gt;How  to diversify across multiple search scenarios (e.g., multiple verticals  of a search engine)? How to convey a summary relevant to multiple  information needs in a single page of results?&lt;/li&gt;&lt;/ul&gt;&lt;div style="text-align: left;"&gt;Some of these ideas are currently being investigated as part of the &lt;a href="http://www.thuir.org/intent/ntcir9/"&gt;NTCIR-9 Intent&lt;/a&gt; task. Charlie was also keen to consider these questions in future incarnations of the diversity task in the &lt;a href="http://plg.uwaterloo.ca/%7Etrecweb/"&gt;TREC Web track&lt;/a&gt;. During the workshop, &lt;a href="http://www.dcs.gla.ac.uk/%7Erodrygo"&gt;Rodrygo&lt;/a&gt; presented our position paper entitled "&lt;a href="http://terrierteam.dcs.gla.ac.uk/publications/santos2011ddr.pdf"&gt;Diversifying for multiple information needs&lt;/a&gt;". 
The full DDR &lt;a href="http://www.dcs.gla.ac.uk/workshops/ddr2011/ddr2011.proceedings.pdf"&gt;workshop proceedings&lt;/a&gt;  are available online.&lt;br /&gt;&lt;br /&gt;Although we did not attend it, the &lt;a href="http://ir.cis.udel.edu/ECIR11Sessions/index.html"&gt;Information Retrieval Over Query Sessions&lt;/a&gt; workshop, which was held at the same time as DDR, also received very positive feedback from its attendees.&lt;br /&gt;&lt;br /&gt;The workshops were followed by an excellent welcome  reception where, to say the least, Guinness was not in  short supply.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Conference&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;On Tuesday, the main conference took over with a diverse (no pun intended) &lt;a href="http://www.ecir2011.dcu.ie/program/"&gt;program&lt;/a&gt;. The conference started with a thoughtful keynote by &lt;a href="http://www.uta.fi/%7Elikaja/"&gt;Kalervo Järvelin&lt;/a&gt;,  who urged the information retrieval community to see beyond the  [search] box. The keynote led to some very interesting discussions about  whether IR is a science or a technology (i.e. mostly about  engineering). We would like to believe that it is a science, although some  delegates argued (sadly) for the opposite.&lt;br /&gt;&lt;br /&gt;&lt;span class="fontbold font10"&gt;The second keynote was given by &lt;a href="http://research.yahoo.com/Evgeniy_Gabrilovich"&gt;Evgeniy Gabrilovich&lt;/a&gt;, winner of this year's&lt;/span&gt;  &lt;span class="fontbold font10"&gt;&lt;a href="http://irsg.bcs.org/ksjaward.php"&gt;KSJ Award&lt;/a&gt;. &lt;/span&gt;&lt;span class="fontbold font10"&gt;Evgeniy provided a very comprehensive overview  of the fascinating computational advertising field, highlighting &lt;/span&gt;the current state of the art and possible future research directions. 
We were encouraged to hear about the &lt;a href="http://labs.yahoo.com/Academic_Relations/Faculty"&gt;Yahoo! Faculty Research and Engagement Program (FREP)&lt;/a&gt;,  which might allow academics to access the necessary datasets to conduct  research in a field that has thus far been the sole territory of  researchers based in industry.&lt;br /&gt;&lt;br /&gt;The last keynote talk, on the value of user feedback, was superbly  given by &lt;a href="http://www.cs.cornell.edu/people/tj/"&gt;Thorsten Joachims&lt;/a&gt;. Thorsten convincingly argued for the  importance of collecting user feedback as an intrinsic part of both the  retrieval and learning processes. The talk highlighted how user feedback  could improve the quality of retrieval and by how much. We hope that  the slides will be made publicly available at some point.&lt;br /&gt;&lt;br /&gt;As for  the rest of the program, there were two types of papers/presentations:  full papers were presented in 30 min, while short papers had only 15  min.  As usual, the quality of papers (or at least the presentations)  varied from the outstanding to the less good. One suggestion for future  ECIR conferences is to limit all the talks to at most 20 min,  encouraging conciseness and pushing the speakers to focus on the  "message out of the bottle". Indeed, some talks appeared to be  exceedingly long with respect to their informative content. While we see  the value of giving a 30 min slot to a 10-page ACM-style paper,  there does not seem to be a valid reason for giving that much time to a  (comparatively much shorter) 12-page LNCS-style paper.&lt;br /&gt;&lt;br /&gt;It was  interesting  to see several Twitter-related papers in the program,   suggesting that the community will find the upcoming new &lt;a href="http://sites.google.com/site/trecmicroblogtrack/"&gt;TREC 2011 Microblog track&lt;/a&gt;   and its corresponding shared dataset particularly useful. 
The  theme of crowdsourcing was also highly featured in the conference, with  several papers showing how cheap and reliable relevance assessments  could be obtained through the &lt;a href="https://www.mturk.com/mturk/welcome"&gt;Amazon Mechanical Turk&lt;/a&gt; or similar services. Finally, we were very pleased to see many presented papers using our open source &lt;a href="http://terrier.org/"&gt;Terrier&lt;/a&gt; software in their experiments.&lt;br /&gt;&lt;br /&gt;Overall, a few papers caught our attention and were particularly interesting:&lt;br /&gt;&lt;/div&gt;&lt;ul style="text-align: left;"&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;On the contributions of topics to system evaluation&lt;/span&gt;&lt;br /&gt;Steve Robertson&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Caching for realtime search&lt;/span&gt; - in our opinion by far the best paper/presentation of the conference&lt;br /&gt;Edward Bortnikov, Ronny Lempel and Kolman Vornovitsky &lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Are semantically related links effective for retrieval?&lt;/span&gt;&lt;br /&gt;Marijn Koolen and Jaap Kamps&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;A methodology for evaluating aggregated search results&lt;/span&gt; - Excellent paper/presentation that was also awarded the best student paper award&lt;br /&gt;Jaime Arguello, Fernando Diaz, Jamie Callan and Ben Carterette&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Design and implementation of relevance assessments using crowdsourcing&lt;/span&gt;&lt;br /&gt;Omar Alonso and Ricardo Baeza-Yates&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;The power of peers &lt;/span&gt;&lt;span style="font-style: italic;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span&gt;Nick Craswell, Dennis Fetterly and Marc Najork&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;&lt;span style="font-style: italic;"&gt;Automatic people tagging for expertise profiling in 
the enterprise&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Pavel Serdyukov, Mike Taylor, Vishwa Vinay, Matthew Richardson and Ryen W. White&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;What makes re-finding information difficult? A study of email re-finding&lt;/span&gt;&lt;br /&gt;David Elsweiler, Mark Baillie and Ian Ruthven&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div style="text-align: left;"&gt;Of  course, we also recommend our own paper, which was nominated for the best  paper award, and for which we received excellent feedback:&lt;br /&gt;&lt;/div&gt;&lt;ul style="text-align: left;"&gt;&lt;li&gt;&lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm/publications/macdonald11learned.pdf"&gt;&lt;span style="font-style: italic;"&gt;Learning models for ranking aggregates&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;Craig Macdonald and Iadh Ounis&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div style="text-align: left;"&gt;The program also featured a busy poster and demo session. We liked the work of &lt;span style="font-style: italic;"&gt;Gerani, Keikha, Carman and Crestani&lt;/span&gt; on identifying personal blogs using the &lt;a href="http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG"&gt;TREC Blog track&lt;/a&gt;, and that of &lt;span style="font-style: italic;"&gt;Perego, Silvestri and Tonellotto,&lt;/span&gt;  which suggests that document length can be quantized from docids  without loss of retrieval effectiveness. 
There were also several  interesting demos that caught our eye:&lt;br /&gt;&lt;/div&gt;&lt;ul style="text-align: left;"&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;ARES - A retrieval engine based on sentiments: Sentiment-based search result annotation and diversification&lt;/span&gt; - which used our &lt;a href="http://ir.dcs.gla.ac.uk/terrier/publications/santos10www.pdf"&gt;xQuAD framework&lt;/a&gt; for diversifying sentiments&lt;span style="font-style: italic;"&gt; &lt;/span&gt;&lt;br /&gt;Gianluca Demartini&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Conversation Retrieval from Twitter&lt;/span&gt;&lt;br /&gt;Matteo Magnani, Danilo Montesi, Gabriele Nunziante and Luca Rossi&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Finding Useful Users on Twitter: Twittomender the Followee Recommender&lt;/span&gt; - addressed the Who to Follow (WTF?) task on Twitter&lt;br /&gt;John Hannon, Kevin McCarthy and Barry Smyth&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;The ECIR organisers hosted a particularly sumptuous conference banquet at the impressive, unique and beautiful venue of &lt;a href="http://www.villageatlyons.com/"&gt;The Village at Lyons Demesne&lt;/a&gt;  in County Kildare.  The journey to the village was a welcome break from  the hotel setting of the conference and its technical program.&lt;br /&gt;&lt;br /&gt;On the last day of the conference, an &lt;a href="http://www.ecir2011.dcu.ie/program/industry-day/"&gt;Industry Day&lt;/a&gt; event ran concurrently with the technical research sessions. However, we only had the chance to see the excellent talk by &lt;a href="http://research.yahoo.com/Flavio_Junqueira"&gt;Flavio Junqueira&lt;/a&gt;  on the practical aspects of caching in search engine deployments.  There is a comprehensive summary of the whole Industry program in this &lt;a href="http://www.flax.co.uk/blog/2011/04/27/ecir-2011-industry-day-part-1-of-2/"&gt;blog post&lt;/a&gt;. 
We  believe that the planning of the Industry Day event in parallel to the  technical sessions was detrimental to attendance. Next year, the  Industry Day will be held after the conference ends.&lt;br /&gt;&lt;br /&gt;Finally, we would like to thank the organisers of ECIR 2011 for a very enjoyable conference, and a great stay in Dublin. &lt;a href="http://ecir2012.upf.edu/"&gt;ECIR 2012&lt;/a&gt; will be held in Barcelona, Spain, between 1st and 5th April 2012. We hope to see you all there.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-6298271730475918136?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/6298271730475918136/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=6298271730475918136' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/6298271730475918136'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/6298271730475918136'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2011/04/ecir-2011-ddr-2011-in-dublin.html' title='ECIR 2011 + DDR 2011 in Dublin'/><author><name>Terrier Team @ Glasgow</name><uri>http://www.blogger.com/profile/11678159696002044810</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-3693146802348032462</id><published>2010-11-26T12:39:00.026Z</published><updated>2010-11-30T13:18:00.329Z</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='TREC; Blog Track; Web Track; Microblog Track'/><title type='text'>TREC 2010 Roundup</title><content type='html'>We are back from another successful TREC conference on the NIST campus. 2010 is a transition year, with the end of old tracks and the proposal of new ones. Indeed, TREC is moving with the times, looking at new data sources and test collections, as well as new evaluation strategies.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic; font-weight: bold;font-size:130%;" &gt;Outwith the old . . . &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For example, TREC 2010 marks the end of the Relevance Feedback and Blog tracks. While TREC 2010 will be the  last year of the Relevance Feedback track, the Blog track, which has been running for the last 5 years,  is now morphing into a new Microblog track, investigating real-time and social search tasks in Twitter. A brand new test collection, possibly containing 2 months of tweets, is planned, with linked web pages and a partial follower graph. 
&lt;a href="http://groups.google.com/group/trec-microblog"&gt;Join the Microblog track googlegroup&lt;/a&gt; to obtain the latest updates and &lt;a href="http://twitter.com/trecmicroblog"&gt;follow the Microblog track on Twitter&lt;/a&gt;.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_DKIEnisEWSk/TO-4vEbnWoI/AAAAAAAAAAU/pqc1yA4bGbE/s1600/microblogpostercraig.jpg"&gt;&lt;br /&gt;&lt;/a&gt;&lt;a href="http://4.bp.blogspot.com/_DKIEnisEWSk/TO_I_1T_sXI/AAAAAAAAAAc/DzsBgE5fZeE/s1600/microblogpostercraig.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 280px;" src="http://4.bp.blogspot.com/_DKIEnisEWSk/TO_I_1T_sXI/AAAAAAAAAAc/DzsBgE5fZeE/s320/microblogpostercraig.jpg" alt="" id="BLOGGER_PHOTO_ID_5543870665284628850" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;TREC 2011 will also witness the initiation of the new Medical Records  track, dedicated to investigating approaches to access free-text fields  of electronic medical records.&lt;br /&gt;&lt;br /&gt;On the test collection front, the Web track is also forward planning a new large-scale dataset to replace ClueWeb09. Indications are that this new dataset will be about the same scale as ClueWeb09 but might provide more temporal information (multiple versions of a page or site over time). Moreover, we have suggested that this might be the heart of a larger dataset comprised of multiple parallel/aligned corpora, for example blogs and news feeds covering the same timeframe.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;font-size:130%;" &gt;TREC Assessors, Relevant?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In terms of evaluation, 2010 marks the first year where evaluation judgments were crowdsourced using an online worker marketplace, as opposed to relying on TREC assessors, the participants themselves, or a select group of experts. 
Indeed, both the Blog track and the Relevance Feedback track crowdsourced some of their evaluation (although the Relevance Feedback track suffered many setbacks and its crowdsourcing process is still incomplete). Furthermore, to investigate the challenges in this new field of crowdsourcing, a specific Crowdsourcing track has been created and will run in 2011. More details can be found &lt;a href="http://groups.google.com/group/trec-crowd"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;font-size:130%;" &gt; &lt;span style="font-weight: bold;"&gt;Themes&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;As  usual, themes emerged within the various tracks. Learned approaches were  far more prevalent this year, now that training data was available for  the ClueWeb09 dataset. Indeed, the Web track was dominated by trained  models mostly based on link and proximity search features. Diversification, on the other  hand, remains a challenging task, with the top groups leaving their  initial rankings as is. An outstanding exception is our own approach  using the &lt;a href="http://ir.dcs.gla.ac.uk/terrier/publications/santos10www.pdf"&gt;xQuAD framework&lt;/a&gt; under a &lt;a href="http://ir.dcs.gla.ac.uk/terrier/publications/santos2010cikm.pdf"&gt;selective diversification&lt;/a&gt; regime, which further improves our strongly performing adhoc baseline. &lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm"&gt;Craig Macdonald&lt;/a&gt; presented our work in the Web track plenary session.&lt;br /&gt;&lt;br /&gt;In  the Blog track, voting model-based  and language  modeling approaches proved popular  for blog distillation. For faceted blog ranking, participants  employed variants of facet dictionaries to either train a classifier or  as features for learning. 
For the top news task, participants deployed a  wide variety of methods to rank news stories in a &lt;span style="font-style: italic;"&gt;real-time&lt;/span&gt; setting,  from probabilistic modeling to &lt;a href="http://terrierteam.dcs.gla.ac.uk/publications/richard10riao_168.pdf"&gt;blog post voting with historical evidence.&lt;/a&gt; Richard McCreadie presented our work on the Blog track as a poster during TREC 2010, which attracted very interesting discussions.&lt;br /&gt;&lt;br /&gt;During the TREC conference, &lt;a href="http://www.dcs.gla.ac.uk/%7Eounis"&gt;Iadh Ounis&lt;/a&gt;, &lt;a href="http://www.dcs.gla.ac.uk/%7Erichardm"&gt;Richard McCreadie&lt;/a&gt; and others did a fair amount of tweeting. You can follow some bits of the TREC conference through the &lt;a href="http://twitter.com/#search?q=%23trec2010"&gt;#trec2010&lt;/a&gt; hashtag.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-3693146802348032462?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/3693146802348032462/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=3693146802348032462' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/3693146802348032462'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/3693146802348032462'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2010/11/trec-2010-roundup.html' title='TREC 2010 Roundup'/><author><name>Richard McCreadie</name><uri>http://www.blogger.com/profile/11063287777854855902</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_DKIEnisEWSk/TO_I_1T_sXI/AAAAAAAAAAc/DzsBgE5fZeE/s72-c/microblogpostercraig.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-1062379729812288217</id><published>2010-11-03T18:03:00.006Z</published><updated>2010-11-03T19:50:29.410Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='CIKM 2010; Diversity; CIKM 2011; Glasgow'/><title type='text'>CIKM 2010 in Toronto, ON, Canada</title><content type='html'>&lt;p style="margin-bottom: 0cm;"&gt;I'm back from Toronto, where a few of us attended the CIKM 2010 conference last week. On Friday, I presented our paper on &lt;a href="http://ir.dcs.gla.ac.uk/terrier/publications/santos2010cikm.pdf"&gt;&lt;span style="font-style: italic;"&gt;"Selectively diversifying Web search results"&lt;/span&gt;&lt;/a&gt;, a joint work with &lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm/"&gt;Craig Macdonald&lt;/a&gt; and &lt;a href="http://www.dcs.gla.ac.uk/%7Eounis/"&gt;Iadh Ounis&lt;/a&gt;. This work extends our successful participation in the diversity task of the &lt;a href="http://trec.nist.gov/pubs/trec18/papers/uglasgow.BLOG.ENT.MQ.RF.WEB.pdf"&gt;TREC 2009 Web track&lt;/a&gt;, by investigating the need for search result diversification in the first place. In particular, we proposed a novel supervised learning approach to predict not only whether promoting diversity is beneficial, but also how much diversification should be applied to attain an effective retrieval performance on a per-query basis. 
After thorough, large-scale experiments with over 900 query features, we found that our selective approach can substantially improve existing diversification approaches, including our &lt;a href="http://ir.dcs.gla.ac.uk/terrier/publications/santos10www.pdf"&gt;state-of-the-art xQuAD framework&lt;/a&gt;. Nonetheless, we believe the significance of our contribution goes beyond these successful results. Indeed, it was with great pleasure that we heard from the NTCIR organisers that NTCIR-9 will run an &lt;a href="http://www.thuir.org/intent/ntcir9/"&gt;Intent task&lt;/a&gt;, aimed (among other things) at selectively diversifying search results, an area where we are proud to be pioneers.&lt;/p&gt;  &lt;p style="margin-bottom: 0cm;"&gt;Besides our own paper, a few other papers caught my attention:&lt;/p&gt;  &lt;ul&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Web Search Solved? All Result Rankings the Same?&lt;/span&gt;, by Hugo Zaragoza, B. Barla Cambazoglu and Ricardo Baeza-Yates&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Reverted Indexing for Feedback and Expansion&lt;/span&gt;, by Jeremy Pickens, Matthew Cooper and Gene Golovchinsky&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Rank Learning for Factoid Question Answering with Linguistic and Semantic Constraints&lt;/span&gt;, by Matthew Bilotti, Jonathan Elsas, Jaime Carbonell and Eric Nyberg&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Organizing Query Completions for Web Search&lt;/span&gt;, by Alpa Jain and Gilad Mishne&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models&lt;/span&gt;, by Jianfeng Gao, Xiaodong He and Jian-Yun Nie&lt;/li&gt;&lt;/ul&gt;        &lt;p style="margin-bottom: 0cm;"&gt;The conference also featured great keynotes, of which those by Jamie Callan and Susan Dumais deserve particular mention. 
Jamie talked about his vision for the future of search, in which search engines capable of fully leveraging the structure of queries and documents would enable more sophisticated applications built on top of them. Susan addressed the temporal evolution of Web content, how it impacts the way users access this content, and how test collections should account for it. For more details, have a look at the excellent posts by Gene Golovchinsky on &lt;a href="http://palblog.fxpal.com/?p=4866"&gt;Jamie&lt;/a&gt; and &lt;a href="http://palblog.fxpal.com/?p=4873"&gt;Susan&lt;/a&gt;'s talks.&lt;br /&gt;&lt;/p&gt;&lt;p style="margin-bottom: 0cm;"&gt;Last but not least, many of us were involved in promoting the next edition of CIKM, to be held here in Glasgow. There was a lot of excitement from the many people who visited our booth, and also during the hand-over talk at the end of the conference. Well done Jon, Mary, Craig, and Iadh for the hard work! The arrangements for &lt;a href="http://www.cikm2011.org/"&gt;CIKM 2011&lt;/a&gt; are well advanced, and the &lt;a href="http://www.cikm2011.org/callforpapers"&gt;call for papers&lt;/a&gt; is now online. You can also follow the latest news about CIKM 2011 on &lt;a href="http://twitter.com/CIKM2011"&gt;Twitter&lt;/a&gt;, &lt;a href="http://www.facebook.com/group.php?gid=171830502274"&gt;Facebook&lt;/a&gt;, &lt;a href="http://events.linkedin.com/CIKM-2011-20th-ACM-Conference/pub/162795"&gt;LinkedIn&lt;/a&gt;, and &lt;a href="http://lanyrd.com/2011/cikm/"&gt;Lanyrd&lt;/a&gt;. We look forward to welcoming you all to Glasgow next year! 
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-1062379729812288217?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/1062379729812288217/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=1062379729812288217' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/1062379729812288217'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/1062379729812288217'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2010/11/cikm-2010-in-toronto-on-canada.html' title='CIKM 2010 in Toronto, ON, Canada'/><author><name>Rodrygo L.T. Santos</name><uri>http://www.blogger.com/profile/09502952528669992135</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://4.bp.blogspot.com/_JtuxhJ3QzZg/STMMxScZpiI/AAAAAAAAAEY/Zi4Nmre6mfk/S220/n767603947_216647_5827.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-3268180482641530828</id><published>2010-07-20T08:49:00.006+01:00</published><updated>2010-07-20T09:02:02.130+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SIGIR 2010'/><title type='text'>Terrier Team at SIGIR 2010 in Geneva</title><content type='html'>&lt;a href="http://sigir2010.org/doku.php"&gt;SIGIR 2010&lt;/a&gt; has just started in Geneva. 
From the &lt;a href="http://terrierteam.dcs.gla.ac.uk/"&gt;TerrierTeam&lt;/a&gt;, &lt;a href="http://twitter.com/richardm_"&gt;Richard&lt;/a&gt; and &lt;a href="http://twitter.com/craig_macdonald"&gt;I&lt;/a&gt; are attending.&lt;br /&gt;&lt;br /&gt;On Monday, Richard presented his PhD topic, &lt;a href="http://portal.acm.org/citation.cfm?id=1835449.1835692"&gt;Leveraging User-generated Content for News Search&lt;/a&gt;, at the doctoral consortium.&lt;br /&gt;&lt;br /&gt;Later, at the &lt;a href="http://research.microsoft.com/en-us/events/webngram/"&gt;Web Ngram workshop&lt;/a&gt;, I'll be presenting a paper on &lt;a href="http://www.dcs.gla.ac.uk/~craigm/publications/macdonald10_prox.pdf"&gt;Global Statistics in Proximity Weighting Models&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;About the same time, Richard will be presenting at the &lt;a href="http://www.ischool.utexas.edu/~cse2010/"&gt;Crowdsourcing for Search Evaluation&lt;/a&gt; workshop. His paper on &lt;a href="http://www.dcs.gla.ac.uk/~richardm/papers/CrowdsourcingNQC.pdf"&gt;Crowdsourcing a News Query Classification Dataset&lt;/a&gt; examines the effectiveness of different interfaces for having Mechanical Turkers classify queries as news-related or not.&lt;br /&gt;&lt;br /&gt;Last but not least, and continuing on our proximity theme, &lt;a href="http://hpc.isti.cnr.it/~khast"&gt;Nicola Tonellotto&lt;/a&gt; from CNR is presenting our joint work titled &lt;a href="http://www.dcs.gla.ac.uk/~craigm/publications/tonellotto_lsds2010.pdf"&gt;Efficient Dynamic Pruning with Proximity Support&lt;/a&gt; at the &lt;a href="http://www.lsdsir.org/"&gt;Large Scale &amp;amp; Distributed Systems&lt;/a&gt; workshop.&lt;br /&gt;&lt;br /&gt;Meanwhile, please say hello if you see us at the conference, or stay up to date by following &lt;a href="http://twitter.com/#search?q=%23sigir2010"&gt;#sigir2010&lt;/a&gt;. 
And remember, if you are near the registration desk, please pick up flyers for &lt;a href="http://terrier.org"&gt;Terrier&lt;/a&gt; and &lt;a href="http://cikm2011.org"&gt;CIKM 2011&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-3268180482641530828?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/3268180482641530828/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=3268180482641530828' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/3268180482641530828'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/3268180482641530828'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2010/07/terrier-team-at-sigir-2010-in-geneva.html' title='Terrier Team at SIGIR 2010 in Geneva'/><author><name>Craig Macdonald</name><uri>http://www.blogger.com/profile/13764972230026912718</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://4.bp.blogspot.com/_f7dcGnhz7mo/TEVaK7ftYFI/AAAAAAAAABU/YdVJVx4CC7M/s1600-R/craig_up_a_hill.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-5268153740599849646</id><published>2010-07-19T13:04:00.017+01:00</published><updated>2010-07-19T13:41:07.132+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Top authors in IR; TerrierTeam; University of Glasgow'/><title type='text'>Top Authors in Information Retrieval</title><content type='html'>Thanks to &lt;span class="fn"&gt;&lt;a href="http://twitter.com/SSN"&gt;Sérgio 
Nunes&lt;/a&gt;, who alerted us to this ranking by &lt;/span&gt;&lt;a href="http://academic.research.microsoft.com/"&gt;Microsoft Academic Search&lt;/a&gt; of the &lt;a href="http://academic.research.microsoft.com/CSDirectory/author_category_8_last5.htm"&gt;Top Authors in Information Retrieval&lt;/a&gt; in the past 5 years.&lt;br /&gt;&lt;br /&gt;According to this recent &lt;a href="http://academic.research.microsoft.com/CSDirectory/author_category_8_last5.htm"&gt;ranking&lt;/a&gt;, two members of the &lt;a href="http://terrierteam.dcs.gla.ac.uk/"&gt;TerrierTeam&lt;/a&gt;, namely &lt;a href="http://www.dcs.gla.ac.uk/%7Eounis"&gt;Iadh Ounis&lt;/a&gt; and &lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm"&gt;Craig Macdonald&lt;/a&gt;, are among the top 5 authors in Information Retrieval in the past 5 years (positions #1 and #4, respectively). The ranking is based on &lt;a href="http://academic.research.microsoft.com/About/Help.htm#Ranking"&gt;in-domain citations&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;This good news comes just at the start of the &lt;a href="http://www.sigir2010.org/doku.php"&gt;SIGIR 2010&lt;/a&gt; Conference, which will be held in Geneva, Switzerland this week (19-23 July 2010). 
Several members of the team will be in attendance.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-5268153740599849646?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/5268153740599849646/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=5268153740599849646' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/5268153740599849646'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/5268153740599849646'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2010/07/top-authors-in-information-retrieval.html' title='Top Authors in Information Retrieval'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-8650440671978066990</id><published>2010-05-04T01:48:00.014+01:00</published><updated>2010-05-04T10:31:00.869+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Raleigh'/><category scheme='http://www.blogger.com/atom/ns#' term='Diversity'/><category scheme='http://www.blogger.com/atom/ns#' term='WWW 2010'/><title type='text'>WWW 2010 in Raleigh, NC, USA</title><content type='html'>&lt;span style="" lang="EN-GB"&gt;I am back from the sunny &lt;a href="http://www.visitraleigh.com/"&gt;Raleigh, NC, USA&lt;/a&gt;. 
Besides the nice weather, I had a great time last week attending the &lt;a href="http://www2010.org/"&gt;19th International World Wide Web Conference (WWW 2010)&lt;/a&gt;, where I presented our paper on &lt;i style=""&gt;&lt;a href="http://ir.dcs.gla.ac.uk/terrier/publications/santos10www.pdf"&gt;Exploiting query reformulations for Web search result diversification&lt;/a&gt;&lt;/i&gt;, joint work with &lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm"&gt;Craig Macdonald&lt;/a&gt; and &lt;a href="http://www.dcs.gla.ac.uk/%7Eounis"&gt;Iadh Ounis&lt;/a&gt;. The paper introduces a probabilistic formulation of our xQuAD framework for search result diversification, and analyses the effectiveness of query reformulations provided by three commercial search engines for the diversification task. My talk was very well received, with lots of questions from the audience, and subsequent conversations with many people from both academia and industry.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;span style="" lang="EN-GB"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;span style="" lang="EN-GB"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_JtuxhJ3QzZg/S993IPRHQXI/AAAAAAAAAMI/tB87Q8WjVj8/s1600/DSC01010.JPG"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 225px;" src="http://1.bp.blogspot.com/_JtuxhJ3QzZg/S993IPRHQXI/AAAAAAAAAMI/tB87Q8WjVj8/s400/DSC01010.JPG" alt="" id="BLOGGER_PHOTO_ID_5467219456072040818" border="0" /&gt;&lt;/a&gt;&lt;span style="" lang="EN-GB"&gt;&lt;br /&gt;The blend of academia and industry was indeed a signature of WWW. I was also impressed with the multidisciplinary nature of the conference: with up to five parallel sessions, there was always something for everyone! 
In particular, from the sessions I attended, a few papers caught my attention:&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;span style="" lang="EN-GB"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;i style=""&gt;&lt;span style="" lang="EN-GB"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/i&gt;&lt;ul&gt;&lt;li&gt;&lt;i style=""&gt;&lt;span style="" lang="EN-GB"&gt;Clustering query refinements by user intent&lt;/span&gt;&lt;/i&gt;&lt;span style="" lang="EN-GB"&gt;, by Eldar Sadikov et al. (Stanford University and Google)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;i style=""&gt;&lt;span style="" lang="EN-GB"&gt;Optimal rare query suggestion with implicit user feedback&lt;/span&gt;&lt;/i&gt;&lt;span style="" lang="EN-GB"&gt;, by Yang Song and Li-wei He (Microsoft Research)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;i style=""&gt;&lt;span style="" lang="EN-GB"&gt;Building taxonomy of Web search intents for name entity queries&lt;/span&gt;&lt;/i&gt;&lt;span style="" lang="EN-GB"&gt;, by Xiaoxin Yin and Sarthak Shah (Microsoft Research)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;i style=""&gt;&lt;span style="" lang="EN-GB"&gt;Exploring Web scale language models for search query processing&lt;/span&gt;&lt;/i&gt;&lt;span style="" lang="EN-GB"&gt;, by Jian Huang et al. (Microsoft Research Asia, Facebook, and Penn State University)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;i style=""&gt;&lt;span style="" lang="EN-GB"&gt;Classification-enhanced ranking&lt;/span&gt;&lt;/i&gt;&lt;span style="" lang="EN-GB"&gt;, by Paul N. Bennett et al. (Microsoft Research)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;i style=""&gt;&lt;span style="" lang="EN-US"&gt;Ranking specialization for Web search: A divide-and-conquer approach by using topical RankSVM&lt;/span&gt;&lt;/i&gt;&lt;span style="" lang="EN-US"&gt;, by Jiang Bian et al. (Georgia Tech and Yahoo! 
Labs)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;i style=""&gt;&lt;span style="" lang="EN-GB"&gt;Generalized distances between rankings&lt;/span&gt;&lt;/i&gt;&lt;span style="" lang="EN-GB"&gt;, by Ravi Kumar and Sergei Vassilvitskii (Yahoo! Research)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;i style=""&gt;&lt;span style="" lang="EN-GB"&gt;Relational duality: Unsupervised extraction of semantic relations between entities on the Web&lt;/span&gt;&lt;/i&gt;&lt;span style="" lang="EN-GB"&gt;, by Danushka T. Bollegala et al. (University of Tokyo)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;span style="" lang="EN-GB"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="" lang="EN-GB"&gt;&lt;o:p&gt;&lt;/o:p&gt;The conference also featured three passionate keynotes:&lt;/span&gt;&lt;span style="" lang="EN-GB"&gt;&lt;o:p&gt;&lt;br /&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="" lang="EN-GB"&gt;&lt;a href="http://www.google.com/corporate/execs.html#vint"&gt;Vint Cerf&lt;/a&gt; discussed a broad range of topics of interest on today's Web, where &lt;a href="http://analytics.ncsu.edu/reports/www/www2010-cerf.pdf"&gt;everything is connected&lt;/a&gt;: 1.8 billion users, around a billion Web-enabled mobile devices, and still much room for growth in developing countries. The points he touched on included the implications of the explosion of data production for mobility, accessibility, security and privacy, intellectual property, and digital preservation, as well as new technologies (e.g., cloud computing).&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="" lang="EN-GB"&gt;&lt;span style=""&gt;&lt;/span&gt;&lt;a href="http://www.danah.org/"&gt;danah boyd&lt;/a&gt; discussed &lt;a href="http://www.danah.org/papers/talks/2010/WWW2010.html"&gt;privacy implications of the availability of "big data"&lt;/a&gt;. 
Her keynote revolved around common misconceptions associated with the analysis of data produced by online social activities, as well as ethical concerns related to using this data in the first place, "just because it is accessible".&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="" lang="EN-GB"&gt;&lt;o:p&gt;&lt;/o:p&gt;Carl Malamud&lt;/span&gt;&lt;span style="" lang="EN-GB"&gt; from &lt;a href="http://public.resource.org/"&gt;public.resource.org&lt;/a&gt; described his experiences trying to convince seven bureaucratic institutions to make public data publicly accessible. His keynote was organised around &lt;a href="http://www.elon.edu/e-web/predictions/futureweb2010/carl_malamud_www_keynote.xhtml"&gt;"10 rules for radicals"&lt;/a&gt;, a guide on how to break the barriers towards negotiating with bureaucrats.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;span style="" lang="EN-GB"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="" lang="EN-GB"&gt;&lt;o:p&gt;&lt;/o:p&gt;On Thursday night, the conference banquet featured an exciting performance by the North Carolina string band &lt;a href="http://www.carolinachocolatedrops.com/"&gt;Carolina Chocolate Drops&lt;/a&gt;. 
Check out &lt;i style=""&gt;&lt;a href="http://www.youtube.com/watch?v=_Sk3mNm2Mfs"&gt;Snowden's Jig (Genuine Negro Jig)&lt;/a&gt;&lt;/i&gt; and &lt;i style=""&gt;&lt;a href="http://www.youtube.com/watch?v=EKzbVi9hOjU"&gt;Don't get trouble in your mind&lt;/a&gt;&lt;/i&gt; for a taste.&lt;/span&gt;&lt;span style="" lang="EN-GB"&gt;&lt;o:p&gt;&lt;br /&gt;&lt;br /&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_JtuxhJ3QzZg/S994hpb0-rI/AAAAAAAAAMQ/GyC5mFrTP1M/s1600/DSC01018.JPG"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 225px;" src="http://4.bp.blogspot.com/_JtuxhJ3QzZg/S994hpb0-rI/AAAAAAAAAMQ/GyC5mFrTP1M/s400/DSC01018.JPG" alt="" id="BLOGGER_PHOTO_ID_5467220992104659634" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="" lang="EN-GB"&gt;&lt;o:p&gt;&lt;/o:p&gt;Friday held the closing ceremony, with the announcement of the award winners.&lt;/span&gt;&lt;span style="" lang="EN-GB"&gt;&lt;o:p&gt;&lt;br /&gt;&lt;br /&gt;&lt;/o:p&gt;Best Paper:&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;p class="MsoNoSpacing"&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;i style=""&gt;&lt;span style="" lang="EN-GB"&gt;Factorizing personalized Markov chains for next-basket recommendation&lt;/span&gt;&lt;/i&gt;&lt;span style="" lang="EN-GB"&gt;, by Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme (Osaka University and University of Hildesheim)&lt;/span&gt;&lt;span style="" lang="EN-GB"&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="" lang="EN-GB"&gt;Best Student Paper:&lt;/span&gt;    &lt;ul&gt;&lt;li&gt;&lt;i style=""&gt;&lt;span style="" lang="EN-GB"&gt;Privacy wizards for social networking sites&lt;/span&gt;&lt;/i&gt;&lt;span style="" lang="EN-GB"&gt;, by Lujun Fang and Kristen LeFevre (University of Michigan)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;span style="" 
lang="EN-GB"&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="" lang="EN-GB"&gt;Best Posters:&lt;/span&gt;  &lt;ul&gt;&lt;li&gt;&lt;i style=""&gt;&lt;span style="" lang="EN-GB"&gt;How much is your personal recommendation worth&lt;/span&gt;&lt;/i&gt;&lt;span style="" lang="EN-GB"&gt;, by Paul Dütting, Monika Henzinger and Ingmar Weber (EPFL Lausanne, University of Vienna, and Yahoo! Research)&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;i style=""&gt;&lt;span style="" lang="EN-GB"&gt;SourceRank: Relevance and trust assessment for deep Web sources based on inter-source agreement&lt;/span&gt;&lt;/i&gt;&lt;span style="" lang="EN-GB"&gt;, by Raju Balakrishnan and Subbarao Kambhampati (Arizona State University)&lt;/span&gt;&lt;span style=";font-family:NimbusSanL-Regu;font-size:10pt;"  lang="EN-US" &gt;&lt;/span&gt;&lt;span style="" lang="EN-GB"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;span style="" lang="EN-GB"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="" lang="EN-GB"&gt;&lt;o:p&gt;&lt;/o:p&gt;The closing ceremony also featured a short presentation of &lt;a href="http://www2011.org/"&gt;WWW 2011&lt;/a&gt;, to be held in Hyderabad, India. 
&lt;a href="http://www2012.org/"&gt;WWW 2012&lt;/a&gt; will take place in Lyon, France.&lt;br /&gt;&lt;br /&gt;Finally, on Saturday, the &lt;a href="http://www.iw3c2.org/"&gt;IW3C2&lt;/a&gt; announced the Brazilian bid as the winner to host &lt;a href="http://www2013.org/"&gt;WWW 2013&lt;/a&gt;, which I was very glad to hear about!&lt;/span&gt;&lt;br /&gt;&lt;span style="" lang="EN-GB"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-8650440671978066990?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/8650440671978066990/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=8650440671978066990' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/8650440671978066990'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/8650440671978066990'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2010/05/www-2010-in-raleigh-nc-usa.html' title='WWW 2010 in Raleigh, NC, USA'/><author><name>Rodrygo L.T. 
Santos</name><uri>http://www.blogger.com/profile/09502952528669992135</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://4.bp.blogspot.com/_JtuxhJ3QzZg/STMMxScZpiI/AAAAAAAAAEY/Zi4Nmre6mfk/S220/n767603947_216647_5827.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_JtuxhJ3QzZg/S993IPRHQXI/AAAAAAAAAMI/tB87Q8WjVj8/s72-c/DSC01010.JPG' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-8036614906908156932</id><published>2010-04-28T13:47:00.016+01:00</published><updated>2010-04-28T17:45:33.847+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='RIAO 2010; Voting Model; News search; Blog search'/><title type='text'>RIAO 2010 in Paris, France.</title><content type='html'>The 9th International &lt;a href="http://www.riao2010.org/"&gt;RIAO&lt;/a&gt; Conference has started in Paris, France (28-30 April, 2010). It is unfortunate that it is being held concurrently with &lt;a href="http://www2010.org/www/"&gt;WWW 2010&lt;/a&gt; in Raleigh.&lt;br /&gt;&lt;br /&gt;The first RIAO conference was held in Grenoble in 1985. RIAO is currently a triennial conference, addressing Information Retrieval research topics of interest to both Academia and Industry. This year, the conference focuses on Adaptivity, Personalization and Fusion of Heterogeneous Information.&lt;br /&gt;&lt;br /&gt;The following papers caught my eye while browsing the &lt;a href="http://www.riao2010.org/?action=programme.nouveau&amp;amp;lang=en"&gt;RIAO 2010 program&lt;/a&gt;:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Boiling down information retrieval test collections&lt;/span&gt;. T. Sakai et al. (Microsoft Research Asia, CMU)&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Improving tag recommendation using social networks&lt;/span&gt;. A. 
Rae et al. (The Open University, Yahoo! Research Barcelona)&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Analysis of robustness in trust-based recommender systems&lt;/span&gt;. Z. Cheng and N. Hurley (UCD)&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Opinion-finding in blogs: A passage-based language modelling approach&lt;/span&gt;. M. Saad Missen et al. (IRIT)&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Predicting query performance using query, result, and user interaction features&lt;/span&gt;. Q. Guo et al. (Emory University/Microsoft Research)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Towards a collection-based results diversification&lt;/span&gt;. J.A. Akinyemi et al. (University of Waterloo)&lt;/li&gt;&lt;/ul&gt;In addition, the &lt;a href="http://terrierteam.dcs.gla.ac.uk/"&gt;TerrierTeam&lt;/a&gt; has two full papers, which are being presented today at the conference (hopefully, the slides will follow shortly):&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://terrierteam.dcs.gla.ac.uk/publications/santos2010riao.pdf"&gt;Voting for Related Entities&lt;/a&gt; by R.L.T. Santos, C. Macdonald and I. Ounis. The paper addresses the problem of entity search, where the goal is to rank not documents, but entities in response to a given query. The paper proposes to tackle this problem as a &lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm/thesis.shtml"&gt;voting process&lt;/a&gt;, by considering the occurrence of an entity among the top ranked documents for a given query as a vote for the existence of a relationship between this entity and the entity in the query. The approach led to high precision and unparalleled recall compared to TREC 2009 systems.&lt;/li&gt;&lt;li&gt;&lt;a href="http://terrierteam.dcs.gla.ac.uk/publications/richard10riao_168.pdf"&gt;News Article Ranking: Leveraging the Wisdom of Bloggers&lt;/a&gt; by R. McCreadie, C. Macdonald and I. Ounis. 
The paper investigates how news article ranking can be performed automatically, so as to assist editors in selecting the articles that should make the front page of their newspaper. In particular, the paper investigates the blogosphere as a prime source of evidence, on the intuition that bloggers, and by extension their blog posts, can indicate interest in one news article or another. The paper proposes to model the automatic news article ranking task as a &lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm/thesis.shtml"&gt;voting process&lt;/a&gt;, where each relevant blog post acts as a vote for one or more news articles. The approach led to the best TREC 2009 retrieval performance in the &lt;a href="http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG"&gt;Blog track&lt;/a&gt;.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt; &lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm"&gt;Craig Macdonald&lt;/a&gt; is tweeting the conference, pending an appropriate wireless signal. You can follow some bits of the RIAO conference through the &lt;a href="http://twitter.com/#search?q=%23riao2010"&gt;#riao2010&lt;/a&gt; hashtag.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-8036614906908156932?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/8036614906908156932/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=8036614906908156932' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/8036614906908156932'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/8036614906908156932'/><link rel='alternate' type='text/html' 
href='http://terrierteam.blogspot.com/2010/04/riao-2010-in-paris-france.html' title='RIAO 2010 in Paris, France.'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-5496207436137157729</id><published>2010-04-07T13:04:00.043+01:00</published><updated>2010-04-08T01:11:02.108+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Ecir2010; No-shows; Best paper award; BCS; IRSG; KSJ award'/><title type='text'>ECIR 2010 in Milton Keynes: A Report</title><content type='html'>Last week, five of us attended the &lt;a href="http://kmi.open.ac.uk/events/ecir2010/"&gt;ECIR 2010&lt;/a&gt; conference in &lt;a href="http://en.wikipedia.org/wiki/Milton_Keynes"&gt;Milton Keynes&lt;/a&gt;. &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_Tw-GvQO7xww/S7yCC1l1WYI/AAAAAAAAABQ/E43W2A6SnuQ/s1600/DSC05128.JPG"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 281px; height: 211px;" src="http://3.bp.blogspot.com/_Tw-GvQO7xww/S7yCC1l1WYI/AAAAAAAAABQ/E43W2A6SnuQ/s320/DSC05128.JPG" alt="" id="BLOGGER_PHOTO_ID_5457379833723312514" border="0" /&gt;&lt;/a&gt;The conference was fairly well-organised, although it markedly lacked the lustre of the previous three editions of the conference. In terms of attendance, only about 170 delegates &lt;span style="font-style: italic;"&gt;registered&lt;/span&gt;, far fewer than at Glasgow 2008 (210+), and fewer than at Toulouse 2009 (180+). Perhaps the exotic town of Milton Keynes was not deemed a very attractive venue for a conference. In fact, apart from attending the conference, there was not much else to do -- e.g. 
the nearest proper pub was about 2 miles from the conference venue.&lt;br /&gt;&lt;br /&gt;The ECIR 2010 conference also suffered from a new, previously unseen problem: several authors and presenters did not make it to the conference, preferring to give their presentation by proxy or using a pre-recorded talk. No fewer than five no-shows were recorded during the conference. Even the keynote speaker and winner of the first &lt;a href="http://irsg.bcs.org/ksjaward.php"&gt;BCS IRSG Karen Sparck Jones award&lt;/a&gt;, &lt;a href="http://homepages.inf.ed.ac.uk/mlap/index.html"&gt;Mirella Lapata&lt;/a&gt;, did not show up and gave her presentation through a pre-recorded video. While Lapata certainly had a valid reason (as probably did the other speakers) not to show up, it is clear that ECIR should deal concretely with this problem, e.g., by making it compulsory that at least one author of each accepted paper be present during the conference.&lt;br /&gt;&lt;br /&gt;In addition, the organisers decided not to have parallel sessions (because of lack of facilities?) during ECIR 2010. Therefore, several full papers were turned into poster presentations, which were held during the short lunch period. This was a very bad move: because of the setting, these papers received much less attention and credit, even compared to the actual posters, whose session was rather successful. Some delegates argued that some of the full-papers-turned-posters should have been given a full presentation slot, in lieu of those full papers with a no-show author.&lt;br /&gt;&lt;br /&gt;Other than the problems mentioned above, the conference program was generally of a very good quality. On the first day, we enjoyed an excellent tutorial by two MSR researchers on &lt;a href="http://research.microsoft.com/en-us/events/ecir-2010-mlir-tutorial/"&gt;Machine Learning for IR&lt;/a&gt;. The tutorial was given by Paul Bennett and Kevyn Collins-Thompson. 
We also enjoyed an equally excellent tutorial on &lt;a href="http://en.wikipedia.org/wiki/Crowdsourcing"&gt;Crowdsourcing&lt;/a&gt; by Omar Alonso from Bing.&lt;br /&gt;&lt;br /&gt;In the next days, there were also several good papers that are worth reading:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;A language modeling approach for temporal information needs (from Max-Planck)&lt;/li&gt;&lt;li&gt;The role of query sessions in extracting instance attributes from web search queries (from Google)&lt;/li&gt;&lt;li&gt;Interpreting user inactivity on search results (from Univ. of Washington, Univ. of Patras)&lt;/li&gt;&lt;li&gt;Learning to distribute queries onto Web search nodes (from Yahoo!)&lt;/li&gt;&lt;li&gt;Temporal shingling for version identification in Web archives (from Max-Planck)&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Evaluation and user preference study on spatial diversity&lt;span style="font-weight: bold;"&gt; &lt;/span&gt;(University of Sheffield)&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;The best paper award was &lt;span style="font-style: italic;"&gt;jointly&lt;/span&gt; awarded to:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Promoting ranking diversity for biomedical information retrieval using Wikipedia. Jimmy Huang and Xiaoshi Yin (York University)&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;Evaluation of an adaptive search suggestion system. Sascha Kriewel and Norbert Fuhr (University of Duisburg-Essen, Germany)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;We have also had the chance to present our two full-papers on search result diversification, and learning to select:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://terrierteam.dcs.gla.ac.uk/publications/ecir2010_rodrygo_div.pdf"&gt;Explicit search result diversification through sub-queries&lt;/a&gt; by Rodrygo L. T. Santos, Jie Peng, Craig Macdonald, and Iadh Ounis. 
Rodrygo presented our xQuAD search results diversification framework, and the talk was very well received by the delegates, leading to several questions, and many comments that this was arguably the best presentation of the conference.&lt;/li&gt;&lt;li&gt;&lt;a href="http://terrierteam.dcs.gla.ac.uk/publications/ecir2010_pj_selective.pdf"&gt;Learning to select a ranking function&lt;/a&gt; by Jie Peng, Craig Macdonald and Iadh Ounis. This was one of the full-paper-turned-poster presentations. Jie presented the poster, which attracted a lot of attention and led to some very interesting discussions.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Finally, during the posters/demos session, two good contributions particularly caught our attention:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;An Empirical Study of Query Specificity (Poster) - Avi Arampatzis and Jaap Kamps&lt;/li&gt;&lt;li&gt;NEAT: News Exploration Along Time (Demo) - Omar Alonso, Klaus Berberich, Srikanta Bedathur and Gerhard Weikum&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;The conference also had an Industry day, which we missed. You can see a report on the Industry day in the following &lt;a href="http://blog.twigkit.com/ecir-industry-day-2010/"&gt;blog post&lt;/a&gt;.  During the conference, a few of us actively twittered the conference sessions. You can look at the &lt;a href="http://twapperkeeper.com/hashtag/ecir2010"&gt;archived ecir2010 hashtag&lt;/a&gt; for more details.&lt;br /&gt;&lt;br /&gt;One of the most exciting moments of the conference was our visit to &lt;a href="http://en.wikipedia.org/wiki/Bletchley_Park"&gt;Bletchley Park&lt;/a&gt; as part of the ECIR 2010 social dinner. This was an excellent venue with a lot of history, and the food was also good! During the dinner, we were given an impossible quiz to answer. 
Despite the wine, and a long day, some delegates did manage to find the &lt;a href="http://kmi.open.ac.uk/events/ecir2010/ECIR-quiz-answers.pdf"&gt;answers&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Usually, when ECIR is held in the UK, the last day of the conference is the venue for the Annual General Meeting of the &lt;a href="http://irsg.bcs.org/"&gt;BCS IRSG&lt;/a&gt; - the umbrella group for ECIR. However, in 2010, there was no AGM. We can only suppose that this was because the 2009 AGM was only held in October, co-located with Search Solutions 2009 at BCS HQ. We say &lt;span style="font-style: italic;"&gt;suppose&lt;/span&gt;, because at the time of writing, the 2009 AGM minutes are not yet available!&lt;br /&gt;&lt;br /&gt;Finally, we would like to thank the organisers for their hard work during the conference, for the idea of the &lt;span title="processed" id="ptLastEntry" class="status-body"&gt;&lt;span class="status-content"&gt;&lt;span class="entry-content"&gt;ball-bouncer game during the session breaks, which was really cool/fun &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;and for an overall reasonably organised conference. 
We look forward to &lt;a href="http://ecir2011.dcu.ie/"&gt;ECIR 2011&lt;/a&gt; in Dublin!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-5496207436137157729?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/5496207436137157729/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=5496207436137157729' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/5496207436137157729'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/5496207436137157729'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2010/04/ecir-2010-in-milton-keynes-report.html' title='ECIR 2010 in Milton Keynes: A Report'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_Tw-GvQO7xww/S7yCC1l1WYI/AAAAAAAAABQ/E43W2A6SnuQ/s72-c/DSC05128.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-4489151203024845065</id><published>2010-03-10T18:56:00.004Z</published><updated>2010-03-11T10:54:50.867Z</updated><title type='text'>Terrier 3.0 released</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.dcs.gla.ac.uk/%7Erichardm/images/Terrier3logo.jpg"&gt;&lt;img style="float: right; margin: 0pt 0pt 10px 10px; cursor: 
pointer; width: 258px; height: 181px;" src="http://www.dcs.gla.ac.uk/%7Erichardm/images/Terrier3logo.jpg" alt="" border="0" /&gt;&lt;/a&gt;Firstly, we have a new website for Terrier: &lt;a href="http://terrier.org/"&gt;http://terrier.org&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Also, we have just released Terrier 3.0!&lt;br /&gt;&lt;span style="font-family:monospace;"&gt;&lt;/span&gt;&lt;br /&gt;This is a major update to Terrier, including:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;support for indexing WARC collections (such as ClueWeb09)&lt;/li&gt;&lt;li&gt;improved MapReduce mode indexing&lt;/li&gt;&lt;li&gt;improved and more scalable index structures&lt;/li&gt;&lt;li&gt;added field-based and proximity term dependence models, such as BM25F, PL2F and Markov Random Fields&lt;/li&gt;&lt;li&gt;new Web-based retrieval interface&lt;/li&gt;&lt;/ul&gt;Fuller changelog at &lt;a href="http://terrier.org/docs/current/whats_new.html"&gt;http://terrier.org/docs/current/whats_new.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;If you're looking for our team publications, etc., please see our new team website: &lt;a href="http://terrierteam.dcs.gla.ac.uk/"&gt;http://terrierteam.dcs.gla.ac.uk/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Thanks are due to everyone in the Terrier Team for their hard work to make this release, as well as the contributions and feedback about Terrier from our users and collaborators.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-4489151203024845065?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/4489151203024845065/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=4489151203024845065' title='2 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/6043705792807544709/posts/default/4489151203024845065'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/4489151203024845065'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2010/03/terrier-30-released.html' title='Terrier 3.0 released'/><author><name>Craig Macdonald</name><uri>http://www.blogger.com/profile/13764972230026912718</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://4.bp.blogspot.com/_f7dcGnhz7mo/TEVaK7ftYFI/AAAAAAAAABU/YdVJVx4CC7M/s1600-R/craig_up_a_hill.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-8076489262231109523</id><published>2010-02-23T10:31:00.021Z</published><updated>2010-03-11T10:29:07.484Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='TREC Blog track; News search; Faceted search; Adhoc search'/><title type='text'>TREC Blog Track 2010</title><content type='html'>&lt;span style="font-family:arial;"&gt;The &lt;/span&gt;&lt;a style="font-family: arial;" href="http://trec.nist.gov/"&gt;TREC&lt;/a&gt;&lt;span style="font-family:arial;"&gt; Blog track will be continuing in 2010. 
In 2009, the Blog track was markedly revamped, addressing more refined blog search scenarios using the new &lt;/span&gt;&lt;a style="font-family: arial;" href="http://ir.dcs.gla.ac.uk/test_collections/blogs08info.html"&gt;Blogs08&lt;/a&gt;&lt;span style="font-family:arial;"&gt; collection, a large sample of the blogosphere covering the period of 14th January 2008 to 10th February 2009.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;A summary of the TREC Blog track 2009 edition was presented by &lt;/span&gt;&lt;a style="font-family: arial;" href="http://www.dcs.gla.ac.uk/%7Eounis"&gt;Iadh Ounis&lt;/a&gt;&lt;span style="font-family:arial;"&gt; at the main TREC conference (&lt;/span&gt;&lt;a style="font-family: arial;" href="http://ir.dcs.gla.ac.uk/terrier/TREC2009Blog-overview.pdf"&gt;Slides&lt;/a&gt;&lt;span style="font-family:arial;"&gt;). The Blog track 2009 overview paper will be available on the TREC website shortly, once it is updated and reviewed.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;The details of the TREC 2010 Blog track are still being finalised by the organisers. However, following the discussions at the TREC 2009 Blog track workshop, here are some salient details (see also the TREC 2009 &lt;/span&gt;&lt;a style="font-family: arial;" href="http://ir.dcs.gla.ac.uk/terrier/Blog-track-2009-Wrap-up.pdf"&gt;Wrap-up Slides&lt;/a&gt;&lt;span style="font-family:arial;"&gt;):&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;1. The faceted blog search task will run again in 2010: the task addresses the quality aspect of the retrieved blogs. 
It is a feed search task.&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li style="font-family: arial;"&gt;We will adopt a two-stage submission procedure: (1) a participating group submits "topically-relevant" blogs for each query; (2) a few standard baselines will be distributed to participants, so that they can re-rank them with respect to various facet inclinations (e.g. opinionated, in-depth, personal).&lt;br /&gt;&lt;/li&gt;&lt;li style="font-family: arial;"&gt;Groups can participate in stage 2 without stage 1, and vice-versa.  Stage 1 is akin to an adhoc blog search task.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;More topics for various facet inclinations.&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;2. The top news story identification task will run again in 2010: the task addresses the news-related dimension of the blogosphere. In particular, it investigates whether the blogosphere can be used to identify the most important news stories of the day. &lt;br /&gt;&lt;br /&gt;&lt;ul style="font-family: times new roman;"&gt;&lt;li style="font-family: arial;"&gt;A real-time news search task rather than a retrospective one.&lt;/li&gt;&lt;li style="font-family: arial;"&gt;A much larger and more comprehensive headline sample, provided by a major news organisation.&lt;/li&gt;&lt;li style="font-family: arial;"&gt;A two-stage submission procedure: (1) groups submit a ranking of top stories for some days per category (e.g. sport, politics, business, etc.); (2) we will then select some top relevant stories, for which we will ask the participating groups to identify the related blog posts, in a manner that covers the various/diverse aspects of each story.&lt;/li&gt;&lt;li style="font-family: arial;"&gt;Groups can participate in stage 2 without stage 1. 
In the latter case, it is an adhoc diversity blog post search task, where the headline is the query.&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;We welcome any feedback and comments on the tasks above to trecblog-organisers (at) dcs.gla.ac.uk&lt;br /&gt;&lt;br /&gt;Finally, note that if you wish to participate in TREC 2010, you should answer the &lt;a href="http://trec.nist.gov/call2010.html"&gt;TREC 2010 call for participation&lt;/a&gt;. We will update the Blog track wiki as things become more refined - keep following the Blog track developments as they happen on our dedicated &lt;a href="http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG"&gt;Wiki web site&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-8076489262231109523?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/8076489262231109523/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=8076489262231109523' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/8076489262231109523'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/8076489262231109523'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2010/02/trec-blog-track-2010_23.html' title='TREC Blog Track 2010'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-2027413637234362848</id><published>2009-08-04T14:16:00.017+01:00</published><updated>2009-08-04T17:17:04.985+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SIGIR 2009'/><category scheme='http://www.blogger.com/atom/ns#' term='expert search'/><category scheme='http://www.blogger.com/atom/ns#' term='Faceted Search'/><title type='text'>AcademTech: Faceted People Search</title><content type='html'>&lt;a href="http://ir.dcs.gla.ac.uk/terrier/academtech"&gt;AcademTech&lt;/a&gt; is a Computing Science-specific expert search engine based on the &lt;a href="http://ir.dcs.gla.ac.uk/terrier/"&gt;Terrier IR Platform&lt;/a&gt;. Persons working at Computing Science departments in Scottish Universities are considered as candidate experts by the system. Profiles of their expertise evidence are then mined from their homepages, publicly available digital libraries (e.g.  DBLP) and related information found on the Web through Yahoo! BOSS. The ranking of experts is provided by a variant of the &lt;a href="http://portal.acm.org/citation.cfm?id=1183671"&gt;Voting Model&lt;/a&gt; expert search approach.&lt;br /&gt;&lt;br /&gt;The system is integrated with a novel faceted search interface to allow users to browse and explore the results using a number of categories such as Location or Conference/Journal publications. Each expert in the system has a profile page containing a number of elements including query specific supporting publications, most informative associated terms displayed as a tag cloud, co-authors and web links. 
Although the system is currently applied in the context of Scottish Computing Science Academia, it can easily be expanded beyond its current Scottish scope to cover other academic fields, and people in general.&lt;br /&gt;&lt;br /&gt;I was lucky enough to be able to demo AcademTech at &lt;a href="http://www.sigir2009.org/"&gt;SIGIR 2009&lt;/a&gt; in Boston on July 20th. I spoke to a large number of attendees and received very helpful feedback.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PCMv-VCsbus/SnhSTVUeC0I/AAAAAAAAABQ/b0QBvnE7-qs/s1600-h/duncan_mcdougall_academtech_sigir.jpg"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 150px;" src="http://2.bp.blogspot.com/_PCMv-VCsbus/SnhSTVUeC0I/AAAAAAAAABQ/b0QBvnE7-qs/s200/duncan_mcdougall_academtech_sigir.jpg" alt="" id="BLOGGER_PHOTO_ID_5366129448105937730" border="0" /&gt;&lt;/a&gt;A popular suggestion was to utilize AcademTech's core system in the scope of biology. This would meet the medical field's need for finding related organisms, diseases etc. Possible facets in the area would likely be &lt;a href="http://en.wikipedia.org/wiki/Biological_classification"&gt;biological classifications&lt;/a&gt; such as species and genus.&lt;br /&gt;&lt;br /&gt;Daniel Tunkelang from &lt;a href="http://thenoisychannel.com/"&gt;The Noisy Channel&lt;/a&gt; suggested providing profile page-located facets, allowing filtering of search results by features present in a selected expert's page such as co-authors. This would satisfy an example scenario such as "Show me co-authors of this expert who work for the University of Glasgow." Profile facets could also allow the expert's publications list to be filtered by a number of fields such as co-author location, conference etc.&lt;br /&gt;&lt;br /&gt;Much of the feedback mirrored our intended future work. 
Name disambiguation is a high-priority update, as a current problem with AcademTech is the publication mismatch when multiple experts have the same name. In fact, the system is specifically designed to allow for expansion of facets, and name disambiguation. With a large number of publication collaborators working in industry, a useful move would be to expand the system to accommodate these experts.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.dcs.gla.ac.uk/~craigm/publications/mcdougall09academtech-poster.pdf"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 200px; height: 142px;" src="http://1.bp.blogspot.com/_PCMv-VCsbus/Snhdke71Y7I/AAAAAAAAABY/ghfJmmQsmI4/s200/academtech_poster.png" alt="AcademTech Sigir 2009 Poster" id="BLOGGER_PHOTO_ID_5366141837372646322" border="1" /&gt;&lt;/a&gt;AcademTech is now publicly accessible from &lt;a href="http://www.terrier.org/academtech" target="_blank"&gt;http://www.terrier.org/academtech&lt;/a&gt;&lt;br /&gt;A description of the system is available in the &lt;a href="http://portal.acm.org/citation.cfm?id=1571941.1572154"&gt;SIGIR'09 proceedings&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Thank you to all those who spoke to me and gave me some great feedback.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-2027413637234362848?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/2027413637234362848/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=2027413637234362848' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/2027413637234362848'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/6043705792807544709/posts/default/2027413637234362848'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2009/08/academtech-faceted-people-search.html' title='AcademTech: Faceted People Search'/><author><name>Duncan McDougall</name><uri>http://www.blogger.com/profile/06662063389408020091</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://3.bp.blogspot.com/_PCMv-VCsbus/Sm6z2_h_7HI/AAAAAAAAAAM/RCiIJlcxNUU/S220/5249_510084652005_289400851_523521_7857962_n.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_PCMv-VCsbus/SnhSTVUeC0I/AAAAAAAAABQ/b0QBvnE7-qs/s72-c/duncan_mcdougall_academtech_sigir.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-1931996985224713769</id><published>2009-07-21T16:08:00.005+01:00</published><updated>2009-08-03T10:05:30.850+01:00</updated><title type='text'>SIGIR 2009: Expert Search from Glasgow</title><content type='html'>&lt;span class="Apple-style-span"  style="font-size:100%;"&gt;A short update from &lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;a href="http://www.sigir2009.org/"&gt;&lt;span class="Apple-style-span"&gt;SIGIR09&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:100%;"&gt; to announce our recently published work on expert search. 
This should hopefully be the first in a short series of posts about SIGIR this year.&lt;/span&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:100%;"&gt;In &lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm/publications/macdonald09perfect.pdf"&gt;&lt;span class="Apple-style-span"&gt;On Perfect Document Rankings for Expert Search&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:100%;"&gt; (Craig Macdonald &amp;amp; Iadh Ounis), we examine the effect of the document ranking on an expert search engine. Intuitively, improving the topical relevance properties of the document ranking usually leads to an improvement in the performance of the generated ranking of experts. In this poster, we examine the extreme case, by making the document ranking component perfect with respect to topical relevance.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:100%;"&gt;In &lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm/publications/macdonald09exclicks.pdf"&gt;Usefulness of Click-through data in Expert Search&lt;/a&gt; (Craig Macdonald &amp;amp; Ryen White), we examine how user clicks on an intranet search engine can be used as features by an expert search engine. The proposed techniques are based on the voting techniques from the Voting Model, but examine document clicks instead of weighting model scores. 
To our knowledge, this is the first work examining how clicks can be integrated into expert search.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:100%;"&gt;&lt;span class="Apple-style-span"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:100%;"&gt;&lt;span class="Apple-style-span"&gt;Finally, the Voting Model was show-cased in the&lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm/publications/mcdougall09academtech.pdf"&gt; &lt;/a&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span"&gt;&lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm/publications/mcdougall09academtech.pdf"&gt;Expertise Search in Academia using Facets&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"&gt; (Duncan McDougall &amp;amp; Craig Macdonald), which demoed &lt;/span&gt;&lt;a href="http://terrier.org/academtech/"&gt;&lt;span class="Apple-style-span"&gt;AcademTech&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span"&gt;, a faceted search interface for expert search in academia.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-1931996985224713769?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/1931996985224713769/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=1931996985224713769' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/1931996985224713769'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/1931996985224713769'/><link rel='alternate' 
type='text/html' href='http://terrierteam.blogspot.com/2009/07/sigir-2009-expert-search-from-glasgow.html' title='SIGIR 2009: Expert Search from Glasgow'/><author><name>Craig Macdonald</name><uri>http://www.blogger.com/profile/13764972230026912718</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://4.bp.blogspot.com/_f7dcGnhz7mo/TEVaK7ftYFI/AAAAAAAAABU/YdVJVx4CC7M/s1600-R/craig_up_a_hill.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-7911159269686032781</id><published>2009-06-04T10:38:00.003+01:00</published><updated>2009-06-04T13:27:04.079+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CIKM 2011; Glasgow'/><title type='text'>CIKM 2011 in Glasgow!</title><content type='html'>We are delighted that our bid to host the &lt;a href="http://www.cs.umbc.edu/cikm/"&gt;ACM Conference on Information and Knowledge Management&lt;/a&gt; (CIKM 2011)  in Glasgow has been successful.&lt;br /&gt;&lt;br /&gt;After the highly successful &lt;a href="http://www.dcs.gla.ac.uk/essir2007"&gt;ESSIR 2007&lt;/a&gt; and &lt;a href="http://ecir2008.dcs.gla.ac.uk/"&gt;ECIR 2008&lt;/a&gt; events, we are excited at the prospect of hosting the prestigious ACM CIKM Conference in Glasgow in 2011. We look forward to having our colleagues gather in Glasgow, and to surpassing their expectations.&lt;br /&gt;&lt;br /&gt;Further information about the conference (dates, venues, etc.) will be available in due course.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.comp.polyu.edu.hk/conference/cikm2009/about/"&gt;CIKM 2009&lt;/a&gt; will be  held on November 2-6, 2009, in Hong Kong. 
Hope to see you there!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-7911159269686032781?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/7911159269686032781/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=7911159269686032781' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/7911159269686032781'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/7911159269686032781'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2009/06/cikm-2011-in-glasgow.html' title='CIKM 2011 in Glasgow!'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-5057301719779190923</id><published>2009-04-29T12:08:00.005+01:00</published><updated>2009-04-29T12:31:54.021+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Blog track; TREC 2009; Search tasks'/><title type='text'>TREC Blog track 2009</title><content type='html'>We have just released a &lt;a href="http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG"&gt;draft of the guidelines&lt;/a&gt; for the TREC 2009 Blog track.&lt;br /&gt;&lt;br /&gt;Compared to previous years, the Blog track 2009 aims to investigate more refined and complex search scenarios. 
In particular, we propose to run two tasks in TREC 2009:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Faceted blog distillation:  a more refined version of the blog distillation task that addresses the quality aspect of the retrieved blogs and mimics an exploratory search task. The task can be summarised as "&lt;em&gt;Find me a &lt;strong&gt;good&lt;/strong&gt; blog with a principal, recurring interest in X&lt;/em&gt;". We propose several facets for the TREC 2009 blog distillation task, which may be of varying difficulty to identify for the participant systems.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Top stories identification: a new pilot task that addresses the news dimension in the blogosphere. Systems are asked to identify the top news stories of a given day, and to provide a list of relevant blog posts discussing each news story. The ranked list of blog posts should have a &lt;strong&gt;diverse&lt;/strong&gt; nature, covering different/diverse aspects, perspectives or opinions of the news story. &lt;/li&gt;&lt;/ul&gt;The new &lt;a href="http://terrierteam.blogspot.com/2009/04/blogs08-collection-released.html"&gt;Blogs08 collection&lt;/a&gt;, an up-to-date and large sample of the blogosphere from January 2008 to February 2009, will be used for both tasks.&lt;br /&gt;&lt;br /&gt;We welcome feedback. 
Please feel free to post feedback and comments about the proposed tasks for 2009.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-5057301719779190923?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/5057301719779190923/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=5057301719779190923' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/5057301719779190923'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/5057301719779190923'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2009/04/trec-blog-track-2009.html' title='TREC Blog track 2009'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-912855941196019874</id><published>2009-04-09T20:05:00.003+01:00</published><updated>2009-04-09T20:10:18.131+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Blogs08 collection'/><category scheme='http://www.blogger.com/atom/ns#' term='TREC Blog track'/><title type='text'>Blogs08 Collection Released</title><content type='html'>We are pleased to announce that the Blogs08 collection is now ready for distribution. As announced before, Blogs08 is one order of magnitude bigger than Blogs06, and samples the blogosphere from January 2008 to February 2009. 
The uncompressed permalink size is approx 1.3TB, while including feeds, this amounts to over 2TB of data. As usual, the data is shipped compressed on a SATA hard drive.&lt;br /&gt;&lt;br /&gt;The distribution mechanism will be the same as for Blogs06. There is specific information about the size of the collection &lt;a href="http://ir.dcs.gla.ac.uk/test_collections/blogs08info.html"&gt;here&lt;/a&gt;, while the instructions for obtaining the collection are &lt;a href="http://ir.dcs.gla.ac.uk/test_collections/access_to_data.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;If you intend on participating in the &lt;a href="http://trec.nist.gov"&gt;TREC&lt;/a&gt; 2009 Blog track, please start working on the paperwork right away, so that you can get the collection as soon as possible. Due to the larger size of the collection, we will operate a queuing system for shipping the data. Moreover, if you haven't done so already, respond to the &lt;a href="http://trec.nist.gov/call09.html"&gt;TREC 2009 Call for Participation.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Blog track co-ordinators are finalising the guidelines for this year's tasks and will continue to update the &lt;a href="http://ir.dcs.gla.ac.uk/wiki/TREC-Blog"&gt;TREC Blog wiki&lt;/a&gt;, the TREC blog track mailing list and this blog.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-912855941196019874?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/912855941196019874/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=912855941196019874' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/912855941196019874'/><link rel='self' 
type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/912855941196019874'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2009/04/blogs08-collection-released.html' title='Blogs08 Collection Released'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-2037982735422834035</id><published>2009-03-03T16:52:00.007Z</published><updated>2009-03-05T11:22:08.033Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='expert search'/><category scheme='http://www.blogger.com/atom/ns#' term='thesis'/><category scheme='http://www.blogger.com/atom/ns#' term='craig macdonald'/><category scheme='http://www.blogger.com/atom/ns#' term='blog search'/><title type='text'>Craig's Thesis Available</title><content type='html'>Following up from my &lt;a href="http://terrierteam.blogspot.com/2009/01/craig-successfully-defends-his-thesis.html"&gt;successful defence&lt;/a&gt;, I'm pleased to announce that my thesis, titled &lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm/thesis.shtml"&gt;The Voting Model for People Search&lt;/a&gt; is now available online.&lt;br /&gt;&lt;br /&gt;My thesis proposes the Voting Model for various people search problems, such as expert search in enterprise settings (&lt;span style="font-style: italic;"&gt;find me someone who knows about...&lt;/span&gt;) , or blog(ger) search (&lt;span style="font-style: italic;"&gt;find me a blog about the general topic...&lt;/span&gt;). I also examine the reviewer assignment problem (&lt;span style="font-style: italic;"&gt;suggest for me reviewers for this paper...&lt;/span&gt;). 
In general, the Voting Model is concerned with the ranking of aggregates of documents.&lt;br /&gt;&lt;br /&gt;Experimental chapters are mainly carried out using TREC Enterprise track and Blog track test collections.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-2037982735422834035?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/2037982735422834035/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=2037982735422834035' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/2037982735422834035'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/2037982735422834035'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2009/03/craigs-thesis-available.html' title='Craig&apos;s Thesis Available'/><author><name>Craig Macdonald</name><uri>http://www.blogger.com/profile/13764972230026912718</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://4.bp.blogspot.com/_f7dcGnhz7mo/TEVaK7ftYFI/AAAAAAAAABU/YdVJVx4CC7M/s1600-R/craig_up_a_hill.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-2892715340551425413</id><published>2009-02-22T11:10:00.013Z</published><updated>2009-02-22T15:02:26.440Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Correlator'/><category scheme='http://www.blogger.com/atom/ns#' term='Entity search'/><category scheme='http://www.blogger.com/atom/ns#' term='Wikipedia'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Computational Linguistics'/><title type='text'>Correlator launched</title><content type='html'>&lt;a href="http://research.yahoo.com/"&gt;Yahoo! Research&lt;/a&gt; has launched a new search engine called &lt;a href="http://correlator.sandbox.yahoo.net/"&gt;Correlator&lt;/a&gt;.  It uses advanced techniques from Natural Language Processing and Computational Linguistics to locate entities within text and to group sentences about these entities from different documents. In his talk at the &lt;a href="http://ecir2008.dcs.gla.ac.uk/"&gt;ECIR 2008&lt;/a&gt; &lt;a href="http://ecir2008.dcs.gla.ac.uk/industry.html"&gt;Industry Day&lt;/a&gt;,  &lt;a href="http://research.yahoo.com/bouncer_user/37"&gt;Hugo Zaragoza&lt;/a&gt; who championed the Correlator project at Yahoo! Research Barcelona, described some of the system's &lt;a href="http://ecir2008.dcs.gla.ac.uk/id_slides/ECIR_08_Zaragoza.pdf"&gt;underlying approaches and technologies&lt;/a&gt;. In a blog &lt;a href="http://sandbox.yahoo.com/what-is-correlator"&gt;post&lt;/a&gt; introducing the search engine, he states:&lt;br /&gt;&lt;blockquote&gt;The core of Correlator is a search engine capable of returning not only relevant documents, but also relevant sentences and entities.&lt;br /&gt;&lt;/blockquote&gt;Currently, Correlator uses &lt;a href="http://www.wikipedia.org/"&gt;Wikipedia&lt;/a&gt; as the underlying document collection. However, the Correlator team contends that this can be extended to other collections and types of documents such as blogs.&lt;br /&gt;&lt;br /&gt;I have quickly tried Correlator this morning. My first impression of the system is that it does extremely well on many queries - e.g. results for queries such as "precision and recall" are pretty good and informative. However, there are several areas for improvement when it comes to identifying relationships between entities. 
For example, for the query "Tony Blair", when searching for names, the system suggests many entities as '&lt;span style="font-style: italic;"&gt;probably related to Tony Blair&lt;/span&gt;', but the precise nature of the relationship between the two entities is not stated: e.g. Cherie Blair should be presented definitively as the wife of Tony Blair. Indeed, the task of browsing through the various suggested relationships between the named entities is left to the user. However, this might be a deliberate choice by the designers of the system, favouring high coverage over high precision.&lt;br /&gt;&lt;br /&gt;Relatedly, it is of note that &lt;a href="http://trec.nist.gov/"&gt;TREC &lt;/a&gt;2009 will include a new &lt;a href="http://ilps.science.uva.nl/trec-entity/"&gt;Entity track&lt;/a&gt;. One of the currently proposed search tasks is the identification of relationships between entities.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-2892715340551425413?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/2892715340551425413/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=2892715340551425413' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/2892715340551425413'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/2892715340551425413'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2009/02/correlator-launched.html' title='Correlator launched'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-481191796392422682</id><published>2009-02-20T21:31:00.000Z</published><updated>2009-02-20T16:38:37.548Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='mantis'/><category scheme='http://www.blogger.com/atom/ns#' term='jira'/><category scheme='http://www.blogger.com/atom/ns#' term='issues tracking'/><category scheme='http://www.blogger.com/atom/ns#' term='scarab'/><title type='text'>Choosing an issue tracking system</title><content type='html'>I recently posted about our &lt;a href="http://terrierteam.blogspot.com/2009/02/building-terrier-by-open-collaboration.html"&gt;deployment&lt;/a&gt; of an &lt;a href="http://ir.dcs.gla.ac.uk/terrier/issues/"&gt;issue tracking system&lt;/a&gt; for use by the &lt;a href="http://ir.dcs.gla.ac.uk/terrier/"&gt;Terrier platform&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;When we recently went through the decision process for deploying an issue tracking system, I reviewed several alternative products, namely &lt;a href="http://scarab.tigris.org/"&gt;Scarab&lt;/a&gt;, &lt;a href="http://www.mantisbt.org/"&gt;Mantis&lt;/a&gt;, and &lt;a href="http://www.atlassian.com/software/jira/"&gt;JIRA&lt;/a&gt;. Firstly, I should say that I was already familiar with JIRA through its extensive usage by the &lt;a href="http://www.apache.org/"&gt;Apache Software Foundation&lt;/a&gt;, e.g. by &lt;a href="http://hadoop.apache.org/"&gt;Hadoop&lt;/a&gt; and &lt;a href="http://hadoop.apache.org/pig/"&gt;Pig&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;When I evaluated all three products, I found Scarab to be the nearest to JIRA; however, it required much work to set up, as the issue tracking system was full of labels such as "attributeGroup1". Scarab also lacked querying features: e.g. 
it had an advanced SQL-inspired querying interface, but no easy way to see "All Issues" or "Most Recently Updated Issues". For me, this is a killer problem, and one that Bugzilla shares. It should be easy to see what the current problems are, or what issues people are working on. At the end of the day, as a developer, you spend more time querying an issue tracker than you do filing issues.&lt;br /&gt;&lt;br /&gt;I'm no expert at HTML &amp;amp; CSS, so for Mantis, I was pleased when it installed with a cleaner default theme. However, with Mantis, the fields for an issue were hard-coded in the submission and display pages. This was ultimately the downfall for Mantis, as I'm keen on minimising the unnecessary fields, and Mantis had many fields that were not appropriate for Terrier. Another downside for Mantis was that issues were numbered 0000001. In JIRA and Scarab, issues have a project prefix, and no leading zeros, making them instantly recognisable outside of the issue tracker - e.g. if I write TR-5 (e.g. in an SVN commit message), then people are likely to know what I'm on about. In contrast, 0000005 is not something they would find quickly using a search engine.&lt;br /&gt;&lt;br /&gt;Finally, I installed a trial for Atlassian's JIRA. We found it easy to browse and query existing issues in JIRA. The dashboard functionality is also useful, and customisable. Finally, we also liked the user- and search-engine-friendly URLs used by JIRA for issues: e.g. &lt;a href="http://ir.dcs.gla.ac.uk/terrier/issues/browse/TR-1"&gt;http://ir.dcs.gla.ac.uk/terrier/issues/browse/TR-1&lt;/a&gt; for issue TR-1. I'm pleased with JIRA, which exhibits an overall very polished UI. Atlassian have also very kindly provided Terrier with an open source license for JIRA. 
Thanks Atlassian!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-481191796392422682?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/481191796392422682/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=481191796392422682' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/481191796392422682'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/481191796392422682'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2009/01/choosing-issue-tracking-system.html' title='Choosing an issue tracking system'/><author><name>Craig Macdonald</name><uri>http://www.blogger.com/profile/13764972230026912718</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://4.bp.blogspot.com/_f7dcGnhz7mo/TEVaK7ftYFI/AAAAAAAAABU/YdVJVx4CC7M/s1600-R/craig_up_a_hill.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-1289496982492305192</id><published>2009-02-18T10:32:00.005Z</published><updated>2009-02-18T12:35:35.208Z</updated><title type='text'>WSDM 09 : Your mileage may vary</title><content type='html'>So, for the second year &lt;a href="http://wsdm2009.org/index.php"&gt;WSDM&lt;/a&gt; arrives, attendance has remained constant (despite the economic downturn) and we're all packing t-shirts rather than bags. 
However, unwrapping the packaging, what do we find?&lt;br /&gt;&lt;br /&gt;Initially, the conference seemed promising, beginning with an excellent speech by &lt;a href="http://research.google.com/people/jeff/index.html"&gt;Jeff Dean&lt;/a&gt;. It was a 101 on efficiency at Google since 1999, well rounded with explanations and statistics in equal measure. Unfortunately, this starring performance seemed to overshadow the rest of the conference - a benchmark never surpassed.&lt;br /&gt;&lt;br /&gt;If I were to use one word to describe WSDM'09, it would be inconsistent. There were some nice speeches: &lt;a href="http://www.cond.org/"&gt;Eytan Adar&lt;/a&gt;'s talk on detecting how the web changes over time springs to mind (also best student paper), and &lt;a href="http://eventseer.net/p/songhua_xu/"&gt;Songhua Xu&lt;/a&gt; gets bonus points for turning up with an &lt;a href="http://www.gceel.com/"&gt;interface&lt;/a&gt; paper to a predominantly text-based conference. However, many were poorly presented, insubstantial or both.&lt;br /&gt;&lt;br /&gt;The dominating topic of the conference was unsurprisingly Wikipedia, with well over 40% of papers giving it a mention. Ignoring the proliferation of Wikipedia papers over the last year, the high point here was &lt;a href="http://www.cond.org/"&gt;Eytan Adar&lt;/a&gt;'s paper on Information Arbitrage Across Multi-lingual Wikipedia, for coming up with something which might actually be useful in practice.&lt;br /&gt;&lt;br /&gt;The videos from the conference should be up soon on &lt;a href="http://videolectures.net/site/list/events/"&gt;videolectures.net&lt;/a&gt;, and I would recommend Jeff Dean's opening speech - for those of you who can survive watching in the tiny eye-strain-o vision which comes with Flash. 
As for the rest remember -  your mileage may vary.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-1289496982492305192?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/1289496982492305192/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=1289496982492305192' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/1289496982492305192'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/1289496982492305192'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2009/02/wsdm-09-your-mileage-may-vary.html' title='WSDM 09 : Your mileage may vary'/><author><name>Richard McCreadie</name><uri>http://www.blogger.com/profile/11063287777854855902</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-5547990178089322234</id><published>2009-02-17T23:30:00.006Z</published><updated>2009-02-18T00:35:07.009Z</updated><title type='text'>WSDM 2009 highlights</title><content type='html'>&lt;a href="http://www.dcs.gla.ac.uk/%7Erichardm"&gt;Richard&lt;/a&gt; and I went to Barcelona last week to attend the &lt;a href="http://research.microsoft.com/en-us/um/people/nickcr/wscd09/"&gt;WSCD 2009&lt;/a&gt; workshop and the &lt;a href="http://www.wsdm2009.org/"&gt;WSDM 2009&lt;/a&gt; conference. 
&lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm"&gt;Craig&lt;/a&gt; was also there on Monday to present an interesting poster on the usefulness of click-through data for training.&lt;br /&gt;&lt;br /&gt;Besides being held in an &lt;a href="http://en.wikipedia.org/wiki/Barcelona"&gt;exciting city&lt;/a&gt; (!), WSDM 2009 kept up with its previous edition in bringing together industry and academia to a common, quality forum for Web IR and data mining, with papers covering a wide range of trendy topics -- fairly well summarised by the tag cloud printed on the t-shirts given to the participants! -- from query intent detection, through search results diversification, to tagging-based clustering and classification, and social network-driven marketing analysis, to name a few.&lt;br /&gt;&lt;br /&gt;The best paper award went to &lt;a href="http://doi.acm.org/10.1145/1498759.1498825"&gt;Fernando Diaz&lt;/a&gt; for his work on the selective integration of news content into Web results based on the classification of the newsworthiness of each query. &lt;a href="http://doi.acm.org/10.1145/1498759.1498837"&gt;Eytan Adar et al.&lt;/a&gt; received the best student paper award for their study of the dynamics of the content and structure of Web documents of varying popularity over a fine-grained timescale. In the new &lt;a href="http://www.wsdm2009.org/late_results.php"&gt;late breaking results&lt;/a&gt; session, the award went to &lt;a href="http://www.wsdm2009.org/arikan_2009_temporal_expressions.pdf"&gt;Irem Arikan et al.&lt;/a&gt;'s paper on applying a language model approach for improving the retrieval effectiveness for queries with temporal expressions. The invited talks by &lt;a href="http://www.wsdm2009.org/dean_abs_bio.php"&gt;Jeff Dean&lt;/a&gt; and &lt;a href="http://www.wsdm2009.org/weikum_abs_bio.php"&gt;Gerhard Weikum&lt;/a&gt; were also insightful -- we couldn't attend &lt;a href="http://www.wsdm2009.org/kumar_abs_bio.php"&gt;Ravi Kumar&lt;/a&gt;'s though. 
All talks should be available soon from &lt;a href="http://www.videolectures.net/"&gt;VideoLectures.net&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Overall, WSDM is rapidly moving towards establishing itself among the major IR conferences. In 2010, it will probably be held in Los Angeles, CA, USA.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-5547990178089322234?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/5547990178089322234/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=5547990178089322234' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/5547990178089322234'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/5547990178089322234'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2009/02/wsdm-2009-highlights.html' title='WSDM 2009 highlights'/><author><name>Rodrygo L.T. 
Santos</name><uri>http://www.blogger.com/profile/09502952528669992135</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://4.bp.blogspot.com/_JtuxhJ3QzZg/STMMxScZpiI/AAAAAAAAAEY/Zi4Nmre6mfk/S220/n767603947_216647_5827.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-5465109307588054127</id><published>2009-02-16T23:50:00.014Z</published><updated>2009-02-17T18:52:53.706Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='twitter; CEO; Europe; Interface; Public relations'/><title type='text'>Twitter and CEOs</title><content type='html'>Thanks to &lt;a href="http://home.planet.nl/%7Ehuibe073/"&gt;Theo Huibers&lt;/a&gt; for pointing out an article in &lt;a href="http://www.forbes.com/"&gt;Forbes&lt;/a&gt; about &lt;a href="http://www.forbes.com/2009/01/18/twitter-europe-blog-tech-ebiz-cx_mb_0119twitter.html"&gt;Why Europe's CEOs should Twitter&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The article reports that, unlike their counterparts in the USA, the CEOs of European companies have been slow to embrace the &lt;a href="http://www.twitter.com/"&gt;Twitter&lt;/a&gt; tool. In general, the article argues that European chief executives are not very aware of &lt;a href="http://www.cio.co.uk/concern/change/features/index.cfm?articleid=706"&gt;the benefits of social networking tools to their businesses&lt;/a&gt;, missing out on opportunities to engage with their customers.&lt;br /&gt;&lt;br /&gt;If this is true, then it is rather worrying. Indeed, I can easily see many scenarios where social networking tools such as Twitter could be helpful for businesses. The Forbes article mentions several of these, for example a case where the public relations office of General Motors used Twitter to clamp down on rumours affecting the company. 
In his blog, Daniel Tunkelang reported a &lt;a href="http://thenoisychannel.com/2008/09/10/fun-with-twitter/"&gt;first-hand experience&lt;/a&gt;, when one of his technical questions posted on Twitter received a response from the president and COO of GoDaddy.com, albeit with a degree of attention that went beyond what Daniel bargained for.&lt;br /&gt;&lt;br /&gt;It is of interest to note that the Forbes article suggests that Twitter's interface is still too complex for widespread adoption by end-users and businesses. While I have only been an occasional user of Twitter, I have never had the feeling that the tool was difficult to use. However, I'm happy to stand corrected by HCI experts!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-5465109307588054127?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/5465109307588054127/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=5465109307588054127' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/5465109307588054127'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/5465109307588054127'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2009/02/twitter-and-ceos.html' title='Twitter and CEOs'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-660621752734978919</id><published>2009-02-11T21:11:00.008Z</published><updated>2009-02-13T14:23:25.973Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Terrier'/><category scheme='http://www.blogger.com/atom/ns#' term='information retrieval'/><category scheme='http://www.blogger.com/atom/ns#' term='Grid_CLEF'/><category scheme='http://www.blogger.com/atom/ns#' term='CLEF'/><title type='text'>Grid@CLEF track : a framework for IR experimentation</title><content type='html'>Don't be put off by the title: this isn't a post about &lt;a href="http://en.wikipedia.org/wiki/LHC_Computing_Grid"&gt;Grid Computing&lt;/a&gt;. Instead, I'm going to talk about the &lt;a href="http://www.blogger.com/post-create.g?blogID=6043705792807544709"&gt;Grid@CLEF task&lt;/a&gt;, which defines a framework and TREC-style track for experimentation with various components of IR systems. &lt;span style="font-weight: bold;"&gt;Disclaimer&lt;/span&gt;: I'm pleased to be on the advisory committee of the Grid@CLEF task.&lt;br /&gt;&lt;br /&gt;Firstly, I'll give a bit of background. The &lt;a href="http://clef-campaign.org/"&gt;Cross-Language Evaluation Forum (CLEF)&lt;/a&gt; is a spin-off from TREC which concentrates on the evaluation of mono-lingual (non-English) and cross-lingual retrieval. CLEF has been running since 2000, and attracts a wide spread of participating research groups from across the globe, reaching 130 for CLEF 2008.&lt;br /&gt;&lt;br /&gt;The tracks have now been defined for &lt;a href="http://www.clef-campaign.org/2009.html"&gt;CLEF 2009&lt;/a&gt;, which includes the Grid track. &lt;a href="http://ims.dei.unipd.it/members/ferro/"&gt;Nicola Ferro&lt;/a&gt; (Univ. of Padova) and Donna Harman (NIST) are the big-wigs for this task, with suggestions from the advisory committee. So what does Grid mean in this context? 
Well, the idea (in my own words) is that the components of an IR system that affect retrieval performance can be roughly categorised as follows: tokeniser, stopword list, word-decompounder, stemmer, and ranking function. In the Grid track, the concept is that these components can be interchanged, and a fuller understanding of their impact derived. The Grid framework facilitates such interchanges, by defining a way to allow various mixes of components to be attempted, thus creating a "grid" of experimental results.&lt;br /&gt;&lt;br /&gt;However, the problem with such an experiment is that often each of these components is tied to an IR system, and that the IR system itself can have an impact on the results. Instead, the idea behind the Grid track is that the output from each component (tokeniser, stopword list etc) of a given IR system is saved in an XML format, and shared among participants. In this way, every combination of components can be investigated.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://ims.dei.unipd.it/gridclef/"&gt;Grid@CLEF site&lt;/a&gt; describes the intuitions behind the task in more detail, including an example of how results will be presented.&lt;br /&gt;&lt;br /&gt;Here in Glasgow, we like the concept behind the Grid track. Indeed, it has some similarities to the way we ran the opinion finding task in the &lt;a href="http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG"&gt;TREC 2008 Blog track&lt;/a&gt;. In the opinion finding task (where the aim is to retrieve relevant &lt;span style="font-style: italic;"&gt;and&lt;/span&gt; opinionated blog posts about the target topic), the retrieval performance of opinion identification approaches appears to be linked to the ability of the underlying "topical relevance" retrieval approach. To investigate this in TREC 2008, we provided 5 standard topical relevance baselines, which participants were able to use as input to their opinion finding technique(s). 
You can read more in the &lt;span style="font-style: italic;"&gt;Overview of the TREC 2008 Blog track &lt;/span&gt;(Iadh Ounis, Craig Macdonald and Ian Soboroff), which should be released in a few weeks' time.&lt;br /&gt;&lt;br /&gt;I have committed to implementing Terrier support for the Grid@CLEF track. The XML specification is still being agreed by the Grid@CLEF organisers and advisers. In the meantime, if you are interested in using Terrier on this task, you can follow the progress on the &lt;a href="http://ir.dcs.gla.ac.uk/terrier/issues/browse/TR-9"&gt;TR-9&lt;/a&gt; issue concerning Terrier's Grid@CLEF support. The exact specification for the Grid@CLEF XML interchange format is still in flux, but once it has settled down, Terrier support should be forthcoming.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-660621752734978919?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/660621752734978919/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=660621752734978919' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/660621752734978919'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/660621752734978919'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2009/02/gridclef-track-framework-for-ir.html' title='Grid@CLEF track : a framework for IR experimentation'/><author><name>Craig Macdonald</name><uri>http://www.blogger.com/profile/13764972230026912718</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://4.bp.blogspot.com/_f7dcGnhz7mo/TEVaK7ftYFI/AAAAAAAAABU/YdVJVx4CC7M/s1600-R/craig_up_a_hill.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-4578697538778535935</id><published>2009-02-11T12:55:00.010Z</published><updated>2009-02-11T21:09:41.648Z</updated><title type='text'>Building Terrier by Open Collaboration</title><content type='html'>&lt;span style="font-size:100%;"&gt;&lt;span class="mw-headline"&gt;An important benefit of having an open source IR platform, like Terrier, is that users of the platform can contribute code to the platform, and overall, everyone gains. IR platforms which are not open source may be popular, but can stagnate if they do not evolve to meet modern needs. Open source is a good way of building the critical mass of people needed to evolve a project.&lt;br /&gt;&lt;br /&gt;To facilitate the task of our users who contribute to Terrier, we are in the process of making changes that will also make the development process easier:&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;Firstly, we have deployed &lt;a href="http://www.atlassian.com/software/jira/"&gt;Atlassian's JIRA&lt;/a&gt; as an &lt;a href="http://ir.dcs.gla.ac.uk/terrier/issues/"&gt;issue tracker for Terrier&lt;/a&gt;.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Secondly, we are opening the source code repository for the Terrier platform (in progress).&lt;/li&gt;&lt;/ul&gt;An issue tracker allows issues (bugs or feature requests) to be named, discussed, and patches proposed. Other contributors may review and discuss these patches before they are committed. All development work on the Terrier open source platform will now be done via the &lt;a href="http://ir.dcs.gla.ac.uk/terrier/issues/"&gt;issue tracker&lt;/a&gt;. In deciding to deploy JIRA, we did take some time to review several issue trackers. 
I'll describe these and how we came to our decision in a future post.&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;span class="mw-headline"&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;The goal of opening our source code repository is that patches submitted by contributors can be made against the latest (trunk) Terrier source, thus ensuring that no stale patches are received. As a committer this will make my job easier.&lt;br /&gt;&lt;br /&gt;I recently announced these changes in Rome at the &lt;span style="font-style: italic;"&gt;New challenges in Information Retrieval and Text Mining in an open source initiative&lt;/span&gt; workshop. You can see my slides from the workshop below:&lt;br /&gt;&lt;br /&gt;&lt;center&gt;&lt;br /&gt;&lt;div style="width: 425px; text-align: left;" id="__ss_1016393"&gt;&lt;object style="margin: 0px;" height="355" width="425"&gt;&lt;param name="movie" value="http://static.slideshare.net/swf/ssplayer2.swf?doc=buildingterriercollaboration-05022009-1234357855365776-2&amp;amp;stripped_title=building-terrier-by-open-collaboration"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowScriptAccess" value="always"&gt;&lt;embed src="http://static.slideshare.net/swf/ssplayer2.swf?doc=buildingterriercollaboration-05022009-1234357855365776-2&amp;amp;stripped_title=building-terrier-by-open-collaboration" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="355" width="425"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div style="font-size: 11px; font-family: tahoma,arial; height: 26px; padding-top: 2px;"&gt;&lt;img style="visibility: hidden; width: 0px; height: 0px;" src="http://counters.gigya.com/wildfire/IMP/CXNID=2000002.0NXC/bT*xJmx*PTEyMzQzNzg*NjQ3NjUmcHQ9MTIzNDM3ODQ3Mjk1NCZwPTEwMTkxJmQ9Jmc9MiZ*PSZvPWVjMjJkYmE3ZTQ1ODQ3ODg5ZWIxMTliODRlZGM2YTYz.gif" border="0" height="0" width="0" /&gt;&lt;/div&gt;&lt;/div&gt;&lt;/center&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/6043705792807544709-4578697538778535935?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/4578697538778535935/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=4578697538778535935' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/4578697538778535935'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/4578697538778535935'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2009/02/building-terrier-by-open-collaboration.html' title='Building Terrier by Open Collaboration'/><author><name>Craig Macdonald</name><uri>http://www.blogger.com/profile/13764972230026912718</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://4.bp.blogspot.com/_f7dcGnhz7mo/TEVaK7ftYFI/AAAAAAAAABU/YdVJVx4CC7M/s1600-R/craig_up_a_hill.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-2233931482797756821</id><published>2009-01-22T21:09:00.006Z</published><updated>2009-01-23T11:18:26.197Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='phd'/><category scheme='http://www.blogger.com/atom/ns#' term='thesis'/><category scheme='http://www.blogger.com/atom/ns#' term='craig macdonald'/><title type='text'>Craig successfully defends his thesis</title><content type='html'>I am pleased to announce that last Thursday (15th January), I successfully defended my thesis, titled the &lt;span style="font-style: italic;"&gt;Voting Model for People Search&lt;/span&gt;. 
I want to give many thanks to &lt;a href="http://www.dcs.gla.ac.uk/%7Eounis"&gt;Iadh Ounis&lt;/a&gt; for supervising my PhD, and also to my committee: convener &lt;a href="http://www.dcs.gla.ac.uk/%7Edaw/"&gt;David Watt&lt;/a&gt;, and, in particular, to examiners &lt;a href="http://www.dcc.uchile.cl/%7Erbaeza/"&gt;Ricardo Baeza-Yates&lt;/a&gt; and &lt;a href="http://www.dcs.gla.ac.uk/%7Epdg/"&gt;Phil Gray&lt;/a&gt;, for the rewarding discussion and constructive feedback.&lt;br /&gt;&lt;br /&gt;I have 4 weeks to make very minor corrections to my thesis, after which time it will be available online.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-2233931482797756821?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/2233931482797756821/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=2233931482797756821' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/2233931482797756821'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/2233931482797756821'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2009/01/craig-successfully-defends-his-thesis.html' title='Craig successfully defends his thesis'/><author><name>Craig Macdonald</name><uri>http://www.blogger.com/profile/13764972230026912718</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://4.bp.blogspot.com/_f7dcGnhz7mo/TEVaK7ftYFI/AAAAAAAAABU/YdVJVx4CC7M/s1600-R/craig_up_a_hill.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-2038858284229142545</id><published>2008-12-24T14:13:00.000Z</published><updated>2009-01-05T21:54:34.647Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Terrier; Hadoop; information retrieval; indexing'/><title type='text'>Terrier 2.2. released, with support for Hadoop Map Reduce indexing</title><content type='html'>I am pleased to announce that &lt;a href="http://ir.dcs.gla.ac.uk/terrier/"&gt;Terrier&lt;/a&gt; 2.2 was &lt;a href="http://ir.dcs.gla.ac.uk/terrier/forum//read.php?3,1059"&gt;released&lt;/a&gt;, just before Christmas. While I have chosen only to increase the minor version number for this release, it is a substantial update, consisting of new support for &lt;a href="http://hadoop.apache.org/core/"&gt;Hadoop&lt;/a&gt;, a Hadoop Map Reduce indexing system, and various minor improvements and bug fixes. (I reserve major version number bumps for index format changes.)&lt;br /&gt;&lt;br /&gt;Our Map Reduce distributed indexing strategy builds upon the single-pass indexing strategy first released in Terrier 2.0. When deployed with a Hadoop cluster, Terrier can index large collections of data in a distributed fashion, splitting the indexing process across various Map and Reduce tasks, which can be run on various nodes in the cluster.&lt;br /&gt;&lt;br /&gt;In particular, the input data files for the collection are split across many Map tasks. Each Map task indexes its allocated data files using a normal &lt;a href="http://ir.dcs.gla.ac.uk/terrier/doc/javadoc/uk/ac/gla/terrier/indexing/Collection.html"&gt;Collection&lt;/a&gt; implementation. Posting lists are built in memory, in compressed form. 
Each time memory is exhausted, these miniature posting lists are emitted from the Map task.&lt;br /&gt;&lt;br /&gt;The Reduce task is responsible for aggregating the posting lists for the various terms. Firstly, the Reduce input keys are sorted by term, and the values are sorted by source Map task, to ensure that the posting lists for a given term are processed in the correct order. For each term, the temporary posting lists (the reduce input values) are merged into the final compressed inverted index.&lt;br /&gt;&lt;br /&gt;The indices created using the Map Reduce indexer are standard Terrier indices. Moreover, by controlling the number of Reduce tasks, the final index can be partitioned into separate indices, in the local inverted file layout (document partitioning). With a different partitioning scheme, global inverted file layout (term partitioning) would also be possible.&lt;br /&gt;&lt;br /&gt;You can see the detailed list of changes for Terrier 2.2. in the &lt;a href="http://ir.dcs.gla.ac.uk/terrier/doc/whats_new.html"&gt;documentation&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-2038858284229142545?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/2038858284229142545/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=2038858284229142545' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/2038858284229142545'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/2038858284229142545'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2008/12/terrier-22-released-with-support-for.html' 
title='Terrier 2.2. released, with support for Hadoop Map Reduce indexing'/><author><name>Craig Macdonald</name><uri>http://www.blogger.com/profile/13764972230026912718</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://4.bp.blogspot.com/_f7dcGnhz7mo/TEVaK7ftYFI/AAAAAAAAABU/YdVJVx4CC7M/s1600-R/craig_up_a_hill.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-4368306917389729702</id><published>2008-12-10T16:28:00.000Z</published><updated>2008-12-10T17:28:35.045Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='query logs; data mining'/><title type='text'>Mining query logs</title><content type='html'>It is often reported in the literature how search engines can use their query logs to improve document ranking. However, the query logs could also be used for various mining activities. For example, an article in &lt;a href="http://www.nytimes.com/2007/06/03/business/yourmoney/03google.html?pagewanted=print"&gt;The New York Times&lt;/a&gt; described how a power cut in the New York area was reflected in Google's query logs within 2 seconds of its occurrence, while it took about 15 minutes for newswire services to report the same event.&lt;br /&gt;&lt;br /&gt;Relatedly, Abdur Chowdhury, in his position talk at the &lt;a href="http://ir.mathcs.emory.edu/SSM2008/"&gt;SSM 2008&lt;/a&gt; Workshop, mentioned that news about a major earthquake in China was reported on &lt;a href="http://www.twitter.com"&gt;Twitter&lt;/a&gt; well before the newswire services. 
A &lt;a href="http://www.bbc.co.uk/blogs/technology/2008/05/twitter_and_the_china_earthqua.html"&gt;BBC blog post&lt;/a&gt; commented on the same issue.&lt;br /&gt;&lt;br /&gt;Finally, the &lt;a href="http://news.bbc.co.uk/1/hi/programmes/world_news_america/7726048.stm"&gt;BBC&lt;/a&gt; recently reported that Google has developed a &lt;a href="http://www.google.org/flutrends/"&gt;system&lt;/a&gt; to detect flu outbreaks in the USA by analysing the query logs and identifying the location of people issuing flu-related queries.&lt;br /&gt;&lt;br /&gt;Unfortunately, query logs are scarcely available to researchers in academia, especially after the &lt;a href="http://query.nytimes.com/gst/fullpage.html?res=9E0CE3DD1F3FF93AA3575BC0A9609C8B63"&gt;AOL data debacle&lt;/a&gt;.  This limits scientific work in the field, as most current research results using query logs are not reproducible due to lack of publicly shared data.  As a consequence, I very much welcome the forthcoming &lt;a href="http://research.microsoft.com/%7Enickcr/wscd09/"&gt;Workshop on Web Search Click Data&lt;/a&gt; (WSCD 2009), where the issue of publicly releasing query logs is being addressed as one of the objectives of the workshop.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-4368306917389729702?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/4368306917389729702/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=4368306917389729702' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/4368306917389729702'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/6043705792807544709/posts/default/4368306917389729702'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2008/12/mining-query-logs.html' title='Mining query logs'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-2869424145165061296</id><published>2008-11-30T12:21:00.000Z</published><updated>2008-11-30T13:04:23.448Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Blog track; TREC'/><title type='text'>The TREC 2008 Blog track workshop</title><content type='html'>We came back from Gaithersburg a few days ago. It was a nice (and cold!) week at the &lt;a href="http://trec.nist.gov/"&gt;TREC&lt;/a&gt; 2008 conference. Besides presenting the main results of &lt;a href="http://trec.nist.gov/pubs.html"&gt;our participation&lt;/a&gt; in the Blog, Enterprise, and Relevance Feedback tracks, we had fruitful discussions at the &lt;a href="http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG/"&gt;Blog track&lt;/a&gt; workshop regarding the directions of the track for 2009.&lt;br /&gt;&lt;br /&gt;There was a consensus among the attendees that opinion retrieval and polarity detection are still open, relevant problems. While a few groups managed to deploy interesting techniques that achieved consistent opinion retrieval performance across several strongly performing baselines in the track this year, polarity detection approaches looked rather naive. It was suggested that polarity detection be investigated at a finer granularity (e.g., at the sentence rather than the document level). 
This, however, could overlap with the scope of the &lt;a href="http://www.nist.gov/tac/"&gt;TAC&lt;/a&gt; conference.&lt;br /&gt;&lt;br /&gt;Nonetheless, believing that, after three years, the Blog track has contributed a comprehensive experimental setting for those who wish to continue investigating these search scenarios, the organisers decided to discontinue the opinion finding and polarity tasks, at least in their current format. Instead, they propose to investigate the opinionated nature of blogs as one of many interesting facets of a broader search task. This task extends the current blog distillation task by moving beyond topic relevance and introducing different requirements in order to qualify "good" blogs, i.e., blogs that have a recurrent interest in a given topic and that also fulfil a set of predefined "facets". This way, for instance, one could search for humorous blogs about the government, or opinionated blogs about whisky.&lt;br /&gt;&lt;br /&gt;Besides this faceted blog distillation task, a second task was considered relevant and worth investigating by the workshop attendees, namely, tracking stories in the blogosphere. The aim is to investigate how stories emerge and evolve over the time frame of the blog corpus. It was also noted that this task could be linked to a news search task so as to draw a connection between stories published in the blogosphere and in the mainstream media.&lt;br /&gt;&lt;br /&gt;As pointed out, however, the 11-week time frame of the Blogs06 collection does not adequately support the story tracking task. Furthermore, the availability of a more representative sample of the blogosphere is an important step towards addressing blog search as a social media problem. 
For these reasons, a new corpus, much larger in size and covering a longer time frame, will be used in 2009.&lt;br /&gt;&lt;br /&gt;If you did not attend the Blog track workshop at TREC, please feel free to post your comments about the proposed tasks for 2009.&lt;br /&gt;&lt;br /&gt;We hope you will all join us in the TREC 2009 Blog track!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-2869424145165061296?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/2869424145165061296/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=2869424145165061296' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/2869424145165061296'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/2869424145165061296'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2008/11/trec-2008-blog-track-workshop.html' title='The TREC 2008 Blog track workshop'/><author><name>Rodrygo L.T. 
Santos</name><uri>http://www.blogger.com/profile/09502952528669992135</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://4.bp.blogspot.com/_JtuxhJ3QzZg/STMMxScZpiI/AAAAAAAAAEY/Zi4Nmre6mfk/S220/n767603947_216647_5827.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-7702154364174561476</id><published>2008-11-15T16:31:00.001Z</published><updated>2008-11-16T15:16:38.379Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Blog track; TREC; Enterprise track; Relevance feedback track;'/><title type='text'>TREC 2008</title><content type='html'>We will shortly be travelling to attend the &lt;a href="http://trec.nist.gov/"&gt;TREC&lt;/a&gt; 2008 conference in Gaithersburg, Maryland (18-21 November 2008). We have been very busy analysing the large volume of data that was collected in the &lt;a href="http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG"&gt;Blog track&lt;/a&gt; this year. Indeed, this year, we ran a very large-scale experiment with the aim of gaining a better understanding of the most effective and stable opinion-finding techniques. Moreover, we also tightened up the blog distillation task (feed search task), so that it truly runs as a distillation task. Following the traditional TREC conference cycle, the Blog track 2008 results will first be presented to the TREC 2008 participating groups next week. 
They will then be made available to all interested parties around February 2009 when the TREC 2008 final Proceedings go &lt;a href="http://trec.nist.gov/proceedings/proceedings.html"&gt;online&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Plans for the &lt;a href="http://terrierteam.blogspot.com/2008/10/trec-blog-track-will-run-in-2009.html"&gt;TREC 2009 Blog track&lt;/a&gt; will be discussed and refined during the TREC Blog track workshop on the afternoon of Thursday 20th November.&lt;br /&gt;&lt;br /&gt;In addition to our involvement in the organisation of the Blog track, we will be giving a presentation on the work we did this year in the newly introduced Relevance Feedback track. We have also prepared two posters summarising our results in the Enterprise and Blog tracks.&lt;br /&gt;&lt;br /&gt;It looks like we are set for a very exciting and busy week. We hope to see many of you at TREC.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-7702154364174561476?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/7702154364174561476/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=7702154364174561476' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/7702154364174561476'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/7702154364174561476'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2008/11/trec-2008.html' title='TREC 2008'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-3910643565112647039</id><published>2008-11-10T22:31:00.000Z</published><updated>2008-11-10T23:16:18.412Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Virtual observatory; Astronomy; Semantic web; Information Retrieval; Workshop'/><title type='text'>SEMAST 2009</title><content type='html'>We are continuing to organise events in Glasgow. After the &lt;a href="http://www.dcs.gla.ac.uk/essir2007"&gt;ESSIR2007&lt;/a&gt; summer school and the &lt;a href="http://ecir2008.dcs.gla.ac.uk/"&gt;ECIR2008&lt;/a&gt; conference, we will be organising the second &lt;a href="http://www.dcs.gla.ac.uk/workshops/semast09/"&gt;Practical Semantic Astronomy Workshop (SEMAST 2009)&lt;/a&gt; from 2nd to 5th March 2009.&lt;br /&gt;&lt;br /&gt;Practical Semantic Astronomy 2009 is the second in a series of workshops, the first of which was held at &lt;a href="http://www.cacr.caltech.edu/semast/"&gt;Caltech&lt;/a&gt; in February 2008. The workshop brings together experts from a broad range of disciplines using semantic technologies, alongside practitioners experimenting with these techniques, to address current problems in astroinformatics.&lt;br /&gt;&lt;br /&gt;Our involvement in the organisation of this workshop is under the auspices of the &lt;a href="http://explicator.dcs.gla.ac.uk/"&gt;Explicator&lt;/a&gt; project, where we have been working with astronomers and physicists on developing techniques to provide intelligent access to multiple sources. 
The Explicator project supports the efforts of the &lt;a href="http://www.ivoa.net/"&gt;Virtual Observatory &lt;/a&gt; community.&lt;br /&gt;&lt;br /&gt;The Virtual Observatory is a loose planet-wide collaboration of  astronomy computing projects, aiming to make available the high-volume and rich data of astronomy.  Although astronomical data is  generally well-described, it is very dispersed, so that there is a  substantial data-discovery problem, making it fertile ground for the  sorts of semantic approaches applied with such success in other  disciplines.&lt;br /&gt;&lt;br /&gt;The Explicator project aims to bridge the gap between information retrieval and semantic web technologies in a domain-specific application. The SEMAST 2009 workshop is a continuation of this effort. We hope to see many of you in Glasgow.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-3910643565112647039?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/3910643565112647039/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=3910643565112647039' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/3910643565112647039'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/3910643565112647039'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2008/11/semast-2009.html' title='SEMAST 2009'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-5298757434359359540</id><published>2008-10-21T20:14:00.000+01:00</published><updated>2008-10-21T21:47:38.740+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='opinion-finding; branding; blogging; TREC Blog track'/><title type='text'>Blogging is also about branding</title><content type='html'>Recently, Daniel Tunkelang wrote a blog post about &lt;a href="http://thenoisychannel.com/2008/10/05/why-do-i-blog/"&gt;why he blogs&lt;/a&gt;. In a nutshell, he considers blogging to be fun and highlights how it can increase the "reputation capital" of the blogger. Daniel Lemire followed up, stressing the &lt;a href="http://www.daniel-lemire.com/blog/archives/2008/10/18/blogging-is-networking/"&gt;networking benefit of blogging&lt;/a&gt;. I very much concur with both views. In addition, neither blogger shies away from expressing thoughts and opinions on topics ranging from &lt;a href="http://thenoisychannel.com/2008/06/20/enterprise-search-done-right/"&gt;enterprise&lt;/a&gt; and &lt;a href="http://thenoisychannel.com/2008/09/14/is-blog-search-different/"&gt;blog search&lt;/a&gt; to &lt;a href="http://www.daniel-lemire.com/blog/archives/2007/08/06/on-the-upcoming-collapse-of-peer-review/"&gt;peer reviewing&lt;/a&gt; and the &lt;a href="http://www.daniel-lemire.com/blog/archives/2008/06/05/why-pure-theory-is-wasteful/"&gt;benefit of pure theoretical research&lt;/a&gt;, as well as opinions on a search engine such as &lt;a href="http://thenoisychannel.com/2008/10/16/duck-duck-go/"&gt;Duck Duck Go&lt;/a&gt;!&lt;br /&gt;&lt;br /&gt;Such perspectives and opinions are not only informative and valuable to readers like myself, but they are also extremely important for various organisations. 
Indeed, according to an article on the &lt;a href="http://www.bcs.org/"&gt;BCS&lt;/a&gt; news website, &lt;a href="http://www.bcs.org/server.php?show=conWebDoc.22346"&gt;blogging is very important for brands&lt;/a&gt;. The article quotes Rachel Hawkes, co-founder and editor of the &lt;a href="http://www.socialmediaportal.com/"&gt;Social Media Portal (SMP)&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Blogs provide an opportunity for a two-way interaction to take place between business and consumer.  This allows customers to provide 'incredibly valuable' feedback on how the brand is doing in the real world, which can help guide improvements and sales strategies.&lt;/blockquote&gt;&lt;br /&gt;The above scenario is one of the motivations for the opinion-finding search task that we have been investigating in the TREC Blog track for the past three years. The task addresses a search scenario where a user aims to uncover what the bloggers/consumers are saying or thinking about X. If the "user" is a business, and X is one of its products, then “taking the pulse of the blogosphere” is very important for this business's branding. In fact, the opinion-finding task can naturally be associated with settings such as tracking consumer-generated content, brand monitoring, and, more generally, media analysis. 
Findings and insights gained from three years of the opinion-finding search task at the TREC Blog track will be discussed at the forthcoming TREC conference (18-21 November 2008), held at NIST, USA.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-5298757434359359540?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/5298757434359359540/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=5298757434359359540' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/5298757434359359540'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/5298757434359359540'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2008/10/blogging-is-also-about-branding.html' title='Blogging is also about branding'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-5098900302731493951</id><published>2008-10-20T18:52:00.000+01:00</published><updated>2008-10-21T09:34:54.708+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='travel; Blog search'/><category scheme='http://www.blogger.com/atom/ns#' term='CIKM'/><title type='text'>CIKM 2008</title><content type='html'>We will shortly be travelling to attend the &lt;a href="http://cikm2008.org/"&gt;CIKM 2008&lt;/a&gt; conference in Napa Valley. 
The organisers are announcing that it will be the biggest ever CIKM conference, and hope that it will be the most memorable one.&lt;br /&gt;&lt;br /&gt;Following the example of the &lt;a href="http://ecir2008.dcs.gla.ac.uk/"&gt;ECIR 2008&lt;/a&gt; conference, I'm pleased to note that the organisers are making CIKM 2008 a green conference, through optimal usage of logistics and resources.&lt;br /&gt;&lt;br /&gt;The conference has a very exciting scientific program, and an impressive social program, including a &lt;a href="http://en.wikipedia.org/wiki/Halloween"&gt;Halloween&lt;/a&gt; party.&lt;br /&gt;&lt;br /&gt;We will be presenting two full papers in the Blog session on Wednesday 29th October 2008, 10:15-11:45am. Both papers tackle search tasks investigated within the &lt;a href="http://trec.nist.gov/"&gt;TREC&lt;/a&gt; &lt;a href="http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG"&gt;Blog track&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;i&gt;Key Blog Distillation: Ranking Aggregates&lt;/i&gt;. Craig Macdonald, Iadh Ounis (University of Glasgow, UK) - The paper addresses the blog distillation task, a task characterised as “Find me a blog with a principle, recurring interest in X.”&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;i&gt;An Effective Statistical Approach to Blog Post Opinion Retrieval&lt;/i&gt;. Ben He, Craig Macdonald, Jiyin He, Iadh Ounis (University of Glasgow, UK) - The paper tackles the opinion-finding task in the blogosphere, a task characterised by “What do people think about X?”&lt;/li&gt;&lt;/ul&gt;The third paper of the Blog session is from UMass, a regular participant in the TREC Blog track. It also investigates the blog distillation search task:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;i&gt;Blog Site Search Using Resource Selection&lt;/i&gt;. 
Jangwon Seo, Bruce Croft (University of Massachusetts Amherst, USA)&lt;/li&gt;&lt;/ul&gt;We hope to see you in CIKM!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-5098900302731493951?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/5098900302731493951/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=5098900302731493951' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/5098900302731493951'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/5098900302731493951'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2008/10/cikm-2008.html' title='CIKM 2008'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-92678104084603193</id><published>2008-10-19T15:14:00.000+01:00</published><updated>2008-10-19T16:19:28.173+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Blog track; TREC'/><title type='text'>TREC Blog track will run in 2009</title><content type='html'>Following our previous &lt;a href="http://terrierteam.blogspot.com/2008/09/about-blog-search-tasks.html"&gt;post&lt;/a&gt;, I'm pleased to report that we have just heard that the &lt;a href="http://trec.nist.gov"&gt;TREC&lt;/a&gt; program committee has accepted our proposal for the blog track to continue  in 
2009.&lt;br /&gt;&lt;br /&gt;The intention is to use a larger Blog collection, and to have at least one search task that goes beyond topical relevance by taking into account a facet representing an attribute of required "quality".&lt;br /&gt;&lt;br /&gt;There will be a workshop to discuss the proposed blog search tasks at the TREC 2008 conference on the afternoon of Thursday 20th November 2008.&lt;br /&gt;&lt;br /&gt;If you cannot attend TREC, and wish to make any comments or suggestions, please feel free to comment on this post, or to email us privately if you prefer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-92678104084603193?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/92678104084603193/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=92678104084603193' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/92678104084603193'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/92678104084603193'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2008/10/trec-blog-track-will-run-in-2009.html' title='TREC Blog track will run in 2009'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-4320757610237522949</id><published>2008-09-12T22:22:00.000+01:00</published><updated>2008-09-15T13:58:28.343+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ICWSM'/><category scheme='http://www.blogger.com/atom/ns#' term='blog'/><category scheme='http://www.blogger.com/atom/ns#' term='cfp'/><category scheme='http://www.blogger.com/atom/ns#' term='social media'/><category scheme='http://www.blogger.com/atom/ns#' term='deadlines'/><title type='text'>Conference Deadline Traffic Jam</title><content type='html'>I noted today that &lt;a href="http://www.blogger.com/%3Ccite%3Ehttp://datamining.typepad.com/"&gt;Matthew Hurst&lt;/a&gt; has posted the &lt;a href="http://datamining.typepad.com/data_mining/2008/09/icwsm-call-for-papers.html"&gt;ICWSM 2009 Call for Papers&lt;/a&gt;. Unfortunately, the submission deadline is the 21st January. This is a full six weeks later than for ICWSM 2008. Moreover, it falls four days before the SIGIR full paper deadline.&lt;br /&gt;&lt;br /&gt;As IR researchers, we have to target certain conferences. While I'd like to have multiple papers ready in advance for several conferences with similar deadlines, various pressures make that impossible (e.g. 
I'd like a holiday at Christmas!).&lt;br /&gt;&lt;br /&gt;The conference deadlines in January and February now look like:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;11th January: WWW 2009 Posters due&lt;/li&gt;&lt;li&gt;19th January: SIGIR 2009 Abstracts due&lt;/li&gt;&lt;li&gt;21st January: ICWSM 2009 Papers/Posters/Demos due&lt;/li&gt;&lt;li&gt;25th January: SIGIR 2009 Papers due&lt;/li&gt;&lt;li&gt;9th February: NAACL-HLT 2009 Short Papers due&lt;/li&gt;&lt;li&gt;22nd February: ACL-IJCNLP Papers due&lt;/li&gt;&lt;li&gt;23rd February: SIGIR 2009 Posters due&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Happy writing!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-4320757610237522949?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/4320757610237522949/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=4320757610237522949' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/4320757610237522949'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/4320757610237522949'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2008/09/conference-deadline-traffic-jam.html' title='Conference Deadline Traffic Jam'/><author><name>Craig Macdonald</name><uri>http://www.blogger.com/profile/13764972230026912718</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://4.bp.blogspot.com/_f7dcGnhz7mo/TEVaK7ftYFI/AAAAAAAAABU/YdVJVx4CC7M/s1600-R/craig_up_a_hill.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-6658115910150977890</id><published>2008-09-10T03:00:00.000+01:00</published><updated>2008-09-10T14:20:59.163+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Faceted Blog search interface'/><category scheme='http://www.blogger.com/atom/ns#' term='TREC'/><category scheme='http://www.blogger.com/atom/ns#' term='blog search'/><title type='text'>About Blog Search Tasks</title><content type='html'>We have been very busy recently with the &lt;a href="http://trec.nist.gov/"&gt;TREC&lt;/a&gt; 2008 &lt;a href="http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG"&gt;Blog track&lt;/a&gt;. Now that all runs have been submitted and the relevance assessments are ongoing, it is the time of year when we start planning for the future of the track at TREC 2009! Indeed, TREC operates a policy where existing tracks are renewed on an annual basis, following the submission of a proposal.&lt;br /&gt;&lt;br /&gt;Back in 2006, when we first proposed the Blog track, our aim was to have a long-term objective for the track, recognising that the richness of the blogosphere and its peculiarities would require several years of investigation before reaching a full understanding of the different blog search tasks, and how they should be effectively addressed. 
In particular, we proposed to adopt an incremental approach, where we begin with basic blog search tasks and progressively move to more complex search scenarios.&lt;br /&gt;&lt;br /&gt;In the first three years of the track (2006-2008), we addressed two main blog search tasks:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Opinion finding: involves locating blog posts that express an &lt;em&gt;opinion&lt;/em&gt; about a given target.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Blog distillation: involves locating blogs that are &lt;em&gt;principally devoted&lt;/em&gt; to a topic &lt;em&gt;X&lt;/em&gt; over the timespan of the feed.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;The first task tackles an important aspect of blogs, namely their opinionated/subjective nature, and the tendency of bloggers to express views, thoughts and feelings towards named entities. This task helps users to find out what bloggers think about X. The second search task addresses a scenario where the user would like to find a blog to follow or read in their RSS reader. Our main findings and conclusions from the first two years of the Blog track at TREC are summarised in the ICWSM 2008 paper, entitled &lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm/publications/ounis08trecblog.pdf"&gt;On the Trec Blog Track&lt;/a&gt;. The Blog track &lt;a href="http://trec.nist.gov/pubs/trec15/papers/BLOG06.OVERVIEW.pdf"&gt;2006&lt;/a&gt; and &lt;a href="http://trec.nist.gov/pubs/trec16/papers/BLOG.OVERVIEW08.pdf"&gt;2007&lt;/a&gt; overview papers provide further detailed analysis and results.&lt;br /&gt;&lt;br /&gt;We are now proposing to move to a second phase of the Blog track, where more refined and complex search scenarios should be investigated. In particular, we are considering using a new and larger collection of blogs, which has a much longer timespan than the 11-week period covered in the &lt;a href="http://ir.dcs.gla.ac.uk/test_collections/blog06info.html"&gt;Blog06&lt;/a&gt; collection. 
This would allow us to investigate another important characteristic of the blogosphere, namely the temporal/chronological aspect of blogging, and various related search tasks such as story identification and tracking.&lt;br /&gt;&lt;br /&gt;While we were thinking about such possible future tasks, we came across a position paper by Marti Hearst, Matthew Hurst and Susan Dumais, entitled "&lt;a href="http://people.ischool.berkeley.edu/%7Ehearst/papers/blogsearch08.pdf"&gt;What Should Blog Search Look Like?&lt;/a&gt;", which will be presented at the forthcoming &lt;a href="http://ir.mathcs.emory.edu/SSM2008/"&gt;Search in Social Media (SSM 2008)&lt;/a&gt; workshop at &lt;a href="http://www.cikm2008.org/"&gt;CIKM 2008&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;In particular, Hearst et al. propose that the blog distillation task should be further refined by taking into account a number of dimensions or attributes such as the authority of the blog, the trustworthiness of its authors, the genre of the blog and its style of writing. For example, a user might be interested in blogs to read about a topic X, but where the blogger expresses in-depth viewpoints, backed up by a scientific methodology or evidence. The Cranfield evaluation paradigm adopted by TREC requires deeper thought about how relevance assessments should be conducted in such a scenario.&lt;br /&gt;&lt;br /&gt;Unsurprisingly for a strong advocate of the importance of user interfaces and visualisation tools for information retrieval, Hearst together with her co-authors propose a &lt;span style="font-style: italic;"&gt;faceted&lt;/span&gt; blog search interface to help the user explore the attributes of the blogs before choosing those they wish to follow or read, i.e. &lt;a href="http://thenoisychannel.blogspot.com/2008/09/query-elaboration-as-dialogue.html"&gt;exploratory search&lt;/a&gt; at its best! 
The conclusion of the paper provides a good summary of Hearst et al.'s views:&lt;br /&gt;&lt;blockquote&gt;For the problem of selecting a blog to read, we propose a faceted interface which highlights different attributes of interest, with a focus on people and on matching the taste preferences of the reader. For the task of “taking the pulse of the blogosphere,” we suggest that blog data be integrated with other social media and that the existing work on tracking trends and aggregating views is heading in the right direction.&lt;/blockquote&gt;As we are trying to wrap up our proposal for TREC 2009, we would like to hear other suggestions and comments about what blog search should look like. Please feel free to comment on this post, or to email us privately if you prefer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-6658115910150977890?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/6658115910150977890/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=6658115910150977890' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/6658115910150977890'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/6658115910150977890'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2008/09/about-blog-search-tasks.html' title='About Blog Search Tasks'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-6042039160417762323</id><published>2008-09-01T22:25:00.001+01:00</published><updated>2008-09-02T11:20:10.472+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='expert search; commoditising workers'/><title type='text'>From Expert Search to Commoditising Workers</title><content type='html'>While I'm putting the finishing touches to my PhD thesis (titled The Voting Model for Expert and Blog Search), I thought I'd pick up on a recent related article.&lt;br /&gt;&lt;br /&gt;An excerpt from &lt;a href="http://thenumerati.net/"&gt;&lt;em&gt;The Numerati&lt;/em&gt;&lt;/a&gt; has been published on &lt;a href="http://www.businessweek.com/magazine/content/08_36/b4098032904806.htm?campaign_id=rss_daily"&gt;BusinessWeek.com&lt;/a&gt;. &lt;cite&gt;In the excerpt, Stephen Baker &lt;/cite&gt;interviews the scientist &lt;a href="http://en.wikipedia.org/wiki/Samer_Takriti"&gt;Samer Takriti&lt;/a&gt; while he was working at IBM. Samer, who is a specialist in &lt;a href="http://en.wikipedia.org/wiki/Operations_Research" title="Operations Research" class="mw-redirect"&gt;Operations Research&lt;/a&gt;, is working on commoditising workers. Similar to how supply chains and production lines have been modelled and improved, Samer believes that people can be assigned to projects using combinations of their availability, their cost, and their &lt;b&gt;skills/expertise&lt;/b&gt;. The idea is to optimise the use of co-workers, leading to better productivity within an organisation.&lt;br /&gt;&lt;br /&gt;What's really interesting here is that this is a real application of expert search technology, being applied not just to satisfy occasional expertise needs ("I'm stuck, who should I ask for help?"), but in daily use to determine work assignments and to increase productivity. 
It is a fusion of search technology with &lt;a href="http://en.wikipedia.org/wiki/Constraint_optimization"&gt;constraint optimisation&lt;/a&gt;. Tools like these are likely to become invaluable in assigning jobs in global consultancy companies, where managers are unlikely to know everyone at their disposal. Such tools could even be used to identify the best training path for a co-worker to become skilled and productive in a particular area.&lt;br /&gt;&lt;blockquote&gt;Imagine, says Aleksandra Mojsilovic, one of Takriti's close colleagues, that the company has a superior worker named Joe Smith. Management could really benefit from two or three others just like him, or even a dozen. Once the company has built rich mathematical profiles of Smith and his fellow workers, it might be possible to identify at least a few of the experiences or routines that make Joe Smith so good. "If you had the full employment history, you could even compute the steps to become a Joe Smith," she says.&lt;/blockquote&gt;Van drivers have been having their routes assigned automatically for many years. Why should consultants at IBM be any different? However, Baker points out that some people may be left out by such systems (his example is a senior consultant left out because of his high cost, which Takriti counteracts by allowing senior staff members more "time on the bench" than junior staff, because when senior consultants are utilised they get larger cheques). 
Even so, the concern is the reliance on an expert search system to assign jobs when "expertise relevance" is an even vaguer concept than "document relevance", and expert search systems are not yet (and might never be) as accurate as a travelling salesman solution or a program to optimise a supply chain.&lt;br /&gt;&lt;br /&gt;(Via &lt;a href="http://science.slashdot.org/science/08/09/01/1137204.shtml"&gt;Slashdot&lt;/a&gt;)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-6042039160417762323?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/6042039160417762323/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=6042039160417762323' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/6042039160417762323'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/6042039160417762323'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2008/09/from-expert-search-to-commoditising.html' title='From Expert Search to Commoditising Workers'/><author><name>Craig Macdonald</name><uri>http://www.blogger.com/profile/13764972230026912718</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://4.bp.blogspot.com/_f7dcGnhz7mo/TEVaK7ftYFI/AAAAAAAAABU/YdVJVx4CC7M/s1600-R/craig_up_a_hill.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-7006718362889160395</id><published>2008-08-05T09:40:00.000+01:00</published><updated>2008-08-06T11:34:16.506+01:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='opinion-finding'/><category scheme='http://www.blogger.com/atom/ns#' term='evaluation'/><category scheme='http://www.blogger.com/atom/ns#' term='document prior'/><category scheme='http://www.blogger.com/atom/ns#' term='training'/><category scheme='http://www.blogger.com/atom/ns#' term='blog search'/><category scheme='http://www.blogger.com/atom/ns#' term='Sigir'/><title type='text'>SIGIR 2008</title><content type='html'>We are just back from Singapore, where we attended the extremely well organised &lt;a href="http://www.sigir2008.org/"&gt;SIGIR'08&lt;/a&gt; conference. We presented one full paper and three posters.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm"&gt;Craig&lt;/a&gt; presented our full paper entitled &lt;a href="http://portal.acm.org/citation.cfm?id=1390334.1390348"&gt;Retrieval Sensitivity Under Training Using Different Measures&lt;/a&gt;. Through a large-scale empirical evaluation, the paper addresses an important practical issue in deploying a search engine, namely whether it matters which evaluation measure is used during training, especially when the available training data is very incomplete. The paper shows, among other results, that it is not necessarily appropriate to train by directly optimising the target evaluation measure (e.g. MAP). In particular, the paper shows that bPref, infAP and nDCG are all better training measures than MAP when the training dataset is incomplete and when the evaluation measure is MAP. 
Interestingly, the same research question was addressed by &lt;a href="http://www.soi.city.ac.uk/%7Eser/homepage.html"&gt;Stephen Robertson&lt;/a&gt;, albeit more theoretically, in his keynote talk at the &lt;a href="http://research.microsoft.com/users/LR4IR-2008/"&gt;SIGIR'08 LR4IR&lt;/a&gt; workshop, where he &lt;/span&gt;justified and illustrated why directly optimising the evaluation measure on the training set is often not a good approach (as we say, "Great minds think alike"!).&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_Tw-GvQO7xww/SJgkf7xpQjI/AAAAAAAAAAY/q1fgn9sIOnc/s1600-h/normal_IMG_1913.JPG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_Tw-GvQO7xww/SJgkf7xpQjI/AAAAAAAAAAY/q1fgn9sIOnc/s320/normal_IMG_1913.JPG" alt="" id="BLOGGER_PHOTO_ID_5230971098231292466" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://ir.dcs.gla.ac.uk/terrier"&gt;Terrier Team&lt;/a&gt; also presented three posters at the conference:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://portal.acm.org/citation.cfm?id=1390334.1390473"&gt;Ranking Opinionated Blog Posts Using Opi&lt;/a&gt;&lt;a href="http://portal.acm.org/citation.cfm?id=1390334.1390473"&gt;nionFinder&lt;/a&gt; (Presented by &lt;a href="http://www.dcs.gla.ac.uk/%7Eben"&gt;Ben&lt;/a&gt;): The paper proposes an approach to use and integrate an NLP opinion-identification toolkit, &lt;a href="http://www.cs.pitt.edu/mpqa/opinionfinderrelease/"&gt;OpinionFinder&lt;/a&gt;, into the retrieval process of an IR system, such that opinionated, relevant documents are retrieved in response to a query. 
This is one of the very few opinion-finding approaches that have been shown to be effective in the &lt;a href="http://trec.nist.gov/pubs/trec16/papers/BLOG.OVERVIEW08.pdf"&gt;TREC Blog Track&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://portal.acm.org/citation.cfm?id=1390334.1390483"&gt;Limits of Opinion-Finding Baseline Systems&lt;/a&gt; (Presented by &lt;a href="http://www.dcs.gla.ac.uk/%7Ecraigm"&gt;Craig&lt;/a&gt;/&lt;a href="http://www.dcs.gla.ac.uk/%7Eounis"&gt;Iadh&lt;/a&gt;): The paper investigates how the underlying baseline retrieval system performance affects the overall opinion-finding performance. Two effective opinion-finding techniques are applied to all the baseline runs submitted to the TREC 2007 Blog track, leading to interesting insights and conclusions.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Tw-GvQO7xww/SJgleDcp5OI/AAAAAAAAAAg/SLBdwWjdOxw/s1600-h/IMG_0998.JPG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_Tw-GvQO7xww/SJgleDcp5OI/AAAAAAAAAAg/SLBdwWjdOxw/s320/IMG_0998.JPG" alt="" id="BLOGGER_PHOTO_ID_5230972165442626786" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a href="http://portal.acm.org/citation.cfm?id=1390334.1390490&amp;amp;coll=ACM&amp;amp;dl=ACM&amp;amp;type=series&amp;amp;idx=SERIES278&amp;amp;part=series&amp;amp;WantType=Proceedings&amp;amp;title=SIGIR&amp;amp;CFID=39457284&amp;amp;CFTOKEN=33325742"&gt;Automatic Document Prior Feature Selection for Web Retrieval&lt;/a&gt; (Presented by &lt;a href="http://www.dcs.gla.ac.uk/%7Epj"&gt;PJ&lt;/a&gt;): The paper investigates whether the retrieval performance of a Web search engine can be further enhanced by selecting the best document prior feature (e.g. PageRank, URL-Depth, etc.) on a per-query basis. 
It proposes a novel method for making this selection.&lt;br /&gt;&lt;br /&gt;PS: Photos are from the SIGIR'08 website.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-7006718362889160395?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/7006718362889160395/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=7006718362889160395' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/7006718362889160395'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/7006718362889160395'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2008/08/sigir-2008.html' title='SIGIR 2008'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_Tw-GvQO7xww/SJgkf7xpQjI/AAAAAAAAAAY/q1fgn9sIOnc/s72-c/normal_IMG_1913.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6043705792807544709.post-6637495963812706636</id><published>2008-08-04T20:54:00.000+01:00</published><updated>2008-08-04T21:58:51.043+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='blog'/><category scheme='http://www.blogger.com/atom/ns#' term='information retrieval'/><category scheme='http://www.blogger.com/atom/ns#' term='forum'/><title 
type='text'>Welcome to the Terrier Team Blog</title><content type='html'>It has been a while since we started thinking about having a blog for the Terrier Team: in fact, ever since we became involved in organising the &lt;a href="http://trec.nist.gov/"&gt;TREC&lt;/a&gt; &lt;a href="http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG"&gt;Blog track&lt;/a&gt; in 2006.&lt;br /&gt;&lt;br /&gt;Recently, we have been encouraged by the very informative and interesting information retrieval-related discussions taking place on blogs such as &lt;a href="http://thenoisychannel.blogspot.com/"&gt;The Noisy Channel&lt;/a&gt;, the &lt;a href="http://www.searchenginecaffe.com/"&gt;Search Engine Caffè&lt;/a&gt;, or the &lt;a href="http://windowoffice.tumblr.com/"&gt;Window Office&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Having been mere regular readers of information retrieval blogs, we thought that now is the right time to become more actively involved in blogging; hence the creation of this new forum, where we intend to post news about our research work and activities. 
We hope to share our thoughts on information retrieval research, and  to engage in a dialogue with our fellow colleagues and friends.&lt;br /&gt;&lt;br /&gt;We do hope that many of you will join us in this forum.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6043705792807544709-6637495963812706636?l=terrierteam.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://terrierteam.blogspot.com/feeds/6637495963812706636/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6043705792807544709&amp;postID=6637495963812706636' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/6637495963812706636'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6043705792807544709/posts/default/6637495963812706636'/><link rel='alternate' type='text/html' href='http://terrierteam.blogspot.com/2008/08/welcome-to-terrier-team-blog.html' title='Welcome to the Terrier Team Blog'/><author><name>Iadh Ounis</name><uri>http://www.blogger.com/profile/05740425172350940695</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry></feed>
