Friday, July 27, 2012

SMART: An open source framework for searching the physical world

Some of our readers are probably aware of our new project, SMART, which aims to develop new technology for the real-time indexing and retrieval of sensor and social streams. This three-year project is funded by the European Commission under the Seventh Framework Programme (grant number 287583). The project, which started in November 2011, has already received extensive national and international press coverage in online and print news over the last month. The BBC will shortly be broadcasting a television piece about the project.

The name of the project and the resulting search engine, SMART, acknowledges the vision of the Internet of Things in general, and the concept of smart cities in particular. Indeed, SMART builds on the growing trend of smart cities, where, in addition to physical infrastructure (roads, buildings), a digital knowledge infrastructure is deployed to serve the needs of citizens and local governments. The backbone of this digital knowledge infrastructure is mainly composed of sensors such as cameras, microphone arrays, and other environmental sensors, from weather to parking sensors. For example, in "smart cities", drivers can be told at any time of day where to park their car or how to avoid traffic jams in the city centre. The main idea of the SMART project is to connect these sensors to the Internet and to provide search technologies that allow citizens to benefit in real time from the information these sensors provide.

The SMART search engine builds upon the Terrier Information Retrieval platform, and exemplifies our recent move towards building new, separate and tailored products on top of the Terrier platform. In particular, Terrier has been enhanced and expanded with real-time indexing and a scalable distributed architecture that can process a large volume of continuous, parallel streams.
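To give a flavour of what real-time indexing means in practice, here is a minimal sketch of an in-memory inverted index in which each item becomes searchable as soon as it arrives. It is purely illustrative: the `StreamIndex` class and its methods are hypothetical names and do not reflect Terrier's or SMART's actual API, and the scoring is a plain sum of term frequencies.

```python
from collections import Counter, defaultdict


class StreamIndex:
    """Toy incremental inverted index (hypothetical, not Terrier's API)."""

    def __init__(self):
        # term -> {doc_id: term frequency in that document}
        self.postings = defaultdict(dict)
        self.num_docs = 0

    def add(self, text):
        """Index one incoming item; it is searchable immediately afterwards."""
        doc_id = self.num_docs
        for term, tf in Counter(text.lower().split()).items():
            self.postings[term][doc_id] = tf
        self.num_docs += 1
        return doc_id

    def search(self, query):
        """Return doc ids ordered by summed term frequency of the query terms."""
        scores = Counter()
        for term in query.lower().split():
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf
        return [doc_id for doc_id, _ in scores.most_common()]


index = StreamIndex()
index.add("live music in the square")
first_results = index.search("music")    # only the first item is known so far
index.add("music music festival tonight")
later_results = index.search("music")    # the new item is visible at once
```

A real deployment would also need to spread the postings across machines and bound memory use, which this sketch leaves out.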

SMART is multi-disciplinary in nature, encompassing state-of-the-art technologies from audio & video processing, social search and reasoning. Building upon these technologies, SMART analyses the input from sensors in real-time, for example to detect large crowds, or whether live music can be heard. These observations can be compared with recent posts on social networks from the same area, to see whether the system can learn more about what is happening around the sensors. By analysing the sensors across multiple locations within the city, the system has some idea of which locations have the most interesting events when a user asks “what’s happening near me”.
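As a rough illustration of how sensor and social evidence might be combined to answer such a query, the sketch below ranks nearby locations by a simple averaged interest score. Everything in it is an assumption made for the example: the `Location` fields, the cap of 50 posts, the equal weighting and the 2 km radius are hypothetical and are not taken from the SMART system.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Location:
    name: str
    x_km: float          # position on a flat city grid (assumption)
    y_km: float
    crowd_score: float   # in [0, 1], hypothetical output of video analysis
    music_score: float   # in [0, 1], hypothetical output of audio analysis
    social_posts: int    # recent social network posts from the same area


def interest(loc):
    """Average the three signals; the post count is capped at 50 posts."""
    social = min(loc.social_posts / 50.0, 1.0)
    return (loc.crowd_score + loc.music_score + social) / 3.0


def whats_happening_near(user_x, user_y, locations, radius_km=2.0):
    """Return locations within radius_km, most interesting first."""
    nearby = [
        loc for loc in locations
        if hypot(loc.x_km - user_x, loc.y_km - user_y) <= radius_km
    ]
    return sorted(nearby, key=interest, reverse=True)


city = [
    Location("park", 1.0, 1.0, crowd_score=0.2, music_score=0.0, social_posts=5),
    Location("plaza", 0.5, 0.5, crowd_score=0.9, music_score=0.8, social_posts=40),
    Location("harbour", 10.0, 10.0, crowd_score=1.0, music_score=1.0, social_posts=100),
]
ranked = [loc.name for loc in whats_happening_near(0.0, 0.0, city)]
```

Here the harbour is filtered out for being too far from the user, and the plaza ranks above the park because all three of its signals are stronger.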

Clearly, making real-world events searchable can have privacy and ethical implications. In fact, never before in our research have we been confronted with such a dichotomy between what is technologically feasible and what we conceive to be ethical. That's why we and our partners in the project are carefully considering privacy issues in our research. Indeed, we are working closely with various national Data Protection Authorities (DPAs) (i) to ensure that we don’t overstep the legal or ethical boundaries of privacy and (ii) to provide guidelines on the ethical implications of the SMART technologies and help prospective deployers to use SMART in a legal, ethical, and friendly manner. Interested readers can consult the first issue of the SMART Newsletter for further details about our ongoing efforts on privacy.

While we will be trialling the SMART search technology in the City of Santander (Spain), the key infrastructure of SMART (including the search components based on Terrier) will be made available as open source, encapsulating a vision whereby other smart cities can easily become involved and benefit from the project's outcomes. We expect the first release of the SMART search technology to become available as open source under the Mozilla Public License (MPL) 2.0 by the end of 2012. By releasing parts of SMART as open source, we aim to foster a community of early adopters that will be key for evaluating and sustaining the project.

With this in mind, we have just published a paper at the SIGIR 2012 Open Source Information Retrieval (OSIR 2012) workshop describing our current progress as well as the open source vision of the project:

SMART: An open source framework for searching the physical world. M-Dyaa Albakour, Craig Macdonald, Iadh Ounis, Aristodemos Pnevmatikakis and John Soldatos. In Proceedings of the SIGIR 2012 Workshop on Open Source Information Retrieval. Portland, Oregon, USA. August 2012.

As always, we welcome comments and contributions from smart cities, community members and developers to the SMART vision.

Wednesday, July 25, 2012

From Puppy to Maturity: Experiences in Developing Terrier

We will be taking part in the SIGIR 2012 Workshop on Open Source Information Retrieval. In particular, we have published a paper on the Terrier open source information retrieval platform, detailing the vision behind the platform, some recent developments in Terrier, as well as a roadmap for future releases.

As always, our vision for the Terrier platform is to continue empowering researchers and practitioners in information retrieval (IR) with up-to-date, easily adaptable, effective and scalable indexing and search approaches, allowing them to build and evaluate the next generation of IR applications.

In particular, Terrier will be moving towards feature-based retrieval, in line with the increasing importance of the learning-to-rank paradigm in modern information retrieval, where machine-learned ranking functions combining multiple features are deployed. To this end, Terrier will support the efficient and effective extraction of query-independent and query-dependent features.
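To make the idea of a ranking function that combines multiple features concrete, here is a minimal sketch of a linear model over two query-dependent features and one query-independent feature. It is an illustration only, not Terrier's implementation: the feature definitions, the `prior` field and the weights are hypothetical, and in a real learning-to-rank setting the weights would be learned from training data rather than set by hand.

```python
def query_dependent_features(query, doc_text):
    """Two toy features that depend on both the query and the document."""
    q_terms = set(query.lower().split())
    d_terms = doc_text.lower().split()
    if not q_terms:
        return [0.0, 0.0]
    # Total number of occurrences of query terms in the document.
    tf = sum(d_terms.count(t) for t in q_terms)
    # Fraction of distinct query terms that appear in the document.
    coverage = sum(1 for t in q_terms if t in d_terms) / len(q_terms)
    return [float(tf), coverage]


def rank(query, docs, weights):
    """Score each document as a weighted sum of its features, best first."""
    scored = []
    for doc_id, doc in docs.items():
        # doc["prior"] stands in for a query-independent feature,
        # e.g. a popularity or quality estimate (hypothetical).
        features = query_dependent_features(query, doc["text"]) + [doc["prior"]]
        score = sum(w * f for w, f in zip(weights, features))
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": {"text": "terrier retrieval platform", "prior": 0.1},
    "d2": {"text": "terrier terrier dog", "prior": 0.9},
    "d3": {"text": "cat", "prior": 0.5},
}
weights = [1.0, 2.0, 0.5]  # hand-picked stand-in for learned weights
ranking = rank("terrier retrieval", docs, weights)
ranked_ids = [doc_id for doc_id, _ in ranking]
```

With these weights, d1 comes first because it covers both query terms, even though d2 has the higher query-independent prior.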

To support scalability and efficiency, Terrier's data structures have undergone a major enhancement to support advanced dynamic pruning techniques, as well as the development of applications requiring distributed and real-time indexing and retrieval such as Twitter search.

Finally, the growth of the Terrier platform over the past decade into exciting new areas such as MapReduce indexing and crowdsourcing brings increased functionality, but also increased platform complexity. To avoid software bloat, we are moving from a monolithic release structure to a system of periodic core releases and timely plugin expansions. The first such release will be the CrowdTerrier plugin, providing researchers with an out-of-the-box tool to achieve fast and cheap relevance assessments.

A more comprehensive account of the forthcoming Terrier releases is detailed in our paper below:

From Puppy to Maturity: Experiences in Developing Terrier. Craig Macdonald, Richard McCreadie, Rodrygo Santos and Iadh Ounis. In Proceedings of the SIGIR 2012 Workshop on Open Source Information Retrieval. Portland, Oregon, USA. August 2012.

We hope to see many colleagues joining us to work towards the objectives of the platform and to enrich its functionality. As always, we welcome suggestions and any feedback on the roadmap in the run-up to the forthcoming Terrier 4.0.