Multilingual information access (MLIA) is increasingly part of many complex systems, such as digital libraries, intranet and enterprise portals, and Web search engines.
The CLEF research community has been outstanding and very active in designing, developing, and testing MLIA methods
and techniques, constantly improving the performance of such components. But is this enough? Do we really know how
MLIA components (stop lists, stemmers, IR models, relevance feedback, translation techniques, etc.) behave with respect
to languages? Do we have a deep understanding of how these components interact when the language changes?
Unfortunately, today's picture is quite fragmentary, since researchers have mainly focused on specific aspects of multilinguality while a comprehensive and unifying view is still missing. This situation hinders the adoption of MLIA techniques and their transfer to relevant application and developer communities. Indeed, it is often difficult for people outside the IR community to extract from the specialised scientific literature indications about the most promising approaches and solutions.
We are thus launching a cooperative effort in which a series of large-scale, systematic grid experiments will allow us to improve our comprehension of MLIA systems and gain an exhaustive picture of their behaviour with respect to languages. In this way, we can exploit the valuable resources and experimental collections made available by CLEF over the years in order to gain more insight into the effectiveness of the various weighting schemes and retrieval techniques with respect to languages, and to disseminate this knowledge to the relevant application and developer communities.
Individual researchers or small groups are usually not in a position to run large-scale, systematic experiments over a large set of experimental collections and resources.
[Figure: average CLEF participants (left); expert CLEF participants (right).]
Suppose we depict the performance, e.g. mean average precision, of the composition of different IR
components across a set of languages as a kind of surface, as represented in the figure above,
which aims at providing an intuitive idea of the problem rather than conveying specific data.
The average CLEF participants, shown on the left, may only be able to sample a few points on this surface since, for example, they usually test just a few variations of their own or customary IR model with a stemmer for two or three languages. In contrast, the expert CLEF participants, represented on the right, may have the expertise and competence to test all the possible variations of a given component across a set of languages, thus investigating a good slice of the surface.
However, even though each of these cases produces valuable research results and contributes to the advancement of the
discipline, they are both still far removed from a clear and complete comprehension of the features and properties of the
surface represented in the figure above.
Therefore, Grid@CLEF aims to gain a more exhaustive understanding of the characteristics of this surface, and recognizes that a far deeper sampling is needed to achieve this goal, as shown in the figure below. In this sense, Grid@CLEF will create a fine-grained grid of points over this surface; hence the name of the track.
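To make the grid idea concrete, the following minimal sketch treats each cell of the grid as the effectiveness (e.g. mean average precision, MAP) of one combination of IR components for one language. All component names and scores here are invented for illustration; they are assumptions, not CLEF results or the track's actual experimental design.

```python
# Hypothetical illustration of the Grid@CLEF "grid": one effectiveness score
# (here a made-up MAP value) per combination of components per language.
from itertools import product

stemmers = ["none", "porter"]       # assumed component choices, for illustration
models = ["bm25", "tfidf"]
languages = ["de", "fr", "it"]

# Invented MAP scores keyed by (stemmer, model, language); a real grid
# experiment would fill these cells with measured values.
map_scores = {
    (s, m, l): round(0.20 + 0.05 * i, 2)
    for i, (s, m, l) in enumerate(product(stemmers, models, languages))
}

def best_combination(language):
    """Return the (stemmer, model) pair with the highest MAP for a language."""
    cells = {k[:2]: v for k, v in map_scores.items() if k[2] == language}
    return max(cells, key=cells.get)

# With the full grid, every combination can be compared per language,
# instead of the few points a single participant would sample.
for lang in languages:
    print(lang, best_combination(lang))
```

The point of the sketch is only the shape of the data: a single participant typically populates a handful of these cells, whereas the track aims to fill the whole grid so that component behaviour can be compared across languages.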
Participants will be asked to take part in a series of carefully designed and controlled experiments, so as to ensure that the tested MLIA components are truly comparable across participants and that differences stem only from the languages and tasks at hand. Grid experiments will be offered as monolingual, bilingual, and multilingual tasks. Participants will be asked to analyse their own results in depth and to discuss and compare them with those of other participants, in order to reach a comprehensive and shared understanding of MLIA with respect to languages.
Summing up, Grid@CLEF has the following goals:
Don't hesitate: join us and put your dots in the grid.