Information Management System

University of Padua

Department of Information Engineering


Research Areas

The main research areas covered by IMS are:

  • Digital Cultural Heritage
  • Information Retrieval
  • Machine Learning
  • Web of Data
  • Structured Data

Digital Cultural Heritage

Throughout their history, cultural heritage institutions have had two central purposes: first, preserving artefacts of cultural significance, and second, describing and cataloguing these artefacts in a way that makes them accessible to a variety of audiences, from experienced researchers to the general public. The widespread digitization of cultural heritage collections has significant implications for the institutions that hold them. These twin purposes remain important, but both digital preservation and access present challenges for owners of cultural heritage collections.

The issues surrounding access are complex and far-reaching. Many of them are not unique to digital cultural heritage, but cultural heritage raises specific questions about how to support access both to whole collections and to individual artefacts.

The group has long experience in developing original methods and tools that support users in discovering knowledge and exploring digital cultural collections, in particular in the areas of:

  • digital library systems
  • digital archive systems
  • metadata and interoperability
  • user engagement
  • user generated content

Contact Person

Maristella Agosti

maristella.agosti@unipd.it

http://www.dei.unipd.it/~agosti/

Collaborations

Trinity College Dublin (IE); Graz University of Technology (AT); University of Sheffield (UK); University of Duisburg-Essen (DE); University of Basel (CH); University of Athens (GR); University of Bari “Aldo Moro” (IT); Sapienza University of Rome (IT); University of Bologna (IT); Institute of Information Science and Technologies (ISTI), Italian National Research Council (CNR), Pisa (IT).


Information Retrieval

Information Retrieval (IR) is concerned with complex systems that deliver a variety of key applications to industry and society: Web search engines, (bio)medical search, expertise retrieval systems, intellectual property and patent search, enterprise search, and many others. IR systems follow a best-match approach: in response to an often vague user query, they return a ranked list of documents ordered by their estimated relevance to that query.
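As an illustration of best matching, the sketch below scores every document against a query with Okapi BM25, a classic ranking function, and sorts the collection by score. The whitespace tokenizer, the parameter values `k1=1.2` and `b=0.75`, the function names and the toy documents are assumptions made for this example; it is not one of the group's own models.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption: no stemming or stopwords).
    return text.lower().split()


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Return (doc_id, score) pairs sorted by decreasing BM25 score."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for terms in tokenized.values():
        df.update(set(terms))
    scores = {}
    for doc_id, terms in tokenized.items():
        tf = Counter(terms)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # IDF with +1 inside the log so that it stays positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalisation (b).
            norm = tf[term] + k1 * (1 - b + b * len(terms) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    # Best match: every document gets a score and the list is ordered by it.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "d1": "patent search for chemical compounds",
    "d2": "web search engines rank web pages",
    "d3": "medical search over clinical records",
}
for doc_id, score in bm25_rank("web search", docs):
    print(doc_id, round(score, 3))  # d2 comes first: only it contains the rarer term "web"
```

Note that no document is filtered out: documents matching only the common term "search" still receive a (low) score, which is what distinguishes best matching from exact Boolean matching.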

Nowadays, user tasks and needs are becoming increasingly demanding, the data sources to be searched are rapidly evolving and highly heterogeneous, the interaction between users and IR systems is much more articulated, and the systems themselves are becoming increasingly complex, made up of many interrelated components.

In this context, effectiveness, that is, the ability of a system to retrieve relevant documents and rank them higher while suppressing the retrieval of non-relevant ones, is the primary concern. Since a user query has no exact answer known a priori, experimental evaluation of effectiveness is the main driver of research and innovation in the field.
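Two standard effectiveness measures show how this is quantified in practice, assuming binary relevance judgments for a single query: precision at a cutoff k, and average precision, which rewards systems that place relevant documents near the top of the ranking. The function names and the example run below are illustrative only.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking, relevant):
    """Average of precision@i over the ranks i holding a relevant document,
    divided by the total number of relevant documents (missed ones count as 0)."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


run = ["d3", "d1", "d7", "d2", "d5"]  # system output, best first
qrels = {"d1", "d2", "d9"}            # relevance judgments; d9 was never retrieved
print(precision_at_k(run, qrels, 2))  # 0.5
print(average_precision(run, qrels))  # (1/2 + 2/4) / 3, about 0.333
```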

In particular, core research activities in the field are:

  • design and development of new IR models and indexing strategies
  • multilingual and multimodal information access
  • user context and implicit feedback
  • design and development of evaluation protocols and measures for both qualitative and quantitative assessment of system performance
  • visual analytics for experimental evaluation
  • application of IR techniques to several domains, e.g. cultural heritage, linguistics, social media

Contact Person

Nicola Ferro

nicola.ferro@unipd.it

http://www.dei.unipd.it/~ferro/

Collaborations

Sapienza University of Rome (IT); Institute of Information Science and Technologies – ISTI, Italian National Research Council – CNR, Pisa (IT); University of Sheffield (UK); Dublin City University (IE); University of Toulouse (FR); University of Amsterdam (NL); University of Applied Sciences Western Switzerland – HES-SO (CH); Zurich University of Applied Sciences (CH); Universitat Politècnica de València (ES); Vienna University of Technology (AT); University of Tampere (FI); Aalborg University Copenhagen (DK); National Institute of Standards and Technology – NIST (USA); National Institute of Informatics – NII (JP); Indian Statistical Institute – ISI (IN); Dhirubhai Ambani Institute of Information and Communication Technology – DAIICT Gandhinagar (IN).

Machine Learning

In Machine Learning (ML), computers are programmed to optimize a performance criterion using example data or past experience. It is reasonable to assume that a hidden process explains the data we observe. Although we do not know the details of this process, we know that it is not completely random. This makes it possible to find a good and useful approximation of the process, even if we cannot identify it completely. ML uses mathematical models defined in terms of parameters: learning consists of choosing the parameter values that optimize a performance criterion with respect to the observed data.
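To make "choosing the parameters" concrete, here is a minimal sketch assuming a linear model y = w·x + b, the mean squared error as the performance criterion, and plain batch gradient descent as the optimizer. The model, criterion, learning rate and data are all illustrative assumptions; the noisy samples play the role of observations from a hidden process that is roughly y = 2x + 1.

```python
def mse(xs, ys, w, b):
    """Performance criterion: mean squared error of y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Learn (w, b) by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]  # noisy observations of the hidden process
w, b = fit_linear(xs, ys)
print(round(w, 2), round(b, 2))  # 1.97 1.06, the least-squares fit
```

The learned parameters do not recover the hidden process exactly, but they are the best approximation available under this model and criterion.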

The research activities of our group focus on innovative interactive visualisation approaches that support machine learning on big data, tackling open research questions in two novel research areas: Interactive Machine Learning (IML) and Visual Analytics (VA). Both areas rely on human knowledge to improve learning systems. IML focuses on developing machine learning procedures in which humans take part in design choices such as selecting and creating the model, defining evidential features, and setting parameters. VA focuses on designing interactive graphical representations of information that can better support the human ability to perceive and construct meaningful patterns in data.

In particular, core research activities in the field are:

  • Bayesian machine learning
  • Design of visualisation techniques for probabilistic models for classification and retrieval
  • Cost-sensitive learning
  • Interactive visualisation techniques for evaluation measures of ML and IR
  • Interactive relevance feedback
  • Modelling diverse sources of evidence for document classification and ranking
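Bayesian and cost-sensitive learning meet in the decision step: instead of picking the most probable class, a cost-sensitive classifier picks the class with the lowest expected cost under the posterior probabilities. The sketch below shows that rule for a relevant/non-relevant decision; the class names, posterior values and cost matrices are made-up assumptions, and the posteriors stand in for the output of some Bayesian classifier.

```python
def min_cost_decision(posteriors, cost):
    """Return the class with minimum expected cost.

    posteriors: dict mapping class -> P(class | document)
    cost: dict mapping (predicted, actual) -> cost of that outcome
    """
    def expected_cost(predicted):
        return sum(p * cost[(predicted, actual)] for actual, p in posteriors.items())

    return min(posteriors, key=expected_cost)


posteriors = {"relevant": 0.3, "non_relevant": 0.7}
symmetric = {
    ("relevant", "relevant"): 0, ("relevant", "non_relevant"): 1,
    ("non_relevant", "relevant"): 1, ("non_relevant", "non_relevant"): 0,
}
asymmetric = dict(symmetric)
asymmetric[("non_relevant", "relevant")] = 5  # missing a relevant document costs 5x more

print(min_cost_decision(posteriors, symmetric))   # non_relevant (expected costs 0.7 vs 0.3)
print(min_cost_decision(posteriors, asymmetric))  # relevant (expected costs 0.7 vs 1.5)
```

With symmetric costs the rule reduces to choosing the most probable class; raising the cost of a miss shifts the decision towards "relevant" even though its posterior is lower.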

Contact Person

Giorgio Maria Di Nunzio

giorgiomaria.dinunzio@unipd.it

http://www.dei.unipd.it/~dinunzio/

Collaborations

University of Montreal (CA); Queensland University of Technology (AU); Tianjin University (CN); Sapienza University of Rome (IT).

Web of Data

One of the most relevant socio-economic and scientific changes of recent years has been the recognition of data as a valuable asset. The principal driver of this evolution is the Web of Data, whose size is estimated to have exceeded 100 billion facts (i.e. statements connecting semantically related entities). The paradigm that realises the Web of Data in practice is Linked Open Data (LOD), which exploits Web technologies such as the Resource Description Framework (RDF) to open up public data in machine-readable formats, ready for consumption and re-use. LOD is becoming the de facto standard for publishing, accessing and sharing data because it allows flexible manipulation, enrichment and discovery of data while also overcoming interoperability issues.
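In RDF every fact is a (subject, predicate, object) triple, and entities become connected whenever the object of one triple is the subject of another. The sketch below keeps a toy graph as a Python set of triples and answers simple triple patterns by following such a connection. The `ex:` identifiers and the `match` helper are hypothetical; real LOD uses full IRIs and is usually handled with an RDF library and SPARQL.

```python
# Toy RDF-like graph: each fact is a (subject, predicate, object) triple.
# The ex: names are hypothetical placeholders for real IRIs.
graph = {
    ("ex:Padua", "rdf:type", "ex:City"),
    ("ex:Padua", "ex:country", "ex:Italy"),
    ("ex:UniPD", "ex:locatedIn", "ex:Padua"),
    ("ex:UniPD", "rdfs:label", "University of Padua"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        (ts, tp, to)
        for ts, tp, to in graph
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    )


# Follow a semantic link: where is UniPD located, and in which country is that place?
city = match(graph, s="ex:UniPD", p="ex:locatedIn")[0][2]
print(match(graph, s=city, p="ex:country"))  # [('ex:Padua', 'ex:country', 'ex:Italy')]
```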

Nevertheless, publishing LOD is only the first step towards revealing the ground-breaking potential of this approach, which resides in the semantic connections between data that enable new possibilities for creating and discovering knowledge. Current efforts to disclose this potential concentrate on designing new methodologies for creating meaningful, and possibly unexpected, semantic links between data and for managing the knowledge created through these connections. This endeavour is shifting LOD from a publishing paradigm to a knowledge creation and sharing one. One of our group's research goals is to study approaches that shift the focus from the systems handling the data to the data themselves.
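A very simple way to create semantic links between two datasets is to compare entity labels and emit an `owl:sameAs` triple when they coincide. The sketch below does exactly that after normalising case, surrounding whitespace and accents; it is a deliberately naive stand-in for real link-discovery methodologies (which use similarity measures, context and validation), and the `a:`/`b:` identifiers and function names are hypothetical.

```python
import unicodedata


def normalise(label):
    """Lowercase, trim and strip accents: a deliberately naive matching key."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return no_accents.strip().lower()


def discover_same_as(source, target):
    """Create owl:sameAs triples between entities whose rdfs:label keys coincide."""
    index = {}
    for s, p, o in target:
        if p == "rdfs:label":
            index.setdefault(normalise(o), set()).add(s)
    links = set()
    for s, p, o in source:
        if p == "rdfs:label":
            for t in index.get(normalise(o), ()):
                links.add((s, "owl:sameAs", t))
    return links


source = {
    ("a:Padova", "rdfs:label", "Padova"),
    ("a:Venezia", "rdfs:label", "Venezia"),
    ("a:UniPD", "rdfs:label", "Università di Padova"),
}
target = {
    ("b:city42", "rdfs:label", " padova "),
    ("b:city7", "rdfs:label", "Verona"),
    ("b:org1", "rdfs:label", "universita di padova"),
}
for link in sorted(discover_same_as(source, target)):
    print(link)
```

The new triples are themselves data: once published, they connect the two datasets so that queries can traverse from one to the other.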

In particular, core research activities in the field are:

  • Methodologies for enabling data-level interoperability
  • Research data publishing
  • Linked Open Data
  • Dialectometrics
  • Data citation

Contact Person

Gianmaria Silvello

gianmaria.silvello@unipd.it

http://www.dei.unipd.it/~silvello/

Collaborations

University of Verona (IT); University of Venice Ca’ Foscari (IT); Goethe University Frankfurt (DE); Humboldt University of Berlin (DE); University of Edinburgh (UK); National University of Ireland Galway – NUIG (IE); Aalborg University Copenhagen (DK).

Structured Data

Research on structured and semi-structured data deals with representing information and accessing it efficiently by exploiting its explicit structure. In this context, we focus on: (i) keyword-based search over structured data; (ii) efficient access to semi-structured and XML data.

Keyword search is the foremost approach to searching for information, and it has been successfully applied to retrieving unstructured documents. Nonetheless, retrieving information from documents is intrinsically different from querying structured data sources, whether their schema is explicit, as in relational databases or triple stores, or implicit, as in tables within textual documents and on the Web. Accessing such data requires formal and complex query languages, which are not designed for end users and demand deep knowledge of the structure of the data to express a query. Therefore, keyword-based access to structured data has been gaining considerable traction in both research and industry as a way to make access to structured data easier and more natural. So far, however, industrial-grade systems are still lacking, even though the market is seeking them.
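The key difference from document retrieval is that the answer to a keyword query may be spread over several tuples connected by foreign keys. The sketch below, on a hypothetical two-table movie database, returns single tuples that contain all keywords and, failing that, minimal pairs of tuples joined along the foreign key. It is a simplified illustration in the spirit of candidate-network approaches (exact term matching, at most one join, no ranking), not the group's system.

```python
# Hypothetical toy database: movie.director_id references director.id.
directors = [
    {"id": 1, "name": "Stanley Kubrick"},
    {"id": 2, "name": "Federico Fellini"},
]
movies = [
    {"id": 10, "title": "2001 A Space Odyssey", "director_id": 1},
    {"id": 11, "title": "La Strada", "director_id": 2},
    {"id": 12, "title": "The Shining", "director_id": 1},
]


def terms(row):
    """Lowercase words occurring in the row's string-valued attributes."""
    return {w for v in row.values() if isinstance(v, str) for w in v.lower().split()}


def keyword_search(query):
    """Return single tuples, or minimal joined movie-director pairs, covering all keywords."""
    keywords = set(query.lower().split())
    results = []
    # Answers contained in a single tuple.
    for table, rows in (("director", directors), ("movie", movies)):
        for row in rows:
            if keywords <= terms(row):
                results.append(((table, row["id"]),))
    # Answers that need a join: keep a pair only if neither tuple alone suffices.
    for m in movies:
        for d in directors:
            if m["director_id"] != d["id"]:
                continue
            tm, td = terms(m), terms(d)
            if keywords <= tm | td and not keywords <= tm and not keywords <= td:
                results.append((("movie", m["id"]), ("director", d["id"])))
    return results


print(keyword_search("kubrick shining"))  # [(('movie', 12), ('director', 1))]
print(keyword_search("Kubrick"))          # [(('director', 1),)]
```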

We work on developing such systems by combining techniques from the database and information retrieval fields. We also define and create comprehensive experimental evaluation and benchmarking activities to support and foster the development and delivery of industrial-grade systems for accessing structured data through natural language queries.

Efficient access to semi-structured and XML data concerns the development of alternative data models for representing hierarchical data. Instead of using links between nodes or adjacency matrices to represent a tree, we represent a hierarchy as a family of nested sets, where the inclusion relationship among sets represents parent/child relations. This way of modelling hierarchies allows us to define data structures that perform the main XPath axis operations (parent/children and ancestor/descendant) extremely efficiently, i.e. two to three orders of magnitude faster than the state of the art, while requiring roughly the same amount of memory. These improvements open up interesting possibilities wherever large-scale XML processing is needed, such as in digital libraries, digital archives, and genome and protein databases.
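A common way to make set inclusion cheap to test is to number the nodes in a depth-first walk, so that each node's set of descendants becomes an interval and "is contained in" becomes two integer comparisons. The sketch below uses this interval encoding (a well-known realisation of nested sets, assumed here for illustration) to answer ancestor/descendant and parent/child questions on a small, hypothetical archival hierarchy. It is not the group's data structure and makes no performance claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    left: int
    right: int
    depth: int


def encode(tree, root):
    """Give each node a (left, right) interval via depth-first numbering.
    A node's interval strictly contains the intervals of all its descendants."""
    intervals = {}
    counter = 0

    def visit(node, depth):
        nonlocal counter
        left = counter
        counter += 1
        for child in tree.get(node, []):
            visit(child, depth + 1)
        intervals[node] = Interval(left, counter, depth)
        counter += 1

    visit(root, 0)
    return intervals


def is_ancestor(enc, a, d):
    """Ancestor/descendant axis: d's interval is strictly nested in a's."""
    return enc[a].left < enc[d].left and enc[d].right < enc[a].right


def is_parent(enc, p, c):
    """Parent/child axis: nested, and exactly one level deeper."""
    return is_ancestor(enc, p, c) and enc[c].depth == enc[p].depth + 1


def descendants(enc, a):
    return sorted(n for n in enc if is_ancestor(enc, a, n))


# Hypothetical archival hierarchy: archive > fonds > series > files.
tree = {"archive": ["fonds1", "fonds2"], "fonds1": ["series1"], "series1": ["file1", "file2"]}
enc = encode(tree, "archive")
print(is_ancestor(enc, "archive", "file2"))  # True
print(is_parent(enc, "fonds1", "file1"))     # False: fonds1 is a grandparent
print(descendants(enc, "fonds1"))            # ['file1', 'file2', 'series1']
```

Once the intervals are computed, no pointer chasing is needed at query time: each axis test reads only the two nodes involved.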

Contact Person

Nicola Ferro

nicola.ferro@unipd.it

http://www.dei.unipd.it/~ferro/

Collaborations

University of Modena and Reggio Emilia (IT).