International Workshop on Profiling and Searching Data on the Web

April 24, 2018, Lyon, France. Co-located with The Web Conference 2018


Maarten de Rijke,
University of Amsterdam
Aidan Hogan,
Universidad de Chile

Objectives and Goals

The web of data has seen tremendous growth recently. New forms of structured data have emerged as web markup, and a large amount of data is available in web tables. Considering these rich, heterogeneous and evolving data sources, which cover a wide variety of domains, the exploitation of web data is becoming increasingly important for applications including (federated) search, question answering and fact verification.

The objective of this workshop is to bring together researchers and practitioners interested in the development of data search techniques, data profiling, and dataset retrieval on the web. This includes looking at the specifics of data-centric information seeking behaviour, understanding interaction challenges in data search on the web, and analysing the cognitive processes involved in the consumption of structured data by users. At the same time we aim to discuss technologies addressing data search – including semantics, information retrieval for web data (ranking algorithms and indexing), in particular in the context of decentralised and distributed systems, such as the web. We are interested in approaches to analyse, characterise and discover data sources. We want to facilitate a discussion around data search across formats and domain-specific applications.

We envision the workshop as a forum for researchers and practitioners to come together, discuss common challenges and identify synergies for joint initiatives. We welcome contributions describing technical approaches, as well as those related to Human-Computer Interaction research in data discovery, profiling and retrieval.

Topics and Themes

PROFILES & DATA:SEARCH ’18 seeks application-oriented papers, as well as more theoretical papers and position papers. The workshop proposes a multidisciplinary discussion on the following themes, with a focus on RDF, CSV, JSON and other structured and semi-structured datasets:

Data Profiling

Human Data Interaction

We are interested in contributions using a variety of methods. These can include, for example, user studies, lab experiments and system-based evaluation, but also experiments using gamification and crowdsourcing.

Submission Guidelines

We welcome the following types of contributions:

We encourage full papers (8 pages), short papers (4 pages) as well as position papers (2 pages). All submissions must be written in English and must be formatted according to the ACM format. The proceedings of the workshop will be included in the companion proceedings of The WebConf2018. Each submission will be reviewed by at least 2 members of the PC. Papers will be evaluated according to their significance, originality, technical content, style, clarity, and relevance to the workshop. Please submit your contributions electronically in PDF format via the Easychair system:

We follow a single-blind review process with at least two reviewers per paper.

Important Dates

Workshop paper submissions due: 6 February 2018 (extended from 24 January 2018)

Workshop paper notifications sent: 14 February 2018

Camera-ready copies due: 01 March 2018

PROFILES & DATA:SEARCH Workshop: 24 April 2018

Tentative Schedule

09:00 – 09:10 Introduction & welcome
09:10 – 09:20 Opening
09:20 – 10:20 Keynote talk Maarten de Rijke
Learning to Search for Datasets
Over the years, search engines have developed to return a broad range of retrievable items, from documents to people, locations, and products. Research datasets are being turned into retrievable items too. This raises a number of interesting challenges, ranging from the user end (What do users want from datasets?) to increasing the retrievability of datasets (What kind of contextual information is available to enrich datasets so as to make them more easily retrievable?) to optimizing rankers for datasets in the absence of large volumes of interaction data (How can we train learning-to-rank algorithms for datasets in weakly supervised ways?). In the talk I will survey recent progress in these three areas and identify important open problems.
10:20 – 11:00 Break
11:00 – 12:20 Paper presentations
  • Zhiyu Chen, Haiyan Jia, Jeff Heflin and Brian Davison
    Generating Schema Labels through Dataset Content Analysis
    (11:00 - 11:20)
  • Semih Yumuşak, Andreas Kamilaris, Erdogan Dogdu, Halife Kodaz, Elif Uysal and Riza Emre Aras
    A Discovery and Analysis Engine for Semantic Web
    (11:20 - 11:40)
  • Sean Soderman, Anusha Kola, Maxim Podkorytov, Michel Geyer and Michael Gubanov
    Hybrid.AI: A Learning Search Engine for Large-scale Structured Data
    (11:40 - 12:00)
  • Emilia Kacprzak, Laura Koesten, Jeni Tennison and Elena Simperl
    Characterising Dataset Search Queries
    (12:00 - 12:15) (Short paper)
12:20 – 13:40 Lunch break
13:40 – 14:40 Keynote talk Aidan Hogan
Profiling Graphs: Order from Chaos
Graphs are being increasingly adopted as a flexible data model in scenarios (e.g., Google’s Knowledge Graph, Facebook’s Graph API, Wikidata, etc.) where multiple editors are involved in content creation, where the schema is ever changing, where data are incomplete, where the connectivity of resources plays a key role—scenarios where relational models traditionally struggle. But with this flexibility comes a conceptual cost: it can be difficult to summarise and understand, at a high level, the content that a given graph contains. Hence profiling graphs becomes of increasing importance to extract order, a posteriori, from the chaotic processes by which such graphs are often generated. This talk will motivate the use of graphs as a data model, abstract recent trends in graph data management, and then turn to the issue of profiling graphs: what are the goals of such profiling, the principles by which graphs can be summarised, the main techniques by which this can/could be achieved? The talk will emphasise the importance of profiling graphs while highlighting a variety of open research questions yet to be tackled.
14:40 – 15:00 Paper presentation
  • Mohamed Ben Ellefi, Odile Papini, Djamal Merad, Jean-Marc Boi, Jean-Philip Royer, Jérôme Pasquet, Jean-Christophe Sourisseau, Filipe Castro, Mohamad Motasem Nawaf and Pierre Drap
    Cultural Heritage Resources Profiling: Ontology-based Approach
    (14:40 - 15:00)
15:00 – 15:40 Coffee break
15:40 – 15:55 Paper presentation
  • Sebastian Neumaier, Lőrinc Thurnay, Thomas J. Lampoltshammer and Tomáš Knap
    Search, Filter, Fork, and Link Open Data - The ADEQUATe platform: data- and community-driven quality improvements
    (15:40 - 15:55) (Short paper)
15:55 – 16:50 Panel discussion with Paul Groth, Aidan Hogan, and Jeni Tennison
16:50 – 17:00 Summary of discussions, wrap up

Chairs and Organizers

Program Committee

Organization Committee

Laura Koesten, Open Data Institute and University of Southampton.

Dr. Elena Demidova, L3S Research Center (Hannover, Germany).

Dr. Vadim Savenkov, Vienna University of Economics and Business.

Dr. John Breslin, National University of Ireland Galway.

Prof. Oscar Corcho, Universidad Politécnica de Madrid.

Dr. Stefan Dietze, L3S Research Center (Hannover, Germany).

Prof. Elena Simperl, University of Southampton.