GeoE3 Task 2.3: Semantic Search

Introduction

This demonstration environment serves two purposes. It outlines the requirements and process for implementing semantic search on existing metadata records, and it demonstrates how enriching metadata with more extensive keyword lists can improve the semantic searchability of digital resources in national geoinformation catalogues across Europe. The results presented in the following sections are first results from Task 2.3 of the GeoE3 project, which focuses on improving semantic searchability.

Requirements and Implementation Steps

To carry out a semantic enrichment of metadata as performed in the proof of concept demonstrated below, the following is required:

  • Each metadata record for a given dataset should be available as linked data. Many national catalogues already publish metadata records as linked data in the DCAT-AP profile. Where this is not the case, a transformation script, for example one using the Python library rdflib, can be used to convert the records; the output should also follow DCAT-AP.
  • During the transformation to linked data, each metadata record should be enriched by adding keywords to the existing keyword list. It is suggested that the data model classes, properties and related themes are added as keywords. Wherever possible, use keywords that feature in the GEMET thesaurus. In the context of Task 2.3, this enrichment was performed manually, but it could be automated in the future.
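The enrichment step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in Task 2.3 (where enrichment was manual): the `MetadataRecord` class, the `enrich_keywords` function and the short list standing in for GEMET labels are all hypothetical. In a real transformation, the resulting keywords would be written into the DCAT-AP record, for example as `dcat:keyword` values.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Minimal stand-in for a catalogue record (hypothetical structure)."""
    title: str
    keywords: list[str] = field(default_factory=list)


def enrich_keywords(record, model_terms, thesaurus_labels):
    """Return a new keyword list: existing keywords plus data model terms.

    Terms that match a thesaurus label (case-insensitively) take the
    thesaurus spelling. Duplicates and empty strings are dropped, and the
    order of first appearance is preserved. The record is not modified.
    """
    canonical = {label.lower(): label for label in thesaurus_labels}
    seen = set()
    enriched = []
    for term in [*record.keywords, *model_terms]:
        cleaned = term.strip()
        key = cleaned.lower()
        if not key or key in seen:
            continue
        seen.add(key)
        enriched.append(canonical.get(key, cleaned))
    return enriched


# Hypothetical example: class and property names from a data model are
# added to a record that originally carried a single keyword.
record = MetadataRecord(title="Example buildings dataset",
                        keywords=["Buildings register"])
enriched = enrich_keywords(
    record,
    model_terms=["Building", "address", "building"],
    thesaurus_labels=["building", "address"],  # stand-in for GEMET labels
)
print(enriched)  # ['Buildings register', 'building', 'address']
```

Normalising to the thesaurus spelling keeps keywords consistent across catalogues, which is what makes a later lookup against GEMET straightforward.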

For an example of how a metadata record should look after these steps, see: BAG_NL_Enriched

Tooling

The proof of concept carried out for this demonstration environment made exclusive use of the features and functionality of TriplyDB. This environment can be used for testing purposes; for access, please contact the content provider for your module. With this triplestore, or with a triplestore in general, the following steps should be performed:

  • All metadata records should be uploaded as a single, queryable graph. This can be done manually but could be automated. Please refer to the TriplyDB documentation for this.
  • The GEMET thesaurus should be added to this graph.
  • An ElasticSearch service should be started on this dataset.
  • Optional: A SPARQL service could be started to run the SPARQL queries available in the following section.

For a complete overview of how to perform these steps using the TriplyDB interface, please refer to the TriplyDB documentation.

The simplicity of using this tool for semantic search can be tested here. The underlying dataset includes metadata records from various countries, among them Finland, the Netherlands and France.

Proof of Concept

During the GeoE3 task, the requirements and implementation steps were successfully carried out for a range of European countries. The semantic searchability of these metadata records can be tested using the previously mentioned ElasticSearch tool. In addition, the following tables highlight how adding a few keywords to an existing metadata record can significantly improve its searchability.

These tables are visualisations of simple SPARQL queries across the range of metadata records. Each query can be run live by clicking on 'Try this query yourself'.

The following table lists all countries whose metadata includes a keyword associated with 'address'. In some cases, these keywords were added to existing metadata records in a manual enrichment step, to highlight the potential that metadata enrichment has for the semantic searchability of datasets.

Table 1. A list of partner countries with keywords corresponding to the GEMET vocabulary value Address

The following table lists all countries whose metadata includes a keyword associated with 'building'. In some cases, these keywords were added to existing metadata records in a manual enrichment step, to highlight the potential that metadata enrichment has for the semantic searchability of datasets.

Table 2. A list of partner countries with keywords corresponding to the GEMET vocabulary value Building
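For readers who want to reproduce tables of this kind on their own triplestore, a query of roughly the following shape could be used. This is a hedged sketch and not the query behind the tables above: the helper `keyword_query` is hypothetical, and it assumes that keywords are stored as `dcat:keyword` literals and that the country is attached to each dataset through `dct:spatial`, which may differ from the modelling used in the demonstration dataset.

```python
PREFIXES = """\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
"""


def keyword_query(term: str) -> str:
    """Build a SPARQL query listing datasets whose keywords contain `term`.

    Assumptions (not confirmed by the project): keywords are dcat:keyword
    literals and the country is given via dct:spatial.
    """
    # Escape backslashes and double quotes so the term is a valid
    # SPARQL string literal.
    safe = term.replace("\\", "\\\\").replace('"', '\\"').lower()
    return PREFIXES + f"""\
SELECT DISTINCT ?country ?dataset ?keyword WHERE {{
  ?dataset a dcat:Dataset ;
           dcat:keyword ?keyword ;
           dct:spatial ?country .
  FILTER(CONTAINS(LCASE(STR(?keyword)), "{safe}"))
}}
ORDER BY ?country
"""


# One query per table: 'address' for Table 1, 'building' for Table 2.
address_query = keyword_query("Address")
building_query = keyword_query("Building")
print(address_query)
```

The resulting string can be pasted into a SPARQL editor or sent to a SPARQL service started on the dataset. Matching on the lower-cased keyword text keeps the sketch simple; matching against GEMET concept identifiers would be more precise where records reference them.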