Explaining Data Patterns using Knowledge from the Web of Data
Knowledge Discovery (KD) is a field with a long tradition, aimed at developing methodologies to detect hidden patterns and regularities in large datasets, using techniques from a wide range of domains, such as statistics, machine learning, pattern recognition and data visualisation. In most real-world contexts, the interpretation and explanation of the discovered patterns are left to human experts, who use their background knowledge to analyse and refine the patterns and make them understandable for the intended purpose. Explaining patterns is therefore a labour-intensive and time-consuming process, in which part of the knowledge can remain undiscovered, especially when the experts lack some of the required background knowledge.
In this publication, the author investigates the hypothesis that such an interpretation process can be facilitated by introducing background knowledge from the Web of (Linked) Data. In the last decade, many domains have started publishing and sharing their domain-specific knowledge in the form of structured data, with the objective of encouraging information sharing, reuse and discovery. The author’s view is that, with a constantly increasing amount of shared and connected knowledge, the process of explaining patterns can become easier, faster, and more automated.
To test this hypothesis, the author developed Dedalo, a framework that automatically provides explanations for patterns of data using background knowledge extracted from the Web of Data. The author studied the elements required for a piece of information to be considered an explanation, identified the best strategies to automatically find the right piece of information in the Web of Data, and designed a process able to produce explanations for a given pattern using background knowledge autonomously collected from the Web of Data.
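The general idea of explaining a pattern with background knowledge can be illustrated with a toy sketch. It is not Dedalo's actual algorithm: the triples, the item names, the function rank_explanations and the choice of F1 as a score are all assumptions made for illustration, and a real system would collect its triples from Linked Data sources rather than from a hard-coded list. The sketch ranks each (property, value) pair by how well the set of items it describes matches the items in the pattern.

```python
from collections import defaultdict

# Hypothetical background knowledge as (subject, property, value) triples.
# In practice these would be gathered from the Web of Data.
TRIPLES = [
    ("paper1", "topic", "SemanticWeb"),
    ("paper2", "topic", "SemanticWeb"),
    ("paper3", "topic", "SemanticWeb"),
    ("paper4", "topic", "Databases"),
    ("paper1", "venue", "ISWC"),
    ("paper2", "venue", "ESWC"),
    ("paper3", "venue", "ISWC"),
    ("paper4", "venue", "VLDB"),
]


def rank_explanations(pattern, triples):
    """Rank (property, value) pairs by F1 against the items in `pattern`.

    Illustrative scoring only; F1 is an assumed choice of measure.
    """
    pattern = set(pattern)
    # Map each candidate explanation to the set of items it describes.
    covered = defaultdict(set)
    for subject, prop, value in triples:
        covered[(prop, value)].add(subject)

    scored = []
    for candidate, items in covered.items():
        hits = len(items & pattern)
        if hits == 0:
            continue  # the candidate says nothing about the pattern
        precision = hits / len(items)
        recall = hits / len(pattern)
        f1 = 2 * precision * recall / (precision + recall)
        scored.append((candidate, round(f1, 3)))

    # Best score first; ties are broken alphabetically for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


best = rank_explanations({"paper1", "paper2", "paper3"}, TRIPLES)
for candidate, score in best:
    print(candidate, score)
```

In this toy data, the pair ("topic", "SemanticWeb") describes exactly the three items in the pattern and therefore ranks first, ahead of the venue-based candidates that cover only part of it.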
The final evaluation of Dedalo involved users in an empirical study based on a real-world scenario. The author demonstrated that the explanation process is complex for those unfamiliar with the application domain, but also that it can be simplified considerably by using the Web of Data as a source of background knowledge.
The author, Ilaria Tiddi, won the SWSA Distinguished Dissertation Award 2017 for this publication.