Data Science curriculum session at EDF2013

Data Science Curriculum for Professional Session at the European Data Forum

We had a very successful session on a data science curriculum for professionals at the European Data Forum in Dublin this week. The session was a joint venture between the Euclid and PlanetData projects, was chaired by John Domingue, and featured the following talks:

  • John Domingue (The Open University) spoke about the need for learning materials aimed at the wider community as well, to ensure citizen engagement with the main issues associated with Big Data, including privacy. He also discussed the need for a constructivist learning approach based on sound pedagogical theories and supported by easy-to-use environments. Slides available at:
  • Nick Campbell (Trinity College Dublin) spoke about his extensive experience in speech understanding and its relationship to Big Data, especially from the perspective of privacy. Slides available at:
  • Barry Norton (Ontotext) gave an overview of the main principles underlying the teaching approach adopted in Euclid:
    • Show realistic solutions – Euclid’s learning materials are based around a data set that currently contains more than 200M triples, and Euclid aims to grow this to around 0.5B triples.
    • Use real data – Euclid has adopted the MusicBrainz dataset, which is part of the Linked Open Data Cloud and is used in a number of real applications.
    • Use real tools – these are explained in screencasts and webinars, and are also available for students to interact with. We also include the latest W3C standards, such as R2RML, in our portfolio.
    • Show scalable solutions – all the technologies we show work at the 100M+ triples scale. Moreover, we make these available to students within our iBook formats, integrated with the multimedia learning material.
    • Eat the dog’s food – Euclid monitors community reaction to our material on public email lists, Twitter, LinkedIn, SlideShare and Vimeo, aggregates the results as Linked Data and publishes them through a public SPARQL endpoint.

Slides available at:

Barry Norton presenting the five main principles underlying Euclid’s educational approach. Marko Grobelnik and John Domingue are relaxing on the panel sofa.

  • Marko Grobelnik gave a timely overview of the main concepts associated with data science, based on the cube shown below. Data modalities represent the different data formats, which range from rich ontology-based representations to the raw data associated with signals. Data operators capture the different approaches to managing and manipulating data in support of understanding and decision making. Additional issues cover the other essential criteria required to ensure that the mechanisms work in practice. Marko stressed that at present there is little collaboration and communication between the research disciplines associated with the cube, partly because of the different viewpoints and languages used in each research area. As an example, he said that he only managed to get statisticians to listen to him when he mentioned that semantics can de-sparsify data. Slides available at:

The Data Science Concept Cube shown by Marko Grobelnik.

The session was well attended, and the presentations were followed by a lively debate which, at the audience's insistence, took up 20 minutes of the 30-minute coffee break. Our Project Officer, Stefano Bertolo, was in the audience and tweeted positively about both Barry’s (“….@BarryNorton makes me weep of joy when he explains what it takes to teach scalanility (sic) …”) and Marko’s (“Semantics de-sparsify data says @marko_grobelnik and that's why it should be in machine learning curricula…”) presentations, resulting in several immediate requests for the slides.

The audience at the Data Science Curricula session at EDF.

Wednesday, April 10th 2013 - 8:00 AM (GMT +01:00)