Course 6: Scaling up Linked Data

This learning pathway describes how Linked Data applications can be scaled up to deal with very large volumes of Linked Data.

You can study the materials of this learning pathway at your own pace, as there is no predetermined start or end date.

1. Learning outcomes

By the end of this learning pathway you should have an understanding of:

  • What is meant by the term Big Data and the relationship between Big Data and Linked Data.
  • NoSQL databases and how they can be used to manage large volumes of Linked Data.
  • The Hadoop framework and how it can be applied to large-scale RDF reasoning.
  • How Linked Data streams, such as sensor data, can be processed.
  • Some current research initiatives related to working with very large datasets.

2. Introduction to Big Linked Data & NoSQL Databases

Learn about Big Linked Data and how NoSQL databases can support the storage and querying of RDF data.

Watch Part I of the webinar 'Scaling Up Linked Data' (56 minutes):

View the slides of this webinar:

Read Parts I & II of Chapter 6 'Scaling Up Linked Data':

HTML

iBook

ePUB

Kindle

3. Hadoop, Stream Processing & Scaling up even further

Learn about a Hadoop-based approach to large-scale reasoning, how to process streams of data, and some case studies in the successful application of Linked Data to Big Data scenarios.

Watch Part II of the webinar 'Scaling Up Linked Data' (59 minutes):

View the slides of this webinar:

Read Parts III, IV & V of Chapter 6 'Scaling Up Linked Data':

HTML

iBook

ePUB

Kindle

4. Further reading

If you are interested in more learning materials and resources about scaling up Linked Data, the following suggestions are relevant to this pathway:

[9] Neumann, T. and Weikum, G. (2010). x-RDF-3X: Fast querying, high update rates, and consistency for RDF databases. PVLDB, 3 (1), 256-263.
 
[10] Yuan, P., Liu, P., Wu, B., Jin, H., Zhang, W. and Liu, L. (2013). TripleBit: a Fast and Compact System for Large Scale RDF Data. PVLDB, 6 (7), 517-528.
 
[27] De Abreu, D., Flores, A., Palma, G., Pestana, V., Piñero, J., Queipo, J., Sánchez, J. and Vidal, M. (2013). Choosing Between Graph Databases and RDF Engines for Consuming and Mining Linked Data. ISWC 2013 Workshop on Consuming Linked Data. Sydney, Australia.
 
[30] Punnoose, R., Crainiceanu, A. and Rapp, D. (2012). Rya: A Scalable RDF Triple Store for the Clouds. Cloud-I ’12, Istanbul, Turkey.
 
[32] Cudré-Mauroux, P., Enchev, I., Fundatureanu, S. et al. (2013). NoSQL Databases for RDF: An Empirical Evaluation. International Semantic Web Conference (ISWC 2013), Sydney, Australia.
 
[33] Bizer, C. and Schultz, A. (2009). The Berlin SPARQL Benchmark. International Journal on Semantic Web and Information Systems (IJSWIS), 5 (2), 1-24.
 
[34] Urbani, J., Kotoulas, S., Oren, E. and van Harmelen, F. (2009). Scalable Distributed Reasoning with MapReduce. International Semantic Web Conference (ISWC 2009), Washington, DC.
 
[35] Huang, J., Abadi, D. J. and Ren, K. (2011). Scalable SPARQL querying of large RDF graphs. PVLDB, 4 (11), 1123-1134.
 
[36] Urbani, J. (2013). Three Laws Learned from Web-scale Reasoning. AAAI Symposium on Semantics for Big Data. Arlington, VA.
 
[37] Rudolph, S., Tserendorj, T. and Hitzler, P. (2008). What is approximate reasoning? 2nd International Conference on Web Reasoning and Rule Systems (RR2008), Karlsruhe, Germany.
 
[38] Arasu, A., Babu, S., and Widom, J. (2003). The CQL continuous query language: Semantic foundations and query execution. Technical report, Stanford University.
 
[39] Barbieri, D. F. et al. (2010). Querying RDF streams with C-SPARQL. ACM SIGMOD Record, 39 (1), 20-26.
 
[40] Balduini, M. et al. (2013). Tutorial on Stream Reasoning for Linked Data. Tutorial at International Semantic Web Conference (ISWC’2013), Sydney, Australia.
 
[41] Calbimonte, J-P. and Corcho, O. (2013). SPARQLStream: Ontology-based access to data streams. Tutorial at International Semantic Web Conference (ISWC’2013), Sydney, Australia.
 
[43] Heino, N. and Pan, J. (2012). RDFS Reasoning on Massively Parallel Hardware. International Semantic Web Conference, Boston, MA, USA.
 
[46] Angles, R., Prat-Perez, A., Dominguez-Sal, D. and Larriba-Pey, J.-L. (2013). Benchmarking database systems for social network applications. International Workshop on Graph Data Management Experience and Systems (GRADES 2013), New York, NY, USA.
 
[47] Dayarathna, M. and Suzumura, T. (2012). XGDBench: A benchmarking platform for graph stores in exascale clouds. International Conference on Cloud Computing Technology and Science (CloudCom), Taipei, Taiwan.