Dataset Characteristics Identification for Federated SPARQL Query

Nur Aini Rakhmawati, Lutfi Nur Fadzilah


Nowadays, the amount of data published in the RDF format is increasing. Federated SPARQL query engines that can query multiple distributed SPARQL endpoints have been developed recently, and these engines usually differ from one another in performance. One of the factors that affects the performance of a query engine is the set of characteristics of the accessed RDF dataset, such as the number of triples, classes, properties, subjects, entities, and objects, as well as the spreading factor of the dataset. The aim of this work is to identify the characteristics of RDF datasets and to create a query set for evaluating federated query engines. The study was conducted by identifying 16 datasets that were used by ten research papers in the Linked Data area.
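Most of the count-based characteristics listed above resemble the statistics of the VoID vocabulary (void:triples, void:classes, void:properties, void:distinctSubjects, void:entities, void:distinctObjects). The following is a minimal sketch of how they could be computed over an in-memory set of triples; it is not the authors' tool. Several choices are assumptions: terms are N-Triples-style strings (IRIs in angle brackets, blank nodes as `_:x`, literals in quotes), classes are the distinct objects of `rdf:type`, and entities are taken to be the distinct IRI subjects. The names `dataset_stats`, `DatasetStats` and `EXAMPLE` are hypothetical. The spreading factor is left out because its definition is not given here.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# rdf:type written as an N-Triples IRI term; its objects are treated as classes.
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class DatasetStats:
    triples: int
    classes: int
    properties: int
    subjects: int
    entities: int
    objects: int


def is_iri(term: str) -> bool:
    # Assumption: IRIs are enclosed in angle brackets, as in N-Triples.
    return term.startswith("<") and term.endswith(">")


def dataset_stats(triples: Iterable[Triple]) -> DatasetStats:
    unique = set(triples)  # an RDF graph is a set, so duplicate triples count once
    subjects = {s for s, _, _ in unique}
    properties = {p for _, p, _ in unique}
    objects = {o for _, _, o in unique}
    classes = {o for _, p, o in unique if p == RDF_TYPE}
    # Hypothetical entity definition: subjects that are IRIs (blank nodes excluded).
    entities = {s for s in subjects if is_iri(s)}
    return DatasetStats(
        triples=len(unique),
        classes=len(classes),
        properties=len(properties),
        subjects=len(subjects),
        entities=len(entities),
        objects=len(objects),
    )


# Tiny illustrative dataset using FOAF terms.
EXAMPLE = [
    ("<http://example.org/alice>", RDF_TYPE, "<http://xmlns.com/foaf/0.1/Person>"),
    ("<http://example.org/alice>", "<http://xmlns.com/foaf/0.1/knows>", "<http://example.org/bob>"),
    ("<http://example.org/bob>", RDF_TYPE, "<http://xmlns.com/foaf/0.1/Person>"),
    ("_:b0", "<http://xmlns.com/foaf/0.1/name>", '"Carol"'),
]

if __name__ == "__main__":
    print(dataset_stats(EXAMPLE))
    # DatasetStats(triples=4, classes=1, properties=3, subjects=3, entities=2, objects=3)
```

For a dataset of realistic size, the same counts would more likely be obtained with SPARQL aggregate queries (`COUNT(DISTINCT ...)`) sent to each endpoint, or read from a published VoID description, rather than by loading every triple into memory.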


Keywords: Federated SPARQL query, dataset, benchmark


This work is licensed under a Creative Commons Attribution 4.0 International License.