How to connect to SPARQL endpoint?

Hello everyone,

I want to do some graph analysis and my source data is accessible via a SPARQL endpoint exposed by a Jena Fuseki server.

I’d like to know if there is any package like the SPARQLWrapper package for Python (SPARQLWrapper · PyPI).

Basically, what I want to do is:

  1. Query a SPARQL endpoint
  2. Convert the result into a Data Frame
  3. Do some other stuff that is not relevant here

Is there such a package? I have not been able to find one, yet, but probably I’ve been looking in the wrong place?

Thank you very much in advance for your help.

Kind regards,

Thomas

This is an old request with no response (which I take to be a bad sign). However, I’d like to add that I too want to be able to do this. UK Government data is available through a SPARQL endpoint and I’d like access to it from Julia.

Has any progress towards this been made in the last three years?

Thanks!

Calling Python from Julia is really easy using PyCall, and myself I like the automatic conda package installation using pyimport_conda.

Rewriting the example at the SPARQLWrapper documentation site would give something like the below.

This installs Julia-“owned” copy of a minimal conda distribution, installs the required conda package (and dependencies). I don’t know how to escape the triple-quotes, so I moved that outside.

using Pkg
Pkg.add(["PyCall", "Conda"])
using PyCall, Conda

# Force install
SPARQLWrapper  = pyimport_conda("SPARQLWrapper", "SPARQLWrapper")

query = """
PREFIX gts: <http://resource.geosciml.org/ontology/timescale/gts#>

SELECT *
WHERE {
    ?a a gts:Age .
}
ORDER BY ?a
LIMIT 3
"""

b = py"""
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper(
    "http://vocabs.ardc.edu.au/repository/api/sparql/"
    "csiro_international-chronostratigraphic-chart_geologic-time-scale-2020"
)
sparql.setReturnFormat(JSON)

# gets the first 3 geological ages
# from a Geological Timescale database,
# via a SPARQL endpoint
sparql.setQuery($(query))

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print(r)
except Exception as e:
    print(e)
"""

ret = py"ret"

println("Print from Julia")
values = [d["a"]["value"] for d in ret["results"]["bindings"]]
println(values)

That’s really helpful. Thank you.

I’ve never used Python and I’m only a beginner at Julia, but this gives me a really clear way forward to try.

This is an uphill struggle for me! :smiling_face_with_tear:

I’m trying to extract postcodes and their related ward, local authority and parliamentary constituency:

#using Pkg
#Pkg.add(["PyCall", "Conda"])
using DataFrames, PyCall, Conda

# Force install
SPARQLWrapper  = pyimport_conda("SPARQLWrapper", "SPARQLWrapper")

query = """
PREFIX pcd: <http://statistics.data.gov.uk/def/postcode/unit#>
PREFIX geog: <http://statistics.data.gov.uk/def/hierarchy/best-fit#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?display ?ward ?la ?pcon
WHERE
{
  ?s rdf:type <http://statistics.data.gov.uk/def/postcode/unit> .
  ?s pcd:postcode1space ?display ;
     geog:ward ?wardcode ;
     geog:localauthoritydistrict ?lacode ;
     geog:parliamentaryconstituency ?pconcode .
  ?wardcode rdfs:label ?ward .
  ?lacode rdfs:label ?la .
  ?pconcode rdfs:label ?pcon .
  filter(?display in ("OX17 2JP","B2 5EE"))
}
LIMIT 10
"""

b = py"""
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper(
    "http://statistics.data.gov.uk/sparql"
)
sparql.setReturnFormat(JSON)


sparql.setQuery($(query))

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print(r)
except Exception as e:
    print(e)
"""

ret = py"ret"

println("Print from Julia")

values = DataFrame()
values[!, :display] = [d["display"]["value"] for d in ret["results"]["bindings"]]
values[!, :ward] = [d["ward"]["value"] for d in ret["results"]["bindings"]]
values[!, :la] = [d["la"]["value"] for d in ret["results"]["bindings"]]
values[!, :pcon] = [d["pcon"]["value"] for d in ret["results"]["bindings"]]
println(values)

This works (horay!) but I have three questions.

My output at Print from Julia looks like this:

rint from Julia
10×4 DataFrame
 Row │ display   ward       la         pcon      
     │ String    String     String     String
─────┼───────────────────────────────────────────
   1 │ OX17 2JP  E05013260  E06000062  E14000942
   2 │ OX17 2JP  E05013260  E06000062  E14000942
   3 │ OX17 2JP  E05013260  E06000062  E14000942
   4 │ OX17 2JP  E05013260  E06000062  E14000942
   5 │ OX17 2JP  E05013260  E06000062  E14000942
   6 │ OX17 2JP  E05013260  E06000062  E14000942
   7 │ OX17 2JP  E05013260  E06000062  E14000942
   8 │ OX17 2JP  E05013260  E06000062  E14000942
   9 │ B2 5EE    E05011151  E08000025  E14000564
  10 │ B2 5EE    E05011151  E08000025  E14000564

Q1. Why does my query return multiple lines per postcode?
Q2. How can I pass a (possibly very long) list of postcodes into the query?
Q3. How can I extract the output rather more elegantly into the DataFrame?

Thank you for your help!

Making progress. I have answered two of my own questions:

strtry = """ "OX17 2JP", "B2 5EE", "G12 9ER", "AB1 0AG", "SY25 6PY" """

query = """
PREFIX pcd: <http://statistics.data.gov.uk/def/postcode/unit#>
PREFIX geog: <http://statistics.data.gov.uk/def/hierarchy/best-fit#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX stge: <http://statistics.data.gov.uk/def/statistical-geography#>

SELECT ?display ?ward ?la ?pcon ?reg ?dintr ?dterm
WHERE
{
    ?s rdf:type <http://statistics.data.gov.uk/def/postcode/unit> .
    ?s pcd:postcode1space ?display ;
        geog:ward ?wardcode ;
        geog:europeanelectoralregion ?regcode ;
        geog:localauthoritydistrict ?lacode ;
        geog:parliamentaryconstituency ?pconcode .
    OPTIONAL {?s <http://statistics.data.gov.uk/def/postcode/unit/dateofintroduction> ?dintr ;
                <http://statistics.data.gov.uk/def/postcode/unit/dateoftermination> ?dterm .}
    ?wardcode stge:officialname ?ward .
    ?lacode stge:officialname ?la .
    ?pconcode stge:officialname ?pcon .
    ?regcode stge:officialname ?reg .
    filter(?display in ($strtry))
}
LIMIT 1000
"""
b = py"""
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://statistics.data.gov.uk/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery($(query))
try:
    ret = sparql.queryAndConvert()
except Exception as e:
    print(e)
"""

ret = py"ret"

values = DataFrame(display=String[], ward=String[], la=String[], pcon=String[], reg=String[], dintr=Union{Date,Missing}[], dterm=Union{Date,Missing}[])
for (i,d) in enumerate(ret["results"]["bindings"])
    push!(values, ["","","","","",missing,missing])
    values[i, :display] = d["display"]["value"]
    values[i, :ward] = d["ward"]["value"]
    values[i, :la] = d["la"]["value"]
    values[i, :pcon] = d["pcon"]["value"]
    values[i, :reg] = d["reg"]["value"]
    if get!(d, "dintr", "missing") !== "missing"
        values[i, :dintr] = Date(last(d["dintr"]["value"],7), "yyyy-mm")
    end
    if get!(d, "dterm", "missing") !== "missing"
        values[i, :dterm] = Date(last(d["dterm"]["value"],7), "yyyy-mm")
    end
end

println(values)

Which produces exactly what I want:

5×7 DataFrame
 Row │ display   ward                     la                     pcon                    reg            dintr       dterm      
     │ String    String                   String                 String                  String         Date?       Date?
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ B2 5EE    Ladywood                 Birmingham             Birmingham, Ladywood    West Midlands  missing     missing
   2 │ OX17 2JP  Middleton Cheney         West Northamptonshire  South Northamptonshire  East Midlands  missing     missing
   3 │ AB1 0AG   Lower Deeside            Aberdeen City          Aberdeen South          Scotland       1990-12-01  1992-07-01
   4 │ G12 9ER   Partick East/Kelvindale  Glasgow City           Glasgow North           Scotland       missing     missing
   5 │ SY25 6PY  Llangeitho               Ceredigion             Ceredigion              Wales          missing     missing

I am wondering if I could use HTTP.jl to construct this query in native Julia. I’ve tried a number of ways but always unsuccessfully. For example,

query = """QUERY = PREFIX pcd: <http://statistics.data.gov.uk/def/postcode/unit#> PREFIX geog: <http://statistics.data.gov.uk/def/hierarchy/best-fit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX stge: <http://statistics.data.gov.uk/def/statistical-geography#> SELECT ?display ?ward ?la ?pcon ?reg ?dintr ?dterm WHERE { ?s rdf:type <http://statistics.data.gov.uk/def/postcode/unit> . ?s pcd:postcode1space ?display ; geog:ward ?wardcode ; geog:europeanelectoralregion ?regcode ; geog:localauthoritydistrict ?lacode ; geog:parliamentaryconstituency ?pconcode . OPTIONAL {?s <http://statistics.data.gov.uk/def/postcode/unit/dateofintroduction> ?dintr ; <http://statistics.data.gov.uk/def/postcode/unit/dateoftermination> ?dterm .} ?wardcode stge:officialname ?ward . ?lacode stge:officialname ?la . ?pconcode stge:officialname ?pcon . ?regcode stge:officialname ?reg . filter(?display in ("OX17 2JP")) } LIMIT 1000"""

url = raw"http://statistics.data.gov.uk/sparql.json"
response = HTTP.post(url, query = "$query")

(essentially the same as the python formulation) gives:

ERROR: LoadError: HTTP.Exceptions.StatusError(400, "POST", "/sparql.json?QUERY = PREFIX pcd: <http://statistics.data.gov.uk/def/postcode/unit#> PREFIX geog: <http://statistics.data.gov.uk/def/hierarchy/best-fit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX stge: <http://statistics.data.gov.uk/def/statistical-geography#> SELECT ?display ?ward ?la ?pcon ?reg ?dintr ?dterm WHERE { ?s rdf:type <http://statistics.data.gov.uk/def/postcode/unit> . ?s pcd:postcode1space ?display ; geog:ward ?wardcode ; geog:europeanelectoralregion ?regcode ; geog:localauthoritydistrict ?lacode ; geog:parliamentaryconstituency ?pconcode . OPTIONAL {?s <http://statistics.data.gov.uk/def/postcode/unit/dateofintroduction> ?dintr ; <http://statistics.data.gov.uk/def/postcode/unit/dateoftermination> ?dterm .} ?wardcode stge:officialname ?ward . ?lacode stge:officialname ?la . ?pconcode stge:officialname ?pcon . ?regcode stge:officialname ?reg . filter(?display in (\"OX17 2JP\")) } LIMIT 1000", HTTP.Messages.Response:
"""
HTTP/1.0 400 Bad Request
Content-Type: text/html; charset=UTF-8
Referrer-Policy: no-referrer
Content-Length: 273
Date: Fri, 20 Oct 2023 18:09:57 GMT


<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>400 Bad Request</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Bad Request</h1>
<h2>Your client has issued a malformed or illegal request.</h2>
<h2></h2>
</body></html>
""")

What do I need to do different?