How to connect to SPARQL endpoint?

Hello everyone,

I want to do some graph analysis and my source data is accessible via a SPARQL endpoint exposed by a Jena Fuseki server.

I’d like to know if there is any package like the SPARQLWrapper package for Python (SPARQLWrapper · PyPI).

Basically, what I want to do is:

  1. Query a SPARQL endpoint
  2. Convert the result into a Data Frame
  3. Do some other stuff that is not relevant here

Is there such a package? I have not been able to find one, yet, but probably I’ve been looking in the wrong place?

Thank you very much in advance for your help.

Kind regards,

Thomas

1 Like

This is an old request with no response (which I take to be a bad sign). However, I’d like to add that I too want to be able to do this. UK Government data is available through a SPARQL endpoint and I’d like access to it from Julia.

Has any progress towards this been made in the last three years?

Thanks!

1 Like

Calling Python from Julia is really easy using PyCall, and myself I like the automatic conda package installation using pyimport_conda.

Rewriting the example at the SPARQLWrapper documentation site would give something like the below.

This installs Julia-“owned” copy of a minimal conda distribution, installs the required conda package (and dependencies). I don’t know how to escape the triple-quotes, so I moved that outside.

using Pkg
Pkg.add(["PyCall", "Conda"])
using PyCall, Conda

# Force install
SPARQLWrapper  = pyimport_conda("SPARQLWrapper", "SPARQLWrapper")

query = """
PREFIX gts: <http://resource.geosciml.org/ontology/timescale/gts#>

SELECT *
WHERE {
    ?a a gts:Age .
}
ORDER BY ?a
LIMIT 3
"""

b = py"""
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper(
    "http://vocabs.ardc.edu.au/repository/api/sparql/"
    "csiro_international-chronostratigraphic-chart_geologic-time-scale-2020"
)
sparql.setReturnFormat(JSON)

# gets the first 3 geological ages
# from a Geological Timescale database,
# via a SPARQL endpoint
sparql.setQuery($(query))

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print(r)
except Exception as e:
    print(e)
"""

ret = py"ret"

println("Print from Julia")
values = [d["a"]["value"] for d in ret["results"]["bindings"]]
println(values)
1 Like

That’s really helpful. Thank you.

I’ve never used Python and I’m only a beginner at Julia, but this gives me a really clear way forward to try.

This is an uphill struggle for me! :smiling_face_with_tear:

I’m trying to extract postcodes and their related ward, local authority and parliamentary constituency:

#using Pkg
#Pkg.add(["PyCall", "Conda"])
using DataFrames, PyCall, Conda

# Force install
SPARQLWrapper  = pyimport_conda("SPARQLWrapper", "SPARQLWrapper")

query = """
PREFIX pcd: <http://statistics.data.gov.uk/def/postcode/unit#>
PREFIX geog: <http://statistics.data.gov.uk/def/hierarchy/best-fit#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?display ?ward ?la ?pcon
WHERE
{
  ?s rdf:type <http://statistics.data.gov.uk/def/postcode/unit> .
  ?s pcd:postcode1space ?display ;
     geog:ward ?wardcode ;
     geog:localauthoritydistrict ?lacode ;
     geog:parliamentaryconstituency ?pconcode .
  ?wardcode rdfs:label ?ward .
  ?lacode rdfs:label ?la .
  ?pconcode rdfs:label ?pcon .
  filter(?display in ("OX17 2JP","B2 5EE"))
}
LIMIT 10
"""

b = py"""
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper(
    "http://statistics.data.gov.uk/sparql"
)
sparql.setReturnFormat(JSON)


sparql.setQuery($(query))

try:
    ret = sparql.queryAndConvert()

    for r in ret["results"]["bindings"]:
        print(r)
except Exception as e:
    print(e)
"""

ret = py"ret"

println("Print from Julia")

values = DataFrame()
values[!, :display] = [d["display"]["value"] for d in ret["results"]["bindings"]]
values[!, :ward] = [d["ward"]["value"] for d in ret["results"]["bindings"]]
values[!, :la] = [d["la"]["value"] for d in ret["results"]["bindings"]]
values[!, :pcon] = [d["pcon"]["value"] for d in ret["results"]["bindings"]]
println(values)

This works (horay!) but I have three questions.

My output at Print from Julia looks like this:

rint from Julia
10×4 DataFrame
 Row │ display   ward       la         pcon      
     │ String    String     String     String
─────┼───────────────────────────────────────────
   1 │ OX17 2JP  E05013260  E06000062  E14000942
   2 │ OX17 2JP  E05013260  E06000062  E14000942
   3 │ OX17 2JP  E05013260  E06000062  E14000942
   4 │ OX17 2JP  E05013260  E06000062  E14000942
   5 │ OX17 2JP  E05013260  E06000062  E14000942
   6 │ OX17 2JP  E05013260  E06000062  E14000942
   7 │ OX17 2JP  E05013260  E06000062  E14000942
   8 │ OX17 2JP  E05013260  E06000062  E14000942
   9 │ B2 5EE    E05011151  E08000025  E14000564
  10 │ B2 5EE    E05011151  E08000025  E14000564

Q1. Why does my query return multiple lines per postcode?
Q2. How can I pass a (possibly very long) list of postcodes into the query?
Q3. How can I extract the output rather more elegantly into the DataFrame?

Thank you for your help!

Making progress. I have answered two of my own questions:

strtry = """ "OX17 2JP", "B2 5EE", "G12 9ER", "AB1 0AG", "SY25 6PY" """

query = """
PREFIX pcd: <http://statistics.data.gov.uk/def/postcode/unit#>
PREFIX geog: <http://statistics.data.gov.uk/def/hierarchy/best-fit#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX stge: <http://statistics.data.gov.uk/def/statistical-geography#>

SELECT ?display ?ward ?la ?pcon ?reg ?dintr ?dterm
WHERE
{
    ?s rdf:type <http://statistics.data.gov.uk/def/postcode/unit> .
    ?s pcd:postcode1space ?display ;
        geog:ward ?wardcode ;
        geog:europeanelectoralregion ?regcode ;
        geog:localauthoritydistrict ?lacode ;
        geog:parliamentaryconstituency ?pconcode .
    OPTIONAL {?s <http://statistics.data.gov.uk/def/postcode/unit/dateofintroduction> ?dintr ;
                <http://statistics.data.gov.uk/def/postcode/unit/dateoftermination> ?dterm .}
    ?wardcode stge:officialname ?ward .
    ?lacode stge:officialname ?la .
    ?pconcode stge:officialname ?pcon .
    ?regcode stge:officialname ?reg .
    filter(?display in ($strtry))
}
LIMIT 1000
"""
b = py"""
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://statistics.data.gov.uk/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery($(query))
try:
    ret = sparql.queryAndConvert()
except Exception as e:
    print(e)
"""

ret = py"ret"

values = DataFrame(display=String[], ward=String[], la=String[], pcon=String[], reg=String[], dintr=Union{Date,Missing}[], dterm=Union{Date,Missing}[])
for (i,d) in enumerate(ret["results"]["bindings"])
    push!(values, ["","","","","",missing,missing])
    values[i, :display] = d["display"]["value"]
    values[i, :ward] = d["ward"]["value"]
    values[i, :la] = d["la"]["value"]
    values[i, :pcon] = d["pcon"]["value"]
    values[i, :reg] = d["reg"]["value"]
    if get!(d, "dintr", "missing") !== "missing"
        values[i, :dintr] = Date(last(d["dintr"]["value"],7), "yyyy-mm")
    end
    if get!(d, "dterm", "missing") !== "missing"
        values[i, :dterm] = Date(last(d["dterm"]["value"],7), "yyyy-mm")
    end
end

println(values)

Which produces exactly what I want:

5×7 DataFrame
 Row │ display   ward                     la                     pcon                    reg            dintr       dterm      
     │ String    String                   String                 String                  String         Date?       Date?
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ B2 5EE    Ladywood                 Birmingham             Birmingham, Ladywood    West Midlands  missing     missing
   2 │ OX17 2JP  Middleton Cheney         West Northamptonshire  South Northamptonshire  East Midlands  missing     missing
   3 │ AB1 0AG   Lower Deeside            Aberdeen City          Aberdeen South          Scotland       1990-12-01  1992-07-01
   4 │ G12 9ER   Partick East/Kelvindale  Glasgow City           Glasgow North           Scotland       missing     missing
   5 │ SY25 6PY  Llangeitho               Ceredigion             Ceredigion              Wales          missing     missing

I am wondering if I could use HTTP.jl to construct this query in native Julia. I’ve tried a number of ways but always unsuccessfully. For example,

query = """QUERY = PREFIX pcd: <http://statistics.data.gov.uk/def/postcode/unit#> PREFIX geog: <http://statistics.data.gov.uk/def/hierarchy/best-fit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX stge: <http://statistics.data.gov.uk/def/statistical-geography#> SELECT ?display ?ward ?la ?pcon ?reg ?dintr ?dterm WHERE { ?s rdf:type <http://statistics.data.gov.uk/def/postcode/unit> . ?s pcd:postcode1space ?display ; geog:ward ?wardcode ; geog:europeanelectoralregion ?regcode ; geog:localauthoritydistrict ?lacode ; geog:parliamentaryconstituency ?pconcode . OPTIONAL {?s <http://statistics.data.gov.uk/def/postcode/unit/dateofintroduction> ?dintr ; <http://statistics.data.gov.uk/def/postcode/unit/dateoftermination> ?dterm .} ?wardcode stge:officialname ?ward . ?lacode stge:officialname ?la . ?pconcode stge:officialname ?pcon . ?regcode stge:officialname ?reg . filter(?display in ("OX17 2JP")) } LIMIT 1000"""

url = raw"http://statistics.data.gov.uk/sparql.json"
response = HTTP.post(url, query = "$query")

(essentially the same as the python formulation) gives:

ERROR: LoadError: HTTP.Exceptions.StatusError(400, "POST", "/sparql.json?QUERY = PREFIX pcd: <http://statistics.data.gov.uk/def/postcode/unit#> PREFIX geog: <http://statistics.data.gov.uk/def/hierarchy/best-fit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX stge: <http://statistics.data.gov.uk/def/statistical-geography#> SELECT ?display ?ward ?la ?pcon ?reg ?dintr ?dterm WHERE { ?s rdf:type <http://statistics.data.gov.uk/def/postcode/unit> . ?s pcd:postcode1space ?display ; geog:ward ?wardcode ; geog:europeanelectoralregion ?regcode ; geog:localauthoritydistrict ?lacode ; geog:parliamentaryconstituency ?pconcode . OPTIONAL {?s <http://statistics.data.gov.uk/def/postcode/unit/dateofintroduction> ?dintr ; <http://statistics.data.gov.uk/def/postcode/unit/dateoftermination> ?dterm .} ?wardcode stge:officialname ?ward . ?lacode stge:officialname ?la . ?pconcode stge:officialname ?pcon . ?regcode stge:officialname ?reg . filter(?display in (\"OX17 2JP\")) } LIMIT 1000", HTTP.Messages.Response:
"""
HTTP/1.0 400 Bad Request
Content-Type: text/html; charset=UTF-8
Referrer-Policy: no-referrer
Content-Length: 273
Date: Fri, 20 Oct 2023 18:09:57 GMT


<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>400 Bad Request</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Bad Request</h1>
<h2>Your client has issued a malformed or illegal request.</h2>
<h2></h2>
</body></html>
""")

What do I need to do different?