Using SimpleQueryMachine to perform SPARQL queries

A common choice of machine to use with the PeruserTransformer is a class called SimpleQueryMachine, which is part of the "selector" module. It provides an easy way to configure a simple SPARQL query of an RDF model using the Cocoon sitemap.

The peruser root sitemap (in PERUSER_SRC/conf/cocoon/sitemap/sitemap.xmap) defines a transformer named pselsq, which uses SimpleQueryMachine.  The configuration looks like this:

            <!-- Configure a Peruser-Selector-SimpleQuery(Machine) transformer, which
               can be used to execute simple SPARQL queries in any child sitemap. -->
            <map:transformer name="pselsq" logger="net.peruser" src="net.peruser.binding.cocoon.PeruserTransformer">
                <pm:machine>
                    <pm:cuteName>selector_simple_query_machine</pm:cuteName>
                    <pm:class>net.peruser.module.selector.SimpleQueryMachine</pm:class>
                </pm:machine>
            </map:transformer>

This transformer may be used by reference in any child sitemap (i.e. the sitemap.xmap file in any subdirectory of PERUSER_SRC/app), so there is no need to define your own transformer type in order to run simple SPARQL queries. For example, the toolchest application performs a simple SPARQL query with the following sitemap fragment (see toolchest/sitemap.xmap):

                <map:transform type="pselsq" src="peruser_uri_scheme:SIMPLE_TOOL_QUERY">
                    <!-- We choose the file based on HTTP request param named "queryFile" -->
                    <map:parameter name="peruser_uri_scheme:queryFile" value="sparql/{request-param:queryFile}"/>
                    <map:parameter name="peruser_uri_scheme:dataFile" value="rdf/sw_tools_070619_utf8.rdf"/>
                    <!-- Jena parser likes to be told what model format it is parsing -->
                    <map:parameter name="peruser_uri_scheme:dataFormat" value="RDF/XML"/>
                    <!-- Base URI is unimportant in this example -->
                    <map:parameter name="peruser_uri_scheme:dataBaseURI" value="http://www.peruser.net/phonyBaseURI#"/>
                </map:transform>

Note that in a Cocoon context, the XML input comes from the previous stage of the Cocoon transformation pipeline, so models may be constructed in such a pipeline (from files, using XSLT, SQL, previous SPARQL stages, etc.) and then queried using a SimpleQueryMachine transformation stage.
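To illustrate where such a stage sits, here is a minimal sketch of a complete pipeline in a child sitemap. It assumes that the standard Cocoon "file" generator, "xslt" transformer and "html" serializer are declared in the sitemap; the match pattern, the input document, the query file and the stylesheet are hypothetical names chosen only for this example.

```xml
<map:pipeline>
    <map:match pattern="listTools">
        <!-- Previous stage: supplies the XML input (hypothetical file, may or may not contain models) -->
        <map:generate type="file" src="xml/input_doc.xml"/>
        <!-- SimpleQueryMachine stage, using the pselsq transformer from the root sitemap -->
        <map:transform type="pselsq" src="peruser_uri_scheme:SIMPLE_TOOL_QUERY">
            <!-- Hypothetical query file name -->
            <map:parameter name="peruser_uri_scheme:queryFile" value="sparql/list_tools.sparql"/>
            <map:parameter name="peruser_uri_scheme:dataFile" value="rdf/sw_tools_070619_utf8.rdf"/>
            <map:parameter name="peruser_uri_scheme:dataFormat" value="RDF/XML"/>
        </map:transform>
        <!-- Next stage: hypothetical stylesheet that renders the query results -->
        <map:transform type="xslt" src="xsl/results_to_html.xsl"/>
        <map:serialize type="html"/>
    </map:match>
</map:pipeline>
```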

Regardless of whether any models were included in the XML input (embedded models are the only part of the input contents that matters to a SimpleQueryMachine), the machine then looks in its configuration at the thing denoted by instructAddr and extracts the values of four properties:

Property URI                      Req/Opt    Description
--------------------------------  ---------  -----------
peruser_uri_scheme:queryFile      required   URL or file path (relative or absolute) to the text of the SPARQL query to be executed.
peruser_uri_scheme:dataFile       optional   URL or file path (relative or absolute) of a default model for the SPARQL query.
peruser_uri_scheme:dataFormat     optional   Text format of the default model file, either "N3", "RDF/XML", ?...
peruser_uri_scheme:dataBaseURI    optional   Base URI for the default model.

When supplied, dataFile, dataFormat and dataBaseURI together instruct the SimpleQueryMachine to read in an RDF model from a file, merge it with any (optional) models found in the XML input, and then use this merged model as the "default model" for the SPARQL query. The SPARQL query may of course also specify other models using FROM clauses, which are resolved and loaded according to the SPARQL specification.

The SPARQL query found in the specified file is parsed and executed against the default model, if any. The output is collected into an XML Doc in the SPARQL XML Results format. This output doc is then returned, which in a Cocoon context means that it is passed on to the next stage of the enclosing transformation pipeline.
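For orientation, a document in the SPARQL XML Results format has the general shape sketched below. The variable names (tool, name) and the bound values are invented for illustration and assume a hypothetical SELECT query with two variables; they are not taken from the toolchest data.

```xml
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <!-- One <variable> per variable in the SELECT clause -->
  <head>
    <variable name="tool"/>
    <variable name="name"/>
  </head>
  <!-- One <result> per solution; each <binding> holds a uri, literal or bnode -->
  <results>
    <result>
      <binding name="tool"><uri>http://example.org/tools#jena</uri></binding>
      <binding name="name"><literal>Jena</literal></binding>
    </result>
  </results>
</sparql>
```

A later pipeline stage, such as an XSLT transformation, can match on the result and binding elements in this namespace to render the query output.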

Known Limitations of SimpleQueryMachine

  1. Currently there is no support in the SimpleQueryMachine for performing SPARQL queries that use DESCRIBE or CONSTRUCT to produce RDF outputs.  However, this is a straightforward feature to add and we plan to do so very soon.
  2. The SimpleQueryMachine does not support the use of inference on the model to be queried.

Implementation Details

A SimpleQueryMachine instance runs a single SPARQL query each time its processDoc(Address instructAddr, Doc input) method is invoked.

It is important to note that SimpleQueryMachine is not a CommandMachine, and is thus not defined in accordance with that design pattern.  There are no "commands" involved in the execution of this machine.    It is a standalone DocProcessorMachine, and all of its functionality is triggered directly via the processDoc() method.

When this method is invoked, the machine must already contain certain configuration information, as the result of a previous call to setup or reconfigure. This configuration may include specifications for any number of different queries, each defined by one to four of the parameters shown above.

The instructAddr argument is used to find the intended query specification within the machine's configuration. It must be the address of a thing containing at least the queryFile parameter shown above.

The input argument is an XML(-compatible) document which may contain some number of RDF/XML-encoded models.   The machine looks for these models at the XPath address //pmd:model_set/pmd:model, where pmd = http://www.peruser.net/model_description.  All such contents are parsed and merged into a single memory model which is then used as the default model of the SPARQL query.  If a data file is also supplied in the machine configuration, then this data file will be merged into the same default model constructed from the XML input.
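As a rough illustration, an input document carrying one embedded model might look like the sketch below. Only the pmd namespace and the pmd:model_set/pmd:model structure come from the description above; the root element name, the ex: vocabulary and the placement of rdf:RDF directly inside pmd:model are assumptions made for this example.

```xml
<!-- Hypothetical root element; the machine searches the whole document with //pmd:model_set/pmd:model -->
<page xmlns:pmd="http://www.peruser.net/model_description">
  <pmd:model_set>
    <pmd:model>
      <!-- RDF/XML-encoded model contents (example vocabulary is invented) -->
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:ex="http://example.org/tools#">
        <rdf:Description rdf:about="http://example.org/tools#jena">
          <ex:name>Jena</ex:name>
        </rdf:Description>
      </rdf:RDF>
    </pmd:model>
  </pmd:model_set>
</page>
```

If the document contained several pmd:model elements, all of them would be merged into the one default model, together with the data file named in the configuration when there is one.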