This document describes the SWI-Prolog semweb package. The core of this package is an efficient main-memory based RDF store that is tightly connected to Prolog. Additional libraries provide reading and writing RDF/XML and Turtle data, caching loaded RDF documents and persistent storage. This package is the core of a ready-to-run platform for developing Semantic Web applications named ClioPatria, which is distributed separately. The SWI-Prolog RDF store is among the most memory-efficient main-memory stores for RDF (see http://cliopatria.swi-prolog.org/help/source/doc/home/vnc/prolog/src/ClioPatria/web/help/memusage.txt).
Version 3 of the RDF library enhances concurrent use of the library by allowing lock-free reading and by using only short-held locks for writing. It provides a Prolog-compatible logical update view on the triple store and isolation using transactions and snapshots. This version of the library provides near real-time modification and querying of RDF graphs, making it particularly interesting for handling streaming RDF and graph manipulation tasks.
The core of the SWI-Prolog package semweb is an efficient main-memory RDF store written in C that is tightly integrated with Prolog. It provides a fully logical predicate rdf/3 to query the RDF store efficiently by using multiple (currently 9) indexes. In addition, SWI-Prolog provides libraries for reading and writing RDF/XML and Turtle and a library that provides persistency using a combination of efficient binary snapshots and journals.
Below, we describe a few usage scenarios that guide the current design of this Prolog-based RDF store.
Application prototyping platform
Bundled with ClioPatria, the store is an efficient platform for prototyping a wide range of semantic web applications. Prolog, connected to the main-memory based store, is a productive platform for writing application logic. This logic can be made available through the SPARQL endpoint of ClioPatria, through an application-specific API (typically based on JSON or XML) or as an HTML-based end-user application. Prolog is more versatile than SPARQL, allows composing the logic from small building blocks and does not suffer from the object-relational impedance mismatch.
Data integration
The SWI-Prolog store is optimized for entailment on the rdfs:subPropertyOf relation. The rdfs:subPropertyOf relation is crucial for integrating data from multiple sources while preserving the original richness of the sources. Integration can be achieved by defining the native properties as sub-properties of properties from a unifying schema such as Dublin Core.
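The following sketch shows this integration pattern with library(semweb/rdf11). It assumes a hypothetical source property ex:bookTitle in a made-up http://example.org/ namespace. The helpers load_example_data/0 and title/2 are illustrative and are not library predicates. Because ex:bookTitle is declared a sub-property of dc:title, rdf_has/3 finds the title through the unifying schema. A plain rdf/3 query on dc:title would not find it.

```prolog
:- use_module(library(semweb/rdf11)).

% Hypothetical namespace for the example data.
:- rdf_register_prefix(ex, 'http://example.org/').

% A source-specific property is mapped onto Dublin Core by
% declaring it a sub-property of dc:title.
load_example_data :-
    rdf_assert(ex:bookTitle, rdfs:subPropertyOf, dc:title),
    rdf_assert(ex:book1, ex:bookTitle, "Prolog Programming"@en).

%   title(?Resource, ?Title)
%
%   Title is stated using dc:title or any of its sub-properties;
%   rdf_has/3 performs the rdfs:subPropertyOf entailment.
title(Resource, Title) :-
    rdf_has(Resource, dc:title, Title).

:- load_example_data.
```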
Dynamic data
This RDF store is one of the few stores that is primarily based on backward reasoning. The big advantage of backward reasoning is that it deals with changes to the database much more easily, because it does not have to propagate the consequences of each change. Backward reasoning also reduces storage requirements. The price is more reasoning during querying. In many scenarios, the extra reasoning in main memory will outperform fetching precomputed results from external storage.
Prototyping reasoning systems
Reasoning systems, not necessarily limited to entailment reasoning, can be prototyped efficiently on the Prolog-based store. This includes 'what-if' reasoning, which is supported by snapshot and transaction isolation. These features, together with the concurrent loading capabilities, make the platform well equipped to collect relevant data from large external stores for intensive reasoning. Finally, the TIPC package can be used to create networks of cooperating RDF-based agents.
Streaming RDF
Transactions, snapshots, concurrent modifications and the database monitoring facilities (see rdf_monitor/2) make the platform well suited for prototyping systems that deal with streaming RDF data.
Depending on the OS and further application restrictions, the SWI-Prolog RDF store scales to about 15 million triples on 32-bit hardware. On 64-bit hardware, scalability is limited by the amount of physical memory, allowing for approximately 4 million triples per gigabyte. The other limiting factor for practical use is the time required to load data and/or restore the database from the persistent file backup. Performance depends highly on hardware, concurrent performance and whether or not the data is spread over multiple (named) graphs that can be loaded in parallel. Restoring over 20 million triples per minute is feasible on medium hardware (Intel i7/2600 running Ubuntu 12.10).
The current 'semweb' package provides two sets of interface predicates. The original set is described in section 3.3. The new API is described in section 3.1. The original API was designed when RDF was not yet standardised and did not yet support data types and language indicators. The new API is based on the RDF 1.1 specification, introducing consistent naming and access to literals using the value space. The new API is currently defined on top of the old API, so both APIs can be mixed in a single application.
The library(semweb/rdf11) provides a new interface to the SWI-Prolog RDF database based on the RDF 1.1 specification.
Triples consist of the following three terms:
Abbreviated IRIs take the form Alias:Local, where Alias and Local are atoms. Each abbreviated IRI is expanded by the system to a full IRI.
Typed literals take the form Value^^Type. The Prolog term used for Value depends on the datatype IRI Type:

| Datatype IRI | Prolog term |
|---|---|
| xsd:float | float |
| xsd:double | float |
| xsd:decimal | float (1) |
| xsd:integer | integer |
| XSD integer sub-types | integer |
| xsd:boolean | true or false |
| xsd:date | date(Y,M,D) |
| xsd:dateTime | date_time(Y,M,D,HH,MM,SS) (2,3) |
| xsd:gDay | integer |
| xsd:gMonth | integer |
| xsd:gMonthDay | month_day(M,D) |
| xsd:gYear | integer |
| xsd:gYearMonth | year_month(Y,M) |
| xsd:time | time(HH,MM,SS) (2) |
Notes:
(1) The current implementation of xsd:decimal values as floats is formally incorrect. Future versions of SWI-Prolog may introduce decimal as a subtype of rational.
(2) SS fields denote the number of seconds. This can either be an integer or a float.
(3) The date_time structure can have a 7th field that denotes the timezone offset in seconds as an integer.
In addition, a ground object value is translated into a properly typed RDF literal using rdf_canonical_literal/2.
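Here is a minimal sketch of the value-space mapping, assuming library(semweb/rdf11). Plain Prolog values are passed to rdf_assert/3 and read back through a Value^^Type pattern. The ex: namespace and the helpers add_person/0, age/2 and born/2 are hypothetical.

```prolog
:- use_module(library(semweb/rdf11)).

% Hypothetical namespace for the example data.
:- rdf_register_prefix(ex, 'http://example.org/').

% Ground Prolog values are turned into typed literals:
% 42 becomes an xsd:integer and date/3 an xsd:date literal.
add_person :-
    rdf_assert(ex:alice, ex:age, 42),
    rdf_assert(ex:alice, ex:born, date(1983, 4, 12)).

% Retrieve only the Prolog value; the datatype is left unbound.
age(Person, Age) :-
    rdf(Person, ex:age, Age^^_).

born(Person, Date) :-
    rdf(Person, ex:born, Date^^_).

:- add_person.
```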
There is a fine distinction in how duplicate statements are handled in rdf/[3,4]: backtracking over rdf/3 will never return duplicate triples that appear in multiple graphs. rdf/4 will return such duplicate triples, because their graph term differs.
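The sketch below makes the distinction concrete. It stores one triple in two hypothetical named graphs, ex:graph1 and ex:graph2, and collects the graph argument of rdf/4. According to the description above, rdf/3 would report the same triple only once. All ex: names and the helpers load_two_graphs/0 and graphs_stating/4 are illustrative.

```prolog
:- use_module(library(semweb/rdf11)).

% Hypothetical namespace for the example data.
:- rdf_register_prefix(ex, 'http://example.org/').

% The same triple, asserted into two different named graphs.
load_two_graphs :-
    rdf_assert(ex:alice, ex:knows, ex:bob, ex:graph1),
    rdf_assert(ex:alice, ex:knows, ex:bob, ex:graph2).

%   graphs_stating(?S, ?P, ?O, -Graphs)
%
%   Graphs is the sorted list of graphs that contain S P O. This
%   relies on rdf/4 returning one answer per graph.
graphs_stating(S, P, O, Graphs) :-
    findall(G, rdf(S, P, O, G), Graphs0),
    sort(Graphs0, Graphs).

:- load_two_graphs.
```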
S | is the subject term. It is either a blank node or IRI. |
P | is the predicate term. It is always an IRI. |
O | is the object term. It is either a literal, a blank node or IRI (except for true and false that denote the values of datatype XSD boolean). |
G | is the graph term. It is always an IRI. |
inverse_of and symmetric. See rdf_set_predicate/2.
inverse_of and symmetric predicate properties. The version rdf_reachable/5 bounds the number of steps considered and returns the number of steps taken.
If both S and O are given, these predicates are semidet. The number of steps D is minimal because the implementation uses breadth-first search.
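As an illustration, assume a hypothetical ex:partOf chain from ex:a via ex:b and ex:c to ex:d. The sketch queries it with rdf_reachable/3 and rdf_reachable/5 from library(semweb/rdf11). The names load_chain/0, within/4 and wholes/2 are made up for the example.

```prolog
:- use_module(library(semweb/rdf11)).

% Hypothetical namespace for the example data.
:- rdf_register_prefix(ex, 'http://example.org/').

% A chain of three ex:partOf edges: a, b, c, d.
load_chain :-
    rdf_assert(ex:a, ex:partOf, ex:b),
    rdf_assert(ex:b, ex:partOf, ex:c),
    rdf_assert(ex:c, ex:partOf, ex:d).

%   within(?Part, ?Whole, +MaxSteps, -Steps)
%
%   Whole can be reached from Part following ex:partOf in at most
%   MaxSteps steps. Steps is the (minimal) number of steps taken.
within(Part, Whole, MaxSteps, Steps) :-
    rdf_reachable(Part, ex:partOf, Whole, MaxSteps, Steps).

%   wholes(+Part, -Wholes)
%
%   All nodes reachable from Part, starting with Part itself.
wholes(Part, Wholes) :-
    findall(W, rdf_reachable(Part, ex:partOf, W), Wholes).

:- load_chain.
```

With this data, the path from ex:a to ex:d takes 3 steps. A limit of 2 steps is therefore too small, and within/4 fails.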
Constraints on literal values
RDF constraints may be placed before, after or inside a graph pattern. Provided the code contains no commit operations such as ->, the semantics of the goal remains the same. Preferably, constraints are placed before the graph pattern, because they often help the RDF database to exploit its literal indexes. In the example below, the database can choose between using the subject and/or predicate hash or the ordered literal table.
{ Date >= "2000-01-01"^^xsd:date }, rdf(S, P, Date)
The following constraints are currently defined:
>, >=, ==, =<, <
The predicates rdf_where/1 and {}/1 are identical. The rdf_where/1 variant is provided to avoid ambiguity in applications where {}/1 is used for other purposes. Note that it is also possible to write rdf11:{...}.
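Here is a minimal sketch of a constraint placed before the graph pattern, assuming library(semweb/rdf11). It relies on the plain integer 18 being converted to an xsd:integer literal for the comparison, in the same way that ground object values are converted elsewhere in this API. The ex: namespace and the helpers load_people/0 and adult/1 are hypothetical.

```prolog
:- use_module(library(semweb/rdf11)).

% Hypothetical namespace for the example data.
:- rdf_register_prefix(ex, 'http://example.org/').

load_people :-
    rdf_assert(ex:alice, ex:age, 42),
    rdf_assert(ex:bob,   ex:age, 12).

%   adult(?Person)
%
%   Person has an ex:age of at least 18. The constraint comes first
%   so that the store may use its ordered literal table to find
%   candidate ages.
adult(Person) :-
    { Age >= 18 },
    rdf(Person, ex:age, Age).

:- load_people.
```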
Enumerating objects by role
Enumerating objects by type
Testing objects types
For performance reasons, this does not check for compliance with the syntax defined in RFC 3987. This checks whether the term is (1) an atom and (2) not a blank node identifier.
Success of this goal does not imply that the IRI is present in the database (see rdf_iri/1 for that).
A blank node is represented by an atom that starts with _:.
Success of this goal does not imply that the blank node is present in the database (see rdf_bnode/1 for that).
For backwards compatibility, atoms that start with __ are also considered to be blank nodes.
An RDF literal term is of the form String@LanguageTag or Value^^Datatype.
Success of this goal does not imply that the literal is well-formed or that it is present in the database (see rdf_literal/1 for that).
Success of this goal does not imply that the name is well-formed or that it is present in the database (see rdf_name/1 for that).
Success of this goal does not imply that the object term is well-formed or that it is present in the database (see rdf_object/1 for that).
Since any RDF term can appear in the object position, this is equivalent to rdf_is_term/1.
Success of this goal does not imply that the predicate term is present in the database (see rdf_predicate/1 for that).
Since only IRIs can appear in the predicate position, this is equivalent to rdf_is_iri/1.
Only blank nodes and IRIs can appear in the subject position.
Success of this goal does not imply that the subject term is present in the database (see rdf_subject/1 for that).
Since blank nodes are represented by atoms that start with _: and IRIs are atoms as well, this is equivalent to atom(Term).
Success of this goal does not imply that the RDF term is present in the database (see rdf_term/1 for that).
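These tests only inspect the Prolog representation of a term, so they can be combined into a small classifier. The helper term_kind/2 below is a hypothetical sketch built on library(semweb/rdf11). None of the classified terms needs to occur in the database.

```prolog
:- use_module(library(semweb/rdf11)).

%   term_kind(+Term, -Kind)
%
%   Kind is bnode, iri, literal or other, based purely on the
%   syntactic form of Term. The database is not consulted.
%   Kind is unified after the cut so the predicate is steadfast.
term_kind(Term, Kind) :- rdf_is_bnode(Term),   !, Kind = bnode.
term_kind(Term, Kind) :- rdf_is_iri(Term),     !, Kind = iri.
term_kind(Term, Kind) :- rdf_is_literal(Term), !, Kind = literal.
term_kind(_, other).
```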
| Prolog term | Datatype IRI |
|---|---|
| float | xsd:double |
| integer | xsd:integer |
| string | xsd:string |
| true or false | xsd:boolean |
| date(Y,M,D) | xsd:date |
| date_time(Y,M,D,HH,MM,SS) | xsd:dateTime |
| date_time(Y,M,D,HH,MM,SS,TZ) | xsd:dateTime |
| month_day(M,D) | xsd:gMonthDay |
| year_month(Y,M) | xsd:gYearMonth |
| time(HH,MM,SS) | xsd:time |
For example:
?- rdf_canonical_literal(42, X).
X = 42^^'http://www.w3.org/2001/XMLSchema#integer'.
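A one-line helper can show which datatype a plain value receives. This is a sketch assuming library(semweb/rdf11). The name datatype_of/2 is hypothetical.

```prolog
:- use_module(library(semweb/rdf11)).

%   datatype_of(+Value, -Datatype)
%
%   Datatype is the datatype IRI that rdf_canonical_literal/2
%   assigns to the ground Prolog term Value.
datatype_of(Value, Datatype) :-
    rdf_canonical_literal(Value, _^^Datatype).
```

Following the table above, datatype_of(1.5, T) should yield xsd:double and datatype_of(true, T) should yield xsd:boolean.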
^^Type
Note that this ordering is a complete ordering of RDF terms that is consistent with the partial ordering defined by SPARQL.
Diff | is one of <, = or >. |
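rdf_compare/3 takes the order as its first argument. This matches the calling convention of predsort/3 from library(sort), so a list of RDF terms can be sorted in this order directly. Here is a sketch, with sort_rdf_terms/2 as a hypothetical helper:

```prolog
:- use_module(library(semweb/rdf11)).

%   sort_rdf_terms(+Terms, -Sorted)
%
%   rdf_compare/3 has the (Order, Left, Right) argument order that
%   predsort/3 expects. As with predsort/3, terms that compare =
%   are kept only once.
sort_rdf_terms(Terms, Sorted) :-
    predsort(rdf_compare, Terms, Sorted).
```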
If a type is provided using Value^^Type syntax, additional conversions are performed. All types accept either an atom or a Prolog string holding a valid RDF lexical value for the type. In addition, xsd:float and xsd:double accept a Prolog integer.
_:. Blank nodes generated by this predicate are of the form _:genid followed by a unique integer.
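Here is a short sketch, assuming library(semweb/rdf11), that creates several fresh blank nodes in one go. The name fresh_bnodes/2 is a hypothetical helper.

```prolog
:- use_module(library(semweb/rdf11)).

%   fresh_bnodes(+N, -Nodes)
%
%   Nodes is a list of N newly generated blank node identifiers.
fresh_bnodes(N, Nodes) :-
    length(Nodes, N),
    maplist(rdf_create_bnode, Nodes).
```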
The following predicates are utilities to access RDF 1.1 collections. A collection is a linked list created from rdf:first and rdf:rest triples, ending in rdf:nil.
rdf:first and rdf:rest property and the list ends in rdf:nil.
If RDFTerm is unbound, RDFTerm is bound to each maximal RDF list. An RDF list is maximal if there is no triple rdf(_, rdf:rest, RDFList).
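The sketch below writes and reads a collection with rdf_assert_list/2 and rdf_list/2 from library(semweb/rdf11). The ex: namespace, the paper and author resources, and the helpers store_authors/0 and authors/2 are hypothetical. The abbreviated names are expanded at runtime with rdf_global_id/2 because they appear inside a list rather than as a direct resource argument.

```prolog
:- use_module(library(semweb/rdf11)).

% Hypothetical namespace for the example data.
:- rdf_register_prefix(ex, 'http://example.org/').

%   store_authors
%
%   Store an ordered author list for ex:paper1 as an RDF collection.
%   rdf_assert_list/2 creates the rdf:first/rdf:rest cells and binds
%   List to the head of the collection.
store_authors :-
    maplist(rdf_global_id, [ex:alice, ex:bob, ex:carol], Authors),
    rdf_assert_list(Authors, List),
    rdf_assert(ex:paper1, ex:authors, List).

%   authors(?Paper, -Authors)
%
%   Read the collection back into a Prolog list.
authors(Paper, Authors) :-
    rdf(Paper, ex:authors, List),
    rdf_list(List, Authors).

:- store_authors.
```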
Implementation of the conventional human interpretation of RDF 1.1 containers.
RDF containers are open enumeration structures as opposed to RDF collections or RDF lists which are closed enumeration structures. The same resource may appear in a container more than once. A container may be contained in itself.
rdf:Alt with first member Default and remaining members Others.
Notice that this construct adds no machine-processable semantics but is conventionally used to indicate to a human reader that the numerical ordering of the container membership properties of Container is intended to only be relevant in distinguishing between the first and all non-first members.
Default denotes the default option to take when choosing one of the alternatives in Container. Others denotes the non-default options that can be chosen from.
Notice that this construct adds no machine-processable semantics but is conventionally used to indicate to a human reader that the numerical ordering of the container membership properties of Container is intended to not be significant.
Notice that this construct adds no machine-processable semantics but is conventionally used to indicate to a human reader that the numerical ordering of the container membership properties of Container is intended to be significant.
Success of this goal does not imply that Property is present in the database.
rdf(Container, P, Elem) is true and P is a container membership property.
rdf(Container, P, Elem) is true and P is the N-th (0-based) container membership property.
The central module of the RDF infrastructure is library(semweb/rdf_db). It provides storage and indexed querying of RDF triples. RDF data is stored as quintuples. The first three elements denote the RDF triple. The extra Graph and Line elements provide information about the origin of the triple.
The actual storage is provided by the foreign language (C) module. Using a dedicated C-based implementation, we can reduce memory usage and improve indexing capabilities, for example by providing a dedicated index to support entailment over rdfs:subPropertyOf.
Currently the following indexes are provided (S=subject, P=predicate, O=object, G=graph):
rdfs:subPropertyOf relations. This index supports rdf_has/3 to query a property and all its children efficiently.
(rdf(R,_,_);rdf(_,_,R)) normally produces many duplicate answers.
library(semweb/litindex) provides indexed search on tokens inside literals.
literal(Value) if the object is a literal value. If a value of the form NameSpaceID:LocalName is provided, it is expanded to a ground atom using expand_goal/2. This implies you can use this construct in compiled code without paying a performance penalty. Literal values take one of the following forms:
rdf:datatype TypeID. The Value is either the textual representation or a natural Prolog representation. See the option convert_typed_literal(:Convertor) of the parser. The storage layer provides efficient handling of atoms, integers (64-bit) and floats (native C-doubles). All other data is represented as a Prolog record.
For literal querying purposes, Object can be of the form literal(+Query, -Value), where Query is one of the terms below. If the Query takes a literal argument and the value has a numeric type, numerical comparison is performed.
icase(Text). Backward compatibility.
Backtracking never returns duplicate triples. Duplicates can be retrieved using rdf/4. The predicate rdf/3 raises a type-error if called with improper arguments. If rdf/3 is called with a term literal(_) as Subject or Predicate, it fails silently. This allows graph matching goals like rdf(S,P,O),rdf(O,P2,O2) to proceed without errors.
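The literal queries can be illustrated with the old rdf_db API. In this sketch, the ex: namespace, the two names and the helpers load_names/0, name_with_prefix/3 and name_containing/3 are hypothetical. Both queries are expected to match case-insensitively, as described for the literal index.

```prolog
:- use_module(library(semweb/rdf_db)).

% Hypothetical namespace for the example data.
:- rdf_register_prefix(ex, 'http://example.org/').

load_names :-
    rdf_assert(ex:p1, ex:name, literal('Jan Wielemaker')),
    rdf_assert(ex:p2, ex:name, literal('Bob Wielinga')).

%   name_with_prefix(+Prefix, ?Subject, -Name)
%   name_containing(+Sub, ?Subject, -Name)
%
%   Literal search using literal(+Query, -Value) object terms.
name_with_prefix(Prefix, Subject, Name) :-
    rdf(Subject, ex:name, literal(prefix(Prefix), Name)).

name_containing(Sub, Subject, Name) :-
    rdf(Subject, ex:name, literal(substring(Sub), Name)).

:- load_names.
```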
Source | is a term Graph:Line. If Source is instantiated, passing an atom is the same as passing Atom:_. |
rdf(Subject, Predicate, Object) is true, exploiting the rdfs:subPropertyOf predicate as well as inverse predicates declared using rdf_set_predicate/2 with the inverse_of property.
inverse_of(Pred).
symmetric(true) or inverse_of(P2) properties.
If used with either Subject or Object unbound, it first returns the origin, followed by the reachable nodes in breadth-first search-order. The implementation internally looks one solution ahead and succeeds deterministically on the last solution. This predicate never generates the same node twice and is robust against cycles in the transitive relation.
With all arguments instantiated, it succeeds deterministically if a path can be found from Subject to Object. Searching starts at Subject, assuming the branching factor is normally lower. A call with both Subject and Object unbound raises an instantiation error. The following example generates all subclasses of rdfs:Resource:
?- rdf_reachable(X, rdfs:subClassOf, rdfs:'Resource').
X = 'http://www.w3.org/2000/01/rdf-schema#Resource' ;
X = 'http://www.w3.org/2000/01/rdf-schema#Class' ;
X = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#Property' ;
...
infinite to impose no distance-limit.
The predicates below enumerate the basic objects of the RDF store. Most of these predicates also enumerate objects that are not associated to any currently visible triple. Objects are retained as long as they are visible in active queries or snapshots. After that, some are reclaimed by the RDF garbage collector, while others are never reclaimed.
This predicate is primarily intended as a way to process all resources without processing resources twice. The user must be aware that some of the returned resources may not appear in any visible triple.
Note that resources that have rdf:type rdf:Property are not automatically included in the result-set of this predicate, while all resources that appear as the second argument of a triple are included.
The predicates below modify the RDF store directly. In addition, data may be loaded using rdf_load/2 or by restoring a persistent database using rdf_attach_db/2. Modifications follow the Prolog logical update view semantics, which implies that modifications remain invisible to already running queries. Further isolation can be achieved using rdf_transaction/3.
user. Subject and Predicate are resources. Object is either a resource or a term literal(Value). See rdf/3 for an explanation of Value for typed and language qualified literals. All arguments are subject to name-space expansion. Complete duplicates (including the same graph and 'line' and with a compatible 'lifespan') are not added to the database.
Graph | is either the name of a graph (an atom) or a term Graph:Line, where Line is an integer that denotes a line number. |
literal(Value).
The update semantics of the RDF database follows the conventional Prolog logical update view. In addition, the RDF database supports transactions and snapshots.
rdf_transaction(Goal, user, []). See rdf_transaction/3.
rdf_transaction(Goal, Id, []). See rdf_transaction/3.
library(semweb/rdf_persistency).
Processed options are:
true, which implies that an anonymous snapshot is created at the current state of the store. Modifications due to executing Goal are only visible to Goal.
snapshot option. A snapshot created outside a transaction exists until it is deleted. Snapshots taken inside a transaction can only be used inside this transaction.
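Here is a minimal sketch of transaction isolation, assuming library(semweb/rdf11). set_age/2 replaces a value atomically. failing_update/1 shows that a transaction whose goal fails leaves no trace. The ex: namespace and all helper names are hypothetical.

```prolog
:- use_module(library(semweb/rdf11)).

% Hypothetical namespace for the example data.
:- rdf_register_prefix(ex, 'http://example.org/').

%   set_age(+Person, +Age)
%
%   Replace the ex:age of Person as one atomic update.
set_age(Person, Age) :-
    rdf_transaction(replace_age(Person, Age)).

replace_age(Person, Age) :-
    rdf_retractall(Person, ex:age, _),
    rdf_assert(Person, ex:age, Age).

%   failing_update(+Person)
%
%   Modifies the store and then fails. Because the goal fails, the
%   transaction is discarded and the store is left unchanged.
failing_update(Person) :-
    rdf_transaction(( replace_age(Person, 99), fail )).

age_of(Person, Age) :-
    rdf(Person, ex:age, Age^^_).
```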
_:.
For backward compatibility reasons, __ is also considered to be a blank node.
The RDF library can read and write triples in RDF/XML and a proprietary binary format. There is a plugin interface defined to support additional formats. The library(semweb/turtle) uses this plugin API to support loading Turtle files using rdf_load/2.
rdf_load(FileOrList, []). See rdf_load/2.
share (default), equivalent blank nodes are shared in the same resource.
library(semweb/turtle) extend the set of recognised extensions.
file:// URL when loading a file or, if the specification is a URL, its normalized version without the optional #fragment.
true, changed (default) or not_loaded.
not_modified, cached(File), last_modified(Stamp) or unknown.
false, do not use or create a cache file.
true (default false), register xmlns namespace declarations or Turtle @prefix prefixes using rdf_register_prefix/3 if there is no conflict.
true, the message reporting completion is printed using level silent. Otherwise the level is informational. See also print_message/2.
Other options are forwarded to process_rdf/3. By default, rdf_load/2 only loads RDF/XML from files. It can be extended to load data from other formats and locations using plugins. The full set of plugins relevant to support different formats and locations is below:
:- use_module(library(semweb/turtle)).        % Turtle and TriG
:- use_module(library(semweb/rdf_ntriples)).
:- use_module(library(semweb/rdf_zlib_plugin)).
:- use_module(library(semweb/rdf_http_plugin)).
:- use_module(library(http/http_ssl_plugin)).
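This self-contained sketch loads a Turtle snippet into a named graph through rdf_load/2. It writes the text to a temporary file and passes the format(turtle), graph(Graph) and silent(true) options. The name load_turtle_text/2 and the example IRIs are hypothetical. The sketch loads library(semweb/turtle) so that the Turtle format is available.

```prolog
:- use_module(library(semweb/rdf11)).
:- use_module(library(semweb/turtle)).     % Turtle parser plugin

%   load_turtle_text(+Text, +Graph)
%
%   Parse Text as Turtle and load the triples into Graph. The text
%   is written to a temporary file, which is removed afterwards.
load_turtle_text(Text, Graph) :-
    tmp_file_stream(text, File, Out),
    call_cleanup(write(Out, Text), close(Out)),
    call_cleanup(rdf_load(File, [ format(turtle),
                                  graph(Graph),
                                  silent(true)
                                ]),
                 delete_file(File)).
```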
library(semweb/rdf_persistency) and library(semweb/rdf_cache)
rdf_save(Out, []). See rdf_save/2 for details.
false (default true), do not save blank nodes that do not appear (indirectly) as object of a named resource.
write_xml_base option.
xml:lang saved with rdf:RDF element.
true (default false), inline resources when encountered for the first time. Normally, only bnodes are handled this way.
true (default false), emit subjects sorted on the full URI. Useful to make file comparison easier.
false, do not include the xml:base declaration that is written normally when using the base_uri option.
false (default true), never use XML attributes to save plain literal attributes, i.e., always use an XML element as in <name>Joe</name>.
Out | Location to save the data. This can also be a file-url (file://path) or a stream wrapped in a term stream(Out). |
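The helper below is a sketch of saving a single named graph. It writes RDF/XML to a temporary file and reads the result back as a string. It assumes library(semweb/rdf11). The names graph_as_rdfxml/2 and load_example/0 and the ex: names are hypothetical.

```prolog
:- use_module(library(semweb/rdf11)).

% Hypothetical namespace for the example data.
:- rdf_register_prefix(ex, 'http://example.org/').

load_example :-
    rdf_assert(ex:a, ex:p, ex:b, ex:g).

%   graph_as_rdfxml(+Graph, -XML)
%
%   Save only the triples of Graph as RDF/XML and return the
%   document as a string. A temporary file is the output location.
graph_as_rdfxml(Graph, XML) :-
    tmp_file_stream(utf8, File, Out),
    close(Out),
    rdf_save(File, [graph(Graph)]),
    read_file_to_string(File, XML, []),
    delete_file(File).

:- load_example.
```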
Partial save
Sometimes it is necessary to make more arbitrary selections of material to be saved or exchange RDF descriptions over an open network link. The predicates in this section provide for this. Character encoding issues are derived from the encoding of the Stream, providing support for utf8, iso_latin_1 and ascii.
Save an RDF header, with the XML header, DOCTYPE, ENTITY and opening the rdf:RDF element with appropriate namespace declarations. It uses the primitives from section 3.5 to generate the required namespaces and desired short-name. Options is one of:
rdf and rdfs are added to the provided List. If a namespace is not declared, the resource is emitted in non-abbreviated form.
Fast loading and saving
Loading and saving RDF format is relatively slow. For this reason, we designed a binary format that is more compact, avoids the complications of the RDF parser and avoids repetitive lookup of (URL) identifiers. The speed improvement of about 25 times is especially worthwhile when loading large databases. These predicates are used for caching by rdf_load/2 under certain conditions as well as for maintaining persistent snapshots of the database using library(semweb/rdf_persistency).
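Here is a sketch of a round trip through the binary format with rdf_save_db/1 and rdf_load_db/1, assuming library(semweb/rdf11). The store is emptied with rdf_reset_db/0 in between, so the final count reflects only what was reloaded. The names round_trip/1 and load_example/0 and the ex: names are hypothetical.

```prolog
:- use_module(library(semweb/rdf11)).

% Hypothetical namespace for the example data.
:- rdf_register_prefix(ex, 'http://example.org/').

load_example :-
    rdf_assert(ex:a, ex:p, ex:b),
    rdf_assert(ex:a, ex:p, 42).

%   round_trip(-Count)
%
%   Save the whole store in the binary format, empty the store,
%   load the file again and count the triples that came back.
round_trip(Count) :-
    tmp_file(rdfdb, File),
    rdf_save_db(File),
    rdf_reset_db,
    rdf_load_db(File),
    delete_file(File),
    aggregate_all(count, rdf(_, _, _), Count).

:- load_example.
```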
Many RDF stores turned triples into quadruples. This store is no exception, initially using the 4th argument to store the filename from which the triple was loaded. Currently, the 4th argument is the RDF named graph. A named graph maintains some properties, notably to track origin, changes and modified state.
modified(false).
Additional graph properties can be added by defining rules for the multifile predicate property_of_graph/2. Currently, the following extensions are defined:
library(semweb/rdf_persistency)
true if the graph is persistent.
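Graph properties can be queried with rdf_graph_property/2. The sketch below assumes library(semweb/rdf11) and reads the triples(Count) property of two hypothetical graphs. The names graph_size/2 and load_graphs/0 are illustrative.

```prolog
:- use_module(library(semweb/rdf11)).

% Hypothetical namespace for the example data.
:- rdf_register_prefix(ex, 'http://example.org/').

load_graphs :-
    rdf_assert(ex:a, ex:p, ex:b, ex:g1),
    rdf_assert(ex:a, ex:p, ex:c, ex:g1),
    rdf_assert(ex:x, ex:p, ex:y, ex:g2).

%   graph_size(?Graph, ?Count)
%
%   Count is the number of triples in Graph as reported by the
%   triples(Count) graph property.
graph_size(Graph, Count) :-
    rdf_graph_property(Graph, triples(Count)).

:- load_graphs.
```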
Literal values are ordered and indexed using a skip list. The aim of this index is threefold.
library(semweb/litindex).
As string literal matching is most frequently used for searching purposes, the match is executed case-insensitively and after removal of diacritics. Case matching and diacritics removal is based on Unicode character properties and independent from the current locale. Case conversion is based on the 'simple uppercase mapping' defined by Unicode and diacritic removal on the 'decomposition type'. The approach is lightweight, but somewhat simpleminded for some languages. The tables are generated for Unicode characters up to 0x7fff. For more information, please check the source-code of the mapping-table generator unicode_map.pl available in the sources of this package.
Currently the total order of literals is first based on the type of literal using the ordering numeric < string < term. Numeric values (integer and float) are ordered by value; integers precede floats if they represent the same value. Strings are sorted alphabetically after case-mapping and diacritic removal as described above. If they compare equal, uppercase precedes lowercase and diacritics are ordered on their Unicode value. If they still compare equal, literals without any qualifier precede literals with a type qualifier, which precede literals with a language qualifier. Same qualifiers (both type or both language) are sorted alphabetically.
The ordered tree is used for indexed execution of literal(prefix(Prefix), Literal) as well as literal(like(Like), Literal) if Like does not start with a '*'. Note that results of queries that use the tree index are returned in alphabetical order.
The predicates below form an experimental interface to provide more reasoning inside the kernel of the rdf_db engine. Note that symmetric, inverse_of and transitive are not yet supported by the rest of the engine. Also note that there is no relation to defined RDF properties. Properties that have no triples are not reported by this predicate, while predicates that are involved in triples do not need to be defined as an instance of rdf:Property.
symmetric(true) is the same as inverse_of(Predicate), i.e., creating a predicate that is the inverse of itself.
inverse_of([]).
The transitive property is currently not used. The symmetric and inverse_of properties are considered by rdf_has/3,4 and rdf_reachable/3.
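Here is a sketch of the inverse_of property, assuming library(semweb/rdf11). ex:hasParent is declared the inverse of ex:hasChild with rdf_set_predicate/2. After that, rdf_has/3 answers ex:hasParent queries from ex:hasChild triples. All ex: names and the helpers setup/0 and parent_of/2 are hypothetical.

```prolog
:- use_module(library(semweb/rdf11)).

% Hypothetical namespace for the example data.
:- rdf_register_prefix(ex, 'http://example.org/').

%   setup
%
%   Declare ex:hasParent to be the inverse of ex:hasChild and store
%   a single ex:hasChild triple.
setup :-
    rdf_global_id(ex:hasParent, HasParent),
    rdf_global_id(ex:hasChild, HasChild),
    rdf_set_predicate(HasParent, inverse_of(HasChild)),
    rdf_assert(ex:anna, ex:hasChild, ex:tom).

%   parent_of(?Child, ?Parent)
%
%   Uses rdf_has/3, which takes the inverse_of declaration into
%   account. Plain rdf/3 finds no ex:hasParent triple.
parent_of(Child, Parent) :-
    rdf_has(Child, ex:hasParent, Parent).

:- setup.
```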
inverse_of(Self).
rdf_subject_branch_factor property, uniqueness of the object value is computed from the hash key rather than the actual values.
rdf_subject_branch_factor, but also considering triples of 'subPropertyOf' this relation. See also rdf_has/3.
rdf_object_branch_factor, but also considering triples of 'subPropertyOf' this relation. See also rdf_has/3.
Prolog code often contains references to constant resources with a known prefix (also known as XML namespaces). For example, http://www.w3.org/2000/01/rdf-schema#Class refers to the most general notion of an RDFS class. Readability and maintainability concerns call for abstraction here. The RDF database maintains a table of known prefixes. This table can be queried using rdf_current_ns/2 and can be extended using rdf_register_ns/3.
The prefix database is used to expand prefix:local terms that appear as arguments to calls that are known to accept a resource. This expansion is achieved by the Prolog preprocessor using expand_goal/2.
rdf_current_prefix(Prefix, Expansion), atom_concat(Expansion, Local, URI),
true, replace existing namespace alias. Please note that replacing a namespace is dangerous, because namespaces affect preprocessing. Make sure all code that depends on a namespace is compiled after changing the registration.
true and Alias is already defined, keep the original binding for Prefix and succeed silently.
Without options, an attempt to redefine an alias raises a permission error.
Predefined prefixes are:
Explicit expansion is achieved using the predicates below. The predicate rdf_equal/2 performs this expansion at compile time, while the other predicates do it at runtime.
Note that this predicate is a meta-predicate on its output argument. This is necessary to get the module context while the first argument may be of the form (:)/2. The above mode description is correct, but should be interpreted as (?,?).
existence_error(rdf_prefix, Prefix)
existence_error(rdf_prefix, Prefix)
Terms of the form Prefix:Local that appear in TermIn for which Prefix is not defined are not replaced. Unlike rdf_global_id/2 and rdf_global_object/2, no error is raised.
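rdf_global_id/2 is the runtime counterpart of compile-time expansion and converts in both directions. The sketch assumes library(semweb/rdf11). The helpers expand/2 and abbreviate/2 are hypothetical, and ex: is a made-up prefix.

```prolog
:- use_module(library(semweb/rdf11)).

% Hypothetical prefix for the example.
:- rdf_register_prefix(ex, 'http://example.org/').

%   expand(+PrefixedName, -IRI)
%   abbreviate(+IRI, -PrefixedName)
%
%   Conversion between Prefix:Local and full IRIs using the prefix
%   table. This happens when the goal runs, so the prefix does not
%   have to be known at compile time.
expand(Prefix:Local, IRI) :-
    rdf_global_id(Prefix:Local, IRI).

abbreviate(IRI, Prefix:Local) :-
    rdf_global_id(Prefix:Local, IRI).
```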
Namespace handling for custom predicates
If we implement a new predicate based on one of the predicates of the semweb libraries that expand namespaces, namespace expansion is not automatically available to it. Consider the following code, which computes the number of distinct objects for a certain property on a certain subject.
cardinality(S, P, C) :-
    (   setof(O, rdf_has(S, P, O), Os)
    ->  length(Os, C)
    ;   C = 0
    ).
Now assume we want to write labels/2 that returns the number of distinct labels of a resource:
labels(S, C) :- cardinality(S, rdfs:label, C).
This code will not work because rdfs:label is not expanded at compile time. To make this work, we need to add an rdf_meta/1 declaration.
:- rdf_meta cardinality(r,r,-).
The example below defines the rule concept/1.
:- use_module(library(semweb/rdf_db)).  % for rdf_meta
:- use_module(library(semweb/rdfs)).    % for rdfs_individual_of

:- rdf_meta
    concept(r).

%%  concept(?C) is nondet.
%
%   True if C is a concept.

concept(C) :-
    rdfs_individual_of(C, skos:'Concept').
In addition to expanding calls, rdf_meta/1 also causes expansion of clause heads for predicates that match a declaration. This is typically used to write Prolog statements about resources. The following example produces three clauses with expanded (single-atom) arguments:
:- use_module(library(semweb/rdf_db)).

:- rdf_meta
    label_predicate(r).

label_predicate(rdfs:label).
label_predicate(skos:prefLabel).
label_predicate(skos:altLabel).
This section describes the remaining predicates of the library(semweb/rdf_db) module.
Location | is a term File:Line. |
When inside a transaction, Generation is unified to a term TransactionStartGen + InsideTransactionGen. E.g., 4+3 means that the transaction was started at generation 4 of the global database and we have created 3 new generations inside the transaction. Note that this choice of representation allows for comparing generations using Prolog arithmetic. Comparing a generation in one transaction with a generation in another transaction is meaningless.
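Here is a sketch of using generations to detect changes, assuming library(semweb/rdf11). The helpers changed_since/1 and touch/0 are hypothetical. Outside a transaction, the generation is an integer, and the comparison uses plain Prolog arithmetic as described above.

```prolog
:- use_module(library(semweb/rdf11)).

% Hypothetical namespace for the example data.
:- rdf_register_prefix(ex, 'http://example.org/').

%   changed_since(+Generation)
%
%   True if the store was modified after Generation was obtained
%   from rdf_generation/1.
changed_since(Generation) :-
    rdf_generation(Now),
    Now > Generation.

% Any modification advances the generation.
touch :-
    rdf_assert(ex:a, ex:p, ex:b).
```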
triples for the interpretation of this value.
icase, substring, word, prefix or like. For backward compatibility, exact is a synonym for icase.
Major*10000 + Minor*100 + Patch.
Storing RDF triples in main memory provides much better performance than using external databases. Unfortunately, although memory is fairly cheap these days, main memory is severely limited when compared to disks. Memory usage breaks down into the following categories. Rough estimates of the memory usage are given for 64-bit systems. 32-bit systems use slightly more than half these amounts.
Bucket arrays are resized if necessary. Old triples remain at their original location. This implies that a query may need to scan multiple buckets. The garbage collector may relocate old indexed triples. It does so by copying the old triple. The old triple is later reclaimed by GC. Reindexed triples will be reused, but many reindexed triples may result in a significant memory fragmentation.
The hash parameters can be controlled with rdf_set/1. Applications that are tight on memory and for which the query characteristics are more or less known can optimize performance and memory by fixing the hash-tables. Fixing the hash-tables has three benefits. The tables can be tailored to the frequent query patterns. Queries no longer need to check multiple hash buckets (see above). The memory fragmentation caused by optimizing triples for resized hashes is avoided.
set_hash_parameters :-
    rdf_set(hash(s,   size, 1048576)),
    rdf_set(hash(p,   size, 1024)),
    rdf_set(hash(sp,  size, 2097152)),
    rdf_set(hash(o,   size, 1048576)),
    rdf_set(hash(po,  size, 2097152)),
    rdf_set(hash(spo, size, 2097152)),
    rdf_set(hash(g,   size, 1024)),
    rdf_set(hash(sg,  size, 1048576)),
    rdf_set(hash(pg,  size, 2048)).
s, p, sp, o, po, spo, g, sg or pg. Parameter is one of:
permission_error exception.
The garbage collector
The RDF store has a garbage collector that runs in a separate thread named __rdf_GC. The garbage collector removes the following objects:
rdfs:subPropertyOf relations that are related to old queries.
In addition, the garbage collector reindexes triples associated to the hash-tables before the table was resized. The most recent resize operation leads to the largest number of triples that require reindexing, while the oldest resize operation causes the largest slowdown. The parameter optimize_threshold controlled by rdf_set/1 can be used to determine the number of most recent resize operations for which triples will not be reindexed. The default is 2.
Normally, the garbage collector does its job in the background at a low priority. The predicate rdf_gc/0 can be used to reclaim all garbage and optimize all indexes.
Warming up the database
The RDF store performs many operations lazily or in background threads. For maximum performance, perform the following steps:
warm_indexes :-
    ignore(rdf(s, _, _)),
    ignore(rdf(_, p, _)),
    ignore(rdf(_, _, o)),
    ignore(rdf(s, p, _)),
    ignore(rdf(_, p, o)),
    ignore(rdf(s, p, o)),
    ignore(rdf(_, _, _, g)),
    ignore(rdf(s, _, _, g)),
    ignore(rdf(_, p, _, g)).
Predicates:
__rdf_GC performs garbage collection as long as it is considered 'useful'.
Using rdf_gc/0 should only be needed to ensure a fully clean database for analysis purposes such as leak detection.
Duplicate marks are used to reduce the administrative load of avoiding duplicate answers. Normally, duplicates are marked by a background thread that is started on the first query that produces a substantial number of duplicates.
The predicate rdf_monitor/2 allows registration of call-backs with the RDF store. These call-backs are typically used to keep other databases in sync with the RDF store. For example, library(semweb/rdf_persistency) monitors the RDF store to maintain a persistent copy in a set of files, and library(semweb/rdf_litindex) uses added and deleted literal values to maintain a full-text index of literals.
literal(Arg) of the triple's object. This event was introduced in version 2.5.0 of this library.
begin(Nesting) or end(Nesting). Nesting expresses the nesting level of transactions, starting at '0' for a toplevel transaction. Id is the second argument of rdf_transaction/2.
The following transaction Ids are pre-defined by the library:
file(Path) or stream(Stream).
file(Path).
Mask is a list of events this monitor is interested in. The default (empty list) is to report all events. Otherwise each element is of the form +Event or -Event to include or exclude monitoring for certain events. The event names are the functor names of the events described above. The special name all refers to all events and assert(load) to assert events originating from rdf_load_db/1.
As loading triples using rdf_load_db/1 is very fast, monitoring this at the triple level may seriously harm performance.
This predicate is intended to maintain derived data, such as a journal, information for undo, additional indexing in literals, etc. There is no way to remove registered monitors. If this is required, one should register a monitor that maintains a dynamic list of subscribers, like the XPCE broadcast library. A second subscription of the same hook predicate only re-assigns the mask.
The monitor hooks are called in the order of registration and in the same thread that issued the database manipulation. To process all changes in one thread, they should be sent to a thread message queue. For all updating events, the monitor is called while the calling thread has a write lock on the RDF store. This implies that these events are processed strictly synchronously, even if modifications originate from multiple threads. In particular, the transaction begin, ... updates ... end sequence is never interleaved with other events. The same holds for load and parse.
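A minimal sketch of a monitor, assuming SWI-Prolog with the semweb package installed. The hook count_asserts/1, the flag key rdf_demo_asserts and the demo_* resources are hypothetical. The mask [-all, +assert] follows the +Event/-Event convention described above, so only assert events reach the hook. Because monitors run in the thread that performs the update while it holds the write lock, the flag/3 update needs no extra locking.

```prolog
:- use_module(library(semweb/rdf_db)).

%   count_asserts(+Event)
%   Hypothetical monitor hook. With the mask used below it only
%   receives assert(S, P, O, Graph) events.
count_asserts(assert(_S, _P, _O, _Graph)) :-
    flag(rdf_demo_asserts, N, N+1).

%   install_assert_counter
%   Reset the counter and register the hook. Monitors cannot be
%   removed; registering the same hook again only re-assigns the mask.
install_assert_counter :-
    flag(rdf_demo_asserts, _, 0),
    rdf_monitor(count_asserts, [-all, +assert]).
```

Usage: after install_assert_counter, every new triple added with rdf_assert/3 increments the counter, which can be read with flag(rdf_demo_asserts, N, N).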
This RDF low-level module was created after two years of experimenting with a plain Prolog based module and a brief evaluation of a second generation pure Prolog implementation. The aim was to be able to handle up to about 5 million triples on standard (notebook) hardware and deal efficiently with subPropertyOf, which was identified as a crucial feature of RDFS to realise fusion of different data-sets.
The following issues are identified and not solved in a suitable manner.
subPropertyOf of subPropertyOf
... subPropertyOf, it is likely to be profitable to handle resource identity efficiently. The current system has no support for it.
The library(rdf_db) module provides several hooks for extending its functionality. Database updates can be monitored and acted upon through the features described in section 3.4. The predicate rdf_load/2 can be hooked to deal with different formats such as Turtle, different input sources (e.g. HTTP) and different strategies for caching results.
The hooks below are used to add new RDF file formats and sources from which to load data to the library. They are used by the modules described below and distributed with the package. Please examine the source-code if you want to add new formats or locations.
library(semweb/turtle)
library(semweb/rdf_zlib_plugin)
library(semweb/rdf_http_plugin)
library(http/http_ssl_plugin): can be combined with library(semweb/rdf_http_plugin) to load RDF from HTTPS servers.
library(semweb/rdf_persistency)
library(semweb/rdf_cache)
file(+Name), stream(+Stream) or url(Protocol, URL). If this hook succeeds, the RDF will be read from Stream using rdf_load_stream/3. Otherwise the default open functionality for file and stream is used.
e.g., xml or owl. Format is either a built-in format (xml or triples) or a format understood by the rdf_load_stream/3 hook.
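The sketch below adds a clause to the rdf_db:rdf_file_type/2 hook so that rdf_load/2 treats files with the made-up extension rdfxml as RDF/XML, then loads a tiny document through it. It assumes rdf_file_type/2 is declared multifile in rdf_db, as the hook description implies; the extension, the temporary file and the graph name demo_rdfxml are hypothetical.

```prolog
:- use_module(library(semweb/rdf_db)).

% Hypothetical extension "rdfxml": ask rdf_load/2 to parse it as RDF/XML.
:- multifile rdf_db:rdf_file_type/2.
rdf_db:rdf_file_type(rdfxml, xml).

% Lines of a tiny RDF/XML document (example data only).
demo_rdfxml_line('<?xml version="1.0"?>').
demo_rdfxml_line('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"').
demo_rdfxml_line('         xmlns:ex="http://example.org/">').
demo_rdfxml_line('  <rdf:Description rdf:about="http://example.org/a">').
demo_rdfxml_line('    <ex:p rdf:resource="http://example.org/b"/>').
demo_rdfxml_line('  </rdf:Description>').
demo_rdfxml_line('</rdf:RDF>').

% Write the document to a temporary *.rdfxml file, load it into the
% graph demo_rdfxml and collect the triples of that graph.
demo_load_rdfxml(Triples) :-
    tmp_file(demo, Base),
    file_name_extension(Base, rdfxml, File),
    setup_call_cleanup(
        open(File, write, Out),
        forall(demo_rdfxml_line(Line), format(Out, "~w~n", [Line])),
        close(Out)),
    rdf_load(File, [graph(demo_rdfxml), silent(true)]),
    findall(rdf(S,P,O), rdf(S,P,O,demo_rdfxml), Triples),
    delete_file(File).
```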
This module uses library(zlib) to load compressed files on the fly. The extension of the file must be .gz. The file format is deduced from the extension after stripping the .gz extension, e.g. rdf_load('file.rdf.gz').
This module allows for rdf_load('http://...'). It uses library(http/http_open). The format of the document at the URL is determined from the mime-type returned by the server if this is one of text/rdf+xml, application/x-turtle or application/turtle. As RDF mime-types are not yet widely supported, the plugin uses the extension of the URL if the claimed mime-type is not one of the above. In addition, it recognises text/html and application/xhtml+xml, scanning the XML content for embedded RDF.
The library library(semweb/rdf_cache) defines the caching strategy for triple sources. When using large RDF sources, caching triples greatly speeds up loading RDF documents. The cache library implements two caching strategies that are controlled by rdf_set_cache_options/1.
Local caching: This approach applies to files only. Triples are cached in a sub-directory of the directory holding the source. This directory is called .cache (_cache on Windows). If the cache option create_local_directory is true, a cache directory is created if possible.
Global caching: This approach applies to all sources, except for unnamed streams. Triples are cached in the directory defined by the cache option global_directory.
When loading an RDF file, the system scans the configured cache files unless cache(false) is specified as option to rdf_load/2 or caching is disabled. If caching is enabled but no cache exists, the system will try to create a cache file. First it will try to do this locally. On failure it will try the configured global cache.
enabled(Boolean): If true, caching is enabled.
local_directory(Name): Plain name of the local directory. Default .cache (_cache on Windows).
create_local_directory(Bool): If true, try to create local cache directories.
global_directory(Dir): Writeable directory for storing cached parsed files.
create_global_directory(Bool): If true, try to create the global cache directory.
If the mode is read, it returns the name of an existing file. If write, it returns where a new cache file can be overwritten or created.
The library library(semweb/rdf_litindex.pl) exploits the primitives of section 4.5.1 and the NLP package to provide indexing on words inside literal constants. It also allows for fuzzy matching using stemming and 'sounds-like' based on the double metaphone algorithm of the NLP package.
sounds(Like, Words), stem(Like, Words) or prefix(Prefix, Words). On compound expressions, only combinations that provide literals are returned. Below is an example after loading the ULAN (Unified List of Artist Names from the Getty Foundation) database, showing all words that sound like 'rembrandt' and appear together in a literal with the word 'Rijn'. Finding this result from the 228,710 literals contained in ULAN requires 0.54 milliseconds (AMD 1600+).
?- rdf_token_expansions(and('Rijn', sounds(rembrandt)), L).
L = [sounds(rembrandt, ['Rambrandt', 'Reimbrant', 'Rembradt',
                       'Rembrand', 'Rembrandt', 'Rembrandtsz',
                       'Rembrant', 'Rembrants', 'Rijmbrand'])]
Here is another example, illustrating handling of diacritics:
?- rdf_token_expansions(case(cafe), L).
L = [case(cafe, [cafe, café])]
rdf_litindex:tokenization(Literal, -Tokens). On failure it calls tokenize_atom/2 from the NLP package and deletes the following: atoms of length 1, floats, integers that are out of range and the English words and, an, or, of, on, in, this and the.
Deletion first calls the hook rdf_litindex:exclude_from_index(token, X). This hook is called as follows:

no_index_token(X) :-
    exclude_from_index(token, X),
    !.
no_index_token(X) :-
    ...
'Literal maps' provide a relation between literal values, intended to create additional indexes on literals. The current implementation can only deal with integers and atoms (string literals). A literal map maintains an ordered set of keys. The ordering uses the same rules as described in section 4.5. Each key is associated with an ordered set of values. Literal map objects can be shared between threads, using a locking strategy that allows for multiple concurrent readers.
Typically, this module is used together with rdf_monitor/2 on the channels new_literal and old_literal to maintain an index of words that appear in a literal. Further abstraction using Porter stemming or Metaphone can be used to create additional search indices. These can map either directly to the literal values, or indirectly to the plain word-map. The SWI-Prolog NLP package provides complementary building blocks, such as a tokenizer, Porter stem and Double Metaphone.
rdf_litindex.pl.
Keys may also be of the form not(Key). If not-terms are provided, there must be at least one positive keyword. The negations are tested after establishing the positive matches.
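As a small illustration, the sketch below builds a literal map from words to hypothetical integer literal identifiers and queries it with a conjunction of keys and with a negated key. It assumes the literal-map primitives rdf_new_literal_map/1, rdf_insert_literal_map/3, rdf_find_literal_map/3 and rdf_destroy_literal_map/1 of library(semweb/rdf_db); keys and values are made up.

```prolog
:- use_module(library(semweb/rdf_db)).

% Hypothetical word index: literal 1 is "Amsterdam",
% literal 2 is "Amsterdam city".
fill_demo_map(Map) :-
    rdf_insert_literal_map(Map, amsterdam, 1),
    rdf_insert_literal_map(Map, amsterdam, 2),
    rdf_insert_literal_map(Map, city, 2).

% Both: values associated with all positive keys.
% WithoutCity: positive matches for amsterdam minus those for city.
demo_literal_map(Both, WithoutCity) :-
    setup_call_cleanup(
        rdf_new_literal_map(Map),
        ( fill_demo_map(Map),
          rdf_find_literal_map(Map, [amsterdam, city], Both),
          rdf_find_literal_map(Map, [amsterdam, not(city)], WithoutCity)
        ),
        rdf_destroy_literal_map(Map)).
```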
The library(semweb/rdf_persistency) provides reliable persistent storage for the RDF data. The store uses a directory with files for each source (see rdf_source/1) present in the database. Each source is represented by two files, one in binary format (see rdf_save_db/2) representing the base state and one represented as Prolog terms representing the changes made since the base state. The latter is called the journal.
cpu_count or 1 (one) on systems where this number is unknown. See also concurrent/3.
true, suppress loading messages from rdf_attach_db/2.
true, nested log transactions are added to the journal information. By default (false), no log-term is added for nested transactions.
The database is locked against concurrent access using a file lock in Directory. An attempt to attach to a locked database raises a permission_error exception. The error context contains a term rdf_locked(Args), where Args is a list containing time(Stamp) and pid(PID).
The error can be caught by the application. Otherwise it prints:
ERROR: No permission to lock rdf_db `/home/jan/src/pl/packages/semweb/DB'
ERROR: locked at Wed Jun 27 15:37:35 2007 by process id 1748
false, the journal and snapshot for the database are deleted and further changes to triples associated with DB are not recorded. If Bool is true, a snapshot is created for the current state and further modifications are monitored. Switching persistency does not affect the triples in the in-memory RDF database.
With min_size(KB), only journals larger than KB Kbytes are merged with the base state. Flushing a journal takes the following steps, ensuring a stable state can be recovered at any moment.
.new
.new file over the base state.
Note that journals are not merged automatically for two reasons. First of all, some applications may decide never to merge, as the journal contains a complete changelog of the database. Second, merging large databases can be slow and the application may wish to schedule such actions at quiet times or during scheduled maintenance periods.
The above predicates suffice for most applications. The predicates in this section provide access to the journal files and the base state files and are intended to provide additional services, such as reasoning about the journals, loaded files, etc. (A library, library(rdf_history), is under development that exploits these features to support wiki-style editing of RDF.)
Using rdf_transaction(Goal, log(Message)), we can add additional records to enrich the journal of affected databases with Message and some additional bookkeeping information. Such a transaction adds a term begin(Id, Nest, Time, Message) before the change operations on each affected database and end(Id, Nest, Affected) after the change operations. Here is an example call and the content of the journal file mydb.jrn. A full explanation of the terms that appear in the journal is in the description of rdf_journal_file/2.
?- rdf_transaction(rdf_assert(s,p,o,mydb), log(by(jan))).
start([time(1183540570)]).
begin(1, 0, 1183540570.36, by(jan)).
assert(s, p, o).
end(1, 0, []).
end([time(1183540578)]).
Using rdf_transaction(Goal, log(Message, DB)), where DB is an atom denoting a (possibly empty) named graph, the system guarantees that a non-empty transaction will leave a possibly empty transaction record in DB. This feature assumes named graphs are named after the user making the changes. If a user action does not affect the user's graph, such as deleting a triple from another graph, we still find a record of all actions performed by some user in the journal of that user.
time(Stamp).
time(Stamp).
log(Message). Id is an integer counting the logged transactions to this database. Numbers are increasing and designed for binary search within the journal file. Nest is the nesting level, where '0' is a toplevel transaction. Time is a time-stamp, currently using float notation with two fractional digits. Message is the term provided by the user as argument of the log(Message) transaction.
log(Message). Id and Nest match the begin-term. Others gives a list of other databases affected by this transaction and the Id of these records. The terms in this list have the format DB:Id.
.trp for the base state and .jrn for the journal.
This module implements the Turtle language for representing the RDF triple model as defined by Dave Beckett from the Institute for Learning and Research Technology, University of Bristol, and later standardized by the W3C RDF working group.
This module acts as a plugin to rdf_load/2, for processing files with one of the extensions .ttl or .n3.
rdf(Subject, Predicate, Object [, Graph])
The representation is consistent with the SWI-Prolog RDF/XML and ntriples parsers. Provided options are:
node(1), node(2), ...
auto (default), turtle or trig.
The auto mode switches to TriG format if there is a { before the first triple. Finally, if the format is explicitly stated as turtle and the file appears to be a TriG file, a warning is printed and the data is loaded while ignoring the graphs.
-> IRI mapping because this rarely causes errors. To force strictly conforming mode, pass iri.
prefixes(Pairs). Compatibility with rdf_load/2.
[] if there is no base-uri.
warning (default), print the error and continue parsing the remainder of the file. If error, abort with an exception on the first error encountered.
If on_error(warning) is active, this option can be used to retrieve the number of generated errors.
Input is one of stream(Stream), atom(Atom), an http, https or file URL or a filename specification as accepted by absolute_file_name/3.
rdf(S,P,O) terms for a normal Turtle file or rdf(S,P,O,G) terms if the GRAPH keyword is used to associate a set of triples in the document with a particular graph. The Graph argument provides the default graph for storing the triples and Line is the line number where the statement started.
call(OnObject, ListOfTriples, Graph:Line)
This predicate supports the same Options as rdf_load_turtle/3.
Errors encountered are sent to print_message/2, after which the parser tries to recover and parse the remainder of the data.
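A minimal sketch of reading Turtle text into a list of triples, assuming a SWI-Prolog version whose library(semweb/turtle) provides rdf_read_turtle/3 and the atom(Atom) input form listed above. The sample document and the demo_* predicate names are hypothetical. Because the document uses no GRAPH keyword, each statement comes back as an rdf(S,P,O) term.

```prolog
:- use_module(library(semweb/turtle)).

% Hypothetical sample: one subject with two objects, written with
% Turtle's object-list shorthand (",").
demo_turtle_text('@prefix ex: <http://example.org/> .\nex:a ex:p ex:b , ex:c .\n').

% Parse the text held in an atom into a list of rdf(S,P,O) terms.
demo_read_turtle(Triples) :-
    demo_turtle_text(Text),
    rdf_read_turtle(atom(Text), Triples, []).
```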
turtle_write_quoted_string(Out, Value, false), writing a string with only a single ". Embedded newlines are escaped as \n.
<...>
true (default), use a for the predicate rdf:type. Otherwise use the full resource.
true (default false), emit numeric datatypes using Prolog's write to achieve canonical output.
true (default), write some informative comments between the output segments.
true (default), use P-O and O-grouping.
true (default), inline bnodes that are used once.
true (default), omit the type if allowed by Turtle.
true (default false), do not print the final informational message.
true (default false), write [...] and (...) on a single line.
true (default), use prefixes from rdf_current_prefix/2.
The option expand allows for serializing alternative graph representations. It is called through call/5, where the first argument is the expand-option, followed by S, P, O, G. G is the graph-option (which is by default a variable). This notably allows for writing RDF graphs represented as rdf(S,P,O) using the following code fragment:
triple_in(RDF, S,P,O,_G) :-
    member(rdf(S,P,O), RDF).

    ...,
    rdf_save_turtle(Out, [ expand(triple_in(RDF)) ]),
Out is one of stream(Stream), a stream handle, a file-URL or an atom that denotes a filename.
graph(+Graph) option and instead processes one additional option:
encoding(utf8), indent(0), tab_distance(0), subject_white_lines(1), align_prefixes(false), user_prefixes(false), comment(false), group(false), single_line_bnodes(true)
The library(semweb/rdf_ntriples) provides a fast reader for the RDF N-Triples and N-Quads formats. N-Triples is a simple format, originally used to support the W3C RDF test suites. The current format has been extended and is a subset of the Turtle format (see library(semweb/turtle)).
The API of this library is almost identical to library(semweb/turtle).
This module provides a plugin into rdf_load/2, making this predicate support the formats ntriples and nquads.
Triple is a term triple(Subject,Predicate,Object). Arguments follow the normal conventions of the RDF libraries. NodeID elements are mapped to node(Id). If end-of-file is reached, Triple is unified with end_of_file.
syntax_error(Message) on syntax errors.
Quad is a term quad(Subject,Predicate,Object,Graph). Arguments follow the normal conventions of the RDF libraries. NodeID elements are mapped to node(Id). If end-of-file is reached, Quad is unified with end_of_file.
syntax_error(Message) on syntax errors.
triple(Subject,Predicate,Object)
quad(Subject,Predicate,Object,Graph).
node(_), bnodes are returned as node(Id).
:<baseuri>_
warning (default) or error
If on_error is warning, unify Count with the number of errors.
Triples is a list of rdf(Subject, Predicate, Object).
Quads is a list of rdf(Subject, Predicate, Object, Graph).
graph(Graph).
CallBack is called as call(CallBack, Triples, Graph), where Triples is a list holding a single rdf(S,P,O) triple. Graph is passed from the graph option and unbound if this option is omitted.
ntriples and nquads formats.
nt, ntriples and nquads.
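The sketch below reads N-Triples from a string stream one statement at a time with read_ntriple/2, stopping at end_of_file as described above. It assumes SWI-Prolog's open_string/2; the helper names and the sample statement are hypothetical.

```prolog
:- use_module(library(semweb/rdf_ntriples)).

% Read all statements from N-Triples text held in a string.
ntriples_from_string(String, Triples) :-
    setup_call_cleanup(
        open_string(String, In),
        read_all_ntriples(In, Triples),
        close(In)).

% read_ntriple/2 yields triple(S,P,O) terms and end_of_file at the end.
read_all_ntriples(In, Triples) :-
    read_ntriple(In, Triple),
    (   Triple == end_of_file
    ->  Triples = []
    ;   Triples = [Triple|Rest],
        read_all_ntriples(In, Rest)
    ).
```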
This module implements extraction of RDFa triples from parsed XML or HTML documents. It has two interfaces: read_rdfa/3 to read triples from some input (stream, file, URL) and xml_rdfa/3 to extract triples from an HTML or XML document that is already parsed with load_html/3 or load_xml/3.
rdf(S,P,O) triples extracted from Input. Input is either a stream, a file name, a URL referencing a file name or a URL that is valid for http_open/3. Options are passed to open/4, http_open/3 and xml_rdfa/3. If no base is provided in Options, a base is deduced from Input.
rdf(S,P,O) terms extracted from DOM according to the RDFa specification. Options processed:
lang
vocab
library(semweb/rdfa) as loader for HTML RDFa files.
The library(semweb/rdfs) library adds interpretation of the triple store in terms of concepts from RDF-Schema (RDFS). There are two ways to provide support for higher-level languages in RDF. One is to view such languages as a set of entailment rules. In this model the rdfs library would provide a predicate rdfs/3 providing the same functionality as rdf/3 on the union of the raw graph and the triples that can be derived by applying the RDFS entailment rules.
Alternatively, RDFS provides a view on the RDF store in terms of individuals, classes, properties, etc., and we can provide predicates that query the database with this view in mind. This is the approach taken in the library(semweb/rdfs) library, providing calls like rdfs_individual_of(?Resource, ?Class). (The SeRQL language is based on querying the deductive closure of the triple set. The SWI-Prolog SeRQL library provides entailment modules that take the approach outlined above.)
The predicates in this section explore the rdfs:subPropertyOf, rdfs:subClassOf and rdf:type relations. Note that the most fundamental of these, rdfs:subPropertyOf, is also used by rdf_has/[3,4].
rdfs:subPropertyOf relation. It can be used to test as well as generate sub-properties or super-properties. Note that the commonly used semantics of this predicate is wired into rdf_has/[3,4].
Bug: The current implementation cannot deal with cycles.
Bug: The current implementation cannot deal with predicates that are an rdfs:subPropertyOf of rdfs:subPropertyOf, such as owl:samePropertyAs.
rdfs:subClassOf relation. It can be used to test as well as generate sub-classes or super-classes.
Bug: The current implementation cannot deal with cycles.
rdf:type property that refers to Class or a sub-class thereof. Can be used to test, generate classes Resource belongs to or generate individuals described by Class.
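A minimal sketch, assuming library(semweb/rdfs) and a few hypothetical example.org resources: after asserting that Dog is a subclass of Animal and that rex is a Dog, rdfs_individual_of/2 also classifies rex as an Animal, because it follows rdfs:subClassOf.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdfs)).

% Hypothetical example data. The prefixed names rdfs:subClassOf and
% rdf:type are expanded to full IRIs at compile time, because
% rdf_assert/3 is declared with rdf_meta/1.
load_pets :-
    rdf_assert('http://example.org/Dog', rdfs:subClassOf,
               'http://example.org/Animal'),
    rdf_assert('http://example.org/rex', rdf:type,
               'http://example.org/Dog').
```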
The RDF construct rdf:parseType=Collection constructs a list using the rdf:first and rdf:rest relations.
rdf:List or rdfs:Container.
rdf:List into a Prolog list of objects.
user.
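The sketch below builds a two-element RDF collection by hand from rdf:first/rdf:rest cells ending in rdf:nil, the same shape rdf:parseType=Collection produces, and converts it with rdfs_list_to_prolog_list/2. The resource names demo_l1, demo_l2, demo_a and demo_b are hypothetical.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdfs)).

% Build the hypothetical RDF list (demo_a demo_b):
%   demo_l1 rdf:first demo_a ; rdf:rest demo_l2 .
%   demo_l2 rdf:first demo_b ; rdf:rest rdf:nil .
make_demo_list :-
    rdf_assert(demo_l1, rdf:first, demo_a),
    rdf_assert(demo_l1, rdf:rest,  demo_l2),
    rdf_assert(demo_l2, rdf:first, demo_b),
    rdf_assert(demo_l2, rdf:rest,  rdf:nil).
```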
Complex projects require RDF resources from many locations and typically wish to load these in different combinations, for example loading a small subset of the data for debugging purposes or loading a different set of files for experimentation. The library library(semweb/rdf_library.pl) manages sets of RDF files spread over different locations, including file and network locations. The original version of this library supported metadata about collections of RDF sources in an RDF file called Manifest. The current version supports both the VoID format and the original format. VoID files (typically named void.ttl) can use elements from the RDF Manifest vocabulary to support features that are not supported by VoID.
A manifest file is an RDF file, often in Turtle format, that provides meta-data about RDF resources. Often, a manifest will describe RDF files in the current directory, but it can also describe RDF resources at arbitrary URL locations. The RDF schema for RDF library meta-data can be found in rdf_library.ttl. The namespace for the RDF library format is defined as http://www.swi-prolog.org/rdf/library/ and abbreviated as lib.
The schema defines three root classes: lib:Namespace, lib:Ontology and lib:Virtual, which we describe below.
/, the basename of each loaded file is appended to the given source. Defaults to the URL the RDF is loaded from.
wn-basic and wn-full as virtual resources. The lib:Virtual resource is used as a second rdf:type:
<wn-basic>
    a lib:Ontology ;
    a lib:Virtual ;
    ...
@prefix lib:  <http://www.swi-prolog.org/rdf/library/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

[ a lib:Namespace ;
  lib:mnemonic "rdfs" ;
  lib:namespace rdfs:
] .
VoID aims at resolving the same problem as the Manifest files described here. In addition, the VANN vocabulary provides information about preferred namespace prefixes. The RDF library manager can deal with VoID files. The following relations apply:
Dataset and Linkset are similar to lib:Ontology, but a VoID resource is always Virtual, i.e., the VoID URI itself never refers to an RDF document.
owl:imports and its lib specializations are replaced by void:subset (referring to another VoID dataset) and void:dataDump (referring to a concrete document).
dcterms:description rather than rdfs:comment
lib:source, lib:baseURI and lib:CloudNode, which have no equivalent in VoID.
vann:preferredNamespacePrefix and vann:preferredNamespaceUri as alternatives to its proprietary way of defining prefixes. The domain of these predicates is unclear. The library recognises them regardless of the domain. Note that the range of vann:preferredNamespaceUri is a literal. A disadvantage of that is that the Turtle prefix declaration cannot be reused.
Currently, the RDF metadata is not stored in the RDF database. It is processed by low-level primitives that do not perform RDFS reasoning. In particular, this means that rdfs:subPropertyOf and rdfs:subClassOf cannot be used to specialise the RDF meta vocabulary.
The initial metadata file(s) are loaded into the system using rdf_attach_library/1.
void.ttl, Manifest.ttl or Manifest.rdf is loaded (in this order of preference).
Declared namespaces are added to the rdf-db namespace list. Encountered ontologies are added to a private database of rdf_list_library.pl. Each ontology is given an identifier, derived from the basename of the URL without the extension. Thus, using the declaration below, the identifier of the declared ontology is wn-basic.
<wn-basic>
    a void:Dataset ;
    dcterms:title "Basic WordNet" ;
    ...
It is possible for the initial set of manifests to refer to RDF files that are not covered by a manifest. If such a reference is encountered while loading or listing a library, the library manager will look for a manifest file in the directory holding the referenced RDF file and load this manifest. If a manifest is found that covers the referenced file, the directives found in the manifest will be followed. Otherwise the RDF resource is simply loaded using the current defaults.
Further exploration of the library is achieved using rdf_list_library/1 or rdf_list_library/2:
rdf_list_library(Id,[]).
Typically, a project will use a single file using the same format as a manifest file that defines alternative configurations that can be loaded. This file is loaded at program startup using rdf_attach_library/1. Users can now list the available libraries using rdf_list_library/0 and rdf_list_library/1:
1 ?- rdf_list_library.
ec-core-vocabularies  E-Culture core vocabularies
ec-all-vocabularies   All E-Culture vocabularies
ec-hacks              Specific hacks
ec-mappings           E-Culture ontology mappings
ec-core-collections   E-Culture core collections
ec-all-collections    E-Culture all collections
ec-medium             E-Culture medium sized data (artchive+aria)
ec-all                E-Culture all data
Now we can list a specific category using rdf_list_library/1. Note that this loads two additional manifests referenced by resources encountered in ec-mappings. If a resource does not exist, it is flagged using [NOT FOUND].
2 ?- rdf_list_library('ec-mappings').
% Loaded RDF manifest /home/jan/src/eculture/vocabularies/mappings/Manifest.ttl
% Loaded RDF manifest /home/jan/src/eculture/collections/aul/Manifest.ttl
<file:///home/jan/src/eculture/src/server/ec-mappings>
. <file:///home/jan/src/eculture/vocabularies/mappings/mappings>
. . <file:///home/jan/src/eculture/vocabularies/mappings/interface>
. . . file:///home/jan/src/eculture/vocabularies/mappings/interface_class_mapping.ttl
. . . file:///home/jan/src/eculture/vocabularies/mappings/interface_property_mapping.ttl
. . <file:///home/jan/src/eculture/vocabularies/mappings/properties>
. . . file:///home/jan/src/eculture/vocabularies/mappings/ethnographic_property_mapping.ttl
. . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_properties.ttl
. . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_property_semantics.ttl
. . <file:///home/jan/src/eculture/vocabularies/mappings/situations>
. . . file:///home/jan/src/eculture/vocabularies/mappings/eculture_situations.ttl
. <file:///home/jan/src/eculture/collections/aul/aul>
. . file:///home/jan/src/eculture/collections/aul/aul.rdfs
. . file:///home/jan/src/eculture/collections/aul/aul.rdf
. . file:///home/jan/src/eculture/collections/aul/aul9styles.rdf
. . file:///home/jan/src/eculture/collections/aul/extractedperiods.rdf
. . file:///home/jan/src/eculture/collections/aul/manual-periods.rdf
Resources and manifests are located either on the local filesystem or on a network resource. The initial manifest can also be loaded from a file or a URL. This defines the initial base URL of the document. The base URL can be overruled using the Turtle @base directive. Other documents can be referenced relative to this base URL by exploiting Turtle's URI expansion rules. Turtle resources can be specified in three ways: as absolute URLs (e.g. <http://www.example.com/rdf/ontology.rdf>), as URLs relative to the base (e.g. <../rdf/ontology.rdf>) or following a prefix (e.g. prefix:ontology).
The prefix notation is powerful as we can define multiple prefixes and define resources relative to them. Unfortunately, prefixes can only be defined as absolute URLs or URLs relative to the base URL. Notably, they cannot be defined relative to other prefixes. In addition, a prefix can only be followed by a Qname, which excludes . and /.
Easily relocatable manifests must define all resources relative to the base URL. Relocation is automatic if the manifest remains in the same hierarchy as the resources it references. If the manifest is copied elsewhere (e.g. for creating a local version), it can use @base to refer to the resource hierarchy. We can point to directories holding manifest files using @prefix declarations. There, we can reference Virtual resources using prefix:name. Here is an example, where we first give some lines from the initial manifest followed by the definition of the virtual RDFS resource.
@base <http://gollem.science.uva.nl/e-culture/rdf/> .
@prefix base: <base_ontologies/> .

<ec-core-vocabularies>
    a lib:Ontology ;
    a lib:Virtual ;
    dc:title "E-Culture core vocabularies" ;
    owl:imports
        base:rdfs ,
        base:owl ,
        base:dc ,
        base:vra ,
        ...
<rdfs>
    a lib:Schema ;
    a lib:Virtual ;
    rdfs:comment "RDF Schema" ;
    lib:source rdfs: ;
    lib:schema <rdfs.rdfs> .
In this section we provide skeleton code for filling the RDF database from a password protected HTTP repository. The first line loads the application. Next we include modules that enable us to manage the RDF library, RDF database caching and HTTP connections. Then we setup the HTTP authentication, enable caching of processed RDF files and load the initial manifest. Finally load_data/0 loads all our RDF data.
:- use_module(server).

:- use_module(library(http/http_open)).
:- use_module(library(semweb/rdf_library)).
:- use_module(library(semweb/rdf_cache)).

:- http_set_authorization('http://www.example.org/rdf',
                          basic(john, secret)).

:- rdf_set_cache_options([ global_directory('RDF-Cache'),
                           create_global_directory(true)
                         ]).

:- rdf_attach_library('http://www.example.org/rdf/Manifest.ttl').

%%  load_data
%
%   Load our RDF data

load_data :-
    rdf_load_library('all').
The VoID metadata below allows for loading WordNet in the two predefined versions using one of
?- rdf_load_library('wn-basic', []).
?- rdf_load_library('wn-full', []).
@prefix void:  <http://rdfs.org/ns/void#> .
@prefix vann:  <http://purl.org/vocab/vann/> .
@prefix lib:   <http://www.swi-prolog.org/rdf/library/> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix dc:    <http://purl.org/dc/terms/> .
@prefix wn20s: <http://www.w3.org/2006/03/wn/wn20/schema/> .
@prefix wn20i: <http://www.w3.org/2006/03/wn/wn20/instances/> .

[ vann:preferredNamespacePrefix "wn20i" ;
  vann:preferredNamespaceUri "http://www.w3.org/2006/03/wn/wn20/instances/"
] .

[ vann:preferredNamespacePrefix "wn20s" ;
  vann:preferredNamespaceUri "http://www.w3.org/2006/03/wn/wn20/schema/"
] .

<wn20-common>
    a void:Dataset ;
    dc:description "Common files between full and basic version" ;
    lib:source wn20i: ;
    void:dataDump
        <wordnet-attribute.rdf.gz> ,
        <wordnet-causes.rdf.gz> ,
        <wordnet-classifiedby.rdf.gz> ,
        <wordnet-entailment.rdf.gz> ,
        <wordnet-glossary.rdf.gz> ,
        <wordnet-hyponym.rdf.gz> ,
        <wordnet-membermeronym.rdf.gz> ,
        <wordnet-partmeronym.rdf.gz> ,
        <wordnet-sameverbgroupas.rdf.gz> ,
        <wordnet-similarity.rdf.gz> ,
        <wordnet-synset.rdf.gz> ,
        <wordnet-substancemeronym.rdf.gz> ,
        <wordnet-senselabels.rdf.gz> .

<wn20-skos>
    a void:Dataset ;
    void:subset <wnskosmap> ;
    void:dataDump <wnSkosInScheme.ttl.gz> .

<wnskosmap>
    a lib:Schema ;
    lib:source wn20s: ;
    void:dataDump <wnskosmap.rdfs> .

<wnbasic-schema>
    a void:Dataset ;
    lib:source wn20s: ;
    void:dataDump <wnbasic.rdfs> .

<wn20-basic>
    a void:Dataset ;
    a lib:CloudNode ;
    dc:title "Basic WordNet" ;
    dc:description "Light version of W3C WordNet" ;
    owl:versionInfo "2.0" ;
    lib:source wn20i: ;
    void:subset
        <wnbasic-schema> ,
        <wn20-skos> ,
        <wn20-common> .

<wnfull-schema>
    a void:Dataset ;
    lib:source wn20s: ;
    void:dataDump <wnfull.rdfs> .

<wn20-full>
    a void:Dataset ;
    a lib:CloudNode ;
    dc:title "Full WordNet" ;
    dc:description "Full version of W3C WordNet" ;
    owl:versionInfo "2.0" ;
    lib:source wn20i: ;
    void:subset
        <wnfull-schema> ,
        <wn20-skos> ,
        <wn20-common> ;
    void:dataDump
        <wordnet-antonym.rdf.gz> ,
        <wordnet-derivationallyrelated.rdf.gz> ,
        <wordnet-participleof.rdf.gz> ,
        <wordnet-pertainsto.rdf.gz> ,
        <wordnet-seealso.rdf.gz> ,
        <wordnet-wordsensesandwords.rdf.gz> ,
        <wordnet-frame.rdf.gz> .
This module provides a SPARQL client. For example:
?- sparql_query('select * where { ?x rdfs:label "Amsterdam" }', Row,
                [ host('dbpedia.org'), path('/sparql/')
                ]).
Row = row('http://www.ontologyportal.org/WordNet#WN30-108949737') ;
false.
Or, querying a local server using an ASK query:
?- sparql_query('ask { owl:Class rdfs:label "Class" }', Row,
                [ host('localhost'), port(3020), path('/sparql/')
                ]).
Row = true.
HTTPS servers are supported using the scheme(https) option:
?- sparql_query('select * where { ?x rdfs:label "Amsterdam"@nl }', Row,
                [ scheme(https),
                  host('query.wikidata.org'),
                  path('/sparql')
                ]).
sparql_query(+Query, -Result, +Options) executes Query on an HTTP SPARQL endpoint. Result is unified with a term rdf(S,P,O) for CONSTRUCT and DESCRIBE queries, row(...) for SELECT queries and true or false for ASK queries. Variables that are unbound in SPARQL (e.g., due to SPARQL OPTIONAL clauses) are bound in Prolog to the atom '$null$'.
Options include host(Host), port(Port), path(Path), scheme(Scheme) and search(Params), as used in the examples in this section. Remaining options are passed to http_open/3. The defaults for Host, Port and Path can be set using sparql_set_server/1. The initial default for port is 80 and path is /sparql/.
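Because unbound SELECT variables arrive as the atom '$null$', client code often has to turn a row(...) term into a list with a substitute value. The sketch below is a hypothetical helper, row_to_list/3, which is not part of the SPARQL client library; it relies only on the row(...) and '$null$' conventions described above.

```prolog
% Hypothetical helper, not part of library(semweb/sparql_client).
% row_to_list(+Row, +Default, -Values): convert a row(...) term, as
% returned by sparql_query/3 for SELECT queries, into a list of values,
% replacing the atom '$null$' (an unbound SPARQL variable) by Default.
row_to_list(Row, Default, Values) :-
    Row =.. [row|Columns],
    maplist(null_to_default(Default), Columns, Values).

% null_to_default(+Default, +Value, -Result):
% Result is Default if Value is '$null$', else Value itself.
null_to_default(Default, '$null$', Default) :- !.
null_to_default(_, Value, Value).
```

It would typically be called directly after the query, as in sparql_query(Query, Row, Options), row_to_list(Row, none, Values).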
The search option adds extra parameters to the HTTP query. For example, the ClioPatria server understands the parameter entailment. The code below queries for all triples using RDFS entailment:
?- sparql_query('select * where { ?s ?p ?o }', Row,
                [ search([entailment=rdfs])
                ]).
Another useful option is request_header, which may, for example, be used to force a server to reply using a particular document format:
?- sparql_query('select * where { ?s ?p ?o }', Row,
                [ host('integbio.jp'),
                  path('/rdf/sparql'),
                  request_header('Accept' = 'application/sparql-results+xml')
                ]).
The server defaults are set with sparql_set_server/1, for example:
sparql_set_server([ host(localhost), port(8080), path(world) ])
The default for port is 80 and path is /sparql/.
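A minimal sketch of how server defaults combine with sparql_query/3, assuming a SPARQL endpoint such as ClioPatria listens on localhost port 3020 under /sparql/ (as in the ASK example above). The predicates use_local_endpoint/0 and label_of/2 are hypothetical names chosen for illustration.

```prolog
:- use_module(library(semweb/sparql_client)).

% Hypothetical setup predicate. Assumption: an endpoint runs on
% localhost:3020 at path /sparql/; adapt host/port/path to your server.
use_local_endpoint :-
    sparql_set_server([ host(localhost), port(3020), path('/sparql/') ]).

% Hypothetical helper: enumerate the rdfs:label values of Resource
% (an atom holding a full URI). No host, port or path options are
% passed, so sparql_query/3 uses the defaults set above.
label_of(Resource, Label) :-
    format(atom(Query),
           'PREFIX rdfs: <~w> SELECT ?l WHERE { <~w> rdfs:label ?l }',
           ['http://www.w3.org/2000/01/rdf-schema#', Resource]),
    sparql_query(Query, row(Label), []).
```

Calling use_local_endpoint once at startup keeps the endpoint details in one place instead of repeating them in every query.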
For parsed SPARQL result documents, a SELECT result pairs a term v(Name, ...) holding the variable names with Rows, a list of row(....) terms containing the column values in the same order as the variable names. An ASK result is true or false.
This library provides predicates that compare RDF graphs. The current version only provides one predicate: rdf_equal_graphs/3 verifies that two graphs are identical after proper labeling of the blank nodes.
Future versions of this library may contain more advanced operations, such as diffing two graphs.
The arguments of rdf_equal_graphs(GraphA, GraphB, Substitution) are:
| GraphA | is a list of rdf(S,P,O) terms |
| GraphB | is a list of rdf(S,P,O) terms |
| Substitution | is a list of NodeA = NodeB terms |
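A small example of comparing two graphs that differ only in the names of their blank nodes. It assumes blank nodes are written as atoms starting with '_:' (the representation rdf_db uses for blank nodes); graph_a/1, graph_b/1 and same_graphs/1 are hypothetical helpers.

```prolog
:- use_module(library(semweb/rdf_compare)).

% Two made-up graphs; only the blank node labels differ.
graph_a([ rdf('_:x', 'http://example.org/name', literal(alice)),
          rdf('_:x', 'http://example.org/knows', 'http://example.org/bob')
        ]).
graph_b([ rdf('_:g1', 'http://example.org/name', literal(alice)),
          rdf('_:g1', 'http://example.org/knows', 'http://example.org/bob')
        ]).

% same_graphs(-Substitution): succeeds if the graphs are identical
% after relabeling blank nodes; Substitution lists NodeA = NodeB pairs.
same_graphs(Substitution) :-
    graph_a(A),
    graph_b(B),
    rdf_equal_graphs(A, B, Substitution).
```

Graphs whose ground (blank-node-free) triples differ fail immediately, so no substitution is produced for them.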
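This module defines rules for user:portray/1 to help tracing and debugging RDF resources by printing them in a more concise representation and optionally adding a comment from the label field to help the user interpret the URI. Resources can be printed in the following styles:
prefix:id (namespace prefix and local name)
writeq (the full resource, written quoted)
prefix:label (namespace prefix followed by the label)
prefix:id=label (the prefix:id form followed by the label)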
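A sketch of selecting one of these styles, assuming the predicate rdf_portray_as/1 of library(semweb/rdf_portray) accepts the style terms listed above. The demo prefix, the resource and the helper show/1 are made up for illustration.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_portray)).

% Made-up namespace and labeled resource, used only for this example.
:- rdf_register_prefix(demo, 'http://example.org/').
:- rdf_assert('http://example.org/alice',
              'http://www.w3.org/2000/01/rdf-schema#label',
              literal('Alice Smith')).

% Select the prefix:id=label style for portraying resources.
:- rdf_portray_as(prefix:id=label).

% Hypothetical helper: print/1 consults user:portray/1 and therefore
% uses the style selected above.
show(Resource) :-
    print(Resource),
    nl.
```

Switching to writeq restores the full, quoted URI, which is convenient when the output must be copied back into Prolog code.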
The core infrastructure for storing and querying RDF is provided by this package, which is distributed as a core package with SWI-Prolog. ClioPatria provides a comprehensive server infrastructure on top of the semweb and http packages. ClioPatria provides a SPARQL 1.1 endpoint, linked open data (LOD) support, user management, a web interface and an extension infrastructure for programming (semantic) web applications.
Thea provides access to OWL ontologies at the level of the abstract syntax. It can interact with external DL reasoners using DIG.
RDF-DB version 3 is a major redesign of the SWI-Prolog RDF infrastructure. Nevertheless, version 3 is almost perfectly upward compatible with version 2. Below are some issues to take into consideration when upgrading.
Version 2 did not allow for modifications while read operations were in progress, for example due to an open choice point. As a consequence, operations that both queried and modified the database had to be wrapped in a transaction or the modifications had to be buffered as Prolog data structures. In both cases, the RDF store was not modified during the query phase. In version 3, modifications are allowed while read operations are in progress and follow the Prolog logical update view semantics. This is different from using a transaction in version 2, where the view for all read operations was frozen at the start of the transaction. In version 3, every read operation sees the store frozen at the moment that the operation was started.
We illustrate the difference by writing a forward entailment rule that adds a sibling relation. In version 2, we could perform this operation using one of the following:
add_siblings_1 :-
        findall(S-O,
                ( rdf(S, f:parent, P),
                  rdf(O, f:parent, P),
                  S \== O
                ),
                Pairs),
        forall(member(S-O, Pairs),
               rdf_assert(S, f:sibling, O)).

add_siblings_2 :-
        rdf_transaction(
            forall(( rdf(S, f:parent, P),
                     rdf(O, f:parent, P),
                     S \== O
                   ),
                   rdf_assert(S, f:sibling, O))).
In version 3, we can write this in the natural Prolog style below. In itself, this may not seem a big advantage because wrapping such operations in a transaction is often good style anyway. The story changes with more complicated control structures that combine iterations with steps that depend on triples asserted in previous steps. Such scenarios can be programmed naturally in the current version.
add_siblings_3 :-
        forall(( rdf(S, f:parent, P),
                 rdf(O, f:parent, P),
                 S \== O
               ),
               rdf_assert(S, f:sibling, O)).
In version 3, code that combines queries with modification has the same semantics whether executed inside or outside a transaction. This property makes reusing such predicates predictable.
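The logical update view can be observed directly: the sketch below adds the inverse of every triple while the rdf/3 enumeration over those triples is still open. The predicate URI and the names knows/1, add_inverse_knows/0 and demo/1 are hypothetical, chosen only for this illustration.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(aggregate)).

% Made-up predicate URI used only for this demonstration.
knows('http://example.org/knows').

% For every <S, knows, O> assert the inverse <O, knows, S> while the
% rdf/3 enumeration is still open. Under the logical update view the
% enumeration only sees the triples that existed when it started, so
% the newly asserted inverses are not visited by this loop.
add_inverse_knows :-
    knows(P),
    forall(rdf(S, P, O),
           rdf_assert(O, P, S)).

% demo(-Count): start from two triples, add their inverses and count
% the resulting triples for the knows predicate.
demo(Count) :-
    knows(P),
    rdf_retractall(_, P, _),
    rdf_assert('http://example.org/a', P, 'http://example.org/b'),
    rdf_assert('http://example.org/b', P, 'http://example.org/c'),
    add_inverse_knows,
    aggregate_all(count, rdf(_, P, _), Count).
```

As with add_siblings_3, no transaction or intermediate findall/3 is needed to keep the iteration well defined.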
Further changes with respect to version 2:
sources is renamed to graphs
triples_by_file is renamed to triples_by_graph
gc has additional arguments
core is removed

This research was supported by the following projects: MIA and MultimediaN project (www.multimedian.nl) funded through the BSIK programme of the Dutch Government, the FP-6 project HOPS of the European Commission, the COMBINE project supported by the ONR Global NICOP grant N62909-11-1-7060 and the Dutch national program COMMIT.