The central module of the RDF infrastructure is library(semweb/rdf_db)
.
It provides storage and indexed querying of RDF triples. RDF data is
stored as quintuples. The first three elements denote the RDF triple.
The extra Graph and Line elements provide information
about the origin of the triple.
The actual storage is provided by the foreign language (C)
module. Using a dedicated C-based implementation we can reduce memory
usage and improve indexing capabilities, for example by providing a
dedicated index to support entailment over rdfs:subPropertyOf
.
Currently the following indexes are provided (S=subject, P=predicate,
O=object, G=graph):
- S, P, O, SP, PO, SPO, G, SG, PG
- Predicates connected by rdfs:subPropertyOf are combined in a predicate
cloud. The system causes multiple predicates in the cloud to share
the same hash. The cloud maintains a 2-dimensional array that expresses
the closure of all
rdfs:subPropertyOf
relations. This index
supports rdf_has/3 to query a
property and all its children efficiently.
- Additional indexes for predicates, resources and graphs allow
enumerating these objects without duplicates. For example, using rdf_resource/1
we enumerate all resources in the database only once, while enumeration
using e.g.,
(rdf(R,_,_);rdf(_,_,R))
normally produces many
duplicate answers.
- Literal Objects are combined in a skip list after case
normalization. This provides for efficient case-insensitive search,
prefix and range search. The plugin library
library(semweb/litindex)
provides indexed search on tokens inside literals.
- [nondet]rdf(?Subject,
?Predicate, ?Object)
- Elementary query for triples. Subject and Predicate
are atoms representing the fully qualified URL of the resource. Object
is either an atom representing a resource or
literal(Value)
if the object is a literal value. If a value of the form
NameSpaceID:LocalName is provided it is expanded to a ground atom using expand_goal/2.
This implies you can use this construct in compiled code without paying
a performance penalty. Literal values take one of the following forms:
- Atom
- If the value is a simple atom it is the textual representation of a
string literal without explicit type or language qualifier.
- lang(LangID, Atom)
- Atom represents the text of a string literal qualified with
the given language.
- type(TypeID, Value)
- Used for attributes qualified using the
rdf:datatype
TypeID. The Value is either the textual
representation or a natural Prolog representation. See the option
convert_typed_literal(:Convertor) of the parser. The storage layer
provides efficient handling of atoms, integers (64-bit) and floats
(native C-doubles). All other data is represented as a Prolog record.
For literal querying purposes, Object can be of the form
literal(+Query, -Value)
, where Query is one of the terms
below. If the Query takes a literal argument and the value has a numeric
type numerical comparison is performed.
- plain(+Text)
- Perform exact match and demand the language or type qualifiers to match.
This query is fully indexed.
- icase(+Text)
- Perform a full but case-insensitive match. This query is fully indexed.
- exact(+Text)
- Same as
icase(Text)
. Backward compatibility.
- substring(+Text)
- Match any literal that contains Text as a case-insensitive
substring. The query is not indexed on Object.
- word(+Text)
- Match any literal that contains Text delimited by a non
alpha-numeric character, the start or end of the string. The query is
not indexed on Object.
- prefix(+Text)
- Match any literal that starts with Text. This call is
intended for completion. The query is indexed using the skip list of
literals.
- ge(+Literal)
- Match any literal that is equal or larger than Literal in the
ordered set of literals.
- gt(+Literal)
- Match any literal that is larger than Literal in the ordered
set of literals.
- eq(+Literal)
- Match any literal that is equal to Literal in the ordered set
of literals.
- le(+Literal)
- Match any literal that is equal or smaller than Literal in
the ordered set of literals.
- lt(+Literal)
- Match any literal that is smaller than Literal in the ordered
set of literals.
- between(+Literal1, +Literal2)
- Match any literal that is between Literal1 and Literal2
in the ordered set of literals. This may include both Literal1
and
Literal2.
- like(+Pattern)
- Match any literal that matches Pattern case insensitively,
where the‘*' character in Pattern matches zero or more
characters.
Backtracking never returns duplicate triples. Duplicates can be
retrieved using rdf/4. The predicate rdf/3
raises a type-error if called with improper arguments. If rdf/3
is called with a term literal(_)
as Subject or Predicate
object it fails silently. This allows for graph matching goals like
rdf(S,P,O)
,rdf(O,P2,O2)
to proceed without
errors.
- [nondet]rdf(?Subject,
?Predicate, ?Object, ?Source)
- As rdf/3 but in addition query the
graph to which the triple belongs. Unlike rdf/3,
this predicate does not remove duplicates from the result set.
Source | is a term Graph:Line. If Source
is instatiated, passing an atom is the same as passing Atom:_. |
- [nondet]rdf_has(?Subject,
+Predicate, ?Object)
- Succeeds if the triple
rdf(Subject, Predicate, Object)
is
true exploiting the rdfs:subPropertyOf predicate as well as inverse
predicates declared using rdf_set_predicate/2
with the
inverse_of
property.
- [nondet]rdf_has(?Subject,
+Predicate, ?Object, -RealPredicate)
- Same as rdf_has/3, but RealPredicate
is unified to the actual predicate that makes this relation true. RealPredicate
must be
Predicate or an rdfs:subPropertyOf Predicate. If
an inverse match is found, RealPredicate is the term
inverse_of(Pred)
.
- [nondet]rdf_reachable(?Subject,
+Predicate, ?Object)
- Is true if Object can be reached from Subject
following the transitive predicate Predicate or a
sub-property thereof, while repecting the
symetric(true)
or inverse_of(P2)
properties.
If used with either Subject or Object unbound,
it first returns the origin, followed by the reachable nodes in
breadth-first search-order. The implementation internally looks one
solution ahead and succeeds deterministically on the last solution. This
predicate never generates the same node twice and is robust against
cycles in the transitive relation.
With all arguments instantiated, it succeeds deterministically if a
path can be found from Subject to Object.
Searching starts at Subject, assuming the branching factor is
normally lower. A call with both Subject and Object
unbound raises an instantiation error. The following example generates
all subclasses of rdfs:Resource:
?- rdf_reachable(X, rdfs:subClassOf, rdfs:'Resource').
X = 'http://www.w3.org/2000/01/rdf-schema#Resource' ;
X = 'http://www.w3.org/2000/01/rdf-schema#Class' ;
X = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#Property' ;
...
- [nondet]rdf_reachable(?Subject,
+Predicate, ?Object, +MaxD, -D)
- Same as rdf_reachable/3, but
in addition, MaxD limits the number of edges expanded and D
is unified with the‘distance' between
Subject and Object. Distance 0 means Subject
and Object are the same resource. MaxD can be the
constant
infinite
to impose no distance-limit.
The predicates below enumerate the basic objects of the RDF store.
Most of these predicates also enumerate objects that are not associated
to any currently visible triple. Objects are retained as long as they
are visible in active queries or snapshots. After that, some are
reclaimed by the RDF garbage collector, while others are never
reclaimed.
- [nondet]rdf_subject(?Resource)
- True if Resource appears as a subject. This query respects
the visibility rules implied by the logical update view.
- See also
- rdf_resource/1.
- [nondet]rdf_resource(?Resource)
- True when Resource is a resource used as a subject or object
in a triple.
This predicate is primarily intended as a way to process all
resources without processing resources twice. The user must be aware
that some of the returned resources may not appear in any
visible triple.
- [nondet]rdf_current_predicate(?Predicate)
- True when Predicate is a currently known predicate.
Predicates are created if a triples is created that uses this predicate
or a property of the predicate is set using rdf_set_predicate/2.
The predicate may (no longer) have triples associated with it.
Note that resources that have rdf:type
rdf:Property
are not automatically included in the result-set of this predicate,
while all resources that appear as the second argument of a
triple are included.
- See also
- rdf_predicate_property/2.
- [nondet]rdf_current_literal(-Literal)
- True when Literal is a currently known literal. Enumerates
each unique literal exactly once. Note that it is possible that the
literal only appears in already deleted triples. Deleted triples may be
locked due to active queries, transactions or snapshots or may not yet
be reclaimed by the garbage collector.
- [nondet]rdf_graph(?Graph)
- True when Graph is an existing graph.
- [nondet]rdf_current_ns(:Prefix,
?URI)
-
- deprecated
- Use rdf_current_prefix/2.
The predicates below modify the RDF store directly. In addition, data
may be loaded using rdf_load/2 or
by restoring a persistent database using rdf_attach_db/2.
Modifications follow the Prolog logical update view semantics,
which implies that modifications remain invisible to already running
queries. Further isolation can be achieved using
rdf_transaction/3.
- [det]rdf_assert(+Subject,
+Predicate, +Object)
- Assert a new triple into the database. This is equivalent to
rdf_assert/4 using Graph
user
. Subject
and Predicate are resources. Object is either a
resource or a term literal(Value)
. See rdf/3
for an explanation of Value for typed and language qualified literals.
All arguments are subject to name-space expansion. Complete duplicates
(including the same graph and‘line' and with a compatible‘lifespan')
are not added to the database.
- [det]rdf_assert(+Subject,
+Predicate, +Object, +Graph)
- As rdf_assert/3, adding the
predicate to the indicated named graph.
Graph | is either the name of a graph (an
atom) or a term
Graph:Line, where Line is an integer that denotes a line
number. |
- [det]rdf_retractall(?Subject,
?Predicate, ?Object)
- Remove all matching triples from the database. As
rdf_retractall/4 using an
unbound graph.
- [det]rdf_retractall(?Subject,
?Predicate, ?Object, ?Graph)
- As rdf_retractall/3, also
matching Graph. This is particulary useful to remove all
triples coming from a loaded file. See also
rdf_unload/1.
- [det]rdf_update(+Subject,
+Predicate, +Object, ++Action)
- [det]rdf_update(+Subject,
+Predicate, +Object, +Graph, ++Action)
- Replaces one of the three (four) fields on the matching triples
depending on Action:
- subject(Resource)
- Changes the first field of the triple.
- predicate(Resource)
- Changes the second field of the triple.
- object(Object)
- Changes the last field of the triple to the given resource or
literal(Value)
.
- graph(Graph)
- Moves the triple from its current named graph to Graph. This
only works with rdf_update/5
and throws an error when used with rdf_update/4.
The update semantics of the RDF database follows the conventional
Prolog
logical update view. In addition, the RDF database supports
transactions and snapshots.
- [semidet]rdf_transaction(:Goal)
- Same as
rdf_transaction(Goal, user, [])
. See rdf_transaction/3.
- [semidet]rdf_transaction(:Goal,
+Id)
- Same as
rdf_transaction(Goal, Id, [])
. See rdf_transaction/3.
- [semidet]rdf_transaction(:Goal,
+Id, +Options)
- Run Goal in an RDF transaction. Compared to the ACID model,
RDF transactions have the following properties:
- Modifications inside the transactions become all atomically visible
to the outside world if Goal succeeds or remain invisible if Goal
fails or throws an exception. I.e., the atomicy property is fully
supported.
- Consistency is not guaranteed. Later versions may implement
consistency constraints that will be checked serialized just before the
actual commit of a transaction.
- Concurrently executing transactions do not infuence each other.
I.e., the isolation property is fully supported.
- Durability can be activated by loading
library(semweb/rdf_persistency)
.
Processed options are:
- snapshot(+Snapshot)
- Execute Goal using the state of the RDF store as stored in
Snapshot. See rdf_snapshot/1. Snapshot
can also be the atom
true
, which implies that an anonymous
snapshot is created at the current state of the store. Modifications due
to executing Goal are only visible to Goal.
- [det]rdf_snapshot(-Snapshot)
- Take a snapshot of the current state of the RDF store. Later, goals may
be executed in the context of the database at this moment using rdf_transaction/3
with the
snapshot
option. A snapshot created outside a
transaction exists until it is deleted. Snapshots taken inside a
transaction can only be used inside this transaction.
- [det]rdf_delete_snapshot(+Snapshot)
- Delete a snapshot as obtained from rdf_snapshot/1.
After this call, resources used for maintaining the snapshot become
subject to garbage collection.
- [nondet]rdf_active_transaction(?Id)
- True if Id is the identifier of a transaction in the context
of which this call is executed. If Id is not instantiated,
backtracking yields transaction identifiers starting with the innermost
nested transaction. Transaction identifier terms are not copied, need
not be ground and can be instantiated during the transaction.
- [nondet]rdf_current_snapshot(?Term)
- True when Term is a currently known snapshot.
- bug
- Enumeration of snapshots is slow.
- [semidet]rdf_is_resource(@Term)
- True if Term is an RDF resource. Note that this is merely a
type-test; it does not mean this resource is involved in any triple.
Blank nodes are also considered resources.
- See also
- rdf_is_bnode/1
- rdf_is_bnode(+Id)
- Tests if a resource is a blank node (i.e. is an anonymous resource). A
blank node is represented as an atom that starts with
_:
.
For backward compatibility reason, __
is also considered to
be a blank node.
- See also
- rdf_bnode/1.
- [semidet]rdf_is_literal(@Term)
- True if Term is an RDF literal object. Currently only checks
for groundness and the literal functor.
The RDF library can read and write triples in RDF/XML and a
proprietary binary format. There is a plugin interface defined to
support additional formats. The library(semweb/turtle)
uses
this plugin API to support loading Turtle files using rdf_load/2.
- [det]rdf_load(+FileOrList)
- Same as
rdf_load(FileOrList, [])
. See rdf_load/2.
- [det]rdf_load(+FileOrList,
:Options)
- Load RDF data. Options provides additional processing
options. Defined options are:
- blank_nodes(+ShareMode)
- How to handle equivalent blank nodes. If
share
(default),
equivalent blank nodes are shared in the same resource.
- base_uri(+URI)
- URI that is used for rdf:about="" and other RDF constructs
that are relative to the base uri. Default is the source URL.
- concurrent(+Jobs)
- If FileOrList is a list of files, process the input files
using Jobs threads concurrently. Default is the mininum of
the number of cores and the number of inputs. Higher values can be
useful when loading inputs from (slow) network connections. Using 1
(one) does not use separate worker threads.
- format(+Format)
- Specify the source format explicitly. Normally this is deduced from the
filename extension or the mime-type. The core library understands the
formats xml (RDF/XML) and triples (internal quick load and cache
format). Plugins, such as
library(semweb/turtle)
extend the
set of recognised extensions.
- graph(?Graph)
- Named graph in which to load the data. It is not allowed to load
two sources into the same named graph. If Graph is unbound,
it is unified to the graph into which the data is loaded. The default
graph is a
file://
URL when loading a file or, if the
specification is a URL, its normalized version without the optional #fragment.
- if(Condition)
- When to load the file. One of
true
, changed
(default) or
not_loaded
.
- modified(-Modified)
- Unify Modified with one of
not_modified
, cached(File)
,
last_modified(Stamp)
or unknown
.
- cache(Bool)
- If
false
, do not use or create a cache file.
- register_namespaces(Bool)
- If
true
(default false
), register xmlns
namespace declarations or Turtle @prefix
prefixes using
rdf_register_prefix/3
if there is no conflict.
- silent(+Bool)
- If
true
, the message reporting completion is printed using
level silent
. Otherwise the level is informational
.
See also print_message/2.
- prefixes(-Prefixes)
- Returns the prefixes defined in the source data file as a list of pairs.
- multifile
Boolean+
- Indicate that the addressed graph may be populated with triples from
multiple sources. This disables caching and avoids that an rdf_load/2
call affecting the specified graph cleans the graph.
Other options are forwarded to process_rdf/3.
By default,
rdf_load/2 only loads RDF/XML
from files. It can be extended to load data from other formats and
locations using plugins. The full set of plugins relevant to support
different formats and locations is below:
:- use_module(library(semweb/turtle)). % Turtle and TriG
:- use_module(library(semweb/rdf_ntriples)).
:- use_module(library(semweb/rdf_zlib_plugin)).
:- use_module(library(semweb/rdf_http_plugin)).
:- use_module(library(http/http_ssl_plugin)).
- See also
- rdf_db:rdf_open_hook/3,
library(semweb/rdf_persistency)
and
library(semweb/rdf_cache)
- [det]rdf_unload(+Source)
- Identify the graph loaded from Source and use rdf_unload_graph/1
to erase this graph.
- deprecated
- For compatibility, this predicate also accepts a graph name instead of a
source specification. Please update your code to use
rdf_unload_graph/1.
- [det]rdf_save(+Out)
- Same as
rdf_save(Out, [])
. See rdf_save/2
for details.
- [det]rdf_save(+Out,
:Options)
- Write RDF data as RDF/XML. Options is a list of one or more
of the following options:
- graph(+Graph)
- Save only triples associated to the given named Graph.
- anon(Bool)
- If
false
(default true
) do not save blank
nodes that do not appear (indirectly) as object of a named resource.
- base_uri(URI)
- BaseURI used. If present, all URIs that can be represented relative to
this base are written using their shorthand. See also
write_xml_base
option.
- convert_typed_literal(:Convertor)
- Call Convertor(-Type, -Content, +RDFObject), providing the
opposite for the convert_typed_literal option of the RDF parser.
- document_language(+Lang)
- Initial
xml:lang
saved with rdf:RDF element.
- encoding(Encoding)
- Encoding for the output. Either utf8 or iso_latin_1.
- inline(+Bool)
- If
true
(default false
), inline resources when
encountered for the first time. Normally, only bnodes are handled this
way.
- namespaces(+List)
- Explicitly specify saved namespace declarations. See
rdf_save_header/2 option
namespaces for details.
- sorted(+Boolean)
- If
true
(default false
), emit subjects sorted
on the full URI. Useful to make file comparison easier.
- write_xml_base(Bool)
- If
false
, do not include the xml:base
declaration that is written normally when using the
base_uri
option.
- xml_attributes(+Bool)
- If
false
(default true
), never use xml
attributes to save plain literal attributes, i.e., always used an XML
element as in <name>Joe</name>
.
Out | Location to save the data. This can also
be a file-url (file://path ) or a stream wrapped in a term stream(Out) . |
- See also
- rdf_save_db/1
- rdf_make
- Reload all loaded files that have been modified since the last time they
were loaded.
Partial save
Sometimes it is necessary to make more arbitrary selections of
material to be saved or exchange RDF descriptions over an open network
link. The predicates in this section provide for this. Character
encoding issues are derived from the encoding of the Stream,
providing support for
utf8
, iso_latin_1
and ascii
.
- Save XML document header, doctype and open the RDF environment. This
predicate also sets up the namespace notation.
Save an RDF header, with the XML header, DOCTYPE, ENTITY and opening
the rdf:RDF element with appropriate namespace declarations. It uses the
primitives from section 3.5 to generate the required namespaces and
desired short-name. Options is one of:
- graph(+URI)
- Only search for namespaces used in triples that belong to the given
named graph.
- namespaces(+List)
- Where List is a list of namespace abbreviations. With this
option, the expensive search for all namespaces that may be used by your
data is omitted. The namespaces
rdf
and rdfs
are added to the provided List. If a namespace is not
declared, the resource is emitted in non-abreviated form.
- [det]
- Finish XML generation and write the document footer.
- See also
- rdf_save_header/2, rdf_save_subject/3.
- [det]rdf_save_subject(+Out,
+Subject:resource, +Options)
- Save the triples associated to Subject to Out. Options:
- graph(+Graph)
- Only save properties from Graph.
- base_uri(+URI)
- convert_typed_literal(:Goal)
- document_language(+XMLLang)
- See also
- rdf_save/2 for a description of
these options.
Fast loading and saving
Loading and saving RDF format is relatively slow. For this reason we
designed a binary format that is more compact, avoids the complications
of the RDF parser and avoids repetitive lookup of (URL) identifiers.
Especially the speed improvement of about 25 times is worth-while when
loading large databases. These predicates are used for caching by
rdf_load/2 under certain
conditions as well as for maintaining persistent snapshots of the
database using
library(semweb/rdf_persistency)
.
- [det]rdf_save_db(+File)
- [det]rdf_save_db(+File,
+Graph)
- Save triples into File in a quick-to-load binary format. If Graph
is supplied only triples flagged to originate from that database are
added. Files created this way can be loaded using
rdf_load_db/1.
- [det]rdf_load_db(+File)
- Load triples from a file created using rdf_save_db/2.
Many RDF stores turned triples into quadruples. This store is no
exception, initially using the 4th argument to store the filename from
which the triple was loaded. Currently, the 4th argument is the RDF
named graph. A named graph maintains some properties, notably to
track origin, changes and modified state.
- [det]rdf_create_graph(+Graph)
- Create an RDF graph without triples. Succeeds silently if the graph
already exists.
- [det]rdf_unload_graph(+Graph)
- Remove Graph from the RDF store. Succeeds silently if the
named graph does not exist.
- [nondet]rdf_graph_property(?Graph,
?Property)
- True when Property is a property of Graph. Defined
properties are:
- hash(Hash)
- Hash is the (MD5-)hash for the content of Graph.
- modified(Boolean)
- True if the graph is modified since it was loaded or
rdf_set_graph/2 was called
with
modified(false)
.
- source(Source)
- The graph is loaded from the Source (a URL)
- source_last_modified(?Time)
- Time is the last-modified timestamp of Source at the moment
the graph was loaded from Source.
- triples(Count)
- True when Count is the number of triples in Graph.
Additional graph properties can be added by defining rules for the
multifile predicate property_of_graph/2.
Currently, the following extensions are defined:
library(semweb/rdf_persistency)
- persistent(Boolean)
- Boolean is
true
if the graph is persistent.
- [det]rdf_set_graph(+Graph,
+Property)
- Set properties of Graph. Defined properties are:
- modified(false)
- Set the modified state of Graph to false.
Literal values are ordered and indexed using a skip list. The
aim of this index is threefold.
- Unlike hash-tables, binary trees allow for efficient
prefix and range matching. Prefix matching is useful in
interactive applications to provide feedback while typing such as
auto-completion.
- Having a table of unique literals we generate creation and
destruction events (see rdf_monitor/2).
These events can be used to maintain additional indexing on literals,
such as‘by word'. See
library(semweb/litindex)
.
As string literal matching is most frequently used for searching
purposes, the match is executed case-insensitive and after removal of
diacritics. Case matching and diacritics removal is based on Unicode
character properties and independent from the current locale. Case
conversion is based on the‘simple uppercase mapping' defined by
Unicode and diacritic removal on the‘decomposition type'. The
approach is lightweight, but somewhat simpleminded for some languages.
The tables are generated for Unicode characters upto 0x7fff. For more
information, please check the source-code of the mapping-table generator
unicode_map.pl
available in the sources of this package.
Currently the total order of literals is first based on the type of
literal using the ordering numeric < string <
term Numeric values (integer and float) are ordered by value,
integers preceed floats if they represent the same value. Strings are
sorted alphabetically after case-mapping and diacritic removal as
described above. If they match equal, uppercase preceeds lowercase and
diacritics are ordered on their unicode value. If they still compare
equal literals without any qualifier preceeds literals with a type
qualifier which preceeds literals with a language qualifier. Same
qualifiers (both type or both language) are sorted alphabetically.
The ordered tree is used for indexed execution of
literal(prefix(Prefix), Literal)
as well as literal(like(Like), Literal)
if Like does not start with a‘*'. Note that results of
queries that use the tree index are returned in alphabetical order.
The predicates below form an experimental interface to provide more
reasoning inside the kernel of the rdb_db engine. Note that symetric
,
inverse_of
and transitive
are not yet
supported by the rest of the engine. Also note that there is no relation
to defined RDF properties. Properties that have no triples are not
reported by this predicate, while predicates that are involved in
triples do not need to be defined as an instance of rdf:Property.
- [det]rdf_set_predicate(+Predicate,
+Property)
- Define a property of the predicate. This predicate currently supports
the following properties:
- symmetric(+Boolean)
- Set/unset the predicate as being symmetric. Using
symmetric(true)
is the same as inverse_of(Predicate)
,
i.e., creating a predicate that is the inverse of itself.
- transitive(+Boolean)
- Sets the transitive property.
- inverse_of(+Predicate2)
- Define Predicate as the inverse of Predicate2. An
inverse relation is deleted using
inverse_of([])
.
The transitive
property is currently not used. The symmetric
and inverse_of
properties are considered by rdf_has/3,4
and
rdf_reachable/3.
- To be done
- Maintain these properties based on OWL triples.
- rdf_predicate_property(?Predicate,
?Property)
- Query properties of a defined predicate. Currently defined properties
are given below.
- symmetric(Bool)
- True if the predicate is defined to be symetric. I.e., {A} P
{B} implies {B} P {A}. Setting symmetric is equivalent to
inverse_of(Self)
.
- inverse_of(Inverse)
- True if this predicate is the inverse of Inverse. This
property is used by rdf_has/3, rdf_has/4, rdf_reachable/3
and
rdf_reachable/5.
- transitive(Bool)
- True if this predicate is transitive. This predicate is currently not
used. It might be used to make rdf_has/3
imply
rdf_reachable/3 for
transitive predicates.
- triples(Triples)
- Unify Triples with the number of existing triples using this
predicate as second argument. Reporting the number of triples is
intended to support query optimization.
- rdf_subject_branch_factor(-Float)
- Unify Float with the average number of triples associated
with each unique value for the subject-side of this relation. If there
are no triples the value 0.0 is returned. This value is cached with the
predicate and recomputed only after substantial changes to the triple
set associated to this relation. This property is intended for path
optimalisation when solving conjunctions of rdf/3
goals.
- rdf_object_branch_factor(-Float)
- Unify Float with the average number of triples associated
with each unique value for the object-side of this relation. In addition
to the comments with the
rdf_subject_branch_factor
property, uniqueness of the object value is computed from the hash key
rather than the actual values.
- rdfs_subject_branch_factor(-Float)
- Same as
rdf_subject_branch_factor
, but also considering
triples of‘subPropertyOf' this relation. See also rdf_has/3.
- rdfs_object_branch_factor(-Float)
- Same as
rdf_object_branch_factor
, but also considering
triples of‘subPropertyOf' this relation. See also rdf_has/3.
- See also
- rdf_set_predicate/2.
Prolog code often contains references to constant resources with a
known
prefix (also known as XML namespaces). For example,
http://www.w3.org/2000/01/rdf-schema#Class
refers to the
most general notion of an RDFS class. Readability and maintability
concerns require for abstraction here. The RDF database maintains a
table of known prefixes. This table can be queried using rdf_current_ns/2
and can be extended using rdf_register_ns/3.
The prefix database is used to expand prefix:local
terms
that appear as arguments to calls which are known to accept a resource.
This expansion is achieved by Prolog preprocessor using expand_goal/2.
- [nondet]rdf_current_prefix(:Alias,
?URI)
- Query predefined prefixes and prefixes defined with
rdf_register_prefix/2
and local prefixes defined with
rdf_prefix/2. If Alias is
unbound and one URI is the prefix of another, the longest is
returned first. This allows turning a resource into a prefix/local
couple using the simple enumeration below. See rdf_global_id/2.
rdf_current_prefix(Prefix, Expansion),
atom_concat(Expansion, Local, URI),
- [det]rdf_register_prefix(+Prefix,
+URI)
- [det]rdf_register_prefix(+Prefix,
+URI, +Options)
- Register Prefix as an abbreviation for URI. Options:
- force(Boolean)
- If
true
, replace existing namespace alias. Please note that
replacing a namespace is dangerous as namespaces affect preprocessing.
Make sure all code that depends on a namespace is compiled after
changing the registration.
- keep(Boolean)
- If
true
and Alias is already defined, keep the original
binding for Prefix and succeed silently.
Without options, an attempt to redefine an alias raises a permission
error.
Predefined prefixes are:
Explicit expansion is achieved using the predicates below. The
predicate rdf_equal/2 performs
this expansion at compile time, while the other predicates do it at
runtime.
- rdf_equal(?Resource1,
?Resource2)
- Simple equality test to exploit goal-expansion.
- [semidet]rdf_global_id(?IRISpec,
:IRI)
- Convert between Prefix:Local and full IRI (an atom). If IRISpec
is an atom, it is simply unified with IRI. This predicate
fails silently if IRI is an RDF literal.
Note that this predicate is a meta-predicate on its output argument.
This is necessary to get the module context while the first argument may
be of the form (:)/2. The above mode description is correct, but should
be interpreted as (?,?).
- Errors
existence_error(rdf_prefix, Prefix)
- See also
- - rdf_equal/2 provides a compile
time alternative
- The rdf_meta/1 directive asks for
compile time expansion of arguments.
- bug
- Error handling is incomplete. In its current implementation the same
code is used for compile-time expansion and to facilitate runtime
conversion and checking. These use cases have different requirements.
- [semidet]rdf_global_object(+Object,
:GlobalObject)
- [semidet]rdf_global_object(-Object,
:GlobalObject)
- Same as rdf_global_id/2, but
intended for dealing with the object part of a triple, in particular the
type for typed literals. Note that the predicate is a meta-predicate on
the output argument. This is necessary to get the module context while
the first argument may be of the form (:)/2.
- Errors
existence_error(rdf_prefix, Prefix)
- [det]rdf_global_term(+TermIn,
:GlobalTerm)
- Performs rdf_global_id/2 on
predixed IRIs and rdf_global_object/2
on RDF literals, by recursively analysing the term. Note that the
predicate is a meta-predicate on the output argument. This is necessary
to get the module context while the first argument may be of the form
(:)/2.
Terms of the form Prefix:Local
that appear in TermIn
for which
Prefix is not defined are not replaced. Unlike rdf_global_id/2
and
rdf_global_object/2, no
error is raised.
Namespace handling for custom predicates
If we implement a new predicate based on one of the predicates of the
semweb libraries that expands namespaces, namespace expansion is not
automatically available to it. Consider the following code computing the
number of distinct objects for a certain property on a certain object.
cardinality(S, P, C) :-
( setof(O, rdf_has(S, P, O), Os)
-> length(Os, C)
; C = 0
).
Now assume we want to write labels/2
that returns the number of distict labels of a resource:
labels(S, C) :-
cardinality(S, rdfs:label, C).
This code will not work because rdfs:label
is not
expanded at compile time. To make this work, we need to add an rdf_meta/1
declaration.
:- rdf_meta
cardinality(r,r,-).
The example below defines the rule concept/1.
:- use_module(library(semweb/rdf_db)). % for rdf_meta
:- use_module(library(semweb/rdfs)). % for rdfs_individual_of
:- rdf_meta
concept(r).
%% concept(?C) is nondet.
%
% True if C is a concept.
concept(C) :-
rdfs_individual_of(C, skos:'Concept').
In addition to expanding calls, rdf_meta/1
also causes expansion of
clause heads for predicates that match a declaration. This is
typically used write Prolog statements about resources. The following
example produces three clauses with expanded (single-atom) arguments:
:- use_module(library(semweb/rdf_db)).
:- rdf_meta
label_predicate(r).
label_predicate(rdfs:label).
label_predicate(skos:prefLabel).
label_predicate(skos:altLabel).
This section describes the remaining predicates of the
library(semweb/rdf_db)
module.
- rdf_bnode(-Id)
- Generate a unique anonymous identifier for a subject.
- [nondet]rdf_source_location(+Subject,
-Location)
- True when triples for Subject are loaded from Location.
Location | is a term File:Line. |
- [det]rdf_generation(-Generation)
- True when Generation is the current generation of the
database. Each modification to the database increments the generation.
It can be used to check the validity of cached results deduced from the
database. Committing a non-empty transaction increments the generation
by one.
When inside a transaction, Generation is unified to a term
TransactionStartGen + InsideTransactionGen. E.g., 4+3
means that the transaction was started at generation 4 of the global
database and we have created 3 new generations inside the transaction.
Note that this choice of representation allows for comparing generations
using Prolog arithmetic. Comparing a generation in one transaction with
a generation in another transaction is meaningless.
- rdf_estimate_complexity(?Subject,
?Predicate, ?Object, -Complexity)
- Return the number of alternatives as indicated by the database internal
hashed indexing. This is a rough measure for the number of alternatives
we can expect for an rdf_has/3
call using the given three arguments. When called with three variables,
the total number of triples is returned. This estimate is used in query
optimisation. See also rdf_predicate_property/2
and
rdf_statistics/1 for
additional information to help optimizers.
- [nondet]rdf_statistics(?KeyValue)
- Obtain statistics on the RDF database. Defined statistics are:
- graphs(-Count)
- Number of named graphs.
- triples(-Count)
- Total number of triples in the database. This is the number of asserted
triples minus the number of retracted ones. The number of visible
triples in a particular context may be different due to visibility rules
defined by the logical update view and transaction isolation.
- resources(-Count)
- Number of resources that appear as subject or object in a triple. See rdf_resource/1.
- properties(-Count)
- Number of current predicates. See rdf_current_predicate/1.
- literals(-Count)
- Number of current literals. See rdf_current_literal/1.
- gc(GCCount, ReclaimedTriples,
ReindexedTriples, Time)
- Information about the garbage collector.
- searched_nodes(-Count)
- Number of nodes expanded by rdf_reachable/3
and
rdf_reachable/5.
- lookup(rdf(S,P,O,G), Count)
- Number of queries that have been performed for this particular
instantiation pattern. Each of S,P,O,G
is either + or -. Fails in case the number of performed queries is zero.
- hash_quality(rdf(S,P,O,G), Buckets, Quality,
PendingResize)
- Statistics on the index for this pattern. Indices are created lazily on
the first relevant query.
- triples_by_graph(Graph, Count)
- This statistics is produced for each named graph. See
triples
for the interpretation of this value.
- [semidet]rdf_match_label(+How,
+Pattern, +Label)
- True if Label matches Pattern according to How. How
is one of
icase
, substring
, word
, prefix
or like
. For backward compatibility, exact
is
a synonym for icase
.
- [semidet]lang_matches(+Lang,
+Pattern)
- True if Lang matches Pattern. This implements XML
language matching conform RFC 4647. Both Lang and Pattern
are dash-separated strings of identifiers or (for Pattern)
the wildcard *. Identifiers are matched case-insensitive and a * matches
any number of identifiers. A short pattern is the same as *.
- [semidet]lang_equal(+Lang1,
+Lang2)
- True if two RFC language specifiers denote the same language
- See also
- lang_matches/2.
- rdf_reset_db
- Remove all triples from the RDF database and reset all its statistics.
- bug
- This predicate checks for active queries, but this check is not properly
synchronized and therefore the use of this predicate is unsafe in
multi-threaded contexts. It is mainly used to run functionality tests
that need to start with an empty database.
- [det]rdf_version(-Version)
- True when Version is the numerical version-id of this
library. The version is computed as
Major*10000 + Minor*100 + Patch.
Storing RDF triples in main memory provides much better performance
than using external databases. Unfortunately, although memory is fairly
cheap these days, main memory is severely limited when compared to
disks. Memory usage breaks down to the following categories. Rough
estimates of the memory usage is given for 64-bit systems. 32-bit
system use slightly more than half these amounts.
- Actually storing the triples. A triple is stored in a C struct of
144 bytes. This struct both holds the quintuple, some bookkeeping
information and the 10 next-pointers for the (max) to hash tables.
- The bucket array for the hashes. Each bucket maintains a
head, and tail pointer, as well as a count for the number
of entries. The bucket array is allocated if a particular index is
created, which implies the first query that requires the index. Each
bucket requires 24 bytes.
Bucket arrays are resized if necessary. Old triples remain at their
original location. This implies that a query may need to scan multiple
buckets. The garbage collector may relocate old indexed triples. It does
so by copying the old triple. The old triple is later reclaimed by GC.
Reindexed triples will be reused, but many reindexed triples may result
in a significant memory fragmentation.
- Resources are maintained in a separate table to support
rdf_resource/1. A resources
requires approximately 32 bytes.
- Identical literals are shared (see rdf_current_literal/1)
and stored in a skip list. A literal requires approximately 40
bytes, excluding the atom used for the lexical representation.
- Resources are stored in the Prolog atom-table. Atoms with the
average length of a resource require approximately 88 bytes.
The hash parameters can be controlled with rdf_set/1.
Applications that are tight on memory and for which the query
characteristics are more or less known can optimize performance and
memory by fixing the hash-tables. By fixing the hash-tables we can
tailor them to the frequent query patterns, we avoid the need for to
check multiple hash buckets (see above) and we avoid memory
fragmentation due to optimizing triples for resized hashes.
set_hash_parameters :-
rdf_set(hash(s, size, 1048576)),
rdf_set(hash(p, size, 1024)),
rdf_set(hash(sp, size, 2097152)),
rdf_set(hash(o, size, 1048576)),
rdf_set(hash(po, size, 2097152)),
rdf_set(hash(spo, size, 2097152)),
rdf_set(hash(g, size, 1024)),
rdf_set(hash(sg, size, 1048576)),
rdf_set(hash(pg, size, 2048)).
- [det]rdf_set(+Term)
- Set properties of the RDF store. Currently defines:
- hash(+Hash, +Parameter, +Value)
- Set properties for a triple index. Hash is one of
s
,
p
, sp
, o
, po
, spo
, g
, sg
or pg
. Parameter is one of:
- size
- Value defines the number of entries in the hash-table.
Value is rounded down to a power of 2. After setting
the size explicitly, auto-sizing for this table is disabled. Setting the
size smaller than the current size results in a
permission_error
exception.
- average_chain_len
- Set maximum average collision number for the hash.
- optimize_threshold
- Related to resizing hash-tables. If 0, all triples are moved to the new
size by the garbage collector. If more then zero, those of the last Value
resize steps remain at their current location. Leaving cells at their
current location reduces memory fragmentation and slows down access.
The garbage collector
The RDF store has a garbage collector that runs in a separate thread
named =__rdf_GC=. The garbage collector removes the following objects:
- Triples that have died before the the generation of last still
active query.
- Entailment matrices for
rdfs:subPropertyOf
relations
that are related to old queries.
In addition, the garbage collector reindexes triples associated to
the hash-tables before the table was resized. The most recent resize
operation leads to the largest number of triples that require
reindexing, while the oldest resize operation causes the largest
slowdown. The parameter optimize_threshold
controlled by rdf_set/1
can be used to determine the number of most recent resize operations for
which triples will not be reindexed. The default is 2.
Normally, the garbage collector does it job in the background at a
low priority. The predicate rdf_gc/0
can be used to reclaim all garbage and optimize all indexes.Warming
up the database
The RDF store performs many operations lazily or in background
threads. For maximum performance, perform the following steps:
- Load all the data without doing queries or retracting data in
between. This avoids creating the indexes and therefore the need to
resize them.
- Perform each of the indexed queries. The following call performs
this. Note that it is irrelevant whether or not the query succeeds.
warm_indexes :-
ignore(rdf(s, _, _)),
ignore(rdf(_, p, _)),
ignore(rdf(_, _, o)),
ignore(rdf(s, p, _)),
ignore(rdf(_, p, o)),
ignore(rdf(s, p, o)),
ignore(rdf(_, _, _, g)),
ignore(rdf(s, _, _, g)),
ignore(rdf(_, p, _, g)).
- Duplicate adminstration is initialized in the background after the
first call that returns a significant amount of duplicates. Creating the
adminstration can be forced by calling rdf_update_duplicates/0.
Predicates:
- [det]rdf_gc
- Run the RDF-DB garbage collector until no garbage is left and all tables
are fully optimized. Under normal operation a separate thread with
identifier
__rdf_GC
performs garbage collection as long as
it is considered‘useful'.
Using rdf_gc/0 should only be
needed to ensure a fully clean database for analysis purposes such as
leak detection.
- [det]rdf_update_duplicates
- Update the duplicate administration of the RDF store. This marks every
triple that is potentionally a duplicate of another as duplicate. Being
potentially a duplicate means that subject, predicate and object are
equivalent and the life-times of the two triples overlap.
The duplicates marks are used to reduce the administrative load of
avoiding duplicate answers. Normally, the duplicates are marked using a
background thread that is started on the first query that produces a
substantial amount of duplicates.