Literal values are ordered and indexed using a skip list. The aim of this index is threefold.
See library(semweb/litindex).
As string literal matching is most frequently used for searching
purposes, the match is executed case-insensitively and after removal
of diacritics. Case matching and diacritics removal are based on
Unicode character properties and are independent of the current
locale. Case conversion is based on the 'simple uppercase mapping'
defined by Unicode, and diacritic removal on the 'decomposition
type'. The approach is lightweight, but somewhat simpleminded for
some languages. The tables are generated for Unicode characters up to
0x7fff. For more information, please check the source code of the
mapping-table generator unicode_map.pl, available in the sources of
this package.
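The folding described above can be illustrated with a small sketch. It is not the library's implementation: base_code/2 is a hypothetical, hand-written excerpt standing in for the generated tables, and upcase_atom/2 is used only as an approximation of the Unicode simple uppercase mapping.

```prolog
% Hypothetical excerpt of a "base character" table. The real tables
% are generated from Unicode data; these four rows are illustrative.
base_code(0xE9, 0'e).   % e with acute        -> e
base_code(0xC9, 0'E).   % E with acute        -> E
base_code(0xF6, 0'o).   % o with diaeresis    -> o
base_code(0xD6, 0'O).   % O with diaeresis    -> O

% strip_diacritic(+Code0, -Code): map a code to its base character,
% or leave it unchanged if the table has no entry for it.
strip_diacritic(C0, C) :-
    base_code(C0, C),
    !.
strip_diacritic(C, C).

% fold_atom(+In, -Out): remove diacritics, then map to uppercase.
% upcase_atom/2 is an assumption standing in for the simple
% uppercase mapping.
fold_atom(In, Out) :-
    atom_codes(In, Codes0),
    maplist(strip_diacritic, Codes0, Codes),
    atom_codes(Stripped, Codes),
    upcase_atom(Stripped, Out).
```

With these definitions, fold_atom('Caf\u00E9', K) and fold_atom(cafe, K) both bind K to 'CAFE', so the two spellings are treated as equal for matching.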
Currently the total order of literals is first based on the type of literal, using the ordering numeric < string < term. Numeric values (integer and float) are ordered by value; integers precede floats if they represent the same value. Strings are sorted alphabetically after case-mapping and diacritic removal as described above. If two strings compare equal under this mapping, uppercase precedes lowercase and diacritics are ordered by their Unicode value. If they still compare equal, literals without any qualifier precede literals with a type qualifier, which in turn precede literals with a language qualifier. Literals with the same kind of qualifier (both type or both language) are sorted alphabetically by qualifier.
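The ordering rules can be sketched as a sort key compared with the standard order of terms. This is an illustration under several assumptions, not the library's comparison code: literals are assumed to be numbers, atoms, lang(Lang, Text), type(Type, Text) or other terms; upcase_atom/2 stands in for the case mapping; diacritic removal is omitted; and numbers are compared as floats, which loses precision for very large integers. The names literal_key/2, text_literal/4 and sort_literals/2 are hypothetical.

```prolog
:- use_module(library(pairs)).

% literal_key(+Literal, -Key)
% Key is a fixed-arity term k(TypeRank, Primary, Secondary, QualRank, Qual)
% so that standard order on keys follows the order described above.
literal_key(N, k(0, F, Kind, 0, '')) :-           % numeric < string < term
    number(N),
    !,
    F is float(N),                                % order by value
    ( integer(N) -> Kind = 0 ; Kind = 1 ).        % integer before float
literal_key(Lit, k(1, Folded, Text, QRank, Qual)) :-
    text_literal(Lit, Text, QRank, Qual),
    !,
    upcase_atom(Text, Folded).                    % assumption: case mapping only
literal_key(Term, k(2, Term, 0, 0, '')).          % any other term sorts last

% text_literal(+Literal, -Text, -QualifierRank, -Qualifier)
% Rank 0: no qualifier, 1: type qualifier, 2: language qualifier.
text_literal(lang(Lang, Text), Text, 2, Lang) :- atom(Text).
text_literal(type(Type, Text), Text, 1, Type) :- atom(Text).
text_literal(Text, Text, 0, '') :- atom(Text).

% sort_literals(+Literals, -Sorted): sort using the sketch key.
sort_literals(Literals, Sorted) :-
    map_list_to_pairs(literal_key, Literals, Pairs),
    keysort(Pairs, SortedPairs),
    pairs_values(SortedPairs, Sorted).
```

For example, sort_literals([b, lang(en, a), 'A', 1.0, a, 1, type(t, a), foo(x)], S) binds S to [1, 1.0, 'A', a, type(t, a), lang(en, a), b, foo(x)]. The raw text is used as the secondary key, so 'A' sorts before a once both fold to the same primary key.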
The ordered tree is used for indexed execution of
literal(prefix(Prefix), Literal) as well as
literal(like(Like), Literal) if Like does not start with a '*'. Note
that results of queries that use the tree index are returned in
alphabetical order.
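A minimal usage sketch of the two indexed query forms follows. It assumes that library(semweb/rdf_db) is available and that rdf/3 accepts the literal(Query, Value) object form shown above. The example.org URIs, the sample labels and the helper predicates label_pred/1, demo_data/0, labels_with_prefix/2 and labels_like/2 are hypothetical.

```prolog
:- use_module(library(semweb/rdf_db)).

% Placeholder predicate URI, not part of any real vocabulary.
label_pred('http://example.org/label').

% Assert a few sample labels. '\u00C1mbar' starts with a capital A
% with acute accent, written as an escape to avoid encoding issues.
demo_data :-
    label_pred(P),
    forall(member(S-Label,
                  [ 'http://example.org/c1'-'Amsterdam',
                    'http://example.org/c2'-amber,
                    'http://example.org/c3'-'\u00C1mbar',
                    'http://example.org/c4'-'Berlin'
                  ]),
           rdf_assert(S, P, literal(Label))).

% Labels starting with Prefix, ignoring case and diacritics.
labels_with_prefix(Prefix, Labels) :-
    label_pred(P),
    findall(L, rdf(_, P, literal(prefix(Prefix), L)), Labels).

% Labels matching a like-pattern. Per the text above, the ordered
% index applies when Pattern does not start with '*'.
labels_like(Pattern, Labels) :-
    label_pred(P),
    findall(L, rdf(_, P, literal(like(Pattern), L)), Labels).
```

After calling demo_data, a query such as labels_with_prefix(am, Ls) is expected, following the description above, to return the three labels that begin with "am" regardless of case or accent, in index order. A pattern such as 'am*am' in labels_like/2 does not begin with '*' and can therefore use the index as well.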