This library provides high-performance C-based primitives for manipulating URIs. We chose a C implementation because it performs much better on raw character manipulation, and URI handling primitives are used in time-critical parts of RDF processing. The implementation is based on RFC 3986:
http://labs.apache.org/webarch/uri/rfc/rfc3986.html
The URI processing in this library is rather liberal. That is, we split URIs according to the rules, but we do not check that the resulting components are valid. Percent-decoding for IRIs is liberal as well: it first tries UTF-8, then ISO-Latin-1, and finally accepts %-characters verbatim.
Earlier experience has shown that strict enforcement of the URI syntax rejects many URIs that other web-document processing tools accept.
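The liberal splitting can be illustrated with a small sketch. It assumes the library is loaded as `library(uri)` and relies on uri_components/2 and uri_data/3; the helper name `raw_path/2` and the example URI are made up for illustration and are not part of the library.

```prolog
:- use_module(library(uri)).

% raw_path(+URI, -Path): hypothetical helper, not part of the library.
% Splits URI into components and returns the path as found,
% without checking that it only contains legal characters.
raw_path(URI, Path) :-
    uri_components(URI, Components),
    uri_data(path, Components, Path).
```

A query such as `?- raw_path('http://example.org/a b<c>', P).` is expected to succeed and return the path unchanged, even though a space and angle brackets are not legal in a URI path.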
    ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
     12            3  4          5       6  7        8 9

In this expression, groups 2, 4, 5, 7 and 9 correspond to `scheme`, `authority`, `path`, `search` and `fragment`.
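As a rough sketch of how the five components can be inspected, the helper below collects them as Field-Value pairs. It assumes `library(uri)` is loaded and uses uri_components/2 and uri_data/3; the name `uri_fields/2` and the example URI are hypothetical.

```prolog
:- use_module(library(uri)).
:- use_module(library(lists)).

% uri_fields(+URI, -Fields): hypothetical helper, not part of the library.
% Fields is a list of Field-Value pairs for the five components of URI.
% Components that are absent from URI show up as unbound variables.
uri_fields(URI, Fields) :-
    uri_components(URI, Components),
    findall(Field-Value,
            ( member(Field, [scheme, authority, path, search, fragment]),
              uri_data(Field, Components, Value)
            ),
            Fields).
```

For `'http://www.example.com/a/b?q=1#top'` this should yield `http`, `'www.example.com'`, `'/a/b'`, `'q=1'` and `top` for the five fields, in that order.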
uri_is_global(URI) :-
    uri_components(URI, Components),
    uri_data(scheme, Components, Scheme),
    nonvar(Scheme),
    atom_length(Scheme, Len),
    Len > 1.
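A possible use of this test is filtering a list of references, sketched below under the assumption that `library(uri)` exports uri_is_global/1. The helper `global_uris/2` is hypothetical.

```prolog
:- use_module(library(uri)).
:- use_module(library(apply)).

% global_uris(+URIs, -Global): hypothetical helper, not part of the library.
% Global holds the members of URIs for which uri_is_global/1 succeeds.
global_uris(URIs, Global) :-
    include(uri_is_global, URIs, Global).
```

With the input `['http://example.org/', 'foo/bar', '/abs/path']` only the first element should remain. By the definition above, a single-character scheme does not count, because of the `Len > 1` condition.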
uri_normalized(URI, Base, NormalizedGlobalURI) :-
    uri_resolve(URI, Base, GlobalURI),
    uri_normalized(GlobalURI, NormalizedGlobalURI).

uri_normalized_iri(URI, Base, NormalizedGlobalIRI) :-
    uri_resolve(URI, Base, GlobalURI),
    uri_normalized_iri(GlobalURI, NormalizedGlobalIRI).
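The equivalence stated by these definitions can be checked with a short sketch, assuming `library(uri)` provides uri_normalized/2, uri_normalized/3 and uri_resolve/3. The helper `same_normalization/2` and the example arguments are hypothetical.

```prolog
:- use_module(library(uri)).

% same_normalization(+URI, +Base): hypothetical helper, not part of the library.
% True if the three-argument form gives the same result as resolving
% URI against Base first and normalizing the global URI afterwards.
same_normalization(URI, Base) :-
    uri_normalized(URI, Base, Direct),
    uri_resolve(URI, Base, Global),
    uri_normalized(Global, TwoStep),
    Direct == TwoStep.
```

For example, `?- same_normalization('../b', 'http://example.org/a/c/d').` is expected to succeed.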
?- uri_query_components(QS, [a=b, c('d+w'), n-'VU Amsterdam']).
QS = 'a=b&c=d%2Bw&n=VU%20Amsterdam'.

?- uri_query_components('a=b&c=d%2Bw&n=VU%20Amsterdam', Q).
Q = [a=b, c='d+w', n='VU Amsterdam'].
A host written as `[ip]` is handled specially, returning the ip as host, without the enclosing `[]`. When constructing an authority string and the host contains `:`, the host is enclosed in `[]`. If `[]` is not used correctly, the behavior should be considered poorly defined. If there is no balancing `]` or the host part does not end with `]`, these characters are considered normal characters and part of the (invalid) host name.

`user`, `password`, `host` and `port`
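The bracket handling described above can be tried with the sketch below. It assumes the library's uri_authority_components/2 and uri_authority_data/3 are available from `library(uri)` and that the installed version implements the `[ip]` convention; the helper `authority_host/2` is hypothetical.

```prolog
:- use_module(library(uri)).

% authority_host(+Authority, -Host): hypothetical helper, not part of the library.
% Breaks down an authority string and returns its host field.
authority_host(Authority, Host) :-
    uri_authority_components(Authority, Components),
    uri_authority_data(host, Components, Host).
```

Under these assumptions, `authority_host('[::1]:8080', H)` should give the host without the brackets, while `authority_host('www.example.com:80', H)` should give `'www.example.com'`.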
`query_value`, `fragment`, `path` or `segment`. Besides alphanumeric characters, the following characters are passed verbatim (the set is split into logical groups according to RFC 3986).
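A minimal sketch of encoding for one of these component types follows. It assumes the library's uri_encoded/3 with the component name as first argument; the helper `encode_query_value/2` is hypothetical.

```prolog
:- use_module(library(uri)).

% encode_query_value(+Value, -Encoded): hypothetical helper, not part of the library.
% Percent-encodes Value for use as the value part of a query string.
encode_query_value(Value, Encoded) :-
    uri_encoded(query_value, Value, Encoded).
```

Consistent with the query string example in this text, `'VU Amsterdam'` should become `'VU%20Amsterdam'` and `'d+w'` should become `'d%2Bw'`.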
(`http`, `https`, etc.)

`path` component. If Path is not absolute, it is taken relative to the path of URI0.

`Key=Value` pairs of the current search (query) component. New values replace existing values. If KeyValues is written as `=(KeyValues)`, the current search component is ignored. KeyValues is a list whose elements are one of `Key=Value`, `Key-Value` or `Key(Value)`.
Components can be removed by using a variable as value, except for `path`, which can be reset using `path(/)`, and query, which can be dropped using `query(=([]))`.
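The editing of the search component might be exercised as sketched below. This assumes a recent version of the library that provides uri_edit/3 taking the actions first, and that the search action is written `search(KeyValues)`; both are assumptions, and the helper `edited_search/3` is hypothetical.

```prolog
:- use_module(library(uri)).
:- use_module(library(lists)).

% edited_search(+Actions, +URI0, -Pairs): hypothetical helper, not part of the library.
% Applies Actions to URI0 and returns the query of the result
% as a list of Key=Value pairs.
edited_search(Actions, URI0, Pairs) :-
    uri_edit(Actions, URI0, URI),
    uri_components(URI, Components),
    uri_data(search, Components, Search),
    uri_query_components(Search, Pairs).
```

With `edited_search(search([b=two]), 'http://example.org/p?a=one', Pairs)`, the resulting list is expected to contain both `a=one` and `b=two`. Reading the pairs back through uri_query_components/2 avoids depending on the order in which they are written.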