Archives play two roles: they combine multiple documents into a single one and they typically provide compression and sometimes encryption or other services. Bundling multiple resources into a single archive may greatly simplify distribution and guarantee that the individual resources are consistent. SWI-Prolog provides archiving using its (rather arcane) saved-state format. See resource/3 and open_resource/3. It also provides compression by means of library(zlib).
External archives may be accessed through the process interface provided by process_create/3, but this has disadvantages. The one that motivated this library is that external processes provide no decent platform-independent access to archives. Zip files most likely come closest to platform-independent access, but there are many different programs for accessing zip files that provide slightly different sets of options, and the existence of any of these programs cannot be guaranteed without distributing our own bundled version. Similar arguments hold for Unix tar archives: just about any Unix-derived system has a tar program, but except for very basic commands the command line options are not compatible, and tar is not part of Windows. The only format guaranteed on Windows is .cab, but a program to create .cab files is not part of Windows and the .cab format is rare outside the Windows context.
Leaving aside the availability of archive programs, each archive program comes with its own set of command line options and its own features and limitations. Fortunately, libarchive provides a consistent interface to a wealth of compression and archiving formats. library(archive) wraps libarchive, providing access to archives using Prolog streams both for the archive as a whole and for the archive entries. For example, archives may be read from Prolog streams and each member in turn may be processed using Prolog streams without materialising data in temporary files.
This library uses libarchive to access a variety of archive formats. The following example lists the entries in an archive:
list_archive(File) :-
    setup_call_cleanup(
        archive_open(File, Archive, []),
        (   repeat,
            (   archive_next_header(Archive, Path)
            ->  format('~w~n', [Path]),
                fail
            ;   !
            )
        ),
        archive_close(Archive)).
Here is an alternative way of doing this, using archive_foldl/4, a higher level predicate.
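list_archive2(File) :-
    list_archive2(File, Headers),
    maplist(writeln, Headers).

% Collect the entry names as a difference list: the state starts as
% Headers and ends as [], each step prepending one Path.
list_archive2(File, Headers) :-
    archive_foldl(add_header, File, Headers, []).

add_header(Path, _, [Path|Paths], Paths).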
Here is another example which counts the files in the archive and prints file type information, also using archive_foldl/4:
print_entry(Path, Handle, Cnt0, Cnt1) :-
    archive_header_property(Handle, filetype(Type)),
    format('File ~w is of type ~w~n', [Path, Type]),
    Cnt1 is Cnt0 + 1.

list_archive_headers(File) :-
    archive_foldl(print_entry, File, 0, FileCount),
    format('We have ~w files', [FileCount]).
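archive_open/4 opens the archive in Data and unifies Archive with a handle to the opened archive. Data is either a file name or a stream that has been opened with the option type(binary). If Data is an already open stream, the caller is responsible for closing it (but see option close_parent(true)) and must not close the stream until after archive_close/1 is called. Mode is either read or write. Details are controlled by Options.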
Typically, the option close_parent(true) is used to also close the Data stream if the archive is closed using archive_close/1. For other options, the defaults are typically fine when reading; for writing, a valid format and optional filters must be specified. The option format(raw) must be used to process compressed streams that do not contain explicit entries (e.g., gzip'ed data) unambiguously. The raw format creates a pseudo archive holding a single member named data.
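The use of format(raw) described above can be sketched as follows. This is a minimal illustration, not part of the library: the predicate name read_compressed/2 is hypothetical, and it assumes that the installed libarchive supports the filter used to compress File (for example gzip) and that library(readutil) is available for read_stream_to_codes/2.

```prolog
:- use_module(library(archive)).
:- use_module(library(readutil)).

%!  read_compressed(+File, -Codes)
%
%   Hypothetical helper: read the decompressed content of a file
%   that holds a single compressed stream (e.g., gzip'ed data).
read_compressed(File, Codes) :-
    setup_call_cleanup(
        archive_open(File, read, Archive, [format(raw)]),
        (   % The raw format exposes one pseudo member.
            archive_next_header(Archive, _Name),
            setup_call_cleanup(
                archive_open_entry(Archive, Stream),
                read_stream_to_codes(Stream, Codes),
                close(Stream))
        ),
        archive_close(Archive)).
```

Both the entry stream and the archive handle are released through setup_call_cleanup/3, so nothing stays open if reading raises an exception.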
close_parent(Bool): If true (default false), the Data stream is closed when archive_close/1 is called on Archive. If Data is a file name, the default is true.

compression(Compression): Synonym for filter(Compression). Deprecated.

filter(Filter): Support the indicated filter. In read mode, if no filter is specified, all is assumed. In write mode, none is assumed. Supported values are all, bzip2, compress, gzip, grzip, lrzip, lzip, lzma, lzop, none, rpm, uu and xz.

format(Format): Support the indicated format. If no format is specified, all is assumed for read mode. Note that all does not include raw and mtree. To open both archive and non-archive files, both format(all) and format(raw) and/or format(mtree) must be specified. Supported values are: all, 7zip, ar, cab, cpio, empty, gnutar, iso9660, lha, mtree, rar, raw, tar, xar and zip.

Note that the actually supported compression types and formats may vary depending on the version and installation options of the underlying libarchive library. This predicate raises a domain or permission error if the (explicitly) requested format or filter is not supported.
Errors: domain_error(filter, Filter) if the requested filter is invalid (e.g., all for writing); domain_error(format, Format) if the requested format type is not supported; permission_error(set, filter, Filter) if the requested filter is not supported.

archive_close/1 closes the archive. If close_parent(true) was specified in archive_open/4, the underlying entry stream is closed too. If there is an entry opened with archive_open_entry/2, actually closing the archive is delayed until the stream associated with the entry is closed. This can be used to open a stream to an archive entry without having to worry about closing the archive:
archive_open_named(ArchiveFile, EntryName, Stream) :-
    archive_open(ArchiveFile, Archive, []),
    archive_next_header(Archive, EntryName),
    archive_open_entry(Archive, Stream),
    archive_close(Archive).
open_archive_entry(ArchiveFile, EntryName, Stream) :-
    open(ArchiveFile, read, In, [type(binary)]),
    archive_open(In, Archive, [close_parent(true)]),
    archive_next_header(Archive, EntryName),
    archive_open_entry(Archive, Stream).
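When the entry content is only needed once, the open and close steps can be wrapped so that neither the entry stream nor the archive handle leaks. The following is a sketch under assumptions: archive_entry_string/3 is a hypothetical name, and the built-in read_string/3 is used to read the whole entry.

```prolog
:- use_module(library(archive)).

%!  archive_entry_string(+ArchiveFile, +EntryName, -String)
%
%   Hypothetical helper: read the entry EntryName of ArchiveFile
%   into String. Fails if the archive has no such entry.
archive_entry_string(ArchiveFile, EntryName, String) :-
    setup_call_cleanup(
        archive_open(ArchiveFile, read, Archive, []),
        (   % Skip forward to the entry whose name unifies with EntryName.
            archive_next_header(Archive, EntryName),
            setup_call_cleanup(
                archive_open_entry(Archive, Stream),
                read_string(Stream, _Length, String),
                close(Stream))
        ),
        archive_close(Archive)).
```

Unlike the examples above, this version does not hand a stream back to the caller, so the archive can be closed immediately after reading.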
Errors: permission_error(next_header, archive, Handle) if a previously opened entry is not closed.

filetype(Type): Type is one of file, link, socket, character_device, block_device, directory or fifo. It appears that this library can also return other values. These are returned as an integer.

... archive_format_name().

The exclude option takes precedence if a member matches both the include and the exclude option.

Errors: existence_error(directory, Dir) if Dir does not exist or is not a directory; domain_error(path_prefix(Prefix), Path) if a path in the archive does not start with Prefix.

Non-archive files are handled as pseudo-archives that hold a single stream. This is implemented by using archive_open/3 with the options [format(all),format(raw)].
Besides options supported by archive_open/4, the following options are supported:
... -C option of the tar program.

... cpio, gnutar, iso9660, xar and zip. Note that a particular installation may support only a subset of these, depending on the configuration of libarchive.

Archive | File name or stream to be given to archive_open/[3,4]. |
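As a hedged illustration of writing an archive, the sketch below assumes that the installed version of library(archive) provides archive_create/3 and archive_entries/2, that the directory(Dir) option behaves like the -C option of tar, and that libarchive was built with zip write support. The predicate name make_zip/3 is hypothetical.

```prolog
:- use_module(library(archive)).

%!  make_zip(+Dir, +Files, +ZipFile)
%
%   Hypothetical helper: create ZipFile from Files, where each
%   file name is given relative to the directory Dir.
make_zip(Dir, Files, ZipFile) :-
    archive_create(ZipFile, Files,
                   [ format(zip),       % write mode needs one explicit format
                     directory(Dir)     % assumed: resolve Files relative to Dir
                   ]).
```

The result can be checked by listing the entry names, for example with archive_entries(ZipFile, Paths).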
The current version is merely a proof-of-concept. It lacks writing archives and does not support many of the options of the underlying library. The main motivation for starting this library was to achieve portability of the upcoming SWI-Prolog package distribution system. Other functionality will be added on an 'as needed' basis.