1/* Part of SWI-Prolog 2 3 Author: Jan Wielemaker and Matt Lilley 4 E-mail: J.Wielemaker@cs.vu.nl 5 WWW: http://www.swi-prolog.org 6 Copyright (c) 2012-2019, VU University Amsterdam 7 All rights reserved. 8 9 Redistribution and use in source and binary forms, with or without 10 modification, are permitted provided that the following conditions 11 are met: 12 13 1. Redistributions of source code must retain the above copyright 14 notice, this list of conditions and the following disclaimer. 15 16 2. Redistributions in binary form must reproduce the above copyright 17 notice, this list of conditions and the following disclaimer in 18 the documentation and/or other materials provided with the 19 distribution. 20 21 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 22 "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 23 LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 24 FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 25 COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 26 INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, 27 BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 28 LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 29 CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 30 LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN 31 ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 32 POSSIBILITY OF SUCH DAMAGE. 33*/ 34 35:- module(archive, 36 [ archive_open/3, % +Stream, -Archive, +Options 37 archive_open/4, % +Stream, +Mode, -Archive, +Options 38 archive_create/3, % +OutputFile, +InputFileList, +Options 39 archive_close/1, % +Archive 40 archive_property/2, % +Archive, ?Property 41 archive_next_header/2, % +Archive, -Name 42 archive_open_entry/2, % +Archive, -EntryStream 43 archive_header_property/2, % +Archive, ?Property 44 archive_set_header_property/2, % +Archive, +Property 45 archive_extract/3, % +Archive, +Dir, +Options 46 47 archive_entries/2, % +Archive, -Entries 48 archive_data_stream/3, % +Archive, -DataStream, +Options 49 archive_foldl/4 % :Goal, +Archive, +State0, -State 50 ]). 51:- autoload(library(error), 52 [existence_error/2,domain_error/2,must_be/2]). 53:- autoload(library(filesex), 54 [directory_file_path/3,make_directory_path/1]). 55:- autoload(library(lists),[member/2]). 56:- autoload(library(option),[option/3,option/2]). 57 58:- meta_predicate 59 archive_foldl( , , , ). 60 61/** <module> Access several archive formats 62 63This library uses _libarchive_ to access a variety of archive formats. 64The following example lists the entries in an archive: 65 66 ``` 67 list_archive(File) :- 68 setup_call_cleanup( 69 archive_open(File, Archive, []), 70 ( repeat, 71 ( archive_next_header(Archive, Path) 72 -> format('~w~n', [Path]), 73 fail 74 ; ! 75 ) 76 ), 77 archive_close(Archive)). 78 ``` 79 80Here is an alternative way of doing this, using archive_foldl/4, a 81higher level predicate. 82 83 ``` 84 list_archive2(File) :- 85 list_archive(File, Headers), 86 maplist(writeln, Headers). 87 88 list_archive2(File, Headers) :- 89 archive_foldl(add_header, File, Headers, []). 90 91 add_header(Path, _, [Path|Paths], Paths). 92 ``` 93 94Here is another example which counts the files in the archive and prints 95file type information, also using archive_foldl/4: 96 97 ``` 98 print_entry(Path, Handle, Cnt0, Cnt1) :- 99 archive_header_property(Handle, filetype(Type)), 100 format('File ~w is of type ~w~n', [Path, Type]), 101 Cnt1 is Cnt0 + 1. 102 103 list_archive_headers(File) :- 104 archive_foldl(print_entry, File, 0, FileCount), 105 format('We have ~w files', [FileCount]). 106 ``` 107 108@see https://github.com/libarchive/libarchive/ 109*/ 110 111:- use_foreign_library(foreign(archive4pl)). 112 113%! archive_open(+Data, -Archive, +Options) is det. 114% 115% Wrapper around archive_open/4 that opens the archive in read mode. 116 117archive_open(Stream, Archive, Options) :- 118 archive_open(Stream, read, Archive, Options). 119 120:- predicate_options(archive_open/4, 4, 121 [ close_parent(boolean), 122 filter(oneof([all,bzip2,compress,gzip,grzip,lrzip, 123 lzip,lzma,lzop,none,rpm,uu,xz])), 124 format(oneof([all,'7zip',ar,cab,cpio,empty,gnutar, 125 iso9660,lha,mtree,rar,raw,tar,xar,zip])) 126 ]). 127:- predicate_options(archive_create/3, 3, 128 [ directory(atom), 129 pass_to(archive_open/4, 4) 130 ]). 131 132%! archive_open(+Data, +Mode, -Archive, +Options) is det. 133% 134% Open the archive in Data and unify Archive with a handle to the 135% opened archive. Data is either a file name (as accepted by open/4) 136% or a stream that has been opened with the option type(binary). If 137% Data is an already open stream, the caller is responsible for 138% closing it (but see option close_parent(true)) and must not close 139% the stream until after archive_close/1 is called. Mode is either 140% `read` or `write`. Details are controlled by Options. Typically, 141% the option close_parent(true) is used to also close the Data stream 142% if the archive is closed using archive_close/1. For other options 143% when reading, the defaults are typically fine - for writing, a valid 144% format and optional filters must be specified. The option 145% format(raw) must be used to process compressed streams that do not 146% contain explicit entries (e.g., gzip'ed data) unambibuously. The 147% =raw= format creates a _pseudo archive_ holding a single member 148% named =data=. 149% 150% * close_parent(+Boolean) 151% If this option is =true= (default =false=), Data stream is closed 152% when archive_close/1 is called on Archive. If Data is a file name, 153% the default is =true=. 154% 155% * compression(+Compression) 156% Synomym for filter(Compression). Deprecated. 157% 158% * filter(+Filter) 159% Support the indicated filter. This option may be 160% used multiple times to support multiple filters. In read mode, 161% If no filter options are provided, =all= is assumed. In write 162% mode, =none= is assumed. 163% Supported values are =all=, =bzip2=, =compress=, =gzip=, 164% =grzip=, =lrzip=, =lzip=, =lzma=, =lzop=, =none=, =rpm=, =uu= 165% and =xz=. The value =all= is default for read, =none= for write. 166% 167% * format(+Format) 168% Support the indicated format. This option may be used 169% multiple times to support multiple formats in read mode. 170% In write mode, you must supply a single format. If no format 171% options are provided, =all= is assumed for read mode. Note that 172% =all= does *not* include =raw= and =mtree=. To open both archive 173% and non-archive files, _both_ format(all) and 174% format(raw) and/or format(mtree) must be specified. Supported 175% values are: =all=, =7zip=, =ar=, =cab=, =cpio=, =empty=, =gnutar=, 176% =iso9660=, =lha=, =mtree=, =rar=, =raw=, =tar=, =xar= and =zip=. 177% The value =all= is default for read. 178% 179% Note that the actually supported compression types and formats may 180% vary depending on the version and installation options of the 181% underlying libarchive library. This predicate raises a domain or 182% permission error if the (explicitly) requested format or filter is 183% not supported. 184% 185% @error domain_error(filter, Filter) if the requested 186% filter is invalid (e.g., `all` for writing). 187% @error domain_error(format, Format) if the requested 188% format type is not supported. 189% @error permission_error(set, filter, Filter) if the requested 190% filter is not supported. 191 192archive_open(stream(Stream), Mode, Archive, Options) :- 193 !, 194 archive_open_stream(Stream, Mode, Archive, Options). 195archive_open(Stream, Mode, Archive, Options) :- 196 is_stream(Stream), 197 !, 198 archive_open_stream(Stream, Mode, Archive, Options). 199archive_open(File, Mode, Archive, Options) :- 200 open(File, Mode, Stream, [type(binary)]), 201 catch(archive_open_stream(Stream, Mode, Archive, [close_parent(true)|Options]), 202 E, (close(Stream, [force(true)]), throw(E))). 203 204 205%! archive_close(+Archive) is det. 206% 207% Close the archive. If close_parent(true) was specified in 208% archive_open/4, the underlying entry stream is closed too. If there 209% is an entry opened with archive_open_entry/2, actually closing the 210% archive is delayed until the stream associated with the entry is 211% closed. This can be used to open a stream to an archive entry 212% without having to worry about closing the archive: 213% 214% ``` 215% archive_open_named(ArchiveFile, EntryName, Stream) :- 216% archive_open(ArchiveFile, Archive, []), 217% archive_next_header(Archive, EntryName), 218% archive_open_entry(Archive, Stream), 219% archive_close(Archive). 220% ``` 221 222 223%! archive_property(+Handle, ?Property) is nondet. 224% 225% True when Property is a property of the archive Handle. Defined 226% properties are: 227% 228% * filters(List) 229% True when the indicated filters are applied before reaching 230% the archive format. 231 232archive_property(Handle, Property) :- 233 defined_archive_property(Property), 234 Property =.. [Name,Value], 235 archive_property(Handle, Name, Value). 236 237defined_archive_property(filter(_)). 238 239 240%! archive_next_header(+Handle, -Name) is semidet. 241% 242% Forward to the next entry of the archive for which Name unifies 243% with the pathname of the entry. Fails silently if the end of 244% the archive is reached before success. Name is typically 245% specified if a single entry must be accessed and unbound 246% otherwise. The following example opens a Prolog stream to a 247% given archive entry. Note that _Stream_ must be closed using 248% close/1 and the archive must be closed using archive_close/1 249% after the data has been used. See also setup_call_cleanup/3. 250% 251% ``` 252% open_archive_entry(ArchiveFile, EntryName, Stream) :- 253% open(ArchiveFile, read, In, [type(binary)]), 254% archive_open(In, Archive, [close_parent(true)]), 255% archive_next_header(Archive, EntryName), 256% archive_open_entry(Archive, Stream). 257% ``` 258% 259% @error permission_error(next_header, archive, Handle) if a 260% previously opened entry is not closed. 261 262%! archive_open_entry(+Archive, -Stream) is det. 263% 264% Open the current entry as a stream. Stream must be closed. 265% If the stream is not closed before the next call to 266% archive_next_header/2, a permission error is raised. 267 268 269%! archive_set_header_property(+Archive, +Property) 270% 271% Set Property of the current header. Write-mode only. Defined 272% properties are: 273% 274% * filetype(-Type) 275% Type is one of =file=, =link=, =socket=, =character_device=, 276% =block_device=, =directory= or =fifo=. It appears that this 277% library can also return other values. These are returned as 278% an integer. 279% * mtime(-Time) 280% True when entry was last modified at time. 281% * size(-Bytes) 282% True when entry is Bytes long. 283% * link_target(-Target) 284% Target for a link. Currently only supported for symbolic 285% links. 286 287%! archive_header_property(+Archive, ?Property) 288% 289% True when Property is a property of the current header. Defined 290% properties are: 291% 292% * filetype(-Type) 293% Type is one of =file=, =link=, =socket=, =character_device=, 294% =block_device=, =directory= or =fifo=. It appears that this 295% library can also return other values. These are returned as 296% an integer. 297% * mtime(-Time) 298% True when entry was last modified at time. 299% * size(-Bytes) 300% True when entry is Bytes long. 301% * link_target(-Target) 302% Target for a link. Currently only supported for symbolic 303% links. 304% * format(-Format) 305% Provides the name of the archive format applicable to the 306% current entry. The returned value is the lowercase version 307% of the output of archive_format_name(). 308% * permissions(-Integer) 309% True when entry has the indicated permission mask. 310 311archive_header_property(Archive, Property) :- 312 ( nonvar(Property) 313 -> true 314 ; header_property(Property) 315 ), 316 archive_header_prop_(Archive, Property). 317 318header_property(filetype(_)). 319header_property(mtime(_)). 320header_property(size(_)). 321header_property(link_target(_)). 322header_property(format(_)). 323header_property(permissions(_)). 324 325 326%! archive_extract(+ArchiveFile, +Dir, +Options) 327% 328% Extract files from the given archive into Dir. Supported 329% options: 330% 331% * remove_prefix(+Prefix) 332% Strip Prefix from all entries before extracting. If Prefix 333% is a list, then each prefix is tried in order, succeding at 334% the first one that matches. If no prefixes match, an error 335% is reported. If Prefix is an atom, then that prefix is removed. 336% * exclude(+ListOfPatterns) 337% Ignore members that match one of the given patterns. 338% Patterns are handed to wildcard_match/2. 339% * include(+ListOfPatterns) 340% Include members that match one of the given patterns. 341% Patterns are handed to wildcard_match/2. The `exclude` 342% options takes preference if a member matches both the `include` 343% and the `exclude` option. 344% 345% @error existence_error(directory, Dir) if Dir does not exist 346% or is not a directory. 347% @error domain_error(path_prefix(Prefix), Path) if a path in 348% the archive does not start with Prefix 349% @tbd Add options 350 351archive_extract(Archive, Dir, Options) :- 352 ( exists_directory(Dir) 353 -> true 354 ; existence_error(directory, Dir) 355 ), 356 setup_call_cleanup( 357 archive_open(Archive, Handle, Options), 358 extract(Handle, Dir, Options), 359 archive_close(Handle)). 360 361extract(Archive, Dir, Options) :- 362 archive_next_header(Archive, Path), 363 !, 364 option(include(InclPatterns), Options, ['*']), 365 option(exclude(ExclPatterns), Options, []), 366 ( archive_header_property(Archive, filetype(file)), 367 \+ matches(ExclPatterns, Path), 368 matches(InclPatterns, Path) 369 -> archive_header_property(Archive, permissions(Perm)), 370 remove_prefix(Options, Path, ExtractPath), 371 directory_file_path(Dir, ExtractPath, Target), 372 file_directory_name(Target, FileDir), 373 make_directory_path(FileDir), 374 setup_call_cleanup( 375 archive_open_entry(Archive, In), 376 setup_call_cleanup( 377 open(Target, write, Out, [type(binary)]), 378 copy_stream_data(In, Out), 379 close(Out)), 380 close(In)), 381 set_permissions(Perm, Target) 382 ; true 383 ), 384 extract(Archive, Dir, Options). 385extract(_, _, _). 386 387%! matches(+Patterns, +Path) is semidet. 388% 389% True when Path matches a pattern in Patterns. 390 391matches([], _Path) :- 392 !, 393 fail. 394matches(Patterns, Path) :- 395 split_string(Path, "/", "/", Parts), 396 member(Segment, Parts), 397 Segment \== "", 398 member(Pattern, Patterns), 399 wildcard_match(Pattern, Segment), 400 !. 401 402remove_prefix(Options, Path, ExtractPath) :- 403 ( option(remove_prefix(Remove), Options) 404 -> ( is_list(Remove) 405 -> ( member(P, Remove), 406 atom_concat(P, ExtractPath, Path) 407 -> true 408 ; domain_error(path_prefix(Remove), Path) 409 ) 410 ; ( atom_concat(Remove, ExtractPath, Path) 411 -> true 412 ; domain_error(path_prefix(Remove), Path) 413 ) 414 ) 415 ; ExtractPath = Path 416 ). 417 418%! set_permissions(+Perm:integer, +Target:atom) 419% 420% Restore the permissions. Currently only restores the executable 421% permission. 422 423set_permissions(Perm, Target) :- 424 Perm /\ 0o100 =\= 0, 425 !, 426 '$mark_executable'(Target). 427set_permissions(_, _). 428 429 430 /******************************* 431 * HIGH LEVEL PREDICATES * 432 *******************************/ 433 434%! archive_entries(+Archive, -Paths) is det. 435% 436% True when Paths is a list of pathnames appearing in Archive. 437 438archive_entries(Archive, Paths) :- 439 setup_call_cleanup( 440 archive_open(Archive, Handle, []), 441 contents(Handle, Paths), 442 archive_close(Handle)). 443 444contents(Handle, [Path|T]) :- 445 archive_next_header(Handle, Path), 446 !, 447 contents(Handle, T). 448contents(_, []). 449 450%! archive_data_stream(+Archive, -DataStream, +Options) is nondet. 451% 452% True when DataStream is a stream to a data object inside 453% Archive. This predicate transparently unpacks data inside 454% _possibly nested_ archives, e.g., a _tar_ file inside a _zip_ 455% file. It applies the appropriate decompression filters and thus 456% ensures that Prolog reads the plain data from DataStream. 457% DataStream must be closed after the content has been processed. 458% Backtracking opens the next member of the (nested) archive. This 459% predicate processes the following options: 460% 461% - meta_data(-Data:list(dict)) 462% If provided, Data is unified with a list of filters applied to 463% the (nested) archive to open the current DataStream. The first 464% element describes the outermost archive. Each Data dict 465% contains the header properties (archive_header_property/2) as 466% well as the keys: 467% 468% - filters(Filters:list(atom)) 469% Filter list as obtained from archive_property/2 470% - name(Atom) 471% Name of the entry. 472% 473% Non-archive files are handled as pseudo-archives that hold a 474% single stream. This is implemented by using archive_open/3 with 475% the options `[format(all),format(raw)]`. 476 477archive_data_stream(Archive, DataStream, Options) :- 478 option(meta_data(MetaData), Options, _), 479 archive_content(Archive, DataStream, MetaData, []). 480 481archive_content(Archive, Entry, [EntryMetadata|PipeMetadataTail], PipeMetadata2) :- 482 archive_property(Archive, filter(Filters)), 483 repeat, 484 ( archive_next_header(Archive, EntryName) 485 -> findall(EntryProperty, 486 archive_header_property(Archive, EntryProperty), 487 EntryProperties), 488 dict_create(EntryMetadata, archive_meta_data, 489 [ filters(Filters), 490 name(EntryName) 491 | EntryProperties 492 ]), 493 ( EntryMetadata.filetype == file 494 -> archive_open_entry(Archive, Entry0), 495 ( EntryName == data, 496 EntryMetadata.format == raw 497 -> % This is the last entry in this nested branch. 498 % We therefore close the choicepoint created by repeat/0. 499 % Not closing this choicepoint would cause 500 % archive_next_header/2 to throw an exception. 501 !, 502 PipeMetadataTail = PipeMetadata2, 503 Entry = Entry0 504 ; PipeMetadataTail = PipeMetadata1, 505 open_substream(Entry0, 506 Entry, 507 PipeMetadata1, 508 PipeMetadata2) 509 ) 510 ; fail 511 ) 512 ; !, 513 fail 514 ). 515 516open_substream(In, Entry, ArchiveMetadata, PipeTailMetadata) :- 517 setup_call_cleanup( 518 archive_open(stream(In), 519 Archive, 520 [ close_parent(true), 521 format(all), 522 format(raw) 523 ]), 524 archive_content(Archive, Entry, ArchiveMetadata, PipeTailMetadata), 525 archive_close(Archive)). 526 527 528%! archive_create(+OutputFile, +InputFiles, +Options) is det. 529% 530% Convenience predicate to create an archive in OutputFile with 531% data from a list of InputFiles and the given Options. 532% 533% Besides options supported by archive_open/4, the following 534% options are supported: 535% 536% * directory(+Directory) 537% Changes the directory before adding input files. If this is 538% specified, paths of input files must be relative to 539% Directory and archived files will not have Directory 540% as leading path. This is to simulate =|-C|= option of 541% the =tar= program. 542% 543% * format(+Format) 544% Write mode supports the following formats: `7zip`, `cpio`, 545% `gnutar`, `iso9660`, `xar` and `zip`. Note that a particular 546% installation may support only a subset of these, depending on 547% the configuration of `libarchive`. 548 549archive_create(OutputFile, InputFiles, Options) :- 550 must_be(list(text), InputFiles), 551 option(directory(BaseDir), Options, '.'), 552 setup_call_cleanup( 553 archive_open(OutputFile, write, Archive, Options), 554 archive_create_1(Archive, BaseDir, BaseDir, InputFiles, top), 555 archive_close(Archive)). 556 557archive_create_1(_, _, _, [], _) :- !. 558archive_create_1(Archive, Base, Current, ['.'|Files], sub) :- 559 !, 560 archive_create_1(Archive, Base, Current, Files, sub). 561archive_create_1(Archive, Base, Current, ['..'|Files], Where) :- 562 !, 563 archive_create_1(Archive, Base, Current, Files, Where). 564archive_create_1(Archive, Base, Current, [File|Files], Where) :- 565 directory_file_path(Current, File, Filename), 566 archive_create_2(Archive, Base, Filename), 567 archive_create_1(Archive, Base, Current, Files, Where). 568 569archive_create_2(Archive, Base, Directory) :- 570 exists_directory(Directory), 571 !, 572 entry_name(Base, Directory, Directory0), 573 archive_next_header(Archive, Directory0), 574 time_file(Directory, Time), 575 archive_set_header_property(Archive, mtime(Time)), 576 archive_set_header_property(Archive, filetype(directory)), 577 archive_open_entry(Archive, EntryStream), 578 close(EntryStream), 579 directory_files(Directory, Files), 580 archive_create_1(Archive, Base, Directory, Files, sub). 581archive_create_2(Archive, Base, Filename) :- 582 entry_name(Base, Filename, Filename0), 583 archive_next_header(Archive, Filename0), 584 size_file(Filename, Size), 585 time_file(Filename, Time), 586 archive_set_header_property(Archive, size(Size)), 587 archive_set_header_property(Archive, mtime(Time)), 588 setup_call_cleanup( 589 archive_open_entry(Archive, EntryStream), 590 setup_call_cleanup( 591 open(Filename, read, DataStream, [type(binary)]), 592 copy_stream_data(DataStream, EntryStream), 593 close(DataStream)), 594 close(EntryStream)). 595 596entry_name('.', Name, Name) :- !. 597entry_name(Base, Name, EntryName) :- 598 directory_file_path(Base, EntryName, Name). 599 600%! archive_foldl(:Goal, +Archive, +State0, -State). 601% 602% Operates like foldl/4 but for the entries in the archive. For each 603% member of the archive, Goal called as `call(:Goal, +Path, +Handle, 604% +S0, -S1). Here, `S0` is current state of the _accumulator_ 605% (starting with State0) and `S1` is the next state of the 606% accumulator, producing State after the last member of the archive. 607% 608% @see archive_header_property/2, archive_open/4. 609% 610% @arg Archive File name or stream to be given to archive_open/[3,4]. 611 612archive_foldl(Goal, Archive, State0, State) :- 613 setup_call_cleanup( 614 archive_open(Archive, Handle, [close_parent(true)]), 615 archive_foldl_(Goal, Handle, State0, State), 616 archive_close(Handle) 617 ). 618 619archive_foldl_(Goal, Handle, State0, State) :- 620 ( archive_next_header(Handle, Path) 621 -> call(Goal, Path, Handle, State0, State1), 622 archive_foldl_(Goal, Handle, State1, State) 623 ; State = State0 624 ). 625 626 627 /******************************* 628 * MESSAGES * 629 *******************************/ 630 631:- multifile prolog:error_message//1. 632 633prologerror_message(archive_error(Code, Message)) --> 634 [ 'Archive error (code ~p): ~w'-[Code, Message] ]