Perform S3 operations on ZIP archive contents

Overview

MinIO implements an S3 extension to list, stat, and download files inside a ZIP file stored in any bucket. A typical use case is a large number of small files archived into multiple ZIP files: uploading the archives is faster than uploading each small file individually, and your S3 applications can still access the data with little performance overhead.

The main limitation is that, to update or delete a file inside a ZIP file, the entire ZIP file must be replaced.

How to enable S3 ZIP behavior

Set the x-minio-extract header to true in your S3 requests.
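
With boto3, one way to attach the header to every request is an event handler registered on the client. The following is a minimal sketch, assuming boto3 is installed; the endpoint and credentials are placeholders you supply, and add_extract_header and make_zip_client are illustrative names rather than part of MinIO or boto3.

```python
def add_extract_header(request, **kwargs):
    """botocore event handler: add the MinIO ZIP extension header."""
    request.headers.add_header("x-minio-extract", "true")


def make_zip_client(endpoint_url, access_key, secret_key):
    """Return a boto3 S3 client that sends x-minio-extract on every request."""
    import boto3  # third-party; imported lazily so the handler itself has no dependency

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    # "before-sign" handlers run before the request is signed, so the
    # extra header is already present when the signature is computed.
    s3.meta.events.register_first("before-sign.s3.*", add_extract_header)
    return s3
```

Registering the handler once on the client avoids having to remember the header on each individual call.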

How to access files inside a ZIP archive

Contents inside an archive can be accessed using the regular S3 API with a modified request path: append the path of the content inside the archive to the path of the archive itself.

For example, to download 2021/taxes.csv archived in financial.zip and stored in a bucket named company-data, issue a GET request using the path `company-data/financial.zip/2021/taxes.csv`.
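
The path rule above can be sketched in Python as follows. The helper names zip_member_key and download_member are hypothetical, and s3 is assumed to be a boto3 S3 client that is already configured to send the x-minio-extract header.

```python
def zip_member_key(archive_key, member_path):
    """Build the object key for a file inside a ZIP archive.

    The key is the archive's own key followed by the member's path
    inside the archive, joined with a single slash.
    """
    return f"{archive_key.rstrip('/')}/{member_path.lstrip('/')}"


def download_member(s3, bucket, archive_key, member_path):
    """Fetch one archived file with a regular GetObject call."""
    response = s3.get_object(
        Bucket=bucket,
        Key=zip_member_key(archive_key, member_path),
    )
    return response["Body"].read()
```

For the example above, download_member(s3, "company-data", "financial.zip", "2021/taxes.csv") would request the key financial.zip/2021/taxes.csv in the company-data bucket.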

Content properties

All properties except the file size are tied to the zip file. This means that modification date, headers, tags, etc. can only be set for the zip file as a whole. In similar fashion, replication will replicate the zip file as a whole and not individual files.

Code Examples

- Using minio-go library
- Using AWS JS SDK v2
- Using boto3

Requirements and limits

- On versioned buckets, ListObjectsV2 lists only the contents of the most recent version of a ZIP archive.
- ListObjectsV2 API calls must be used to list ZIP file content.
- Names inside ZIP files are kept unmodified, but some may lead to invalid paths. See Object key naming guidelines on safe names.
- This API behavior is limited to the following read operations on files inside a ZIP archive:
  - HeadObject
  - GetObject
  - ListObjectsV2
- A maximum of 100,000 files inside a single ZIP archive is recommended for the best trade-off between performance and memory usage.
- If the ZIP file directory isn't located within the last 100 MB of the file, the file will not be parsed.
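
Before uploading an archive, the last two limits can be checked locally. The sketch below uses only the Python standard library and rests on several assumptions: "100 MB" is read as 100 MiB, "within the last 100 MB" is read as the central directory starting no more than that far from the end of the file, and start_dir is an undocumented attribute of CPython's zipfile.ZipFile. The function name check_zip_for_extraction is hypothetical, and this is not how MinIO itself performs the check.

```python
import io
import zipfile

MAX_RECOMMENDED_MEMBERS = 100_000
DIRECTORY_WINDOW_BYTES = 100 * 1024 * 1024  # assumption: "100MB" read as 100 MiB


def check_zip_for_extraction(fileobj):
    """Report whether a ZIP archive fits the documented recommendations.

    fileobj must be a seekable binary file object positioned anywhere.
    """
    fileobj.seek(0, io.SEEK_END)
    total_size = fileobj.tell()
    fileobj.seek(0)

    with zipfile.ZipFile(fileobj) as archive:
        member_count = len(archive.infolist())
        # Undocumented CPython attribute: offset where the central directory starts.
        directory_offset = archive.start_dir

    return {
        "member_count": member_count,
        "within_recommended_count": member_count <= MAX_RECOMMENDED_MEMBERS,
        "directory_within_window": total_size - directory_offset <= DIRECTORY_WINDOW_BYTES,
    }
```

An archive that fails the directory check would likely need to be split into smaller archives, since the directory size grows with the number of entries.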