minio/docs/large-bucket/DESIGN.md
Add large bucket support for erasure coded backend (#5160)
This PR implements an object layer which
combines input erasure sets of XL layers
into a unified namespace.

This object layer extends the existing
erasure coded implementation. The design
assumes that providing more than 16 disks is
a static configuration as well, i.e. if you
started the setup with 32 disks as 4 sets of
8 disks per set, then you would need to
provide those same 4 sets always.

Some design details and restrictions:

- Objects are distributed to a unique erasure
  set using a deterministic ordering (a CRCMOD
  hash of the object name); see the sketch
  after this list.
- Each set has its own dsync, so locks are
  synchronized properly at the set (erasure
  layer) level.
- Each set still has a maximum of 16 disks
  requirement; you can start with multiple
  such sets statically.
- Sets are a static configuration of disks and
  cannot be changed; there is no elastic
  expansion allowed.
- Sets are a static configuration of disks and
  cannot be changed; there is no elastic
  removal allowed.
- ListObjects() across sets can be noticeably
  slower, since listing happens on all servers
  and the results are merged at this sets layer.
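
For illustration, a minimal sketch of this
placement follows: the object name is hashed
and reduced modulo the number of sets, so each
object deterministically lands on exactly one
erasure set. The helper name crcHashMod and
the use of crc32.ChecksumIEEE are assumptions
of this sketch, not the exact implementation.

package main

import (
        "fmt"
        "hash/crc32"
)

// crcHashMod hashes the object name and reduces it modulo the
// number of sets; every object deterministically maps to one set.
func crcHashMod(object string, setCount int) int {
        return int(crc32.ChecksumIEEE([]byte(object))) % setCount
}

func main() {
        // With 4 sets, the same object name always picks the same set.
        fmt.Println(crcHashMod("bucket/photos/2018/img.jpg", 4))
}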

Fixes #5465
Fixes #5464
Fixes #5461
Fixes #5460
Fixes #5459
Fixes #5458
Fixes #5488
Fixes #5489
Fixes #5497
Fixes #5496

Command-line

NAME:
  minio server - Start object storage server.

USAGE:
  minio server [FLAGS] DIR1 [DIR2..]
  minio server [FLAGS] DIR{1...64}

DIR:
  DIR points to a directory on a filesystem. To combine multiple drives into a single
  large system, pass one directory per filesystem, separated by spaces. You may also
  use a `...` convention to abbreviate the directory arguments. Remote directories in
  a distributed setup are encoded as HTTP(S) URIs.

Limitations

  • A minimum of 4 disks is needed for a distributed erasure coded configuration.
  • A maximum of 32 distinct nodes is supported in a distributed configuration.

Common usage

Single disk filesystem export

minio server dir1

Standalone erasure coded configuration with 4 disks.

minio server dir1 dir2 dir3 dir4

Standalone erasure coded configuration with 4 sets with 16 disks each.

minio server dir{1...64}

Distributed erasure coded configuration with 64 sets with 16 disks each.

minio server http://host{1...16}/export{1...64}

Other usages

Advanced use cases with multiple ellipses

Standalone erasure coded configuration with 4 sets of 16 disks each, which spans disks across controllers.

minio server /mnt/controller{1...4}/data{1...16}

Standalone erasure coded configuration with 16 sets, 16 disks per set, across mount points and controllers.

minio server /mnt{1...4}/controller{1...4}/data{1...16}

Distributed erasure coded configuration with 2 sets, 16 disks per set, across hosts.

minio server http://host{1...32}/disk1

Distributed erasure coded configuration with rack-level redundancy, 32 sets in total, 16 disks per set.

minio server http://rack{1...4}-host{1...8}.example.net/export{1...16}

Distributed erasure coded configuration with no rack-level redundancy but redundancy within each rack (we split the arguments per rack), 32 sets in total, 16 disks per set.

minio server http://rack1-host{1...8}.example.net/export{1...16} http://rack2-host{1...8}.example.net/export{1...16} http://rack3-host{1...8}.example.net/export{1...16} http://rack4-host{1...8}.example.net/export{1...16}

Expected expansion for double ellipses

minio server http://host{1...4}/export{1...8}

Expected expansion

> http://host1/export1
> http://host2/export1
> http://host3/export1
> http://host4/export1
> http://host1/export2
> http://host2/export2
> http://host3/export2
> http://host4/export2
> http://host1/export3
> http://host2/export3
> http://host3/export3
> http://host4/export3
> http://host1/export4
> http://host2/export4
> http://host3/export4
> http://host4/export4
> http://host1/export5
> http://host2/export5
> http://host3/export5
> http://host4/export5
> http://host1/export6
> http://host2/export6
> http://host3/export6
> http://host4/export6
> http://host1/export7
> http://host2/export7
> http://host3/export7
> http://host4/export7
> http://host1/export8
> http://host2/export8
> http://host3/export8
> http://host4/export8
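
The expansion above varies the host (the leftmost ellipsis) fastest. As an illustration of that ordering, here is a minimal sketch for the simple numeric case; the expand helper is hypothetical, not the actual argument parser:

package main

import "fmt"

// expand mimics the documented ordering for
// http://host{1...hosts}/export{1...exports}: for each export,
// every host is emitted before moving to the next export.
func expand(hosts, exports int) []string {
        var endpoints []string
        for e := 1; e <= exports; e++ {
                for h := 1; h <= hosts; h++ {
                        endpoints = append(endpoints,
                                fmt.Sprintf("http://host%d/export%d", h, e))
                }
        }
        return endpoints
}

func main() {
        // Prints the same 32 endpoints listed above, in the same order.
        for _, endpoint := range expand(4, 8) {
                fmt.Println(endpoint)
        }
}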

Backend format.json changes

The new format.json has the following new fields:

  • disk is renamed to this
  • jbod is renamed to sets; along with this change, sets is now a two-dimensional list representing the total sets and the disks per set.

A sample format.json is shown below

{
  "version": "1",
  "format": "xl",
  "xl": {
    "version": "2",
    "this": "4ec63786-3dbd-4a9e-96f5-535f6e850fb1",
    "sets": [
      [
        "4ec63786-3dbd-4a9e-96f5-535f6e850fb1",
        "1f3cf889-bc90-44ca-be2a-732b53be6c9d",
        "4b23eede-1846-482c-b96f-bfb647f058d3",
        "e1f17302-a850-419d-8cdb-a9f884a63c92"
      ],
      [
        "2ca4c5c1-dccb-4198-a840-309fea3b5449",
        "6d1e666e-a22c-4db4-a038-2545c2ccb6d5",
        "d4fa35ab-710f-4423-a7c2-e1ca33124df0",
        "88c65e8b-00cb-4037-a801-2549119c9a33"
      ]
    ],
    "distributionAlgo": "CRCMOD"
  }
}
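
To make the two-dimensional sets layout concrete, here is a small sketch that locates a disk UUID (such as the this field) within sets and returns its set and slot indices; findDiskIndex is a hypothetical helper for this sketch, not MinIO's actual API:

package main

import "fmt"

// findDiskIndex scans the two-dimensional sets list and returns
// the (set, disk) position of the given UUID, or (-1, -1).
func findDiskIndex(this string, sets [][]string) (int, int) {
        for i, set := range sets {
                for j, disk := range set {
                        if disk == this {
                                return i, j
                        }
                }
        }
        return -1, -1
}

func main() {
        sets := [][]string{
                {"4ec63786-3dbd-4a9e-96f5-535f6e850fb1",
                        "1f3cf889-bc90-44ca-be2a-732b53be6c9d"},
                {"2ca4c5c1-dccb-4198-a840-309fea3b5449",
                        "6d1e666e-a22c-4db4-a038-2545c2ccb6d5"},
        }
        // The sample's "this" disk sits in the first set, first slot.
        fmt.Println(findDiskIndex("4ec63786-3dbd-4a9e-96f5-535f6e850fb1", sets))
}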

In the new format-xl.go behavior, the format structure is used as an opaque type: the Format field signifies the format of the backend. Once the format has been identified, it is the job of the identified backend to further interpret the next structures and validate them.

type formatType string

const (
        formatFS formatType = "fs"
        formatXL            = "xl"
)

// format is the opaque envelope shared by all backends; the
// Format field selects which backend interprets the rest.
type format struct {
        Version string
        Format  formatType
}

Current format

type formatXLV1 struct {
        format
        XL struct {
                Version string
                Disk    string
                JBOD    []string
        }
}

New format

type formatXLV2 struct {
        Version string `json:"version"`
        Format  string `json:"format"`
        XL      struct {
                Version          string     `json:"version"`
                This             string     `json:"this"`
                Sets             [][]string `json:"sets"`
                DistributionAlgo string     `json:"distributionAlgo"`
        } `json:"xl"`
}
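
As a hedged illustration of the opaque-type behavior described above, the following sketch decodes the envelope first and only then, once Format is known to be xl, re-decodes the full XL structure. decodeFormat is a hypothetical helper reusing the structs above; encoding/json matches the exported fields of format without explicit tags.

import (
        "encoding/json"
        "fmt"
)

// decodeFormat reads format.json in two passes: the opaque
// envelope identifies the backend, then the identified backend
// interprets and validates the remaining structure.
func decodeFormat(data []byte) (*formatXLV2, error) {
        var f format
        if err := json.Unmarshal(data, &f); err != nil {
                return nil, err
        }
        if f.Format != formatXL {
                return nil, fmt.Errorf("unsupported backend format %q", f.Format)
        }
        var v2 formatXLV2
        if err := json.Unmarshal(data, &v2); err != nil {
                return nil, err
        }
        return &v2, nil
}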