MinIO Healthcheck

MinIO server exposes three unauthenticated healthcheck endpoints: a liveness probe at /minio/health/live, a readiness probe at /minio/health/ready, and a cluster probe at /minio/health/cluster.
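
These endpoints can also be checked directly over HTTP. As a minimal sketch, assuming a MinIO server reachable at localhost:9000 (the default port; adjust host and port for your deployment):

          # Liveness: returns '200 OK' while the server process is up
          curl -i http://localhost:9000/minio/health/live

          # Readiness: likewise returns '200 OK', scoped to the local server
          curl -i http://localhost:9000/minio/health/ready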

Liveness probe

This probe always responds with '200 OK'. When the liveness probe fails, Kubernetes-like platforms restart the container.

          livenessProbe:
            httpGet:
              path: /minio/health/live
              port: 9000
              scheme: HTTP
            initialDelaySeconds: 3
            periodSeconds: 1
            timeoutSeconds: 1
            successThreshold: 1
            failureThreshold: 3

Readiness probe

This probe also always responds with '200 OK'. When the readiness probe fails, Kubernetes-like platforms stop forwarding traffic to the pod.

          readinessProbe:
            httpGet:
              path: /minio/health/ready
              port: 9000
              scheme: HTTP
            initialDelaySeconds: 3
            periodSeconds: 1
            timeoutSeconds: 1
            successThreshold: 1
            failureThreshold: 3

Cluster probe

This probe is not useful in most cases; it is meant for administrators to check whether a given cluster has quorum. The reply is '200 OK' if the cluster has quorum, and '503 Service Unavailable' if it does not.
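
As a quick sketch, the cluster probe can be queried with curl, again assuming the server is reachable at localhost:9000 (adjust host and port for your deployment):

          # Prints 200 when the cluster has quorum, 503 otherwise
          curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9000/minio/health/cluster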