minio

Author	SHA1	Message	Date
Nitish Tiwari	32017454ee	fix typo in Grafana dashboard json (#12471 )	2021-06-09 08:04:12 -07:00
Nitish Tiwari	00c5d7e1b3	Add healing related metrics in official dashboard (#12456 )	2021-06-07 12:46:54 -07:00
Poorna Krishnamoorthy	3690de0c6b	Drop Pending size and count from replication metrics (#12378 ) Real-time metrics calculated in-memory rely on the initial replication metrics saved with data usage. However, this can lag behind the actual state of the cluster at the time of server restart leading to inaccurate Pending size/counts reported to Prometheus. Dropping the Pending metrics as this can be more reliably monitored by applications with replication notifications. Signed-off-by: Poorna Krishnamoorthy <poorna@minio.io>	2021-05-31 20:26:52 -07:00
Nitish Tiwari	a592d3be19	fix the dashboard to use $rate_interval (#12277 ) refer https://grafana.com/blog/2020/09/28/new-in-grafana-7.2-__rate_interval-for-prometheus-rate-queries-that-just-work/ for further information	2021-05-12 08:06:47 -07:00
Harshavardhana	2fd9c13b50	rename minio-cluster to minio-job as per prometheus config	2021-05-06 12:39:58 -07:00
Nitish Tiwari	ddc1e4b5b3	Update Grafana dashboard to use the new v2 cluster metrics (#12220 ) Fixes #11543	2021-05-06 14:44:03 +05:30
Harshavardhana	8a9d15ace2	update prometheus metrics with failed_count	2021-04-04 09:52:37 -07:00
Poorna Krishnamoorthy	47c09a1e6f	Various improvements in replication (#11949 ) - collect real time replication metrics for prometheus. - add pending_count, failed_count metric for total pending/failed replication operations. - add API to get replication metrics - add MRF worker to handle spill-over replication operations - multiple issues found with replication - fixes an issue when client sends a bucket name with `/` at the end from SetRemoteTarget API call make sure to trim the bucket name to avoid any extra `/`. - hold write locks in GetObjectNInfo during replication to ensure that object version stack is not overwritten while reading the content. - add additional protection during WriteMetadata() to ensure that we always write a valid FileInfo{} and avoid ever writing empty FileInfo{} to the lowest layers. Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2021-04-03 09:03:42 -07:00
Ritesh H Shukla	23b03dadb8	Add process uptime metric (#11844 )	2021-03-20 21:23:27 -07:00
Harshavardhana	2c198ae7b6	fix: prometheus metrics disks_online count when disks are down (#11689 ) prometheus metrics were using the total disk count instead of the online disk count when disks were down. This PR fixes this and also adds a new metric for total_disk_count	2021-03-03 11:18:41 -08:00
Krishna Srinivas	876b79b8d8	read-health check endpoint returns success if cluster can serve read requests (#11310 )	2021-02-09 01:00:44 -08:00
Ritesh H Shukla	c4848f9b4f	Add process start time to cluster metrics. (#11405 )	2021-02-01 23:02:18 -08:00
Ritesh H Shukla	0bf2d84f96	update new metrics url docs (#11342 )	2021-01-25 01:03:07 -08:00
Ritesh H Shukla	7575c24037	Add open FD and FD limit to cluster metrics (#11328 )	2021-01-22 18:30:16 -08:00
Harshavardhana	c080f04e66	fix: prometheus metrics link typo update to latest	2021-01-22 01:53:23 -08:00
Ritesh H Shukla	b4add82bb6	Updated Prometheus metrics (#11141 ) * Add metrics for nodes online and offline * Add cluster capacity metrics * Introduce v2 metrics	2021-01-18 20:35:38 -08:00
Harshavardhana	14792cdbc6	docs: fix the metrics formatting (#11081 )	2020-12-10 18:15:47 -08:00
Harshavardhana	97856bfebf	fix: grafana double counting for bucket usage, histogram and objects (#11070 )	2020-12-09 20:30:37 -08:00
Nitish Tiwari	54d243cd98	fix: grafana dashboard calculating online nodes (#11041 ) Also use a generic name instead of diff names per revision	2020-12-09 00:26:42 -08:00
Ritesh H Shukla	04848dfa1c	Add documentation for bucket replication related metrics (#11055 )	2020-12-08 12:48:10 -08:00
Harshavardhana	4a564336fe	Revert "Add metrics for nodes online and offline (#11050 )" This reverts commit `f60bbdf86b`.	2020-12-08 09:23:35 -08:00
Ritesh H Shukla	f60bbdf86b	Add metrics for nodes online and offline (#11050 )	2020-12-08 01:06:27 -08:00
Poorna Krishnamoorthy	f3beb1236a	Add cache usage, total capacity to prometheus metrics (#11026 )	2020-12-07 16:35:11 -08:00
Nitish Tiwari	6ff12f5f01	Add the dashboard json file (#11028 ) This will allow users to contribute to the dashboard as needed.	2020-12-04 16:27:41 -08:00
Nitish Tiwari	de9b64834e	fix: update grafana dashboard docs (#11023 ) Refer to the official Grafana dashboard	2020-12-03 15:56:15 -08:00
Anis Elleuch	8e8ddf7233	doc: Add definition of 1KB and 1MB in prometheus (#10857 )	2020-11-09 10:05:01 -08:00
Harshavardhana	9c042a503b	remove deprecated readiness from healthcheck docs (#10659 )	2020-10-12 18:56:03 -07:00
Harshavardhana	8b74a72b21	fix: rename READY deadline to CLUSTER deadline ENV (#10535 )	2020-09-23 09:14:33 -07:00
Derek Bender	3168e93730	fix typo in healthcheck README.md (#10518 )	2020-09-18 09:52:37 -07:00
Harshavardhana	ec06089eda	fix: re-implement cluster healthcheck (#10101 )	2020-07-20 18:31:22 -07:00
Harshavardhana	3520e946a2	fix: versioning docs add more examples	2020-07-11 00:57:46 -07:00
Harshavardhana	d5ff1c8e3b	fix docs image urls to be absolute path	2020-07-11 00:27:30 -07:00
Nitish Tiwari	30c251efd3	Add Grafana dashboard (#10000 )	2020-07-09 12:01:58 -07:00
Harshavardhana	c0ac25bfff	fix: readiness needs to be like liveness (#9941 ) Readiness has no reason to be cluster scoped because that is not how k8s networking works for pods: the pods of a deployment do not share the network as a singleton. Instead they run in scopes local to themselves; on readiness failure the pod is potentially taken out of the network and is no longer resolvable, which affects the distributed setup in a myriad of ways. Instead readiness should behave like liveness with local scope alone, and should be a dummy implementation. With this PR, startup times and overall k8s startup time improve dramatically. Added another handler, `/minio/health/cluster`, to report cluster-scope health.	2020-06-30 11:28:27 -07:00
Harshavardhana	f9aa239973	fix: export prometheus metrics for cache GC triggers (#9815 ) Bonus change to use channel to serialize triggers, instead of using atomic variables. More efficient mechanism for synchronization. Co-authored-by: Nitish Tiwari <nitish@minio.io>	2020-06-15 09:05:35 -07:00
Harshavardhana	5e529a1c96	simplify context timeout for readiness (#9772 ) additionally also add CORS support to restrict for specific origin, adds a new config and updated the documentation as well	2020-06-04 14:58:34 -07:00
Harshavardhana	53aaa5d2a5	Export bucket usage counts as part of bucket metrics (#9710 ) Bonus fixes in quota enforcement to use the new data structure and use timedValue to cache a value and reload it automatically, which avoids one global variable.	2020-05-27 06:45:43 -07:00
poornas	336460f67e	fix: gateway_s3_bytes_sent metric for all API methods (#9242 ) Co-authored-by: Harshavardhana <harsha@minio.io>	2020-04-01 12:52:31 -07:00
Nitish Tiwari	6b984410d5	Add support for self-healing related metrics in Prometheus (#9079 ) Fixes #8988 Co-authored-by: Anis Elleuch <vadmeste@users.noreply.github.com> Co-authored-by: Harshavardhana <harsha@minio.io>	2020-03-24 22:40:45 -07:00
Nitish Tiwari	63be4709b7	Add metrics support for Azure & GCS Gateway (#8954 ) We added support for caching and S3 related metrics in #8591. As a continuation, it would be helpful to add support for Azure & GCS gateway related metrics as well.	2020-02-11 21:08:01 +05:30
Kevin Humphreys	656146b699	doc: Prometheus metrics name fix (#8774 ) changed docs to reflect proper Prometheus metrics	2020-01-09 18:36:58 -08:00
Harshavardhana	5e40b9a563	fix: docs for live/ready check implementation details	2020-01-09 18:29:24 -08:00
Joe Adams	89d1221217	Fix typo in prometheus monitoring docs (#8780 )	2020-01-09 09:08:41 -08:00
Harshavardhana	5eab3db344	Fix doc reference for prometheus (#8748 )	2020-01-05 13:44:39 -08:00
Nitish Tiwari	3df7285c3c	Add Support for Cache and S3 related metrics in Prometheus endpoint (#8591 ) This PR adds support below metrics - Cache Hit Count - Cache Miss Count - Data served from Cache (in Bytes) - Bytes received from AWS S3 - Bytes sent to AWS S3 - Number of requests sent to AWS S3 Fixes #8549	2019-12-05 23:16:06 -08:00
Praveen raj Mani	8836d57e3c	The prometheus metrics refactoring (#8003 ) The measures are consolidated to the following metrics - `disk_storage_used` : Disk space used by the disk. - `disk_storage_available`: Available disk space left on the disk. - `disk_storage_total`: Total disk space on the disk. - `disks_offline`: Total number of offline disks in current MinIO instance. - `disks_total`: Total number of disks in current MinIO instance. - `s3_requests_total`: Total number of s3 requests in current MinIO instance. - `s3_errors_total`: Total number of errors in s3 requests in current MinIO instance. - `s3_requests_current`: Total number of active s3 requests in current MinIO instance. - `internode_rx_bytes_total`: Total number of internode bytes received by current MinIO server instance. - `internode_tx_bytes_total`: Total number of bytes sent to the other nodes by current MinIO server instance. - `s3_rx_bytes_total`: Total number of s3 bytes received by current MinIO server instance. - `s3_tx_bytes_total`: Total number of s3 bytes sent by current MinIO server instance. - `minio_version_info`: Current MinIO version with commit-id. - `s3_ttfb_seconds_bucket`: Histogram that holds the latency information of the requests. And this PR also modifies the current StorageInfo queries - Decouples StorageInfo from ServerInfo . - StorageInfo is enhanced to give endpoint information. NOTE: ADMIN API VERSION IS BUMPED UP IN THIS PR Fixes #7873	2019-10-22 21:01:14 -07:00
Harshavardhana	e85df07518	Add prometheus auth-type to turn-off authentication (#8356 ) Also this PR moves the original doc from cookbook to MinIO repo under docs/metrics/prometheus/ Fixes #8323	2019-10-04 23:48:59 +05:30
Praveen raj Mani	ad75683bde	Authorize prometheus endpoint with bearer token (#7640 )	2019-09-22 20:27:12 +05:30
Harshavardhana	5a28ef0d47	Bump readiness check up to 10000 go-routines (#8057 ) Most of our current workloads reach this value regularly, it doesn't make sense to keep 1000 go-routine limit.	2019-08-10 18:13:14 +05:30
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494 )	2019-04-09 11:39:42 -07:00


53 commits