minio

Author	SHA1	Message	Date
Nitish Tiwari	a592d3be19	fix the dashboard to use $__rate_interval (#12277) Refer to https://grafana.com/blog/2020/09/28/new-in-grafana-7.2-__rate_interval-for-prometheus-rate-queries-that-just-work/ for further information	2021-05-12 08:06:47 -07:00
Harshavardhana	2fd9c13b50	rename minio-cluster to minio-job as per prometheus config	2021-05-06 12:39:58 -07:00
Nitish Tiwari	ddc1e4b5b3	Update Grafana dashboard to use the new v2 cluster metrics (#12220) Fixes #11543	2021-05-06 14:44:03 +05:30
Harshavardhana	8a9d15ace2	update prometheus metrics with failed_count	2021-04-04 09:52:37 -07:00
Poorna Krishnamoorthy	47c09a1e6f	Various improvements in replication (#11949) - collect real-time replication metrics for prometheus. - add pending_count, failed_count metrics for total pending/failed replication operations. - add API to get replication metrics - add MRF worker to handle spill-over replication operations - multiple issues found with replication - fixes an issue when a client sends a bucket name with a trailing `/` in the SetRemoteTarget API call; make sure to trim the bucket name to avoid any extra `/`. - hold write locks in GetObjectNInfo during replication to ensure that the object version stack is not overwritten while reading the content. - add additional protection during WriteMetadata() to ensure that we always write a valid FileInfo{} and never write an empty FileInfo{} to the lowest layers. Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2021-04-03 09:03:42 -07:00
Ritesh H Shukla	23b03dadb8	Add process uptime metric (#11844)	2021-03-20 21:23:27 -07:00
Harshavardhana	2c198ae7b6	fix: prometheus metrics disks_online count when disks are down (#11689) prometheus metrics were using the total disk count instead of the online disk count when disks were down; this PR fixes this and also adds a new metric for total_disk_count	2021-03-03 11:18:41 -08:00
Krishna Srinivas	876b79b8d8	read-health check endpoint returns success if cluster can serve read requests (#11310)	2021-02-09 01:00:44 -08:00
Ritesh H Shukla	c4848f9b4f	Add process start time to cluster metrics. (#11405)	2021-02-01 23:02:18 -08:00
Ritesh H Shukla	0bf2d84f96	update new metrics url docs (#11342)	2021-01-25 01:03:07 -08:00
Ritesh H Shukla	7575c24037	Add open FD and FD limit to cluster metrics (#11328)	2021-01-22 18:30:16 -08:00
Harshavardhana	c080f04e66	fix: prometheus metrics link typo update to latest	2021-01-22 01:53:23 -08:00
Ritesh H Shukla	b4add82bb6	Updated Prometheus metrics (#11141) * Add metrics for nodes online and offline * Add cluster capacity metrics * Introduce v2 metrics	2021-01-18 20:35:38 -08:00
Harshavardhana	14792cdbc6	docs: fix the metrics formatting (#11081)	2020-12-10 18:15:47 -08:00
Harshavardhana	97856bfebf	fix: grafana double counting for bucket usage, histogram and objects (#11070)	2020-12-09 20:30:37 -08:00
Nitish Tiwari	54d243cd98	fix: grafana dashboard calculating online nodes (#11041) Also use a generic name instead of diff names per revision	2020-12-09 00:26:42 -08:00
Ritesh H Shukla	04848dfa1c	Add documentation for bucket replication related metrics (#11055)	2020-12-08 12:48:10 -08:00
Harshavardhana	4a564336fe	Revert "Add metrics for nodes online and offline (#11050)" This reverts commit `f60bbdf86b`.	2020-12-08 09:23:35 -08:00
Ritesh H Shukla	f60bbdf86b	Add metrics for nodes online and offline (#11050)	2020-12-08 01:06:27 -08:00
Poorna Krishnamoorthy	f3beb1236a	Add cache usage, total capacity to prometheus metrics (#11026)	2020-12-07 16:35:11 -08:00
Nitish Tiwari	6ff12f5f01	Add the dashboard json file (#11028) This will allow users to contribute to the dashboard as needed.	2020-12-04 16:27:41 -08:00
Nitish Tiwari	de9b64834e	fix: update grafana dashboard docs (#11023) Refer to the official Grafana dashboard	2020-12-03 15:56:15 -08:00
Anis Elleuch	8e8ddf7233	doc: Add definition of 1KB and 1MB in prometheus (#10857)	2020-11-09 10:05:01 -08:00
Harshavardhana	9c042a503b	remove deprecated readiness from healthcheck docs (#10659)	2020-10-12 18:56:03 -07:00
Harshavardhana	8b74a72b21	fix: rename READY deadline to CLUSTER deadline ENV (#10535)	2020-09-23 09:14:33 -07:00
Derek Bender	3168e93730	fix typo in healthcheck README.md (#10518)	2020-09-18 09:52:37 -07:00
Harshavardhana	ec06089eda	fix: re-implement cluster healthcheck (#10101)	2020-07-20 18:31:22 -07:00
Harshavardhana	3520e946a2	fix: versioning docs add more examples	2020-07-11 00:57:46 -07:00
Harshavardhana	d5ff1c8e3b	fix docs image urls to be absolute path	2020-07-11 00:27:30 -07:00
Nitish Tiwari	30c251efd3	Add Grafana dashboard (#10000)	2020-07-09 12:01:58 -07:00
Harshavardhana	c0ac25bfff	fix: readiness needs to be like liveness (#9941) Readiness has no reason to be cluster-scoped, because that is not how k8s networking works for pods: the pods in a deployment do not share the network as a singleton. Instead, each pod runs in a local scope of its own, and on readiness failures the pod is potentially taken out of the network and is no longer resolvable - this affects the distributed setup in a myriad of ways. Readiness should instead behave like liveness, with local scope alone, and should be a dummy implementation. With this PR, startup times and the overall k8s startup time improve dramatically. Added another handler called `/minio/health/cluster` to understand the cluster-scope health.	2020-06-30 11:28:27 -07:00
Harshavardhana	f9aa239973	fix: export prometheus metrics for cache GC triggers (#9815) Bonus change to use a channel to serialize triggers instead of atomic variables, a more efficient synchronization mechanism. Co-authored-by: Nitish Tiwari <nitish@minio.io>	2020-06-15 09:05:35 -07:00
Harshavardhana	5e529a1c96	simplify context timeout for readiness (#9772) Additionally, add CORS support to restrict access to specific origins; this adds a new config and updates the documentation as well	2020-06-04 14:58:34 -07:00
Harshavardhana	53aaa5d2a5	Export bucket usage counts as part of bucket metrics (#9710) Bonus fixes in quota enforcement to use the new data structure, and use timedValue to cache a value and reload it automatically, which removes one global variable.	2020-05-27 06:45:43 -07:00
poornas	336460f67e	fix: gateway_s3_bytes_sent metric for all API methods (#9242) Co-authored-by: Harshavardhana <harsha@minio.io>	2020-04-01 12:52:31 -07:00
Nitish Tiwari	6b984410d5	Add support for self-healing related metrics in Prometheus (#9079) Fixes #8988 Co-authored-by: Anis Elleuch <vadmeste@users.noreply.github.com> Co-authored-by: Harshavardhana <harsha@minio.io>	2020-03-24 22:40:45 -07:00
Nitish Tiwari	63be4709b7	Add metrics support for Azure & GCS Gateway (#8954) We added support for caching and S3 related metrics in #8591. As a continuation, it would be helpful to add support for Azure & GCS gateway related metrics as well.	2020-02-11 21:08:01 +05:30
Kevin Humphreys	656146b699	doc: Prometheus metrics name fix (#8774) changed docs to reflect proper Prometheus metrics	2020-01-09 18:36:58 -08:00
Harshavardhana	5e40b9a563	fix: docs for live/ready check implementation details	2020-01-09 18:29:24 -08:00
Joe Adams	89d1221217	Fix typo in prometheus monitoring docs (#8780)	2020-01-09 09:08:41 -08:00
Harshavardhana	5eab3db344	Fix doc reference for prometheus (#8748)	2020-01-05 13:44:39 -08:00
Nitish Tiwari	3df7285c3c	Add Support for Cache and S3 related metrics in Prometheus endpoint (#8591) This PR adds support for the following metrics - Cache Hit Count - Cache Miss Count - Data served from Cache (in Bytes) - Bytes received from AWS S3 - Bytes sent to AWS S3 - Number of requests sent to AWS S3 Fixes #8549	2019-12-05 23:16:06 -08:00
Praveen raj Mani	8836d57e3c	The prometheus metrics refactoring (#8003) The measures are consolidated to the following metrics - `disk_storage_used`: Disk space used by the disk. - `disk_storage_available`: Available disk space left on the disk. - `disk_storage_total`: Total disk space on the disk. - `disks_offline`: Total number of offline disks in current MinIO instance. - `disks_total`: Total number of disks in current MinIO instance. - `s3_requests_total`: Total number of s3 requests in current MinIO instance. - `s3_errors_total`: Total number of errors in s3 requests in current MinIO instance. - `s3_requests_current`: Total number of active s3 requests in current MinIO instance. - `internode_rx_bytes_total`: Total number of internode bytes received by current MinIO server instance. - `internode_tx_bytes_total`: Total number of bytes sent to the other nodes by current MinIO server instance. - `s3_rx_bytes_total`: Total number of s3 bytes received by current MinIO server instance. - `s3_tx_bytes_total`: Total number of s3 bytes sent by current MinIO server instance. - `minio_version_info`: Current MinIO version with commit-id. - `s3_ttfb_seconds_bucket`: Histogram that holds the latency information of the requests. This PR also modifies the current StorageInfo queries - Decouples StorageInfo from ServerInfo. - StorageInfo is enhanced to give endpoint information. NOTE: ADMIN API VERSION IS BUMPED UP IN THIS PR Fixes #7873	2019-10-22 21:01:14 -07:00
Harshavardhana	e85df07518	Add prometheus auth-type to turn off authentication (#8356) This PR also moves the original doc from the cookbook to the MinIO repo under docs/metrics/prometheus/ Fixes #8323	2019-10-04 23:48:59 +05:30
Praveen raj Mani	ad75683bde	Authorize prometheus endpoint with bearer token (#7640)	2019-09-22 20:27:12 +05:30
Harshavardhana	5a28ef0d47	Bump readiness check up to 10000 go-routines (#8057) Most of our current workloads reach this value regularly, so it doesn't make sense to keep a 1000 go-routine limit.	2019-08-10 18:13:14 +05:30
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494)	2019-04-09 11:39:42 -07:00
Harshavardhana	bab4c90c45	Fix broken links in docs (#6700)	2018-10-25 11:39:31 +05:30
Nitish Tiwari	41496e1406	Fix broken healthcheck link (#5935)	2018-05-16 14:43:25 -07:00
Nitish Tiwari	9cab0f25e0	Add top level metrics document to summarize monitoring endpoints (#5923) Minio server supports healthcheck and prometheus related unauthenticated endpoints. This document summarizes this information in a single place and adds links to more detailed documentation where needed.	2018-05-15 12:23:21 -07:00

50 commits