Kubernetes Monitoring
[info] Considerations for monitoring Kubernetes
- Tags and labels become very important: they are the primary basis for grouping applications
- Monitoring spans multiple dimensions: every node in the cluster, the containers, the applications inside those containers, and Kubernetes itself
- The system is distributed by nature, so aggregating monitoring data for complex applications is a challenge
cAdvisor
cAdvisor is a container monitoring tool from Google and the container resource collector built into the kubelet. It automatically collects CPU, memory, network, and filesystem usage for the containers on the local node, and exposes cAdvisor's native API (on the port set by --cadvisor-port, 4194 by default).
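As a quick sanity check (a sketch that assumes cAdvisor is still listening on the default port 4194 on the node you are logged into), the native API can be queried directly:

```shell
# Machine spec (CPU cores, memory, filesystems) via cAdvisor's v1.3 API
curl http://localhost:4194/api/v1.3/machine

# Recent resource stats for the root container and all subcontainers
curl http://localhost:4194/api/v1.3/subcontainers/
```

If the port was changed or disabled with the kubelet flag (e.g. --cadvisor-port=0), these requests will fail; the kubelet-proxied endpoints described later still work in that case.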

InfluxDB and Grafana
InfluxDB is an open-source distributed database for time series, events, and metrics; Grafana serves as the dashboard for InfluxDB, providing powerful charting capabilities. The two are commonly combined to visualize monitoring data.
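To illustrate how the two are wired together (a hypothetical sketch: the hostnames, credentials, and database name below are examples and depend entirely on your deployment), an InfluxDB data source can be registered in Grafana through Grafana's HTTP API:

```shell
# Register InfluxDB as a Grafana data source
# (URLs, credentials, and the "k8s" database name are placeholders)
curl -s -X POST http://admin:admin@localhost:3000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "influxdb-k8s",
        "type": "influxdb",
        "url": "http://monitoring-influxdb:8086",
        "access": "proxy",
        "database": "k8s"
      }'
```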

Heapster
cAdvisor, described above, only reports resource usage for a single node. Heapster provides resource monitoring for the entire cluster, and supports persisting the data to InfluxDB, Google Cloud Monitoring, or other storage backends.
Heapster collects node and container resource usage from the API exposed by each kubelet.

In addition, Heapster's /metrics API exposes data in Prometheus format.
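For example (a sketch assuming kubectl proxy is running on its default port 8001 and Heapster is deployed in the kube-system namespace, as in the cluster-info output below), the Prometheus-format metrics can be fetched through the API server proxy:

```shell
# In one terminal, open a local proxy to the API server:
kubectl proxy

# In another, fetch Heapster's metrics in Prometheus text format:
curl http://127.0.0.1:8001/api/v1/proxy/namespaces/kube-system/services/heapster/metrics
```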
Deploying Heapster, InfluxDB, and Grafana
Once Kubernetes is deployed, the dashboard, DNS, and monitoring services are also deployed by default. For example, a cluster brought up with cluster/kube-up.sh enables the following services:
$ kubectl cluster-info
Kubernetes master is running at https://kubernetes-master
Heapster is running at https://kubernetes-master/api/v1/proxy/namespaces/kube-system/services/heapster
KubeDNS is running at https://kubernetes-master/api/v1/proxy/namespaces/kube-system/services/kube-dns
kubernetes-dashboard is running at https://kubernetes-master/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
Grafana is running at https://kubernetes-master/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana
InfluxDB is running at https://kubernetes-master/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb
If these services were not deployed automatically, you can add the ones you need from cluster/addons.
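For instance (a sketch only: the exact directory layout under cluster/addons varies between Kubernetes releases), the monitoring manifests can be applied straight from a checkout of the Kubernetes source tree:

```shell
# Deploy the Heapster/InfluxDB/Grafana monitoring stack from the addons tree
# (this path is an example and may differ in your Kubernetes version)
kubectl apply -f cluster/addons/cluster-monitoring/influxdb/
```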
Prometheus
Prometheus is another monitoring system and time-series database, and it additionally provides alerting. It offers a powerful query language and an HTTP API, and its data can also be exported to Grafana for display.
Monitoring Kubernetes with Prometheus requires configuring the data sources first; a simple example is prometheus.yml:
kubectl apply -f https://raw.githubusercontent.com/feiskyer/kubernetes-handbook/master/monitor/prometheus.txt

Deploying and managing Prometheus with the Prometheus Operator or the Prometheus Chart is recommended:
# Note: on clusters without RBAC enabled, drop the --set rbac.create=true option
helm install --name my-release stable/prometheus --set rbac.create=true
Kubelet Metrics
Since v1.7, the Kubelet metrics API no longer includes cAdvisor metrics; instead, they are exposed on a separate endpoint:
- Kubelet metrics: http://127.0.0.1:8001/api/v1/proxy/nodes/<node-name>/metrics
- cAdvisor metrics: http://127.0.0.1:8001/api/v1/proxy/nodes/<node-name>/metrics/cadvisor
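Both endpoints can be checked with curl while kubectl proxy is running, substituting a real node name for the <node-name> placeholder:

```shell
# List node names to substitute below
kubectl get nodes

# Kubelet's own metrics (assumes kubectl proxy on 127.0.0.1:8001)
curl http://127.0.0.1:8001/api/v1/proxy/nodes/<node-name>/metrics

# cAdvisor container metrics (series whose names begin with container_)
curl http://127.0.0.1:8001/api/v1/proxy/nodes/<node-name>/metrics/cadvisor
```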
However, cAdvisor metrics are usually data that a monitoring system cannot do without. Adding a new target to Prometheus solves this problem; for example, a Prometheus deployed via helm keeps its configuration in a configmap:
# List the configmap
kubectl get configmap -l app=prometheus -l component=server
Edit this configmap with kubectl edit configmap my-release-prometheus-server and add the following:
# Scrape config for Kubelet cAdvisor.
#
# This is required for Kubernetes 1.7 and later, where cAdvisor metrics
# (those whose names begin with 'container_') have been removed from the
# Kubelet metrics endpoint. This job scrapes the cAdvisor endpoint to
# retrieve those metrics.
#
# In Kubernetes v1.7+, these metrics are only exposed on the cAdvisor
# HTTP endpoint; use "replacement: /api/v1/nodes/${1}:4194/proxy/metrics"
# in that case (and ensure cAdvisor's HTTP server hasn't been disabled with
# the --cadvisor-port=0 Kubelet flag).
#
# This job is not necessary and should be removed in Kubernetes 1.6 and
# earlier versions, or it will cause the metrics to be scraped twice.
- job_name: 'kubernetes-cadvisor'

  # Default to scraping over https. If required, just disable this or change to
  # `http`.
  scheme: https

  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - role: node

  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
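Once the new target is being scraped, the container_ series become queryable again. As a sketch (assuming the Prometheus server is reachable on localhost:9090, e.g. via kubectl port-forward, and that your cAdvisor version labels pods with pod_name), CPU usage per pod can be queried through the Prometheus HTTP API:

```shell
# Per-pod CPU usage rate over the last 5 minutes, from cAdvisor metrics
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=sum(rate(container_cpu_usage_seconds_total[5m])) by (pod_name)'
```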