OpenShift v4.x health check
1 Links
- https://docs.openshift.com/container-platform/3.9/day_two_guide/environment_health_checks.html
- https://docs.openshift.com/container-platform/4.4/backup_and_restore/replacing-unhealthy-etcd-member.html
- https://kubernetes.io/docs/concepts/
2 Nodes
Kubernetes runs your workload by placing containers into Pods to run on Nodes. A node may be a virtual or physical machine, depending on the cluster. Each node contains the services necessary to run Pods.
2.1 Overview
[chris@control(zabbix-dev/system:admin) ~]$ oc get nodes -o wide
NAME       STATUS   ROLES           AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                                   KERNEL-VERSION                CONTAINER-RUNTIME
master01   Ready    master,worker   40d   v1.17.1   192.168.100.221   <none>        RHEL CoreOS 44.81.202005062110-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-8.dev.rhaos4.4.git5f5c5e4.el8
master02   Ready    master,worker   40d   v1.17.1   192.168.100.222   <none>        RHEL CoreOS 44.81.202005062110-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-8.dev.rhaos4.4.git5f5c5e4.el8
master03   Ready    master,worker   40d   v1.17.1   192.168.100.223   <none>        RHEL CoreOS 44.81.202005062110-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-8.dev.rhaos4.4.git5f5c5e4.el8
worker01   Ready    worker          40d   v1.17.1   192.168.100.231   <none>        RHEL CoreOS 44.81.202005062110-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-8.dev.rhaos4.4.git5f5c5e4.el8
worker02   Ready    worker          40d   v1.17.1   192.168.100.232   <none>        RHEL CoreOS 44.81.202005062110-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-8.dev.rhaos4.4.git5f5c5e4.el8
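On a larger cluster it can help to print only the nodes that need attention. The following is a minimal sketch, not an official tool: the function name `not_ready_nodes` is made up for this page, and it assumes STATUS is the second column of the `oc get nodes` output, as in the listing above.

```shell
# Print the names of nodes whose STATUS is not exactly "Ready".
# Reads the output of `oc get nodes` on stdin and skips the header line.
# A cordoned node ("Ready,SchedulingDisabled") is reported as well.
not_ready_nodes() {
    awk '$1 != "NAME" && $2 != "Ready" { print $1 }'
}

# Usage (needs a logged-in oc session):
#   oc get nodes | not_ready_nodes
```

Empty output means every node reports `Ready`.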
2.2 Resources
Usage:
  oc adm top [flags]
Available Commands:
  images        Show usage statistics for Images
  imagestreams  Show usage statistics for ImageStreams
  node          Display Resource (CPU/Memory/Storage) usage of nodes
  pod           Display Resource (CPU/Memory/Storage) usage of pods
[chris@control(default/system:admin) ~]$ oc adm top nodes
NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master01   796m         22%    3601Mi          52%
master02   852m         24%    3626Mi          52%
master03   578m         16%    2494Mi          36%
worker01   596m         17%    2644Mi          38%
worker02   538m         15%    2426Mi          35%
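To turn the usage table into a quick check, the percentages can be compared against a threshold. This is only a sketch: `busy_nodes` is a hypothetical helper name, the default of 80 percent is an arbitrary choice, and the column positions are taken from the `oc adm top nodes` output above.

```shell
# Print nodes whose CPU% or MEMORY% is at or above a threshold.
# $1: threshold in percent (default 80, an arbitrary value).
# Reads the output of `oc adm top nodes` on stdin.
busy_nodes() {
    awk -v limit="${1:-80}" '
        $1 == "NAME" { next }                    # skip the header line
        {
            cpu = $3; mem = $5
            sub(/%/, "", cpu); sub(/%/, "", mem) # strip the percent sign
            if (cpu + 0 >= limit + 0 || mem + 0 >= limit + 0)
                print $1, $3, $5
        }'
}

# Usage:
#   oc adm top nodes | busy_nodes 50
```

With the figures shown above and a threshold of 50, this would list master01 and master02 because of their memory usage.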
2.3 Pending certificate signing requests
Certificate signing requests (CSRs) are issued by OpenShift automatically, but you have to approve them manually.
Pending CSRs usually mean that the cluster is not fully functional.
[chris@control(openshift-console/system:admin) ~]$ oc get csr
NAME        AGE     SIGNERNAME                                    REQUESTOR                                                                   CONDITION
csr-2g82l   3m23s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-bz74n   9m25s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
[chris@control(openshift-console/system:admin) ~]$ oc get csr -o name | xargs oc adm certificate approve
certificatesigningrequest.certificates.k8s.io/csr-2g82l approved
certificatesigningrequest.certificates.k8s.io/csr-bz74n approved
[chris@control(openshift-console/system:admin) ~]$ oc get csr
NAME        AGE     SIGNERNAME                                    REQUESTOR                                                                   CONDITION
csr-2g82l   3m55s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-bz74n   9m57s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
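The approve command above passes every CSR to `oc adm certificate approve`, including those already approved. A possible refinement is to select only the pending ones first. This is a sketch under assumptions: `pending_csrs` is a made-up name, and it relies on CONDITION being the last column of the `oc get csr` table as shown above.

```shell
# Print the names of CSRs whose CONDITION (last column) is "Pending".
# Reads the output of `oc get csr` on stdin.
pending_csrs() {
    awk '$1 != "NAME" && $NF == "Pending" { print $1 }'
}

# Usage (xargs -r is a GNU option that skips the command on empty input):
#   oc get csr | pending_csrs
#   oc get csr | pending_csrs | xargs -r oc adm certificate approve
```

Review the list of pending requests before approving them, since approval grants the requester a client certificate.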
3 Kubernetes API health endpoints
The Kubernetes API server provides API endpoints to indicate the current status of the API server.
kubectl get --raw='/readyz?verbose'
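The verbose output lists one check per line, prefixed with `[+]` for a passing check and `[-]` for a failing one. A small filter can reduce it to the failing checks. This is a sketch; `failed_checks` is a hypothetical helper name.

```shell
# Print the names of failed checks from verbose /readyz output.
# Reads the output of `kubectl get --raw='/readyz?verbose'` on stdin.
# Failed checks are the lines starting with "[-]".
failed_checks() {
    awk '/^\[-\]/ { sub(/^\[-\]/, ""); print $1 }'
}

# Usage:
#   kubectl get --raw='/readyz?verbose' | failed_checks
```

Empty output means that no check was reported as failed.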
4 etcd
etcd is a consistent and highly-available key value store used as Kubernetes’ backing store for all cluster data.
[chris@control(zabbix-dev/system:admin) ~]$ oc get etcd -o=jsonpath='{range .items[0].status.conditions[?(@.type=="EtcdMembersAvailable")]}{.message}{"\n"}'
master02,master01,master03 members are available, have not started, are unhealthy, are unknown
5 Router
There are many ways to get traffic into the cluster. The most common approach is to use the OpenShift Container Platform router as the ingress point for external traffic destined for services in your OpenShift Container Platform installation.
[chris@control(default/system:admin) ~]$ oc get deployment,pod --namespace openshift-ingress
NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/router-default   2/2     2            2           40d

NAME                                  READY   STATUS    RESTARTS   AGE
pod/router-default-5fdb964dfb-kkl5p   1/1     Running   0          3d1h
pod/router-default-5fdb964dfb-nb8ff   1/1     Running   0          3d1h
6 Registry
OpenShift Container Platform can build container images from your source code, deploy them, and manage their lifecycle. To enable this, OpenShift Container Platform provides an internal, integrated container image registry that can be deployed in your OpenShift Container Platform environment to locally manage images.
[chris@control(default/system:admin) ~]$ oc get pod,deployment -n openshift-image-registry
NAME                                                   READY   STATUS    RESTARTS   AGE
pod/cluster-image-registry-operator-7bff4c7595-hkbqx   2/2     Running   0          2d23h
pod/image-registry-6b6745b4f9-wqwdx                    1/1     Running   0          3d2h
pod/node-ca-6wgpw                                      1/1     Running   0          3d2h
pod/node-ca-gjmhw                                      1/1     Running   0          3d2h
pod/node-ca-gnp7n                                      1/1     Running   0          3d2h
pod/node-ca-gtvt9                                      1/1     Running   0          3d2h
pod/node-ca-ps7v9                                      1/1     Running   0          3d2h

NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-image-registry-operator   1/1     1            1           40d
deployment.apps/image-registry                    1/1     1            1           40d
7 ClusterOperators - Version 4.x
Conceptually, Operators take human operational knowledge and encode it into software that is more easily shared with consumers.
Operators are pieces of software that ease the operational complexity of running another piece of software. They act like an extension of the software vendor’s engineering team, watching over a Kubernetes environment (such as OpenShift Container Platform) and using its current state to make decisions in real time. Advanced Operators are designed to handle upgrades seamlessly, react to failures automatically, and not take shortcuts, like skipping a software backup process to save time.
[chris@control(zabbix-dev/system:admin) ~]$ oc -n default get clusteroperators
NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                       4.4.4     True        False         False      35d
cloud-credential                     4.4.4     True        False         False      40d
cluster-autoscaler                   4.4.4     True        False         False      40d
...
service-catalog-apiserver            4.4.4     True        False         False      40d
service-catalog-controller-manager   4.4.4     True        False         False      40d
storage                              4.4.4     True        False         False      2d23h
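A healthy operator shows AVAILABLE=True, PROGRESSING=False and DEGRADED=False, as in the listing above. The sketch below prints every operator that deviates from this. `unhealthy_operators` is a made-up name, and the filter assumes the VERSION column is filled in for every row; an operator without a reported version would shift the columns and show up as a false positive.

```shell
# Print cluster operators that are not Available, are Progressing,
# or are Degraded. Reads `oc get clusteroperators` output on stdin.
# Assumes columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE.
unhealthy_operators() {
    awk '$1 != "NAME" && ($3 != "True" || $4 != "False" || $5 != "False") { print $1 }'
}

# Usage:
#   oc get clusteroperators | unhealthy_operators
```

Note that PROGRESSING=True is expected during an upgrade, so a hit is a reason to look closer rather than an error by itself.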
8 Deployment
A Deployment provides declarative updates for Pods and ReplicaSets.
You describe a desired state in a Deployment, and the Deployment Controller changes the actual state to the desired state at a controlled rate. You can define Deployments to create new ReplicaSets, or to remove existing Deployments and adopt all their resources with new Deployments.
[chris@control(zabbix-dev/system:admin) ~]$ oc get deployment --all-namespaces
NAMESPACE                           NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
openshift-apiserver-operator        openshift-apiserver-operator   1/1     1            1           40d
openshift-apiserver                 apiserver                      3/3     3            3           3d
openshift-authentication-operator   authentication-operator        1/1     1            1           40d
...
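In the READY column, the first number is the count of ready replicas and the second the desired count. A Deployment needs attention when the two differ. A minimal sketch, with the hypothetical name `unready_deployments` and the column layout of the `--all-namespaces` listing above:

```shell
# Print Deployments whose READY column (ready/desired) does not match.
# Reads `oc get deployment --all-namespaces` output on stdin.
unready_deployments() {
    awk '$1 != "NAMESPACE" {
        split($3, r, "/")                 # r[1] = ready, r[2] = desired
        if (r[1] != r[2]) print $1 "/" $2, $3
    }'
}

# Usage:
#   oc get deployment --all-namespaces | unready_deployments
```

A Deployment scaled to zero shows `0/0` and is not reported.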
9 ReplicaSet
A ReplicaSet is defined with fields, including a selector that specifies how to identify Pods it can acquire, a number of replicas indicating how many Pods it should be maintaining, and a pod template specifying the data of new Pods it should create to meet the number of replicas criteria. A ReplicaSet then fulfills its purpose by creating and deleting Pods as needed to reach the desired number. When a ReplicaSet needs to create new Pods, it uses its Pod template.
[chris@control(zabbix-dev/system:admin) ~]$ oc get replicaset --all-namespaces | egrep -v ' 0 .* 0 '
NAMESPACE                           NAME                                      DESIRED   CURRENT   READY   AGE
openshift-apiserver-operator        openshift-apiserver-operator-8596449546   1         1         1       3d
openshift-apiserver                 apiserver-95c79c585                       3         3         3       2d21h
openshift-authentication-operator   authentication-operator-66f85cff9         1         1         1       3d
openshift-authentication            oauth-openshift-5d8d554669                2         2         2       34h
...
10 Pods (restarts)
A set of one or more containers that are deployed onto a Node together and share a unique IP and Volumes (persistent storage). Pods also define the security and runtime policy for each container.
[chris@control(zabbix-dev/system:admin) ~]$ oc get pods --all-namespaces
NAMESPACE                             NAME                                            READY   STATUS    RESTARTS   AGE
openshift-apiserver-operator          openshift-apiserver-operator-8596449546-kmmt6   1/1     Running   0          2d20h
openshift-apiserver                   apiserver-95c79c585-b4h7f                       1/1     Running   0          2d20h
openshift-apiserver                   apiserver-95c79c585-h5pxq                       1/1     Running   0          2d20h
openshift-apiserver                   apiserver-95c79c585-w2xq2                       1/1     Running   0          2d20h
openshift-authentication-operator     authentication-operator-66f85cff9-zcjhb         1/1     Running   0          2d20h
openshift-authentication              oauth-openshift-5d8d554669-9wxng                1/1     Running   0          34h
openshift-authentication              oauth-openshift-5d8d554669-vgp8f                1/1     Running   0          34h
openshift-cloud-credential-operator   cloud-credential-operator-695f4895db-5nv2b      1/1     Running   0          2d20h
openshift-cluster-machine-approver    machine-approver-685c8468fb-rpmtq               2/2     Running   0          2d20h
...
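Since the point of this check is the RESTARTS column, the listing can be reduced to pods that have restarted at all. A sketch, assuming RESTARTS is the fifth column as in the `--all-namespaces` output above; `restarting_pods` is a made-up name.

```shell
# Print pods whose RESTARTS count is at or above a minimum.
# $1: minimum restart count (default 1).
# Reads `oc get pods --all-namespaces` output on stdin.
restarting_pods() {
    awk -v min="${1:-1}" '$1 != "NAMESPACE" && $5 + 0 >= min + 0 { print $1 "/" $2, $5 }'
}

# Usage:
#   oc get pods --all-namespaces | restarting_pods
#   oc get pods --all-namespaces | restarting_pods 5
```

A high restart count on a long-running pod is less alarming than the same count on a pod that is only minutes old, so compare against the AGE column before acting.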
11 StatefulSets
StatefulSet is the workload API object used to manage stateful applications.
It manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.
[chris@control(zabbix-dev/system:admin) ~]$ oc get statefulset --all-namespaces
NAMESPACE              NAME                READY   AGE
openshift-monitoring   alertmanager-main   3/3     40d
openshift-monitoring   prometheus-k8s      2/2     40d
12 DaemonSet
A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.
Some typical uses of a DaemonSet are:
- running a cluster storage daemon, such as glusterd, ceph, on each node.
- running a logs collection daemon on every node, such as fluentd or filebeat.
- running a node monitoring daemon on every node, such as Prometheus Node Exporter, Flowmill, Sysdig Agent, collectd, Dynatrace OneAgent, AppDynamics Agent, Datadog agent, New Relic agent, Ganglia gmond, Instana Agent or Elastic Metricbeat.
[chris@control(zabbix-dev/system:admin) ~]$ oc get daemonset --all-namespaces
NAMESPACE                                NAME                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
openshift-cluster-node-tuning-operator   tuned                5         5         5       5            5           kubernetes.io/os=linux            2d23h
openshift-controller-manager             controller-manager   3         3         3       3            3           node-role.kubernetes.io/master=   40d
openshift-dns                            dns-default          5         5         5       5            5           kubernetes.io/os=linux            40d
...
openshift-sdn                            ovs                  5         5         5       5            5           kubernetes.io/os=linux            40d
openshift-sdn                            sdn                  5         5         5       5            5           kubernetes.io/os=linux            40d
openshift-sdn                            sdn-controller       3         3         3       3            3           node-role.kubernetes.io/master=   40d
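A DaemonSet is healthy when READY matches DESIRED. The sketch below compares the two columns; `unready_sets` is a hypothetical name. Because the ReplicaSet and ReplicationController listings on this page use the same first five columns (NAMESPACE, NAME, DESIRED, CURRENT, READY), the same filter should work for them too.

```shell
# Print entries whose READY count differs from the DESIRED count.
# Reads `oc get daemonset --all-namespaces` output on stdin.
# Assumes columns: NAMESPACE NAME DESIRED CURRENT READY ...
unready_sets() {
    awk '$1 != "NAMESPACE" && $3 + 0 != $5 + 0 { print $1 "/" $2, "desired=" $3, "ready=" $5 }'
}

# Usage:
#   oc get daemonset --all-namespaces | unready_sets
```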
13 ReplicationControllers
- Result of a deployment by a DeploymentConfig
[chris@control(zabbix-dev/system:admin) ~]$ oc get replicationcontroller --all-namespaces
NAMESPACE    NAME                       DESIRED   CURRENT   READY   AGE
zabbix-dev   mariadb-1                  1         1         1       2d1h
zabbix-dev   zabbix-cachet-1            0         0         0       45h
zabbix-dev   zabbix-server-mysql-1      1         1         1       2d1h
zabbix-dev   zabbix-web-nginx-mysql-1   1         1         1       2d1h
14 Persistent Volumes
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes. It is a resource in the cluster just like a node is a cluster resource.
[chris@control(test/system:admin) ~]$ oc get pv
NAME          CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                   STORAGECLASS   REASON   AGE
...
pv10          5Gi        RWO            Retain           Available                                                                    40d
pv11          5Gi        RWO            Retain           Released    test/mariadb                                                     40d
...
pv18          5Gi        RWO            Retain           Available                                                                    40d
pv19          5Gi        RWO            Retain           Available                                                                    40d
pv20          5Gi        RWO            Retain           Bound       zabbix-dev/mariadb                                               40d
pv36          5Gi        RWX            Retain           Available                                                                    40d
...
pv40          5Gi        RWX            Retain           Available                                                                    40d
registry-pv   100Gi      RWX            Retain           Bound       openshift-image-registry/registry-pvc                            40d
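With the Retain reclaim policy, a volume whose claim was deleted stays in the Released state and is not offered to new claims until an administrator cleans it up, as pv11 above illustrates. The sketch below lists such volumes. `stale_pvs` is a made-up name, and it assumes STATUS is the fifth whitespace-separated field of each data row, followed by the claim.

```shell
# Print PersistentVolumes in the Released or Failed state, with their
# last claim. Reads `oc get pv` output on stdin.
# In data rows the fields are: NAME CAPACITY ACCESS RECLAIM STATUS CLAIM ...
stale_pvs() {
    awk '$1 != "NAME" && ($5 == "Released" || $5 == "Failed") { print $1, $5, $6 }'
}

# Usage:
#   oc get pv | stale_pvs
```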
15 Persistent Volume Claims
A PersistentVolumeClaim is used by a pod as a volume. OpenShift Enterprise finds the claim with the given name in the same namespace as the pod, then uses the claim to find the corresponding PersistentVolume to mount.
[chris@control(test/system:admin) ~]$ oc get pvc --all-namespaces
NAMESPACE                  NAME                        STATUS    VOLUME        CAPACITY   ACCESS MODES   STORAGECLASS   AGE
openshift-image-registry   registry-pvc                Bound     registry-pv   100Gi      RWX                           40d
test                       mariadb                     Pending                                                          11s
zabbix-dev                 mariadb                     Bound     pv20          5Gi        RWO                           2d1h
zabbix-dev                 zabbix-server-mysql-claim   Bound     pv38          5Gi        RWX                           2d1h
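A claim that stays Pending, like test/mariadb above, keeps its pod from being scheduled. The following sketch prints every claim that is not Bound. `unbound_pvcs` is a hypothetical name, and STATUS is assumed to be the third column of the `--all-namespaces` output.

```shell
# Print PersistentVolumeClaims whose STATUS is not "Bound".
# Reads `oc get pvc --all-namespaces` output on stdin.
unbound_pvcs() {
    awk '$1 != "NAMESPACE" && $3 != "Bound" { print $1 "/" $2, $3 }'
}

# Usage:
#   oc get pvc --all-namespaces | unbound_pvcs
```

A claim created a few seconds ago may simply not be bound yet, so check the AGE column before treating a hit as a problem.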
16 Events
[chris@control(test/system:admin) ~]$ oc get events --field-selector type!=Normal --watch
LAST SEEN   TYPE      REASON             OBJECT                MESSAGE
<unknown>   Warning   FailedScheduling   pod/mariadb-1-bcb8h   error while running "VolumeBinding" filter plugin for pod "mariadb-1-bcb8h": pod has unbound immediate PersistentVolumeClaims
<unknown>   Warning   FailedScheduling   pod/mariadb-1-bcb8h   error while running "VolumeBinding" filter plugin for pod "mariadb-1-bcb8h": pod has unbound immediate PersistentVolumeClaims
<unknown>   Warning   FailedScheduling   pod/mariadb-1-bcb8h   skip schedule deleting pod: test/mariadb-1-bcb8h
[chris@control(test/system:admin) ~]$ oc get event --watch -o yaml
action: Scheduling
...
message: 'error while running "VolumeBinding" filter plugin for pod "mariadb-1-bcb8h":
[chris@control(test/system:admin) ~]$ kubectl get event --watch
LAST SEEN   TYPE     REASON                        OBJECT                     MESSAGE
107s        Normal   ReplicationControllerScaled   deploymentconfig/mariadb   Scaled replication controller "mariadb-1" from 1 to 0
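When many events scroll by, a count per reason gives a quicker overview than the raw stream. A sketch, run without `--watch` so that the command terminates; `warning_reasons` is a made-up name, and TYPE and REASON are assumed to be the second and third fields of a single-namespace listing as shown above.

```shell
# Count Warning events per REASON, most frequent first.
# Reads `oc get events` output (one namespace, no --watch) on stdin.
warning_reasons() {
    awk '$1 != "LAST" && $2 == "Warning" { count[$3]++ }
         END { for (r in count) print count[r], r }' | sort -rn
}

# Usage:
#   oc get events | warning_reasons
```

With `--all-namespaces` an extra NAMESPACE column is prepended, so the field numbers would have to be shifted by one.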