OpenShift v4x health check

From Bitbull Wiki

1 Links

2 Nodes

Kubernetes runs your workload by placing containers into Pods to run on Nodes. A node may be a virtual or physical machine, depending on the cluster. Each node contains the services necessary to run Pods.

2.1 Overview

[chris@control(zabbix-dev/system:admin) ~]$ oc get nodes -o wide
NAME       STATUS   ROLES           AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                                   KERNEL-VERSION                CONTAINER-RUNTIME
master01   Ready    master,worker   40d   v1.17.1   192.168.100.221   <none>        RHEL CoreOS 44.81.202005062110-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-8.dev.rhaos4.4.git5f5c5e4.el8
master02   Ready    master,worker   40d   v1.17.1   192.168.100.222   <none>        RHEL CoreOS 44.81.202005062110-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-8.dev.rhaos4.4.git5f5c5e4.el8
master03   Ready    master,worker   40d   v1.17.1   192.168.100.223   <none>        RHEL CoreOS 44.81.202005062110-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-8.dev.rhaos4.4.git5f5c5e4.el8
worker01   Ready    worker          40d   v1.17.1   192.168.100.231   <none>        RHEL CoreOS 44.81.202005062110-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-8.dev.rhaos4.4.git5f5c5e4.el8
worker02   Ready    worker          40d   v1.17.1   192.168.100.232   <none>        RHEL CoreOS 44.81.202005062110-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-8.dev.rhaos4.4.git5f5c5e4.el8
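
On larger clusters it is easier to filter this listing than to read it. A minimal sketch, assuming the default column layout shown above (NAME first, STATUS second); the function name not_ready_nodes is made up for this page:

```shell
# Print name and status of every node whose STATUS is not exactly "Ready".
# Reads `oc get nodes` output on stdin (hypothetical helper, not part of oc).
# Usage: oc get nodes | not_ready_nodes
not_ready_nodes() {
  awk '
    NR == 1 && $1 == "NAME" { next }      # skip the header line if present
    NF >= 2 && $2 != "Ready" { print $1, $2 }'
}
```

No output means every node reports plain Ready. A cordoned node (Ready,SchedulingDisabled) is listed as well, which is usually wanted in a health check.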

2.2 Resources

Usage:
  oc adm top [flags]

Available Commands:
  images       Show usage statistics for Images
  imagestreams Show usage statistics for ImageStreams
  node         Display Resource (CPU/Memory/Storage) usage of nodes
  pod          Display Resource (CPU/Memory/Storage) usage of pods
[chris@control(default/system:admin) ~]$ oc adm top nodes
NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
master01   796m         22%    3601Mi          52%       
master02   852m         24%    3626Mi          52%       
master03   578m         16%    2494Mi          36%       
worker01   596m         17%    2644Mi          38%       
worker02   538m         15%    2426Mi          35%
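
To spot overloaded nodes quickly, the percentage columns can be compared against a limit. This is only a sketch: it assumes the five-column layout shown above (CPU% third, MEMORY% fifth), and both the function name busy_nodes and the default limit of 80 are arbitrary choices for illustration.

```shell
# Print nodes whose CPU% or MEMORY% is above a limit (default 80).
# Reads `oc adm top nodes` output on stdin (hypothetical helper).
# Usage: oc adm top nodes | busy_nodes 50
busy_nodes() {
  awk -v limit="${1:-80}" '
    NR == 1 && $1 == "NAME" { next }      # skip the header line if present
    NF >= 5 {
      cpu = $3; mem = $5
      sub(/%/, "", cpu); sub(/%/, "", mem) # strip the percent sign
      if (cpu + 0 > limit + 0 || mem + 0 > limit + 0) print $1, $3, $5
    }'
}
```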

2.3 Pending certificate signing requests

Certificate signing requests (CSRs) are issued by OpenShift automatically, but you have to approve them manually.
Pending CSRs usually leave the cluster not fully functional.

[chris@control(openshift-console/system:admin) ~]$ oc get csr
NAME        AGE     SIGNERNAME                                    REQUESTOR                                                                   CONDITION
csr-2g82l   3m23s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-bz74n   9m25s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
[chris@control(openshift-console/system:admin) ~]$ oc get csr -o name | xargs oc adm certificate approve
certificatesigningrequest.certificates.k8s.io/csr-2g82l approved
certificatesigningrequest.certificates.k8s.io/csr-bz74n approved
[chris@control(openshift-console/system:admin) ~]$ oc get csr
NAME        AGE     SIGNERNAME                                    REQUESTOR                                                                   CONDITION
csr-2g82l   3m55s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-bz74n   9m57s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
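
The one-liner above passes every CSR to the approve command, including those already issued. A hedged alternative is to select only the pending ones first. The sketch assumes CONDITION is the last column, as in the listing above; pending_csrs is a made-up name.

```shell
# Print the names of CSRs whose last column (CONDITION) is "Pending".
# Reads `oc get csr` output on stdin (hypothetical helper).
# Usage: oc get csr | pending_csrs | xargs -r oc adm certificate approve
# (-r keeps GNU xargs from running the command on empty input)
pending_csrs() {
  awk '$NF == "Pending" { print $1 }'
}
```

Reviewing the REQUESTOR column before approving is still a good idea, since approval grants the requester a client certificate.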




3 Kubernetes API health endpoints

The Kubernetes API server provides API endpoints to indicate the current status of the API server.

kubectl get --raw='/readyz?verbose'
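
The verbose report lists one check per line, prefixed with [+] for a passing and [-] for a failing check. A small sketch that prints only the failing check names follows. It assumes the report arrives as plain text lines on stdin in that format; how a failing report is delivered may differ between client versions, so verify this against your own cluster. The name failed_checks is hypothetical.

```shell
# Print the names of checks marked "[-]" in a verbose readyz/healthz report.
# Reads the report on stdin (hypothetical helper).
# Usage: kubectl get --raw='/readyz?verbose' | failed_checks
failed_checks() {
  # "[-]etcd failed: ..." -> first field is "[-]etcd", drop the 3-char marker
  awk 'index($0, "[-]") == 1 { print substr($1, 4) }'
}
```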

4 etcd

etcd is a consistent and highly-available key-value store used as Kubernetes’ backing store for all cluster data.

[chris@control(zabbix-dev/system:admin) ~]$ oc get etcd -o=jsonpath='{range .items[0].status.conditions[?(@.type=="EtcdMembersAvailable")]}{.message}{"\n"}{end}'
master02,master01,master03 members are available,  have not started,  are unhealthy,  are unknown

5 Router

There are many ways to get traffic into the cluster. The most common approach is to use the OpenShift Container Platform router as the ingress point for external traffic destined for services in your OpenShift Container Platform installation.

[chris@control(default/system:admin) ~]$ oc get deployment,pod --namespace openshift-ingress
NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/router-default   2/2     2            2           40d

NAME                                  READY   STATUS    RESTARTS   AGE
pod/router-default-5fdb964dfb-kkl5p   1/1     Running   0          3d1h
pod/router-default-5fdb964dfb-nb8ff   1/1     Running   0          3d1h

6 Registry

OpenShift Container Platform can build container images from your source code, deploy them, and manage their lifecycle. To enable this, OpenShift Container Platform provides an internal, integrated container image registry that can be deployed in your OpenShift Container Platform environment to locally manage images.


[chris@control(default/system:admin) ~]$ oc get pod,deployment -n openshift-image-registry
NAME                                                   READY   STATUS    RESTARTS   AGE
pod/cluster-image-registry-operator-7bff4c7595-hkbqx   2/2     Running   0          2d23h
pod/image-registry-6b6745b4f9-wqwdx                    1/1     Running   0          3d2h
pod/node-ca-6wgpw                                      1/1     Running   0          3d2h
pod/node-ca-gjmhw                                      1/1     Running   0          3d2h
pod/node-ca-gnp7n                                      1/1     Running   0          3d2h
pod/node-ca-gtvt9                                      1/1     Running   0          3d2h
pod/node-ca-ps7v9                                      1/1     Running   0          3d2h

NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-image-registry-operator   1/1     1            1           40d
deployment.apps/image-registry                    1/1     1            1           40d

7 ClusterOperators - Version 4x

Conceptually, Operators take human operational knowledge and encode it into software that is more easily shared with consumers.
Operators are pieces of software that ease the operational complexity of running another piece of software. They act like an extension of the software vendor’s engineering team, watching over a Kubernetes environment (such as OpenShift Container Platform) and using its current state to make decisions in real time. Advanced Operators are designed to handle upgrades seamlessly, react to failures automatically, and not take shortcuts, like skipping a software backup process to save time.

[chris@control(zabbix-dev/system:admin) ~]$ oc -n default get clusteroperators
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.4     True        False         False      35d
cloud-credential                           4.4.4     True        False         False      40d
cluster-autoscaler                         4.4.4     True        False         False      40d
 ...
service-catalog-apiserver                  4.4.4     True        False         False      40d
service-catalog-controller-manager         4.4.4     True        False         False      40d
storage                                    4.4.4     True        False         False      2d23h
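
A healthy operator shows AVAILABLE=True, PROGRESSING=False and DEGRADED=False. The sketch below prints every operator that deviates from that. It assumes the column order shown above and that the VERSION column is filled in (an empty VERSION would shift the fields); unhealthy_operators is a made-up name.

```shell
# Print cluster operators that are not Available, are Progressing, or are Degraded.
# Reads `oc get clusteroperators` output on stdin (hypothetical helper).
# Usage: oc get clusteroperators | unhealthy_operators
unhealthy_operators() {
  awk '
    NR == 1 && $1 == "NAME" { next }      # skip the header line if present
    NF >= 5 && ($3 != "True" || $4 != "False" || $5 != "False") {
      print $1, "available=" $3, "progressing=" $4, "degraded=" $5
    }'
}
```

A PROGRESSING=True entry is expected during an upgrade, so treat it as a hint rather than an error.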


8 Deployment

A Deployment provides declarative updates for Pods and ReplicaSets.
You describe a desired state in a Deployment, and the Deployment Controller changes the actual state to the desired state at a controlled rate. You can define Deployments to create new ReplicaSets, or to remove existing Deployments and adopt all their resources with new Deployments.

[chris@control(zabbix-dev/system:admin) ~]$ oc get deployment --all-namespaces
NAMESPACE                                               NAME                                                    READY   UP-TO-DATE   AVAILABLE   AGE
openshift-apiserver-operator                            openshift-apiserver-operator                            1/1     1            1           40d
openshift-apiserver                                     apiserver                                               3/3     3            3           3d
openshift-authentication-operator                       authentication-operator                                 1/1     1            1           40d
...
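
With --all-namespaces the list gets long, so it helps to show only entries whose READY column is not complete (for example 2/3). A sketch under the assumption that READY is the third column, as above; the same position is used by `oc get statefulset --all-namespaces`. The name unready_workloads is hypothetical.

```shell
# Print namespace/name and READY for rows where ready != desired (e.g. "2/3").
# Reads `oc get deployment --all-namespaces` output on stdin (hypothetical helper).
# Usage: oc get deployment --all-namespaces | unready_workloads
unready_workloads() {
  awk '
    NR == 1 && $1 == "NAMESPACE" { next } # skip the header line if present
    split($3, r, "/") == 2 && r[1] != r[2] { print $1 "/" $2, $3 }'
}
```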

9 ReplicaSet

A ReplicaSet is defined with fields, including a selector that specifies how to identify Pods it can acquire, a number of replicas indicating how many Pods it should be maintaining, and a pod template specifying the data of new Pods it should create to meet the number of replicas criteria. A ReplicaSet then fulfills its purpose by creating and deleting Pods as needed to reach the desired number. When a ReplicaSet needs to create new Pods, it uses its Pod template.

[chris@control(zabbix-dev/system:admin) ~]$ oc get replicaset --all-namespaces  | egrep -v ' 0 .* 0 '
NAMESPACE                                               NAME                                                               DESIRED   CURRENT   READY   AGE
openshift-apiserver-operator                            openshift-apiserver-operator-8596449546                            1         1         1       3d
openshift-apiserver                                     apiserver-95c79c585                                                3         3         3       2d21h
openshift-authentication-operator                       authentication-operator-66f85cff9                                  1         1         1       3d
openshift-authentication                                oauth-openshift-5d8d554669                                         2         2         2       34h
...
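
The egrep filter above only hides ReplicaSets that are scaled to zero. To see the ones that are actually short on Pods, READY can be compared with DESIRED. A sketch, assuming DESIRED is the third and READY the fifth column as in the listing; the --all-namespaces output of daemonsets and replicationcontrollers uses the same positions. The name short_on_replicas is made up.

```shell
# Print rows where READY (5th column) is lower than DESIRED (3rd column).
# Reads `oc get replicaset --all-namespaces` output on stdin (hypothetical helper).
# Usage: oc get replicaset --all-namespaces | short_on_replicas
short_on_replicas() {
  awk '
    NR == 1 && $1 == "NAMESPACE" { next } # skip the header line if present
    NF >= 5 && $5 + 0 < $3 + 0 { print $1 "/" $2, "ready=" $5, "desired=" $3 }'
}
```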

10 Pods (restarts)

A set of one or more containers that are deployed onto a Node together and share a unique IP and Volumes (persistent storage). Pods also define the security and runtime policy for each container.

[chris@control(zabbix-dev/system:admin) ~]$ oc get pods --all-namespaces
NAMESPACE                                               NAME                                                              READY   STATUS      RESTARTS   AGE
openshift-apiserver-operator                            openshift-apiserver-operator-8596449546-kmmt6                     1/1     Running     0          2d20h
openshift-apiserver                                     apiserver-95c79c585-b4h7f                                         1/1     Running     0          2d20h
openshift-apiserver                                     apiserver-95c79c585-h5pxq                                         1/1     Running     0          2d20h
openshift-apiserver                                     apiserver-95c79c585-w2xq2                                         1/1     Running     0          2d20h
openshift-authentication-operator                       authentication-operator-66f85cff9-zcjhb                           1/1     Running     0          2d20h
openshift-authentication                                oauth-openshift-5d8d554669-9wxng                                  1/1     Running     0          34h
openshift-authentication                                oauth-openshift-5d8d554669-vgp8f                                  1/1     Running     0          34h
openshift-cloud-credential-operator                     cloud-credential-operator-695f4895db-5nv2b                        1/1     Running     0          2d20h
openshift-cluster-machine-approver                      machine-approver-685c8468fb-rpmtq                                 2/2     Running     0          2d20h
...
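
Since this check is about restarts, the listing can be reduced to Pods that restarted at least a given number of times, highest count first. A sketch assuming RESTARTS is the fifth column of the --all-namespaces output; the name restarted_pods and the default minimum of 1 are illustrative choices.

```shell
# Print "restarts namespace/pod status" for pods with at least N restarts (default 1),
# sorted by restart count, highest first.
# Reads `oc get pods --all-namespaces` output on stdin (hypothetical helper).
# Usage: oc get pods --all-namespaces | restarted_pods 5
restarted_pods() {
  awk -v min="${1:-1}" '
    NR == 1 && $1 == "NAMESPACE" { next } # skip the header line if present
    NF >= 5 && $5 + 0 >= min + 0 { print $5, $1 "/" $2, $4 }' | sort -rn
}
```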


11 StatefulSets

StatefulSet is the workload API object used to manage stateful applications.
It manages the deployment and scaling of a set of Pods and provides guarantees about the ordering and uniqueness of these Pods.

[chris@control(zabbix-dev/system:admin) ~]$ oc get statefulset --all-namespaces
NAMESPACE              NAME                READY   AGE
openshift-monitoring   alertmanager-main   3/3     40d
openshift-monitoring   prometheus-k8s      2/2     40d

12 DaemonSet

A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.

Some typical uses of a DaemonSet are:

  • running a cluster storage daemon, such as glusterd or ceph, on each node.
  • running a logs collection daemon on every node, such as fluentd or filebeat.
  • running a node monitoring daemon on every node, such as Prometheus Node Exporter, Flowmill, Sysdig Agent, collectd, Dynatrace OneAgent, AppDynamics Agent, Datadog agent, New Relic agent, Ganglia gmond, Instana Agent or Elastic Metricbeat.
[chris@control(zabbix-dev/system:admin) ~]$ oc get daemonset --all-namespaces
NAMESPACE                                NAME                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
openshift-cluster-node-tuning-operator   tuned                         5         5         5       5            5           kubernetes.io/os=linux            2d23h
openshift-controller-manager             controller-manager            3         3         3       3            3           node-role.kubernetes.io/master=   40d
openshift-dns                            dns-default                   5         5         5       5            5           kubernetes.io/os=linux            40d
...
openshift-sdn                            ovs                           5         5         5       5            5           kubernetes.io/os=linux            40d
openshift-sdn                            sdn                           5         5         5       5            5           kubernetes.io/os=linux            40d
openshift-sdn                            sdn-controller                3         3         3       3            3           node-role.kubernetes.io/master=   40d

13 ReplicationControllers

  • The result of a deployment triggered by a DeploymentConfig
[chris@control(zabbix-dev/system:admin) ~]$ oc get replicationcontroller --all-namespaces
NAMESPACE    NAME                       DESIRED   CURRENT   READY   AGE
zabbix-dev   mariadb-1                  1         1         1       2d1h
zabbix-dev   zabbix-cachet-1            0         0         0       45h
zabbix-dev   zabbix-server-mysql-1      1         1         1       2d1h
zabbix-dev   zabbix-web-nginx-mysql-1   1         1         1       2d1h

14 Persistent Volumes

A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes. It is a resource in the cluster just like a node is a cluster resource.

[chris@control(test/system:admin) ~]$ oc get pv
NAME          CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                   STORAGECLASS   REASON   AGE
...
pv10          5Gi        RWO            Retain           Available                                                                   40d
pv11          5Gi        RWO            Retain           Released    test/mariadb                                                    40d
...
pv18          5Gi        RWO            Retain           Available                                                                   40d
pv19          5Gi        RWO            Retain           Available                                                                   40d
pv20          5Gi        RWO            Retain           Bound       zabbix-dev/mariadb                                              40d
pv36          5Gi        RWX            Retain           Available                                                                   40d
...
pv40          5Gi        RWX            Retain           Available                                                                   40d
registry-pv   100Gi      RWX            Retain           Bound       openshift-image-registry/registry-pvc                           40d
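
With the Retain reclaim policy, a volume in status Released (like pv11 above) is not offered to new claims until an administrator reclaims it. A sketch for listing such volumes, assuming STATUS is the fifth and CLAIM the sixth whitespace-separated field of each data row, as in the listing; released_pvs is a hypothetical name.

```shell
# Print name and former claim of every PersistentVolume in status "Released".
# Reads `oc get pv` output on stdin (hypothetical helper).
# Usage: oc get pv | released_pvs
released_pvs() {
  awk '$5 == "Released" { print $1, $6 }'
}
```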

15 Persistent Volume Claims

A PersistentVolumeClaim is used by a pod as a volume. OpenShift Enterprise finds the claim with the given name in the same namespace as the pod, then uses the claim to find the corresponding PersistentVolume to mount.

[chris@control(test/system:admin) ~]$ oc get pvc --all-namespaces
NAMESPACE                  NAME                        STATUS    VOLUME        CAPACITY   ACCESS MODES   STORAGECLASS   AGE
openshift-image-registry   registry-pvc                Bound     registry-pv   100Gi      RWX                           40d
test                       mariadb                     Pending                                                          11s
zabbix-dev                 mariadb                     Bound     pv20          5Gi        RWO                           2d1h
zabbix-dev                 zabbix-server-mysql-claim   Bound     pv38          5Gi        RWX                           2d1h
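
A claim that stays in any status other than Bound keeps its Pod from being scheduled, so those are the rows worth watching. A sketch assuming STATUS is the third column of the --all-namespaces output shown above; unbound_pvcs is a made-up name.

```shell
# Print namespace/name and status of every claim that is not "Bound".
# Reads `oc get pvc --all-namespaces` output on stdin (hypothetical helper).
# Usage: oc get pvc --all-namespaces | unbound_pvcs
unbound_pvcs() {
  awk '
    NR == 1 && $1 == "NAMESPACE" { next } # skip the header line if present
    NF >= 3 && $3 != "Bound" { print $1 "/" $2, $3 }'
}
```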

16 Events

[chris@control(test/system:admin) ~]$ oc get events --field-selector type!=Normal --watch
LAST SEEN   TYPE      REASON             OBJECT                MESSAGE
<unknown>   Warning   FailedScheduling   pod/mariadb-1-bcb8h   error while running "VolumeBinding" filter plugin for pod "mariadb-1-bcb8h": pod has unbound immediate PersistentVolumeClaims
<unknown>   Warning   FailedScheduling   pod/mariadb-1-bcb8h   error while running "VolumeBinding" filter plugin for pod "mariadb-1-bcb8h": pod has unbound immediate PersistentVolumeClaims
<unknown>   Warning   FailedScheduling   pod/mariadb-1-bcb8h   skip schedule deleting pod: test/mariadb-1-bcb8h
[chris@control(test/system:admin) ~]$ oc get event  --watch -o yaml
action: Scheduling
...
message: 'error while running "VolumeBinding" filter plugin for pod "mariadb-1-bcb8h":
[chris@control(test/system:admin) ~]$ kubectl get event  --watch
LAST SEEN   TYPE      REASON                        OBJECT                            MESSAGE
107s        Normal    ReplicationControllerScaled   deploymentconfig/mariadb          Scaled replication controller "mariadb-1" from 1 to 0
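
For a quick overview, warnings can be counted per reason and object instead of being read line by line. A sketch assuming the single-namespace layout shown above (TYPE second, REASON third, OBJECT fourth); with --all-namespaces an extra leading column shifts the fields. Run it without --watch, because the summary is only printed once the input ends. The name warning_summary is hypothetical.

```shell
# Count Warning events per "reason object" pair, most frequent first.
# Reads `oc get events` output (single namespace, no --watch) on stdin.
# Usage: oc get events | warning_summary
warning_summary() {
  awk '
    $2 == "Warning" { count[$3 " " $4]++ }
    END { for (k in count) print count[k], k }' | sort -rn
}
```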