Guided Exercise: Cluster and Node Maintenance with Kubernetes Cron Jobs

Automate periodic cluster node cleaning for a development environment.

Outcomes

  • Manually delete unused images from the nodes.

  • Automate the image pruning by using a cron job.

As the student user on the workstation machine, use the lab command to prepare your system for this exercise.

[student@workstation ~]$ lab start appsec-prune

Instructions

  1. Log in to the OpenShift cluster and switch to the appsec-prune project.

    1. Log in to the cluster as the admin user.

      [student@workstation ~]$ oc login -u admin -p redhatocp \
        https://api.ocp4.example.com:6443
      Login successful.
      
      ...output omitted...
    2. Create the appsec-prune project.

      [student@workstation ~]$ oc new-project appsec-prune
      Now using project "appsec-prune" on server "https://api.ocp4.example.com:6443".
      
      ...output omitted...
    3. Change to the ~/DO280/labs/appsec-prune directory.

      [student@workstation ~]$ cd ~/DO280/labs/appsec-prune
      [student@workstation appsec-prune]$
  2. Clean up the unused container images on the node.

    1. List the deployments and pods in the prune-apps namespace. Each deployment has a pod that uses a different image.

      [student@workstation appsec-prune]$ oc get deployments -n prune-apps -o wide
      NAME        ...  IMAGES                                                ...
      nginx-ubi7  ...  registry.ocp4.example.com:8443/ubi7/nginx-118:latest  ...
      nginx-ubi8  ...  registry.ocp4.example.com:8443/ubi8/nginx-118:latest  ...
      nginx-ubi9  ...  registry.ocp4.example.com:8443/ubi9/nginx-120:latest  ...
      
      [student@workstation appsec-prune]$ oc get pods -n prune-apps
      NAME                          READY   STATUS    RESTARTS   AGE
      nginx-ubi7-594f548665-qvfq6   1/1     Running   0          5m
      nginx-ubi8-855f6959b-jvs6h    1/1     Running   0          5m
      nginx-ubi9-dd4c566d7-7vrrv    1/1     Running   0          5m
    2. List the container images on the node. The node has three httpd images and three nginx images.

      [student@workstation appsec-prune]$ oc debug node/master01 -- \
        chroot /host crictl images | egrep '^IMAGE|httpd|nginx'
      ...output omitted...
      Starting pod/master01-debug ...
      To use host binaries, run `chroot /host`
      IMAGE                                                TAG     IMAGE ID       ...
      registry.ocp4.example.com:8443/rhscl/httpd-24-rhel7  latest  c19a96fc0b019  ...
      registry.ocp4.example.com:8443/ubi8/httpd-24         latest  e54df115d5f0c  ...
      registry.ocp4.example.com:8443/ubi9/httpd-24         latest  4afe283d911ab  ...
      registry.ocp4.example.com:8443/ubi7/nginx-118        latest  3adc6d109b363  ...
      registry.ocp4.example.com:8443/ubi8/nginx-118        latest  90f91167f6d1d  ...
      registry.ocp4.example.com:8443/ubi9/nginx-120        latest  0227435f34784  ...
      
      Removing debug pod ...
      ...output omitted...
    3. Remove the unused images on the node. Only the httpd container images are deleted, because the running pods in the prune-apps namespace still use the nginx images.

      [student@workstation appsec-prune]$ oc debug node/master01 -- \
        chroot /host crictl rmi --prune
      ...output omitted...
      Starting pod/master01-debug ...
      To use host binaries, run `chroot /host`
      E1213 00:43:40.788951  166213 remote_image.go:266] "RemoveImage from image service failed" err="rpc error: code = Unknown desc = Image used by 5027ebb4...: image is in use by a container" image="c464e04f..."  1
      Deleted: registry.ocp4.example.com:8443/rhscl/httpd-24-rhel7:latest
      Deleted: registry.ocp4.example.com:8443/ubi8/httpd-24:latest
      Deleted: registry.ocp4.example.com:8443/ubi9/httpd-24:latest
      
      Removing debug pod ...
      ...output omitted...

      1

      You can ignore the error that a container is using the image.
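
      If you want to quantify the space that pruning reclaims, crictl can report the usage of the image file system. This check is optional and not part of the exercise; run it before and after the prune and compare the usedBytes values in the output:

      [student@workstation appsec-prune]$ oc debug node/master01 -- \
        chroot /host crictl imagefsinfo
      ...output omitted...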

    4. Delete the deployments in the prune-apps namespace to remove the pods that use the nginx images.

      [student@workstation appsec-prune]$ oc delete deployment nginx-ubi{7,8,9} \
        -n prune-apps
      deployment.apps "nginx-ubi7" deleted
      deployment.apps "nginx-ubi8" deleted
      deployment.apps "nginx-ubi9" deleted

      Note

      The cron job removes the unused container images in a later step.
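
      Optionally, confirm that the pods are gone before you continue:

      [student@workstation appsec-prune]$ oc get pods -n prune-apps
      No resources found in prune-apps namespace.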

  3. Create a cron job to automate the image pruning process.

    1. Edit the ~/DO280/labs/appsec-prune/configmap-prune.yaml file to match the following specification:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: maintenance
        labels:
          ge: appsec-prune
          app: crictl
      data:
        maintenance.sh: |
          #!/bin/bash -eu
          NODES=$(oc get nodes -o=name)
          for NODE in ${NODES}
          do
            echo ${NODE}
            oc debug ${NODE} -- \
              chroot /host \
                /bin/bash -euxc 'crictl images ; crictl rmi --prune'
          done

      Note

      The ~/DO280/solutions/appsec-prune/configmap-prune.yaml file contains the correct configuration and can be used for comparison.
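
      The script loops over the node names that the oc get nodes -o=name command returns. That command emits names in the node/<name> form, which oc debug accepts directly. In this classroom the output is similar to the following:

      [student@workstation appsec-prune]$ oc get nodes -o=name
      node/master01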

    2. Create the configuration map:

      [student@workstation appsec-prune]$ oc apply -f configmap-prune.yaml
      configmap/maintenance created
    3. Edit the ~/DO280/labs/appsec-prune/cronjob-prune.yaml file to match the following specification:

      apiVersion: batch/v1
      kind: CronJob
      metadata:
        name: image-pruner
        labels:
          ge: appsec-prune
          app: crictl
      spec:
        schedule: '*/4 * * * *'
        jobTemplate:
          spec:
            template:
              spec:
                dnsPolicy: ClusterFirst
                restartPolicy: Never
                containers:
                - name: crictl
                  image: registry.ocp4.example.com:8443/openshift/origin-cli:4.14  1
                  resources: {}
                  command:
                  - /opt/maintenance.sh
                  volumeMounts:
                  - name: scripts
                    mountPath: /opt
                volumes:
                - name: scripts
                  configMap:
                    name: maintenance
                    defaultMode: 0555

      1

      The registry.ocp4.example.com:8443/openshift/origin-cli:4.14 container image is a copy of the official quay.io/openshift/origin-cli:4.14 image that contains the oc command.

      Note

      The ~/DO280/solutions/appsec-prune/cronjob-prune.yaml file contains the correct configuration and can be used for comparison.
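
      For reference, the five fields of the schedule value are minute, hour, day of month, month, and day of week. The */4 entry in the minute field runs the job every four minutes, which is frequent enough to observe results during this exercise:

      # minute  hour  day-of-month  month  day-of-week
        */4     *     *             *      *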

    4. Apply the changes to the image pruner resource.

      [student@workstation appsec-prune]$ oc apply -f cronjob-prune.yaml
      cronjob.batch/image-pruner created

      Note

      The oc apply command prints a warning that the pod would violate several security policies. When the cron job triggers, the pod fails, because the default service account lacks the permissions to execute the maintenance task. A fix for this issue is implemented in a later step.

    5. Wait until the cron job is scheduled, and get the name of the associated job. The job completion status is 0/1, and the pod has an error status. Press Ctrl+C to exit the watch command.

      [student@workstation appsec-prune]$ watch oc get cronjobs,jobs,pods
      Every 2.0s: oc get cronjobs,jobs,pods      workstation: Tue Feb 13 13:00:47 2024
      
      NAME                         SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
      cronjob.batch/image-pruner   */4 * * * *   False     1        53s             6m
      
      NAME                              COMPLETIONS   DURATION   AGE
      job.batch/image-pruner-27883656   0/1           30s        30s
      
      NAME                              READY   STATUS   RESTARTS   AGE
      pod/image-pruner-27883656-g76lb   0/1     Error    0          15s
    6. Get the logs of the pod. A permission error is displayed.

      [student@workstation appsec-prune]$ oc logs pod/image-pruner-27883656-g76lb
      Error from server (Forbidden): nodes is forbidden: User "system:serviceaccount:appsec-prune:default" cannot list resource "nodes" in API group "" at the cluster scope
    7. Delete the cron job. This action also deletes the failed job and pod resources.

      [student@workstation appsec-prune]$ oc delete cronjob/image-pruner
      cronjob.batch "image-pruner" deleted
  4. Set the appropriate permissions to run the image pruner cron job.

    1. Add the privileged SCC to the default service account of the namespace.

      [student@workstation appsec-prune]$ oc adm policy add-scc-to-user -z default privileged
      clusterrole.rbac.authorization.k8s.io/system:openshift:scc:privileged added: "default"
    2. Add the cluster-admin role to the default service account of the namespace.

      [student@workstation appsec-prune]$ oc adm policy add-cluster-role-to-user \
        cluster-admin -z default
      clusterrole.rbac.authorization.k8s.io/cluster-admin added: "default"
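
      Optionally, verify the new permissions by impersonating the service account. This check is a suggestion and is not part of the exercise:

      [student@workstation appsec-prune]$ oc auth can-i list nodes \
        --as=system:serviceaccount:appsec-prune:default
      yes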
    3. Create the cron job resource again.

      [student@workstation appsec-prune]$ oc apply -f cronjob-prune.yaml
      cronjob.batch/image-pruner created
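
      If you prefer not to wait up to four minutes for the next scheduled run, you can start a job immediately from the cron job template. The image-pruner-manual job name is only illustrative:

      [student@workstation appsec-prune]$ oc create job image-pruner-manual \
        --from=cronjob/image-pruner
      job.batch/image-pruner-manual created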
    4. Wait until the new job and the pod are created. Press Ctrl+C to exit the watch command when the job and the pod are marked as completed.

      [student@workstation appsec-prune]$ watch oc get cronjobs,jobs,pods
      Every 2.0s: oc get cronjobs,jobs,pods      workstation: Tue Feb 13 13:04:44 2024
      
      NAME                        SCHEDULE      SUSPEND  ACTIVE  LAST SCHEDULE  AGE
      cronjob.batch/image-pruner  */4 * * * *   False    0       30s            2m
      
      NAME                              COMPLETIONS   DURATION   AGE
      job.batch/image-pruner-27883660   1/1           9s         30s
      
      NAME                              READY   STATUS      RESTARTS   AGE
      pod/image-pruner-27883660-2ghvv   0/1     Completed   0          30s
    5. Get the logs of the pod that executed the maintenance task.

      [student@workstation appsec-prune]$ oc logs pod/image-pruner-27883660-2ghvv | tail
      ...output omitted...
      + crictl rmi --prune
      E0106 18:08:31.686489  374926 remote_image.go:266] "RemoveImage from image service failed" err="rpc error: code = Unknown desc = Image used by 0c9ab998...: image is in use by a container" image="c464e04f..." 1
      Deleted: registry.ocp4.example.com:8443/ubi7/nginx-118:latest
      Deleted: registry.ocp4.example.com:8443/ubi8/nginx-118:latest
      Deleted: registry.ocp4.example.com:8443/ubi9/nginx-120:latest
      
      Removing debug pod ...
      ...output omitted...

      1

      You can ignore the error that a container is using the image.
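
      To confirm the result, you can repeat the earlier image listing on the node. With the unused images pruned, the httpd and nginx entries no longer appear:

      [student@workstation appsec-prune]$ oc debug node/master01 -- \
        chroot /host crictl images | egrep '^IMAGE|httpd|nginx'
      ...output omitted...
      IMAGE                                                TAG     IMAGE ID       ...
      ...output omitted...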

  5. Clean up resources.

    1. Change to the student user home directory.

      [student@workstation appsec-prune]$ cd
      [student@workstation ~]$
    2. Ensure that you are working on the appsec-prune project.

      [student@workstation ~]$ oc project
      Using project "appsec-prune" on server "https://api.ocp4.example.com:6443".
    3. Remove the cron job resource and the configuration map.

      [student@workstation ~]$ oc delete cronjob/image-pruner configmap/maintenance
      cronjob.batch "image-pruner" deleted
      configmap "maintenance" deleted
    4. Remove the security constraint from the service account.

      [student@workstation ~]$ oc adm policy remove-scc-from-user \
        -z default privileged
      clusterrole.rbac.authorization.k8s.io/system:openshift:scc:privileged removed: "default"
    5. Remove the role from the service account.

      [student@workstation ~]$ oc adm policy remove-cluster-role-from-user \
        cluster-admin -z default
      clusterrole.rbac.authorization.k8s.io/cluster-admin removed: "default"
    6. Delete the appsec-prune and prune-apps projects.

      [student@workstation ~]$ oc delete project appsec-prune prune-apps
      project.project.openshift.io "appsec-prune" deleted
      project.project.openshift.io "prune-apps" deleted

Finish

On the workstation machine, use the lab command to complete this exercise. This step is important to ensure that resources from previous exercises do not impact upcoming exercises.

[student@workstation ~]$ lab finish appsec-prune

Revision: do280-4.14-08d11e1