Guided Exercise: Configure Node Autoscaling

Configure a node autoscaler, and deploy an application that scales up until it requires additional cluster nodes.

Outcomes

  • Enable node autoscaling in a Red Hat OpenShift Service on AWS (ROSA) machine pool.

  • Create and monitor Kubernetes horizontal pod autoscaler (HPA) resources.

Procedure 2.5. Instructions

  1. Verify that you are logged in to your ROSA cluster from the OpenShift CLI.

    1. Open a command-line terminal on your system, and then run the oc whoami command to verify your connection to the ROSA cluster. If the command succeeds, then skip to the next step.

      $ oc whoami
      wlombardogh

      The username is different in your command output.

    2. If the command returns an error, then reconnect to your ROSA cluster. Run the rosa describe cluster command to retrieve the URL of the OpenShift web console.

      $ rosa describe cluster --cluster do120-cluster
      ...output omitted...
      Console URL:     https://console-openshift-console.apps.do120-cluster.jf96.p1.openshiftapps.com
      ...output omitted...

      The URL in the preceding output is different on your system.

    3. Open a web browser, and then navigate to the OpenShift web console URL. Click github-do120. If you are not already logged in to GitHub, then provide your GitHub credentials.

    4. Click your name in the upper right corner of the web console, and then click Copy login command. If the login page is displayed, then click github-do120 and use your GitHub credentials for authentication.

    5. Click Display Token, and then copy the oc login --token command to the clipboard.

    6. Paste the command into the command-line terminal, and then run the command.

      $ oc login --token=sha256~1NofZkVCi3qCBcBJGc6XiOJTK5SDXF2ZYwhAARx5yJg
        --server=https://api.do120-cluster.jf96.p1.openshiftapps.com:6443
      Logged into "https://api.do120-cluster.jf96.p1.openshiftapps.com:6443" as "wlombardogh" using the token provided.
      ...output omitted...

      In the preceding command, the token and the URL are different on your system.

  2. Verify whether the memory machine pool exists on your ROSA cluster. You created this machine pool in a previous exercise. If the pool is not present, then create it.

    1. List the machine pools. If the memory machine pool exists, then skip to the next step.

      $ rosa list machinepools --cluster do120-cluster
      ID       ... INSTANCE TYPE  LABELS           TAINTS ...
      Default  ... m5.xlarge                              ...
      memory   ... r5a.xlarge     workload=memory  memory-optimized=32GiB:NoSchedule ...
    2. If the memory machine pool does not exist, then create it.

      On a Microsoft Windows system, replace the line continuation character (\) in the following long command with the backtick (`) character, which is the line continuation character in PowerShell.

      $ rosa create machinepool --cluster do120-cluster --name memory --replicas 2 \
        --instance-type r5a.xlarge --labels workload=memory \
        --taints memory-optimized=32GiB:NoSchedule
      I: Fetching instance types
      I: Machine pool 'memory' created successfully on cluster 'do120-cluster'
      I: To view all machine pools, run 'rosa list machinepools -c do120-cluster'
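
      For reference, the same command on a PowerShell prompt might look like the following sketch, with the backtick as the line continuation character (PS> represents the PowerShell prompt):

      PS> rosa create machinepool --cluster do120-cluster --name memory --replicas 2 `
          --instance-type r5a.xlarge --labels workload=memory `
          --taints memory-optimized=32GiB:NoSchedule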
    3. Use the oc get machinesets command to verify that the machines for the new machine pool are ready. It takes 10 minutes for ROSA to provision the new machines. Rerun the command regularly until it reports that the two new machines are ready and available.

      $ oc get machinesets -n openshift-machine-api
      NAME                                  DESIRED  CURRENT   READY   AVAILABLE   AGE
      do120-cluster-c8drv-infra-us-east-1a  2        2         2       2           4h4m
      do120-cluster-c8drv-memory-us-east-1a 2        2         2       2           10m
      do120-cluster-c8drv-worker-us-east-1a 2        2         2       2           4h27m

      The machine set names in the preceding output are different on your system.
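
      Alternatively, you can run the command in watch mode so that it updates automatically as the machines become ready. This is an optional alternative; press Ctrl+C to exit the command.

      $ oc get machinesets -n openshift-machine-api --watch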

  3. Activate the autoscaler for the memory machine pool. Set the minimum number of replicas to two, and the maximum to four.

    1. Use the rosa edit machinepool command to update the configuration.

      $ rosa edit machinepool --cluster do120-cluster --enable-autoscaling \
        --min-replicas 2 --max-replicas 4 memory
      I: Updated machine pool 'memory' on cluster 'do120-cluster'
    2. Verify the machine pool configuration.

      $ rosa list machinepool --cluster do120-cluster
      ID       AUTOSCALING  REPLICAS  INSTANCE TYPE  LABELS          ...
      Default  No           2         m5.xlarge                      ...
      memory   Yes          2-4       r5a.xlarge     workload=memory ...
    3. ROSA creates an OpenShift machineautoscaler resource when you activate the autoscaler for a machine pool. Verify that the machineautoscaler resource exists for the memory machine pool.

      $ oc get machineautoscaler -n openshift-machine-api
      NAME                                    ...   MIN   MAX   AGE
      do120-cluster-c8drv-memory-us-east-1a   ...   2     4     23s
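
      Optionally, you can inspect the definition of the generated resource. The following output is a sketch of the relevant fields, assuming the machine set name from the preceding output; the name is different on your system.

      $ oc get machineautoscaler do120-cluster-c8drv-memory-us-east-1a \
        -n openshift-machine-api -o yaml
      apiVersion: autoscaling.openshift.io/v1beta1
      kind: MachineAutoscaler
      ...output omitted...
      spec:
        maxReplicas: 4
        minReplicas: 2
        scaleTargetRef:
          apiVersion: machine.openshift.io/v1beta1
          kind: MachineSet
          name: do120-cluster-c8drv-memory-us-east-1a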
  4. Create the configure-auto project, and then deploy the application from the long-load-hpa.yaml resource file.

    1. Use the oc new-project command to create the configure-auto project.

      $ oc new-project configure-auto
      Now using project "configure-auto" on server "https://api.do120-cluster.jf96.p1.openshiftapps.com:6443".
      ...output omitted...
    2. Download the long-load-hpa.yaml resource file at https://raw.githubusercontent.com/RedHatTraining/DO12X-apps/main/ROSA/configure-auto/long-load-hpa.yaml.

    3. Review the long-load-hpa.yaml file. You do not have to change its contents.

      ---
      apiVersion: v1
      kind: List
      metadata: {}
      items:
        - apiVersion: autoscaling/v2
          kind: HorizontalPodAutoscaler 1
          metadata:
            labels:
              app: long-load
            name: long-load
          spec:
            minReplicas: 1  2
            maxReplicas: 3
            scaleTargetRef: 3
              apiVersion: apps/v1
              kind: Deployment
              name: long-load
            metrics:
              - type: Resource
                resource:
                  name: memory 4
                  target:
                    type: Utilization
                    averageUtilization: 2 5
        - apiVersion: apps/v1
          kind: Deployment 6
          metadata:
            labels:
              app: long-load
            name: long-load
      ...output omitted...
                containers:
                  - name: long-load
                    image: quay.io/redhattraining/long-load:v1
                    resources:
                      requests:
                        memory: 20Gi 7
      ...output omitted...

      1

      The file includes the HPA resource definition.

      2

      The minimum number of replicas for the deployment resource that the HPA targets is one. The maximum number of replicas is three.

      3

      The HPA scales the long-load deployment, which this resource file also declares.

      4

      The HPA uses the average memory usage of the pods as the scaling criterion.

      5

      The HPA scales up the application when the average memory utilization of all the pods in the deployment is above 2% of the requested memory.

      6

      The resource file declares the long-load deployment.

      7

      Each pod in the deployment requests 20 GiB of memory. The HPA uses this request value to compute the utilization percentage.

      Note

      In this exercise, to trigger application scaling with limited memory consumption, the HPA sets the threshold to 2% of the memory that the deployment requests, which is 20 GiB. In production, a more realistic threshold would be 80%, for example.

    4. Use the oc apply command to deploy the application.

      $ oc apply -f long-load-hpa.yaml
      horizontalpodautoscaler.autoscaling/long-load created
      deployment.apps/long-load created
      service/long-load created
      route.route.openshift.io/long-load created
    5. Wait for the pod to start. You might have to rerun the command several times until the pod is ready and reports a Running status.

      $ oc get pods
      NAME                         READY   STATUS    RESTARTS   AGE
      long-load-58dd4ddfdc-x5xml   1/1     Running   0          82s

      The pod name in the preceding output is different on your system.

  5. Monitor the HPA resource, and then increase the memory load of the application.

    The application exposes the /leak API endpoint. Every time that you send an HTTP GET request to this endpoint, the application consumes an additional 480 MiB block of memory.

    Send a request to the API, and then verify that OpenShift deploys new pods as the memory load increases.
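
    To see why a single request is enough to trigger scaling, consider the numbers from the resource file: each pod requests 20 GiB (20,480 MiB) of memory, and the HPA target is 2% average utilization, which is about 410 MiB per pod. One /leak request allocates 480 MiB, so the average utilization rises by roughly 480 / 20,480, or about 2.3%, in addition to the memory that the application already uses, which pushes the average above the 2% threshold.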

    1. Open a new terminal window, and then run the oc get hpa command in watch mode. Wait for the command to report usage in the TARGETS column.

      $ oc get hpa --watch
      NAME       REFERENCE              TARGETS        MINPODS  MAXPODS  REPLICAS   AGE
      long-load  Deployment/long-load   <unknown>/2%   1        3        1          16s
      long-load  Deployment/long-load   0%/2%          1        3        1          45s

      Leave the command running, and do not interrupt it.

    2. Switch back to the first terminal, and then retrieve the application URL.

      $ oc get route
      NAME       HOST/PORT ...
      long-load  long-load-configure-auto.apps.do120-cluster.jf96.p1.openshiftapps.com ...

      The hostname in the preceding output might be different on your system.

    3. Open a web browser. Add the /leak path to the URL from the preceding output, and then access the application at https://long-load-configure-auto.apps.do120-cluster.jf96.p1.openshiftapps.com/leak. Do not close the page.
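
      If you prefer to work from the command line, you can send the request with the curl command instead of a browser. This is an optional alternative, assuming the route hostname from the preceding output:

      $ curl https://long-load-configure-auto.apps.do120-cluster.jf96.p1.openshiftapps.com/leak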

    4. Watch the output of the oc get hpa command in the second terminal. After a minute, OpenShift deploys an additional pod.

      NAME       REFERENCE              TARGETS        MINPODS  MAXPODS  REPLICAS   AGE
      ...output omitted...
      long-load  Deployment/long-load   0%/2%          1        3        1          16m
      long-load  Deployment/long-load   3%/2%          1        3        1          17m
      long-load  Deployment/long-load   3%/2%          1        3        2          17m

      Leave the command running, and do not interrupt it.

    5. Switch back to the first terminal, and then verify that two pods are now running.

      $ oc get pods
      NAME                         READY   STATUS    RESTARTS   AGE
      long-load-58dd4ddfdc-jqj94   1/1     Running   0          17m
      long-load-58dd4ddfdc-x27hx   1/1     Running   0          24s
  6. As the application memory usage continues to increase, OpenShift deploys a third pod. Because each pod requests 20 GiB of memory, and the machine pool uses r5a.xlarge Amazon Elastic Compute Cloud (EC2) instances that have only 32 GiB of memory, each node can run only one of these pods. The two initial machines in the memory machine pool therefore do not have enough resources to run the third pod.

    1. Refresh the web browser page to send another request to the /leak endpoint.

    2. Watch the output of the oc get hpa command in the second terminal. After a minute, OpenShift deploys a third pod.

      NAME       REFERENCE              TARGETS        MINPODS  MAXPODS  REPLICAS   AGE
      ...output omitted...
      long-load  Deployment/long-load   1%/2%          1        3        2          31m
      long-load  Deployment/long-load   3%/2%          1        3        2          31m
      long-load  Deployment/long-load   3%/2%          1        3        3          32m

      You can exit the oc get hpa command.

    3. List the pods. The third pod is in the pending state.

      $ oc get pods
      NAME                         READY   STATUS    RESTARTS   AGE
      long-load-58dd4ddfdc-f6rvz   1/1     Running   0          4m24s
      long-load-58dd4ddfdc-jqj94   1/1     Running   0          33m
      long-load-58dd4ddfdc-srpdf   0/1     Pending   0          114s
    4. Retrieve the events for this pending pod. The cluster autoscaler initiates a scale up of the do120-cluster-c8drv-memory-us-east-1a machine set from two to three machines. It can take two minutes for OpenShift to report the event.

      $ oc describe pod long-load-58dd4ddfdc-srpdf
      ...output omitted...
      Events:
        Type     Reason            Age   From                Message
        ----     ------            ----  ----                -------
        Warning  FailedScheduling  55s   default-scheduler   0/9 nodes are available: 2 Insufficient memory, 2 node(s) didn't match Pod's node affinity/selector, 2 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/9 nodes are available: 2 No preemption victims found for incoming pod, 7 Preemption is not helpful for scheduling.
        Normal   TriggeredScaleUp  44s   cluster-autoscaler  pod triggered scale-up: [{MachineSet/openshift-machine-api/do120-cluster-c8drv-memory-us-east-1a 2->3 (max: 4)}]
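
      You can also review autoscaler activity at the namespace level by listing recent events, sorted by time. The events on your system differ from this example:

      $ oc get events -n openshift-machine-api --sort-by=.lastTimestamp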
  7. Monitor the OpenShift machine set resource that corresponds to the ROSA memory machine pool. Verify that OpenShift deploys an additional node to accommodate the pending pod.

    1. List the machine sets in the openshift-machine-api namespace. OpenShift initiates the deployment of a third machine. The machine is not ready yet.

      $ oc get machineset -n openshift-machine-api
      NAME                                    DESIRED  CURRENT  READY  AVAILABLE   AGE
      do120-cluster-c8drv-infra-us-east-1a    2        2        2      2           2d6h
      do120-cluster-c8drv-memory-us-east-1a   3        3        2      2           86m
      do120-cluster-c8drv-worker-us-east-1a   2        2        2      2           2d6h
    2. It takes 10 minutes for ROSA to provision the new machine. Rerun the command regularly until it reports that the new machine is ready and available.

      $ oc get machineset -n openshift-machine-api
      NAME                                    DESIRED  CURRENT  READY  AVAILABLE   AGE
      do120-cluster-c8drv-infra-us-east-1a    2        2        2      2           2d6h
      do120-cluster-c8drv-memory-us-east-1a   3        3        3      3           96m
      do120-cluster-c8drv-worker-us-east-1a   2        2        2      2           2d6h
    3. List the pods. The third pod is now running.

      $ oc get pods
      NAME                         READY   STATUS    RESTARTS   AGE
      long-load-58dd4ddfdc-f6rvz   1/1     Running   0          27m
      long-load-58dd4ddfdc-jqj94   1/1     Running   0          56m
      long-load-58dd4ddfdc-srpdf   1/1     Running   0          25m
  8. Optional. Delete the long-load application, and then verify that OpenShift scales down the machine pool.

    1. Use the oc delete command to delete the long-load application.

      $ oc delete all -l app=long-load
      pod "long-load-58dd4ddfdc-f6rvz" deleted
      pod "long-load-58dd4ddfdc-jqj94" deleted
      pod "long-load-58dd4ddfdc-srpdf" deleted
      service "long-load" deleted
      deployment.apps "long-load" deleted
      horizontalpodautoscaler.autoscaling "long-load" deleted
      route.route.openshift.io "long-load" deleted
    2. Now that you deleted the memory-consuming long-load application, the load on the machines from the memory machine pool decreases. Wait for OpenShift to scale down the machine pool back to two machines. It takes 15 minutes for the operation to complete. Rerun the command regularly until it reports that the machine set has two machines.

      $ oc get machineset -n openshift-machine-api
      NAME                                    DESIRED  CURRENT  READY  AVAILABLE   AGE
      do120-cluster-c8drv-infra-us-east-1a    2        2        2      2           2d6h
      do120-cluster-c8drv-memory-us-east-1a   2        2        2      2           99m
      do120-cluster-c8drv-worker-us-east-1a   2        2        2      2           2d6h
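
      If you want to observe the scale-down as it happens, you can watch the machine resources instead of rerunning the command. The extra machine goes through the Deleting phase before OpenShift removes it. Press Ctrl+C to exit the command.

      $ oc get machines -n openshift-machine-api --watch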
    3. List the machine and the node resources to verify that OpenShift deletes the extra resources when the machine set scales down. The oc get machines command reports only two machines. The oc get nodes command reports four worker nodes: two for the Default machine pool, and two for the memory machine pool.

      $ oc get machines -n openshift-machine-api
      NAME                                          PHASE     TYPE ...
      ...output omitted...
      do120-cluster-c8drv-memory-us-east-1a-clzhh   Running   r5a.xlarge ...
      do120-cluster-c8drv-memory-us-east-1a-xgzg7   Running   r5a.xlarge ...
      ...output omitted...
      $ oc get nodes
      NAME                           STATUS   ROLES                  ...
      ip-10-0-134-130.ec2.internal   Ready    worker                 ...
      ip-10-0-150-2.ec2.internal     Ready    infra,worker           ...
      ip-10-0-161-192.ec2.internal   Ready    control-plane,master   ...
      ip-10-0-150-131.ec2.internal   Ready    worker                 ...
      ip-10-0-164-51.ec2.internal    Ready    worker                 ...
      ip-10-0-198-213.ec2.internal   Ready    infra,worker           ...
      ip-10-0-201-162.ec2.internal   Ready    control-plane,master   ...
      ip-10-0-237-34.ec2.internal    Ready    control-plane,master   ...
      ip-10-0-240-168.ec2.internal   Ready    worker                 ...

      The machine and node names in the preceding output are different on your system.
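
      To list only the nodes that belong to the memory machine pool, you can filter on the workload=memory label that the machine pool applies to its nodes. This is an optional check:

      $ oc get nodes -l workload=memory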

  9. Clean up your work by deleting the configure-auto project and the memory machine pool.

    1. Use the oc delete project command to delete the project.

      $ oc delete project configure-auto
      project.project.openshift.io "configure-auto" deleted
    2. Delete the memory machine pool.

      $ rosa delete machinepool --cluster do120-cluster memory
      ? Are you sure you want to delete machine pool 'memory' on cluster 'do120-cluster'? Yes
      I: Successfully deleted machine pool 'memory' from cluster 'do120-cluster'

      Do not delete your ROSA cluster, because later exercises use it.
