Configure a node autoscaler, and deploy an application that scales up until it requires additional cluster nodes.
Outcomes
Enable autoscaling nodes in a Red Hat OpenShift on AWS (ROSA) machine pool.
Create and monitor Kubernetes horizontal pod autoscaler (HPA) resources.
To perform this exercise, ensure that you have completed the section called “Guided Exercise: Configure Developer Self-service for a ROSA Cluster”.
Procedure 2.5. Instructions
Verify that you are logged in to your ROSA cluster from the OpenShift CLI.
Open a command-line terminal on your system, and then run the oc whoami command to verify your connection to the ROSA cluster.
If the command succeeds, then skip to the next step.
$ oc whoami
wlombardogh
The username is different in your command output.
If the command returns an error, then reconnect to your ROSA cluster.
Run the rosa describe cluster command to retrieve the URL of the OpenShift web console.
$ rosa describe cluster --cluster do120-cluster
...output omitted...
Console URL:  https://console-openshift-console.apps.do120-cluster.jf96.p1.openshiftapps.com
...output omitted...
The URL in the preceding output is different on your system.
Open a web browser, and then navigate to the OpenShift web console URL. Click the GitHub identity provider. If you are not already logged in to GitHub, then provide your GitHub credentials.
Click your name in the upper right corner of the web console, and then click Copy login command. If the login page is displayed, then click the GitHub identity provider and use your GitHub credentials for authentication.
Click Display Token, and then copy the oc login --token command to the clipboard.
Paste the command into the command-line terminal, and then run the command.
$ oc login --token=sha256~1NofZkVCi3qCBcBJGc6XiOJTK5SDXF2ZYwhAARx5yJg --server=https://api.do120-cluster.jf96.p1.openshiftapps.com:6443
Logged into "https://api.do120-cluster.jf96.p1.openshiftapps.com:6443" as "wlombardogh" using the token provided.
...output omitted...
In the preceding command, the token and the URL are different on your system.
Verify whether the memory machine pool exists on your ROSA cluster.
You created this machine pool in a previous exercise.
If the pool is not present, then create it.
List the machine pools.
If the memory machine pool exists, then skip to the next step.
$ rosa list machinepools --cluster do120-cluster
ID        ...   INSTANCE TYPE   LABELS            TAINTS                              ...
Default   ...   m5.xlarge                                                             ...
memory    ...   r5a.xlarge      workload=memory   memory-optimized=32GiB:NoSchedule   ...
If the memory machine pool does not exist, then create it.
On a Microsoft Windows system, replace the line continuation character (\) in the following long command with the backtick (`) character, which is the line continuation character in PowerShell.
$ rosa create machinepool --cluster do120-cluster --name memory --replicas 2 \
  --instance-type r5a.xlarge --labels workload=memory \
  --taints memory-optimized=32GiB:NoSchedule
I: Fetching instance types
I: Machine pool 'memory' created successfully on cluster 'do120-cluster'
I: To view all machine pools, run 'rosa list machinepools -c do120-cluster'
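For example, after you replace the line continuation characters, the same command looks like the following in PowerShell:
rosa create machinepool --cluster do120-cluster --name memory --replicas 2 `
  --instance-type r5a.xlarge --labels workload=memory `
  --taints memory-optimized=32GiB:NoSchedule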
Use the oc get machinesets command to verify that the machines for the new machine pool are ready.
It takes 10 minutes for ROSA to provision the new machines.
Rerun the command regularly until it reports that the two new machines are ready and available.
$ oc get machinesets -n openshift-machine-api
NAME                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
do120-cluster-c8drv-infra-us-east-1a    2         2         2       2           4h4m
do120-cluster-c8drv-memory-us-east-1a   2         2         2       2           10m
do120-cluster-c8drv-worker-us-east-1a   2         2         2       2           4h27m
The machine set names in the preceding output are different on your system.
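Alternatively, instead of rerunning the command manually, you can add the --watch flag to the oc get command to stream updates as they happen. Press Ctrl+C to stop watching.
$ oc get machinesets -n openshift-machine-api --watch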
Activate the autoscaler for the memory machine pool.
Set the minimum number of replicas to two, and the maximum to four.
Use the rosa edit machinepool command to update the configuration.
$ rosa edit machinepool --cluster do120-cluster --enable-autoscaling \
  --min-replicas 2 --max-replicas 4 memory
I: Updated machine pool 'memory' on cluster 'do120-cluster'
Verify the machine pool configuration.
$ rosa list machinepool --cluster do120-cluster
ID        AUTOSCALING   REPLICAS   INSTANCE TYPE   LABELS            ...
Default   No            2          m5.xlarge                         ...
memory    Yes           2-4        r5a.xlarge      workload=memory   ...
ROSA creates an OpenShift machineautoscaler resource when you activate the autoscaler for a machine pool.
Verify that the machineautoscaler resource exists for the memory machine pool.
$ oc get machineautoscaler -n openshift-machine-api
NAME                                    ...   MIN   MAX   AGE
do120-cluster-c8drv-memory-us-east-1a   ...   2     4     23s
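If you want to inspect the full definition, you can display the resource in YAML format. The following is a trimmed sketch of what such a resource typically contains; the resource name and some fields differ on your system:
$ oc get machineautoscaler do120-cluster-c8drv-memory-us-east-1a \
  -n openshift-machine-api -o yaml
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: do120-cluster-c8drv-memory-us-east-1a
  namespace: openshift-machine-api
spec:
  minReplicas: 2
  maxReplicas: 4
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: do120-cluster-c8drv-memory-us-east-1a
...output omitted...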
Create the configure-auto project, and then deploy the application from the long-load-hpa.yaml resource file.
Use the oc new-project command to create the configure-auto project.
$ oc new-project configure-auto
Now using project "configure-auto" on server "https://api.do120-cluster.jf96.p1.openshiftapps.com:6443".
...output omitted...
Download the long-load-hpa.yaml resource file at https://raw.githubusercontent.com/RedHatTraining/DO12X-apps/main/ROSA/configure-auto/long-load-hpa.yaml.
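For example, if curl is available on your system, you can download the file from the command line:
$ curl -L -o long-load-hpa.yaml \
  https://raw.githubusercontent.com/RedHatTraining/DO12X-apps/main/ROSA/configure-auto/long-load-hpa.yaml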
Review the long-load-hpa.yaml file.
You do not have to change its contents.
---
apiVersion: v1
kind: List
metadata: {}
items:
- apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    labels:
      app: long-load
    name: long-load
  spec:
    minReplicas: 1
    maxReplicas: 3
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: long-load
    metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 2
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    labels:
      app: long-load
    name: long-load
...output omitted...
        containers:
        - name: long-load
          image: quay.io/redhattraining/long-load:v1
          resources:
            requests:
              memory: 20Gi
...output omitted...
The file includes the HPA resource definition.
The minimum number of replicas for the deployment resource that the HPA targets is one. The maximum number of replicas is three.
The HPA scales the long-load deployment.
The HPA uses the average memory usage of the pods as the scaling criterion.
The HPA scales up the application when the average memory utilization of all the pods in the deployment is above 2% of the requested memory.
The resource file declares the long-load deployment.
The deployment requires 20 GiB of memory to run. The HPA uses this value to compute the utilization percentage.
In this exercise, to trigger application scaling with limited memory consumption, the HPA sets the threshold to 2% of the memory that the deployment requests, which is 20 GiB. In production, a more realistic threshold would be 80%, for example.
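For example, 2% of the 20 GiB (20,480 MiB) memory request is approximately 410 MiB, so the HPA adds pods as soon as the average memory usage per pod exceeds roughly 410 MiB.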
Use the oc apply command to deploy the application.
$ oc apply -f long-load-hpa.yaml
horizontalpodautoscaler.autoscaling/long-load created
deployment.apps/long-load created
service/long-load created
route.route.openshift.io/long-load created
Wait for the pod to start.
You might have to rerun the command several times for the pod to be ready and to report a Running status.
$ oc get pods
NAME                         READY   STATUS    RESTARTS   AGE
long-load-58dd4ddfdc-x5xml   1/1     Running   0          82s
The pod name in the preceding output is different on your system.
Monitor the HPA resource, and then increase the memory load of the application.
The application exposes the /leak API endpoint.
Every time that you send an HTTP GET request to this endpoint, the application consumes an additional 480 MiB block of memory.
Send a request to the API, and then verify that OpenShift deploys new pods as the memory load increases.
Open a new terminal window, and then run the oc get hpa command in watch mode.
Wait for the command to report usage in the TARGETS column.
$ oc get hpa --watch
NAME        REFERENCE              TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
long-load   Deployment/long-load   <unknown>/2%   1         3         1          16s
long-load   Deployment/long-load   0%/2%          1         3         1          45s
Leave the command running, and do not interrupt it.
Switch back to the first terminal, and then retrieve the application URL.
$ oc get route
NAME        HOST/PORT                                                                ...
long-load   long-load-configure-auto.apps.do120-cluster.jf96.p1.openshiftapps.com   ...
The hostname in the preceding output might be different on your system.
Open a web browser.
Add the /leak path to the URL from the preceding output, and then access the application at https://long-load-configure-auto.apps.do120-cluster.jf96.p1.openshiftapps.com/leak.
Do not close the page.
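Alternatively, you can send the same GET request from the first terminal with curl. Replace the hostname with the value from your route:
$ curl https://long-load-configure-auto.apps.do120-cluster.jf96.p1.openshiftapps.com/leak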
Watch the output of the oc get hpa command in the second terminal.
After a minute, OpenShift deploys an additional pod.
NAME        REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
...output omitted...
long-load   Deployment/long-load   0%/2%     1         3         1          16m
long-load   Deployment/long-load   3%/2%     1         3         1          17m
long-load   Deployment/long-load   3%/2%     1         3         2          17m
Leave the command running, and do not interrupt it.
Switch back to the first terminal, and then verify that two pods are now running.
$ oc get pods
NAME                         READY   STATUS    RESTARTS   AGE
long-load-58dd4ddfdc-jqj94   1/1     Running   0          17m
long-load-58dd4ddfdc-x27hx   1/1     Running   0          24s
As the application memory usage continues to increase, OpenShift deploys a third pod.
Because each pod requires 20 GiB of memory, the initial two machines in the memory machine pool do not have enough resources to run the workload.
Remember that the machine pool uses r5a.xlarge Amazon Elastic Compute Cloud (EC2) instances that have only 32 GiB of memory, so each machine can run only one of the 20 GiB pods.
Refresh the web browser page to send another request to the /leak endpoint.
Watch the output of the oc get hpa command in the second terminal.
After a minute, OpenShift deploys a third pod.
NAME        REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
...output omitted...
long-load   Deployment/long-load   1%/2%     1         3         2          31m
long-load   Deployment/long-load   3%/2%     1         3         2          31m
long-load   Deployment/long-load   3%/2%     1         3         3          32m
You can exit the oc get hpa command.
List the pods. The third pod is in the pending state.
$ oc get pods
NAME                         READY   STATUS    RESTARTS   AGE
long-load-58dd4ddfdc-f6rvz   1/1     Running   0          4m24s
long-load-58dd4ddfdc-jqj94   1/1     Running   0          33m
long-load-58dd4ddfdc-srpdf   0/1     Pending   0          114s
Retrieve the events for this pending pod.
The cluster autoscaler initiates a scale up of the do120-cluster-c8drv-memory-us-east-1a machine set from two to three machines.
It can take two minutes for OpenShift to report the event.
$ oc describe pod long-load-58dd4ddfdc-srpdf
...output omitted...
Events:
  Type     Reason             Age   From                 Message
  ----     ------             ----  ----                 -------
  Warning  FailedScheduling   55s   default-scheduler    0/9 nodes are available: 2 Insufficient memory, 2 node(s) didn't match Pod's node affinity/selector, 2 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/9 nodes are available: 2 No preemption victims found for incoming pod, 7 Preemption is not helpful for scheduling.
  Normal   TriggeredScaleUp   44s   cluster-autoscaler   pod triggered scale-up: [{MachineSet/openshift-machine-api/do120-cluster-c8drv-memory-us-east-1a 2->3 (max: 4)}]
Monitor the OpenShift machine set resource that corresponds to the ROSA memory machine pool.
Verify that OpenShift deploys an additional node to accommodate the pending pod.
List the machine sets in the openshift-machine-api namespace.
OpenShift initiates the deployment of a third machine.
The machine is not ready yet.
$ oc get machineset -n openshift-machine-api
NAME                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
do120-cluster-c8drv-infra-us-east-1a    2         2         2       2           2d6h
do120-cluster-c8drv-memory-us-east-1a   3         3         2       2           86m
do120-cluster-c8drv-worker-us-east-1a   2         2         2       2           2d6h
It takes 10 minutes for ROSA to provision the new machine. Rerun the command regularly until it reports that the new machine is ready and available.
$ oc get machineset -n openshift-machine-api
NAME                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
do120-cluster-c8drv-infra-us-east-1a    2         2         2       2           2d6h
do120-cluster-c8drv-memory-us-east-1a   3         3         3       3           96m
do120-cluster-c8drv-worker-us-east-1a   2         2         2       2           2d6h
List the pods. The third pod is now running.
$ oc get pods
NAME                         READY   STATUS    RESTARTS   AGE
long-load-58dd4ddfdc-f6rvz   1/1     Running   0          27m
long-load-58dd4ddfdc-jqj94   1/1     Running   0          56m
long-load-58dd4ddfdc-srpdf   1/1     Running   0          25m
Optional: delete the long-load application, and then verify that OpenShift scales down the machine pool.
Use the oc delete command to delete the long-load application.
$ oc delete all -l app=long-load
pod "long-load-58dd4ddfdc-f6rvz" deleted
pod "long-load-58dd4ddfdc-jqj94" deleted
pod "long-load-58dd4ddfdc-srpdf" deleted
service "long-load" deleted
deployment.apps "long-load" deleted
horizontalpodautoscaler.autoscaling "long-load" deleted
route.route.openshift.io "long-load" deleted
Now that you deleted the memory-consuming long-load application, the load on the machines from the memory machine pool decreases.
Wait for OpenShift to scale down the machine pool back to two machines.
It takes 15 minutes for the operation to complete.
Rerun the command regularly until it reports that the machine set has two machines.
$ oc get machineset -n openshift-machine-api
NAME                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
do120-cluster-c8drv-infra-us-east-1a    2         2         2       2           2d6h
do120-cluster-c8drv-memory-us-east-1a   2         2         2       2           99m
do120-cluster-c8drv-worker-us-east-1a   2         2         2       2           2d6h
List the machine and the node resources to verify that OpenShift deletes the extra resources when the machine set scales down.
The oc get machines command reports only two machines.
The oc get nodes command reports four worker nodes: two for the Default machine pool, and two for the memory machine pool.
$ oc get machines -n openshift-machine-api
NAME                                          PHASE     TYPE         ...
...output omitted...
do120-cluster-c8drv-memory-us-east-1a-clzhh   Running   r5a.xlarge   ...
do120-cluster-c8drv-memory-us-east-1a-xgzg7   Running   r5a.xlarge   ...
...output omitted...

$ oc get nodes
NAME                           STATUS   ROLES                  ...
ip-10-0-134-130.ec2.internal   Ready    worker                 ...
ip-10-0-150-2.ec2.internal     Ready    infra,worker           ...
ip-10-0-161-192.ec2.internal   Ready    control-plane,master   ...
ip-10-0-150-131.ec2.internal   Ready    worker                 ...
ip-10-0-164-51.ec2.internal    Ready    worker                 ...
ip-10-0-198-213.ec2.internal   Ready    infra,worker           ...
ip-10-0-201-162.ec2.internal   Ready    control-plane,master   ...
ip-10-0-237-34.ec2.internal    Ready    control-plane,master   ...
ip-10-0-240-168.ec2.internal   Ready    worker                 ...
The machine and node names in the preceding output are different on your system.
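Because the memory machine pool assigns the workload=memory label to its nodes, you can also use a label selector to list only the nodes from that pool:
$ oc get nodes -l workload=memory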
Clean up your work by deleting the configure-auto project and the memory machine pool.
Use the oc delete project command to delete the project.
$ oc delete project configure-auto
project.project.openshift.io "configure-auto" deleted
Delete the memory machine pool.
$ rosa delete machinepool --cluster do120-cluster memory
? Are you sure you want to delete machine pool 'memory' on cluster 'do120-cluster'? Yes
I: Successfully deleted machine pool 'memory' from cluster 'do120-cluster'
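To confirm the removal, you can rerun the rosa list machinepools command; the memory machine pool no longer appears in the output:
$ rosa list machinepools --cluster do120-cluster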
Do not delete your ROSA cluster, because later exercises use it.