Create a machine pool that uses a larger memory instance type.
Outcomes
Inspect the Amazon Elastic Compute Cloud (EC2) instance types that the nodes of a Red Hat OpenShift on AWS (ROSA) cluster use.
Select an EC2 instance type for your workload, and create a ROSA machine pool.
Use labels to select a machine pool for running your workload.
Use taints and tolerations to prevent unwanted workloads from running on machine pools.
To perform this exercise, ensure that you have completed the section called “Guided Exercise: Configure Developer Self-service for a ROSA Cluster”.
Procedure 2.4. Instructions
Verify that you are logged in to your ROSA cluster from the OpenShift CLI.
Open a command-line terminal on your system, and then run the oc whoami command to verify your connection to the ROSA cluster.
If the command succeeds, then skip to the next step.
$ oc whoami
wlombardogh
The username is different in your command output.
If the command returns an error, then reconnect to your ROSA cluster.
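Before reconnecting, you can optionally check which cluster the oc client currently targets. This quick check is not part of the original steps; it assumes a recent oc client that supports the --show-server option.
$ oc whoami --show-server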
Run the rosa describe cluster command to retrieve the URL of the OpenShift web console.
$ rosa describe cluster --cluster do120-cluster
...output omitted...
Console URL:   https://console-openshift-console.apps.do120-cluster.jf96.p1.openshiftapps.com
...output omitted...
The URL in the preceding output is different on your system.
Open a web browser, and then navigate to the OpenShift web console URL. Click the GitHub identity provider. If you are not already logged in to GitHub, then provide your GitHub credentials.
Click your name in the upper right corner of the web console, and then click Copy login command. If the login page is displayed, then click the GitHub identity provider and use your GitHub credentials for authentication.
Click Display Token, and then copy the oc login --token command to the clipboard.
Paste the command into the command-line terminal, and then run the command.
$ oc login --token=sha256~1NofZkVCi3qCBcBJGc6XiOJTK5SDXF2ZYwhAARx5yJg --server=https://api.do120-cluster.jf96.p1.openshiftapps.com:6443
Logged into "https://api.do120-cluster.jf96.p1.openshiftapps.com:6443" as "wlombardogh" using the token provided.
...output omitted...
In the preceding command, the token and the URL are different on your system.
Use the OpenShift CLI to retrieve the EC2 instance types that the ROSA cluster uses for the control plane and compute nodes.
List the machine resources in the openshift-machine-api namespace.
Machine resources describe the hosts that the cluster nodes use.
The machine names include the infra, master, or worker node types.
The TYPE column lists the EC2 instance type of each machine.
$ oc get machines -n openshift-machine-api
NAME                                          PHASE     TYPE         REGION      ...
do120-cluster-c8drv-infra-us-east-1a-qjw9w    Running   r5.xlarge    us-east-1   ...
do120-cluster-c8drv-infra-us-east-1a-rrm6c    Running   r5.xlarge    us-east-1   ...
do120-cluster-c8drv-master-0                  Running   m5.2xlarge   us-east-1   ...
do120-cluster-c8drv-master-1                  Running   m5.2xlarge   us-east-1   ...
do120-cluster-c8drv-master-2                  Running   m5.2xlarge   us-east-1   ...
do120-cluster-c8drv-worker-us-east-1a-brnvp   Running   m5.xlarge    us-east-1   ...
do120-cluster-c8drv-worker-us-east-1a-tnhfn   Running   m5.xlarge    us-east-1   ...
The machine names in the preceding output are different on your system.
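To focus on a single node type, you can optionally filter the machine resources by their role label. The following sketch is not part of the original exercise, and the machine.openshift.io/cluster-api-machine-role label name is an assumption that might differ on your cluster:
$ oc get machines -n openshift-machine-api -l machine.openshift.io/cluster-api-machine-role=worker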
Use the ROSA CLI to list the available machine pools in your cluster. Compare the result with the OpenShift machine set resources.
List the ROSA machine pools.
Only the Default machine pool exists.
ROSA does not create machine pools for the control plane and the infrastructure nodes.
$ rosa list machinepools --cluster do120-cluster
ID        AUTOSCALING   REPLICAS   INSTANCE TYPE   LABELS   TAINTS   ...
Default   No            2          m5.xlarge                         ...
List the OpenShift machine set resources in the openshift-machine-api namespace.
The Default machine pool translates as the do120-cluster-c8drv-worker-us-east-1a machine set.
ROSA does not create a machine pool for the do120-cluster-c8drv-infra-us-east-1a machine set, because the Red Hat Site Reliability Engineering (SRE) team manages the infrastructure nodes for you.
$ oc get machinesets -n openshift-machine-api
NAME                                    DESIRED   CURRENT   READY   AVAILABLE   ...
do120-cluster-c8drv-infra-us-east-1a    2         2         2       2           ...
do120-cluster-c8drv-worker-us-east-1a   2         2         2       2           ...
The names of the machine set resources in the preceding output are different on your system.
You plan to deploy an application that requires 20 GiB of memory.
Verify whether the Default machine pool can run that workload.
Create a machine pool that deploys nodes with enough memory, and that uses memory-optimized EC2 instances.
The Default machine pool uses EC2 instances of the m5.xlarge type.
Use the aws ec2 describe-instance-types command to retrieve the amount of memory that this instance type provides.
On a Microsoft Windows system, replace the line continuation character (\) in the following long command with the backtick (`) character, which is the line continuation character in PowerShell.
The command shows that the m5.xlarge EC2 instances have 16 GiB of RAM, which is not enough to run your workload.
$ aws ec2 describe-instance-types --instance-types m5.xlarge \
  --query "InstanceTypes[].MemoryInfo"
[
    {
        "SizeInMiB": 16384
    }
]
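For reference, on a Microsoft Windows system the same command with the substitution that the preceding note describes might look like the following sketch, assuming that the AWS CLI is installed and configured in PowerShell:
PS> aws ec2 describe-instance-types --instance-types m5.xlarge `
  --query "InstanceTypes[].MemoryInfo"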
Run the rosa list instance-types command to list the available EC2 instance types in your AWS Region.
Select a memory-optimized instance type with 32 GiB of memory.
The exercise uses the r5a.xlarge type in later steps.
$ rosa list instance-types
ID              CATEGORY                CPU_CORES   MEMORY
dl1.24xlarge    accelerated_computing   96          768.0 GiB
g4dn.12xlarge   accelerated_computing   48          192.0 GiB
g4dn.16xlarge   accelerated_computing   64          256.0 GiB
...output omitted...
r5ad.xlarge     memory_optimized        4           32.0 GiB
r5a.xlarge      memory_optimized        4           32.0 GiB
r5d.12xlarge    memory_optimized        48          384.0 GiB
...output omitted...
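Optionally, you can cross-check the selected type against the EC2 API before creating the machine pool. This sketch is not part of the original steps; the JMESPath projection is illustrative, and the field names assume the standard aws ec2 describe-instance-types output:
$ aws ec2 describe-instance-types --instance-types r5a.xlarge \
  --query "InstanceTypes[].{Type:InstanceType,MemoryMiB:MemoryInfo.SizeInMiB,vCPUs:VCpuInfo.DefaultVCpus}"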
Create a machine pool named memory that deploys two r5a.xlarge instances.
To select and reserve this machine pool for the memory-intensive workload, declare the workload=memory label and the memory-optimized=32GiB:NoSchedule taint.
$ rosa create machinepool --cluster do120-cluster --interactive
? Machine pool name: memory
? Enable autoscaling (optional): No
? Replicas: 2
I: Fetching instance types
? Instance type: r5a.xlarge
? Labels (optional): workload=memory
? Taints (optional): memory-optimized=32GiB:NoSchedule
? Use spot instances (optional): No
I: Machine pool 'memory' created successfully on cluster 'do120-cluster'
I: To view all machine pools, run 'rosa list machinepools -c do120-cluster'
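If you prefer a non-interactive command, a single invocation such as the following sketch should create an equivalent machine pool. The flag names are assumptions that might vary between rosa CLI versions; run rosa create machinepool --help to confirm them:
$ rosa create machinepool --cluster do120-cluster --name memory \
  --replicas 2 --instance-type r5a.xlarge \
  --labels workload=memory --taints "memory-optimized=32GiB:NoSchedule"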
Use the rosa list machinepools command to verify your new machine pool.
$ rosa list machinepools -c do120-cluster
ID        ...   INSTANCE TYPE   LABELS            TAINTS                              ...
Default   ...   m5.xlarge                                                             ...
memory    ...   r5a.xlarge      workload=memory   memory-optimized=32GiB:NoSchedule   ...
List the cluster nodes. Because the additional machines are still deploying, the nodes for these machines do not exist yet.
$ oc get nodes
NAME                           STATUS   ROLES                  ...
ip-10-0-134-130.ec2.internal   Ready    worker                 ...
ip-10-0-150-2.ec2.internal     Ready    infra,worker           ...
ip-10-0-161-192.ec2.internal   Ready    control-plane,master   ...
ip-10-0-198-213.ec2.internal   Ready    infra,worker           ...
ip-10-0-201-162.ec2.internal   Ready    control-plane,master   ...
ip-10-0-237-34.ec2.internal    Ready    control-plane,master   ...
ip-10-0-240-168.ec2.internal   Ready    worker                 ...
The node names in the preceding output are different on your system. However, you should have three control plane nodes, two infrastructure nodes, and two worker nodes.
List the machines to verify that two additional machines are in the Provisioned phase.
$ oc get machines -n openshift-machine-api
NAME                                          PHASE         TYPE         ...
...output omitted...
do120-cluster-c8drv-memory-us-east-1a-clzhh   Provisioned   r5a.xlarge   ...
do120-cluster-c8drv-memory-us-east-1a-xgzg7   Provisioned   r5a.xlarge   ...
...output omitted...
Use the oc get machinesets command to verify that the machines for the new machine pool are ready.
It can take up to 10 minutes for ROSA to provision the new machines.
Rerun the command regularly until it reports that the two new machines are ready and available.
$ oc get machinesets -n openshift-machine-api
NAME                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
do120-cluster-c8drv-infra-us-east-1a    2         2         2       2           4h4m
do120-cluster-c8drv-memory-us-east-1a   2         2         2       2           10m
do120-cluster-c8drv-worker-us-east-1a   2         2         2       2           4h27m
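As an alternative to rerunning the command, you can watch the machine sets and press Ctrl+C when the memory machine set reports two available replicas. This option is not part of the original steps:
$ oc get machinesets -n openshift-machine-api -w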
List the machines in the machine set. The machine names use the machine set name as a prefix.
$ oc get machines -n openshift-machine-api
NAME                                          PHASE     TYPE         ...
...output omitted...
do120-cluster-c8drv-memory-us-east-1a-clzhh   Running   r5a.xlarge   ...
do120-cluster-c8drv-memory-us-east-1a-xgzg7   Running   r5a.xlarge   ...
...output omitted...
List the cluster nodes again. This time, two additional worker nodes are listed.
$ oc get nodes
NAME                           STATUS   ROLES                  ...
ip-10-0-134-130.ec2.internal   Ready    worker                 ...
ip-10-0-150-131.ec2.internal   Ready    worker                 ...
ip-10-0-150-2.ec2.internal     Ready    infra,worker           ...
ip-10-0-161-192.ec2.internal   Ready    control-plane,master   ...
ip-10-0-164-51.ec2.internal    Ready    worker                 ...
ip-10-0-198-213.ec2.internal   Ready    infra,worker           ...
ip-10-0-201-162.ec2.internal   Ready    control-plane,master   ...
ip-10-0-237-34.ec2.internal    Ready    control-plane,master   ...
ip-10-0-240-168.ec2.internal   Ready    worker                 ...
The node names in the preceding output are different on your system.
You can also retrieve the node names from the new machine objects.
The name of the node is available in the status section of the machine resource.
Note these node names.
In a later step, you verify that the workload that you deploy runs on these nodes.
$ oc get machine do120-cluster-c8drv-memory-us-east-1a-clzhh \
  -n openshift-machine-api -o jsonpath-as-json="{.status.nodeRef.name}"
[
    "ip-10-0-150-131.ec2.internal"
]
$ oc get machine do120-cluster-c8drv-memory-us-east-1a-xgzg7 \
  -n openshift-machine-api -o jsonpath-as-json="{.status.nodeRef.name}"
[
    "ip-10-0-164-51.ec2.internal"
]
The node names in the preceding output are different on your system.
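As an optional cross-check, you can also find the new nodes through the workload=memory label that the machine pool applies, and display the taint on one of them. The following sketch is not part of the original steps; replace the node name with one from your cluster, and on Microsoft Windows replace grep with Select-String:
$ oc get nodes -l workload=memory
$ oc describe node ip-10-0-150-131.ec2.internal | grep Taints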
Create the configure-nodes project, and then deploy the application from the long-load.yaml resource file.
Use the oc new-project command to create the configure-nodes project.
$ oc new-project configure-nodes
Now using project "configure-nodes" on server "https://api.do120-cluster.jf96.p1.openshiftapps.com:6443".
...output omitted...
Download the long-load.yaml resource file at https://raw.githubusercontent.com/RedHatTraining/DO12X-apps/main/ROSA/configure-nodes/long-load.yaml.
Review the long-load.yaml file.
You do not have to change its contents.
...output omitted...
spec:
  replicas: 2
  selector:
    matchLabels:
      app: long-load
  template:
    metadata:
      labels:
        app: long-load
    spec:
      nodeSelector:
        workload: memory
      tolerations:
      - key: "memory-optimized"
        operator: "Equal"
        value: "32GiB"
        effect: "NoSchedule"
      containers:
      - name: long-load
        image: quay.io/redhattraining/long-load:v1
        resources:
          requests:
            memory: 20Gi
...output omitted...
The workload creates two replicas.
The application must run on nodes that have the workload=memory label.
To be able to deploy the application on the nodes that carry the memory-optimized=32GiB:NoSchedule taint, the deployment declares a matching toleration.
The Red Hat Training team prepared the long-load container image.
The application requires 20 GiB of memory to run.
Use the oc apply command to deploy the application.
$ oc apply -f long-load.yaml
deployment.apps/long-load created
service/long-load created
route.route.openshift.io/long-load created
Verify that OpenShift deploys the two replicas of the long-load application on nodes from the memory machine pool.
Use the oc get pods command to verify that the application pods are running.
Add the -o wide option to list the names of the nodes.
The nodes that the command returns correspond to the nodes from the memory machine pool that you retrieved in a preceding step.
$ oc get pods -o wide
NAME                         READY   STATUS    ...   NODE                           ...
long-load-64d9b756d6-cjrn7   1/1     Running   ...   ip-10-0-150-131.ec2.internal   ...
long-load-64d9b756d6-kmt75   1/1     Running   ...   ip-10-0-164-51.ec2.internal    ...
The pod names in the preceding output are different on your system.
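For a more compact view that pairs each pod with its node, a jsonpath query such as the following sketch can also work; the quoting might need adjustment on Microsoft Windows:
$ oc get pods -l app=long-load \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeName}{"\n"}{end}'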
Verify that a workload that does not define the memory-optimized=32GiB:NoSchedule toleration cannot run on nodes from the memory machine pool.
Download the no-tolerations.yaml resource file at https://raw.githubusercontent.com/RedHatTraining/DO12X-apps/main/ROSA/configure-nodes/no-tolerations.yaml.
Review the no-tolerations.yaml file.
You do not have to change its contents.
The resource file defines a node selector that targets the nodes from the memory machine pool.
However, the file does not define tolerations that would allow the workload to run on these nodes.
...output omitted...
  template:
    metadata:
      labels:
        app: no-tolerations
    spec:
      nodeSelector:
        workload: memory
      containers:
      - name: mariadb
        image: registry.redhat.io/rhel9/mariadb-105
...output omitted...
Use the oc apply command to deploy the application.
$ oc apply -f no-tolerations.yaml
deployment.apps/no-tolerations created
Verify that the pod for the no-tolerations deployment does not start, and stays in the Pending state.
$ oc get pods
NAME                              READY   STATUS    RESTARTS   AGE
long-load-64d9b756d6-cjrn7        1/1     Running   0          79m
long-load-64d9b756d6-kmt75        1/1     Running   0          79m
no-tolerations-79dbc8d55b-4968d   0/1     Pending   0          19s
The pod names in the preceding output are different on your system.
Retrieve the events for the pending pod. No nodes are available for running the workload. The workload does not have a toleration that matches the node taints.
$ oc describe pod no-tolerations-79dbc8d55b-4968d
...output omitted...
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  4m32s  default-scheduler  0/9 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 2 node(s) had untolerated taint {memory-optimized: 32GiB}, 2 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/9 nodes are available: 9 Preemption is not helpful for scheduling.
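To see the taints that block the pod, you can also inspect the nodes from the memory machine pool directly. This optional sketch reuses the workload=memory label and a jsonpath query; the quoting might need adjustment on Microsoft Windows:
$ oc get nodes -l workload=memory \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'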
Clean up your work by deleting the configure-nodes project.
$ oc delete project configure-nodes
project.project.openshift.io "configure-nodes" deleted
Do not delete the memory machine pool, because later exercises use it.