Monitor execution of jobs on different execution nodes and maintain and adjust the automation mesh.
When you install automation controller, it automatically creates the controlplane and default instance groups. The controlplane instance group is used by internal jobs that synchronize project contents and perform maintenance tasks. The default instance group runs user jobs that do not specify any other instance group.
You can create additional instance groups when you install automation controller by adding groups to the inventory for the setup.sh command. The names of these groups must begin with instance_group_, followed by the name of the instance group that you want to create.
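For example, a minimal inventory sketch for the setup.sh command might look like the following. This is an illustration only: the instance_group_region1 group name and the host names are assumptions, not values from your environment.

[automationcontroller]
controller.lab.example.com

[execution_nodes]
exec1.lab.example.com
exec2.lab.example.com

# Creates an instance group named "region1" that contains these execution nodes.
# The "instance_group_" prefix is required; "region1" is an example name.
[instance_group_region1]
exec1.lab.example.com
exec2.lab.example.com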
You can also use the automation controller web UI to quickly create, modify, and delete instance groups without running the installation script.
You cannot delete the controlplane or the default instance groups.
Navigate to the web UI for one of the control nodes and log in as the admin user.
Navigate to → and then click → .
Enter a name for the instance group and then click .
You can assign existing execution nodes to an instance group. If you created instance groups for geographic locations, then you might add execution nodes within those geographic locations to the appropriate instance groups. Using execution nodes that are in close geographic proximity to managed hosts reduces latency.
Navigate to → and then click the link for an instance group.
Click the tab and then click .
Select the nodes that you want to add to the instance group, and then click .
The following screen capture shows hosts that belong to the instance group.
Automation controller monitors the health of instances. On the instance group’s tab in the web UI, hover over the health status for each instance to see the date and time stamp for the last health check. If desired, you can manually run a health check for one or more instances.
To view the date and time of the last health check, hover over the health status icon for an instance, expand the brief details for an instance, or click the link for the instance name to view the full instance details.
To manually run a health check, select one or more instances and then click .
Disassociating a node from an instance group removes the node from the instance group. You might do this so that you can associate the node with a different instance group or because you need to remove the node from your cluster.
Navigate to → and then click the name of the desired instance group.
On the tab, select the node that you want to remove from the instance group, and then click . In the pop-up window, click to confirm.
Disassociating a node from an instance group does not remove the node from your cluster.
If you want to remove a node from your cluster, then you must add node_state=deprovision to the appropriate node or group in your installation script’s inventory file and then run the installation script again.
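As a hedged sketch, assuming that exec2.lab.example.com is the execution node you want to remove (the host names are illustrative), the relevant inventory lines might look like this:

[execution_nodes]
exec1.lab.example.com
# setup.sh removes this node from the cluster on the next run.
exec2.lab.example.com node_state=deprovision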
You can explicitly configure inventories and job templates to use a particular instance group by default. If you do not specify an instance group for a job in either the job template or the inventory, then automation controller launches the job using the default instance group. When you install Ansible Automation Platform, the default instance group includes all hybrid and execution nodes (all nodes that can run jobs).
However, automation controller might select an execution node in the default instance group that is geographically distant on the network from the managed host, resulting in less than optimal performance.
If you configure an inventory to select an instance group, then when a job template uses that inventory, automation controller assigns jobs to the execution nodes in that instance group. Configure an inventory to select an instance group using the following procedure:
Navigate to → and then click the icon for the inventory that should use an instance group.
Click the search icon for .
Select the desired instance group from the list, click , and then click .
As with inventories, you can configure job templates to use instance groups. If an instance group is defined both in an inventory and in a job template that uses that inventory, then the instance group defined in the job template takes precedence.
Navigate to → and then click the icon for the job template that you want to modify.
Click the search icon for .
Select the desired instance group from the list and click .
Click .
A job always runs in an instance group. The job might use an instance group defined in the job template or in the inventory, or it might use the default instance group.
The tab for a job displays the name of the instance group and the name of the execution node used to run that job.
Navigate to → and then click the icon for the desired job template.
After the job completes, click the tab.
Job details display the name of the instance group and the name of the execution node used by the job.
You can manually test the resilience of the control and execution planes by making the nodes unavailable and then running jobs to verify that they still succeed.
Navigate to → and click the icon for the resource.
Navigate to → and then click the link for the job. The job has the type.
Click the tab and notice that the job used the instance group. Make note of the execution node used by the job.
Navigate to → and then disable the execution node identified in the previous step. Set to off. The status changes to .
Navigate to → and then click the icon for the job.
After the job completes, click the tab and notice that the job used a different execution node.
Navigate to → and then enable the previously disabled execution node. Set to on. The status changes to .
Navigate to → and click the icon for the resource.
After the job completes, click the tab. Make note of the instance group and execution node used by the job.
Navigate to → and then click the link for the instance group identified in the previous step.
On the tab, disable the previously identified execution node by setting to off. The status changes to .
Navigate to → and then click the icon for the job.
After the job completes, click the tab and notice that the job used a different execution node.
Navigate to → and then enable the previously disabled execution node. Set to on. The status changes to .
Navigate to → in the automation controller web UI to see an overview of the current status of automation mesh and its nodes.
Healthy nodes are marked with a checkmark and are colored green.
Unavailable nodes are marked with an exclamation point and are colored red.
Disabled nodes are marked with a circle and are colored gray.
In the following example, the control2, controller, exec1, and hop1 nodes are healthy. The exec2 node is in an error state and the exec3 node is disabled.
You can use the icons at the upper right of the page to zoom in and out, resize the diagram to fit the screen, reset the zoom to its default level, and turn the descriptive legend on and off. You can also click and drag to reposition the diagram, and use your mouse wheel to zoom in and out.
If you hover over any of the nodes, the web UI highlights the lines representing the peer relationships between that node and the other nodes in automation mesh. If you click any of the nodes, the web UI displays additional information about that node under to the right of the diagram.
This section demonstrates useful commands for monitoring and troubleshooting automation mesh. Log in as the awx user on one of the automation controller machines and run the following commands.
You can use the awx-manage list_instances command to list all the instances in the mesh. The command shows the status of each node.
Active and available nodes appear in green. These nodes display a Healthy status in the automation controller web UI.
Unavailable nodes appear in red, and the ansible-runner version is displayed as question marks. These nodes display an Error status in the automation controller web UI.
Disabled nodes are marked with the [DISABLED] text and appear in gray. These nodes display a Disabled status in the automation controller web UI.
In the following example, the control2, controller, exec1, and hop1 nodes are active and available. The exec2 node is unavailable and the exec3 node is disabled.
[awx@controller ~]$ awx-manage list_instances
[controlplane capacity=53 policy=100%]
    control2.lab.example.com capacity=16 node_type=control ...
    controller.lab.example.com capacity=37 node_type=control ...
[default capacity=16 policy=100%]
    exec1.lab.example.com capacity=16 node_type=execution version=ansible-runner...
    exec2.lab.example.com capacity=0 node_type=execution version=ansible-runner-???
    [DISABLED] exec3.lab.example.com capacity=0 node_type=execution version=ansi...
[ungrouped capacity=0]
    hop1.lab.example.com node_type=hop heartbeat="2022-06-01 17:46:51"
You can use the receptorctl command to test communication on the automation mesh. The receptorctl command provides several subcommands, including:
receptorctl status to get the status of the entire automation mesh.
receptorctl ping to test connectivity between the current node and another node in the automation mesh.
receptorctl traceroute to determine the route and latency of communication on the automation mesh between the current node and another node.
The receptorctl command requires that you specify the control socket for the automation mesh receptor service. The following examples use the /var/run/awx-receptor/receptor.sock socket.
Use the status subcommand to view the entire mesh, including all of the nodes and how the nodes are connected.
In this example, the Route section indicates that communication to the exec3.lab.example.com execution node is routed through the hop1.lab.example.com hop node.
[awx@controller ~]$ receptorctl --socket /var/run/awx-receptor/receptor.sock \
> status
Node ID: controller.lab.example.com
Version: 1.2.3
System CPU Count: 4
System Memory MiB: 5752

Connection               Cost
exec1.lab.example.com    1
exec2.lab.example.com    1
control2.lab.example.com 1
hop1.lab.example.com     1

Known Node                 Known Connections
control2.lab.example.com   controller.lab.example.com: 1 exec1.lab.example.co...
controller.lab.example.com control2.lab.example.com: 1 exec1.lab.example.com:...
exec1.lab.example.com      control2.lab.example.com: 1 controller.lab.example...
exec2.lab.example.com      control2.lab.example.com: 1 controller.lab.example...
exec3.lab.example.com      hop1.lab.example.com: 1
hop1.lab.example.com       control2.lab.example.com: 1 controller.lab.example...

Route                      Via
control2.lab.example.com   control2.lab.example.com
exec1.lab.example.com      exec1.lab.example.com
exec2.lab.example.com      exec2.lab.example.com
exec3.lab.example.com      hop1.lab.example.com
hop1.lab.example.com       hop1.lab.example.com

Node                       Service Type      ... Tags
exec1.lab.example.com      control StreamTLS ... {'type': 'Control Service'}
exec2.lab.example.com      control StreamTLS ... {'type': 'Control Service'}
control2.lab.example.com   control StreamTLS ... {'type': 'Control Service'}
controller.lab.example.com control StreamTLS ... {'type': 'Control Service'}
exec3.lab.example.com      control StreamTLS ... {'type': 'Control Service'}
hop1.lab.example.com       control StreamTLS ... {'type': 'Control Service'}

Node                       Secure Work Types
exec1.lab.example.com      ansible-runner
exec2.lab.example.com      ansible-runner
control2.lab.example.com   local, kubernetes-runtime-auth, kubernetes-inclust...
controller.lab.example.com local, kubernetes-runtime-auth, kubernetes-inclust...
exec3.lab.example.com      ansible-runner
Use the ping subcommand to test connectivity between the current host and another host.
The following example tests connectivity between the controller.lab.example.com and exec2.lab.example.com hosts.
[awx@controller ~]$ receptorctl --socket /var/run/awx-receptor/receptor.sock \
> ping exec2.lab.example.com
Reply from exec2.lab.example.com in 1.461675ms
Reply from exec2.lab.example.com in 504.934µs
Reply from exec2.lab.example.com in 528.547µs
Reply from exec2.lab.example.com in 722.001µs
Use the traceroute subcommand to view the route between nodes. In the following example, the controller.lab.example.com node connects to the exec3.lab.example.com node through the hop1.lab.example.com node.
[awx@controller ~]$ receptorctl --socket /var/run/awx-receptor/receptor.sock \
> traceroute exec3.lab.example.com
0: controller.lab.example.com in 507.316µs
1: hop1.lab.example.com in 1.032767ms
2: exec3.lab.example.com in 820.719µs