Abstract

| Goal | Perform routine maintenance and administration of Red Hat Ansible Automation Platform. |
Describe the low-level components of automation controller, locate and examine relevant log files, control its services, and perform basic troubleshooting.
Automation controller is a web application made up of a number of cooperating processes and services. Four main network services are enabled, which start the rest of the components of automation controller:
Nginx provides the web server that hosts the automation controller application and supports the web UI and the API.
PostgreSQL is the database that stores most automation controller data, configuration, and history.
Supervisord is a process control system that manages the various components of the automation controller application, performing operations such as scheduling and running jobs, listening for callbacks from running jobs, and so on.
Receptor provides an overlay network intended to ease the distribution of work across a large and dispersed collection of workers.
A fifth component also used by automation controller is the memcached memory object caching daemon, which is used as a local caching service.
These network services communicate with each other using normal network protocols. For a self-contained automation controller server, the main ports that need to be exposed outside the system are 80/tcp and 443/tcp, to allow clients to access the web UI and API.
However, the other services might also expose ports to external clients unless specifically protected. For example, the PostgreSQL service listens for connections from anywhere on 5432/tcp, and receptor needs access to certain ports for automation mesh communications. You can control access to these ports with a firewall: allow network communication from the hosts that need access to those ports, and deny access to hosts that should not be using those services.
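For example, on a self-contained controller node that uses firewalld, you might open only the web ports and allow PostgreSQL connections from a single database client. The client address 192.0.2.10 below is an assumption for illustration; adjust it for your environment:

[root@control ~]# firewall-cmd --permanent --add-service=http --add-service=https
[root@control ~]# firewall-cmd --permanent \
  --add-rich-rule='rule family="ipv4" source address="192.0.2.10/32" port port="5432" protocol="tcp" accept'
[root@control ~]# firewall-cmd --reload

You can confirm which ports are currently listening with the ss -tlnp command.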
This is one reason why it is important to set strong passwords for the PostgreSQL service in the inventory file used to install automation controller. By default, these services can be contacted directly by remote clients, and weak passwords can leave them vulnerable to attack.
Automation controller ships with automation-controller-service, an administrative utility script that can start, stop, and restart all the controller services running on the current controller node.
This includes the message queue components, and the database if it is an integrated installation on that host. External databases must be explicitly managed by the administrator.
The automation-controller-service script is installed at /usr/bin/automation-controller-service and can be run as follows:
[root@control ~]# automation-controller-service status
● automation-controller.service - Automation Controller service
   Loaded: loaded (/etc/systemd/system/automation-controller.service; enabled; vendor preset: disabled)
   Active: active (exited) since Wed 2022-06-29 00:34:54 EDT; 1h 41min ago
  Process: 10217 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
...output omitted...
● redis.service - Redis persistent key-value database
   Loaded: loaded (/usr/lib/systemd/system/redis.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/redis.service.d
           └─limit.conf, override.conf
   Active: active (running) since Wed 2022-06-29 00:34:54 EDT; 1h 41min ago
  Process: 6128 ExecStop=/usr/libexec/redis-shutdown (code=exited, status=0/SUCCESS)
...output omitted...
● nginx.service - The nginx HTTP and reverse proxy server
   Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/nginx.service.d
           └─override.conf
   Active: active (running) since Wed 2022-06-29 00:34:54 EDT; 1h 41min ago
  Process: 10210 ExecStart=/usr/sbin/nginx (code=exited, status=0/SUCCESS)
...output omitted...
● supervisord.service - Process Monitoring and Control Daemon
   Loaded: loaded (/usr/lib/systemd/system/supervisord.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/supervisord.service.d
           └─override.conf
   Active: active (running) since Wed 2022-06-29 00:34:54 EDT; 1h 41min ago
  Process: 10200 ExecStart=/usr/bin/supervisord -c /etc/supervisord.conf (code=exited, status=0/SUCCESS)
...output omitted...
● receptor.service - Receptor
   Loaded: loaded (/usr/lib/systemd/system/receptor.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/receptor.service.d
           └─override.conf
   Active: active (running) since Tue 2022-06-28 23:49:41 EDT; 2h 26min ago
...output omitted...
To access the list of available options, run the automation-controller-service command without any options:
[root@control ~]# automation-controller-service
Usage: automation-controller-service start|stop|restart|status

The following example illustrates the effect of stopping the automation controller with automation-controller-service:
[root@control ~]# automation-controller-service stop
[root@controller ~]# automation-controller-service status
● automation-controller.service - Automation Controller service
   Loaded: loaded (/etc/systemd/system/automation-controller.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Thu 2022-06-30 06:09:14 EDT; 3s ago
  Process: 2614 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 2614 (code=exited, status=0/SUCCESS)
...output omitted...
● receptor.service - Receptor
   Loaded: loaded (/usr/lib/systemd/system/receptor.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/receptor.service.d
           └─override.conf
   Active: active (running) since Thu 2022-06-30 01:13:06 EDT; 4h 56min ago
Compare that to the next example that illustrates the effect of starting the automation controller with automation-controller-service:
[root@control ~]# automation-controller-service start
[root@control ~]# automation-controller-service status
● automation-controller.service - Automation Controller service
   Loaded: loaded (/etc/systemd/system/automation-controller.service; enabled; vendor preset: disabled)
   Active: active (exited) since Thu 2022-06-30 06:11:41 EDT; 2s ago
  Process: 3054 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 3054 (code=exited, status=0/SUCCESS)
...output omitted...
● receptor.service - Receptor
   Loaded: loaded (/usr/lib/systemd/system/receptor.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/receptor.service.d
           └─override.conf
   Active: active (running) since Thu 2022-06-30 01:13:06 EDT; 4h 58min ago
...output omitted...
automation-controller-service does not start or stop receptor.service. This is to allow you to reload or restart receptor on your nodes without restarting the control plane, or vice versa.
To start, stop, or restart receptor.service, use the systemctl command.
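For example, to restart Receptor on a node without affecting the other controller services:

[root@control ~]# systemctl restart receptor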
Supervisord is a process control system often used to control Django-based applications such as automation controller. It is used to manage and monitor long-running processes or daemons, and to automatically restart them as needed. In automation controller, supervisord manages important components of the automation controller application itself.
You can use the supervisorctl status command to see the list of automation controller processes controlled by the supervisord service:
[root@control ~]# supervisorctl status
master-event-listener RUNNING pid 10997, uptime 0:02:03
tower-processes:awx-callback-receiver RUNNING pid 10999, uptime 0:02:03
tower-processes:awx-daphne RUNNING pid 11001, uptime 0:02:03
tower-processes:awx-dispatcher RUNNING pid 10998, uptime 0:02:03
tower-processes:awx-rsyslogd RUNNING pid 11032, uptime 0:02:00
tower-processes:awx-uwsgi RUNNING pid 11000, uptime 0:02:03
tower-processes:awx-wsbroadcast RUNNING pid 11002, uptime 0:02:03

As you can see in the preceding output, supervisord controls a number of processes owned by the awx user.
The main configuration files for automation controller are kept in the /etc/tower directory. These include settings files for the automation controller application, the TLS certificate for nginx, and other key files.
Perhaps the most important of these files for the automation controller application is the /etc/tower/settings.py file, which specifies the locations for job output, project storage, and other directories.
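For example, you could confirm where projects and job output are stored by searching the settings file. The setting names PROJECTS_ROOT and JOBOUTPUT_ROOT shown here are typical defaults and might differ in your version:

[root@control ~]# grep -E 'PROJECTS_ROOT|JOBOUTPUT_ROOT' /etc/tower/settings.py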
The other individual services might have service-specific configuration files elsewhere on the system, such as the /etc/nginx files used by the web server.
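If you edit one of those service-specific files, consider validating it with that service's own tooling before restarting anything. For example, nginx can check its configuration syntax:

[root@control ~]# nginx -t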
The automation controller application log files are stored in one of two centralized locations:
/var/log/tower/
/var/log/supervisor/
Automation controller server errors are logged in the /var/log/tower/ directory. Some key files in the /var/log/tower/ directory include:
/var/log/tower/tower.log: The main log file for the automation controller application.
/var/log/tower/task_system.log: The log file for tasks that the controller runs in the background, such as adding cluster instances, gathering information, and processing data for analytics.
The /var/log/supervisor/ directory stores log files for services, daemons, and applications managed by supervisord. The supervisord.log file in this directory is the main log file for the service that controls all these daemons. The other files contain log information about the activity of those daemons.
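For example, to follow the main application log while you reproduce a problem, and to list the log files that supervisord manages:

[root@control ~]# tail -f /var/log/tower/tower.log
[root@control ~]# ls /var/log/supervisor/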
Automation controller can also send detailed logs to external log aggregation services. Log aggregation can offer insight into technical trends and usage of automation controller, and the data can be used to monitor for anomalies, and to analyze and correlate events. Splunk, Elastic Stack (formerly ELK Stack), Loggly, and Sumo Logic are all log aggregation and data analysis systems that can be used with automation controller.
For more information on how to configure such services, see the References section.
This discussion has focused on looking at the log files to troubleshoot problems with the automation controller server itself.
If you encounter errors running playbooks that do not appear to be related to actual errors in the automation controller configuration, remember to look at the output of your launched jobs in the automation controller web UI or the API.
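For example, recent jobs can also be reviewed through the API. The following request is only a sketch; it assumes that the controller is reachable at controller.example.com and that an OAuth2 token is stored in the TOKEN shell variable:

[user@demo ~]$ curl -sk -H "Authorization: Bearer $TOKEN" \
  https://controller.example.com/api/v2/jobs/ | python3 -m json.tool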
A number of other key files for automation controller are kept in the /var/lib/awx directory. This directory includes:
/var/lib/awx/projects: This is the main directory for projects. For projects that use source control, automation controller clones the project to this directory. The name of each directory includes the identification number for the project and the project name.
/var/lib/awx/job_status: Job status output from playbooks is stored in this directory.
If you are unable to run playbooks due to playbook errors, try the following suggestions:
Are you authenticating to the managed hosts as the intended user? If not, review how the username has been set up, or pass the --user=username or -u username option to specify a user.
Is your YAML file correctly indented? The indentation level is significant in YAML. Ensure you align your white space correctly.
You can use yamllint to test the syntax of your playbook.
You can also use --syntax-check with ansible-navigator to identify syntax errors and fix them.
[user@demo ~]$ ansible-navigator run test_playbook.yml --syntax-check

Red Hat Ansible Automation Platform 2.2 also provides ansible-lint as a tech preview tool to review your playbooks for possible issues.
Items beginning with a dash (-) are considered list items or plays. Items with the format of key: value operate as hashes or dictionaries. Ensure that you do not have extra or missing dashes in your file; see the short sketch after this list for a correctly indented example.
Review your license status and the number of unique hosts that the automation controller server manages.
If the license has expired, or too many hosts are registered, launching jobs might not be possible.
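The following is a minimal sketch of a correctly indented playbook; the play content and host group are only illustrative. You can check the file with yamllint before running it:

[user@demo ~]$ cat test_playbook.yml
---
- name: Ensure the web service is running
  hosts: webservers
  tasks:
    - name: Start and enable httpd
      ansible.builtin.service:
        name: httpd
        state: started
        enabled: true
[user@demo ~]$ yamllint test_playbook.yml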
If you encounter connectivity issues when running playbooks, try the following suggestions:
Verify that you can establish an SSH or WinRM connection with the managed host; a quick manual check is shown after this list. Ansible depends on SSH (or WinRM for Microsoft Windows systems) to access the servers you are managing.
Review your inventory file, and verify the host names and IP addresses.
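As a quick manual check, try connecting to the managed host directly. The host name and remote user below are assumptions for illustration:

[user@demo ~]$ ssh devops@node1.example.com 'echo connection ok'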
If your playbooks are not showing up in the job template list, then review the playbook’s YAML syntax and make sure that it can be parsed by Ansible.
When you are trying to run a job and it stays in the Pending state, try the following suggestions:
Ensure that the automation controller server has enough memory available and that the services governed by supervisord are running. Run the supervisorctl status command.
Ensure that the partition where the /var/ directory is located has more than 1 GB of space available; a quick check is shown after this list. Jobs cannot complete when there is insufficient free space in the /var/ directory.
Restart the automation controller infrastructure using the automation-controller-service restart command.
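For example, to check the free space under /var/ and the state of the supervisord-managed services before restarting anything:

[root@control ~]# df -h /var
[root@control ~]# supervisorctl status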
If you encounter the error message Skipping: No Hosts Matched when you are trying to run a playbook through automation controller, review these possibilities:
Make sure that the host pattern used by the hosts declaration in your play matches the group or host names in the inventory. Host patterns are case-sensitive.
Make sure that your group names have no spaces. Use underscores instead so that the groups are correctly recognized; see the inventory sketch after this list.
If you have specified a limit in the job template, make sure that it is a valid limit and that it matches something in your inventory.
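The following inventory sketch, with hypothetical host and group names, shows a group name without spaces that a hosts: web_servers pattern in a play would match:

[user@demo ~]$ cat inventory
[web_servers]
servera.example.com
serverb.example.com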
Automation controller ships with the awx-manage command-line utility, which can be used to access detailed internal automation controller information. The awx-manage command must be run as root or as the awx (automation controller) user. This utility is most commonly used to reset the automation controller’s admin password.
The password for the built-in automation controller System Administrator account, admin, is initially set when the automation controller server is installed. The awx-manage command provides a way to change the administrator password from the command line. To do this, as the root or awx user on the automation controller server, use the changepassword option:
[root@control ~]# awx-manage changepassword admin
Changing password for user 'admin'
Password: new_password
Password (again): new_password
Password changed successfully for user 'admin'
You can also change the password for the admin user through the automation controller web UI.
If needed, you can also create a new automation controller superuser with administrative privileges. To create a new superuser, use awx-manage with the createsuperuser option.
[root@control ~]# awx-manage createsuperuser
Username (leave blank to use 'root'): admin3
Email address: admin@demo.example.com
Password: new_password
Password (again): new_password
Superuser created successfully.