Chapter 9.  Maintaining Red Hat Ansible Automation Platform

Abstract

Goal

Perform routine maintenance and administration of Red Hat Ansible Automation Platform.

Objectives
  • Describe the low-level components of automation controller, locate and examine relevant log files, control its services, and perform basic troubleshooting.

  • Back up and restore automation controller and automation hub databases and configuration files.

Sections
  • Performing Basic Troubleshooting of Automation Controller (and Guided Exercise)

  • Backing up and Restoring Red Hat Ansible Automation Platform (and Guided Exercise) (and Quiz)

Performing Basic Troubleshooting of Automation Controller

Objectives

  • Describe the low-level components of automation controller, locate and examine relevant log files, control its services, and perform basic troubleshooting.

Automation Controller Components

Automation controller is a web application made up of a number of cooperating processes and services. Four main network services are enabled, which start the rest of the components of automation controller:

  • Nginx provides the web server that hosts the automation controller application and supports the web UI and the API.

  • PostgreSQL is the database that stores most automation controller data, configuration, and history.

  • Supervisord is a process control system that manages the various components of the automation controller application, performing operations such as scheduling and running jobs, listening for callbacks from running jobs, and so on.

  • Receptor provides an overlay network intended to ease the distribution of work across a large and dispersed collection of workers.

A fifth component used by automation controller is Redis, an in-memory key-value store that provides a local caching service.

These network services communicate with each other using normal network protocols. For a self-contained automation controller server, the main ports that need to be exposed outside the system are 80/tcp and 443/tcp, to allow clients to access the web UI and API.

However, the other services might also expose ports to external clients unless specifically protected. For example, the PostgreSQL service listens for connections from anywhere on 5432/tcp, and Receptor needs access to certain ports for automation mesh communications. You can control access to these ports with a firewall: allow network communication from the hosts that need access to those ports, and deny access to hosts that should not be using those services.
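
For example, a minimal firewalld sketch on the controller node could allow PostgreSQL connections only from a trusted host; the 192.0.2.10 address is a placeholder, and this assumes that 5432/tcp is not otherwise opened in the zone:

[root@control ~]# firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.0.2.10" port port="5432" protocol="tcp" accept'    # 192.0.2.10 is a placeholder address
[root@control ~]# firewall-cmd --reload
[root@control ~]# firewall-cmd --list-rich-rules
rule family="ipv4" source address="192.0.2.10" port port="5432" protocol="tcp" accept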

Warning

This is one reason why it is important to set strong passwords for the PostgreSQL service in the inventory file used to install automation controller. By default, these services can be contacted directly by remote clients, and weak passwords can leave them vulnerable to remote attack.

Starting, Stopping, and Restarting Automation Controller

Automation controller ships with automation-controller-service, an administrative utility script that can start, stop, and restart all the controller services running on the current controller node.

This includes the message queue components, and the database if it is an integrated installation on that host. External databases must be explicitly managed by the administrator.
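
For example, on a standalone database node you would typically manage PostgreSQL directly with systemctl; the database host name and the postgresql service name shown here are illustrative and might differ in your installation:

[root@database ~]# systemctl status postgresql    # hypothetical external database node
[root@database ~]# systemctl restart postgresql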

The automation-controller-service script is installed at /usr/bin/automation-controller-service and can be run as follows:

[root@control ~]# automation-controller-service status
● automation-controller.service - Automation Controller service
   Loaded: loaded (/etc/systemd/system/automation-controller.service; enabled; vendor preset: disabled)
   Active: active (exited) since Wed 2022-06-29 00:34:54 EDT; 1h 41min ago
  Process: 10217 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
...output omitted...

● redis.service - Redis persistent key-value database
   Loaded: loaded (/usr/lib/systemd/system/redis.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/redis.service.d
           └─limit.conf, override.conf
   Active: active (running) since Wed 2022-06-29 00:34:54 EDT; 1h 41min ago
  Process: 6128 ExecStop=/usr/libexec/redis-shutdown (code=exited, status=0/SUCCESS)
...output omitted...

● nginx.service - The nginx HTTP and reverse proxy server
   Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/nginx.service.d
           └─override.conf
   Active: active (running) since Wed 2022-06-29 00:34:54 EDT; 1h 41min ago
  Process: 10210 ExecStart=/usr/sbin/nginx (code=exited, status=0/SUCCESS)
...output omitted...

● supervisord.service - Process Monitoring and Control Daemon
   Loaded: loaded (/usr/lib/systemd/system/supervisord.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/supervisord.service.d
           └─override.conf
   Active: active (running) since Wed 2022-06-29 00:34:54 EDT; 1h 41min ago
  Process: 10200 ExecStart=/usr/bin/supervisord -c /etc/supervisord.conf (code=exited, status=0/SUCCESS)
...output omitted...

● receptor.service - Receptor
   Loaded: loaded (/usr/lib/systemd/system/receptor.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/receptor.service.d
           └─override.conf
   Active: active (running) since Tue 2022-06-28 23:49:41 EDT; 2h 26min ago
...output omitted...

To see the available actions, run the automation-controller-service command without any arguments:

[root@control ~]# automation-controller-service
Usage: automation-controller-service start|stop|restart|status

The following example illustrates the effect of stopping the automation controller with automation-controller-service:

[root@control ~]# automation-controller-service stop
[root@control ~]# automation-controller-service status
● automation-controller.service - Automation Controller service
   Loaded: loaded (/etc/systemd/system/automation-controller.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Thu 2022-06-30 06:09:14 EDT; 3s ago
  Process: 2614 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 2614 (code=exited, status=0/SUCCESS)

...output omitted...

● receptor.service - Receptor
   Loaded: loaded (/usr/lib/systemd/system/receptor.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/receptor.service.d
           └─override.conf
   Active: active (running) since Thu 2022-06-30 01:13:06 EDT; 4h 56min ago

Compare that to the next example that illustrates the effect of starting the automation controller with automation-controller-service:

[root@control ~]# automation-controller-service start
[root@control ~]# automation-controller-service status
● automation-controller.service - Automation Controller service
   Loaded: loaded (/etc/systemd/system/automation-controller.service; enabled; vendor preset: disabled)
   Active: active (exited) since Thu 2022-06-30 06:11:41 EDT; 2s ago
  Process: 3054 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 3054 (code=exited, status=0/SUCCESS)

...output omitted...

● receptor.service - Receptor
   Loaded: loaded (/usr/lib/systemd/system/receptor.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/receptor.service.d
           └─override.conf
   Active: active (running) since Thu 2022-06-30 01:13:06 EDT; 4h 58min ago

...output omitted...

Important

automation-controller-service does not start or stop receptor.service. This is to allow you to reload or restart receptor on your nodes without restarting the control plane, or vice versa.

To start, stop, or restart receptor.service, use the systemctl command.
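
For example, to restart only the Receptor service on a node while leaving the other automation controller services running:

[root@control ~]# systemctl restart receptor
[root@control ~]# systemctl status receptor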

Supervisord Components

Supervisord is a process control system often used to control Django-based applications such as automation controller. It is used to manage and monitor long-running processes or daemons, and to automatically restart them as needed. In automation controller, supervisord manages important components of the automation controller application itself.

You can use the supervisorctl status command to see the list of automation controller processes controlled by the supervisord service:

[root@control ~]# supervisorctl status
master-event-listener                   RUNNING   pid 10997, uptime 0:02:03
tower-processes:awx-callback-receiver   RUNNING   pid 10999, uptime 0:02:03
tower-processes:awx-daphne              RUNNING   pid 11001, uptime 0:02:03
tower-processes:awx-dispatcher          RUNNING   pid 10998, uptime 0:02:03
tower-processes:awx-rsyslogd            RUNNING   pid 11032, uptime 0:02:00
tower-processes:awx-uwsgi               RUNNING   pid 11000, uptime 0:02:03
tower-processes:awx-wsbroadcast         RUNNING   pid 11002, uptime 0:02:03

As you can see in the preceding output, supervisord controls a number of processes owned by the awx user.
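
If a single component misbehaves, you can restart it individually with supervisorctl rather than restarting everything; this sketch uses the awx-dispatcher process name from the preceding status output:

[root@control ~]# supervisorctl restart tower-processes:awx-dispatcher
tower-processes:awx-dispatcher: stopped
tower-processes:awx-dispatcher: started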

Automation Controller Configuration and Log Files

Configuration Files

The main configuration files for automation controller are kept in the /etc/tower directory. These include settings files for the automation controller application, the TLS certificate for nginx, and other key files.

Perhaps the most important of these files for the automation controller application is the /etc/tower/settings.py file, which specifies the locations for job output, project storage, and other directories.

The other individual services might have service-specific configuration files elsewhere on the system, such as the /etc/nginx files used by the web server.
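
For example, you can confirm where automation controller stores projects and job output by searching the settings file; the variable names shown here are illustrative and can change between releases. On a default installation, these settings point at the /var/lib/awx directories described later in this section:

[root@control ~]# grep -E 'PROJECTS_ROOT|JOBOUTPUT_ROOT' /etc/tower/settings.py
PROJECTS_ROOT = '/var/lib/awx/projects'
JOBOUTPUT_ROOT = '/var/lib/awx/job_status'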

Log Files

The automation controller application log files are stored in one of two centralized locations:

  • /var/log/tower/

  • /var/log/supervisor/

Automation controller server errors are logged in the /var/log/tower/ directory. Some key files in the /var/log/tower/ directory include:

  • /var/log/tower/tower.log: The main log file for the automation controller application.

  • /var/log/tower/task_system.log: The log file for tasks that the controller runs in the background, such as adding cluster instances, gathering information, and processing data for analytics.

The /var/log/supervisor/ directory stores log files for services, daemons, and applications managed by supervisord. The supervisord.log file in this directory is the main log file for the service that controls all these daemons. The other files contain log information about the activity of those daemons.
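
For example, to watch the main application log while reproducing a problem, and then to review the supervisord log:

[root@control ~]# tail -f /var/log/tower/tower.log
[root@control ~]# less /var/log/supervisor/supervisord.log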

Automation controller can also send detailed logs to external log aggregation services. Log aggregation can offer insight into automation controller technical trends or usage. The data can be used to monitor for anomalies, analyze events, and correlate events. Splunk, Elastic stack (formerly ELK stack), Loggly, and Sumologic are all log aggregation and data analysis systems that can be used with automation controller.

For more information on how to configure such services, see the References section.

Important

This discussion has focused on looking at the log files to troubleshoot problems with the automation controller server itself.

If you encounter errors running playbooks that do not appear to be related to actual errors in the automation controller configuration, remember to look at the output of your launched jobs in the automation controller web UI or the API.

Other Automation Controller Files

A number of other key files for automation controller are kept in the /var/lib/awx directory. This directory includes:

  • /var/lib/awx/projects: This is the main directory for projects. For projects that use source control, automation controller clones the project to this directory. The name of each directory includes the identification number for the project and the project name, as shown in the example after this list.

  • /var/lib/awx/job_status: Job status output from playbooks is stored in this directory.
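
For example, listing the projects directory shows one subdirectory per project that uses source control; the directory name in this sketch is only illustrative:

[root@control ~]# ls /var/lib/awx/projects/
_8__my_web_project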

Common Troubleshooting Scenarios

Problems Running Playbooks

If you are unable to run playbooks due to playbook errors, try the following suggestions:

  • Are you authenticating as the user currently running the commands? If not, review how the username has been set up or pass the --user=username or -u username options to specify a user.

  • Is your YAML file correctly indented? The indentation level is significant in YAML. Ensure you align your white space correctly.

You can use yamllint to test the syntax of your playbook; see the example after this list.

You can also use --syntax-check with ansible-navigator to identify syntax errors and fix them.

[user@demo ~]$ ansible-navigator run test_playbook.yml --syntax-check

  • Red Hat Ansible Automation Platform 2.2 also provides ansible-lint as a tech preview tool to review your playbooks for possible issues.

  • Items beginning with a dash (-) are considered list items or plays. Items with the format of key: value operate as hashes or dictionaries. Ensure that your file does not have extra or missing dashes at the start of plays or list items.

  • Review your license status and the number of unique hosts that the automation controller server manages.

If the license has expired, or too many hosts are registered, launching jobs might not be possible.
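
For example, a quick local check of a playbook with yamllint and ansible-lint might look like the following; test_playbook.yml is a placeholder file name:

[user@demo ~]$ yamllint test_playbook.yml
[user@demo ~]$ ansible-lint test_playbook.yml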

Problems Connecting to Your Host

If you encounter connectivity issues when running playbooks, try the following suggestions:

  • Verify that you can establish an SSH or WinRM connection with the managed host, as shown in the example after this list. Ansible depends on SSH (or WinRM for Microsoft Windows systems) to access the servers that you are managing.

  • Review your inventory file. Review the host names and IP addresses.
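
For example, you can test connectivity manually before launching a job; the devops user, the node1.example.com host, and the inventory file name are placeholders:

[user@demo ~]$ ssh devops@node1.example.com hostname    # placeholder user and host
[user@demo ~]$ ansible node1.example.com -i inventory -m ping -u devops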

Playbooks Do Not Appear in the List of Job Templates

If your playbooks are not showing up in the job template list, then review the playbook’s YAML syntax and make sure that it can be parsed by Ansible.

Playbook Stays in Pending State

When you are trying to run a job and it stays in the Pending state, try the following suggestions:

  • Ensure that the automation controller server has enough memory available and that the services governed by supervisord are running. Run the supervisorctl status command, as shown in the example after this list.

  • Ensure that the partition where the /var/ directory is located has more than 1 GB of space available. Jobs cannot complete when there is insufficient free space in the /var/ directory.

  • Restart the automation controller infrastructure using the automation-controller-service restart command.
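
For example, a quick health check of memory, free space in the /var/ directory, and the supervisord-managed services might look like the following:

[root@control ~]# free -m
[root@control ~]# df -h /var
[root@control ~]# supervisorctl status
[root@control ~]# automation-controller-service restart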

Error: Provided Hosts List Is Empty

If you encounter the error message Skipping: No Hosts Matched when you are trying to run a playbook through automation controller, review these possibilities:

  • Review and make sure that the host pattern used by the hosts declaration in your play matches the group or host names in the inventory, as shown in the example after this list. Host patterns are case-sensitive.

  • Make sure that your group names do not contain spaces. If they do, replace the spaces with underscores or remove them so that the groups are correctly recognized.

  • If you have specified a limit in the job template, make sure that it is a valid limit and that it matches something in your inventory.
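
For example, you can display the group and host names that your inventory actually defines and compare them with the hosts pattern in your play and any limit in the job template; the inventory file name, group, and host shown here are placeholders:

[user@demo ~]$ ansible-inventory -i inventory --graph
@all:
  |--@ungrouped:
  |--@web_servers:
  |  |--node1.example.com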

Performing Command-line Management

Automation controller ships with the awx-manage command-line utility, which can be used to access detailed internal automation controller information. The awx-manage command must be run as root or as the awx (automation controller) user. This utility is most commonly used to reset the automation controller’s admin password.

Changing the Automation Controller Admin Password

The password for the built-in automation controller System Administrator account, admin, is initially set when the automation controller server is installed. The awx-manage command provides a way to change the administrator password from the command line. To do this, as the root or awx user on the automation controller server, use the changepassword option:

[root@control ~]# awx-manage changepassword admin
Changing password for user 'admin'
Password: new_password
Password (again): new_password
Password changed successfully for user 'admin'

Note

You can also change the password for the admin user through the automation controller web UI.

If needed, you can also create a new automation controller superuser with administrative privileges. To create a new superuser, use awx-manage with the createsuperuser option.

[root@control ~]# awx-manage createsuperuser
Username (leave blank to use 'root'): admin3
Email address: admin@demo.example.com
Password: new_password
Password (again): new_password
Superuser created successfully.
