
Troubleshooting Playbooks

Objectives

  • Identify issues in Ansible Playbooks and repair them.

Debugging Playbooks

The output provided by the ansible-navigator run command is a good starting point for troubleshooting issues with your plays.

Consider the following output from a playbook run:

PLAY [Configure the Junos managed nodes] *****************************************

TASK [Enable the netconf service on port 830] ************************************
changed: [junos1]
ok: [junos2]

TASK [Configure DNS settings] ****************************************************
changed: [junos1]
changed: [junos2]

PLAY RECAP ***********************************************************************
junos1 : ok=2  changed=2  unreachable=0  failed=0  skipped=0  rescued=0  ignored=0
junos2 : ok=2  changed=1  unreachable=0  failed=0  skipped=0  rescued=0  ignored=0

The previous output shows a PLAY header with the name of the play being run, followed by one or more TASK headers for the tasks in that play. Each TASK header represents one task in the play, which runs against all the managed nodes specified by the hosts parameter of the play.

As the task runs against each managed node, the name of the managed node is displayed under the TASK header, along with the task result for that node. Task results can be ok, fatal, changed, or skipping.

Table 3.2. Task Results

Result    Description
ok        There is no need to make any changes on the managed node. It is already in the desired state.
fatal     There was a problem implementing a change on the managed node.
changed   The task implemented a change on the managed node. It is now in the desired state.
skipping  The task was not run on the managed node. It might or might not be in the desired state.

At the bottom of the output for each play, the PLAY RECAP section displays the number of tasks run for each managed node, by task result.

You can increase the verbosity of the output from ansible-navigator run by adding one or more -v options. The ansible-navigator run -v command provides additional debugging information, with up to four levels of verbosity.

Table 3.3. Configuring the Output Verbosity of Playbook Execution

Option  Description
-v      Displays task results.
-vv     Displays task results and task configuration.
-vvv    Displays extra information about connections to managed nodes.
-vvvv   Adds extra verbosity options to the connection plug-ins, including which users executed scripts on the managed nodes and which scripts were executed.
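For example, to run a playbook with the second level of verbosity and plain terminal output (the playbook name is a placeholder):

```shell
# Display task results and task configuration while running the playbook
ansible-navigator run playbook.yml -m stdout -vv
```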

Debugging by Displaying Variable Values

The ansible.builtin.debug module can provide you insight into what is happening in the play. You can create a task that uses this module to display the value for a given variable at a specific point in the play. This can help you to debug tasks that use variables to communicate with each other (for example, using the output of a task as the input to the following one).

The following example uses the msg and var parameters of the ansible.builtin.debug module, after gathering the interface facts for the IOS managed nodes:

- name: Gather IOS interface facts
  cisco.ios.facts:
    gather_subset: interfaces

- name: Display the operating system version running on the managed node
  ansible.builtin.debug:
    msg: "Running the {{ ansible_net_version }} OS version"

- name: Display the IPv4 addresses configured on the managed node
  ansible.builtin.debug:
    var: ansible_net_all_ipv4_addresses

The first task using the ansible.builtin.debug module displays the value at run time of the ansible_net_version variable as part of a message printed to the output of the ansible-navigator run command.

The second task using the ansible.builtin.debug module displays the value of the ansible_net_all_ipv4_addresses variable.

You can use the verbosity parameter to specify whether the ansible.builtin.debug module task is executed. The value for the verbosity parameter correlates to the number of -v options that are specified when you run the playbook. The default value of the verbosity parameter is 0.

In the following example, the verbosity is set to 2 for a task using the ansible.builtin.debug module. The task is executed only if you run the playbook with the -vv option in the ansible-navigator run command.

- name: Display the network device serial number
  ansible.builtin.debug:
    var: ansible_net_serialnum
    verbosity: 2

Working with Command Modules

A playbook can include command modules that send commands to your managed nodes. A playbook task might fail because a command does not yet produce the required output on the managed node, even though the condition would eventually be met if you waited long enough.

For example, suppose a network interface is in the process of coming up. You run the playbook and it fails because the network interface is not up yet.

Conditionals and Command Module Arguments

Use conditionals to evaluate the result from a command that is executed on the managed node and then wait for a certain result before continuing.

The following table lists the conditionals that you can use to evaluate the command result on a managed node:

Table 3.4. Networking Module Conditionals

Option    Result from executing a command on the managed node
eq        Result is equal to what follows the conditional.
neq       Result is not equal to what follows the conditional.
gt        Result has a value greater than the value that follows the conditional.
ge        Result has a value greater than or equal to the value that follows the conditional.
lt        Result has a value less than the value that follows the conditional.
le        Result has a value less than or equal to the value that follows the conditional.
contains  Result contains the string that follows the conditional.

The wait_for command argument can be used to specify one or more conditions to evaluate against the output of the command. The task does not continue to execute until all conditions are true. If the conditions are not true within the configured number of retries, then the task fails.

You can control how many retries are attempted and how long to wait between them. By default, the number of retries is 9; use the retries command argument to modify this value. By default, the interval between retries is 1 second; modify it with the interval command argument.

The following example shows a task that runs the show interfaces command on the managed nodes, and then waits for the result of that task to contain the text GigabitEthernet1 is up. It retries the task 19 times, waiting two seconds between each retry.

- name: Wait for the second network interface
  cisco.ios.ios_command:
    commands:
      - show interfaces
    wait_for:
      - result[0] contains 'GigabitEthernet1 is up'
    interval: 2
    retries: 19

Including Prompts in the Playbook

When working directly with network devices, sometimes there is a prompt confirmation message that must be answered to perform some actions or changes on the devices. By using Ansible, you can handle these kinds of prompt confirmations using the ansible.netcommon.cli_command module with the prompt parameter.

Note

The cisco.ios.ios_command and cisco.nxos.nxos_command modules include the prompt parameter to provide answers to interactive prompts when commands require it.

You can also use the ansible.netcommon.cli_command module to handle prompts. For more information on the ansible.netcommon.cli_command module, visit https://docs.ansible.com/ansible/latest/collections/ansible/netcommon/cli_command_module.html.

The prompt parameter is used together with the answer parameter. The prompt parameter specifies the prompt message received from the network device, or a list of prompt messages when there is more than one. Likewise, the answer parameter specifies the answer, or list of answers, to those prompts.

The value of the prompt parameter must not only match the network device prompt request, but also follow the rules for Python regular expressions, as indicated in https://docs.ansible.com/ansible/latest/network/user_guide/network_working_with_command_output.html#handling-prompts-in-network-modules.

In the following example, the task uses the cisco.ios.ios_command module to send the reload command to the managed nodes. Then, the prompt message from the network device and the corresponding answer are specified.

- name: Reboot the managed nodes
  cisco.ios.ios_command:
    commands:
      - command: 'reload'
        prompt: 'Proceed with reload\?'
        answer: 'y'
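When a single command triggers more than one prompt, the prompt and answer parameters also accept lists. The following sketch uses the ansible.netcommon.cli_command module; the command, file name, and prompt texts are hypothetical and depend on the device:

```yaml
- name: Delete a file, answering two confirmation prompts
  ansible.netcommon.cli_command:
    command: delete flash:/unused.cfg
    prompt:
      # Each prompt is matched as a Python regular expression
      - 'Delete filename'
      - 'confirm'
    answer:
      # Answers are given in the same order as the prompts
      - 'unused.cfg'
      - 'y'
```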

Reviewing Playbooks for Errors

Several issues can occur during a playbook run. Many are related to playbook syntax, or to connectivity problems with the managed nodes (for example, an error in the hostname of a managed node in the inventory file).

A number of tools are available that you can use to review your playbook for syntax errors and other problems before you run it.

Following Good Practices

One of the best ways to simplify debugging playbooks is to follow good practices when writing them in the first place. The following list describes some recommended practices for playbook development:

  • Use a concise description of the purpose of the play or task when naming plays and tasks. The play name or task name is displayed when the playbook is executed. This also helps document what each play or task is supposed to accomplish, and possibly why it is needed.

  • Use comments to add inline documentation about tasks.

  • Make effective use of vertical white space. In general, organize task attributes vertically to make them easier to read.

  • Consistent horizontal indentation is critical. Use spaces, not tabs, to avoid indentation errors. You can set up your text editor to insert spaces when you press the Tab key to make this easier.

  • Try to keep the playbook as simple as possible. Only use the features that you need.

Note

Some Ansible practitioners at Red Hat have been working on an unofficial set of recommended practices for creating Ansible automation content, based on their own experiences in the field. See https://redhat-cop.github.io/automation-good-practices.

Although not officially endorsed by Red Hat at this time, it can be a useful starting point for developing good practices of your own.

Making Use of Ansible Lint

To help you follow good practices, Red Hat Ansible Automation Platform 2.3 provides a tool called Ansible Lint, which uses a set of predefined rules to look for possible issues with your playbook.

Use the ansible-navigator lint command to review a file for common errors and issues:

[user@host ~]$ ansible-navigator lint playbook.yml

Not all the issues that Ansible Lint reports break your playbook, but a reported issue might indicate the presence of a more serious error. The output provided for the ansible-navigator lint command shows the issue found, the line where the issue was found, and the rule that allowed Ansible Lint to find the issue.

For example, assume that you have the following ios_information.yml playbook:

---
- name: Gather information from IOS devices
  hosts: ios
  gather_facts: false
  tasks:
    - name: Gather information from IOS devices
      ios_command:
        commands:
        - show version
        - show interfaces

Use the ansible-navigator lint command to validate it:

[user@host ~]$ ansible-navigator lint ios_information.yml -m stdout
fqcn[action]: Use FQCN for module actions, such `<namespace>.<collection>.ios_command`. (warning) 1
ios_information.yml:7 Action `ios_command` is not FQCN.

yaml[indentation]: Wrong indentation: expected 10 but found 8 2
ios_information.yml:10

yaml[trailing-spaces]: Trailing spaces 3
ios_information.yml:11

yaml[empty-lines]: Too many blank lines (1 > 0) 4
ios_information.yml:12

1. Line 7 of the playbook (ios_command:). The issue was detected by the fqcn[action] rule. Ansible Lint indicates that you should use the fully qualified collection name (FQCN) for the module in that task: cisco.ios.ios_command instead of just ios_command.

2. Line 10 of the playbook (- show version). The issue was detected by the yaml[indentation] rule. Ansible Lint indicates incorrect indentation: it expected 10 spaces but found 8, so you need to increase the indentation by two spaces.

3. Line 11 of the playbook (- show interfaces). The issue was detected by the yaml[trailing-spaces] rule. Although they are not visible, there are blank spaces at the end of the line. This does not break the playbook, but many developers prefer not to have trailing white space in files stored in version control, to avoid unnecessary differences as the files are edited.

4. The playbook ends with one or more blank lines, detected by the yaml[empty-lines] rule. Again, this does not break the playbook, but it is good practice to remove the trailing blank lines.
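After fixing all four reported issues, the corrected playbook looks like this:

```yaml
---
- name: Gather information from IOS devices
  hosts: ios
  gather_facts: false
  tasks:
    - name: Gather information from IOS devices
      # FQCN instead of the short module name
      cisco.ios.ios_command:
        commands:
          # List items indented under the commands keyword
          - show version
          - show interfaces
```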

Playbook Artifacts and Log Files

Red Hat Ansible Automation Platform can log the output of playbook runs that you make from the command line in a number of different ways.

  • Automation content navigator can produce playbook artifacts that store information about runs of playbooks in JSON format.

  • You can log information about playbook runs to a text file.

Playbook Artifacts from Automation Content Navigator

The ansible-navigator command produces playbook artifact files by default each time you run a playbook. These files record information about the playbook run, and can be used to review the results of the run when it completes, to troubleshoot issues, or be kept for compliance purposes. Each playbook artifact file is named based on the name of the playbook you ran, followed by artifact, and then the time stamp of when the playbook was run, ending with the .json file extension.

For example, if you run the command ansible-navigator run networking.yml at 20:00 UTC on May 31, 2023, the resulting file name of the artifact file would be:

networking-artifact-2023-05-31T20:00:04.019343+00:00.json

You can review the contents of these files with the ansible-navigator replay command. If you include the -m stdout option, then the output of the playbook run is printed to your terminal as if it had just run. If you omit that option, you can examine the results of the run interactively.
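For example, to replay the artifact from the earlier networking.yml run and print the output to the terminal:

```shell
ansible-navigator replay networking-artifact-2023-05-31T20:00:04.019343+00:00.json -m stdout
```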

Consider the following playbook where you run the show interfaces command on Junos managed nodes. The playbook uses the ansible.netcommon.cli_command module to run the command:

---
- name: Gather interfaces information for Junos devices
  hosts: junos
  gather_facts: false
  tasks:
    - name: Run command on remote managed nodes
      ansible.netcommon.cli_command:
        command: show interfaces

The Junos managed nodes are defined in the inventory file for the project:

[junos]
junos[1:2].lab.example.com

[junos:vars]
ansible_user=developer
ansible_ssh_private_key_file=~/.ssh/lab_rsa
ansible_connection=ansible.netcommon.netconf
ansible_network_os=junipernetworks.junos.junos

If you run the playbook, it fails. Running Ansible Lint on the playbook does not report any issues against the predefined rules.

To troubleshoot further, you run ansible-navigator replay in interactive mode on the resulting artifact file, which opens the following output in your terminal:

Figure 3.24: Initial replay screen

You can press 0 to display details of the play. The output shows the failed results for each of the managed nodes. To keep the example simple and focus on the problem, the play has only one task; if there were more tasks, they would all be listed in this output.

Figure 3.25: Play results by managed node and task

It looks like the Run command on remote managed nodes task failed on both managed nodes. By entering 0 or 1 you can see the failure for the first or the second managed node, respectively.

The following terminal output shows the failed result for the first managed node. You find a similar result for the other node.

Figure 3.26: Task results for one of the managed nodes

The task is attempting to use the ansible.netcommon.cli_command module to execute the show interfaces command on the managed node, but the connection type used by this playbook is not valid for that module.

You might discover that the ansible.netcommon.netconf connection type was configured as the default for the Junos managed nodes in the inventory file:

[junos:vars]
ansible_user=developer
ansible_ssh_private_key_file=~/.ssh/lab_rsa
ansible_connection=ansible.netcommon.netconf
ansible_network_os=junipernetworks.junos.junos

Using the same play but with the ansible.netcommon.network_cli connection type, the result is successful on both managed nodes.
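With that change, the [junos:vars] section of the inventory file looks like this:

```ini
[junos:vars]
ansible_user=developer
ansible_ssh_private_key_file=~/.ssh/lab_rsa
ansible_connection=ansible.netcommon.network_cli
ansible_network_os=junipernetworks.junos.junos
```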

Figure 3.27: Successful task results in the managed nodes

Important

You might not want to save playbook artifacts for several reasons:

  • You are concerned about sensitive information being saved in the log file.

  • You need to provide interactive input to automation content navigator.

  • You do not want the files to clutter up the project directory.

You can keep the files from being generated by disabling the playbook artifacts in the ansible-navigator.yml configuration file inside the project directory:

ansible-navigator:
  playbook-artifact:
    enable: false

Logging Output to a Text File

Use the logging key in the ansible-navigator.yml configuration file to set the parameters related to logs that are controlled by automation content navigator.

The following parameters apply to the logging key:

append

The default is true and means that the log messages are appended to an existing log file. If you set this value to false then every new log message creates a new log file.

file

Specifies the full path to the log file.

level

Choose between debug, info, warning, error, and critical levels to show in the log file.

Important

If you configure automation content navigator to write log files to /var/log, then Red Hat recommends that you configure logrotate to manage them.
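As a sketch, a logrotate policy for such a log file might look like the following; the path and rotation schedule are assumptions, so adjust them for your environment:

```text
# /etc/logrotate.d/ansible-navigator (hypothetical path and policy)
/var/log/ansible-navigator.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
```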

The following example shows an automation content navigator configuration where you specify a debug log level for troubleshooting:

ansible-navigator:
  logging:
    append: true
    file: /tmp/ansible-navigator.log
    level: debug

Debugging in VS Code

When using VS Code to create or edit playbooks, you can see syntax errors as you write, find information for the modules or collections in your plays, or detect if the playbook breaks a predefined Ansible Lint rule.

You first need to install the Ansible and YAML extensions in VS Code. Adjust the VS Code extension settings for Ansible to use your ee-supported-rhel8 automation execution environment. This step is required because the Ansible Lint package is included in that automation execution environment.

To adjust the Ansible extension settings, navigate to View > Extensions in VS Code. Select the Ansible extension, click the Manage icon for the extension, and then click Extension Settings.

Figure 3.28: Adjusting the VS Code extension settings for Ansible

Scroll down to the Ansible>Execution Environment:Enabled settings option and enable the use of an automation execution environment. Specify the automation execution environment to use in the Ansible>Execution Environment:Image settings option. Adjust other automation execution environment related settings for your preference.

Figure 3.29: Automation execution environment settings section

Close the Settings tab and any other tabs that you opened.

Consider the following playbook in VS Code:

Figure 3.30: Playbook with issues opened in VS Code

The playbook is analyzed in levels: first the plays, then the attributes, and so on. The scan stops at the first problem found. Notice that in this example there are several indications of a single issue in the playbook.

  • The second line of the playbook is underlined in red, indicating an issue. If you hover over the line, a message indicates that the host attribute is not valid for the play.

Figure 3.31: Ansible Lint checker in VS Code
  • The color of the host attribute differs from the color of the name, gather_facts, and tasks attributes. In this example, you are using VS Code to verify an existing playbook, but when writing a new file, the color can be an indicator that helps you avoid making a mistake.

Figure 3.32: Issue related to the host attribute

  • The name of the playbook is in red. If you hover over the number at the end of the playbook name line, a message indicates that there is one problem in this file.

Figure 3.33: Showing the number of problems in the file
  • The Status Bar in VS Code displays a summary with the result of these checks. The first symbol in this bar shows the number of problems found in the playbook.

As you can see, in a single view you can detect that there is at least one problem with the playbook. VS Code also gives you a direct view into the problems found.

From VS Code, select View > Problems to see the issues detected.

Figure 3.34: Section in VS Code to show the problems found

If you fix the hosts attribute in the example playbook, the attribute immediately acquires the same color as the rest of the play attributes. After you save the file, the Status Bar shows that a new Ansible Lint verification is starting.

Figure 3.35: Status Bar in VS Code indicates that Ansible Lint is verifying the file

In this example, new issues are detected in the playbook after you fix the hosts attribute for the play.

Figure 3.36: New detected issues in the playbook

Ansible Lint in VS Code works similarly to the command line: you review each of the detected problems and fix them. An advantage is that you have easy access to more information.

For example, the first issue now detected corresponds to not using the FQCN when referring to the ios_command module in the play. You can hover over the module to obtain the FQCN, its description, and additional notes.

Figure 3.37: Information about the ios_command module

Another advantage is the autocomplete feature, which provides you with suggestions that avoid making mistakes when typing. All these suggestions are based on the collections inside the automation execution environment image configured in VS Code.

Figure 3.38: The autocomplete VS Code feature provides suggestions to use in the playbook.

Revision: do457-2.3-7cfa22a