Verify that Ansible can communicate with managed network devices, and troubleshoot any problems with those connections.
Automation tasks can fail for various reasons. The following issues are the most common causes of network automation task failures:
Connectivity issues, such as routing issues from the Ansible control node to the managed nodes.
Authentication issues, such as an incorrect username, password, or SSH key credentials.
Timeout issues, where commands take longer to run than the configured timeout value. A timeout issue might also be due to an authentication or connectivity issue.
Use the platform-specific *_ping modules to verify connectivity to managed nodes.
For example, use the ios_ping module from the cisco.ios collection to ping Cisco managed nodes:
---
- name: Ping Cisco managed nodes
  hosts: ios
  gather_facts: false

  tasks:
    - name: Ping Cisco managed nodes
      cisco.ios.ios_ping:
        dest: "{{ inventory_hostname }}"
        count: 5

If a ping task fails, you see an error indicating the root cause of the issue.
For example, the following error indicates that the managed node is unreachable via the IP address or hostname provided:
TASK [Ping Cisco managed nodes] **************************************************
fatal: [iosxe1.lab.example.com]: FAILED! => {"changed": false, "msg": "ssh connection failed: ssh connect failed: Timeout connecting to iosxe1.lab.example.com"}

When writing playbooks for network automation, always verify that you are using the correct setting for each network platform option, such as ansible_connection and ansible_network_os.
Visit the network platform documentation at https://docs.ansible.com/ansible/latest/network/user_guide/platform_index.html to verify that you are using the correct settings for each network platform.
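For example, a minimal inventory sketch for a group of Juniper Junos devices managed over NETCONF (the junos group name is an assumption):

[junos:vars]
ansible_connection=ansible.netcommon.netconf
ansible_network_os=junipernetworks.junos.junos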
When you run a playbook, it might fail with the error message No authentication methods available, or another error indicating that authentication to the managed node has failed.
TASK [Change configuration on managed nodes] *************************************
fatal: [junos1.lab.example.com]: FAILED! => {"changed": false, "msg": "No authentication methods available"}
This message means that the Ansible control node where you ran the playbook cannot connect via password or SSH key to the managed node.
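As a quick manual check, verify that you can open an SSH session to the managed node from the control node with the same credentials (the admin username here is a placeholder):

[user@host ~]$ ssh admin@junos1.lab.example.com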
If you cannot connect to managed nodes using SSH keys, you can still connect by adding the -k option to the ansible-navigator run command.
This option prompts for the connection password.
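For example, running the playbook in stdout mode so that the password prompt displays (snmp.yml is the example playbook used later in this section):

[user@host ~]$ ansible-navigator run snmp.yml -m stdout -k
SSH password: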
Because the -k option only prompts for one connection password, problems can occur if the managed nodes use different passwords.
One solution is to ensure that the hosts targeted by the playbook use the same connection password. This might mean having separate playbooks for the various types of network devices.
Another solution is to use a single playbook, but use the --limit option to restrict which managed nodes are targeted by the playbook:
[user@host ~]$ ansible-navigator run snmp.yml --limit junos1.lab.example.com

You can explicitly define the location of the SSH key for an inventory host or host group.
Define the SSH key location by setting the ansible_ssh_private_key_file variable with the path to the appropriate SSH key.
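For example, a sketch of an inventory entry; the host group and key path are placeholders:

[junos]
junos1.lab.example.com ansible_ssh_private_key_file=/home/user/.ssh/lab_rsa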
Although setting the ansible_ssh_private_key_file variable works from the CLI, it does not work from automation controller if you import the inventory (and associated variables).
Automation controller can fail with a message similar to: No such file or directory: '/home/runner/.ssh/lab_rsa'.
This failure occurs because the variable points to a path on the control node's file system, which does not exist inside the automation execution environment that runs the job.
Another possible authentication error occurs when Ansible cannot enter the privileged configuration mode of a managed node.
To resolve this issue, ensure that you set the ansible_become variable to true and the ansible_become_method variable to enable.
For network automation playbooks, the only valid value for the ansible_become_method option is enable.
[user@host ~]$ cat inventory
...output omitted...
[ios:vars]
ansible_connection=ansible.netcommon.network_cli
ansible_network_os=cisco.ios.ios
ansible_become=true
ansible_become_method=enable
...output omitted...
You might see the following error if the amount of time to wait for a command from the managed node is too low for a task to succeed:
fatal: [junos1.lab.example.com]: FAILED! => {"changed": false, "msg": "command timeout triggered, timeout value is 30 secs.\nSee the timeout setting options in the Network Debug and Troubleshooting Guide."}

You also might see a timeout error due to authentication or DNS resolution errors.
To troubleshoot these types of errors, you can set the value of the ansible_command_timeout variable for a play or a task.
- name: Save the running-config
  vars:
    ansible_command_timeout: 120
  cisco.ios.ios_command:
    commands:
      - copy running-config startup-config

The related environment variable is the ANSIBLE_PERSISTENT_COMMAND_TIMEOUT variable.
This environment variable is set to 30 seconds by default.
You can use the export command to modify the value of the ANSIBLE_PERSISTENT_COMMAND_TIMEOUT environment variable in your current session:
[user@host ~]$ export ANSIBLE_PERSISTENT_COMMAND_TIMEOUT=120
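You can also persist this setting in the ansible.cfg file by using the command_timeout option in the [persistent_connection] section:

[persistent_connection]
command_timeout = 120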
Always use the long form of commands.
For example, use the command no shutdown instead of no shut.
When committing a configuration, the network OS converts the short form of commands into the long form of commands.
For example, if you specify the command shut, then the configuration line is stored as shutdown.
The *_config module compares the command in your task to the line in the configuration.
If you use the short form of the command, then the module returns changed=true every time the play runs, even though the configuration might already be correct.
This means that your task is not idempotent and the configuration is updated every time the task runs.
To avoid this problem, use the long form of commands when using *_config modules in your tasks.
You can still abbreviate the configuration statements in the play name if desired.
---
- name: Change interface configuration
  hosts: ios
  gather_facts: false

  tasks:
    - name: Shutdown interface Gig0/1
      cisco.ios.ios_config:
        lines:
          - shutdown
        parents: interface GigabitEthernet0/1
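Because this task uses the long form shutdown, rerunning the play reports ok rather than changed for interfaces that are already shut down, which keeps the task idempotent.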