Guided Exercise: Troubleshooting Ansible Network Communication

Troubleshoot problems with communication between your automation execution environment and your managed network nodes.

Outcomes

  • Troubleshoot and resolve network communication issues.

As the student user on the workstation machine, use the lab command to prepare your system for this exercise, and to ensure that all required resources are available. This command also creates a project directory with the files needed for the exercise.

[student@workstation ~]$ lab start run-communication

Instructions

You have been provided with a playbook called snmp.yml, which runs against the junos inventory group of managed nodes.

Table 3.5. Managed Nodes in the junos Inventory Group

Managed node              IP address
junos1.lab.example.com    172.25.250.22
junos2.lab.example.com    172.25.250.23

Table 3.6. DNS Name Server IP Address and Domain Name

Name server IP address    Domain name
172.25.250.220            lab.example.com

The playbook verifies connectivity and then configures the Simple Network Management Protocol (SNMP) on the Juniper Junos managed nodes. You were told that the managed nodes accept connections from the student user account with Student as the password.

The exercise also provides you with three additional playbooks, which you can use to troubleshoot connectivity issues:

  • The netconf.yml playbook enables and configures the Network Configuration Protocol (NETCONF) system service. The playbook then checks the connectivity by waiting five seconds for port 830 to become open on the managed nodes.

  • The dns.yml playbook configures the Domain Name System (DNS) service. The playbook then verifies and displays the managed node DNS configuration.

  • The ping.yml playbook tests that a remote destination is reachable from the Juniper Junos managed nodes.

Based on the error messages provided by Ansible when you run the snmp.yml playbook, troubleshoot the network communication issues. Ensure that there are no connectivity problems if more Juniper Junos managed nodes are added to the junos inventory group.

  1. Open the /home/student/run-communication directory in VS Code and review the snmp.yml playbook.

    1. Open VS Code and then click File → Open Folder.

    2. Navigate to Home → run-communication and then click Open.

      Note

      If prompted, select Trust the authors of all files in the parent folder 'student', and then click Yes, I trust the authors.

    3. Click the snmp.yml playbook. Notice that the playbook consists of one play with two tasks to execute against the junos inventory group of managed nodes:

      ---
      - name: Configure SNMP on Juniper managed nodes
        hosts: junos
        gather_facts: false
        tasks:
          - name: Verify connectivity between the managed nodes
            vars:
              ansible_connection: ansible.netcommon.network_cli
            junipernetworks.junos.junos_ping:
              dest: "{{ inventory_hostname }}"
              count: 5
      
          - name: Configure SNMP on managed nodes
            junipernetworks.junos.junos_snmp_server:
              state: merged
              config:
                location: 'Raleigh, NC'
                contact: 'Network Engineering | neteng@company.com'
                description: "SNMP Server"
                communities:
                  - name: rocommunity2n4g!
                    authorization: read-only
                  - name: rwcommunityd7g$v
                    authorization: read-write
  2. Try to run the snmp.yml playbook. The playbook fails at the Configure SNMP on managed nodes task for the junos1.lab.example.com managed node, and at the Verify connectivity between the managed nodes task for the junos2.lab.example.com managed node.

    Isolate the error found for the junos1.lab.example.com managed node. By isolating each error, you can troubleshoot the issues you find one by one.

    1. Switch to the Terminal tab in VS Code, or change to the /home/student/run-communication directory in a GNOME terminal:

      [student@workstation ~]$ cd ~/run-communication
      [student@workstation run-communication]$
    2. Use the ansible-navigator run command to run the snmp.yml playbook. The output shows a fatal result for both managed nodes:

      [student@workstation run-communication]$ ansible-navigator run snmp.yml
      
      PLAY [Configure SNMP on Juniper managed nodes] ***********************************
      
      TASK [Verify connectivity between the managed nodes] *****************************
      ok: [junos1.lab.example.com]
      fatal: [junos2.lab.example.com]: FAILED! => {"changed": false, "module_stderr": "expected string or bytes-like object", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error"}
      
      TASK [Configure SNMP on managed nodes] *******************************************
      fatal: [junos1.lab.example.com]: FAILED! => {"changed": false, "msg": "Could not open socket to junos1.lab.example.com:830"}
      
      PLAY RECAP ***********************************************************************
      junos1.lab.example.com     : ok=1    changed=0    unreachable=0    failed=1    ...
      junos2.lab.example.com     : ok=0    changed=0    unreachable=0    failed=1    ...
      
      Please review the log for errors.
    3. Run the playbook using the --limit option to isolate the error message for the junos1.lab.example.com managed node:

      [student@workstation run-communication]$ ansible-navigator run snmp.yml \
      --limit junos1.lab.example.com
      
      PLAY [Configure SNMP on Juniper managed nodes] ***********************************
      
      TASK [Verify connectivity between the managed nodes] *****************************
      ok: [junos1.lab.example.com]
      
      TASK [Configure SNMP on managed nodes] *******************************************
      fatal: [junos1.lab.example.com]: FAILED! => {"changed": false, "msg": "Could not open socket to junos1.lab.example.com:830"}
      
      PLAY RECAP ***********************************************************************
      junos1.lab.example.com     : ok=1    changed=0    unreachable=0    failed=1    ...
      
      Please review the log for errors.
  3. Troubleshoot the Could not open socket to junos1.lab.example.com:830 error message.

    1. (Optional) Use the ssh command to determine if NETCONF over SSH is enabled. The error message indicates that the SSH connection to port 830 on the managed node was refused. That might indicate that the NETCONF service on the managed node is disabled.

      [student@workstation run-communication]$ ssh junos1.lab.example.com -p 830 \
      -s netconf
      ssh: connect to host junos1.lab.example.com port 830: Connection refused

      When you have many managed nodes where you need to configure or verify connectivity, running the provided netconf.yml playbook is a better option than using the ssh command.

    2. Open the netconf.yml playbook in VS Code. Notice that the first task in the playbook performs the NETCONF configuration. This task is currently commented out. The second task validates that there is indeed connectivity.

      ---
      - name: Networking - netconf
        hosts: junos
        gather_facts: false
        tasks:
      #    - name: Enable netconf service on port 830
      #      vars:
      #        ansible_connection: ansible.netcommon.network_cli
      #      junipernetworks.junos.junos_netconf:
      #        netconf_port: 830
      #        state: present
      
          - name: Checking connectivity
            ansible.builtin.wait_for:
              host: "{{ inventory_hostname }}"
              port: 830
              timeout: 5
    3. Run the netconf.yml playbook against the junos1.lab.example.com managed node. The timeout when waiting for an answer indicates that after five seconds, port 830 was not available on the managed node.

      [student@workstation run-communication]$ ansible-navigator run netconf.yml \
      --limit junos1.lab.example.com
      
      PLAY [Networking - netconf] ******************************************************
      
      TASK [Checking connectivity] *****************************************************
      fatal: [junos1.lab.example.com]: FAILED! => {"changed": false, "elapsed": 5, "msg": "Timeout when waiting for junos1.lab.example.com:830"}
      
      PLAY RECAP ***********************************************************************
      junos1.lab.example.com     : ok=0    changed=0    unreachable=0    failed=1    ...
      
      Please review the log for errors.
    4. Uncomment the task that configures the NETCONF service in the netconf.yml playbook and then save the playbook. Run the netconf.yml playbook again against the junos1.lab.example.com managed node. Both tasks in the playbook complete successfully.

      [student@workstation run-communication]$ ansible-navigator run netconf.yml \
      --limit junos1.lab.example.com
      
      PLAY [Networking - netconf] ******************************************************
      
      TASK [Enable netconf service on port 830] ****************************************
      changed: [junos1.lab.example.com]
      
      TASK [Checking connectivity] *****************************************************
      ok: [junos1.lab.example.com]
      
      PLAY RECAP ***********************************************************************
      junos1.lab.example.com     : ok=2    changed=1    unreachable=0    failed=0    ...
    5. Add the NETCONF configuration task to the snmp.yml playbook to avoid future errors related to the missing NETCONF configuration for new Juniper Junos managed nodes added to your environment.

      Edit the snmp.yml file and add the Enable netconf service on port 830 task before the Verify connectivity between the managed nodes task, and then save the playbook:

      ---
      - name: Configure SNMP on Juniper managed nodes
        hosts: junos
        gather_facts: false
        tasks:
          - name: Enable netconf service on port 830
            vars:
              ansible_connection: ansible.netcommon.network_cli
            junipernetworks.junos.junos_netconf:
              netconf_port: 830
              state: present
      
          - name: Verify connectivity between the managed nodes
      ...output omitted...
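
    The connectivity check from netconf.yml can also report a more specific failure reason. The following is a minimal sketch, not part of the lab files: it reuses the same ansible.builtin.wait_for task and adds the module's msg parameter, which replaces the default timeout text. The play name and the message wording are illustrative assumptions.

    ```yaml
    ---
    - name: Networking - netconf check with a custom message (illustrative)
      hosts: junos
      gather_facts: false
      tasks:
        - name: Checking connectivity
          ansible.builtin.wait_for:
            host: "{{ inventory_hostname }}"
            port: 830
            timeout: 5
            # Replaces the default "Timeout when waiting for ..." failure text.
            msg: "Port 830 did not open on {{ inventory_hostname }}; verify that the NETCONF service is enabled"
    ```

    A message like this might help when many managed nodes fail at once, because each failure then states the suspected cause rather than only the timeout.
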
  4. Run the snmp.yml playbook again against the junos1.lab.example.com managed node and troubleshoot the new error message obtained.

    1. Run the snmp.yml playbook against the junos1.lab.example.com managed node. Although the playbook still stops at the same task, the error regarding NETCONF is no longer displayed.

      Note

      Depending on the connection to the workstation machine, you might see either the SSHException('No existing session') or the No authentication methods available error message. Both messages have the same root cause.

      [student@workstation run-communication]$ ansible-navigator run snmp.yml \
      --limit junos1.lab.example.com
      
      PLAY [Configure SNMP on Juniper managed nodes] ***********************************
      
      TASK [Enable netconf service on port 830] ****************************************
      ok: [junos1.lab.example.com]
      
      TASK [Verify connectivity between the managed nodes] *****************************
      ok: [junos1.lab.example.com]
      
      TASK [Configure SNMP on managed nodes] *******************************************
      fatal: [junos1.lab.example.com]: FAILED! => {"changed": false, "msg": "SSHException: No existing session"}
      
      PLAY RECAP ***********************************************************************
      junos1.lab.example.com     : ok=2    changed=0    unreachable=0    failed=1    ...
      
      Please review the log for errors.
    2. Because you were told that the student user uses Student as the password to connect to the managed nodes, one of the following reasons is causing the failure:

      • The Student password is incorrect.

      • The Student password is not defined in your project.

    3. Use the ssh command to log in to the junos1.lab.example.com managed node. Add the -o PreferredAuthentications=password option to authenticate using a password. When prompted, enter Student as the SSH connection password.

      Because the connection is successful, the provided password, Student, is correct. Exit from the managed node:

      [student@workstation run-communication]$ ssh junos1.lab.example.com \
      -o PreferredAuthentications=password
      student@junos1.lab.example.com's password: Student
      Last login: Wed Jun 21 19:02:00 2023 from 172.25.250.9
      --- JUNOS 23.1R1.8 Kernel 64-bit  JNPR-12.1-20230307.3e7c4b6_buil
      student> exit
      
      Connection to junos1.lab.example.com closed.
      [student@workstation run-communication]$
    4. Open the inventory file in VS Code. Inspect the variables for the junos inventory group used to authenticate when connecting to the managed nodes. Notice that the only defined authentication variable is ansible_user. The playbook fails because it does not have the password to use when it tries to connect to the managed node:

      [junos]
      junos1.lab.example.com
      junos2.lab.example.com
      
      [junos:vars]
      ansible_user=student
      ansible_connection=ansible.netcommon.netconf
      ansible_network_os=junipernetworks.junos.junos

      It is not good practice to store passwords in plain text, so leave the inventory file as it is. You can use the -k option for the ansible-navigator command to type the authentication password when you run the playbook.

    5. Run the snmp.yml playbook against the junos1.lab.example.com managed node. Add the -k option to the ansible-navigator run command and enter Student as the SSH connection password when prompted.

      The playbook runs successfully.

      [student@workstation run-communication]$ ansible-navigator run snmp.yml \
      --limit junos1.lab.example.com -k
      SSH password: Student
      
      PLAY [Configure SNMP on Juniper managed nodes] ***********************************
      
      TASK [Enable netconf service on port 830] ****************************************
      ok: [junos1.lab.example.com]
      
      TASK [Verify connectivity between the managed nodes] *****************************
      ok: [junos1.lab.example.com]
      
      TASK [Configure SNMP on managed nodes] *******************************************
      changed: [junos1.lab.example.com]
      
      PLAY RECAP ***********************************************************************
      junos1.lab.example.com     : ok=3    changed=1    unreachable=0    failed=0    ...
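
    As a possible alternative to typing -k on every run, a play can ask for the password itself. The following is a minimal sketch under stated assumptions: the play name is hypothetical, the task is borrowed from the dns.yml playbook only to give the play something to run, and prompting is assumed to work in your ansible-navigator configuration in the same way that the -k prompt does in this exercise. The password is still not stored in the project.

    ```yaml
    ---
    - name: Prompt for the connection password (illustrative)
      hosts: junos
      gather_facts: false
      vars_prompt:
        # The typed value is stored in ansible_password for this play only.
        - name: ansible_password
          prompt: SSH password
          private: true
      tasks:
        - name: Checking DNS configuration
          junipernetworks.junos.junos_command:
            commands:
              - show configuration system name-server
    ```

    Setting private to true hides the typed characters, which matches the behavior of the -k prompt.
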
  5. Run the snmp.yml playbook against the junos2.lab.example.com managed node to isolate the communication issue with this managed node. Investigate the connectivity error.

    1. Run the snmp.yml playbook against the junos2.lab.example.com managed node. Add the -k option to the ansible-navigator command. When prompted, enter Student as the password. The playbook fails in the Verify connectivity between the managed nodes task:

      [student@workstation run-communication]$ ansible-navigator run snmp.yml \
      --limit junos2.lab.example.com -k
      SSH password: Student
      
      PLAY [Configure SNMP on Juniper managed nodes] ***********************************
      
      TASK [Enable netconf service on port 830] ****************************************
      ok: [junos2.lab.example.com]
      
      TASK [Verify connectivity between the managed nodes] *****************************
      fatal: [junos2.lab.example.com]: FAILED! => {"changed": false, "module_stderr": "expected string or bytes-like object", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error"}
      
      PLAY RECAP ***********************************************************************
      junos2.lab.example.com     : ok=1    changed=0    unreachable=0    failed=1    ...
    2. In VS Code, open the ping.yml file. Notice that the playbook uses the junipernetworks.junos.junos_ping module to ping the managed nodes in the junos inventory group:

      ---
      - name: Networking - ping
        hosts: junos
        gather_facts: false
        tasks:
          - name: Ping managed nodes
            vars:
              ansible_connection: ansible.netcommon.network_cli
            junipernetworks.junos.junos_ping:
              dest: "{{ inventory_hostname }}"
              count: 5
    3. Run the ping.yml playbook against the junos2.lab.example.com managed node. The error message is the same as the one you received when you ran the snmp.yml playbook, because the snmp.yml playbook uses the same junipernetworks.junos.junos_ping module to verify connectivity between the managed nodes:

      [student@workstation run-communication]$ ansible-navigator run ping.yml \
      --limit junos2.lab.example.com
      
      PLAY [Networking - ping] *********************************************************
      
      TASK [Ping managed nodes] ********************************************************
      fatal: [junos2.lab.example.com]: FAILED! => {"changed": false, "module_stderr": "expected string or bytes-like object", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error"}
      
      PLAY RECAP ***********************************************************************
      junos2.lab.example.com     : ok=0    changed=0    unreachable=0    failed=1    ...
    4. Connect to the junos2.lab.example.com managed node and ping the managed nodes:

      [student@workstation run-communication]$ ssh junos2.lab.example.com
      ...output omitted...
      --- JUNOS 23.1R1.8 Kernel 64-bit  JNPR-12.1-20230307.3e7c4b6_buil
      student@junos2.lab.example.com> ping junos1.lab.example.com
      ping: cannot resolve junos1.lab.example.com: Host name lookup failure
      
      student@junos2.lab.example.com> ping junos2.lab.example.com
      ping: cannot resolve junos2.lab.example.com: Host name lookup failure
      
      student@junos2.lab.example.com>
    5. The messages suggest that the problem is related to the DNS configuration on the junos2.lab.example.com managed node. Exit from the junos2.lab.example.com managed node:

      student@junos2.lab.example.com> exit
      
      Connection to junos2.lab.example.com closed.
      [student@workstation run-communication]$
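
    One way to separate a name resolution problem from a reachability problem is to ping an IP address instead of a hostname. The following is a minimal sketch based on the ping.yml playbook: it replaces the hostname with the junos1.lab.example.com address from Table 3.5. The play and task names are illustrative, and the sketch assumes that ICMP traffic is allowed between the managed nodes, which the exercise does not state.

    ```yaml
    ---
    - name: Networking - ping by IP address (illustrative)
      hosts: junos
      gather_facts: false
      tasks:
        - name: Ping junos1 by IP address, without using DNS
          vars:
            ansible_connection: ansible.netcommon.network_cli
          junipernetworks.junos.junos_ping:
            # IP address of junos1.lab.example.com, taken from Table 3.5.
            dest: 172.25.250.22
            count: 5
    ```

    If this task succeeds where the hostname-based ping fails, the network path is probably fine and the DNS configuration is the more likely cause.
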
  6. Solve the connectivity issue for the junos2.lab.example.com managed node.

    1. Open the dns.yml playbook in VS Code. The first task in the playbook enables the NETCONF service in the managed node. This task is required when using the junipernetworks.junos.junos_system module to configure DNS on Juniper Junos managed nodes. The second task performs the DNS configuration. Both the first and second tasks are commented out in the playbook.

      The last two tasks in the playbook verify and display the configuration in each managed node:

      ---
      - name: Networking - dns
        hosts: junos
        gather_facts: false
        tasks:
      #    - name: Enable netconf service on port 830
      #      vars:
      #        ansible_connection: ansible.netcommon.network_cli
      #      junipernetworks.junos.junos_netconf:
      #        netconf_port: 830
      #        state: present
      
      #    - name: Configure DNS settings
      #      junipernetworks.junos.junos_system:
      #        name_servers: 172.25.250.220
      
          - name: Checking DNS configuration
            junipernetworks.junos.junos_command:
              commands:
                - show configuration system name-server
            register: junos_dns
      
          - name: Display DNS configuration
            ansible.builtin.debug:
              var: junos_dns['stdout']
    2. Run the dns.yml playbook to verify and compare any differences in the DNS configuration for each managed node. Add the -k option to the ansible-navigator command. When prompted, enter Student as the password.

      The output shows that no name server is configured on the junos2.lab.example.com managed node:

      [student@workstation run-communication]$ ansible-navigator run dns.yml -k
      SSH password: Student
      ...output omitted...
      TASK [Display DNS configuration] *************************************************
      ok: [junos1.lab.example.com] => {
          "junos_dns['stdout']": [
              "172.25.250.220;"
          ]
      }
      ok: [junos2.lab.example.com] => {
          "junos_dns['stdout']": [
              ""
          ]
      }
      ...output omitted...
    3. Uncomment the first two tasks for configuring DNS in the dns.yml playbook, and save the playbook. Run the dns.yml playbook against the junos2.lab.example.com managed node. Add the -k option to the ansible-navigator command and enter Student as the password when prompted:

      [student@workstation run-communication]$ ansible-navigator run dns.yml \
      --limit junos2.lab.example.com -k
      SSH password: Student
      
      PLAY [Networking - dns] **********************************************************
      
      TASK [Enable netconf service on port 830] ****************************************
      ok: [junos2.lab.example.com]
      
      TASK [Configure DNS settings] ****************************************************
      changed: [junos2.lab.example.com]
      
      TASK [Checking DNS configuration] ************************************************
      ok: [junos2.lab.example.com]
      
      TASK [Display DNS configuration] *************************************************
      ok: [junos2.lab.example.com] => {
          "junos_dns['stdout']": [
              "172.25.250.220;"
          ]
      }
      
      PLAY RECAP ***********************************************************************
      junos2.lab.example.com     : ok=4    changed=1    unreachable=0    failed=0    ...

      The Configure DNS settings task runs successfully on the junos2.lab.example.com managed node. This time the managed node has the correct DNS configuration.

    4. Verify that the connectivity issue for the junos2.lab.example.com managed node has been resolved by running the ping.yml playbook again. The playbook finishes successfully.

      [student@workstation run-communication]$ ansible-navigator run ping.yml \
      --limit junos2.lab.example.com
      
      PLAY [Networking - ping] *********************************************************
      
      TASK [Ping managed nodes] ********************************************************
      ok: [junos2.lab.example.com]
      
      PLAY RECAP ***********************************************************************
      junos2.lab.example.com     : ok=1    changed=0    unreachable=0    failed=0    ...
    5. Add the Configure DNS settings task to the snmp.yml playbook to avoid issues related to the missing DNS configuration on any new Juniper Junos managed node that might be added to the junos inventory group.

      Because the DNS configuration requires enabling the netconf service, add the task after the Enable netconf service on port 830 task and before the Verify connectivity between the managed nodes task. When done, save the playbook:

      ...output omitted...
        tasks:
          - name: Enable netconf service on port 830
            vars:
              ansible_connection: ansible.netcommon.network_cli
            junipernetworks.junos.junos_netconf:
              netconf_port: 830
              state: present
      
          - name: Configure DNS settings
            junipernetworks.junos.junos_system:
              name_servers: 172.25.250.220
      
          - name: Verify connectivity between the managed nodes
      ...output omitted...
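
    The DNS verification in dns.yml only displays the configuration, so a missing name server does not stop the play. The following is a minimal sketch of a stricter check, assuming the command output has the same shape as shown earlier in this exercise (a single string that is empty when no name server is configured). The play name, the assert task, and its messages are illustrative additions, not part of the lab files.

    ```yaml
    ---
    - name: Networking - assert DNS configuration (illustrative)
      hosts: junos
      gather_facts: false
      tasks:
        - name: Checking DNS configuration
          junipernetworks.junos.junos_command:
            commands:
              - show configuration system name-server
          register: junos_dns

        - name: Fail when the expected name server is missing
          ansible.builtin.assert:
            that:
              # stdout holds one string per command that was run.
              - "'172.25.250.220' in junos_dns['stdout'][0]"
            fail_msg: "Name server 172.25.250.220 is not configured on {{ inventory_hostname }}"
            success_msg: "Name server 172.25.250.220 is configured"
    ```

    With an assertion like this, a managed node without the name server fails early with a clear message instead of failing later in the ping task.
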
  7. Verify that the playbook runs successfully for all the Juniper Junos managed nodes. Close the /home/student/run-communication directory in VS Code. If you are using the GNOME terminal, return to the /home/student directory.

    1. Run the snmp.yml playbook without isolating any managed node. Add the -k option to the ansible-navigator command. When prompted, enter Student as the password. The playbook runs successfully.

      [student@workstation run-communication]$ ansible-navigator run snmp.yml -k
      SSH password: Student
      
      PLAY [Configure SNMP on Juniper managed nodes] ***********************************
      
      TASK [Enable netconf service on port 830] ****************************************
      ok: [junos1.lab.example.com]
      ok: [junos2.lab.example.com]
      
      TASK [Configure DNS settings] ****************************************************
      ok: [junos1.lab.example.com]
      ok: [junos2.lab.example.com]
      
      TASK [Verify connectivity between the managed nodes] *****************************
      ok: [junos1.lab.example.com]
      ok: [junos2.lab.example.com]
      
      TASK [Configure SNMP on managed nodes] *******************************************
      ok: [junos1.lab.example.com]
      changed: [junos2.lab.example.com]
      
      PLAY RECAP ***********************************************************************
      junos1.lab.example.com    : ok=4    changed=0    unreachable=0    failed=0    ...
      junos2.lab.example.com    : ok=4    changed=1    unreachable=0    failed=0    ...
    2. Click File → Close Folder in VS Code to close the /home/student/run-communication directory.

    3. If you are using the GNOME terminal, run the cd command to return to the student home directory:

      [student@workstation run-communication]$ cd
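
    For reference, the edits made to the snmp.yml playbook during this exercise can be put together as follows. This is a sketch assembled from the fragments shown in the previous steps, with comments added. Compare it with your own file rather than treating it as the official solution.

    ```yaml
    ---
    - name: Configure SNMP on Juniper managed nodes
      hosts: junos
      gather_facts: false
      tasks:
        # Added in step 3: NETCONF must be enabled before the netconf connection works.
        - name: Enable netconf service on port 830
          vars:
            ansible_connection: ansible.netcommon.network_cli
          junipernetworks.junos.junos_netconf:
            netconf_port: 830
            state: present

        # Added in step 6: the ping task needs working name resolution.
        - name: Configure DNS settings
          junipernetworks.junos.junos_system:
            name_servers: 172.25.250.220

        - name: Verify connectivity between the managed nodes
          vars:
            ansible_connection: ansible.netcommon.network_cli
          junipernetworks.junos.junos_ping:
            dest: "{{ inventory_hostname }}"
            count: 5

        - name: Configure SNMP on managed nodes
          junipernetworks.junos.junos_snmp_server:
            state: merged
            config:
              location: 'Raleigh, NC'
              contact: 'Network Engineering | neteng@company.com'
              description: "SNMP Server"
              communities:
                - name: rocommunity2n4g!
                  authorization: read-only
                - name: rwcommunityd7g$v
                  authorization: read-write
    ```

    The task order matters: the DNS task uses the NETCONF connection, and the ping task depends on the DNS configuration.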

Finish

On the workstation machine, use the lab command to complete this exercise. This step is important to ensure that resources from previous exercises do not impact upcoming exercises.

[student@workstation ~]$ lab finish run-communication

Revision: do457-2.3-7cfa22a