Guided Exercise: Troubleshooting Ansible Managed Hosts

In this exercise, you troubleshoot task failures that are occurring on one of your managed hosts when running a playbook.

Outcomes

  • You should be able to troubleshoot managed hosts.

As the student user on the workstation machine, use the lab command to prepare your system for this exercise.

This command prepares your environment and ensures that all required resources are available.

[student@workstation ~]$ lab start troubleshoot-host

Procedure 8.2. Instructions

  1. Change into the /home/student/troubleshoot-host/ directory.

    [student@workstation ~]$ cd ~/troubleshoot-host/
    [student@workstation troubleshoot-host]$
  2. Run the mailrelay.yml playbook using check mode.

    [student@workstation troubleshoot-host]$ ansible-navigator run \
    > -m stdout mailrelay.yml --check
    PLAY [Create mail relay servers] ***********************************************
    ...output omitted...
    
    TASK [Check main.cf file] ******************************************************
    ok: [servera.lab.example.com]
    
    TASK [Verify main.cf file exists] **********************************************
    ok: [servera.lab.example.com]  => {
        "msg": "The main.cf file exists"
    }
    ...output omitted...
    
    TASK [Start and enable mail services] ******************************************
    fatal: [servera.lab.example.com]: FAILED! => {"changed": false, "msg": "Could not find the requested service postfix: host"}
    ...output omitted...
    PLAY RECAP *********************************************************************
    servera.lab.example.com    : ok=5    changed=2    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

    The Verify main.cf file exists task relies on the ansible.builtin.stat module to confirm that main.cf exists on the servera.lab.example.com host.
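
    The playbook source is not shown in this exercise. As a rough sketch only, a stat-based check of this kind is often written as a pair of tasks like the following. The /etc/postfix/main.cf path, the maincf variable name, and the use of ansible.builtin.debug for the message are assumptions for illustration, not the actual contents of mailrelay.yml.

    ```yaml
    # Illustrative sketch; path, variable name, and debug task are assumed.
    - name: Check main.cf file
      ansible.builtin.stat:
        path: /etc/postfix/main.cf    # assumed location of main.cf
      register: maincf                # hypothetical variable name

    - name: Verify main.cf file exists
      ansible.builtin.debug:
        msg: "The main.cf file exists"
      # Print the message only when stat reports that the file is present
      when: maincf.stat.exists
    ```

    The ansible.builtin.stat module only reads file metadata, so it also runs in check mode. That is consistent with both tasks reporting ok in the preceding output.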

    The Start and enable mail services task failed. It could not start the postfix service because you ran the playbook using check mode and therefore the play did not install the postfix package.

    Important

    The task failed only because check mode does not change the managed host. The earlier task that installs the postfix package reported what it would do but did not install anything, so the postfix service did not exist when this task tried to start it. A normal run, which makes those changes, does not hit this error.
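
    If you want a check-mode run to finish cleanly, one option is to skip tasks that depend on changes that check mode only simulates. The following sketch assumes that the task uses ansible.builtin.service and takes the postfix service name from the error message; the actual task body is not shown in this exercise. The ansible_check_mode variable is built in and is true when the run uses --check.

    ```yaml
    # Illustrative sketch; the module arguments are assumed, not taken
    # from mailrelay.yml.
    - name: Start and enable mail services
      ansible.builtin.service:
        name: postfix          # service name from the error message
        state: started
        enabled: true
      # Skip this task in check mode, where the package install is only simulated
      when: not ansible_check_mode
    ```

    The trade-off is that a check-mode run no longer reports anything for this task, so it cannot warn you about real problems with the service.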

  3. Run the playbook again, but without specifying check mode. The error in the Start and enable mail services task should disappear and the playbook should run successfully.

    [student@workstation troubleshoot-host]$ ansible-navigator run \
    > -m stdout mailrelay.yml
    
    PLAY [Create mail relay servers] ***********************************************
    ...output omitted...
    
    TASK [Check main.cf file] ******************************************************
    ok: [servera.lab.example.com]
    
    TASK [Verify main.cf file exists] **********************************************
    ok: [servera.lab.example.com] => {
        "msg": "The main.cf file exists"
    }
    
    TASK [Start and enable mail services] ******************************************
    changed: [servera.lab.example.com]
    ...output omitted...
    
    PLAY RECAP *********************************************************************
    servera.lab.example.com    : ok=8    changed=5    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
  4. Edit the mailrelay.yml playbook and add a task to enable the smtp service through the firewall. Add the task as the last task, before the handlers.

    ...output omitted...
        - name: Postfix firewalld config
          ansible.posix.firewalld:
            state: enabled
            permanent: true
            immediate: true
            service: smtp
    ...output omitted...
  5. Run the mailrelay.yml playbook. The Postfix firewalld config task runs with no errors.

    [student@workstation troubleshoot-host]$ ansible-navigator run \
    > -m stdout mailrelay.yml
    PLAY [Create mail relay servers] ***********************************************
    ...output omitted...
    TASK [Postfix firewalld config] ************************************************
    changed: [servera.lab.example.com]
    
    PLAY RECAP *********************************************************************
    servera.lab.example.com    : ok=8    changed=2    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
  6. Use telnet to test whether the SMTP service is listening on TCP port 25 of the servera.lab.example.com host. Disconnect when you are finished.

    [student@workstation troubleshoot-host]$ telnet servera.lab.example.com 25
    Trying 172.25.250.10...
    Connected to servera.lab.example.com.
    Escape character is '^]'.
    220 servera.lab.example.com ESMTP Postfix
    quit
    221 2.0.0 Bye
    Connection closed by foreign host.
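
    The same check can be automated instead of typed by hand. The following play is a minimal sketch, not part of the exercise files, that uses the ansible.builtin.wait_for module to confirm that port 25 accepts connections and that the greeting contains ESMTP. The play name and the 10-second timeout are arbitrary choices. It assumes that the node that runs the play (with ansible-navigator, the execution environment) can reach servera over the network.

    ```yaml
    # Illustrative play; not one of the files provided by the lab.
    - name: Verify that SMTP answers on servera
      hosts: localhost
      gather_facts: false
      tasks:
        - name: Wait for TCP port 25 and an ESMTP greeting
          ansible.builtin.wait_for:
            host: servera.lab.example.com
            port: 25
            timeout: 10            # seconds before the task fails
            search_regex: ESMTP    # text expected in the SMTP banner
    ```

    The task fails if the port stays closed or the banner does not match within the timeout, which gives a repeatable alternative to the interactive telnet session.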
  7. Run the samba.yml playbook. The first task fails with an error related to an SSH connection problem.

    [student@workstation troubleshoot-host]$ ansible-navigator run \
    > -m stdout samba.yml
    
    PLAY [Install a samba server] **************************************************
    
    TASK [Gathering Facts] *********************************************************
    fatal: [servera.lab.exammple.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host servera.lab.exammple.com port 22: Connection timed out", "unreachable": true}
    
    PLAY RECAP *********************************************************************
    servera.lab.exammple.com   : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
    Please review the log for errors.
  8. Make sure that you can connect to the servera.lab.example.com managed host as the devops user using SSH, and that the correct SSH keys are in place. Log off again when you have finished.

    [student@workstation troubleshoot-host]$ ssh devops@servera.lab.example.com
    ...output omitted...
    [devops@servera ~]$ exit
    logout
    Connection to servera.lab.example.com closed.

    The SSH connection works normally.

  9. Test to see if you can run modules on the servera.lab.example.com managed host by using an ad hoc command that runs the ansible.builtin.ping module.

    [student@workstation troubleshoot-host]$ ansible servera.lab.example.com \
    > -m ansible.builtin.ping
    servera.lab.example.com | SUCCESS => {
        "ansible_facts": {
            "discovered_interpreter_python": "/usr/bin/python3"
        },
        "changed": false,
        "ping": "pong"
    }

    The preceding output shows that the module ran successfully, so Ansible can connect to the managed host.

    This suggests that the problem is not with the SSH configuration or credentials. The question now is why the ad hoc command worked and the ansible-navigator command did not. There might be a problem with the play in the playbook, or with the inventory.

  10. Rerun the samba.yml playbook with -vvvv to get more information about the run. The play still fails because Ansible cannot reach the host name that it is trying to connect to.

    [student@workstation troubleshoot-host]$ ansible-navigator run \
    > -m stdout -vvvv samba.yml
    ansible-playbook [core 2.13.0]
      config file = /home/student/troubleshoot-host/ansible.cfg
      configured module search path = ['/home/runner/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
      ansible python module location = /usr/lib/python3.9/site-packages/ansible
      ansible collection location = /home/runner/.ansible/collections:/usr/share/ansible/collections
      executable location = /usr/bin/ansible-playbook
      python version = 3.9.7 (default, Sep 13 2021, 08:18:39) [GCC 8.5.0 20210514 (Red Hat 8.5.0-3)]
      jinja version = 3.0.3
      libyaml = True
    Using /home/student/troubleshoot-host/ansible.cfg as config file
    ...output omitted...
    
    PLAYBOOK: samba.yml ************************************************************
    Positional arguments: /home/student/troubleshoot-host/samba.yml
    verbosity: 4
    connection: smart
    timeout: 10
    become_method: sudo
    tags: ('all',)
    inventory: ('/home/student/troubleshoot-host/inventory',)
    forks: 5
    1 plays in /home/student/troubleshoot-host/samba.yml
    
    PLAY [Install a samba server] **************************************************
    
    TASK [Gathering Facts] *********************************************************
    task path: /home/student/troubleshoot-host/samba.yml:2
    <servera.lab.exammple.com> ESTABLISH SSH CONNECTION FOR USER: devops
    ...output omitted...
    fatal: [servera.lab.exammple.com]: UNREACHABLE! => {
        "changed": false,
        "msg": "Failed to connect to the host via ssh: OpenSSH_8.0p1, OpenSSL 1.1.1k  FIPS 25 Mar 2021\r\ndebug1: Reading configuration data /home/runner/.ssh/config\r\ndebug1: /home/runner/.ssh/config line 1: Applying options for *\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug3: /etc/ssh/ssh_config line 52: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0\r\ndebug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf\r\ndebug2: checking match for 'final all' host servera.lab.exammple.com originally servera.lab.exammple.com\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 3: not matched 'final'\r\ndebug2: match not found\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1 (parse only)\r\ndebug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config\r\ndebug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-,gss-gex-sha1-,gss-group14-sha1-]\r\ndebug3: kex names ok: [curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1]\r\ndebug1: configuration requests final Match pass\r\ndebug1: re-parsing configuration\r\ndebug1: Reading configuration data /home/runner/.ssh/config\r\ndebug1: /home/runner/.ssh/config line 1: Applying options for *\r\ndebug2: add_identity_file: ignoring duplicate key ~/.ssh/lab_rsa\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug3: /etc/ssh/ssh_config line 52: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0\r\ndebug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf\r\ndebug2: checking match for 'final all' host servera.lab.exammple.com originally 
servera.lab.exammple.com\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 3: matched 'final'\r\ndebug2: match found\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1\r\ndebug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config\r\ndebug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-,gss-gex-sha1-,gss-group14-sha1-]\r\ndebug3: kex names ok: [curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1]\r\ndebug1: auto-mux: Trying existing master\r\ndebug1: Control socket \"/home/runner/.ansible/cp/d4775f48c9\" does not exist\r\ndebug2: resolving \"servera.lab.exammple.com\" port 22\r\ndebug2: ssh_connect_direct\r\ndebug1: Connecting to servera.lab.exammple.com [3.130.253.23] port 22.\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug1: connect to address 3.130.253.23 port 22: Connection timed out\r\ndebug1: Connecting to servera.lab.exammple.com [3.130.204.160] port 22.\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug1: connect to address 3.130.204.160 port 22: Connection timed out\r\nssh: connect to host servera.lab.exammple.com port 22: Connection timed out",
        "unreachable": true
    }
    
    PLAY RECAP *********************************************************************
    servera.lab.exammple.com   : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
    Please review the log for errors.
  11. Investigate the inventory file for errors.

    In the [samba_servers] group, the host name is misspelled as servera.lab.exammple.com (with an extra m). Correct it so that the group reads as follows:

    [samba_servers]
    servera.lab.example.com
    ...output omitted...
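
    To see which names a play receives from the inventory without opening an SSH connection, a small play such as the following sketch can help. It disables fact gathering and uses only ansible.builtin.debug, which runs on the control node. The play and task names are illustrative; the samba_servers group name comes from the inventory above.

    ```yaml
    # Illustrative play; not one of the files provided by the lab.
    - name: Show the inventory names in samba_servers
      hosts: samba_servers
      gather_facts: false          # avoid connecting to the hosts
      tasks:
        - name: Print the name that Ansible would connect to
          ansible.builtin.debug:
            msg: "Inventory name: {{ inventory_hostname }}"
    ```

    Because no connection is attempted, the play finishes even when a host name is wrong, so a misspelled entry shows up directly in the output.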
  12. Run the samba.yml playbook again. All tasks should now succeed.

    [student@workstation troubleshoot-host]$ ansible-navigator run \
    > -m stdout samba.yml
    
    PLAY [Install a samba server] **************************************************
    
    TASK [Gathering Facts] *********************************************************
    ok: [servera.lab.example.com]
    
    TASK [Install samba] ***********************************************************
    changed: [servera.lab.example.com]
    
    TASK [Install firewalld] *******************************************************
    ok: [servera.lab.example.com]
    
    TASK [Debug install_state variable] ********************************************
    ok: [servera.lab.example.com] => {
        "msg": "The state for the samba service is installed"
    }
    
    TASK [Start firewalld] *********************************************************
    ok: [servera.lab.example.com]
    
    TASK [Configure firewall for samba] ********************************************
    changed: [servera.lab.example.com]
    
    TASK [Deliver samba config] ****************************************************
    changed: [servera.lab.example.com]
    
    TASK [Start samba] *************************************************************
    changed: [servera.lab.example.com]
    
    PLAY RECAP *********************************************************************
    servera.lab.example.com    : ok=8    changed=4    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Finish

On the workstation machine, change to the student user home directory and use the lab command to complete this exercise. This step is important to ensure that resources from previous exercises do not impact upcoming exercises.

[student@workstation ~]$ lab finish troubleshoot-host

This concludes the section.

Revision: rh294-9.0-c95c7de