In this exercise, you troubleshoot task failures that are occurring on one of your managed hosts when running a playbook.
Outcomes
You should be able to troubleshoot managed hosts.
As the student user on the workstation machine, use the lab command to prepare your system for this exercise.
This command prepares your environment and ensures that all required resources are available.
[student@workstation ~]$ lab start troubleshoot-host
Procedure 8.2. Instructions
Change into the /home/student/troubleshoot-host/ directory.
[student@workstation ~]$ cd ~/troubleshoot-host/
Run the mailrelay.yml playbook using check mode.
[student@workstation troubleshoot-host]$ ansible-navigator run \
> -m stdout mailrelay.yml --check

PLAY [Create mail relay servers] ***********************************************
...output omitted...
TASK [Check main.cf file] ******************************************************
ok: [servera.lab.example.com]

TASK [Verify main.cf file exists] **********************************************
ok: [servera.lab.example.com] => {
    "msg": "The main.cf file exists"
}
...output omitted...
TASK [Start and enable mail services] ******************************************
fatal: [servera.lab.example.com]: FAILED! => {"changed": false, "msg": "Could not find the requested service postfix: host"}
...output omitted...
PLAY RECAP *********************************************************************
servera.lab.example.com : ok=5 changed=2 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
The Verify main.cf file exists task uses the ansible.builtin.stat module.
It confirms that the main.cf file exists on the servera.lab.example.com host.
The Start and enable mail services task failed.
In check mode, the earlier tasks in the play only report the changes that they would make, so the play did not actually install the postfix package on the servera host.
As a result, the task could not find the postfix service to start.
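If a play needs to pass cleanly in check mode, one option is to skip tasks that depend on changes made earlier in the same play. The following is a minimal sketch, not the exercise solution: the body of the real task in mailrelay.yml is not shown in this exercise, so the module and its arguments here are assumptions. The ansible_check_mode variable is set by Ansible and is true when the playbook runs with --check.

```yaml
# Hypothetical version of the service task; the real task in mailrelay.yml
# is not shown in this exercise, so the module arguments are assumptions.
- name: Start and enable mail services
  ansible.builtin.service:
    name: postfix
    state: started
    enabled: true
  # ansible_check_mode is true when the playbook runs with --check.
  # Skipping the task avoids a failure caused by the package not having
  # been installed during the dry run.
  when: not ansible_check_mode
```

With a condition like this, a check-mode run reports the task as skipped instead of failed, and a normal run is unaffected.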
Run the playbook again, but without specifying check mode.
The error in the Start and enable mail services task should disappear and the playbook should run successfully.
[student@workstation troubleshoot-host]$ ansible-navigator run \
> -m stdout mailrelay.yml

PLAY [Create mail relay servers] ***********************************************
...output omitted...
TASK [Check main.cf file] ******************************************************
ok: [servera.lab.example.com]

TASK [Verify main.cf file exists] **********************************************
ok: [servera.lab.example.com] => {
    "msg": "The main.cf file exists"
}

TASK [Start and enable mail services] ******************************************
changed: [servera.lab.example.com]
...output omitted...
PLAY RECAP *********************************************************************
servera.lab.example.com : ok=8 changed=5 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
Edit the mailrelay.yml playbook and add a task to enable the smtp service through the firewall.
Add the task as the last task, before the handlers.
...output omitted...
    - name: Postfix firewalld config
      ansible.posix.firewalld:
        state: enabled
        permanent: true
        immediate: true
        service: smtp
...output omitted...
Run the mailrelay.yml playbook.
The Postfix firewalld config task runs with no errors.
[student@workstation troubleshoot-host]$ ansible-navigator run \
> -m stdout mailrelay.yml

PLAY [Create mail relay servers] ***********************************************
...output omitted...
TASK [Postfix firewalld config] ************************************************
changed: [servera.lab.example.com]

PLAY RECAP *********************************************************************
servera.lab.example.com : ok=8 changed=2 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
Use telnet to test whether the SMTP service is listening on TCP port 25 on the servera.lab.example.com host.
Disconnect when you are finished.
[student@workstation troubleshoot-host]$ telnet servera.lab.example.com 25
Trying 172.25.250.10...
Connected to servera.lab.example.com.
Escape character is '^]'.
220 servera.lab.example.com ESMTP Postfix
quit
221 2.0.0 Bye
Connection closed by foreign host.
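The same connectivity test can be automated instead of typed by hand. The following play is an illustrative sketch and is not part of the exercise files. It assumes that the control node (or the execution environment that runs the play) can reach servera.lab.example.com on the network. It uses the ansible.builtin.wait_for module to open a connection to port 25 and look for the ESMTP greeting shown in the telnet session.

```yaml
# Illustrative play, not part of the exercise files.
- name: Verify that the mail relay answers on TCP port 25
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Wait for an SMTP greeting from servera
      ansible.builtin.wait_for:
        host: servera.lab.example.com
        port: 25
        # Fail if no matching greeting arrives within 10 seconds.
        timeout: 10
        # Match part of the banner that Postfix sends on connect.
        search_regex: ESMTP
```

The task fails if the port is closed, blocked by the firewall, or does not return the expected greeting before the timeout expires.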
Run the samba.yml playbook.
The first task fails with an error related to an SSH connection problem.
[student@workstation troubleshoot-host]$ ansible-navigator run \
> -m stdout samba.yml

PLAY [Install a samba server] **************************************************

TASK [Gathering Facts] *********************************************************
fatal: [servera.lab.exammple.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host servera.lab.exammple.com port 22: Connection timed out", "unreachable": true}

PLAY RECAP *********************************************************************
servera.lab.exammple.com : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
Please review the log for errors.
Make sure that you can connect to the servera.lab.example.com managed host as the devops user using SSH, and that the correct SSH keys are in place.
Log off again when you have finished.
[student@workstation troubleshoot-host]$ ssh devops@servera.lab.example.com
...output omitted...
[devops@servera ~]$ exit
logout
Connection to servera.lab.example.com closed.
The SSH connection works normally.
Test to see if you can run modules on the servera.lab.example.com managed host by using an ad hoc command that runs the ansible.builtin.ping module.
[student@workstation troubleshoot-host]$ ansible servera.lab.example.com \
> -m ansible.builtin.ping
servera.lab.example.com | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
The preceding output shows that the ad hoc command connected to the managed host and ran the module successfully.
This suggests that the problem is not with the SSH configuration or credentials.
The question now is why the ad hoc command worked and the ansible-navigator run command did not.
There might be a problem with the play in the playbook, or with the inventory.
Rerun the samba.yml playbook with -vvvv to get more information about the run.
The run fails again because the managed host that the play targets is not reachable.
[student@workstation troubleshoot-host]$ ansible-navigator run \
> -m stdout -vvvv samba.yml
ansible-playbook [core 2.13.0]
  config file = /home/student/troubleshoot-host/ansible.cfg
  configured module search path = ['/home/runner/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.9/site-packages/ansible
  ansible collection location = /home/runner/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible-playbook
  python version = 3.9.7 (default, Sep 13 2021, 08:18:39) [GCC 8.5.0 20210514 (Red Hat 8.5.0-3)]
  jinja version = 3.0.3
  libyaml = True
Using /home/student/troubleshoot-host/ansible.cfg as config file
...output omitted...
PLAYBOOK: samba.yml ************************************************************
Positional arguments: /home/student/troubleshoot-host/samba.yml
verbosity: 4
connection: smart
timeout: 10
become_method: sudo
tags: ('all',)
inventory: ('/home/student/troubleshoot-host/inventory',)
forks: 5
1 plays in /home/student/troubleshoot-host/samba.yml

PLAY [Install a samba server] **************************************************

TASK [Gathering Facts] *********************************************************
task path: /home/student/troubleshoot-host/samba.yml:2
<servera.lab.exammple.com> ESTABLISH SSH CONNECTION FOR USER: devops
...output omitted...
fatal: [servera.lab.exammple.com]: UNREACHABLE! => {"changed": false,"msg": "Failed to connect to the host via ssh: OpenSSH_8.0p1, OpenSSL 1.1.1k FIPS 25 Mar 2021\r\ndebug1: Reading configuration data /home/runner/.ssh/config\r\ndebug1: /home/runner/.ssh/config line 1: Applying options for *\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug3: /etc/ssh/ssh_config line 52: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0\r\ndebug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf\r\ndebug2: checking match for 'final all' host servera.lab.exammple.com originally servera.lab.exammple.com\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 3: not matched 'final'\r\ndebug2: match not found\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1 (parse only)\r\ndebug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config\r\ndebug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-,gss-gex-sha1-,gss-group14-sha1-]\r\ndebug3: kex names ok: [curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1]\r\ndebug1: configuration requests final Match pass\r\ndebug1: re-parsing configuration\r\ndebug1: Reading configuration data /home/runner/.ssh/config\r\ndebug1: /home/runner/.ssh/config line 1: Applying options for *\r\ndebug2: add_identity_file: ignoring duplicate key ~/.ssh/lab_rsa\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug3: /etc/ssh/ssh_config line 52: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0\r\ndebug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf\r\ndebug2: checking match for 'final all' host servera.lab.exammple.com originally servera.lab.exammple.com\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 3: matched 'final'\r\ndebug2: match found\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1\r\ndebug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config\r\ndebug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-,gss-gex-sha1-,gss-group14-sha1-]\r\ndebug3: kex names ok: [curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1]\r\ndebug1: auto-mux: Trying existing master\r\ndebug1: Control socket \"/home/runner/.ansible/cp/d4775f48c9\" does not exist\r\ndebug2: resolving \"servera.lab.exammple.com\" port 22\r\ndebug2: ssh_connect_direct\r\ndebug1: Connecting to servera.lab.exammple.com [3.130.253.23] port 22.\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug1: connect to address 3.130.253.23 port 22: Connection timed out\r\ndebug1: Connecting to servera.lab.exammple.com [3.130.204.160] port 22.\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug1: connect to address 3.130.204.160 port 22: Connection timed out\r\nssh: connect to host servera.lab.exammple.com port 22: Connection timed out","unreachable": true}

PLAY RECAP *********************************************************************
servera.lab.exammple.com : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
Please review the log for errors.
Investigate the inventory file for errors.
In the [samba_servers] group, the servera.lab.example.com host name is misspelled as servera.lab.exammple.com (with an extra m).
Correct this error as shown below:
[samba_servers]
servera.lab.example.com
...output omitted...
Run the playbook again. All tasks should succeed.
[student@workstation troubleshoot-host]$ ansible-navigator run \
> -m stdout samba.yml

PLAY [Install a samba server] **************************************************

TASK [Gathering Facts] *********************************************************
ok: [servera.lab.example.com]

TASK [Install samba] ***********************************************************
changed: [servera.lab.example.com]

TASK [Install firewalld] *******************************************************
ok: [servera.lab.example.com]

TASK [Debug install_state variable] ********************************************
ok: [servera.lab.example.com] => {
    "msg": "The state for the samba service is installed"
}

TASK [Start firewalld] *********************************************************
ok: [servera.lab.example.com]

TASK [Configure firewall for samba] ********************************************
changed: [servera.lab.example.com]

TASK [Deliver samba config] ****************************************************
changed: [servera.lab.example.com]

TASK [Start samba] *************************************************************
changed: [servera.lab.example.com]

PLAY RECAP *********************************************************************
servera.lab.example.com : ok=8 changed=4 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
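A misspelled inventory entry like the one in this exercise can be spotted without contacting any managed host. The following play is an illustrative sketch and is not part of the exercise files. It runs only on localhost and prints the host names that the inventory assigns to the samba_servers group, using the built-in groups variable.

```yaml
# Illustrative play, not part of the exercise files.
- name: Show the hosts that the inventory assigns to samba_servers
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Print the group members
      ansible.builtin.debug:
        # groups is a built-in variable that maps each inventory group
        # name to the list of host names in that group.
        var: groups['samba_servers']
```

Because the play does not open any SSH connections, it finishes quickly even when a host name in the group is wrong, and the printed list can be compared against the expected host names.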
This concludes the section.