
Guided Exercise: Managing Rolling Updates

  • Run a playbook that uses unequal batch sizes with the serial directive, aborts if it fails for too many hosts, and runs a specific task once per batch.

Outcomes

  • Control the update process of an existing HAProxy cluster by using the serial directive, which determines the size of each batch.

As the student user on the workstation machine, use the lab command to prepare your system for this exercise.

This command initializes the remote Git repository that you need for this lab.

[student@workstation ~]$ lab start update-management

Procedure 8.3. Instructions

  1. Clone the https://git.lab.example.com/student/update-management.git Git repository into the /home/student/git-repos directory and then create a new branch for this exercise.

    1. Create the /home/student/git-repos directory from a terminal if it does not already exist, and then change to it.

      [student@workstation ~]$ mkdir -p ~/git-repos/
      [student@workstation ~]$ cd ~/git-repos/
    2. Clone the https://git.lab.example.com/student/update-management.git repository and then change to the cloned repository:

      [student@workstation git-repos]$ git clone \
      > https://git.lab.example.com/student/update-management.git
      Cloning into 'update-management'...
      ...output omitted...
      [student@workstation git-repos]$ cd update-management
    3. Create the exercise branch.

      [student@workstation update-management]$ git checkout -b exercise
      Switched to a new branch 'exercise'
  2. Install the content collections specified in the collections/requirements.yml file into the collections/ directory.

    1. Log in to the private automation hub at https://hub.lab.example.com with the student username and the redhat123 password.

    2. Navigate to Collections → API token management, and then click Load token. Copy the API token.

    3. Update both token lines in the ansible.cfg file by using the copied token. Your token might be different from the one displayed in this example.

      ...output omitted...
      
      [galaxy_server.rh-certified_repo]
      url=https://hub.lab.example.com/api/galaxy/content/rh-certified
      token=f41f07130d6eb6ef2ded63a574c161b509c647dd
      
      [galaxy_server.community_repo]
      url=https://hub.lab.example.com/api/galaxy/content/community/
      token=f41f07130d6eb6ef2ded63a574c161b509c647dd
    4. Use the ansible-galaxy command to install the community.general and redhat.rhel_system_roles content collections into the collections/ directory. Later in the exercise, the update_webapp.yml playbook uses the community.general.haproxy module to interact with the HAProxy server, and the apache role uses the redhat.rhel_system_roles.selinux role.

      [student@workstation update-management]$ ansible-galaxy collection install \
      > -r collections/requirements.yml -p collections/
      Starting galaxy collection install process
      ...output omitted...
      Downloading https://hub.lab.example.com/api/galaxy/v3/plugin/ansible/content/rh-certified/collections/artifacts/redhat-rhel_system_roles-1.20.0.tar.gz to /home/student/.ansible/tmp/ansible-local-22938l4moi2mi/tmprv5lya26/redhat-rhel_system_roles-1.20.0-smd0f1c_
      Installing 'redhat.rhel_system_roles:1.20.0' to '/home/student/git-repos/update-management/collections/ansible_collections/redhat/rhel_system_roles'
      Downloading https://hub.lab.example.com/api/galaxy/v3/plugin/ansible/content/community/collections/artifacts/community-general-6.1.0.tar.gz to /home/student/.ansible/tmp/ansible-local-22938l4moi2mi/tmprv5lya26/community-general-6.1.0-_ty0hzx6
      redhat.rhel_system_roles:1.20.0 was installed successfully
      Installing 'community.general:6.1.0' to '/home/student/git-repos/update-management/collections/ansible_collections/community/general'
      community.general:6.1.0 was installed successfully
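
      For reference, the collections/requirements.yml file that drives this installation lists the required collections by name. The following is a minimal sketch of what such a file might contain; the file in the exercise repository might differ, for example by pinning specific versions:

      ---
      # collections/requirements.yml (illustrative sketch only)
      collections:
        - name: community.general
        - name: redhat.rhel_system_roles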
  3. Using the ee-supported-rhel8:latest automation execution environment, run the site.yml playbook. The playbook deploys a front-end load balancer and a set of back-end web servers.

    [student@workstation update-management]$ ansible-navigator run site.yml
    
      Play name                   Ok  Changed  ... Failed  ... Task count  Progress
    0│Gather web_server facts      5        0  ...      0  ...          5  Complete
    1│Ensure HAProxy is deployed   6        5  ...      0  ...          6  Complete
    2│Set Load Balancer facts      1        0  ...      0  ...          1  Complete
    3│Ensure Apache is deployed   35       25  ...      0  ...         35  Complete
    4│Ensure Web App is deployed  15        5  ...      0  ...         15  Complete
    
    ^f/PgUp page up     ^b/PgDn page down     ↑↓ scroll    esc back  ... Successful

    Press Esc to exit from the ansible-navigator command.

  4. Use the curl command to send five requests to the load balancer.

    [student@workstation update-management]$ for x in {1..5}; do curl servera; done
    serverb: /var/www/html/index.html
    serverc: /var/www/html/index.html
    serverd: /var/www/html/index.html
    servere: /var/www/html/index.html
    serverf: /var/www/html/index.html

    The load balancer distributes requests to all five back-end web servers that serve content from the /var/www/html/ directory.

    Note

    The order of the servers might be different from the order displayed throughout this example.

  5. The update_webapp.yml playbook performs a rolling update of the web application hosted on the back-end web servers. Review the update_webapp.yml playbook and the use of the serial keyword.

    1. Review the top section of the playbook:

      ---
      - name: Update web servers to use a new document root
        hosts: web_servers
        become: true
        force_handlers: true
        vars:
          webapp_content_dir: /srv/web/app/html
      
        serial:
          - 1
          - 25%
          - 100%
      ...output omitted...

      The update_webapp.yml playbook acts on all hosts in the web_servers host group. The webapp_content_dir variable specifies the root directory for Apache web documents. If not specified, the webapp_content_dir variable defaults to the /var/www/html/ directory.
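
      Where that default is defined depends on how the roles are written; a plausible location is a role defaults file, sketched here only for illustration (the exercise repository might define it elsewhere):

      # roles/webapp/defaults/main.yml (hypothetical location, for illustration)
      webapp_content_dir: /var/www/html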

      The serial keyword specifies that tasks in the play are processed in three batches; a brief sketch of related batch-control keywords follows the list below.

      • The first batch contains one host. After this batch finishes, four hosts still require processing.

      • The second batch contains one host because 25% of the host group size is 1.25 and that truncates to one.

      • The third batch contains three hosts because 100% of the host group size is five, but only three hosts remain unprocessed.
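
      The exercise objective also mentions aborting when too many hosts fail and running a task once per batch. Those behaviors are not part of update_webapp.yml; the following is a hedged, self-contained sketch of the relevant keywords (max_fail_percentage and run_once) in a hypothetical play:

      ---
      # Illustrative sketch only; not part of the exercise playbook.
      - name: Example of batch-control keywords
        hosts: web_servers
        serial:
          - 1
          - 25%
          - 100%
        # Abort the entire play if more than 30% of the hosts in any batch fail.
        max_fail_percentage: 30
        tasks:
          - name: This task runs once per batch, not once per host
            ansible.builtin.debug:
              msg: "Starting a batch of {{ ansible_play_batch | length }} host(s)"
            run_once: true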

    2. Review the pre_tasks section of the play:

        pre_tasks:
          - name: Remove web server from service during the update
            community.general.haproxy:
              state: disabled
              backend: app
              host: "{{ inventory_hostname }}"
            delegate_to: "{{ item }}"
            with_items: "{{ groups['lb_servers'] }}"

      Before the playbook updates the web application on each server, the haproxy module disables that server in all the load balancers. This prevents external clients from being routed to a server while it is being updated, and from reaching it at all if the update leaves it in a broken state.
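
      If you wanted existing client connections to finish cleanly before a server leaves the pool, the same module also supports a drain state. The following is a rough sketch of such a variation, not part of the exercise playbook, and it assumes an HAProxy version recent enough to support draining and the module's default stats socket:

      # Hypothetical alternative pre_task (illustration only)
      - name: Drain existing connections before removing the server
        community.general.haproxy:
          state: drain
          backend: app
          host: "{{ inventory_hostname }}"
          wait: true
        delegate_to: "{{ item }}"
        with_items: "{{ groups['lb_servers'] }}"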

    3. Review the roles section of the play:

        roles:
          - role: apache
          - role: webapp

      The apache role modifies the /etc/httpd/conf/httpd.conf configuration file to use the directory specified by the webapp_content_dir variable. The webapp role deploys content to the directory that is specified by the webapp_content_dir variable.

    4. Review the post_tasks section of the play:

        post_tasks:
          # Firewall rules dictate that requests to backend web
          # servers must originate from a load balancer.
          - name: Smoke Test - Ensure HTTP 200 OK
            ansible.builtin.uri:
              url: "http://{{ inventory_hostname }}:{{ apache_port }}"
              status_code: 200
            delegate_to: "{{ groups['lb_servers'][0] }}"
            become: false
      
          # If the test fails, servers are not re-enabled
          # in the load balancers, and the update process halts.
          - name: Enable healthy server in load balancers
            community.general.haproxy:
              state: enabled
              backend: app
              host: "{{ inventory_hostname }}"
            delegate_to: "{{ item }}"
            with_items: "{{ groups['lb_servers'] }}"

      After the playbook deploys the web application, a smoke test ensures that each back-end web server responds with a 200 HTTP status code. The firewall rules on each web server only allow web requests that originate from a load balancer, so all smoke tests are delegated to a load balancer.

      If the smoke test fails for a server, then further processing of that server halts, and the web server is not re-enabled in the load balancer.

      The second task enables the server in the load balancer when the smoke test passes.
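
      If httpd needed a few seconds to finish restarting, the smoke test could fail transiently. A minimal, hypothetical hardening of that task (not part of the exercise playbook) retries the check a few times before marking the host as failed:

      - name: Smoke Test - Ensure HTTP 200 OK (with retries)
        ansible.builtin.uri:
          url: "http://{{ inventory_hostname }}:{{ apache_port }}"
          status_code: 200
        register: smoke_result
        # Retry up to 3 times, 5 seconds apart, until a 200 response is seen.
        until: smoke_result.status == 200
        retries: 3
        delay: 5
        delegate_to: "{{ groups['lb_servers'][0] }}"
        become: false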

  6. Use the update_webapp.yml playbook to deploy the web application to the /srv/web/app/html/ directory.

    Use the curl command to send another five requests to the load balancer. Notice how the failed back-end web server is not included in the load-balancing pool.

    1. Using the ee-supported-rhel8:latest automation execution environment, run the update_webapp.yml playbook. The playbook fails. Leave this terminal running in interactive mode so that you can troubleshoot the problem.

      [student@workstation update-management]$ ansible-navigator run update_webapp.yml
      
        Play name                   Ok  Changed  ... Failed  ... Task count  Progress
      0│Update web servers ...      10        5  ...      1  ...         11  Complete
      
      ^f/PgUp page up     ^b/PgDn page down     ↑↓ scroll    esc back  ...    Failed
    2. In a separate terminal tab or window, send another five requests to the load balancer. Notice how the load balancer only redirects requests to four of the original five web servers.

      [student@workstation update-management]$ for x in {1..5}; do curl servera; done
      serverc: /var/www/html/index.html
      serverd: /var/www/html/index.html
      servere: /var/www/html/index.html
      serverf: /var/www/html/index.html
      serverc: /var/www/html/index.html

      The update_webapp.yml playbook first removed one of the servers from the load-balancing pool (serverb in this example). After applying changes to the web server, the playbook ran a task to validate that the web server could still respond successfully to web requests. Because the web server did not respond successfully, the playbook did not add the server back to the load-balancing pool.

      In this example, each web server displays a custom message. A real-world example would deploy the same content to each web server. Users accessing the load balancer would be unaware that one or more web servers were removed from the load-balancing pool.

    3. Return to the terminal tab or window that contains the interactive session for the ansible-navigator command. Press 0 to view details about the Update web servers to use a new document root play.

      ...output omitted...
      0│Update web servers ...      10        5  ...      1  ...         11  Complete
      ...output omitted...

      The output shows that task 10 (Smoke Test - Ensure HTTP 200 OK) failed for serverb.lab.example.com.

         Result  Host                    ... Task                                  ...
       0│Ok      serverb.lab.example.com ... Gathering Facts                       ...
       1│Ok      serverb.lab.example.com ... Remove web server from service during ...
       2│Ok      serverb.lab.example.com ... Apache Port Check                     ...
      ...output omitted...
      10│Failed  serverb.lab.example.com ... Smoke Test - Ensure HTTP 200 OK       ...
      
      ^f/PgUp page up     ^b/PgDn page down     ↑↓ scroll    esc back  ...    Failed
    4. Continue to use interactive mode to discover more details about the problem. Because the item number exceeds nine, you must enter a colon before the item number. Press :10 to view details about the Smoke Test - Ensure HTTP 200 OK task.

      ...output omitted...
      10│Failed  serverb.lab.example.com ... Smoke Test - Ensure HTTP 200 OK       ...
      ...output omitted...

      The details page indicates that serverb.lab.example.com returned a status code of 403 instead of the expected status code of 200. This same information is also displayed on the msg line in the output.

      Play name: Update web servers to use a new document root:10
      Task name: Smoke Test - Ensure HTTP 200 OK
      Failed: serverb.lab.example.com Status code was 403 and not [200]: HTTP Error 403: Forbidden
      ...output omitted...

      Press :q to exit from the ansible-navigator command.

  7. Web servers often return a status code of 403 because of either a regular file permission problem or an SELinux access problem. Identify the problem and a potential solution.

    1. Use SSH to connect to serverb.lab.example.com as the student user.

      [student@workstation update-management]$ ssh student@serverb.lab.example.com
      ...output omitted...
      [student@serverb ~]$
    2. Use the ls command to display the permissions and ownership for the /srv/web/app/html/index.html file. Regular file permissions indicate that the index.html file is readable by everyone, so the file permissions are not the problem.

      [student@serverb ~]$ ls -l /srv/web/app/html/index.html
      -rw-r--r--. 1 root root ... /srv/web/app/html/index.html
    3. Use the ls command to display SELinux context information for the /srv/web/app/html/index.html file. The var_t SELinux context type is not appropriate for web content. SELinux prevents the httpd process from accessing files with an incorrect context type.

      [student@serverb ~]$ ls -Z /srv/web/app/html/index.html
      system_u:object_r:var_t:s0 /srv/web/app/html/index.html
    4. By default, the /var/www/html/ directory uses a context type allowed by SELinux. Identify the context type for the /var/www/html/ directory.

      [student@serverb ~]$ ls -Zd /var/www/html
      system_u:object_r:httpd_sys_content_t:s0 /var/www/html

      Although SELinux provides additional context types for web content, this exercise uses the httpd_sys_content_t context type.

      Important

      The ansible.builtin.file module can use the setype option to set the SELinux context type for a directory. However, files and directories created within the directory do not inherit the specified context type.
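
      For comparison, a persistent SELinux file context can also be managed directly with the community.general.sefcontext module followed by a relabel. The following is a rough sketch of that approach (the target path is illustrative only); the exercise instead uses the redhat.rhel_system_roles.selinux role, which handles both steps:

      # Illustration only; requires the SELinux Python bindings on the managed host.
      - name: Add a persistent file context rule for the new content directory
        community.general.sefcontext:
          target: '/srv/web/app(/.*)?'
          setype: httpd_sys_content_t
          state: present

      - name: Relabel existing files so that they match the new rule
        ansible.builtin.command: restorecon -Rv /srv/web/app
        register: restorecon_out
        changed_when: restorecon_out.stdout | length > 0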

    5. Exit the SSH session on serverb.lab.example.com and return to workstation.lab.example.com.

      [student@serverb ~]$ logout
      Connection to serverb.lab.example.com closed.
      [student@workstation update-management]$
  8. Update the apache role to resolve the issue. Modify the roles/apache/tasks/main.yml task file to add a code block after the Start and enable httpd task. Move the existing Customize Apache HTTPD Configuration task and the Ensure that {{ webapp_content_dir }} exists task into the block. The resulting file should contain the following content:

    ---
    # tasks file for apache
    
    - name: Apache Port Check
      ansible.builtin.assert:
        that:
          - apache_port in apache_standard_ports_list
        fail_msg: "{{ tmp_msg }}: {{ apache_standard_ports_list }}"
        success_msg: The specified apache port ({{ apache_port }}) is allowed.
      vars:
        tmp_msg: "'apache_port' value ({{ apache_port }}) is not in the list"
    
    - name: Install httpd
      ansible.builtin.yum:
        name:
          - httpd
        state: present
    
    - name: Start and enable httpd
      ansible.builtin.service:
        name: httpd
        state: started
        enabled: true
    
    - name: Customize SELinux for web_content_dir
      block:
        - name: Set webapp_base fact
          ansible.builtin.set_fact:
            webapp_base: "{{ webapp_content_dir | split('/') }}" 1
    
        - name: Web directory is a subdirectory of /srv 2
          ansible.builtin.assert:
            that:
              - webapp_base[0] == ''
              - webapp_base[1] == 'srv'
              - webapp_base[2] is defined
            fail_msg: '"{{ webapp_content_dir }}" is not a subdirectory of /srv.'
            success_msg: '"{{ webapp_content_dir }}" is a subdirectory of /srv.'
    
        - name: Customize Apache HTTPD Configuration 3
          ansible.builtin.template:
            src: templates/httpd.conf.j2
            dest: /etc/httpd/conf/httpd.conf
          notify: restart httpd
    
        - name: Ensure that {{ webapp_content_dir }} exists
          ansible.builtin.file:
            path: "{{ webapp_content_dir }}"
            state: directory
            owner: root
            group: root
            mode: '0755'
    
        - name: Create SELinux file context for the directory 4
          ansible.builtin.include_role:
            name: redhat.rhel_system_roles.selinux
          vars:
            selinux_fcontexts:
              - target: "/{{ webapp_base[1] }}/{{ webapp_base[2] }}(/.*)?"
                setype: "httpd_sys_content_t"
                state: present
            selinux_restore_dirs:
              - /{{ webapp_base[1] }}/{{ webapp_base[2] }}
      when: "webapp_content_dir != '/var/www/html'" 5

    1

    The first task splits the webapp_content_dir variable into an array using a forward slash as a delimiter.

    2

    The second task in the block checks that the directory specified by the webapp_content_dir variable is a subdirectory of the /srv/ directory. This task checks that the first item of the array (webapp_base[0]) is empty, because this is the value to the left of the first forward slash. The second item of the array (webapp_base[1]) must match the value srv. The third item of the array (webapp_base[2]) must be defined. The playbook exits with an error message if the task fails.

    3

    The third and fourth tasks are from the existing task file and have been moved into the block. The directory specified by the webapp_content_dir variable must exist before running the restorecon command.

    4

    The fifth task in the block creates a new file context rule based on the first two directory components of the path specified by the webapp_content_dir variable (/srv/web in this exercise), and then applies the new rule.

    5

    The block of code only applies if the webapp_content_dir variable does not match the default value of /var/www/html.
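
    To make callouts 1 and 2 concrete, the following throwaway task (not part of the role) shows what the split filter produces for the path used in this exercise:

    - name: Illustration only - inspect the result of the split filter
      ansible.builtin.debug:
        msg: "{{ '/srv/web/app/html' | split('/') }}"
      # Prints ['', 'srv', 'web', 'app', 'html']:
      # webapp_base[0] is the empty string before the leading slash,
      # webapp_base[1] is 'srv', and webapp_base[2] ('web') is defined.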

    Note

    The ~/git-repos/update-management/solutions/main.yml file contains the correct configuration and can be used for comparison.

  9. Run the update_webapp.yml playbook again and notice that the playbook succeeds.

    As the playbook runs, monitor the output of the issue_requests.sh script in another terminal tab or window. The output shows the back-end servers gradually switching over to using the new web document root directory.

    1. Run the issue_requests.sh script in a separate terminal tab or window and monitor the output during the execution of the update_webapp.yml playbook. The script sends a web request to the servera.lab.example.com load balancer every 2 seconds. Output is displayed in the terminal and logged to the curl_output.log file.

      [student@workstation update-management]$ ./issue_requests.sh
      serverc: /var/www/html/index.html
      serverd: /var/www/html/index.html
      servere: /var/www/html/index.html
      serverf: /var/www/html/index.html
      ...output omitted...

      Initially, responses cycle through the four hosts remaining in the load-balancing pool.

      Leave the issue_requests.sh script running and return to the previous terminal tab or window.

    2. Using the ee-supported-rhel8:latest automation execution environment, run the update_webapp.yml playbook. While the playbook runs, switch back to the terminal tab or window that is running the issue_requests.sh script.

      [student@workstation update-management]$ ansible-navigator run update_webapp.yml
      ...output omitted...
    3. Monitor the output of the issue_requests.sh script while running the update_webapp.yml playbook.

      Eventually, the smoke test passes for the new application, and the serverb server is returned to service with an updated web document root directory:

      ...output omitted...
      serverc: /var/www/html/index.html
      serverd: /var/www/html/index.html
      servere: /var/www/html/index.html
      serverf: /var/www/html/index.html
      ...output omitted...
      serverb: /srv/web/app/html/index.html
      serverc: /var/www/html/index.html
      serverd: /var/www/html/index.html
      servere: /var/www/html/index.html
      serverf: /var/www/html/index.html
      ...output omitted...

      The playbook processes the next batch, which only contains the serverc server.

      The playbook removes serverc from service:

      ...output omitted...
      serverb: /srv/web/app/html/index.html
      serverc: /var/www/html/index.html
      serverd: /var/www/html/index.html
      servere: /var/www/html/index.html
      serverf: /var/www/html/index.html
      ...output omitted...
      serverb: /srv/web/app/html/index.html
      serverd: /var/www/html/index.html
      servere: /var/www/html/index.html
      serverf: /var/www/html/index.html
      ...output omitted...

      Eventually, the smoke test passes for the serverc server and the server is put back into service:

      ...output omitted...
      serverb: /srv/web/app/html/index.html
      serverd: /var/www/html/index.html
      servere: /var/www/html/index.html
      serverf: /var/www/html/index.html
      ...output omitted...
      serverb: /srv/web/app/html/index.html
      serverc: /srv/web/app/html/index.html
      serverd: /var/www/html/index.html
      servere: /var/www/html/index.html
      serverf: /var/www/html/index.html
      ...output omitted...

      The last batch processes all remaining web servers. The playbook first disables all three of these servers, leaving only the serverb and serverc servers to handle requests:

      ...output omitted...
      serverb: /srv/web/app/html/index.html
      serverc: /srv/web/app/html/index.html
      serverd: /var/www/html/index.html
      servere: /var/www/html/index.html
      serverf: /var/www/html/index.html
      ...output omitted...
      serverb: /srv/web/app/html/index.html
      serverc: /srv/web/app/html/index.html
      ...output omitted...

      Eventually, each server passes the smoke test and is put back into service:

      ...output omitted...
      serverb: /srv/web/app/html/index.html
      serverc: /srv/web/app/html/index.html
      ...output omitted...
      serverb: /srv/web/app/html/index.html
      serverc: /srv/web/app/html/index.html
      serverd: /srv/web/app/html/index.html
      servere: /srv/web/app/html/index.html
      serverf: /srv/web/app/html/index.html
      ...output omitted...
    4. Press Ctrl+C to stop the issue_requests.sh script.

      ...output omitted...
      serverb: /srv/web/app/html/index.html
      serverc: /srv/web/app/html/index.html
      serverd: /srv/web/app/html/index.html
      servere: /srv/web/app/html/index.html
      serverf: /srv/web/app/html/index.html
      ^C
      [student@workstation update-management]$
  10. Return to the terminal tab or window that is running the ansible-navigator command. The update_webapp.yml playbook has completed successfully.

      Play name              Ok Changed ... Failed Skipped ... Task count  Progress
    0│Update web servers ... 22       4 ...      0      16 ...         38  Complete
    1│Update web servers ... 23       9 ...      0      16 ...         39  Complete
    2│Update web servers ... 71      27 ...      0      46 ...        117  Complete
    
    ^f/PgUp page up     ^b/PgDn page down     ↑↓ scroll    esc back  ... Successful

    Your numbers might be slightly different from the numbers displayed in this output. Press Esc to exit from the ansible-navigator command.

Finish

On the workstation machine, change to the student user home directory and use the lab command to complete this exercise. This step is important to ensure that resources from previous exercises do not impact upcoming exercises.

[student@workstation ~]$ lab finish update-management

This concludes the section.

Revision: do374-2.2-82dc0d7