8 ways to speed up your Ansible playbooks

Here's how to optimize your Ansible playbooks to make them run faster.

Ansible is a simple yet powerful open source automation tool that can streamline many of your IT infrastructure operations. You can automate simple tasks, such as installing packages, or complex multistep workflows, such as deploying a clustered solution across multiple nodes or patching your operating systems. Whether your workflows are simple or complex, you need to apply the right optimization techniques to your Ansible playbooks.

This article covers some of the major optimization methods available in Ansible for speeding up playbook execution.

1. Identify slow tasks with callback plugins

A task that looks simple can still be the reason a playbook runs slowly. To measure how long each task takes and find the ones slowing down your plays, enable callback plugins such as timer, profile_tasks, and profile_roles. These plugins are part of the ansible.posix collection. If you install only ansible-core, install that collection separately.

Enable the plugins in ansible.cfg:

[defaults]
inventory = ./hosts
callbacks_enabled = timer, profile_tasks, profile_roles

Now execute the ansible-playbook command:

$ ansible-playbook site.yml
PLAY [Deploying Web Server] ************
 
TASK [Gathering Facts] **********************
Thursday 23 December 2021  22:55:58 +0800 (0:00:00.055)   0:00:00.055
Thursday 23 December 2021  22:55:58 +0800 (0:00:00.054)   0:00:00.054
ok: [node1]
 
TASK [Deploy Web service] *******************
Thursday 23 December 2021  22:56:00 +0800 (0:00:01.603)  0:00:01.659
Thursday 23 December 2021  22:56:00 +0800 (0:00:01.603)  0:00:01.658
 
...<output removed>...
 
PLAY RECAP **********************************
node1: ok=9  changed=4  unreachable=0  failed=0
       skipped=0  rescued=0  ignored=0

Playbook run took 0 days, 0 hours, 0 minutes, 14 seconds
Thursday 23 December 2021  22:56:12 +0800 (0:00:00.541)       0:00:14.100 ***** 
=============================================================================== 
deploy-web-server : Install httpd and firewalld ------- 5.42s
deploy-web-server : Git checkout ---------------------- 3.40s
Gathering Facts --------------------------------------- 1.60s
deploy-web-server : Enable and Run Firewalld ---------- 0.82s
deploy-web-server : firewalld permitt httpd service --- 0.72s
deploy-web-server : httpd enabled and running --------- 0.55s
deploy-web-server : Set Hostname on Site -------------- 0.54s
deploy-web-server : Delete content & directory -------- 0.52s
deploy-web-server : Create directory ------------------ 0.41s
Deploy Web service ------------------------------------ 0.04s
Thursday 23 December 2021  22:56:12 +0800 (0:00:00.541) 0:00:14.099
===================================================================== 
deploy-web-server ------------------------- 12.40s
gather_facts ------------------------------- 1.60s
include_role ------------------------------- 0.04s
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
total ------------------------------------- 14.04s

The output shows how long each task and each role took, plus the total run time. This makes it easy to see which tasks take more time than the others.
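You can rank the slowest tasks without reading the whole log by parsing the profile_tasks summary. The Python sketch below assumes the summary format shown above: a task name, a run of dashes, and a duration ending in "s". The slowest_tasks function and its regular expression are illustrative helpers, not part of Ansible, and the format may vary between plugin versions. Paste only the per-task section. The per-role summary uses the same line format, so including it would count the same time twice.

```python
import re

# Matches profile_tasks summary lines such as
#   "deploy-web-server : Install httpd and firewalld ------- 5.42s"
# (assumed format: task name, three or more dashes, seconds with an "s" suffix).
SUMMARY_LINE = re.compile(r"^(?P<task>.+?)\s+-{3,}\s+(?P<secs>\d+(?:\.\d+)?)s$")


def slowest_tasks(summary_text, top=3):
    """Return up to `top` (task, seconds, percent_of_total) tuples, slowest first."""
    timings = []
    for line in summary_text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            timings.append((match.group("task"), float(match.group("secs"))))
    if not timings:
        return []
    total = sum(secs for _, secs in timings)
    timings.sort(key=lambda item: item[1], reverse=True)
    return [(task, secs, round(100 * secs / total, 1)) for task, secs in timings[:top]]


if __name__ == "__main__":
    # Per-task summary copied from the profile_tasks output above.
    sample = """
deploy-web-server : Install httpd and firewalld ------- 5.42s
deploy-web-server : Git checkout ---------------------- 3.40s
Gathering Facts --------------------------------------- 1.60s
deploy-web-server : Enable and Run Firewalld ---------- 0.82s
deploy-web-server : firewalld permitt httpd service --- 0.72s
deploy-web-server : httpd enabled and running --------- 0.55s
deploy-web-server : Set Hostname on Site -------------- 0.54s
deploy-web-server : Delete content & directory -------- 0.52s
deploy-web-server : Create directory ------------------ 0.41s
Deploy Web service ------------------------------------ 0.04s
"""
    for task, secs, pct in slowest_tasks(sample):
        print(f"{secs:6.2f}s  {pct:5.1f}%  {task}")
```

Sorting by share of the total, rather than by raw seconds alone, makes it clearer which task is worth optimizing first.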

2. Disable fact gathering

When a playbook runs, each play begins with an implicit task, Gathering Facts, which uses the setup module to collect information about the managed nodes you're automating. The results are available in the ansible_facts variable. If your playbook doesn't use these facts anywhere, gathering them wastes time. You can skip this step by setting gather_facts: False in the play.

With gathering facts enabled:

$ time ansible-playbook site.yml
 
PLAY [Deploying Web Server] *********************
 
TASK [Gathering Facts] **************************
ok: [node1]
...<output removed>...
PLAY RECAP **************************************
node1: ok=9  changed=4  unreachable=0  failed=0
       skipped=0  rescued=0  ignored=0
 
ansible-playbook site.yml  3.03s user 0.93s system 25% cpu 15.526 total

With fact gathering disabled (gather_facts: False), the playbook finishes faster:

$ time ansible-playbook site.yml
 
PLAY [Deploying Web Server] ****************
 
...<output removed>...
 
PLAY RECAP **************************************
node1: ok=8  changed=4  unreachable=0  failed=0
       skipped=0    rescued=0    ignored=0
 
ansible-playbook site.yml  2.96s user 1.00s system 26% cpu 14.992 total

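In this single-node example, disabling fact gathering saved about half a second (15.526 versus 14.992 seconds total). The more nodes you manage, the more time you save, because facts are gathered from every host.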
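Here is a rough Python estimate of how the savings scale. It makes three assumptions:

- The 1.60-second Gathering Facts time from the profile_tasks output above applies to every host.
- Hosts are processed in batches of forks, five by default (see the parallelism section).
- Each batch takes about as long as one host.

fact_gathering_overhead is a hypothetical helper for illustration. Real timings depend on your network and managed nodes.

```python
import math


def fact_gathering_overhead(hosts, forks=5, seconds_per_host=1.60):
    """Rough wall-clock seconds a play spends in Gathering Facts.

    Simplified model (an assumption, not Ansible's scheduler): hosts run in
    batches of `forks`, and each batch takes about `seconds_per_host`.
    """
    batches = math.ceil(hosts / forks)
    return batches * seconds_per_host


if __name__ == "__main__":
    for hosts in (1, 50, 500):
        saved = fact_gathering_overhead(hosts)
        print(f"{hosts:>4} hosts: about {saved:.1f}s saved per play")
```

Because every play gathers facts unless told otherwise, a playbook with several plays saves this amount once per play.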

3. Configure parallelism

Ansible executes tasks in batches, controlled by a parameter called forks. The default value of forks is 5, which means Ansible runs a task on the first five hosts, waits for the task to complete, then takes the next batch of five hosts, and so on. Once all hosts have finished the task, Ansible moves on to the next task, again in batches of five hosts.

You can increase the value of forks in ansible.cfg, enabling Ansible to execute a task on more hosts in parallel:

[defaults]
inventory = ./hosts
forks = 50

You can also change the value of forks dynamically while executing a playbook by using the --forks option (-f for short):

$ ansible-playbook site.yml --forks 50

A word of warning: The more managed nodes Ansible works on in parallel, the more CPU and memory it uses on the control node. Set forks to a value your control node can handle.
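You can sketch the effect of forks with the simplified batch model described above. The Python function below is a hypothetical illustration, not Ansible's actual scheduler. It assumes each batch of forks hosts finishes when its slowest host finishes, and the per-host durations are made-up numbers.

```python
def linear_task_time(durations, forks=5):
    """Wall-clock seconds for one task across all hosts in batches of `forks`.

    durations: per-host run times for the task, in inventory order.
    Each batch waits for its slowest host before the next batch starts.
    """
    total = 0.0
    for start in range(0, len(durations), forks):
        batch = durations[start:start + forks]
        total += max(batch)
    return total


if __name__ == "__main__":
    hosts = [2.0] * 20  # hypothetical: 20 hosts, 2 seconds each
    for forks in (5, 20, 50):
        print(f"forks={forks:>2}: {linear_task_time(hosts, forks):.1f}s per task")
```

In this model, raising forks beyond the number of hosts gains nothing (20 and 50 give the same result for 20 hosts). That is one reason to size forks to your inventory and control node rather than setting it as high as possible.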

4. Configure SSH optimization

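Establishing a Secure Shell (SSH) connection is relatively slow. Without connection reuse, Ansible pays this setup cost again for every task on every managed node, so total execution time grows significantly as you add tasks and managed nodes.

You can mitigate this with the OpenSSH ControlMaster and ControlPersist features, configured through ssh_args in the [ssh_connection] section of ansible.cfg. Ansible's default ssh_args already enable both (-C -o ControlMaster=auto -o ControlPersist=60s). Setting ssh_args yourself replaces that default, so include every option you still want and tune ControlPersist to match how your tasks are spaced.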

  • ControlMaster lets multiple simultaneous SSH sessions with a remote host share a single network connection. Later sessions reuse the first connection instead of repeating the connection setup (key exchange and authentication), which saves time on every task.
  • ControlPersist sets how long the master connection stays open in the background after the last session using it closes. For example, ControlPersist=60s keeps an idle connection open for 60 seconds:
    [ssh_connection]
    ssh_args = -o ControlMaster=auto -o ControlPersist=60s
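The following Python sketch shows why connection reuse matters. It counts how many full SSH connection setups one host needs under a simple model of ControlPersist: the shared master connection stays open while sessions use it, and for control_persist seconds after the last one ends. The task timings are hypothetical. The model ignores details such as connection failures and the multiple SSH operations a single task can perform.

```python
def ssh_handshakes(task_start_times, task_duration, control_persist=None):
    """Count full SSH connection setups for one host.

    task_start_times: sorted times (seconds) at which tasks open a session.
    task_duration: how long each session stays in use.
    control_persist: idle seconds the master connection survives after the
        last session closes; None models no connection sharing at all.
    """
    if control_persist is None:
        return len(task_start_times)  # every task pays for a new connection
    handshakes = 0
    master_expires = float("-inf")
    for start in task_start_times:
        if start > master_expires:
            handshakes += 1  # no live master connection: connect from scratch
        # The master stays up while in use, then for control_persist idle seconds.
        master_expires = max(master_expires, start + task_duration + control_persist)
    return handshakes


if __name__ == "__main__":
    starts = list(range(0, 100, 10))  # hypothetical: 10 tasks, 10 seconds apart
    print("no sharing:       ", ssh_handshakes(starts, 1))
    print("ControlPersist=60s:", ssh_handshakes(starts, 1, control_persist=60))
    print("gap of 99s:       ", ssh_handshakes([0, 100, 110], 1, control_persist=60))
```

Tasks spaced further apart than ControlPersist force a new handshake. That is why a longer ControlPersist can help playbooks that contain slow tasks.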

5. Disable host key checking in a dynamic environment

By default, Ansible verifies SSH host keys to safeguard against server spoofing and man-in-the-middle attacks. This check also takes time. Some dynamic environments treat managed nodes (virtual machines or containers) as immutable and frequently reinstall or recreate them. In those environments, a host presents a new key each time it is rebuilt, so the stored key no longer matches. For such environments, you can disable host key checking by setting the host_key_checking parameter to False in your ansible.cfg file:

[defaults]
host_key_checking = False

I don't recommend this outside of a controlled environment. Make sure you have a clear understanding of the implications of this action before you use it in critical environments.
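You may only need to skip host key checking for specific runs against freshly rebuilt hosts. In that case, a narrower option is to set the ANSIBLE_HOST_KEY_CHECKING environment variable, Ansible's environment-variable equivalent of host_key_checking, for that run only, instead of changing ansible.cfg for everyone. The Python sketch below shows one way to do that. The function names and the site.yml playbook are illustrative assumptions.

```python
import os
import shutil
import subprocess


def ephemeral_run_env(base_env=None):
    """Return a copy of the environment with host key checking disabled.

    The caller's environment (or os.environ) is not modified, so the
    setting applies only to the process that receives this copy.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ANSIBLE_HOST_KEY_CHECKING"] = "False"
    return env


def run_playbook_without_host_key_checking(playbook="site.yml"):
    """Run one playbook with host key checking disabled for that run only."""
    if shutil.which("ansible-playbook") is None:
        raise RuntimeError("ansible-playbook was not found on PATH")
    return subprocess.run(
        ["ansible-playbook", playbook],
        env=ephemeral_run_env(),
        check=True,  # raise CalledProcessError if the playbook fails
    )


# Usage (requires Ansible and a playbook on the control node):
#   run_playbook_without_host_key_checking("site.yml")
```

From a shell, prefixing the command works the same way: ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook site.yml.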

6. Use pipelining

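Without pipelining, Ansible performs several SSH operations for each task when it runs a module over SSH. It creates a temporary directory on the managed node, copies the module there, executes it, and cleans up. Pipelining (disabled by default) instead sends the module to the remote Python interpreter's standard input over a single SSH session, reducing the number of SSH operations. Enable it in ansible.cfg:

# ansible.cfg
[ssh_connection]
pipelining = True

If you use become with sudo, pipelining requires requiretty to be disabled in /etc/sudoers on the managed nodes.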
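The difference between the two approaches can be illustrated locally in Python. This is only an analogy, not how Ansible is implemented:

- run_with_temp_file mimics the default behavior: write a module file, run it in a separate step, then delete it.
- run_pipelined feeds the same source to the interpreter's standard input in a single invocation, which is the idea behind pipelining.

MODULE_SOURCE is a made-up stand-in for an Ansible module.

```python
import json
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for a module: prints a JSON result, like Ansible modules do.
MODULE_SOURCE = 'import json; print(json.dumps({"changed": False, "ping": "pong"}))'


def run_with_temp_file(source):
    """Default-style execution: write the code to a file, run it, then clean up."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(source)  # step 1: "copy" the module
        path = handle.name
    try:
        # step 2: execute the file in a separate invocation
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, check=True
        )
    finally:
        os.remove(path)  # step 3: remove the temporary file
    return json.loads(result.stdout)


def run_pipelined(source):
    """Pipelining-style execution: stream the code to the interpreter's stdin."""
    result = subprocess.run(
        [sys.executable, "-"], input=source, capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(run_with_temp_file(MODULE_SOURCE))
    print(run_pipelined(MODULE_SOURCE))
```

Both functions return the same result, but the pipelined version needs no temporary file and no separate copy and cleanup steps. Over SSH, each avoided step is a round trip saved.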

7. Use execution strategies

By default, Ansible waits for every host to finish a task before moving on to the next task. This behavior is called the linear strategy.

If your tasks and managed nodes don't depend on each other, you can switch to the free strategy. It lets each host run through the play's tasks as fast as it can, without waiting for other hosts to finish:

- hosts: production_servers   # hypothetical inventory group name
  strategy: free
  tasks:
    - name: Check connectivity
      ansible.builtin.ping:

You can also write your own strategy plugins or use third-party ones, such as Mitogen for Ansible, which provides strategy plugins backed by its own Python-based connection and execution layer.
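The trade-off between the two strategies can be sketched in Python. The model is a simplification: it assumes forks is at least the number of hosts and that tasks have no dependencies. The host names and durations are hypothetical.

- Under linear, each task ends when its slowest host finishes.
- Under free, each host runs its tasks back to back, and the play ends when the slowest host is done.

```python
def linear_total(durations):
    """durations maps host -> list of per-task seconds (same number of tasks per host)."""
    per_host = list(durations.values())
    task_count = len(per_host[0])
    # Every task waits for the slowest host before the next task starts.
    return sum(max(times[i] for times in per_host) for i in range(task_count))


def free_total(durations):
    """Each host proceeds independently; the play ends with the slowest host."""
    return max(sum(times) for times in durations.values())


if __name__ == "__main__":
    # Hypothetical hosts whose slow tasks fall at different points in the play.
    durations = {"web1": [1, 5, 1], "web2": [5, 1, 1]}
    print("linear:", linear_total(durations))  # 5 + 5 + 1
    print("free:  ", free_total(durations))    # max(7, 7)
```

When every host takes about the same time for each task, the two totals converge. The free strategy helps most when slow tasks land on different hosts at different points in the play.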

8. Use async tasks

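By default, Ansible waits for each task to complete before moving on. Long-running tasks, such as disk backups or package installations, can become a bottleneck and increase total execution time. If the following tasks do not depend on a long-running task, run it asynchronously with async and set poll: 0. Ansible then starts the task, moves on to the next tasks immediately, and lets the job run in the background for up to the async limit. With a positive poll value, the task still runs asynchronously, but Ansible waits for it and checks its status at that interval. You can check on the job later with the async_status module:

---
- name: Async Demo
  hosts: nodes
  tasks:
    - name: Initiate custom snapshot
      ansible.builtin.shell: "/opt/diskutils/snapshot.sh init"
      async: 120  # Maximum allowed run time, in seconds
      poll: 0     # Don't wait; continue with the next task right away
      register: snapshot_job

    # ...tasks that don't depend on the snapshot can run here...

    - name: Wait for the snapshot to finish
      ansible.builtin.async_status:
        jid: "{{ snapshot_job.ansible_job_id }}"
      register: snapshot_result
      until: snapshot_result.finished
      retries: 24  # 24 checks x 5 seconds = 120 seconds, matching async
      delay: 5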
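The poll: 0 pattern resembles submitting work to a background executor in Python and checking on it later. The sketch below is only an analogy:

- submit plays the role of async with poll: 0.
- The polling loop plays the role of async_status with until, retries, and delay.

The function names and timings are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def long_task(seconds):
    """Stand-in for a long-running job such as a disk snapshot."""
    time.sleep(seconds)
    return f"snapshot done after {seconds}s"


def independent_task(name):
    """Stand-in for a task that does not depend on the long-running job."""
    return f"{name} done"


def run_async_then_check(long_seconds=0.2, delay=0.05, retries=40):
    with ThreadPoolExecutor(max_workers=1) as pool:
        job = pool.submit(long_task, long_seconds)  # like async + poll: 0
        # Continue with other work while the job runs in the background.
        others = [independent_task(name) for name in ("task A", "task B")]
        # Like async_status with until/retries/delay: check periodically.
        for _ in range(retries):
            if job.done():
                return others, job.result()
            time.sleep(delay)
        # Note: leaving the with-block still waits for the worker thread.
        raise TimeoutError("job did not finish within retries * delay seconds")


if __name__ == "__main__":
    print(run_async_then_check())
```

The independent tasks finish while the long job is still sleeping, which is the time saving the async pattern is meant to capture.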

Optimization is a journey

The total execution time of an Ansible playbook depends on many settings. You can do your infrastructure a favor by finding the combination of configuration parameters that best fits your needs.

This isn't a complete list, of course. You can use many other parameters to control and optimize Ansible playbook execution, such as serial, throttle, run_once, and more. Refer to the documentation to learn more and apply the settings based on your Ansible environment.

Gineesh Madapparambath

Gineesh Madapparambath is a Platform & DevOps Consultant at Red Hat Singapore, specializing in automation and containerization with Ansible and OpenShift.
