Ansible is a hack on a hack on a hack on a hack

Rant.

Everything is JSON!

So the other day I wanted to pass a literal string (which coincidentally happened to contain valid JSON!) as the value of an environment variable, because this is what the application needs. Picture:

env:
  SOME_CONFIG_OPTION: 1
  SOME_OTHER_OPTION: '{"a": 2}'

Can you imagine how much screwing around it takes to make this work reliably in all contexts?

#!/bin/sh
{% for key, value in env.items() %}
export {{ key }}={{ value }}
{% endfor %}
exec ...

tasks:
- action: foo
  environment: "{{ env }}"

Ansible will go out of its way to make absolutely sure that the contents of the string are interpreted as JSON, turned into a Python data structure, and then (most probably) formatted using repr or some equivalent, unless you try some combination of dirty hacks:

So how to make this happen cleanly? After fighting for several hours, I came up with the following:

Is this a solution? Insofar as it makes a deployment possible, yes. As far as I’m concerned, it is a hack to offset the effect of another hack.

This is just one tiny example, a symptom of a much bigger problem: Ansible is built out of hacks.
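To make the mangling described above concrete, here is a small Python sketch of the suspected round trip. It is an assumption about the behaviour, not Ansible's actual code, and `mangle` is a hypothetical helper name: any string that happens to parse as JSON gets turned into a Python object and is then formatted the Python way.

```python
import json


def mangle(value):
    """Hypothetical stand-in for an over-eager templating layer (not Ansible's code)."""
    try:
        parsed = json.loads(value)  # the string "looks like JSON", so it gets parsed
    except ValueError:
        return value                # anything else passes through untouched
    return str(parsed)              # Python formatting of the parsed object, not JSON


original = '{"a": 2}'
print(mangle(original))  # {'a': 2}
```

The single quotes in the result mean the application no longer receives valid JSON, which is exactly the kind of breakage that then has to be worked around.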

There is no clear boundary between languages

Ansible uses “vanilla” YAML as the language describing its playbooks, which means you can take any conforming YAML parser in the world and just parse any of its playbooks.

But there’s a “problem” with YAML that anyone runs into as soon as they try doing anything fancy with it: it’s not a programming language! It’s a configuration language, a data serialisation and exchange format; it’s excellent when you don’t feel like inventing a configuration format for your tiny app, or when you need to embed metadata in a Markdown document.

So Ansible has grafted Jinja2 on top of its YAML, which seems like a great idea: Jinja is quite powerful, almost to the point of being a “real” programming language in itself. (Let’s disregard where the idea of using a templating language as a programming language has led us in the recent past.)

So what happens when you start freely mixing the two?

- hosts: localhost
  become: no
  tasks:
  - action: {{ foo }}

% ansible-playbook foo.yml -e foo=ping
ERROR! Syntax Error while loading YAML.


The error appears to have been in '.../foo.yml': line 6, column 14, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  tasks:
  - action: {{ foo }}
             ^ here
We could be wrong, but this one looks like it might be an issue with
missing quotes.  Always quote template expression brackets when they
start a value. For instance:

    with_items:
      - {{ foo }}

Should be written as:

    with_items:
      - "{{ foo }}"

Oops. Ansible helpfully suggests we put quotes around our templated string. Let’s fix this quickly:

- hosts: localhost
  become: no
  tasks:
  - action: "{{ foo }}"

% ansible-playbook foo.yml -e foo=ping
PLAY [localhost] ***************************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [{{ foo }}] ***************************************************************
ok: [localhost]

PLAY RECAP *********************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0

Ansible helpfully… ran the ping module, as asked on the command line… but didn’t bother to expand that same template when printing the task name.

Until very recently (as of Ansible 1.9), this rule didn’t apply everywhere, and even though 2.0 fixed a lot of things, I’m sure there are still spots where things break. Take template substitution in an include statement, for example:

- hosts: localhost
  become: no
  tasks:
  - debug: var=ansible_distribution
  - include: "{{ ansible_distribution }}.yml"

Try!

% ansible-playbook -i /dev/null apply-foo.yml
ERROR: file could not read: .../{{ ansible_distribution }}.yml

Let’s try another one: variables.

- hosts: localhost
  become: no
  vars:
    a: "{{ b }}"
    b: ok
  tasks:
  - debug: var=a

Yes, we’ve seen this one before! It was in the programming 101 class. This Python code will (quite obviously) fail on the first line:

a = b
b = "ok"
print(a)

Then how come Ansible prints “ok”!? Go. Go check it. I’m not lying!

% ansible-playbook foo.yml
PLAY [localhost] ***************************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [debug] *******************************************************************
ok: [localhost] => {
    "a": "ok"
}

PLAY RECAP *********************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0

Note that the vars section is a dict, a hash table, an unordered mapping - whatever you call it, there’s no actual order to these pairs.

The only way this could ever happen is if there were some variable resolution engine that tried templating all strings in various orders until one of them worked. What happens when we have more complex dependencies? What happens when we have templates with side effects, like database lookups?

I find it hard to believe the result can be deterministic.
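For illustration, here is a sketch of the kind of order-independent resolver guessed at above: it keeps re-templating every string until a full pass changes nothing. This is an assumption, not Ansible's implementation; `resolve`, `PLACEHOLDER` and `max_passes` are hypothetical names, and only bare `{{ name }}` placeholders are handled.

```python
import re

# Matches a bare "{{ name }}" placeholder; real Jinja2 expressions are far richer.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve(variables, max_passes=10):
    """Re-template every string value until a full pass changes nothing."""
    resolved = dict(variables)
    for _ in range(max_passes):
        changed = False
        for key, value in list(resolved.items()):
            if not isinstance(value, str):
                continue
            # Unknown names are left in place as-is.
            new_value = PLACEHOLDER.sub(
                lambda m: str(resolved.get(m.group(1), m.group(0))), value
            )
            if new_value != value:
                resolved[key] = new_value
                changed = True
        if not changed:
            return resolved
    raise ValueError("variables did not settle after %d passes" % max_passes)


print(resolve({"a": "{{ b }}", "b": "ok"}))  # {'a': 'ok', 'b': 'ok'}
```

With plain substitution like this, declaration order stops mattering, which would be consistent with the playbook output above. The sketch says nothing about templates that have side effects.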

Is there a better way?

Once again, while Lisp failed to fix the world, it had the correct answer from the very beginning.

Ansible, by nature, has to mix code and data - a lot; it achieves this by patching a templating language into a data description language, and then patching the data into the templates again - an arrangement that constantly falls apart at every seam.

Lisp is homoiconic. TLDR: this is a fancy way of saying that code is data, and data is code. It has templating built right into its syntax. It has compile-time macros, which are written in Lisp itself. It has data serialisation built into the compiler. All of this is achievable in five thousand lines of quite clean and portable C.
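As a rough illustration of templating on data rather than on text, here is a Python sketch loosely modelled on Lisp's quasiquote. It is not Lisp and not a claim about how any real tool works; `Unquote` and `quasiquote` are hypothetical names. Values are spliced into a nested list structure, so a string that happens to contain JSON is never re-parsed.

```python
class Unquote:
    """Marks a slot in the template to be filled from the environment."""

    def __init__(self, name):
        self.name = name


def quasiquote(form, env):
    """Return a copy of form with every Unquote replaced by its value from env."""
    if isinstance(form, Unquote):
        return env[form.name]  # spliced in as-is, never re-parsed
    if isinstance(form, list):
        return [quasiquote(item, env) for item in form]
    return form  # literals are plain data and stay untouched


template = ["export", Unquote("key"), Unquote("value")]
env = {"key": "SOME_OTHER_OPTION", "value": '{"a": 2}'}
print(quasiquote(template, env))
# ['export', 'SOME_OTHER_OPTION', '{"a": 2}']
```

Because substitution happens on the structure and not on its textual rendering, the JSON-looking value comes out as the same string that went in.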

Common Lisp (in particular) is also pretty vast in scope. But I’m not arguing for installing Common Lisp on all your servers; that would be pretty insane, even if perfectly in line with the current trend to install a whole horse with legs and a stable just to run, for example, your monitoring.

What I’m arguing for is to take inspiration from Lisp’s design; after all, we’ve successfully incorporated garbage collection, rich type systems, object systems, exception handling, read-eval-print loops, object introspection and macros into several mainstream languages and tools; and all of this is achievable even in a lightweight package. Why take a step back?

In Ansible’s defense

With all of its warts and deficiencies, I couldn’t imagine getting my current job done without Ansible - you can pry it from my cold, dead hands!

That is, until I get that suckless rewrite rolling.