ansible/docsite/rst/developing_modules_python3.rst
Toshio Kuratomi 0a39700b36 Fix octal output in a few more places (#17250)
Fix filetree lookup plugin for python3 (octal output and selinux API
takes native strings)
2016-08-25 14:58:35 -07:00


Porting Modules to Python 3

Ansible modules are not the usual Python-3 porting exercise. There are two factors that make it harder to port them than most code:

  1. Many modules need to run on Python-2.4 in addition to Python-3.
  2. A lot of mocking has to go into unit testing a Python-3 module, so it's harder to test that your porting has fixed everything or to make sure that later commits haven't regressed.

Which version of Python-3.x and which version of Python-2.x are our minimums?

The short answer is Python-3.4 and Python-2.4 but please read on for more information.

For Python-3 we are currently using Python-3.4 as a minimum. However, no long-term supported Linux distributions currently ship with Python-3. When that occurs, we will probably take that distribution's version as our minimum Python-3 version rather than Python-3.4. Thus far, newer versions of Python-3 have been adding small changes that make it more compatible with Python-2 (for instance, Python-3.5 added the ability to use percent-formatted byte strings), so it should be more pleasant to use a newer version of Python-3 if it's available. At some point this will change, but we'll cross that bridge when we get to it.

For Python-2 the default is for modules to run on Python-2.4. This allows users with older distributions that are stuck on Python-2.4 to manage their machines. Modules are allowed to drop support for Python-2.4 when one of their dependent libraries requires a higher version of Python. This is not an invitation to add unnecessary dependent libraries in order to force your module to be usable only with a newer version of Python. Instead it is an acknowledgment that some libraries (for instance, boto3 and docker-py) will only function with newer Python.

Note

When will we drop support for Python-2.4?

The only long-term supported distro that we know of with Python-2.4 is RHEL5 (and its rebuilds like CentOS5), which is supported until April of 2017. We will likely end our support for Python-2.4 in modules in an Ansible release around that time. We know of no long-term supported distributions with Python-2.5, so the new minimum Python-2 version will likely be Python-2.6. This will let us take advantage of the forward-compatibility features of Python-2.6, so porting and maintenance of Python-2/Python-3 code will be easier after that.

Note

Ubuntu 16.04 LTS ships with Python-3.5

We have ongoing discussions about taking Python-3.5 as our minimum Python-3 version.

Supporting only Python-2 or only Python-3

Sometimes a module's dependent libraries only run on Python-2 or only run on Python-3. We do not yet have a strategy for these modules but we'll need to come up with one. I see three possibilities:

  1. We treat these libraries like any other libraries that may not be installed on the system. When we import them we check if the import was successful. If so, then we continue. If not, we return an error about the library being missing. Users will have to find out that the library is unavailable on their version of Python either by searching for the library on their own or by reading the requirements section in ansible-doc.
  2. The shebang line is the only metadata that Ansible extracts from a module, so we may end up using that to specify what we mean. Something like #!/usr/bin/python means the module will run on both Python-2 and Python-3, #!/usr/bin/python2 means the module will only run on Python-2, and #!/usr/bin/python3 means the module will only run on Python-3. Ansible's code will need to be modified to accommodate this. For Python-2, if ansible_python2_interpreter is not set, it will have to fall back to ansible_python_interpreter, and if that's not set, fall back to /usr/bin/python. For Python-3, Ansible will have to first try ansible_python3_interpreter and then fall back to /usr/bin/python3 as normal.
  3. We add a way for Ansible to retrieve metadata about modules. The metadata will include the version of Python that is required.
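The interpreter fallback order described in option 2 can be sketched as a small function. This is only an illustration of the proposal, not code that exists in Ansible: the name choose_interpreter is hypothetical, and host_vars stands in for whatever inventory variables Ansible would consult for the target host.

```python
def choose_interpreter(shebang, host_vars):
    """Pick a remote interpreter from a module's shebang (hypothetical helper).

    host_vars maps inventory variable names to values for the target host.
    """
    shebang = shebang.strip()
    if shebang.startswith('#!/usr/bin/python3'):
        # Python-3-only module: python3 variable, then the python3 default.
        return host_vars.get('ansible_python3_interpreter', '/usr/bin/python3')
    if shebang.startswith('#!/usr/bin/python2'):
        # Python-2-only module: python2 variable, then the generic variable,
        # then the usual default.
        return (host_vars.get('ansible_python2_interpreter')
                or host_vars.get('ansible_python_interpreter')
                or '/usr/bin/python')
    # Plain #!/usr/bin/python: the module runs on either major version.
    return host_vars.get('ansible_python_interpreter', '/usr/bin/python')
```

The more specific shebangs are checked first because '#!/usr/bin/python' is a prefix of both of them.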

Methods 2 and 3 will both require that we modify modules or otherwise add this additional information somewhere. Method 2 needs only a small code change in executor/module_common.py to parse the shebang. Method 3 will require a lot of work; this is probably not worthwhile if this is the only change, but could be worthwhile if there are other things to record as well. Method 1 requires that we port all modules to Python-3 syntax, but only the code path that attempts the library import and then calls fail_json() because the library is unavailable actually needs to work.
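Method 1 amounts to the familiar optional-import pattern. The sketch below assumes a hypothetical library called somelib that only exists for one major version of Python; missing_library_msg is likewise a hypothetical helper. A real module would pass the returned message to module.fail_json().

```python
import sys

# "somelib" is a hypothetical dependency that only runs on one major
# version of Python; substitute the module's real library.
try:
    import somelib  # noqa: F401
    HAS_SOMELIB = True
except ImportError:
    HAS_SOMELIB = False


def missing_library_msg():
    """Return the message a module would pass to fail_json(), or None."""
    if HAS_SOMELIB:
        return None
    return ("somelib is required for this module but could not be imported "
            "on Python %d.%d" % tuple(sys.version_info[:2]))
```

In a module's main(), this would be used as `msg = missing_library_msg()` followed by `if msg: module.fail_json(msg=msg)`. Only the code up to that point has to work on the unsupported Python version.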

Note

Metadata proposal in progress

A metadata specification is being created to address module maintainership. In the future we will likely extend this to record that a module works with Python2 and 3, Python2 only, or Python3 only.

Tips, tricks, and idioms to adopt

Exceptions

In code which already needs Python-2.6+ (For instance, because a library it depends on only runs on Python >= 2.6) it is okay to port directly to the new exception catching syntax:

try:
    a = 2/0
except ZeroDivisionError as e:
    # module is the module's AnsibleModule instance
    module.fail_json(msg="Tried to divide by zero! (%s)" % e)

For modules which also run on Python-2.4, we have to use an uglier construction to make this work under both Python-2.4 and Python-3:

from ansible.module_utils.pycompat24 import get_exception
[...]

try:
    a = 2/0
except ZeroDivisionError:
    e = get_exception()
    module.fail_json(msg="Tried to divide by zero! (%s)" % e)
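To see why this construction works on both Python-2.4 and Python-3, here is a minimal stand-in for get_exception(). It assumes the helper simply returns the exception currently being handled via sys.exc_info(); the name get_exception_sketch is hypothetical and chosen so it isn't mistaken for the real module_utils function.

```python
import sys


def get_exception_sketch():
    """Return the exception currently being handled.

    Avoids both 'except X, e' (invalid on Python-3) and
    'except X as e' (invalid on Python-2.4).
    """
    return sys.exc_info()[1]


try:
    a = 2 / 0
except ZeroDivisionError:
    e = get_exception_sketch()
    message = "Tried to divide by zero! (%s)" % e
```

Because `e` is bound by ordinary assignment rather than by the `except` clause, it also remains available after the `except` block on Python-3.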

Octal numbers

In Python-2.4, octal literals are specified as 0755. In Python-3, that is invalid and octals must be specified as 0o755. To bridge this gap, modules should create their octals like this:

# Can't use 0755 on Python-3 and can't use 0o755 on Python-2.4
EXECUTABLE_PERMS = int('0755', 8)

Outputting octal numbers may also need to be changed. In Python-2 we often did this to return file permissions:

mode = int('0755', 8)
result['mode'] = oct(mode)

This would give the user result['mode'] == '0755' in their playbook. In Python-3, oct() returns the format with a lowercase o in it, like result['mode'] == '0o755'. If a user had a conditional in their playbook or was using the mode in a template, the new format might break things. We need to return the old form of mode for backwards compatibility. You can do it like this:

mode = int('0755', 8)
result['mode'] = '0%03o' % mode

You should use this wherever backwards compatibility is a concern or you are dealing with file permissions. (With file permissions, a user may be feeding the mode into another program or to another module which doesn't understand the Python syntax for octal numbers. [zero][digit][digit][digit] is understood by almost everything and is therefore the right way to express octals in these circumstances.)
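Putting the input and output idioms together, a small helper keeps the formatting in one place. The name format_mode is hypothetical; the `'0%03o'` expression is the one shown above. Because `%o` never emits a `0o` prefix on any Python version, the leading zero has to be written explicitly.

```python
def format_mode(mode):
    """Return a permission mode in Python-2 style: '0' plus 3+ octal digits."""
    return '0%03o' % mode


EXECUTABLE_PERMS = int('0755', 8)   # valid syntax on Python-2.4 and Python-3
READ_WRITE_PERMS = int('0644', 8)

print(format_mode(EXECUTABLE_PERMS))   # 0755 on every Python version
print(oct(EXECUTABLE_PERMS))           # 0o755 on Python-3, 0755 on Python-2
```

Modes with setuid, setgid, or sticky bits simply gain a fourth digit, so int('04755', 8) is rendered as '04755'.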

Bundled six

The third-party python-six library exists to help projects create code that runs on both Python-2 and Python-3. Ansible includes version 1.4.1 in module_utils so that other modules can use it without requiring that it is installed on the remote system. To make use of it, import it like this:

from ansible.module_utils import six

Note

Why version 1.4.1?

six-1.4.1 is the last version of python-six to support Python-2.4. As long as Ansible modules need to run on Python-2.4 we won't be able to update the bundled copy of six.

Compile Test

We have Travis CI compile all modules with various versions of Python to check that the modules conform to the syntax of those versions. When you've ported a module so that its syntax works with Python-3, you need to modify .travis.yml so that the module is included in the syntax check. Here's the relevant section of .travis.yml:

env:
  global:
    - PY3_EXCLUDE_LIST="cloud/amazon/cloudformation.py
      cloud/amazon/ec2_ami.py
      [...]
      utilities/logic/wait_for.py"

The PY3_EXCLUDE_LIST environment variable is a blacklist of modules which should not be tested (because we know that they are older modules which have not yet been ported to pass the Python-3 syntax checks). Once you have ported an old module so that it compiles with Python-3, remove its entry from the list. The goal is to have the list be empty.
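The check itself boils down to compiling each module's source and skipping the blacklisted paths. The sketch below is a hypothetical illustration of that idea, not the command Travis actually runs; the function names and the files/ported.py and files/unported.py entries are made up for the example.

```python
def parse_exclude_list(value):
    """Split a whitespace-separated PY3_EXCLUDE_LIST value into a set of paths."""
    return set(value.split())


def py3_syntax_failures(sources, exclude):
    """Compile each (path, source) pair that is not excluded.

    Returns the paths whose source is not valid syntax for the running
    interpreter; run this under Python-3 to mimic the Python-3 check.
    """
    failures = []
    for path, source in sources:
        if path in exclude:
            continue  # known-unported module, skipped like the blacklist
        try:
            compile(source, path, 'exec')
        except SyntaxError:
            failures.append(path)
    return failures


exclude = parse_exclude_list("""cloud/amazon/cloudformation.py
      utilities/logic/wait_for.py""")
sources = [
    ('files/ported.py', 'PERMS = int("0755", 8)\n'),
    ('files/unported.py', 'PERMS = 0755\n'),
    ('utilities/logic/wait_for.py', 'print "not ported yet"\n'),
]
print(py3_syntax_failures(sources, exclude))
```

Under Python-3 only files/unported.py is reported: its 0755 literal is a syntax error, while the Python-2-style print statement is never compiled because that path is on the exclude list.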

String Model

One of the big differences between Python2 and Python3 is the string model. In Python2, most APIs take byte strings (the Python2 str type). Using the text type (in Python2, this is the unicode type) often leads to tracebacks because the strings need to be converted to bytes, and Python2's implicit conversion uses the ASCII codec, which fails on any non-ASCII character. In Python3, the situation is somewhat reversed. Most APIs take text strings (this is Python3's str type). When you have byte strings (the Python3 bytes type), you sometimes get errors when attempting to combine those with text strings. Note, however, that under the hood, Python still has to convert text to bytes to interface with operating system libraries and system calls. This means that you can still get tracebacks when passing text to APIs which call those OS-level facilities.
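A short Python3 illustration of the mixing problem: bytes and text never combine implicitly, so one side must be converted explicitly. (On Python2 the same concatenation instead tries an implicit ASCII decode of the byte string and fails with UnicodeDecodeError, which is the Python2 failure mode described above. The snippet below is written for Python3.)

```python
# Python3: text (str) and bytes are separate types that never mix implicitly.
text = u'caf\u00e9'
data = text.encode('utf-8')      # explicit text -> bytes: b'caf\xc3\xa9'

try:
    combined = data + text       # bytes + str
except TypeError:
    # Convert explicitly so both operands have the same type.
    combined = data.decode('utf-8') + text
```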

For module_utils code, we've decided to make the environment work with "native strings". This means that on Python2, things should work if you use the byte string type, and on Python3, code should work if you give it text strings. The reason for this is so that third party modules written for Python2 don't start issuing UnicodeError exceptions once we've ported module_utils to work under Python3. We'll need to gather experience to see whether this works out well for modules too, or whether we should give the module_utils API explicit switches so that modules can choose to operate with the text type all of the time.
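Code that receives either kind of string and must hand a native string to module_utils can normalize it first. The helper below is a hypothetical sketch of that conversion and is not a claim about any existing module_utils function. It assumes UTF-8 and accepts only bytes or text. The syntax stays valid on Python-2.4.

```python
import sys

PY3 = sys.version_info[0] >= 3


def to_native_str(value, encoding='utf-8', errors='strict'):
    """Return value (bytes or text) as the native str type (hypothetical helper).

    Native str is the byte string type on Python2 and the text type on
    Python3, which is what module_utils code expects to receive.
    """
    if isinstance(value, str):
        return value                               # already native
    if PY3:
        return value.decode(encoding, errors)      # bytes -> text
    return value.encode(encoding, errors)          # unicode -> bytes
```

On Python3, to_native_str(b'caf\xc3\xa9') yields the text string u'caf\u00e9', while a value that is already a str is returned unchanged.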

Porting Core Ansible to Python 3

The Ansible code which runs controller-side is easier to port to Python3 in one important way: We do not have to support Python-2.4 on the controller. We only have to support Python-2.6 and above. However, this doesn't eliminate the work that has to be done. The controller is a much more complicated piece of code than any individual module. Making it Python2 and Python3 compatible is a much more complex task.

String Model

By and large, the controller uses the standard best practice of storing everything internally as text type and converting to and from bytes at the borders. In many places we hardcode the encoding of those bytes as utf-8. Thus yaml and inventory files are encoded in utf-8. Filenames are also utf-8. This may not be the right answer forever but it is sufficient for now. If there's demand from users to handle encodings other than utf-8 after the code works on Python3, we can look into what strategy to take for supporting other encodings.
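The "convert at the borders" practice looks roughly like this for files. It is a minimal sketch assuming UTF-8 as described above; read_text and write_text are hypothetical names, not controller functions. Bytes exist only at the open()/read()/write() boundary, and everything in between is text.

```python
import os
import tempfile


def read_text(path, encoding='utf-8'):
    """Read a file as bytes and decode it to text at the border."""
    with open(path, 'rb') as f:
        return f.read().decode(encoding)


def write_text(path, text, encoding='utf-8'):
    """Encode text to bytes at the border and write it out."""
    with open(path, 'wb') as f:
        f.write(text.encode(encoding))


# Round trip through a temporary file; the content is handled only as text.
fd, path = tempfile.mkstemp(suffix='.yml')
os.close(fd)
try:
    write_text(path, u'greeting: caf\u00e9\n')
    content = read_text(path)
finally:
    os.remove(path)
```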

In some cases, storing values as text is not necessarily a choice without drawbacks. For instance, filenames and environment variables on POSIX systems are sequences of bytes. By using text to represent filenames, we prevent filenames that are undecodable in utf-8, and filenames that are not text at all, from working. We chose to represent these as text for now because the code paths that handle filenames cannot handle bytes end-to-end. PyYAML on Python3 and jinja2 on both Python2 and Python3, for instance, are meant to work with text. Any decision to allow filenames to be byte values will have to address how we deal with those pieces of the code as well.
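A concrete case of the drawback: a perfectly valid POSIX filename that was created with a Latin-1 encoding cannot be decoded under a strict UTF-8 policy. The filename below is made up for illustration.

```python
# A POSIX filename is an arbitrary byte sequence; this one is Latin-1, not UTF-8.
raw_name = b'caf\xe9.yml'

try:
    name = raw_name.decode('utf-8')
except UnicodeDecodeError:
    # Under a strict "filenames are utf-8 text" policy this file cannot be named.
    name = None
```

The same bytes decode without error as Latin-1, which shows that the problem comes from the encoding policy, not from the file.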