===========================
Porting Modules to Python 3
===========================
Ansible modules are not the usual Python-3 porting exercise. There are two
factors that make it harder to port them than most code:
1. Many modules need to run on Python-2.4 in addition to Python-3.
2. A lot of mocking has to go into unit testing a Python-3 module. So it's
   harder to test that your porting has fixed everything or to make sure that
   later commits haven't regressed.
Which version of Python-3.x and which version of Python-2.x are our minimums?
=============================================================================
The short answer is Python-3.5 and Python-2.4 but please read on for more
information.
For Python-3 we are currently using Python-3.5 as a minimum on both the
controller and the managed nodes. This was chosen as it's the version of
Python3 in Ubuntu-16.04, the first long-term support (LTS) distribution to
ship with Python3 and not Python2. Much of our code would still work with
Python-3.4 but there are always bugfixes and new features in any new upstream
release. Taking advantage of this relatively new version allows us not to
worry about workarounds for problems and missing features in that older
version.
For Python-2, the default is for the controller to run on Python-2.6 and
modules to run on Python-2.4. This allows users with older distributions that
are stuck on Python-2.4 to manage their machines. Modules are allowed to drop
support for Python-2.4 when one of their dependent libraries requires a higher
version of Python. This is not an invitation to add unnecessary dependent
libraries in order to force your module to be usable only with a newer version
of Python. Instead it is an acknowledgment that some libraries (for instance,
boto3 and docker-py) will only function with newer Python.
.. note:: When will we drop support for Python-2.4?

    The only long term supported distro that we know of with Python-2.4 is
    RHEL5 (and its rebuilds like CentOS5) which is supported until April of
    2017. Whatever major release we make in or after April of 2017 (probably
    2.4.0) will no longer have support for Python-2.4 on the managed machines.
    Previous major release series that we support (2.3.x) will continue to
    support Python-2.4 on the managed nodes.

    We know of no long term supported distributions with Python-2.5 so the new
    minimum Python-2 version will be Python-2.6. This will let us take
    advantage of the forwards-compat features of Python-2.6 so porting and
    maintenance of Python-2/Python-3 code will be easier after that.

Supporting only Python-2 or only Python-3
=========================================
Sometimes a module's dependent libraries only run on Python-2 or only run on
Python-3. We do not yet have a strategy for these modules but we'll need to
come up with one. I see three possibilities:
1. We treat these libraries like any other libraries that may not be installed
on the system. When we import them we check if the import was successful.
If so, then we continue. If not we return an error about the library being
missing. Users will have to find out that the library is unavailable on
their version of Python either by searching for the library on their own or
reading the requirements section in :command:`ansible-doc`.
2. The shebang line is the only metadata that Ansible extracts from a module
so we may end up using that to specify what we mean. Something like
``#!/usr/bin/python`` means the module will run on both Python-2 and
Python-3, ``#!/usr/bin/python2`` means the module will only run on
Python-2, and ``#!/usr/bin/python3`` means the module will only run on
Python-3. Ansible's code will need to be modified to accommodate this.
For :command:`python2`, if ``ansible_python2_interpreter`` is not set, it
will have to fall back to ``ansible_python_interpreter`` and if that's not
set, fall back to ``/usr/bin/python``. For :command:`python3`, Ansible
will have to first try ``ansible_python3_interpreter`` and then fall back to
``/usr/bin/python3`` as normal.
3. We add a way for Ansible to retrieve metadata about modules. The metadata
will include the version of Python that is required.
Methods 2 and 3 will both require that we modify modules or otherwise add this
additional information somewhere. Method 2 needs only a few code changes in
executor/module_common.py to parse the shebang. Method 3 will require a lot of
work, which is probably not worthwhile if this is the only use for the metadata
but could be worthwhile if there are other uses as well. Method 1 requires
that we port all modules to syntax that is valid on Python3, but the only code
path that actually has to work is the one that attempts the library import and
then calls fail_json() because the library is unavailable.
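
As a rough illustration of the first possibility, the sketch below guards an
import and reports the missing library. The library name
``some_py3_only_lib`` is hypothetical, and ``fail_json`` is a stand-in callable
passed in place of a real ``module.fail_json()`` so that the snippet runs on
its own.

```python
# Hypothetical dependency name, used only for illustration.
try:
    import some_py3_only_lib  # noqa: F401
    HAS_LIB = True
except ImportError:
    HAS_LIB = False


def check_requirements(fail_json):
    """Report a missing library through fail_json (stand-in for module.fail_json)."""
    if not HAS_LIB:
        fail_json(msg="some_py3_only_lib is required for this module")
        return False
    return True
```

Only the import guard and the error path have to work on the "wrong" version
of Python; the rest of the module merely has to compile there.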
.. note:: Metadata proposal in progress

    A metadata specification is being created to address module
    maintainership. In the future we will likely extend this to record that a
    module works with Python2 and 3, Python2 only, or Python3 only.

Tips, tricks, and idioms to adopt
=================================
Exceptions
----------
In code which already needs Python-2.6+ (for instance, because a library it
depends on only runs on Python >= 2.6) it is okay to port directly to the new
exception catching syntax::

    try:
        a = 2/0
    except ZeroDivisionError as e:
        module.fail_json(msg="Tried to divide by zero!")

For modules which also run on Python-2.4, we have to use an uglier
construction to make this work under both Python-2.4 and Python-3::

    from ansible.module_utils.pycompat24 import get_exception
    [...]
    try:
        a = 2/0
    except ZeroDivisionError:
        e = get_exception()
        module.fail_json(msg="Tried to divide by zero!")

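
To see why this construction parses on every version, here is a small sketch
of the technique. It assumes the helper does little more than read the active
exception from ``sys.exc_info()``; the local ``get_exception`` and
``safe_divide`` below are stand-ins written for illustration, not the
module_utils code itself.

```python
import sys


def get_exception():
    # Stand-in for illustration: return the exception currently being handled.
    return sys.exc_info()[1]


def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        # Neither "as e" nor ", e" appears here, so the same source
        # parses on Python-2.4 and on Python-3.
        e = get_exception()
        return "failed: %s" % e
```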
Octal numbers
-------------
In Python-2.4, octal literals are specified as ``0755``. In Python-3, that is
invalid and octals must be specified as ``0o755``. To bridge this gap,
modules should create their octals like this::

    # Can't use 0755 on Python-3 and can't use 0o755 on Python-2.4
    EXECUTABLE_PERMS = int('0755', 8)

Outputting octal numbers may also need to be changed. In Python2 we often did
this to return file permissions::

    mode = int('0755', 8)
    result['mode'] = oct(mode)

This would give the user ``result['mode'] == '0755'`` in their playbook. In
Python3, :func:`oct` returns the format with the lowercase ``o`` in it like:
``result['mode'] == '0o755'``. If a user had a conditional in their playbook
or was using the mode in a template the new format might break things. We
need to return the old form of mode for backwards compatibility. You can do
it like this::

    mode = int('0755', 8)
    result['mode'] = '0%03o' % mode

You should use this wherever backwards compatibility is a concern or you are
dealing with file permissions. With file permissions a user may be feeding
the mode into another program or to another module which doesn't understand
the Python syntax for octal numbers. ``[zero][digit][digit][digit]`` is
understood by most everything and is therefore the right way to express octals
in these circumstances.
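
The two idioms can be combined in a small helper. This is only a sketch; the
name ``format_mode`` is made up for the example and is not part of
module_utils.

```python
# Parse the octal from a string so the source compiles on Python-2.4 and Python-3
EXECUTABLE_PERMS = int('0755', 8)


def format_mode(mode):
    # Render as a zero followed by at least three octal digits,
    # regardless of what oct() returns on the running Python.
    return '0%03o' % mode
```

With this, ``format_mode(EXECUTABLE_PERMS)`` gives the string ``'0755'`` on
both major versions of Python.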
Bundled six
-----------
The third-party python-six library exists to help projects create code that
runs on both Python-2 and Python-3. Ansible includes version 1.4.1 in
module_utils so that other modules can use it without requiring that it is
installed on the remote system. To make use of it, import it like this::

    from ansible.module_utils import six

.. note:: Why version 1.4.1?

    six-1.4.1 is the last version of python-six to support Python-2.4. As
    long as Ansible modules need to run on Python-2.4 we won't be able to
    update the bundled copy of six.

Compile Test
------------
We have Travis compiling all modules with various versions of Python to check
that each module's syntax is valid on those versions. When you've
ported a module so that its syntax works with Python-3, we need to modify
.travis.yml so that the module is included in the syntax check. Here's the
relevant section of .travis.yml::

    env:
      global:
        - PY3_EXCLUDE_LIST="cloud/amazon/cloudformation.py
          cloud/amazon/ec2_ami.py
          [...]
          utilities/logic/wait_for.py"

The :envvar:`PY3_EXCLUDE_LIST` environment variable is a blacklist of modules
which should not be tested (because we know that they are older modules which
have not yet been ported to pass the Python-3 syntax checks). To have another
old module checked against Python-3 once it has been ported, remove its entry
from the list. The goal is for the list to be empty.
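
Before removing a module from the list you may want to check its syntax
locally. The sketch below is not the command that Travis runs; it is a
hypothetical helper built on Python's built-in ``compile()`` that reports
whether a piece of source is syntactically valid for the interpreter running
it.

```python
def compiles(source):
    """Return True if source is syntactically valid for the running Python."""
    try:
        # Only parses and compiles the source; nothing is executed.
        compile(source, '<module under test>', 'exec')
    except SyntaxError:
        return False
    return True
```

Run under Python-3, ``compiles("mode = 0755\n")`` is false while
``compiles("mode = int('0755', 8)\n")`` is true.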
String Model
------------
One of the big differences between Python2 and Python3 is the string model.
In Python2, most APIs take byte strings (the Python2 ``str`` type). Using the
text type (in Python2, this is the ``unicode`` type) often leads to tracebacks
because the strings need to be converted to bytes and Python fails to do that
correctly. In Python3, the situation is somewhat reversed. Most APIs take
text strings (this is **Python3's** ``str`` type). When you have byte strings
(the Python3 ``bytes`` type) you sometimes get errors when attempting to
combine those with text strings. Note, however, that under the hood, Python
still has to convert text to bytes to interface with operating system libraries
and system calls. This means that you can still get tracebacks when passing
text to APIs which call those OS level facilities.
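
A short sketch of the Python3 side of this, using made-up names. On Python3,
mixing the two types raises ``TypeError``; on Python2 the same expression
would instead attempt an implicit ASCII decode, which is where the tracebacks
mentioned above come from.

```python
text = u'caf\xe9'            # text type
data = text.encode('utf-8')  # byte string


def try_concat(a, b):
    """Return a + b, or None when Python3 refuses to mix text and bytes."""
    try:
        return a + b
    except TypeError:
        return None
```

Converting explicitly first, as in ``try_concat(text, data.decode('utf-8'))``,
avoids the error.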
For module_utils code, we've decided to make the environment work with "native
strings". This means that on Python2, things should work if you use the byte
string type. In Python3, code should work if you give it text strings. The
reason for this is so that third party modules written for Python2 don't start
issuing UnicodeError exceptions once we've ported module_utils to work under
Python3. We'll need to gather experience to see if this is going to work out
well for modules as well or if we should give the module_utils API explicit
switches so that modules can choose to operate with text type all of the time.
Helpers
~~~~~~~
For converting between bytes, text, and native strings we have three helper
functions. These are :func:`ansible.module_utils._text.to_bytes`,
:func:`ansible.module_utils._text.to_native`, and
:func:`ansible.module_utils._text.to_text`. These are similar to using
``bytes.decode()`` and ``unicode.encode()`` with a few differences.
* By default they try very hard not to traceback.
* The default encoding is "utf-8".
* There are two error strategies that don't correspond one-to-one with
a python codec error handler. These are ``surrogate_or_strict`` and
``surrogate_or_replace``. ``surrogate_or_strict`` will use the ``surrogateescape``
error handler if available (mostly on Python3) or ``strict`` if not. It is most
appropriate to use when dealing with something that needs to round trip its
value like file paths, database keys, etc. Without ``surrogateescape`` the best
thing these values can do is generate a traceback that our code can catch
and decide how to show an error message. ``surrogate_or_replace`` is for
when a value is going to be displayed to the user. If the
``surrogateescape`` error handler is not present, it will replace
undecodable byte sequences with a replacement character.
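
The way the two composite strategies map onto real codec error handlers can be
sketched with the standard library alone. The functions below are simplified
stand-ins with made-up names; they are not the module_utils helpers, which
handle more input types and options.

```python
import codecs

try:
    codecs.lookup_error('surrogateescape')
    HAS_SURROGATEESCAPE = True
except LookupError:
    HAS_SURROGATEESCAPE = False


def pick_handler(errors):
    """Map a composite strategy onto a codec error handler that exists."""
    if errors == 'surrogate_or_strict':
        return 'surrogateescape' if HAS_SURROGATEESCAPE else 'strict'
    if errors == 'surrogate_or_replace':
        return 'surrogateescape' if HAS_SURROGATEESCAPE else 'replace'
    return errors


def decode_sketch(value, errors, encoding='utf-8'):
    # Simplified: assumes value is a byte string.
    return value.decode(encoding, pick_handler(errors))


def encode_sketch(value, errors, encoding='utf-8'):
    # Simplified: assumes value is a text string.
    return value.encode(encoding, pick_handler(errors))
```

Where ``surrogateescape`` is available, a byte string that is not valid utf-8
survives a round trip through ``decode_sketch`` and ``encode_sketch``
unchanged, which is the property needed for file paths and similar values.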
================================
Porting Core Ansible to Python 3
================================
The Ansible code which runs controller-side is easier to port to Python3 in
one important way: We do not have to support Python-2.4 on the controller.
We only have to support Python-2.6 and above. However, this doesn't eliminate
the work that has to be done. The controller is a much more complicated piece
of code than any individual module. Making it Python2 and Python3 compatible
is a much more complex task.
String Model
------------
By and large, the controller uses the standard best practice of storing
everything internally as text type and converting to and from bytes at the
borders. In many places we hardcode these byte values as utf-8. Thus yaml
and inventory files are encoded in utf-8. Filenames are also utf-8. This may
not be the right answer forever but it is sufficient for now. If there's
demand from users to handle encodings other than utf-8 after the code works on
Python3 we can look into what strategy to take for supporting other encodings.
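
The "text inside, bytes at the borders" practice described above can be
sketched as follows. The function names are invented for the example and the
utf-8 default mirrors the hardcoded encoding mentioned in the text.

```python
def read_border(raw_bytes, encoding='utf-8'):
    """Input border: turn bytes from a file or socket into text."""
    return raw_bytes.decode(encoding)


def write_border(text, encoding='utf-8'):
    """Output border: turn text back into bytes just before writing."""
    return text.encode(encoding)


def shout_hostnames(raw_inventory):
    # Everything between the two borders operates on text only.
    lines = read_border(raw_inventory).splitlines()
    return write_border(u'\n'.join(line.upper() for line in lines))
```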
In some cases, storing values as text is not necessarily a choice
without drawbacks. For instance, filenames and environment variables on POSIX
systems are a sequence of bytes. By using text to represent filenames we
prevent filenames that are undecodable in utf-8 and filenames that are not
text at all from working. We made the choice to represent these as text for
now due to code paths that handle filenames not being able to handle bytes
end-to-end. PyYAML on Python3 and jinja2 on both Python2 and Python3, for
instance, are meant to work with text. Any decision to allow filenames to be
byte values will have to address how we deal with those pieces of the code as
well.