4ed88512e4
We couldn't copy to_unicode, to_bytes, to_str into module_utils because of licensing. So once created it we had two sets of functions that did the same things but had different implementations. To remedy that, this change removes the ansible.utils.unicode versions of those functions.
272 lines
13 KiB
ReStructuredText
272 lines
13 KiB
ReStructuredText
===========================
|
|
Porting Modules to Python 3
|
|
===========================
|
|
|
|
Ansible modules are not the usual Python-3 porting exercise. There are two
|
|
factors that make it harder to port them than most code:
|
|
|
|
1. Many modules need to run on Python-2.4 in addition to Python-3.
|
|
2. A lot of mocking has to go into unittesting a Python-3 module. So it's
|
|
harder to test that your porting has fixed everything or to make sure that
|
|
later commits haven't regressed.
|
|
|
|
Which version of Python-3.x and which version of Python-2.x are our minimums?
|
|
=============================================================================
|
|
|
|
The short answer is Python-3.5 and Python-2.4 but please read on for more
|
|
information.
|
|
|
|
For Python-3 we are currently using Python-3.5 as a minimum on both the
|
|
controller and the managed nodes. This was chosen as it's the version of
|
|
Python3 in Ubuntu-16.04, the first long-term support (LTS) distribution to
|
|
ship with Python3 and not Python2. Much of our code would still work with
|
|
Python-3.4 but there are always bugfixes and new features in any new upstream
|
|
release. Taking advantage of this relatively new version allows us not to
|
|
worry about workarounds for problems and missing features in that older
|
|
version.
|
|
|
|
For Python-2, the default is for the controller to run on Python-2.6 and
|
|
modules to run on Python-2.4. This allows users with older distributions that
|
|
are stuck on Python-2.4 to manage their machines. Modules are allowed to drop
|
|
support for Python-2.4 when one of their dependent libraries require a higher
|
|
version of python. This is not an invitation to add unnecessary dependent
|
|
libraries in order to force your module to be usable only with a newer version
|
|
of Python. Instead it is an acknowledgment that some libraries (for instance,
|
|
boto3 and docker-py) will only function with newer Python.
|
|
|
|
.. note:: When will we drop support for Python-2.4?
|
|
|
|
The only long term supported distro that we know of with Python-2.4 is
|
|
RHEL5 (and its rebuilds like CentOS5) which is supported until April of
|
|
2017. Whatever major release we make in or after April of 2017 (probably
|
|
2.4.0) will no longer have support for Python-2.4 on the managed machines.
|
|
Previous major release series's that we support (2.3.x) will continue to
|
|
support Python-2.4 on the managed nodes.
|
|
|
|
We know of no long term supported distributions with Python-2.5 so the new
|
|
minimum Python-2 version will be Python-2.6. This will let us take
|
|
advantage of the forwards-compat features of Python-2.6 so porting and
|
|
maintainance of Python-2/Python-3 code will be easier after that.
|
|
|
|
|
|
Supporting only Python-2 or only Python-3
|
|
=========================================
|
|
|
|
Sometimes a module's dependent libraries only run on Python-2 or only run on
|
|
Python-3. We do not yet have a strategy for these modules but we'll need to
|
|
come up with one. I see three possibilities:
|
|
|
|
1. We treat these libraries like any other libraries that may not be installed
|
|
on the system. When we import them we check if the import was successful.
|
|
If so, then we continue. If not we return an error about the library being
|
|
missing. Users will have to find out that the library is unavailable on
|
|
their version of Python either by searching for the library on their own or
|
|
reading the requirements section in :command:`ansible-doc`.
|
|
|
|
2. The shebang line is the only metadata that Ansible extracts from a module
|
|
so we may end up using that to specify what we mean. Something like
|
|
``#!/usr/bin/python`` means the module will run on both Python-2 and
|
|
Python-3, ``#!/usr/bin/python2`` means the module will only run on
|
|
Python-2, and ``#!/usr/bin/python3`` means the module will only run on
|
|
Python-3. Ansible's code will need to be modified to accommodate this.
|
|
For :command:`python2`, if ``ansible_python2_interpreter`` is not set, it
|
|
will have to fallback to `` ansible_python_interpreter`` and if that's not
|
|
set, fallback to ``/usr/bin/python``. For :command:`python3`, Ansible
|
|
will have to first try ``ansible_python3_interpreter`` and then fallback to
|
|
``/usr/bin/python3`` as normal.
|
|
|
|
3. We add a way for Ansible to retrieve metadata about modules. The metadata
|
|
will include the version of Python that is required.
|
|
|
|
Methods 2 and 3 will both require that we modify modules or otherwise add this
|
|
additional information somewhere. 2 needs only a little code changes in
|
|
executor/module_common.py to parse. 3 will require a lot of work. This is
|
|
probably not worthwhile if this is the only change but could be worthwhile if
|
|
there's other things as well. 1 requires that we port all modules to work
|
|
with python3 syntax but only the code path to get to the library import being
|
|
attempted and then a fail_json() being called because the libraries are
|
|
unavailable needs to actually work.
|
|
|
|
.. note:: Metadata proposal in progress
|
|
|
|
A metadata specification is being created to address module
|
|
maintainership. In the future we will likely extend this to record that a module
|
|
works with Python2 and 3, Python2 only, or Python3 only.
|
|
|
|
Tips, tricks, and idioms to adopt
|
|
=================================
|
|
|
|
Exceptions
|
|
----------
|
|
|
|
In code which already needs Python-2.6+ (For instance, because a library it
|
|
depends on only runs on Python >= 2.6) it is okay to port directly to the new
|
|
exception catching syntax::
|
|
|
|
try:
|
|
a = 2/0
|
|
except ValueError as e:
|
|
module.fail_json(msg="Tried to divide by zero!")
|
|
|
|
For modules which also run on Python-2.4, we have to use an uglier
|
|
construction to make this work under both Python-2.4 and Python-3::
|
|
|
|
from ansible.module_utils.pycompat24 import get_exception
|
|
[...]
|
|
|
|
try:
|
|
a = 2/0
|
|
except ValueError:
|
|
e = get_exception()
|
|
module.fail_json(msg="Tried to divide by zero!")
|
|
|
|
Octal numbers
|
|
-------------
|
|
|
|
In Python-2.4, octal literals are specified as ``0755``. In Python-3, that is
|
|
invalid and octals must be specified as ``0o755``. To bridge this gap,
|
|
modules should create their octals like this::
|
|
|
|
# Can't use 0755 on Python-3 and can't use 0o755 on Python-2.4
|
|
EXECUTABLE_PERMS = int('0755', 8)
|
|
|
|
Outputting octal numbers may also need to be changed. In python2 we often did
|
|
this to return file permissions::
|
|
|
|
mode = int('0775', 8)
|
|
result['mode'] = oct(mode)
|
|
|
|
This would give the user ``result['mode'] == '0755'`` in their playbook. In
|
|
python3, :func:`oct` returns the format with the lowercase ``o`` in it like:
|
|
``result['mode'] == '0o755'``. If a user had a conditional in their playbook
|
|
or was using the mode in a template the new format might break things. We
|
|
need to return the old form of mode for backwards compatibility. You can do
|
|
it like this::
|
|
|
|
mode = int('0775', 8)
|
|
result['mode'] = '0%03o' % mode
|
|
|
|
You should use this wherever backwards compatibility is a concern or you are
|
|
dealing with file permissions. (With file permissions a user may be feeding
|
|
the mode into another program or to another module which doesn't understand
|
|
the python syntax for octal numbers. ``[zero][digit][digit][digit]`` is
|
|
understood by most everything and therefore the right way to express octals in
|
|
these circumstances.
|
|
|
|
Bundled six
|
|
-----------
|
|
|
|
The third-party python-six library exists to help projects create code that
|
|
runs on both Python-2 and Python-3. Ansible includes version 1.4.1 in
|
|
module_utils so that other modules can use it without requiring that it is
|
|
installed on the remote system. To make use of it, import it like this::
|
|
|
|
from ansible.module_utils import six
|
|
|
|
.. note:: Why version 1.4.1?
|
|
|
|
six-1.4.1 is the last version of python-six to support Python-2.4. As
|
|
long as Ansible modules need to run on Python-2.4 we won't be able to
|
|
update the bundled copy of six.
|
|
|
|
Compile Test
|
|
------------
|
|
|
|
We have travis compiling all modules with various versions of Python to check
|
|
that the modules conform to the syntax at those versions. When you've
|
|
ported a module so that its syntax works with Python-3, we need to modify
|
|
.travis.yml so that the module is included in the syntax check. Here's the
|
|
relevant section of .travis.yml::
|
|
|
|
env:
|
|
global:
|
|
- PY3_EXCLUDE_LIST="cloud/amazon/cloudformation.py
|
|
cloud/amazon/ec2_ami.py
|
|
[...]
|
|
utilities/logic/wait_for.py"
|
|
|
|
The :envvar:`PY3_EXCLUDE_LIST` environment variable is a blacklist of modules
|
|
which should not be tested (because we know that they are older modules which
|
|
have not yet been ported to pass the Python-3 syntax checks. To get another
|
|
old module to compile with Python-3, remove the entry for it from the list.
|
|
The goal is to have the LIST be empty.
|
|
|
|
String Model
|
|
------------
|
|
|
|
One of the big differences between Python2 and Python3 is the string model.
|
|
In Python2, most APIs take byte strings (the Python2 ``str`` type). Using the
|
|
text type (in Python2, this is the ``unicode`` type) often leads to tracebacks
|
|
because the strings need to be converted to bytes and Python fails to do that
|
|
correctly. In Python3, the situation is somewhat reversed. Most APIs take
|
|
text strings (this is **Python3's** ``str`` type). When you have byte strings
|
|
(the Python3 ``bytes`` type) you sometimes get errors when attempting to
|
|
combine those with text strings. Note, however, that under the hood, Python
|
|
still has to convert text to bytes to interface operating system libraries and
|
|
system calls. This means that you can still get tracebacks when passing
|
|
text to APIs which call those OS level facilities.
|
|
|
|
For module_utils, code we've decided to make the environment work with "native
|
|
strings". This means that on Python2, things should work if you use the byte
|
|
string type. In Python3, code should work if you give it text strings. The
|
|
reason for this is so that third party modules written for Python2 don't start
|
|
issuing UnicodeError exceptions once we've ported module_utils to work under
|
|
Python3. We'll need to gather experience to see if this is going to work out
|
|
well for modules as well or if we should give the module_utils API explicit
|
|
switches so that modules can choose to operate with text type all of the time.
|
|
|
|
Helpers
|
|
~~~~~~~
|
|
|
|
For converting between bytes, text, and native strings we have three helper
|
|
functions. These are :func:`ansible.module_utils._text.to_bytes`,
|
|
:func:`ansible.module_utils._text.to_native`, and
|
|
:func:`ansible.module_utils._text.to_text`. These are similar to using
|
|
``bytes.decode()`` and ``unicode.encode()`` with a few differences.
|
|
|
|
* By default they try very hard not to traceback.
|
|
* The default encoding is "utf-8"
|
|
* There are two error strategies that don't correspond one-to-one with
|
|
a python codec error handler. These are ``surrogate_or_strict`` and
|
|
``surrogate_or_replace``. ``surrogate_or_strict`` will use the ``surrogateescape``
|
|
error handler if available (mostly on python3) or strict if not. It is most
|
|
appropriate to use when dealing with something that needs to round trip its
|
|
value like file paths database keys, etc. Without ``surrogateescape`` the best
|
|
thing these values can do is generate a traceback that our code can catch
|
|
and decide how to show an error message. ``surrogate_or_replace`` is for
|
|
when a value is going to be displayed to the user. If the
|
|
``surrogateescape`` error handler is not present, it will replace
|
|
undecodable byte sequences with a replacement character.
|
|
|
|
================================
|
|
Porting Core Ansible to Python 3
|
|
================================
|
|
|
|
The Ansible code which runs controller-side is easier to port to Python3 in
|
|
one important way: We do not have to support Python-2.4 on the controller.
|
|
We only have to support Python-2.6 and above. However, this doesn't eliminate
|
|
the work that has to be done. The controller is a much more complicated piece
|
|
of code than any individual module. Making it Python2 and Python3 compatible
|
|
is a much more complex task.
|
|
|
|
String Model
|
|
------------
|
|
|
|
By and large, the controller uses the standard best practice of storing
|
|
everything internally as text type and converting to and from bytes at the
|
|
borders. In many places we hardcode these byte values as utf-8. Thus yaml
|
|
and inventory files are encoded in utf-8. Filenames are also utf-8. This may
|
|
not be the right answer forever but it is sufficient for now. If there's
|
|
demand from users to handle encodings other than utf-8 after the code works on
|
|
Python3 we can look into what strategy to take for supporting other encodings.
|
|
|
|
In some cases, storing values as a byte string is not necessarily a choice
|
|
without drawbacks. For instance, filenames and environment variables on POSIX
|
|
systems are a sequence of bytes. By using text to represent filenames we
|
|
prevent filenames that are undecodable in utf-8 and filenames that are not
|
|
text at all from working. We made the choice to represent these as text for
|
|
now due to code paths that handle filenames not being able to handle bytes
|
|
end-to-end. PyYAML on Python3 and jinja2 on both Python2 and Python3, for
|
|
instance, are meant to work with text. Any decision to allow filenames to be
|
|
byte values will have to address how we deal with those pieves of the code as
|
|
well.
|