Christian Meeßen committed
# Why version pinning is important

This repository demonstrates why version pinning matters in research software development: the same code produces different results under Python 3.1 and Python 3.2.

## The code

This is the code that we run:

```python
import random

items = [1, 2, 3, 4, 5, 6, 7]
items_reproducible = [3, 7, 5, 2, 4, 6, 1]

# Define a seed to make it reproducible
random.seed(a=1)

# Shuffle the items in place
random.shuffle(items)

# Now check the output
print("The output is              : ", items)
print("The output should have been: ", items_reproducible)
if items == items_reproducible:
    print("Success!")
else:
    print("Failure!")
```

We use Python's built-in `random` module to shuffle a list of integers.
To make the output reproducible, we set a fixed seed. This means that, after
shuffling, `items` should equal `items_reproducible`.
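
Within a single interpreter, the seed does make the shuffle repeatable. A minimal sketch of that guarantee follows; the helper name `shuffled_with_seed` is hypothetical and only used for illustration:

```python
import random


def shuffled_with_seed(values, seed):
    """Return a shuffled copy of `values` using a private, seeded generator."""
    rng = random.Random(seed)  # separate generator, leaves the global state alone
    result = list(values)
    rng.shuffle(result)
    return result


first = shuffled_with_seed([1, 2, 3, 4, 5, 6, 7], seed=1)
second = shuffled_with_seed([1, 2, 3, 4, 5, 6, 7], seed=1)

# Same interpreter, same seed: the two runs agree.
print(first == second)
```

This prints `True` on any one Python version. It says nothing about whether a different Python version yields the same order.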

## Running the example

To test the example yourself, clone the repository and run:

```shell
docker-compose build
docker-compose up
```

The output is:

```txt
$ docker-compose up
[+] Running 2/0
 ⠿ Container version-pinning-python32-1  Created                                 0.0s
 ⠿ Container version-pinning-python31-1  Created                                 0.0s
Attaching to version-pinning-python31-1, version-pinning-python32-1
version-pinning-python32-1  | Python version: 3.2.6
version-pinning-python32-1  | The output is              :  [4, 7, 6, 3, 1, 5, 2]
version-pinning-python32-1  | The output should have been:  [3, 7, 5, 2, 4, 6, 1]
version-pinning-python32-1  | Failure!
version-pinning-python31-1  | Python version: 3.1.5
version-pinning-python31-1  | The output is              :  [3, 7, 5, 2, 4, 6, 1]
version-pinning-python31-1  | The output should have been:  [3, 7, 5, 2, 4, 6, 1]
version-pinning-python31-1  | Success!
version-pinning-python31-1 exited with code 0
version-pinning-python32-1 exited with code 0
```

Here are the important parts:

```txt
Python version: 3.2.6
The output is              :  [4, 7, 6, 3, 1, 5, 2]
The output should have been:  [3, 7, 5, 2, 4, 6, 1]
Failure!

Python version: 3.1.5
The output is              :  [3, 7, 5, 2, 4, 6, 1]
The output should have been:  [3, 7, 5, 2, 4, 6, 1]
Success!
```

We see that the result of the shuffle differs between Python 3.1.5 and
Python 3.2.6, even though the code and the seed are identical.

## What happened?

The [documentation](https://docs.python.org/release/3.2.3/library/random.html?highlight=random#random.seed)
of `random.seed` for Python 3.2 states:

> **Changed in version 3.2:** Moved to the version 2 scheme which uses all of the bits in a string seed.

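Python 3.2 changed the `random` module, so the same seed no longer yields the
same shuffle. Note that the quoted change concerns string seeds, while our seed
is an integer. The "What's New in Python 3.2" notes also state that `shuffle()`
and the other integer methods no longer compute selections with
`int(n * random())`, which likely accounts for the difference seen here.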
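
To illustrate how a change in the selection step alone can alter the result, the sketch below re-implements two ways of picking the swap index in a Fisher-Yates shuffle: scaling a float with `int(random() * n)`, and rejection sampling on `getrandbits`. Both function names are hypothetical. The code is a simplified illustration of the two approaches described in the "What's New in Python 3.2" notes, not a copy of CPython's source.

```python
import random


def shuffle_by_scaling(values, seed):
    """Fisher-Yates shuffle choosing each index as int(random() * n)."""
    rng = random.Random(seed)
    result = list(values)
    for i in reversed(range(1, len(result))):
        j = int(rng.random() * (i + 1))
        result[i], result[j] = result[j], result[i]
    return result


def shuffle_by_rejection(values, seed):
    """Fisher-Yates shuffle choosing each index by rejection sampling."""
    rng = random.Random(seed)
    result = list(values)
    for i in reversed(range(1, len(result))):
        n = i + 1
        k = n.bit_length()
        j = rng.getrandbits(k)
        while j >= n:  # discard draws outside 0 <= j < n
            j = rng.getrandbits(k)
        result[i], result[j] = result[j], result[i]
    return result


items = [1, 2, 3, 4, 5, 6, 7]
print("scaling  :", shuffle_by_scaling(items, seed=1))
print("rejection:", shuffle_by_rejection(items, seed=1))
```

Both functions start from the same seed, but they consume the generator's output differently, so they generally produce different permutations. With seed 1, the scaling variant is expected to reproduce the Python 3.1 list from the log above, assuming the generator's `random()` stream is the same on your interpreter.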

## How to mitigate this?

Issues like this can be mitigated by using a package manager that pins
versions. For Python, options include:

* [Poetry](https://python-poetry.org/) (recommended)
* [Pipenv](https://pipenv.pypa.io/en/latest/)
* [Conda](https://docs.conda.io/en/latest/) (**important**: do not use the official Anaconda channels if you do not have a license)

These package managers record the required Python version along with the exact
versions of all dependencies.
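
A lock file pins what gets installed. A script can additionally refuse to run on an unexpected interpreter. The following is a minimal sketch of such a guard, not a replacement for a package manager; the function name `require_python` is hypothetical.

```python
import sys


def require_python(major, minor, version_info=None):
    """Raise RuntimeError unless the interpreter matches major.minor."""
    if version_info is None:
        version_info = sys.version_info
    found = tuple(version_info[:2])
    if found != (major, minor):
        raise RuntimeError(
            "Expected Python {}.{}, found {}.{}".format(major, minor, *found)
        )


# Trivially passes here because it checks against the running interpreter.
# For the example in this repository the call would be require_python(3, 1).
require_python(sys.version_info[0], sys.version_info[1])
```

Calling the guard at the top of a script turns a silent change in results into an immediate, explicit error.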