When learning a programming language, the focus is essentially on understanding the syntax, the code style, and the underlying concepts. With time, you become sufficiently comfortable with the language and you start writing programs solving new exciting problems.
However, when you need to move towards this step, there is an aspect that one might have underestimated which is how to build the right environment. An environment that enforces good software engineering practices, improves productivity and facilitates collaboration. At Adaltas, we manage several open source projects and we welcome a lot of contributions. They are mostly targeting the Node.js platform. Based on this experience, we already established common practices for managing large scale projects written in Node.js.
Another language we also use a lot in our daily job as data consultant is Python. Packaging and tooling with Python is often described as cumbersome and challenging. In this regard, several open-source projects emerged in the last years and aim at facilitating the management of Python packages along your working projects. We are going to see here how to use two of them: Pyenv, to manage and install different Python versions, and Poetry, to manage your packages and virtual environments. Combined or used individually, they help you to establish a productive environment.
This article is the first one from a series of three in which we share our best practices.
- Part 1: project initialization with pyenv and poetry
- Part 2: unit testing and commit enforcement
- Part 3: CI pipeline with GitHub Actions and publication on PiPy
pyenv you require some OS-specific dependencies. These are needed as
pyenv installs Python by building from source. For Ubuntu/Debian be sure to have the following packages installed:
sudo apt-get install -y make build-essential libssl-dev zlib1g-dev \ libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev \ libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python-openssl
To know the required dependencies on your OS go read this documentation. Once the dependencies are installed you can now install
pyenv. For this, I recommend using pyenv-installer that automates the process.
curl https://pyenv.run | bash
From there on, you can install on your system any versions of Python you wish. You can use the following command to all versions and flavors of Python available:
pyenv install --list
In our case we are going to install the classical CPython in versions
pyenv install 3.7.10 Downloading Python-3.7.10.tar.xz... -> https://www.python.org/ftp/python/3.7.10/Python-3.7.10.tar.xz Installing Python-3.7.10... Installed Python-3.7.10 to /home/fbraza/.pyenv/versions/3.7.10
Once the versions are installed you can see them by running:
pyenv versions * system 3.7.10 3.8.7 3.9.2
You can see that
pyenv identified recently installed Python versions and also the one installed by default on your system. The
system means that the global version used now is the system version.
pyenv permits to manage Python versions at different levels: globally and locally. Let’s say we are going to set version
3.7.10 as our global version.
pyenv global 3.7.10
Let’s list our version again:
pyenv versions system * 3.7.10 (set by /home/<username>/.pyenv/version) 3.8.7 3.9.2
You can see that
3.7.10 as our global Python version. This will not alter the operations that require the use of the system version. The path you can read between parenthesis corresponds to the path that points to the required Python version. How does this work? Briefly,
pyenv captures Python commands using executables injected into your
PATH. Then it determines which Python version you need to use, and passes the commands to the correct Python installation. Feel free to read the complete documentation to better understand the functionalities and possibilities offered by
Don’t be confused by the semantic here. Change the global version will not affect your system version. The system version corresponds to the version used by your OS to accomplish specific tasks or run background processes that depend on this specific Python version. Do not switch the system version to another one or you may face several issues with your OS! This version is usually updated along with your OS. The global version is just the version that
pyenvwill use to execute your Python commands / programs globally.
Poetry allows you to efficiently manage dependencies and packages in Python. It has a similar role as
setup.py or pipenv, but offers more flexibility and functionalities. You can declare the libraries your project depends on in a
poetry will then install or update them on demand. Additionally this tools allows you to encapsulate your working project into isolated environments. Finally, you can use
poetry to directly publish your package on Pypi.
As a last pre-requisite we are going to install
poetry by running the following command:
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -
We are going to see how to create a project and isolate it inside a Python environment using
Let’s first create a directory named
my_awesome_project and move inside:
mkdir my_awesome_project && cd $_
Once inside, set the local Python version we are going to use (we are going to use Python 3.8.7). This will prompt
poetry to use the local version of Python defined by
pyenv local 3.8.7
This creates a
.python-version file inside our project. This file will be read by
pyenv and prompts it to set the defined local Python version. Consequently every directory or file created down this point will depend on the local Python version and not the global one.
Poetry proposes a robust CLI allowing you to create, configure and update your Python project and dependencies. To create your Python project use the following command:
poetry new <project_name>
This command generates a default project scaffold. The content of our new project is the following:
. ├── <project_name> │ └── __init__.py ├── pyproject.toml ├── README.rst └── tests ├── __init__.py └── test_summarize_dataframe.py
pyproject.toml. This is where we define everything from our project’s metadata, dependencies, scripts, and more. If you’re familiar with Node.js, consider the
pyproject.toml as an equivalent of the Node.js
[tool.poetry] name = "your_project_name" version = "0.1.0" description = "" authors = ["<username> <email address>"] [tool.poetry.dependencies] python = "^3.8" [tool.poetry.dev-dependencies] pytest = "^5.2" [build-system] requires = ["poetry-core>=1.0.0"] build-backend = "poetry.core.masonry.api"
We can see several entries in our default
[tool.poetry]: This section contains metadata about our package. You can put there the package name, a short description, author’s details, the version of your project, and so on. All details here are optional but will be required if you decided to publish the package on Pypi.
[tool.poetry.dependencies]: This section contains all required dependencies for our package. You can specify specific version numbers for these packages (
packageX = "1.0.0") or use symbols. The version of Python we want the project to use is defined here as well. In our case
python = "^3.8"specifies the minimum version required to run our app. Here this is Python 3.8 and this has been based on the version of our local version defined with
[tool.poetry.dev-dependencies]: This section contains all developer dependencies which are packages needed to work and iterate on this project. Nevertheless, these dependencies are not required to run the app and will not be downloaded when building the package.
[build-system]: Do not touch this section unless you updated the version of
you can see the full list of available entries for the
Here you have two approaches: whether you know in advance all dependencies you need and you can directly alter the
.toml file accordingly or you decide to add later on when needed. In our example, we are going to add progressively our dependencies while writing code. Consequently, we just need to initialize the project and create the virtual environment. To do this run the command:
poetry install Creating virtualenv summarize-dataframe-SO-g_7pj-py3.8 in ~/.cache/pypoetry/virtualenvs Updating dependencies Resolving dependencies... (6.4s) Writing lock file Package operations: 8 installs, 0 updates, 0 removals • Installing pyparsing (2.4.7) • Installing attrs (20.3.0) • Installing more-itertools (8.7.0) • Installing packaging (20.9) • Installing pluggy (0.13.1) • Installing py (1.10.0) • Installing wcwidth (0.2.5) • Installing pytest (5.4.3) Installing the current project: summarize_dataframe (0.1.0)
Firstly the virtual environment is created and stored outside of the project. A bit similar to what we have when using
conda. Indeed, Instead of creating a folder containing your dependency libraries (as
poetry creates an environment on a global system path (
.cache/ by default). This separation of concerns allows keeping your project away from dependency source code.
You can create your virtual environment inside your project or in any other directories. For that you need to edit the configuration of poetry. Follow this documentation for more details.
poetry is going to read the
pyproject.toml and install all dependencies specified in this file. If not defined, poetry will download the last version of the packages. At the end of the operation, a
poetry.lock file is created. It contains all packages and their exact versions. Keep in mind that if a
poetry.lock file is already present, the version numbers defined in it take precedence over what is defined in the
pyproject.toml. Finally, you should commit the
poetry.lock file to your project repository so that all collaborators working on the project use the same versions of dependencies.
Now let’s activate the environment we just created with the following command:
peotry shell Spawning shell within ~/.cache/pypoetry/virtualenvs/summarize-dataframe-SO-g_7pj-py3.8 . ~/.cache/pypoetry/virtualenvs/summarize-dataframe-SO-g_7pj-py3.8/bin/activate
The command creates a child process that inherits from the parent Shell but will not alter its environment. It encapsulates and restrict any modifications you will perform to your project environment.
For our last step here we are going to create a git repository, add
.gitignore files and push everything to our remote repository.
git init git remote add origin https://github.com/fbraza/summarize_dataframe.git echo ".*\n!.gitignore" > .gitignore echo "# Summarize dataframe" > README.md git add . git commit -m "build: first commit. Environment built" git push -u origin master
Herein we have seen how to install and manage different versions of Python on our machine using
pyenv. We demonstrated how to leverage
pyenv local to set a specific Python version in your project and then create a virtual environment using
poetry. The use of
poetry really smoothens the process of creation by proposing a simple and widely project scaffold. In addition, it includes the minimum build system requirements as defined by PEP 518.
In our next article, we are going to dive more into our project. We will write some code with their respective unit tests and see how we can use
poetry to add the expected dependencies and run the tests. Finally, we are going to go a bit further and install all necessary dependencies with
poetry to help us enforcing good practices with our
git commits when using a Python project.
Get all available and installable versions of Python
pyenv install --list
Set the global Python version
pyenv global <version_id>
Set a local Python version
pyenv local <version_id>
Create a project
poetry new <project_name>
Install core dependencies and create environment