Writing Packages in Python
Posted on Nov 02, 2019
A package is basically a collection of Python modules. Packages are a way of structuring both multiple packages and modules, which eventually leads to a well-organized hierarchy that makes the directories and modules easy to access. This article focuses on the process of writing and releasing Python packages. Here, we will see how to decrease the time required to set everything up before starting the real work. Along with that, we will also explore how to provide a standardised way to write packages and ease the use of a test-driven development approach.
Technical Requirements: #
Before delving into the actual process, let us first download the code files that we will be using in this article. They can be downloaded from https://github.com/PacktPublishing/Expert-Python-Programming-Third-Edition/tree/master/chapter7.
The Python packages mentioned in this article can be downloaded from PyPI, and are as follows:
You can install these packages using the following command:
python3 -m pip install <package-name>
Creating a Package #
Python packaging can be a bit overwhelming at first. The main reason behind that is the confusion regarding the proper tools for creating Python packages. But once you have created your first package, you will find that it is not as hard as it looks. Also, knowing the proper, state-of-the-art packaging tools helps a lot.
You should know how to create packages even if you are not interested in distributing your code as open source. Knowing how to make your own packages will give you more insight into the packaging ecosystem and will help you to work with the third-party code available on PyPI that you are probably already using.
Also, having your closed source project or its components available as source distribution packages can help in deploying code in different environments. Here, we will be focusing on proper tools and techniques to create such distributions.
The Confusing State of Python Packaging Tools: #
The state of Python packaging was very confusing for a long time. Everything started with the Distutils package introduced in 1998, which was later enhanced by Setuptools in 2003. These two projects started a long and tangled story of forks, alternative projects, and complete rewrites that tried to fix the Python packaging ecosystem once and for all. Unfortunately, most of these attempts never succeeded, and the effect was quite the opposite: each new project that aimed to supersede setuptools or distutils only added to the already huge confusion around packaging tools. Some of these forks were merged back into their ancestors (such as distribute, which was a fork of setuptools), while others were abandoned (such as distutils2).
Fortunately, this state is gradually changing. An organization called the Python Packaging Authority (PyPA) was formed to bring order and organization back to the packaging ecosystem. The Python Packaging User Guide, maintained by PyPA, is the authoritative source of information about the latest packaging tools and best practices. This guide also contains a detailed history of changes and new projects related to packaging, so it is worth reading even if you already know a bit about packaging, to make sure you still use the proper tools.
Let's take a look at the effect of PyPA on Python packaging.
The Current Landscape of Python Packaging #
PyPA, besides providing an authoritative guide for packaging, also maintains packaging projects and a standardization process for new official aspects of Python packaging. All of PyPA's projects can be found under a single organization on GitHub: https://github.com/pypa.
The following are the most notable ones:
Note that most of them were started outside of this organization and were moved under PyPA patronage when they became mature and widespread solutions.
Thanks to PyPA engagement, the progressive abandonment of the eggs format in favour of wheels for built distributions has already happened. Also thanks to the commitment of the PyPA community, the old PyPI implementation was finally rewritten from scratch in the form of the Warehouse project. PyPI now has a modernized user interface and many long-awaited usability improvements and features.
Tool Recommendations #
The Python Packaging User Guide gives a few suggestions on recommended tools for working with packages. They can be generally divided into the following two groups:
- Tools for installing packages
- Tools for package creation and distribution
Tools recommended by PyPA for installing packages:
- `pip` for installing packages from PyPI.
- `venv` for application-level isolation of the Python runtime environment.
The Python Packaging User Guide recommends the following tools for package creation and distribution:
- `setuptools` to define projects and create source distributions.
- Use wheels in favour of eggs to create built distributions.
- `twine` to upload package distributions to PyPI.
Project Configuration #
The easiest way to organize the code of big applications is to split them into several packages. This makes the code simpler, easier to understand, maintain, and change. It also maximizes the reusability of your code. Separate packages act as components that can be used in various programs.
The root directory of a package that has to be distributed contains a `setup.py` script. It defines all metadata as described in the `distutils` module. Package metadata is expressed as arguments in a call to the standard `setup()` function. Despite `distutils` being the standard library module provided for the purpose of code packaging, it is actually recommended to use `setuptools` instead. The `setuptools` package provides several enhancements over the standard `distutils` module. Therefore, the minimum content for this file is as follows:
```python
from setuptools import setup

setup(
    name='mypackage',
)
```
`name` gives the full name of the package. From there, the script provides several commands that can be listed with the `--help-commands` option, as shown in the following code:
```
$ python3 setup.py --help-commands
Standard commands:
  build             build everything needed to install
  clean             clean up temporary files from 'build' command
  install           install everything from build directory
  sdist             create a source distribution (tarball, zip file, etc.)
  register          register the distribution with the Python package index
  bdist             create a built (binary) distribution
  check             perform some checks on the package
  upload            upload binary package to PyPI

Extra commands:
  bdist_wheel       create a wheel distribution
  alias             define a shortcut to invoke one or more commands
  develop           install package in 'development mode'

usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help
```
The actual list of commands is longer and can vary depending on the available `setuptools` extensions. It was truncated to show only those that are most important and relevant to this article. Standard commands are the built-in commands provided by `distutils`, whereas extra commands are the ones provided by third-party packages, such as `setuptools` or any other package that defines and registers a new command. Here, one such extra command registered by another package is `bdist_wheel`, provided by the `wheel` package.

The `setup.cfg` file contains default options for commands of the `setup.py` script. This is very useful if the process for building and distributing the package is more complex and requires many optional arguments to be passed to the `setup.py` script commands. The `setup.cfg` file allows you to store such default parameters together with your source code on a per-project basis. This makes your distribution flow independent of the project and also provides transparency to users and other team members about how your package was built and distributed.
The syntax of the `setup.cfg` file is the same as the one used by the built-in `configparser` module, so it is similar to the popular Microsoft Windows INI files. Here is an example of a `setup.cfg` configuration file that provides some `global`, `sdist`, and `bdist_wheel` command defaults:
```ini
[global]
quiet=1

[sdist]
formats=zip,tar

[bdist_wheel]
universal=1
```
This example configuration will ensure that source distributions (`sdist` section) will always be created in two formats (ZIP and TAR) and that built wheel distributions (`bdist_wheel` section) will be created as universal wheels that are independent of the Python version. Also, most of the output will be suppressed on every command by the global `--quiet` switch. Note that this option is included here only for demonstration purposes and it may not be a reasonable choice to suppress the output for every command by default.
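Because `setup.cfg` follows the `configparser` syntax, you can sanity-check it with the standard library. The minimal sketch below parses the example configuration above from a string (for a real project you would call `parser.read('setup.cfg')` instead); the helper name `command_defaults` is hypothetical and not part of `setuptools`.

```python
import configparser

# The example configuration from above, inlined for the demonstration
SETUP_CFG = """
[global]
quiet=1

[sdist]
formats=zip,tar

[bdist_wheel]
universal=1
"""


def command_defaults(text):
    """Return {section: {option: value}} for a setup.cfg-style string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


defaults = command_defaults(SETUP_CFG)
print(defaults["sdist"]["formats"].split(","))  # ['zip', 'tar']
```

Note that `configparser` returns every value as a string; interpreting `universal=1` as a flag is left to the command that consumes it.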
When building a distribution with the `sdist` command, the `distutils` module browses the package directory looking for files to include in the archive. By default, `distutils` will include the following:
- All Python source files implied by the `py_modules`, `packages`, and `scripts` arguments
- All C source files listed in the `ext_modules` argument
- Files that match the glob pattern `test/test*.py`
- Files named `README`, `README.txt`, `setup.py`, and `setup.cfg`
Besides that, if your package is versioned with a version control system such as Subversion, Mercurial, or Git, it is possible to auto-include all version-controlled files using additional `setuptools` extensions such as `setuptools-svn`, `setuptools-hg`, and `setuptools-git`. Integration with other version control systems is also possible through other custom extensions. Regardless of whether it is the default built-in collection strategy or one defined by a custom extension, `sdist` will create a `MANIFEST` file that lists all the files and will include them in the final archive.
Let's say you are not using any extra extensions, and you need to include in your package distribution some files that are not captured by default. You can define a template called `MANIFEST.in` in your package root directory (the same directory as the `setup.py` file). This template tells the `sdist` command which files to include. The `MANIFEST.in` template defines one inclusion or exclusion rule per line:
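```
include HISTORY.txt
include README.txt
include CHANGES.txt
include CONTRIBUTORS.txt
include LICENSE
# recursive-include expects a directory pattern first, so
# global-include is used to match these files anywhere in the tree
global-include *.txt *.py
```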
The full list of `MANIFEST.in` commands can be found in the official `distutils` documentation.
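To check which files actually ended up in a source distribution, you can open the generated archive with the standard library. This is a sketch that assumes the archive is either a ZIP file or a (possibly compressed) TAR file, as produced by the `formats=zip,tar` setting shown earlier; `sdist_contents` is a hypothetical helper name.

```python
import tarfile
import zipfile


def sdist_contents(archive_path):
    """Return the sorted member names of a ZIP or TAR source distribution."""
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as archive:
            return sorted(archive.namelist())
    # tarfile.open() detects gzip/bz2/xz compression automatically
    with tarfile.open(archive_path) as archive:
        return sorted(archive.getnames())


# Usage (hypothetical path produced by `python3 setup.py sdist`):
#   for name in sdist_contents("dist/mypackage-0.0.1.tar.gz"):
#       print(name)
```

Comparing this listing with the `MANIFEST` file is a quick way to spot a missing `MANIFEST.in` rule before uploading.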
Most Important Metadata #
Besides the name and the version of the package being distributed, the most important arguments that the `setup()` function can receive are as follows:
- `description`: This includes a few sentences to describe the package.
- `long_description`: This includes a full description that can be in reStructuredText (default) or other supported mark-up languages.
- `long_description_content_type`: This defines the MIME type of the long description; it is used to tell the package repository what kind of mark-up language is used for the package description.
- `keywords`: This is a list of keywords that define the package and allow for better indexing in the package repository.
- `author`: This is the name of the package author or organization that takes care of it.
- `author_email`: This is the contact email address.
- `url`: This is the URL of the project.
- `license`: This is the name of the license (GPL, LGPL, and so on) under which the package is distributed.
- `packages`: This is a list of all package names in the package distribution; `setuptools` provides a small function called `find_packages` that can automatically find package names to include.
- `namespace_packages`: This is a list of namespace packages within the package distribution.
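To show roughly what `find_packages` does, here is a simplified, hypothetical re-implementation (`find_packages_sketch`) built on `os.walk`. It assumes regular packages marked by an `__init__.py` file; the real `setuptools` function also supports `include`/`exclude` patterns and is what you should call in an actual `setup.py` script.

```python
import os


def find_packages_sketch(where="."):
    """Simplified, hypothetical stand-in for setuptools.find_packages().

    Returns the dotted names of directories under `where` that contain an
    __init__.py file and whose parent directories are packages as well.
    """
    packages = []
    for root, dirs, files in os.walk(where):
        # sort in place so subdirectories are visited in a stable order
        dirs.sort()
        if root == where:
            continue
        if "__init__.py" not in files:
            # not a package: do not descend into its subdirectories either
            dirs[:] = []
            continue
        rel_path = os.path.relpath(root, where)
        packages.append(rel_path.replace(os.sep, "."))
    return packages
```

In a `src` layout, for example, the real call would typically look like `packages=find_packages(where="src")` together with `package_dir={"": "src"}`.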
Trove Classifiers #
`distutils` provides a solution for categorizing applications with a set of classifiers called trove classifiers. All trove classifiers form a tree-like structure. Each classifier string defines a list of nested namespaces where every namespace is separated by the `::` substring. Their list is provided to the package definition as the `classifiers` argument of the `setup()` function.
Here is an example list of classifiers taken from the solrq project available on PyPI:
```python
from setuptools import setup

setup(
    name="solrq",
    # (...)
    classifiers=[
        'Development Status :: 4 - Beta',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: BSD License',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
        # (...)
    ],
)
```
Trove classifiers are completely optional in the package definition but provide a useful extension to the basic metadata available in the `setup()` interface. Among others, trove classifiers may provide information about supported Python versions, supported operating systems, the development stage of the project, or the license under which the code is released. Many PyPI users search and browse the available packages by categories, so proper classification helps packages reach their target audience.
Trove classifiers serve an important role in the whole packaging ecosystem and should never be ignored. There is no organization that verifies package classification, so it is your responsibility to provide proper classifiers for your packages and not introduce chaos to the whole package index.
Currently, there are 667 classifiers available on PyPI that are grouped into the following nine major categories:
- Development status
- Environment
- Framework
- Intended audience
- License
- Natural language
- Operating system
- Programming language
- Topic
This list is ever-growing, and new classifiers are added from time to time. It is thus possible that the total count will be different at the time you read this. The full list of currently available trove classifiers is published at https://pypi.org/classifiers/.
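The `::`-separated namespaces described above can be turned into an actual tree in a few lines. This sketch, built around the hypothetical helper `classifier_tree`, nests the solrq classifiers from the earlier example into dictionaries; the nested-dict representation is our own choice for illustration, not how PyPI stores classifiers.

```python
def classifier_tree(classifiers):
    """Nest trove classifier strings into dicts keyed by namespace."""
    tree = {}
    for classifier in classifiers:
        node = tree
        # each "::"-separated part is one level deeper in the tree
        for part in (p.strip() for p in classifier.split("::")):
            node = node.setdefault(part, {})
    return tree


CLASSIFIERS = [
    'Development Status :: 4 - Beta',
    'Intended Audience :: Developers',
    'License :: OSI Approved :: BSD License',
    'Operating System :: OS Independent',
    'Programming Language :: Python',
]

tree = classifier_tree(CLASSIFIERS)
print(sorted(tree))  # the top-level categories used by these classifiers
```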
Common patterns #
Creating a package for distribution can be a tedious task for inexperienced developers. Most of the metadata that `distutils` accepts in its `setup()` function call can be provided manually, ignoring the fact that this metadata may also be available in other parts of the project. Here is an example:
```python
from setuptools import setup

setup(
    name="myproject",
    version="0.0.1",
    description="mypackage project short description",
    long_description="""
        Longer description of mypackage project
        possibly with some documentation and/or
        usage examples
    """,
    install_requires=[
        'dependency1',
        'dependency2',
        'etc',
    ]
)
```
Some of the metadata elements are often found in different places in a typical Python project. For instance, the content of the long description is commonly included in the project's README file, and it is a good convention to put a version specifier in the `__init__` module of the package. Hard coding such package metadata as `setup()` function arguments adds redundancy to the project that allows for easy mistakes and inconsistencies in the future. Neither `setuptools` nor `distutils` can automatically pick metadata information from the project sources, so you need to provide it yourself. There are some common patterns among the Python community for solving the most popular problems such as dependency management, version/readme inclusion, and so on. It is worth knowing at least a few of them because they are so popular that they could be considered packaging idioms.
Automated Inclusion of Version String from Package #
PEP 440 (Version Identification and Dependency Specification) specifies a standard for version and dependency specification. It is a long document that covers accepted version specification schemes and defines how version matching and comparison in Python packaging tools should work. If you are using or plan to use a complex project version numbering scheme, then you should definitely read this document carefully.
If you are using a simple scheme that consists of just one, two, three, or more numbers separated by dots, then you don't have to dig into the details of PEP 440. If you don't know how to choose the proper versioning scheme, it is strongly recommended to follow the semantic versioning scheme described at https://semver.org/.
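As a rough illustration of the MAJOR.MINOR.PATCH rules of semantic versioning, the following sketch bumps a plain three-part version string. The function names are hypothetical, and the pre-release and build labels allowed by the full specification are deliberately not handled.

```python
def parse_semver(version):
    """Split a plain 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version, part):
    """Return the next version for a 'major', 'minor' or 'patch' release."""
    major, minor, patch = parse_semver(version)
    if part == "major":  # incompatible API changes
        return f"{major + 1}.0.0"
    if part == "minor":  # backwards-compatible new features
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # backwards-compatible bug fixes
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump("1.4.2", "minor"))  # 1.5.0
```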
The other problem related to code versioning is where to include that version specifier for a package or module. PEP 396 (Module Version Numbers) deals exactly with this problem. PEP 396 is only an informational document and has a deferred status, so it is not a part of the official Python standards track. Nevertheless, it describes what seems to be a de facto standard now.
According to PEP 396, if a package or module has a specific version defined, the version specifier should be included as a `__version__` attribute of the package root `__init__.py` file or of the distributed module file. Another de facto standard is to also include the `VERSION` attribute that contains the tuple of the version specifier parts. This helps users to write compatibility code because such version tuples can be easily compared if the versioning scheme is simple enough.
Many packages available on PyPI follow both conventions. Their `__init__.py` files contain version attributes that look like the following:
```python
# version as tuple for simple comparisons
VERSION = (0, 1, 1)
# string created from tuple to avoid inconsistency
__version__ = ".".join([str(x) for x in VERSION])
```
The other suggestion of PEP 396 is that the version argument provided in the `setup()` function of the `setup.py` script should be derived from `__version__`, or the other way around. The Python Packaging User Guide features multiple patterns for single-sourcing project versioning, and each of them has its own advantages and limitations. One such pattern, which is rather long but has the advantage of limiting the complexity to the `setup.py` script only, is not included in PyPA's guide. This boilerplate assumes that the version specifier is provided by the `VERSION` attribute of the package's `__init__` module and extracts this data for inclusion in the `setup()` call. Here is an excerpt from some imaginary package's `setup.py` script that illustrates this approach:
```python
import ast
import os

from setuptools import setup


def get_version(version_tuple):
    # additional handling of a,b,rc tags, this can
    # be simpler depending on your versioning scheme
    if not isinstance(version_tuple[-1], int):
        return '.'.join(
            map(str, version_tuple[:-1])
        ) + version_tuple[-1]
    return '.'.join(map(str, version_tuple))


# path to the package's __init__ module in the project
# source tree
init = os.path.join(
    os.path.dirname(__file__), 'src', 'some_package',
    '__init__.py'
)

with open(init) as init_file:
    version_line = [
        line for line in init_file if line.startswith('VERSION')
    ][0]

# VERSION is a tuple so we need to evaluate 'version_line'.
# We could simply import it from the package but we
# cannot be sure that this package is importable before
# installation is done. ast.literal_eval() only accepts
# Python literals, so it is safer than a bare eval().
PKG_VERSION = get_version(
    ast.literal_eval(version_line.split('=')[-1].strip())
)

setup(
    name='some-package',
    version=PKG_VERSION,
    # ...
)
```
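If `__version__` is assigned a plain string literal, a shorter alternative is to read the file and extract the value with a regular expression, similar in spirit to one of the approaches listed in the Python Packaging User Guide. This is a sketch: `read_version` and the regular expression are our own, and it would not work with the tuple-derived `__version__` shown earlier, since that module contains no string literal to match.

```python
import re

# Matches lines such as: __version__ = "1.2.3" or __version__ = '1.2.3'
VERSION_RE = re.compile(
    r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE
)


def read_version(init_path):
    """Return the __version__ string of a module without importing it."""
    with open(init_path, encoding="utf-8") as init_file:
        match = VERSION_RE.search(init_file.read())
    if match is None:
        raise RuntimeError(f"no __version__ string found in {init_path}")
    return match.group(1)


# Usage inside setup.py (the path mirrors the example above):
#   version=read_version(os.path.join("src", "some_package", "__init__.py"))
```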
README file: #
The Python Package Index can display the project's README file or the value of `long_description` on the package page in the PyPI portal. PyPI is able to interpret the mark-up used in the `long_description` content and render it as HTML on the package page. The type of mark-up language is controlled through the `long_description_content_type` argument of the `setup()` call. For now, there are the following three choices for mark-up available:
- Plain text with `long_description_content_type='text/plain'`
- reStructuredText with `long_description_content_type='text/x-rst'`
- Markdown with `long_description_content_type='text/markdown'`
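When the README format varies between projects, one option is to derive `long_description_content_type` from the file extension. The extension mapping below is a hypothetical convention (only the three MIME types themselves come from the list above), and unknown extensions fall back to plain text.

```python
import os

# Hypothetical mapping from README extensions to the content types
# accepted by long_description_content_type
CONTENT_TYPES = {
    ".md": "text/markdown",
    ".rst": "text/x-rst",
    ".txt": "text/plain",
}


def read_long_description(path):
    """Return (text, content_type) for a README file, guessed by extension."""
    extension = os.path.splitext(path)[1].lower()
    # anything unrecognized (including no extension) is treated as plain text
    content_type = CONTENT_TYPES.get(extension, "text/plain")
    with open(path, encoding="utf-8") as readme:
        return readme.read(), content_type


# Usage inside setup.py (hypothetical file name):
#   long_description, content_type = read_long_description("README.md")
#   setup(..., long_description=long_description,
#         long_description_content_type=content_type)
```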
Markdown and reStructuredText are the most popular choices among Python developers, but some might still want to use different mark-up languages for various reasons. If you want to use something different as the mark-up language for your project's README, you can still provide it as a project description on the PyPI page in a readable form. The trick lies in using the `pypandoc` package to translate your other mark-up language into reStructuredText (or Markdown) while uploading the package to the Python Package Index. It is important to do this with a fallback to the plain content of your `README` file, so the installation won't fail if the user has no `pypandoc` installed. The following is an example of a `setup.py` script that is able to read the content of a `README` file written in the AsciiDoc mark-up language and translate it to reStructuredText before passing it as `long_description`:
```python
import os

from setuptools import setup

try:
    # convert_file() replaces the deprecated pypandoc.convert() helper
    from pypandoc import convert_file

    def read_readme(file_path):
        return convert_file(file_path, to='rst', format='asciidoc')

    README_CONTENT_TYPE = 'text/x-rst'

except ImportError:
    print(
        "warning: pypandoc module not found, "
        "could not convert Asciidoc to RST"
    )

    def read_readme(file_path):
        # fall back to the raw (unconverted) file content
        with open(file_path, 'r') as f:
            return f.read()

    # unconverted AsciiDoc is not valid reStructuredText
    README_CONTENT_TYPE = 'text/plain'


README = os.path.join(os.path.dirname(__file__), 'README')

setup(
    name='some-package',
    long_description=read_readme(README),
    long_description_content_type=README_CONTENT_TYPE,
    # ...
)
```
Managing Dependencies #
Many projects require some external packages to be installed in order to work properly. When the list of dependencies is very long, it becomes difficult to manage. To make it easier, do not over-engineer it. Keep it simple and provide the list of dependencies explicitly in your `setup.py` script as follows:
```python
from setuptools import setup

setup(
    name='some-package',
    install_requires=['falcon', 'requests', 'delorean'],
    # ...
)
```
Some Python developers like to use `requirements.txt` files for tracking lists of dependencies for their packages. In some situations, you might find a reason for doing that, but in most cases, this is a relic of times when the code of that project was not properly packaged. Anyway, even such notable projects as Celery still stick to this convention. So if you want to stick to your habit or are somehow forced to use requirement files, then it is important to do it properly. Here is one of the popular idioms for reading the list of dependencies from the `requirements.txt` file:
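```python
import os

from setuptools import setup


def strip_comments(line):
    # keep only the part before the first '#', without whitespace
    return line.split('#', 1)[0].strip()


def reqs(*f):
    # resolve the path relative to setup.py rather than the current
    # working directory, which may differ during builds
    path = os.path.join(os.path.dirname(os.path.abspath(__file__)), *f)
    with open(path) as req_file:
        # filter(None, ...) drops lines that are empty after stripping
        return list(filter(None, [strip_comments(line) for line in req_file]))


setup(
    name='some-package',
    install_requires=reqs('requirements.txt'),
    # ...
)
```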
The Custom Setup Command #
`distutils` allows you to create new commands. A new command can be registered with an entry point, which was introduced by `setuptools` as a simple way to define packages as plugins.
An entry point is a named link to a class or a function that is made available through some APIs in `setuptools`. Any application can scan for all registered packages and use the linked code as a plugin.
To link the new command, the `entry_points` metadata can be used in the setup call as follows:
```python
from setuptools import setup

setup(
    name="my.command",
    entry_points={
        "distutils.commands": [
            # object references use the "module:attribute" syntax
            "my_command = my.command.module:Class",
        ],
    },
)
```
All named links are gathered in named sections. When `distutils` is loaded, it scans for links that were registered under `distutils.commands`. This mechanism is used by numerous Python applications that provide extensibility.
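Any program can perform the same kind of scan. On Python 3.8+ the standard library's `importlib.metadata` exposes registered entry points; the sketch below lists those in one group (defaulting to `distutils.commands`). The helper name `find_commands` is hypothetical, and the version check covers the change in the `entry_points()` return type between Python 3.9 and 3.10.

```python
import sys
from importlib.metadata import entry_points


def find_commands(group="distutils.commands"):
    """Return a {name: EntryPoint} mapping for one entry point group."""
    eps = entry_points()
    if sys.version_info >= (3, 10):
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a dict of group name -> entry points
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


if __name__ == "__main__":
    for name, ep in sorted(find_commands().items()):
        # ep.value is the "module:attribute" reference; ep.load() imports it
        print(f"{name:20} -> {ep.value}")
```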
Working with Packages during Development #
`setuptools` is mostly about building and distributing packages. However, `setuptools` is still used to install packages directly from project sources, because the simplest way to test whether our packaging code works properly before submitting the package to PyPI is to install it. If you send a broken package to the repository, then in order to re-upload it, you need to increase the version number.
Testing the package of your code properly before the final distribution saves you from unnecessary version number inflation and, obviously, from wasting your time. Also, installation directly from your own sources using `setuptools` may be essential when working on multiple related packages at the same time.
setup.py install #
The `install` command installs the package in your current Python environment. It will try to build the package if no previous build was made and then inject the result into the filesystem directory where Python is looking for installed packages. If you have an archive with a source distribution of some package, you can decompress it in a temporary folder and then install it with this command. The `install` command will also install dependencies that are defined in the `install_requires` argument. Dependencies will be installed from the Python Package Index.
An alternative to the bare `setup.py` script when installing a package is to use `pip`. Since it is a tool that is recommended by PyPA, it should be used even when installing a package in the local environment just for development purposes. In order to install a package from local sources, run the following command:
pip install <project-path>
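Once `pip install <project-path>` finishes, one quick way to confirm that the metadata declared in `setup()` made it into the installed distribution is `importlib.metadata` (Python 3.8+). The sketch below assumes the distribution is named `some-package`, as in the earlier examples; `describe_installed` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, requires, version


def describe_installed(dist_name):
    """Return (version, requirements) of an installed distribution, or None."""
    try:
        # requires() returns None when no dependencies were declared
        return version(dist_name), requires(dist_name) or []
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # 'some-package' is the example project name used in this article
    print(describe_installed("some-package"))
```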
Uninstalling Packages #
`distutils` lacks an `uninstall` command. Fortunately, it is possible to uninstall any Python package using `pip` as follows:
pip uninstall <package-name>
Uninstalling can be a dangerous operation when attempted on system-wide packages. This is another reason why it is so important to use virtual environments for any development.
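A script can refuse to run destructive commands outside an isolated environment. This sketch relies on the documented behaviour that inside a `venv` the value of `sys.prefix` differs from `sys.base_prefix`; the extra `real_prefix` check is an assumption meant to cover environments created by old `virtualenv` releases.

```python
import sys


def in_virtualenv():
    """Return True when running inside a virtual environment."""
    # venv points sys.prefix at the environment, while sys.base_prefix keeps
    # pointing at the base installation; old virtualenv set sys.real_prefix
    return sys.prefix != sys.base_prefix or hasattr(sys, "real_prefix")


if __name__ == "__main__":
    if not in_virtualenv():
        print("warning: not inside a virtual environment; "
              "'pip uninstall' could remove system-wide packages")
```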
setup.py develop or pip -e #
Packages installed with `setup.py install` are copied to the `site-packages` directory of your current Python environment. This means that whenever any changes are made to the sources of that package, it has to be reinstalled. This is often a problem during intensive development because it is very easy to forget about the need to perform the installation again. This is why `setuptools` provides an extra `develop` command that allows you to install packages in development mode. This command creates a special link to the project sources in the deployment directory (`site-packages`) instead of copying the whole package there. Package sources can be edited without the need for reinstallation and are available in `sys.path` as if they were installed normally.
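The special link builds on the standard `site` machinery: each line of a `.pth` file in a site directory that names an existing directory is appended to `sys.path`. Older `setuptools` releases record the project path in such a file (`easy-install.pth`) next to an `.egg-link` file. The sketch below only mimics that mechanism with temporary directories and a hypothetical `my-project-dev.pth` file; it does not install any package metadata.

```python
import os
import site
import sys
import tempfile


def simulate_develop_link(project_src, site_dir):
    """Write a .pth file pointing at project_src and activate site_dir.

    Only mimics the mechanism; the .pth file name is hypothetical.
    """
    with open(os.path.join(site_dir, "my-project-dev.pth"), "w") as pth:
        pth.write(project_src + "\n")
    # addsitedir() appends site_dir to sys.path and processes its .pth files
    site.addsitedir(site_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as sp:
        simulate_develop_link(src, sp)
        print(src in sys.path)  # the sources are now importable in place
```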
`pip` also allows you to install packages in such a mode. This installation option is called editable mode and can be enabled with the `-e` parameter of the `install` command as follows:
pip install -e <project-path>
Once you install the package in your environment in editable mode, you can freely modify the installed package in place and all the changes will be immediately visible without the need to reinstall the package.
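To confirm that Python really picks up your working copy after an editable install, you can ask the import system where a module would be loaded from. In this sketch, `import_location` and the package name `some_package` are hypothetical.

```python
import importlib.util


def import_location(module_name):
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # raised when a parent package of a dotted name is missing
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # After `pip install -e <project-path>`, this should point into the
    # project sources instead of site-packages
    print(import_location("some_package"))
```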
In this article, we summarized how to create a package, looked at the common patterns that Python packages share, and saw how distutils and setuptools play a central role in the packaging process. If you found this useful and wish to explore it further, ‘Expert Python Programming – Third Edition’ might be helpful. This book primarily takes you through the new features in Python 3.7. With it, you will be able to learn advanced components of Python syntax and much more. By the end, you should expect to become an expert in writing efficient and maintainable Python code.