Writing Packages in Python




A package is basically a collection of Python modules. Packages are a way of structuring both modules and other packages into a well-organized hierarchy, which makes directories and modules easy to access. This article focuses on the process of writing and releasing Python packages. We will see how to reduce the time spent setting everything up before the real work starts. Along the way, we will also explore a standardized way to write packages that eases the use of a test-driven development approach.

Technical Requirements:

Before delving into the actual process, let us first download the code files that we will be using in this article. They can be downloaded from https://github.com/PacktPublishing/Expert-Python-Programming-Third-Edition/tree/master/chapter7.

Python packages mentioned in this article can be downloaded from PyPI, and are as follows:

  • twine
  • wheel
  • cx_Freeze
  • py2exe
  • pyinstaller

You can install these packages with pip, for example by running pip install twine wheel cx_Freeze py2exe pyinstaller.

Creating a Package:

Python packaging can be a bit overwhelming at first. The main reason is the confusion about which tools are the proper ones for creating Python packages. But once you have created your first package, you will find it is not as hard as it looks. Also, knowing proper, state-of-the-art packaging tools helps a lot.

You should know how to create packages even if you are not interested in distributing your code as open source. Knowing how to make your own packages will give you more insight into the packaging ecosystem and will help you to work with the third-party code available on PyPI that you are probably already using.

Also, having your closed source project or its components available as source distribution packages can help in deploying code in different environments. Here, we will be focusing on proper tools and techniques to create such distributions.

The Confusing State of Python Packaging Tools:

The state of Python packaging was very confusing for a long time. Everything started with the Distutils package introduced in 1998, which was later enhanced by Setuptools in 2003. These two projects started a long and knotted story of forks, alternative projects, and complete rewrites that tried to (once and for all) fix the Python packaging ecosystem. Unfortunately, most of these attempts never succeeded. The effect was quite the opposite. Each new project that aimed to supersede setuptools or distutils only added to the already huge confusion around packaging tools. Some of these forks were merged back into their ancestors (such as distribute, which was a fork of setuptools) but some were left abandoned (such as distutils2).

Fortunately, this state is gradually changing. An organization called the Python Packaging Authority (PyPA) was formed to bring order and organization back to the packaging ecosystem. The Python Packaging User Guide, maintained by PyPA, is the authoritative source of information about the latest packaging tools and best practices. This guide also contains a detailed history of changes and new projects related to packaging. It is therefore worth reading, even if you already know a bit about packaging, to make sure you still use the proper tools.

Let’s take a look at the effect of PyPA on Python packaging.

The Current Landscape of Python Packaging

PyPA, besides providing an authoritative guide for packaging, also maintains packaging projects and a standardization process for new official aspects of Python packaging. All of PyPA’s projects can be found under a single organization on GitHub: https://github.com/pypa.

The following are the most notable ones:

  • pip
  • virtualenv
  • twine
  • warehouse

Note that most of them were started outside of this organization and were moved under PyPA patronage when they became mature and widespread solutions.

Thanks to PyPA engagement, the progressive abandonment of the eggs format in favour of wheels for built distributions has already happened. Also thanks to the commitment of the PyPA community, the old PyPI implementation was finally completely rewritten in the form of the Warehouse project. PyPI now has a modernized user interface and many long-awaited usability improvements and features.

Tool Recommendations

The Python Packaging User Guide gives a few suggestions on recommended tools for working with packages. They can be generally divided into the following two groups:

  • Tools for installing packages
  • Tools for package creation and distribution

The tools recommended by PyPA for installing packages are as follows:

  • Use pip for installing packages from PyPI.
  • Use virtualenv or venv for application-level isolation of the Python runtime environment.

The Python Packaging User Guide recommendations of tools for package creation and distribution are as follows:

  • Use setuptools to define projects and create source distributions.
  • Use wheels in favour of eggs to create built distributions.
  • Use twine to upload package distributions to PyPI.

Project Configuration

The easiest way to organize the code of big applications is to split them into several packages. This makes the code simpler, easier to understand, maintain, and change. It also maximizes the reusability of your code. Separate packages act as components that can be used in various programs.

setup.py

The root directory of a package that is to be distributed contains a setup.py script. It defines all metadata as described in the distutils module. Package metadata is expressed as arguments in a call to the standard setup() function. Although distutils was the standard library module provided for packaging code (it was deprecated in Python 3.10 and removed in Python 3.12), it is recommended to use setuptools instead. The setuptools package provides several enhancements over the standard distutils module.

Therefore, the minimum content for this file is an import of the setup() function from setuptools followed by a call that passes at least the package name, such as setup(name='mypackage') for a hypothetical package called mypackage.

The name argument gives the full name of the package. From there, the script provides several commands that can be listed with the --help-commands option, that is, by running python3 setup.py --help-commands.

The list of commands that this prints can be long and varies depending on the available setuptools extensions. Standard commands are the built-in commands provided by distutils, whereas extra commands are the ones provided by third-party packages, such as setuptools or any other package that defines and registers a new command. One such extra command registered by another package is bdist_wheel, provided by the wheel package.

setup.cfg

The setup.cfg file contains default options for commands of the setup.py script. This is very useful if the process for building and distributing the package is more complex and requires many optional arguments to be passed to the setup.py script commands. The setup.cfg file allows you to store such default parameters together with your source code on a per-project basis. This keeps your distribution flow the same no matter who runs it, and makes it transparent to users and other team members how your package was built and distributed.

The syntax for the setup.cfg file is the same as provided by the built-in configparser module, so it is similar to the popular Microsoft Windows INI files. Here is an example of the setup.cfg configuration file that provides defaults for the global, sdist, and bdist_wheel commands:
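A minimal sketch of such a file follows. The option values are an assumption reconstructed from the behaviour described in this section, and the content is embedded in a Python string and parsed with configparser only to demonstrate that it is plain INI syntax. In a real project, the text between the triple quotes would be saved as setup.cfg next to setup.py.

```python
import configparser

# Assumed setup.cfg content: quiet output for every command, sdist in
# two formats, and universal wheels.
SETUP_CFG = """\
[global]
quiet = 1

[sdist]
formats = zip,tar

[bdist_wheel]
universal = 1
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

# Each section name corresponds to one setup.py command.
for section in parser.sections():
    print(section, dict(parser[section]))
```

Each section holds the defaults for one command, and each key corresponds to the long name of a command-line option of that command.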

This example configuration will ensure that source distributions (sdist section) will always be created in two formats (ZIP and TAR) and the built wheel distributions (bdist_wheel section) will be created as universal wheels that are independent from the Python version. Also, most of the output will be suppressed on every command by the global --quiet switch. Note that this option is included here only for demonstration purposes and it may not be a reasonable choice to suppress the output for every command by default.

MANIFEST.in

When building a distribution with the sdist command, the distutils module browses the package directory looking for files to include in the archive. By default distutils will include the following:

  • All Python source files implied by the py_modules, packages, and scripts arguments
  • All C source files listed in the ext_modules argument
  • Files that match the glob pattern test/test*.py
  • Files named README, README.txt, setup.py, and setup.cfg

Besides that, if your package is versioned with a version control system such as Subversion, Mercurial, or Git, it is possible to auto-include all version-controlled files using additional setuptools extensions such as setuptools-svn, setuptools-hg, and setuptools-git. Integration with other version control systems is also possible through other custom extensions. Whether it uses the default built-in collection strategy or one defined by a custom extension, sdist will create a MANIFEST file that lists all files and will include them in the final archive.

Let’s say you are not using any extra extensions, and you need to include in your package distribution some files that are not captured by default. You can define a template called MANIFEST.in in your package root directory (the same directory as the setup.py file). This template tells the sdist command which files to include.

This MANIFEST.in template defines one inclusion or exclusion rule per line:
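As a hedged illustration, the sketch below holds a hypothetical MANIFEST.in template in a Python string and splits it into rules. The file and directory names are placeholders, and the small parser is not how sdist processes the template, only a way to show the one-rule-per-line layout. The commands used (include, recursive-include, global-exclude, and prune) are standard template commands.

```python
# Hypothetical MANIFEST.in content; file and directory names are placeholders.
MANIFEST_IN = """\
include README.rst
include LICENSE
recursive-include docs *.rst *.txt
global-exclude *.pyc
prune build
"""


def parse_rules(text):
    """Split a MANIFEST.in template into (command, arguments) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        command, *arguments = line.split()
        rules.append((command, arguments))
    return rules


for command, arguments in parse_rules(MANIFEST_IN):
    print(f"{command:18} {' '.join(arguments)}")
```

In a real project, the text between the triple quotes would be saved as MANIFEST.in in the same directory as setup.py.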

The full list of MANIFEST.in commands can be found in the official distutils documentation.

Most Important Metadata

Besides the name and the version of the package being distributed, the most important arguments that the setup() function can receive are as follows:

  • description: This includes a few sentences to describe the package.
  • long_description: This includes a full description that can be in reStructuredText (default) or other supported mark-up languages.
  • long_description_content_type: This defines the MIME type of the long description; it is used to tell the package repository what kind of mark-up language is used for the package description.
  • keywords: This is a list of keywords that define the package and allow for better indexing in the package repository.
  • author: This is the name of the package author or organization that takes care of it.
  • author_email: This is the contact email address.
  • url: This is the URL of the project.
  • license: This is the name of the license (GPL, LGPL, and so on) under which the package is distributed.
  • packages: This is a list of all package names in the package distribution; setuptools provides a small function called find_packages that can automatically find package names to include.
  • namespace_packages: This is a list of namespace packages within package distribution.

Trove Classifiers

PyPI and distutils provide a solution for categorizing applications with the set of classifiers called trove classifiers. All trove classifiers form a tree-like structure. Each classifier string defines a list of nested namespaces where every namespace is separated by the :: substring. Their list is provided to the package definition as a classifiers argument of the setup() function.

Projects available on PyPI, such as solrq, declare such a list of classifiers in their setup() call.
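The following sketch shows what such a list might look like for an imaginary package. It is illustrative only and is not copied from any real project; the helper function is hypothetical and simply splits each classifier into its nested namespaces to make the tree-like structure visible.

```python
# Illustrative classifiers for an imaginary package (not taken from a real project).
CLASSIFIERS = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: BSD License",
    "Operating System :: OS Independent",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.7",
    "Topic :: Software Development :: Libraries",
]


def namespaces(classifier):
    """Split a classifier string into its nested namespaces."""
    return classifier.split(" :: ")


# The first namespace of each classifier is its major category.
categories = sorted({namespaces(c)[0] for c in CLASSIFIERS})
print(categories)

# In a real setup.py the list would be passed as setup(..., classifiers=CLASSIFIERS).
```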

Trove classifiers are completely optional in the package definition but provide a useful extension to the basic metadata available in the setup() interface. Among others, trove classifiers may provide information about supported Python versions, supported operating systems, the development stage of the project, or the license under which the code is released. Many PyPI users search and browse the available packages by categories so a proper classification helps packages to reach their target.

Trove classifiers serve an important role in the whole packaging ecosystem and should never be ignored. There is no organization that verifies package classification, so it is your responsibility to provide proper classifiers for your packages and not introduce chaos to the whole package index.

Currently, there are 667 classifiers available on PyPI that are grouped into the following nine major categories:

  • Development status
  • Environment
  • Framework
  • Intended audience
  • License
  • Natural language
  • Operating system
  • Programming language
  • Topic

This list is ever-growing, and new classifiers are added from time to time. It is thus possible that the total count will be different at the time you read this. The full list of currently available trove classifiers is published on PyPI at https://pypi.org/classifiers/.

Common patterns

Creating a package for distribution can be a tedious task for inexperienced developers. Most of the metadata that setuptools or distutils accept in their setup() function call can be provided manually, ignoring the fact that this metadata may also be available in other parts of the project. Here is an example:
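A minimal sketch of such hard-coded metadata follows. All names and values are placeholders for an imaginary project, and the call to setup() is left as a comment so that the snippet can be run on its own without setuptools being installed.

```python
# Every value below is typed in by hand, even though the version and the
# long description usually also live elsewhere in the project.
SETUP_KWARGS = dict(
    name="myproject",
    version="0.0.1",
    description="mypackage project short description",
    long_description=(
        "Longer description of mypackage project, possibly with "
        "some documentation and/or usage examples"
    ),
    install_requires=["dependency1", "dependency2", "etc"],
)

# In a real setup.py these keyword arguments would be passed to setuptools:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
print(sorted(SETUP_KWARGS))
```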

Some of the metadata elements are often found in different places in a typical Python project. For instance, the content of the long description is commonly included in the project’s README file, and it is a good convention to put a version specifier in the __init__ module of the package. Hard-coding such package metadata as setup() function arguments adds redundancy to the project, which allows for easy mistakes and inconsistencies in the future. Neither setuptools nor distutils can automatically pick up metadata information from the project sources, so you need to provide it yourself. There are some common patterns in the Python community for solving the most popular problems such as dependency management, version/readme inclusion, and so on. It is worth knowing at least a few of them because they are so popular that they could be considered packaging idioms.

Automated Inclusion of Version String from Package

The PEP 440 Version Identification and Dependency Specification document specifies a standard for version and dependency specification. It is a long document that covers accepted version specification schemes and defines how version matching and comparison in Python packaging tools should work. If you are using or plan to use a complex project version numbering scheme, then you should definitely read this document carefully.

If you are using a simple scheme that consists just of one, two, three, or more numbers separated by dots, then you don’t have to dig into the details of PEP 440. If you don’t know how to choose the proper versioning scheme, it is highly recommended to follow the semantic versioning scheme, which is described at https://semver.org/.

The other problem related to code versioning is where to include that version specifier for a package or module. There is PEP 396 (Module Version Numbers) that deals exactly with this problem. PEP 396 is only an informational document and has a deferred status, so it is not a part of the official Python standards track. Anyway, it describes what seems to be a de facto standard now.

According to PEP 396, if a package or module has a specific version defined, the version specifier should be included as a __version__ attribute of the package root __init__.py file or of the distributed module file. Another de facto standard is to also include a VERSION attribute that contains a tuple of the version specifier parts. This helps users to write compatibility code because such version tuples can be easily compared if the versioning scheme is simple enough.

Many packages available on PyPI follow both conventions. Their __init__.py files contain version attributes that look like the following:

PEP 396 also suggests that the version argument provided in the setup() call of the setup.py script should be derived from __version__, or the other way around. The Python Packaging User Guide features multiple patterns for single-sourcing the project version, and each of them has its own advantages and limitations. One pattern that is not included in PyPA’s guide is rather long, but has the advantage of limiting the complexity to the setup.py script. This boilerplate assumes that the version specifier is provided by the VERSION attribute of the package’s __init__ module and extracts this data for inclusion in the setup() call. Here is an excerpt from some imaginary package’s setup.py script that illustrates this approach:
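The sketch below shows one way this could look. The function names, the src/some_package layout, and the package name are hypothetical, and ast.literal_eval is used here as a cautious way to turn the text of the VERSION line into a tuple without importing the package, which may not be importable before it is installed. The setup() call is shown only as a comment.

```python
import ast
import os


def get_version(version_tuple):
    """Join a VERSION tuple into a version string.

    A trailing non-integer element (for example "rc1") is appended
    without a dot, so (1, 0, "rc1") becomes "1.0rc1".
    """
    if not isinstance(version_tuple[-1], int):
        return ".".join(map(str, version_tuple[:-1])) + version_tuple[-1]
    return ".".join(map(str, version_tuple))


def read_version(init_path):
    """Extract the VERSION tuple from an __init__.py without importing it."""
    with open(init_path, encoding="utf-8") as init_file:
        for line in init_file:
            if line.startswith("VERSION"):
                # literal_eval only accepts Python literals, unlike eval().
                value = line.split("=", 1)[1].strip()
                return get_version(ast.literal_eval(value))
    raise RuntimeError(f"VERSION not found in {init_path}")


# Hypothetical location of the package's __init__ module in the source tree.
INIT_PATH = os.path.join("src", "some_package", "__init__.py")

# In setup.py:
#     setup(name="some-package", version=read_version(INIT_PATH), ...)
```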

README file:

The Python Package Index can display the project’s README file or the value of long_description on the package page in the PyPI portal. PyPI is able to interpret the mark-up used in the long_description content and render it as HTML on the package page. The type of mark-up language is controlled through the long_description_content_type argument of the setup() call. For now, there are the following three choices for mark-up available:

  • Plain text with long_description_content_type='text/plain'
  • reStructuredText with long_description_content_type='text/x-rst'
  • Markdown with long_description_content_type='text/markdown'

Markdown and reStructuredText are the most popular choices among Python developers, but some might still want to use different mark-up languages for various reasons. If you want to use something different as the mark-up language for your project’s README, you can still provide it as a project description on the PyPI page in a readable form. The trick lies in using the pypandoc package to translate your other mark-up language into reStructuredText (or Markdown) while uploading the package to the Python Package Index. It is important to do it with a fallback to the plain content of your README file, so the installation won’t fail if the user does not have pypandoc installed. The following is an example of a setup.py script that is able to read the content of a README file written in the AsciiDoc mark-up language and translate it to reStructuredText before passing it as the long_description argument:
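A hedged sketch of this approach follows. The helper name read_description is hypothetical, and whether the conversion succeeds depends on pypandoc and a pandoc binary being installed and on that pandoc version being able to read the chosen source format, so both are treated as assumptions. Unlike a setup.py that always declares text/x-rst, this sketch also returns the content type so that the fallback text is labelled as plain text.

```python
def read_description(path, source_format="asciidoc"):
    """Return (text, content_type) for use as the long description.

    Tries to convert the file to reStructuredText with pypandoc and falls
    back to the raw file content when conversion is not possible.
    """
    try:
        import pypandoc  # optional dependency

        text = pypandoc.convert_file(path, to="rst", format=source_format)
        return text, "text/x-rst"
    except (ImportError, OSError, RuntimeError):
        # pypandoc or pandoc is missing, or the format is not supported.
        with open(path, encoding="utf-8") as readme:
            return readme.read(), "text/plain"


# In setup.py (README path and package name are placeholders):
#     long_description, content_type = read_description("README")
#     setup(
#         name="some-package",
#         long_description=long_description,
#         long_description_content_type=content_type,
#     )
```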

Managing Dependencies

Many projects require some external packages to be installed in order to work properly. When the list of dependencies is very long, it becomes difficult to manage. To make it easier, do not over-engineer it. Keep it simple and provide the list of dependencies explicitly in your setup.py script, as the install_requires argument of the setup() call.

Some Python developers like to use requirements.txt files for tracking lists of dependencies for their packages. In some situations, you might find a reason for doing that, but in most cases, this is a relic of times when the code of the project was not properly packaged. Still, even such notable projects as Celery stick to this convention. So if you want to stick to your habit or are somehow forced to use requirements files, then it is important to do it properly. Here is one of the popular idioms for reading the list of dependencies from the requirements.txt file:
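A minimal sketch of that idiom is shown below. The helper names are illustrative, and the parsing is deliberately naive: it only strips comments and blank lines, so it assumes the file contains plain requirement specifiers and no pip options (such as -r or -e lines) or URLs containing a # character.

```python
import os


def strip_comments(line):
    """Drop a trailing '# comment' and surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def reqs(*path_parts):
    """Read requirement specifiers from a file, skipping blanks and comments."""
    path = os.path.join(os.getcwd(), *path_parts)
    with open(path, encoding="utf-8") as requirements:
        return [spec for spec in map(strip_comments, requirements) if spec]


# In setup.py (package name is a placeholder):
#     setup(name="some-package", install_requires=reqs("requirements.txt"))
```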

The Custom Setup Command

distutils allows you to create new commands. A new command can be registered with an entry point, which was introduced by setuptools as a simple way to define packages as plugins.

An entry point is a named link to a class or a function that is made available through some APIs in setuptools. Any application can scan for all registered packages and use the linked code as a plugin.

To link the new command, the entry_points metadata can be used in the setup call as follows:
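As a sketch under stated assumptions, the metadata could look like the following. The distribution name, the command name my_command, and the target my.command.module:Class are all hypothetical, and the small parsing helper exists only to show how a named link breaks down into a name, a module, and an attribute. The setup() call is left as a comment.

```python
# Hypothetical names: "my_command" is the command name and
# "my.command.module:Class" points at the class implementing it.
SETUP_KWARGS = dict(
    name="my.command",
    entry_points={
        # Section name that is scanned for setup.py commands.
        "distutils.commands": [
            "my_command = my.command.module:Class",
        ],
    },
)


def parse_entry_point(spec):
    """Split 'name = module:attr' into its three parts."""
    name, target = (part.strip() for part in spec.split("=", 1))
    module, _, attr = target.partition(":")
    return name, module, attr


# In a real setup.py: setup(**SETUP_KWARGS)
for spec in SETUP_KWARGS["entry_points"]["distutils.commands"]:
    print(parse_entry_point(spec))
```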

All named links are gathered in named sections. When distutils is loaded, it scans for links that were registered under distutils.commands. This mechanism is used by numerous Python applications that provide extensibility.

Working with Packages during Development

Working with setuptools is mostly about building and distributing packages. However, setuptools is also used to install packages directly from project sources. The reason is to test whether the packaging code works properly before submitting the package to PyPI, and the simplest way to test it is to install it. If you send a broken package to the repository, you need to increase the version number in order to re-upload it.

Testing the package of your code properly before the final distribution saves you from unnecessary version number inflation and obviously from wasting your time. Also, installation directly from your own sources using setuptools may be essential when working on multiple related packages at the same time.

setup.py install

The install command installs the package in your current Python environment. It will try to build the package if no previous build was made and then inject the result into the filesystem directory where Python is looking for installed packages. If you have an archive with a source distribution of some package, you can decompress it in a temporary folder and then install it with this command. The install command will also install dependencies that are defined in the install_requires argument. Dependencies will be installed from the Python Package Index.

An alternative to the bare setup.py script when installing a package is to use pip. Since it is a tool that is recommended by PyPA, it should be used even when installing a package in the local environment just for development purposes. In order to install a package from local sources, run pip install <project-path>, for example pip install . from the project root directory.

Uninstalling Packages

Surprisingly, setuptools and distutils lack an uninstall command. Fortunately, it is possible to uninstall any Python package using pip by running pip uninstall <package-name>.

Uninstalling can be a dangerous operation when attempted on system-wide packages. This is another reason why it is so important to use virtual environments for any development.

setup.py develop or pip -e

Packages installed with setup.py install are copied to the site-packages directory of your current Python environment. This means that whenever any changes are made to the sources of that package, reinstalling it is required. This is often a problem during intensive development because it is very easy to forget about the need to perform the installation again. This is why setuptools provides an extra develop command that allows you to install packages in development mode. This command creates a special link to the project sources in the deployment directory (site-packages) instead of copying the whole package there. Package sources can be edited without the need for reinstallation and are available in sys.path as if they were installed normally.

pip also allows you to install packages in such a mode. This installation option is called editable mode and can be enabled with the -e parameter of the install command, that is, by running pip install -e <project-path>.

Once you install the package in your environment in editable mode, you can freely modify the installed package in place and all the changes will be immediately visible without the need to reinstall the package.

In this article, we summarized how to create a package, looked at the common patterns that Python packages share, and saw how distutils and setuptools play a central role in the packaging process. If you found this useful and wish to explore it further, ‘Expert Python Programming – Third Edition’ may be helpful. This book primarily takes you through the new features in Python 3.7. With it, you will be able to work with advanced components of Python syntax and much more. By the end, you should expect to become an expert in writing efficient and maintainable Python code.

