Writing Packages in Python
Updated on Jan 07, 2020
A package is essentially a collection of Python modules. Packages provide a way of structuring both modules and other packages into a well-organized hierarchy, which makes directories and modules easy to access. This article focuses on the process of writing and releasing Python packages. We will see how to reduce the time needed to set everything up before starting the real work. Along the way, we will also explore a standardized way to write packages and how to ease the use of a test-driven development approach.
Technical Requirements: #
Before delving into the actual process, let us first download the code file that we will be using in this article. It can be downloaded from (https://github.com/PacktPublishing/Expert-Python-Programming-Third-Edition/tree/master/chapter7).
The Python packages mentioned in this article can be downloaded from PyPI and are as follows:
- twine
- wheel
- cx_Freeze
- py2exe
- pyinstaller
You can install these packages using the following command:
python3 -m pip install <package-name>
Creating a Package #
Python packaging can be a bit overwhelming at first. The main reason for that is the confusion about which tools are the proper ones for creating Python packages. But once you have created your first package, you won't find it as hard as it looks. Also, knowing the proper, state-of-the-art packaging tools helps a lot.
You should know how to create packages even if you are not interested in distributing your code as open source. Knowing how to make your own packages will give you more insight into the packaging ecosystem and will help you to work with the third-party code available on PyPI that you are probably already using.
Also, having your closed source project or its components available as source distribution packages can help in deploying code in different environments. Here, we will be focusing on proper tools and techniques to create such distributions.
The Confusing State of Python Packaging Tools #
The state of Python packaging was very confusing for a long time. Everything started with the Distutils package introduced in 1998, which was later enhanced by Setuptools in 2003. These two projects started a long and knotted story of forks, alternative projects, and complete rewrites that tried to (once and for all) fix the Python packaging ecosystem. Unfortunately, most of these attempts never succeeded. The effect was quite the opposite. Each new project that aimed to supersede setuptools or distutils only added to the already huge confusion around packaging tools. Some of these forks were merged back into their ancestors (such as distribute, which was a fork of setuptools), but some were left abandoned (such as distutils2).
Fortunately, this state is gradually changing. An organization called the Python Packaging Authority (PyPA) was formed to bring back the order and organization to the packaging ecosystem. The Python Packaging User Guide, maintained by PyPA, is the authoritative source of information about the latest packaging tools and best practices. This guide also contains a detailed history of changes and new projects related to packaging. So it is worth reading it, even if you already know a bit about packaging, to make sure you still use the proper tools.
Let's take a look at the effect of PyPA on Python packaging.
The Current Landscape of Python Packaging #
PyPA, besides providing an authoritative guide for packaging, also maintains packaging projects and a standardization process for new official aspects of Python packaging. All of PyPA's projects can be found under a single organization on GitHub: https://github.com/pypa.
The following are the most notable ones:
- pip
- virtualenv
- twine
- warehouse
Note that most of them were started outside of this organization and were moved under PyPA patronage once they became mature and widespread solutions.
Thanks to PyPA engagement, the progressive abandonment of the eggs format in favour of wheels for built distributions has already happened. Also thanks to the commitment of the PyPA community, the old PyPI implementation was finally totally rewritten in the form of the Warehouse project. Now, PyPI has got a modernized user interface and many long-awaited usability improvements and features.
Tool Recommendations #
The Python Packaging User Guide gives a few suggestions on recommended tools for working with packages. They can be generally divided into the following two groups:
- Tools for installing packages
- Tools for package creation and distribution
Utilities recommended by PyPA:
- Use pip for installing packages from PyPI.
- Use virtualenv or venv for application-level isolation of the Python runtime environment.
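As a minimal sketch of the isolation step, the standard library's venv module can also create an environment programmatically. The directory name demo-env and the helper create_env are assumptions made for illustration only; in everyday use, running python3 -m venv <dir> from the shell does the same job.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir):
    """Create an isolated environment in env_dir (hypothetical helper)."""
    # with_pip=False keeps the sketch fast and offline; pass True to
    # bootstrap pip into the new environment as well.
    venv.create(env_dir, with_pip=False)
    return Path(env_dir)


# Demonstration in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    env_path = create_env(Path(tmp) / "demo-env")
    # Every environment created by venv contains a pyvenv.cfg marker file.
    has_marker = (env_path / "pyvenv.cfg").exists()
    print("environment created:", has_marker)
```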
The Python Packaging User Guide recommendations of tools for package creation and distribution are as follows:
- Use setuptools to define projects and create source distributions.
- Use wheels in favour of eggs to create built distributions.
- Use twine to upload package distributions to PyPI.
Project Configuration #
The easiest way to organize the code of big applications is to split them into several packages. This makes the code simpler, easier to understand, maintain, and change. It also maximizes the reusability of your code. Separate packages act as components that can be used in various programs.
setup.py #
The root directory of a package that has to be distributed contains a setup.py script. It defines all metadata as described in the distutils module. Package metadata is expressed as arguments in a call to the standard setup() function. Despite distutils being the standard library module provided for the purpose of code packaging, it is actually recommended to use setuptools instead. The setuptools package provides several enhancements over the standard distutils module.
Therefore, the minimum content for this file is as follows:
    from setuptools import setup

    setup(
        name='mypackage',
    )
name gives the full name of the package. From there, the script provides several commands that can be listed with the --help-commands option, as shown in the following code:
    $ python3 setup.py --help-commands
    Standard commands:
      build         build everything needed to install
      clean         clean up temporary files from 'build' command
      install       install everything from build directory
      sdist         create a source distribution (tarball, zip file, etc.)
      register      register the distribution with the Python package index
      bdist         create a built (binary) distribution
      check         perform some checks on the package
      upload        upload binary package to PyPI

    Extra commands:
      bdist_wheel   create a wheel distribution
      alias         define a shortcut to invoke one or more commands
      develop       install package in 'development mode'

    usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
       or: setup.py --help [cmd1 cmd2 ...]
       or: setup.py --help-commands
       or: setup.py cmd --help
The actual list of commands is longer and can vary depending on the available setuptools extensions. It was truncated to show only those that are most important and relevant to this article. Standard commands are the built-in commands provided by distutils, whereas extra commands are the ones provided by third-party packages, such as setuptools or any other package that defines and registers a new command. Here, one such extra command registered by another package is bdist_wheel, provided by the wheel package.
setup.cfg #
The setup.cfg file contains default options for commands of the setup.py script. This is very useful if the process for building and distributing the package is more complex and requires many optional arguments to be passed to the setup.py script commands. The setup.cfg file allows you to store such default parameters together with your source code on a per-project basis. This makes your distribution flow independent from the project and also provides transparency about how your package was built and distributed to users and other team members.
The syntax for the setup.cfg file is the same as provided by the built-in configparser module, so it is similar to the popular Microsoft Windows INI files. Here is an example of the setup.cfg configuration file that provides some global, sdist, and bdist_wheel commands' defaults:
    [global]
    quiet=1

    [sdist]
    formats=zip,tar

    [bdist_wheel]
    universal=1
This example configuration will ensure that source distributions (sdist section) will always be created in two formats (ZIP and TAR) and the built wheel distributions (bdist_wheel section) will be created as universal wheels that are independent from the Python version. Also, most of the output will be suppressed on every command by the global --quiet switch. Note that this option is included here only for demonstration purposes and it may not be a reasonable choice to suppress the output for every command by default.
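Because the file follows configparser syntax, the same defaults can be inspected from Python. The following is a small sketch that parses the example configuration from a string; the variable names are chosen for illustration and are not part of any packaging API.

```python
from configparser import ConfigParser

# The example setup.cfg content from above, inlined so the sketch
# does not depend on a file on disk.
SETUP_CFG = """\
[global]
quiet=1

[sdist]
formats=zip,tar

[bdist_wheel]
universal=1
"""

parser = ConfigParser()
parser.read_string(SETUP_CFG)

# Each section name corresponds to a setup.py command (or to "global").
sections = parser.sections()
sdist_formats = parser.get("sdist", "formats").split(",")
universal = parser.getboolean("bdist_wheel", "universal")

print(sections)
print(sdist_formats, universal)
```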
MANIFEST.in #
When building a distribution with the sdist command, the distutils module browses the package directory looking for files to include in the archive. By default, distutils will include the following:
- All Python source files implied by the py_modules, packages, and scripts arguments
- All C source files listed in the ext_modules argument
- Files that match the glob pattern test/test*.py
- Files named README, README.txt, setup.py, and setup.cfg
Besides that, if your package is versioned with a version control system such as Subversion, Mercurial, or Git, there is the possibility to auto-include all version-controlled files using additional setuptools extensions such as setuptools-svn, setuptools-hg, and setuptools-git. Integration with other version control systems is also possible through other custom extensions. Whether it is the default built-in collection strategy or one defined by a custom extension, sdist will create a MANIFEST file that lists all files and will include them in the final archive.
Let's say you are not using any extra extensions, and you need to include in your package distribution some files that are not captured by default. You can define a template called MANIFEST.in in your package root directory (the same directory as the setup.py file). This template tells the sdist command which files to include.
This MANIFEST.in template defines one inclusion or exclusion rule per line:
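    include HISTORY.txt
    include README.txt
    include CHANGES.txt
    include CONTRIBUTORS.txt
    include LICENSE
    # recursive-include expects a directory before the patterns;
    # "src" is an assumed source directory, adjust it to your layout
    recursive-include src *.txt *.py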
The full list of MANIFEST.in commands can be found in the official distutils documentation.
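To make the rule-per-line idea concrete, here is a deliberately simplified sketch of how include and exclude rules could be applied, in order, to a list of candidate files. It is not the distutils implementation: the helper apply_manifest_rules is hypothetical, it handles only these two commands, and it relies on fnmatch, whose wildcard semantics differ slightly from the real template matcher.

```python
from fnmatch import fnmatch


def apply_manifest_rules(rules, candidates):
    """Simplified illustration: supports only 'include' and 'exclude'."""
    selected = []
    for rule in rules:
        # The first word is the command, the rest are glob patterns.
        command, *patterns = rule.split()
        for pattern in patterns:
            matches = [p for p in candidates if fnmatch(p, pattern)]
            if command == "include":
                selected.extend(m for m in matches if m not in selected)
            elif command == "exclude":
                selected = [s for s in selected if s not in matches]
    return selected


files = ["README.txt", "LICENSE", "notes.tmp", "CHANGES.txt"]
rules = ["include *.txt LICENSE", "exclude CHANGES.txt"]
selected_files = apply_manifest_rules(rules, files)
print(selected_files)
```

Rules are applied top to bottom, so a later exclude can remove a file that an earlier include selected.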
Most Important Metadata #
Besides the name and the version of the package being distributed, the most important arguments that the setup() function can receive are as follows:
- description: This includes a few sentences to describe the package.
- long_description: This includes a full description that can be in reStructuredText (default) or other supported mark-up languages.
- long_description_content_type: This defines the MIME type of the long description; it is used to tell the package repository what kind of mark-up language is used for the package description.
- keywords: This is a list of keywords that define the package and allow for better indexing in the package repository.
- author: This is the name of the package author or organization that takes care of it.
- author_email: This is the contact email address.
- url: This is the URL of the project.
- license: This is the name of the license (GPL, LGPL, and so on) under which the package is distributed.
- packages: This is a list of all package names in the package distribution; setuptools provides a small function called find_packages that can automatically find package names to include.
- namespace_packages: This is a list of namespace packages within the package distribution.
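To give a feel for what automatic package discovery involves, the sketch below walks a directory tree and reports every directory that contains an __init__.py file as a dotted package name. It is only a rough standard-library approximation written for illustration; discover_packages is a hypothetical helper, and the real find_packages in setuptools supports more options (such as include and exclude filters).

```python
import os
import tempfile


def discover_packages(where="."):
    """Return dotted names of directories that contain __init__.py."""
    found = []
    for root, dirs, files in os.walk(where):
        if root == where:
            # The project root itself is not a package; just descend.
            continue
        if "__init__.py" not in files:
            # Not a regular package: do not descend any further.
            dirs[:] = []
            continue
        relative = os.path.relpath(root, where)
        found.append(relative.replace(os.sep, "."))
    return sorted(found)


# Build a tiny throwaway project layout to run the sketch against.
with tempfile.TemporaryDirectory() as tmp:
    for parts in (("mypackage",), ("mypackage", "utils"), ("docs",)):
        os.makedirs(os.path.join(tmp, *parts))
    for parts in (("mypackage",), ("mypackage", "utils")):
        open(os.path.join(tmp, *parts, "__init__.py"), "w").close()
    packages = discover_packages(tmp)
    print(packages)
```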
Trove Classifiers #
PyPI and distutils provide a solution for categorizing applications with a set of classifiers called trove classifiers. All trove classifiers form a tree-like structure. Each classifier string defines a list of nested namespaces where every namespace is separated by the :: substring. Their list is provided to the package definition as the classifiers argument of the setup() function.
Here is an example list of classifiers taken from solrq project available on PyPI:
    from setuptools import setup

    setup(
        name="solrq",
        # (...)
        classifiers=[
            'Development Status :: 4 - Beta',
            'Intended Audience :: Developers',
            'License :: OSI Approved :: BSD License',
            'Operating System :: OS Independent',
            'Programming Language :: Python',
            # (...)
        ],
    )
Trove classifiers are completely optional in the package definition but provide a useful extension to the basic metadata available in the setup() interface. Among others, trove classifiers may provide information about supported Python versions, supported operating systems, the development stage of the project, or the license under which the code is released. Many PyPI users search and browse the available packages by categories, so a proper classification helps packages to reach their target.
Trove classifiers serve an important role in the whole packaging ecosystem and should never be ignored. There is no organization that verifies package classification, so it is your responsibility to provide proper classifiers for your packages and not introduce chaos to the whole package index.
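The nested-namespace structure of classifier strings is easy to see once they are split on the :: separator. The helpers below are hypothetical names used purely for illustration; they show one way to break a classifier into its parts and to group a list of classifiers by top-level category.

```python
def split_classifier(classifier):
    """Split a trove classifier into its nested namespace parts."""
    return [part.strip() for part in classifier.split("::")]


def group_by_category(classifiers):
    """Group classifiers by their top-level category."""
    grouped = {}
    for classifier in classifiers:
        category, *rest = split_classifier(classifier)
        grouped.setdefault(category, []).append(" :: ".join(rest))
    return grouped


classifiers = [
    "Development Status :: 4 - Beta",
    "License :: OSI Approved :: BSD License",
    "Programming Language :: Python",
]
grouped = group_by_category(classifiers)
print(split_classifier(classifiers[1]))
print(grouped["License"])
```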
Currently, there are 667 classifiers available on PyPI that are grouped into the following nine major categories:
- Development status
- Environment
- Framework
- Intended audience
- License
- Natural language
- Operating system
- Programming language
- Topic
This list is ever-growing, and new classifiers are added from time to time. It is thus possible that the total count will be different by the time you read this. The full list of currently available trove classifiers is available at https://pypi.org/classifiers/.
Common Patterns #
Creating a package for distribution can be a tedious task for inexperienced developers. Most of the metadata that setuptools or distutils accept in their setup() function call can be provided manually, ignoring the fact that this metadata may also be available in other parts of the project. Here is an example:
    from setuptools import setup

    setup(
        name="myproject",
        version="0.0.1",
        description="mypackage project short description",
        long_description="""
            Longer description of mypackage project
            possibly with some documentation and/or
            usage examples
        """,
        install_requires=[
            'dependency1',
            'dependency2',
            'etc',
        ]
    )
Some of the metadata elements are often found in different places in a typical Python project. For instance, the content of the long description is commonly included in the project's README file, and it is a good convention to put a version specifier in the __init__ module of the package. Hard-coding such package metadata as setup() function arguments adds redundancy to the project, which allows for easy mistakes and inconsistencies in the future. Neither setuptools nor distutils can automatically pick up metadata information from the project sources, so you need to provide it yourself. There are some common patterns in the Python community for solving the most popular problems, such as dependency management and version/readme inclusion. It is worth knowing at least a few of them because they are so popular that they could be considered packaging idioms.
Automated Inclusion of Version String from Package #
The PEP 440 Version Identification and Dependency Specification document specifies a standard for version and dependency specification. It is a long document that covers accepted version specification schemes and defines how version matching and comparison in Python packaging tools should work. If you are using or plan to use a complex project version numbering scheme, then you should definitely read this document carefully.
If you are using a simple scheme that consists of just one, two, three, or more numbers separated by dots, then you don't have to dig into the details of PEP 440. If you don't know how to choose a proper versioning scheme, it is highly recommended to follow the semantic versioning scheme, which is described at https://semver.org/.
The other problem related to code versioning is where to include that version specifier for a package or module. There is PEP 396 (Module Version Numbers) that deals exactly with this problem. PEP 396 is only an informational document and has a deferred status, so it is not a part of the official Python standards track. Anyway, it describes what seems to be a de facto standard now.
According to PEP 396, if a package or module has a specific version defined, the version specifier should be included as a __version__ attribute of the package root __init__.py file or the distributed module file. Another de facto standard is to also include a VERSION attribute that contains the tuple of the version specifier parts. This helps users to write compatibility code because such version tuples can be easily compared if the versioning scheme is simple enough.
Many packages available on PyPI follow both conventions. Their __init__.py files contain version attributes that look like the following:
    # version as tuple for simple comparisons
    VERSION = (0, 1, 1)
    # string created from tuple to avoid inconsistency
    __version__ = ".".join([str(x) for x in VERSION])
The other suggestion of PEP 396 is that the version argument provided in the setup() function of the setup.py script should be derived from __version__ or the other way around. The Python Packaging User Guide features multiple patterns for single-sourcing project versioning, and each of them has its own advantages and limitations. One such pattern, which is rather long but has the advantage of limiting the complexity to the setup.py script only, is not included in PyPA’s guide. This boilerplate assumes that the version specifier is provided by the VERSION attribute of the package's __init__ module and extracts this data for inclusion in the setup() call. Here is an excerpt from some imaginary package's setup.py script that illustrates this approach:
    import ast
    import os

    from setuptools import setup


    def get_version(version_tuple):
        # additional handling of a,b,rc tags, this can
        # be simpler depending on your versioning scheme
        if not isinstance(version_tuple[-1], int):
            return '.'.join(
                map(str, version_tuple[:-1])
            ) + version_tuple[-1]
        return '.'.join(map(str, version_tuple))


    # path to the package's __init__ module in project
    # source tree
    init = os.path.join(
        os.path.dirname(__file__), 'src', 'some_package',
        '__init__.py'
    )

    # read the file inside a context manager so it is closed promptly
    with open(init) as init_file:
        version_line = [
            line for line in init_file if line.startswith('VERSION')
        ][0]

    # VERSION is a tuple, so the right-hand side of 'version_line'
    # has to be evaluated. We could simply import it from the package,
    # but we cannot be sure that this package is importable before
    # installation is done. ast.literal_eval only accepts literals,
    # which makes it a safer choice than a bare eval().
    PKG_VERSION = get_version(
        ast.literal_eval(version_line.split('=')[-1].strip())
    )

    setup(
        name='some-package',
        version=PKG_VERSION,
        # ...
    )
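A shorter variation on the same idea, for projects that only keep a __version__ string, is to pull the value out with a regular expression instead of evaluating a tuple. The sketch below is one possible take under that assumption; read_version and VERSION_RE are hypothetical names, and the temporary file merely stands in for a real package's __init__.py.

```python
import re
import tempfile
from pathlib import Path

# Matches lines such as: __version__ = "0.1.1"
VERSION_RE = re.compile(
    r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", re.MULTILINE
)


def read_version(init_path):
    """Extract __version__ from a file without importing the package."""
    match = VERSION_RE.search(Path(init_path).read_text(encoding="utf-8"))
    if match is None:
        raise RuntimeError(f"no __version__ string found in {init_path}")
    return match.group(1)


# Stand-in for src/some_package/__init__.py in a real project.
with tempfile.TemporaryDirectory() as tmp:
    init_file = Path(tmp) / "__init__.py"
    init_file.write_text('__version__ = "0.1.1"\n', encoding="utf-8")
    found_version = read_version(init_file)
    print(found_version)
```

In a setup.py script, the returned string could then be passed as the version argument of setup().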
README File #
The Python Package Index can display the project's README file or the value of long_description on the package page in the PyPI portal. PyPI is able to interpret the mark-up used in the long_description content and render it as HTML on the package page. The type of mark-up language is controlled through the long_description_content_type argument of the setup() call. For now, there are the following three choices for mark-up available:
- Plain text with long_description_content_type='text/plain'
- reStructuredText with long_description_content_type='text/x-rst'
- Markdown with long_description_content_type='text/markdown'
Markdown and reStructuredText are the most popular choices among Python developers, but some might still want to use different mark-up languages for various reasons. If you want to use something different as the mark-up language for your project's README, you can still provide it as a project description on the PyPI page in a readable form. The trick lies in using the pypandoc package to translate your other mark-up language into reStructuredText (or Markdown) while uploading the package to the Python Package Index. It is important to do it with a fallback to the plain content of your README file, so the installation won't fail if the user has no pypandoc installed. The following is an example of a setup.py script that is able to read the content of the README file written in the AsciiDoc mark-up language and translate it to reStructuredText before including it as the long_description argument:
    import os

    from setuptools import setup

    try:
        # recent pypandoc releases expose convert_file();
        # older releases offered a generic convert() helper instead
        from pypandoc import convert_file

        def read_md(file_path):
            return convert_file(file_path, to='rst', format='asciidoc')

    except ImportError:
        convert_file = None
        print(
            "warning: pypandoc module not found, "
            "could not convert Asciidoc to RST"
        )

        def read_md(file_path):
            # fallback: use the raw README content unchanged
            with open(file_path, 'r') as f:
                return f.read()

    README = os.path.join(os.path.dirname(__file__), 'README')

    setup(
        name='some-package',
        long_description=read_md(README),
        long_description_content_type='text/x-rst',
        # ...
    )
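For the more common case where the README is already written in one of the three supported formats, no conversion is needed; only the content type has to match the file. The sketch below picks the MIME type from the file extension. The describe_readme helper and the extension mapping are assumptions made for illustration, not part of setuptools.

```python
import tempfile
from pathlib import Path

# Assumed mapping from README extension to the content types listed above.
CONTENT_TYPES = {
    ".md": "text/markdown",
    ".rst": "text/x-rst",
    ".txt": "text/plain",
}


def describe_readme(readme_path):
    """Return setup() keyword arguments describing the README file."""
    path = Path(readme_path)
    # Unknown or missing extensions fall back to plain text.
    content_type = CONTENT_TYPES.get(path.suffix.lower(), "text/plain")
    return {
        "long_description": path.read_text(encoding="utf-8"),
        "long_description_content_type": content_type,
    }


# Throwaway README used only to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    readme.write_text("# some-package\n", encoding="utf-8")
    readme_kwargs = describe_readme(readme)
    print(readme_kwargs["long_description_content_type"])
```

The resulting dictionary could be unpacked into the setup() call, for example as setup(name='some-package', **describe_readme('README.md')).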
Managing Dependencies #
Many projects require some external packages to be installed in order to work properly. When the list of dependencies is very long, it becomes difficult to manage. To make it easier, do not over-engineer it. Keep it simple and provide the list of dependencies explicitly in your setup.py script as follows:
    from setuptools import setup

    setup(
        name='some-package',
        install_requires=['falcon', 'requests', 'delorean'],
        # ...
    )
Some Python developers like to use requirements.txt files for tracking lists of dependencies for their packages. In some situations, you might find a reason for doing that, but in most cases, this is a relic of times when the code of that project was not properly packaged. Anyway, even such notable projects as Celery still stick to this convention. So if you want to stick to your habit or are somehow forced to use requirement files, then it is important to do it properly. Here is one of the popular idioms for reading the list of dependencies from the requirements.txt file:
    import os

    from setuptools import setup


    def strip_comments(l):
        # drop everything after '#' and surrounding whitespace
        return l.split('#', 1)[0].strip()


    def reqs(*f):
        # use a context manager so the requirements file is closed
        with open(os.path.join(os.getcwd(), *f)) as req_file:
            return list(filter(None, [strip_comments(l) for l in req_file]))


    setup(
        name='some-package',
        install_requires=reqs('requirements.txt'),
        # ...
    )
The Custom Setup Command #
distutils allows you to create new commands. A new command can be registered with an entry point, which was introduced by setuptools as a simple way to define packages as plugins.
An entry point is a named link to a class or a function that is made available through some APIs in setuptools. Any application can scan for all registered packages and use the linked code as a plugin.
To link the new command, the entry_points metadata can be used in the setup call as follows:
    from setuptools import setup

    setup(
        name="my.command",
        # entry point syntax is: name = package.module:object
        entry_points="""
            [distutils.commands]
            my_command = my.command.module:Class
        """
    )
All named links are gathered in named sections. When distutils is loaded, it scans for links that were registered under distutils.commands. This mechanism is used by numerous Python applications that provide extensibility.
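The scanning side of this mechanism can be sketched with the standard library's importlib.metadata module, which reads entry points of installed distributions. The helper name registered_commands is hypothetical, and the result depends entirely on what happens to be installed in the current environment, so no particular output is implied.

```python
from importlib.metadata import entry_points


def registered_commands(group="distutils.commands"):
    """List names of entry points registered under the given group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions return an object with a select() method.
        selected = eps.select(group=group)
    else:
        # Older versions (3.8, 3.9) return a plain dict keyed by group.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


commands = registered_commands()
print(commands)
```

Calling load() on an individual entry point would import and return the linked object, which is how a host application obtains the plugin code.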
Working with Packages during Development #
Working with setuptools is mostly about building and distributing packages. However, setuptools is still used to install packages directly from project sources. The reason for that is to test whether our packaging code works properly before submitting the package to PyPI, and the simplest way to test it is by installing it. If you send a broken package to the repository, then in order to re-upload it, you need to increase the version number.
Testing your package properly before the final distribution saves you from unnecessary version number inflation and, obviously, from wasting your time. Also, installation directly from your own sources using setuptools may be essential when working on multiple related packages at the same time.
setup.py install #
The install command installs the package in your current Python environment. It will try to build the package if no previous build was made and then inject the result into the filesystem directory where Python is looking for installed packages. If you have an archive with a source distribution of some package, you can decompress it in a temporary folder and then install it with this command. The install command will also install dependencies that are defined in the install_requires argument. Dependencies will be installed from the Python Package Index.
An alternative to the bare setup.py script when installing a package is to use pip. Since it is a tool that is recommended by PyPA, it should be used even when installing a package in the local environment just for development purposes. In order to install a package from local sources, run the following command:
pip install <project-path>
Uninstalling Packages #
Amazingly, setuptools and distutils lack the uninstall command. Fortunately, it is possible to uninstall any Python package using pip as follows:
pip uninstall <package-name>
Uninstalling can be a dangerous operation when attempted on system-wide packages. This is another reason why it is so important to use virtual environments for any development.
setup.py develop or pip -e #
Packages installed with setup.py install are copied to the site-packages directory of your current Python environment. This means that whenever any changes are made to the sources of that package, reinstalling it would be required. This is often a problem during intensive development because it is very easy to forget about the need to perform the installation again. This is why setuptools provides an extra develop command that allows you to install packages in development mode. This command creates a special link to the project sources in the deployment directory (site-packages) instead of copying the whole package there. Package sources can be edited without the need for reinstallation and are available in sys.path as if they were installed normally.
pip also allows you to install packages in such a mode. This installation option is called editable mode and can be enabled with the -e parameter in the install command as follows:
pip install -e <project-path>
Once you install the package in your environment in editable mode, you can freely modify the installed package in place and all the changes will be immediately visible without the need to reinstall the package.
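When these installs need to be scripted, for instance in a project bootstrap script, a common approach is to invoke pip through the current interpreter so that the package lands in the active environment. The sketch below only builds the command list and leaves running it to the caller; pip_install_command is a hypothetical helper name.

```python
import sys


def pip_install_command(project_path, editable=False):
    """Build a pip invocation bound to the running interpreter."""
    command = [sys.executable, "-m", "pip", "install"]
    if editable:
        # -e enables the editable (development) mode described above.
        command.append("-e")
    command.append(str(project_path))
    return command


regular = pip_install_command(".")
editable = pip_install_command(".", editable=True)
print(regular)
print(editable)
# To actually perform the install, pass the list to subprocess.run,
# for example: subprocess.run(editable, check=True)
```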
In this article, we summarized how to create a package and looked at the common patterns shared by Python packages, as well as how distutils and setuptools play a central role in the packaging process. If you found this useful and wish to explore it further, ‘Expert Python Programming – Third Edition’ might be helpful. This book primarily takes you through the new features in Python 3.7. With this, you will be able to work with advanced components of Python syntax and much more. By the end, you should expect to become an expert in writing efficient and maintainable Python code.