Pickling Objects in Python


In the post Reading and Writing JSON in Python, we saw how to work with JSON data in Python. If you haven’t gone through that post, I suggest you do so and then come back here.

It turns out that the json module is not the only way to serialize data. Python provides another module called pickle to serialize and deserialize data.

Here are the main differences between the json and pickle modules.

  1. The pickle module is Python-specific, which means that once an object is serialized, it is not meant to be deserialized from another language like PHP, Java, Perl etc. If interoperability is what you need, stick to the json module.

  2. Unlike the json module, which serializes objects as human-readable JSON strings, the pickle module serializes data in a binary format.

  3. The json module allows us to serialize only the basic Python types (like int, str, dict, list etc.). If you need to serialize custom objects you would have to supply your own serialization function. However, the pickle module works with a wide variety of Python types right out of the box, including the custom objects you define.

  4. In CPython, most of the pickle module is implemented in C, so it can be considerably faster than the json module on large data sets, although the difference depends on the data being serialized.
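The second and third differences can be seen in a short sketch. The datetime value and the small dictionary are illustrative choices, and dumps()/loads() are the in-memory functions described later in this post:

```python
import json
import pickle
from datetime import datetime

moment = datetime(2024, 1, 15, 10, 30)  # illustrative value

# json has no built-in encoding for datetime objects and raises TypeError.
try:
    json.dumps(moment)
    json_ok = True
except TypeError as exc:
    json_ok = False
    print("json failed:", exc)

# pickle round-trips the same object without any extra code.
restored = pickle.loads(pickle.dumps(moment))
print(restored == moment)  # True

# json produces text, pickle produces binary data.
as_json = json.dumps({"a": 1})
as_pickle = pickle.dumps({"a": 1})
print(type(as_json).__name__, type(as_pickle).__name__)  # str bytes
```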

The interface provided by the pickle module is the same as that of the json module and consists of the dump()/load() and dumps()/loads() functions.

To use the pickle module, import it first with the statement import pickle.

Let’s now see how we can serialize and deserialize objects using the pickle module.

Note: Serialization and deserialization are also sometimes known as pickling and unpickling, respectively.

Pickling with dump()

Pickling data is done via the dump() function. It accepts data and a file object. The dump() function then serializes the data and writes it to the file. The syntax of dump() is as follows:

Syntax: dump(obj, file)

obj: Object to be pickled.
file: File object where pickled data will be written.

Here is an example:
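A minimal sketch of such an example follows. The three values are illustrative, the file name my_pickle is the one used later in this post, and the temporary directory is an assumption made only so that the sketch cleans up after itself:

```python
import pickle
import tempfile
from datetime import datetime
from pathlib import Path

moment = datetime(2024, 1, 15, 10, 30)  # illustrative fixed timestamp

# Assumption: a temporary directory keeps this sketch self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_pickle"

    # "wb" opens the file for writing in binary mode.
    with open(path, "wb") as f:
        pickle.dump(10, f)
        pickle.dump("a string", f)
        pickle.dump(moment, f)  # no custom serialization function needed

    size = path.stat().st_size

print(f"wrote {size} bytes to {path.name}")
```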

There are two things to notice here:

  1. First, we have opened the file in binary mode instead of text mode. This is necessary because dump() writes bytes; in Python 3, passing a file opened in text mode raises TypeError.
  2. Second, the dump() function is able to serialize the datetime.datetime object without us supplying any custom serialization function.

Obviously, we are not just limited to datetime.datetime objects. To give you an example, the following listing serializes some other types available in Python.
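One possible version of such a listing is sketched below. The particular values are assumptions chosen to cover several built-in types; the file name other_pickles is the one referred to later in this post, and the temporary directory is only there to keep the sketch self-contained:

```python
import pickle
import tempfile
from pathlib import Path

# Illustrative values covering a range of built-in types.
values = [
    None,
    True,
    3.14,
    2 + 3j,
    b"raw bytes",
    (1, 2, 3),
    {"a", "b"},
    {"nested": [1, {"k": (1.5, None)}]},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "other_pickles"

    # Each dump() call appends one pickled object to the file.
    with open(path, "wb") as f:
        for value in values:
            pickle.dump(value, f)

    size = path.stat().st_size

print(f"pickled {len(values)} objects into {size} bytes")
```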

We have now pickled some data. At this point, if you try to read from the file, you will get the data as a bytes object.
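A small sketch of reading the raw contents, assuming a single illustrative dictionary was pickled into a temporary file. The exact bytes depend on the pickle protocol in use, so no output is shown for them:

```python
import pickle
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_pickle"

    with open(path, "wb") as f:
        pickle.dump({"a": 1, "b": 2}, f)

    # Read the file contents directly, without unpickling them.
    with open(path, "rb") as f:
        raw = f.read()

print(type(raw))  # <class 'bytes'>
print(raw)        # opaque binary data, varies with the pickle protocol
```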

That’s not very readable, right?

To restore the pickled objects we use the load() function.

Unpickling with load()

The load() function takes a file object, reconstructs an object from its pickled representation, and returns it.

Its syntax is as follows:

Syntax: load(file) -> obj

file: File object from which the serialized data will be read.

Let’s now try reading the my_pickle file we created earlier in this post.
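A sketch of that read, under the assumption that my_pickle holds three illustrative objects. The file is written first so that the example runs on its own:

```python
import pickle
import tempfile
from datetime import datetime
from pathlib import Path

moment = datetime(2024, 1, 15, 10, 30)  # illustrative value

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_pickle"

    with open(path, "wb") as f:
        pickle.dump(10, f)
        pickle.dump("a string", f)
        pickle.dump(moment, f)

    # "rb" opens the file for reading in binary mode.
    with open(path, "rb") as f:
        first = pickle.load(f)   # each load() call returns the next object
        second = pickle.load(f)
        third = pickle.load(f)

print(first, second, third)
```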

Notice that the objects are returned in the same order in which we pickled them. Also notice that the file is opened in binary mode for reading. When there is no more data to return, the load() function raises EOFError.
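When the number of pickled objects is not known in advance, the EOFError can serve as the stop signal. The helper name load_all below is hypothetical and not part of the pickle module; this is one possible pattern rather than the only one:

```python
import pickle
import tempfile
from pathlib import Path


def load_all(path):
    """Hypothetical helper: unpickle every object stored in a file."""
    objects = []
    with open(path, "rb") as f:
        while True:
            try:
                objects.append(pickle.load(f))
            except EOFError:
                # No more pickled data in the file.
                break
    return objects


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_pickle"

    with open(path, "wb") as f:
        for item in (10, "a string", [1, 2, 3]):
            pickle.dump(item, f)

    items = load_all(path)

print(items)  # [10, 'a string', [1, 2, 3]]
```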

Similarly, we can read the pickled data from the other_pickles file.

Once you have unpickled the data you can use it like an ordinary Python object.
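As a brief illustration, the sketch below assumes other_pickles holds a list and a datetime (both illustrative) and then works with the restored values as usual:

```python
import pickle
import tempfile
from datetime import datetime
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "other_pickles"

    with open(path, "wb") as f:
        pickle.dump([1, 2, 3], f)
        pickle.dump(datetime(2024, 1, 15), f)

    with open(path, "rb") as f:
        numbers = pickle.load(f)
        when = pickle.load(f)

# The restored values behave like any other list or datetime.
numbers.append(4)
total = sum(numbers)
year = when.year

print(total, year)  # 10 2024
```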

Pickling and Unpickling with dumps() and loads()

The dumps() function works exactly like dump(), but instead of sending the output to a file, it returns the pickled data as a bytes object. Its syntax is as follows:

Syntax: dumps(obj) -> pickled_data

obj: Object to be serialized.

Similarly, the loads() function is the same as load(), but instead of reading pickled data from a file, it reads from a bytes object. Its syntax is as follows:

Syntax: loads(pickled_data) -> obj

pickled_data: Pickled data.

Here is an example:
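A minimal sketch of a round trip through dumps() and loads(), using an illustrative dictionary:

```python
import pickle

data = {"name": "pickle", "versions": [1, 2, 3]}  # illustrative value

pickled = pickle.dumps(data)      # serialize to an in-memory bytes object
restored = pickle.loads(pickled)  # rebuild the object from those bytes

print(type(pickled))     # <class 'bytes'>
print(restored == data)  # True: same contents
print(restored is data)  # False: a new, independent object
```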

Keep in mind that unpickling can run arbitrary code while objects are being reconstructed, so never unpickle data from untrusted sources. A malicious user can craft pickled data that executes arbitrary commands on the system.
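The mechanism can be illustrated harmlessly. In the sketch below, the hypothetical class Demo uses the __reduce__ hook to tell pickle which callable to invoke at load time. Here it is the built-in len(), but an attacker could name any importable callable instead:

```python
import pickle


class Demo:
    """Hypothetical class used only to illustrate the risk."""

    def __reduce__(self):
        # On unpickling, pickle calls len("hello") and returns the result
        # instead of rebuilding a Demo instance.
        return (len, ("hello",))


payload = pickle.dumps(Demo())
result = pickle.loads(payload)  # this call runs len("hello")

print(result)  # 5
```

The loaded value is the integer 5, not a Demo object, which shows that the pickled bytes, not the code calling loads(), decide what gets executed.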
