Pickling Objects in Python


In the post Reading and Writing JSON in Python, we saw how to work with JSON data in Python. If you haven't gone through that post, I suggest you do so and then come back here.

It turns out that the json module is not the only way to serialize data. Python provides another module called pickle to serialize and deserialize data.

Here are the main differences between the json and pickle modules.

  1. The pickle module is Python-specific, which means that once an object is serialized you can't readily deserialize it using another language like PHP, Java, Perl etc. If interoperability is what you need, stick to the json module.

  2. Unlike the json module, which serializes objects as a human-readable JSON string, the pickle module serializes data in a binary format.

  3. The json module allows us to serialize only the basic Python types (like int, str, dict, list etc.). If you need to serialize custom objects you would have to supply your own serialization function. However, the pickle module works with a wide variety of Python types right out of the box, including the custom objects you define.

  4. In CPython, the core of the pickle module is implemented in C, so it is often faster than the json module when handling large data sets, although the actual difference depends on the data.
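To make the third point concrete, here is a minimal sketch that tries both modules on a datetime object. The variable names are only for illustration, and passing datetime.isoformat as the default argument is just one possible choice of custom serialization function.

```python
import json
import pickle
from datetime import datetime

moment = datetime(2018, 9, 30, 16, 46, 30)

# json only knows the basic types, so a datetime is rejected.
try:
    json.dumps(moment)
    json_ok = True
except TypeError as exc:
    json_ok = False
    print("json failed:", exc)

# With json we have to supply our own serialization function.
as_json = json.dumps(moment, default=datetime.isoformat)
print(as_json)

# pickle handles the same object without any help.
restored = pickle.loads(pickle.dumps(moment))
print(restored == moment)
```

Note that the JSON version gives back a plain string when decoded, while the pickle round trip returns a real datetime object.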

The interface provided by the pickle module is the same as that of the json module and consists of the dump()/load() and dumps()/loads() functions.

To use the pickle module import it as follows:

>>> import pickle

Let's now see how we can serialize and deserialize objects using the pickle module.

note:

Serialization and deserialization are also sometimes known as pickling and unpickling, respectively.

Pickling with dump()


Pickling data is done via the dump() function. It accepts data and a file object. The dump() function then serializes the data and writes it to the file. The syntax of dump() is as follows:

Syntax: dump(obj, file)

Argument    Description
obj         Object to be pickled.
file        File object where pickled data will be written.

Here is an example:

>>> import pickle
>>> from datetime import datetime
>>>
>>> f = open("my_pickle", "wb") # remember to open the file in binary mode
>>>
>>> pickle.dump(10, f)
>>> pickle.dump("a string", f)
>>> pickle.dump({'a': 1, 'b': 2}, f)
>>> pickle.dump(datetime.now(), f) # serialize datetime.datetime object
>>>
>>> f.close()

There are two things to notice here:

  1. First, we have opened the file in binary mode instead of text mode. This is necessary because pickle produces bytes; in Python 3, writing pickled data to a file opened in text mode raises a TypeError.
  2. Second, the dump() function is able to serialize the datetime.datetime object without us supplying any custom serialization function.
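The first point can be checked with a short script. This is only a sketch for Python 3: it writes to a temporary directory (an assumption made here so that it doesn't touch the my_pickle file from the listing above) and uses the with statement so that the file is closed automatically.

```python
import os
import pickle
import tempfile

# Hypothetical scratch location, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "my_pickle")

# Text mode: pickle hands bytes to a file that expects str, so it fails.
try:
    with open(path, "w") as f:
        pickle.dump(10, f)
    text_mode_error = None
except TypeError as exc:
    text_mode_error = exc
    print("text mode failed:", exc)

# Binary mode: the same call succeeds.
with open(path, "wb") as f:
    pickle.dump(10, f)
    pickle.dump("a string", f)

print(os.path.getsize(path) > 0)
```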

Obviously, we are not just limited to datetime.datetime objects. To give you an example, the following listing serializes some other types available in Python.

>>> class My_class:
...     def __init__(self, name):
...         self.name = name
...
>>> def func(): return "func() called"
...
>>> f = open("other_pickles", "wb")
>>>
>>> pickle.dump(My_class, f) # serialize class object
>>> pickle.dump(2 + 3j, f) # serialize complex number
>>> pickle.dump(func, f) # serialize function object
>>> pickle.dump(bytes([1, 2, 3, 4, 5]), f) # serialize bytes object
>>> pickle.dump(My_class("name"), f) # serialize class instance
>>>
>>> f.close()

We have now pickled some data. At this point, if you read from the file, you will get the data as a bytes object.

>>> open("my_pickle", "rb").read()
b'\x80\x03K\n.\x80\x03X\x08\x00\x00\x00a stringq\x00.\x80\x03}q\x00(X\x01\x00\x00\x00bq\x01K\x02X\x01\x00\x00\x00aq\x02K\x01u.\x80\x03cdatetime\ndatetime\nq\x00C\n\x07\xe2\t\x1e\x10.\x1e\r9\x92q\x01\x85q\x02Rq\x03.'
>>>
>>> open("other_pickles", "rb").read()
b'\x80\x03c__main__\nMy_class\nq\x00.\x80\x03cbuiltins\ncomplex\nq\x00G@\x00\x00\x00\x00\x00\x00\x00G@\x08\x00\x00\x00\x00\x00\x00\x86q\x01Rq\x02.\x80\x03c__main__\nfunc\nq\x00.\x80\x03C\x05\x01\x02\x03\x04\x05q\x00.\x80\x03c__main__\nMy_class\nq\x00)\x81q\x01}q\x02X\x04\x00\x00\x00nameq\x03h\x03sb.'

That's not very readable, right?

To restore the pickled objects we use the load() function.

Unpickling with load()


The load() function takes a file object, reconstructs an object from its pickled representation, and returns it.

Its syntax is as follows:

Syntax: load(file) -> obj

Argument    Description
file        File object from where the serialized data will be read.

Let's now try reading the my_pickle file we created earlier in this post.

>>> f = open("my_pickle", "rb")
>>>
>>> pickle.load(f)
10
>>> pickle.load(f)
'a string'
>>> pickle.load(f)
{'b': 2, 'a': 1}
>>> pickle.load(f)
datetime.datetime(2018, 9, 30, 16, 46, 30, 866706)
>>> pickle.load(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
EOFError: Ran out of input
>>>
>>> f.close()

Notice that the objects are returned in the same order in which we pickled them. Also, notice that the file is opened in binary mode for reading. When there is no more data to return, the load() function raises EOFError.
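Since load() signals the end of the data with EOFError, we can wrap it in a small loop to read every object from a file. The following is a minimal sketch; load_all() is a hypothetical helper written for this post, not part of the pickle module, and an in-memory io.BytesIO buffer stands in for a real file.

```python
import io
import pickle


def load_all(file):
    """Yield every object pickled into `file`, in order, until EOF."""
    while True:
        try:
            yield pickle.load(file)
        except EOFError:
            return  # no more pickled objects left in the file


# Pickle a few objects one after another into an in-memory binary file.
buffer = io.BytesIO()
for item in (10, "a string", {"a": 1, "b": 2}):
    pickle.dump(item, buffer)

buffer.seek(0)  # rewind before reading
items = list(load_all(buffer))
print(items)
```

The same helper should work with a file opened using open("my_pickle", "rb").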

Similarly, we can read the pickled data from the other_pickles file.

>>> f = open("other_pickles", "rb") # open the file for reading in binary mode
>>>
>>> My_class = pickle.load(f)
>>> My_class
<class '__main__.My_class'>
>>>
>>> c = pickle.load(f)
>>> c
(2+3j)
>>>
>>> func = pickle.load(f)
>>> func
<function func at 0x7f9aa6ab6488>
>>>
>>> b = pickle.load(f)
>>> b
b'\x01\x02\x03\x04\x05'
>>>
>>> my_class_obj = pickle.load(f)
>>> my_class_obj
<__main__.My_class object at 0x7f9aa74e61d0>
>>>
>>> pickle.load(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
EOFError: Ran out of input
>>>
>>> f.close()

Once you have unpickled the data, you can use it like an ordinary Python object.

>>> func()
'func() called'
>>>
>>> c.imag, c.real
(3.0, 2.0)
>>>
>>> My_class("Tom")
<__main__.My_class object at 0x7f9aa74e6358>
>>>
>>> my_class_obj.name
'name'

Pickling and Unpickling with dumps() and loads()


The dumps() function works exactly like dump(), but instead of sending the output to a file, it returns the pickled data as a bytes object. Its syntax is as follows:

Syntax: dumps(obj) -> pickled_data

Argument    Description
obj         Object to be serialized.

Similarly, the loads() function is the same as load(), but instead of reading pickled data from a file, it reads from a bytes object. Its syntax is as follows:

Syntax: loads(pickled_data) -> obj

Argument        Description
pickled_data    Pickled data (a bytes object).

Here is an example:

>>> employee = {
...     "first_name": "Mike",
...     "designation": 'Manager',
...     "doj": datetime(year=2016, month=5, day=2), # date of joining
... }
>>>
>>> pickled_emp = pickle.dumps(employee) # pickle employee dictionary
>>>
>>> pickled_emp
b'\x80\x03}q\x00(X\x0b\x00\x00\x00designationq\x01X\x07\x00\x00\x00Managerq\x02X\x03\x00\x00\x00dojq\x03cdatetime\ndatetime\nq\x04C\n\x07\xe0\x05\x02\x00\x00\x00\x00\x00\x00q\x05\x85q\x06Rq\x07X\n\x00\x00\x00first_nameq\x08X\x04\x00\x00\x00Mikeq\tu.'
>>>
>>> pickle.loads(pickled_emp) # unpickle employee dictionary
{'designation': 'Manager', 'doj': datetime.datetime(2016, 5, 2, 0, 0), 'first_name': 'Mike'}
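As a quick check of what dumps() and loads() hand back in Python 3, the following sketch inspects the type of the pickled data and compares the restored dictionary with the original. The dictionary here is a trimmed-down stand-in for the employee example above.

```python
import pickle

data = {"first_name": "Mike", "designation": "Manager"}

pickled = pickle.dumps(data)
print(type(pickled))  # <class 'bytes'>

copy = pickle.loads(pickled)
print(copy == data)  # True: same contents
print(copy is data)  # False: a new, independent object
```

In other words, a round trip through dumps() and loads() gives us a copy of the object rather than the original object itself.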

Keep in mind that unpickling can run arbitrary code while reconstructing objects, so never unpickle data that comes from an untrusted source. A malicious user can craft pickled data that executes arbitrary commands on the system.
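To see why, here is a deliberately harmless sketch. The class name Sneaky is made up for this illustration. Its __reduce__() method tells pickle to rebuild the object by calling eval() on a string, so the expression runs as soon as the data is unpickled; an attacker could just as easily name a far more dangerous function.

```python
import pickle


class Sneaky:
    # __reduce__ returns (callable, args); on unpickling, pickle calls
    # callable(*args) and uses the result as the reconstructed object.
    def __reduce__(self):
        return (eval, ("6 * 7",))  # harmless stand-in for malicious code


payload = pickle.dumps(Sneaky())

# Whoever unpickles the payload runs the expression without ever
# asking for it. We get the result of eval(), not a Sneaky instance.
result = pickle.loads(payload)
print(result)  # 42
```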

