Pickling Objects in Python
Updated on Jan 07, 2020
In the post Reading and Writing JSON in Python, we saw how to work with JSON data in Python. If you haven't read that post yet, I suggest you do so and then come back here.
It turns out that the json module is not the only way to serialize data. Python provides another module called pickle to serialize and deserialize data.
Here are the main differences between the json and pickle modules.
- The pickle module is Python-specific, which means that once an object is serialized you can't deserialize it using another language such as PHP, Java, or Perl. If interoperability is what you need, stick to the json module.
- Unlike the json module, which serializes objects as a human-readable JSON string, the pickle module serializes data in a binary format.
- The json module allows us to serialize only the basic Python types (like int, str, dict, list etc.). If you need to serialize custom objects, you have to supply your own serialization function. The pickle module, however, works with a wide variety of Python types right out of the box, including instances of the custom classes you define.
- Most of the pickle module is coded in C, so it can provide a noticeable performance boost over the json module when handling large data sets.
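The second and third differences are easy to see in practice. The following is a minimal sketch, not taken from the original post: the record dictionary and its field names are made up for illustration. It shows json rejecting a datetime value unless a fallback such as default=str is supplied, while pickle round-trips the same object without any help.

```python
import json
import pickle
from datetime import datetime

# Hypothetical sample data; the field names are assumptions for this sketch.
record = {"name": "Mike", "joined": datetime(2016, 5, 2)}

# json only knows the basic types, so a datetime value raises TypeError.
try:
    json.dumps(record)
    json_error = None
except TypeError as exc:
    json_error = exc

# With a fallback function json works, but the datetime becomes a plain string.
json_text = json.dumps(record, default=str)

# pickle needs no help and restores an equal datetime object.
pickled = pickle.dumps(record)
restored = pickle.loads(pickled)

print(type(json_text).__name__)  # str: human-readable text
print(type(pickled).__name__)    # bytes: binary data
print(restored == record)        # True
```

Note that the JSON version loses type information: reading it back gives a string, not a datetime, unless you convert it yourself.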
The interface provided by the pickle module is the same as that of the json module and consists of the dump()/load() and dumps()/loads() functions.
To use the pickle module import it as follows:
```python
>>> import pickle
```
Let's now see how we can serialize and deserialize objects using the pickle module.
Note:
Serialization and deserialization are also sometimes known as pickling and unpickling, respectively.
Pickling with dump()
Pickling data is done via the dump() function. It accepts data and a file object. The dump() function then serializes the data and writes it to the file. The syntax of dump() is as follows:
Syntax: dump(obj, file)
| Argument | Description | 
|---|---|
| obj | Object to be pickled. | 
| file | File object where pickled data will be written. | 
Here is an example:
```python
>>> import pickle
>>> from datetime import datetime
>>>
>>> f = open("my_pickle", "wb")  # remember to open the file in binary mode
>>>
>>> pickle.dump(10, f)
>>> pickle.dump("a string", f)
>>> pickle.dump({'a': 1, 'b': 2}, f)
>>> pickle.dump(datetime.now(), f)  # serialize datetime.datetime object
>>>
>>> f.close()
```
There are two things to notice here:
- First, we have opened the file in binary mode instead of text mode. This is necessary because pickle writes bytes; in Python 3, a file opened in text mode rejects them with a TypeError.
- Second, the dump() function is able to serialize the datetime.datetime object without us supplying any custom serialization function.
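Both points can be checked with a short script. This is only a sketch: it writes to a scratch directory created with tempfile (an assumption made here so that the example leaves your working directory alone) and uses a with block so the file is closed automatically, even if dump() fails.

```python
import os
import pickle
import tempfile
from datetime import datetime

# Scratch location chosen for this sketch; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "my_pickle")

# Binary mode ("wb"): dump() writes its bytes without complaint.
with open(path, "wb") as f:
    pickle.dump(10, f)
    pickle.dump(datetime(2018, 9, 30), f)  # no custom serializer needed

# Text mode ("w"): the file object refuses bytes, so dump() raises TypeError.
try:
    with open(path + ".txt", "w") as f:
        pickle.dump(10, f)
    text_mode_error = None
except TypeError as exc:
    text_mode_error = exc

print(os.path.getsize(path) > 0)        # True: data was written
print(type(text_mode_error).__name__)   # TypeError
```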
Obviously, we are not just limited to datetime.datetime objects. To give you an example, the following listing serializes some other types available in Python.
```python
>>> class My_class:
...     def __init__(self, name):
...         self.name = name
...
>>> def func(): return "func() called"
...
>>> f = open("other_pickles", "wb")
>>>
>>> pickle.dump(My_class, f)  # serialize class object
>>> pickle.dump(2 + 3j, f)  # serialize complex number
>>> pickle.dump(func, f)  # serialize function object
>>> pickle.dump(bytes([1, 2, 3, 4, 5]), f)  # serialize bytes object
>>> pickle.dump(My_class("name"), f)  # serialize class instance
>>>
>>> f.close()
```
We have now pickled some data. At this point, if you read the files back directly, you will get the raw pickled data as a bytes object.
```python
>>> open("my_pickle", "rb").read()
b'\x80\x03K\n.\x80\x03X\x08\x00\x00\x00a stringq\x00.\x80\x03}q\x00(X\x01\x00\x00\x00bq\x01K\x02X\x01\x00\x00\x00aq\x02K\x01u.\x80\x03cdatetime\ndatetime\nq\x00C\n\x07\xe2\t\x1e\x10.\x1e\r9\x92q\x01\x85q\x02Rq\x03.'
>>>
>>> open("other_pickles", "rb").read()
b'\x80\x03c__main__\nMy_class\nq\x00.\x80\x03cbuiltins\ncomplex\nq\x00G@\x00\x00\x00\x00\x00\x00\x00G@\x08\x00\x00\x00\x00\x00\x00\x86q\x01Rq\x02.\x80\x03c__main__\nfunc\nq\x00.\x80\x03C\x05\x01\x02\x03\x04\x05q\x00.\x80\x03c__main__\nMy_class\nq\x00)\x81q\x01}q\x02X\x04\x00\x00\x00nameq\x03h\x03sb.'
```
That's not very readable, right?
To restore the pickled objects, we use the load() function.
Unpickling with load()
The load() function takes a file object, reconstructs the object from its pickled representation, and returns it.
Its syntax is as follows:
Syntax: load(file) -> obj
| Argument | Description | 
|---|---|
| file | File object from which the serialized data will be read. | 
Let's now try reading the my_pickle file we created earlier in this post.
```python
>>> f = open("my_pickle", "rb")
>>>
>>> pickle.load(f)
10
>>> pickle.load(f)
'a string'
>>>
>>> pickle.load(f)
{'b': 2, 'a': 1}
>>>
>>> pickle.load(f)
datetime.datetime(2018, 9, 30, 16, 46, 30, 866706)
>>>
>>> pickle.load(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
EOFError: Ran out of input
>>>
>>> f.close()
```
Notice that the objects are returned in the same order in which we pickled them. Also, notice that the file is opened in binary mode for reading. When there is no more data to return, the load() function raises EOFError.
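Because load() signals the end of the file with EOFError, a file holding an unknown number of pickled objects can be read in a loop. The helper below is a sketch of that idea; the name load_all and the scratch file path are assumptions made for this example and are not part of the pickle module.

```python
import os
import pickle
import tempfile


def load_all(path):
    """Yield every pickled object stored in the file at path, in order."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                # No more pickled objects in the file.
                return


# Write three objects to a scratch file, then read them all back.
path = os.path.join(tempfile.mkdtemp(), "my_pickle")
with open(path, "wb") as f:
    for item in (10, "a string", {"a": 1, "b": 2}):
        pickle.dump(item, f)

items = list(load_all(path))
print(items)  # [10, 'a string', {'a': 1, 'b': 2}] on Python 3.7+
```

One caveat with this design: a file that was cut off partway through writing may also end in EOFError, so the loop cannot always tell a clean end from a truncated file.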
Similarly, we can read the pickled data from the other_pickles file.
```python
>>> f = open("other_pickles", "rb")  # open the file for reading in binary mode
>>>
>>> My_class = pickle.load(f)
>>> My_class
<class '__main__.My_class'>
>>>
>>> c = pickle.load(f)
>>> c
(2+3j)
>>>
>>> func = pickle.load(f)
>>> func
<function func at 0x7f9aa6ab6488>
>>>
>>> b = pickle.load(f)
>>> b
b'\x01\x02\x03\x04\x05'
>>>
>>> my_class_obj = pickle.load(f)
>>> my_class_obj
<__main__.My_class object at 0x7f9aa74e61d0>
>>>
>>> pickle.load(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
EOFError: Ran out of input
>>>
>>> f.close()
```
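Once you have unpickled the data you can use it like an ordinary Python object. Keep in mind that classes and functions are pickled by name rather than by value, so My_class and func must be defined (or importable) in the session that unpickles them.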
```python
>>> func()
'func() called'
>>>
>>> c.imag, c.real
(3.0, 2.0)
>>>
>>> My_class("Tom")
<__main__.My_class object at 0x7f9aa74e6358>
>>>
>>> my_class_obj.name
'name'
```
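An unpickled object is also a new object, independent of the one that was pickled. The sketch below illustrates this with made-up data (the original dictionary and its keys are assumptions for this example) and a temporary file: changing the restored copy leaves the original untouched.

```python
import pickle
import tempfile

# Hypothetical data used only for this illustration.
original = {"tags": ["a", "b"], "point": 2 + 3j}

# TemporaryFile opens in binary read/write mode ("w+b") by default.
with tempfile.TemporaryFile() as f:
    pickle.dump(original, f)
    f.seek(0)  # rewind so load() starts at the beginning
    clone = pickle.load(f)

print(clone == original)   # True: same contents
print(clone is original)   # False: a separate object

clone["tags"].append("c")  # modify the copy only
print(original["tags"])    # ['a', 'b']
```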
Pickling and Unpickling with dumps() and loads()
The dumps() function works exactly like dump(), but instead of writing the output to a file, it returns the pickled data as a bytes object. Its syntax is as follows:
Syntax: dumps(obj) -> pickled_data
| Argument | Description | 
|---|---|
| obj | Object to be serialized | 
Similarly, the loads() function is the same as load(), but instead of reading pickled data from a file, it reads it from a bytes object. Its syntax is as follows:
Syntax: loads(pickled_data) -> obj
| Argument | Description | 
|---|---|
| pickled_data | Bytes object containing the pickled data | 
Here is an example:
```python
>>> employee = {
...     "first_name": "Mike",
...     "designation": 'Manager',
...     "doj": datetime(year=2016, month=5, day=2),  # date of joining
... }
>>>
>>> pickled_emp = pickle.dumps(employee)  # pickle employee dictionary
>>>
>>> pickled_emp
b'\x80\x03}q\x00(X\x0b\x00\x00\x00designationq\x01X\x07\x00\x00\x00Managerq\x02X\x03\x00\x00\x00dojq\x03cdatetime\ndatetime\nq\x04C\n\x07\xe0\x05\x02\x00\x00\x00\x00\x00\x00q\x05\x85q\x06Rq\x07X\n\x00\x00\x00first_nameq\x08X\x04\x00\x00\x00Mikeq\tu.'
>>>
>>> pickle.loads(pickled_emp)  # unpickle employee dictionary
{'designation': 'Manager', 'doj': datetime.datetime(2016, 5, 2, 0, 0), 'first_name': 'Mike'}
```
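As a script rather than an interpreter session, the same round trip might look like the sketch below. It also checks two details that the session does not show: dumps() returns bytes, and loads() rejects an ordinary str. The optional protocol argument is part of the real pickle API; choosing pickle.HIGHEST_PROTOCOL here is just an illustration, not a recommendation from the original post.

```python
import pickle
from datetime import datetime

employee = {
    "first_name": "Mike",
    "designation": "Manager",
    "doj": datetime(year=2016, month=5, day=2),  # date of joining
}

# dumps() returns the pickled data as bytes, not as text.
pickled_emp = pickle.dumps(employee)
print(type(pickled_emp).__name__)  # bytes

# loads() rebuilds an equal dictionary from those bytes.
restored = pickle.loads(pickled_emp)
print(restored == employee)  # True

# Passing a str instead of bytes is an error.
try:
    pickle.loads("not pickled data")
    loads_error = None
except TypeError as exc:
    loads_error = exc

# A specific pickle protocol can be requested; the data still round-trips.
newest = pickle.dumps(employee, protocol=pickle.HIGHEST_PROTOCOL)
print(pickle.loads(newest) == employee)  # True
```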
Keep in mind that unpickling can run arbitrary code while it reconstructs objects, so never unpickle data from untrusted sources. A malicious user can craft pickled data that executes arbitrary commands on your system.
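If you must read pickled data whose origin you cannot fully control, one partial mitigation described in the pickle documentation is to subclass pickle.Unpickler and override find_class() so that only an explicit allow-list of globals can be loaded. The sketch below follows that pattern; the names RestrictedUnpickler, restricted_loads, and SAFE_BUILTINS, as well as the contents of the allow-list, are choices made for this example. It reduces the risk but does not make untrusted pickles safe in general.

```python
import builtins
import io
import pickle
from datetime import datetime

# Allow-list chosen for this sketch; adjust it to what your data really needs.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle asks for a global (class or function).
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Everything else is refused.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but only allows globals in SAFE_BUILTINS."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers and allow-listed builtins load normally.
print(restricted_loads(pickle.dumps([1, 2, range(15)])))  # [1, 2, range(0, 15)]

# Anything that needs another global, such as datetime.datetime, is rejected.
try:
    restricted_loads(pickle.dumps(datetime(2016, 5, 2)))
    rejected = False
except pickle.UnpicklingError:
    rejected = True
print(rejected)  # True
```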