Pickling Objects in Python
Updated on Jan 07, 2020
In the post Reading and Writing JSON in Python, we saw how to work with JSON data in Python. If you haven't gone through that post, I suggest you do so and then come back here.
It turns out that the json module is not the only way to serialize data. Python provides another module called pickle to serialize and deserialize data.
Here are the main differences between the json and pickle modules:
- The pickle module is Python-specific, which means that once an object is serialized, you can't deserialize it using another language like PHP, Java, Perl, etc. If interoperability is what you need, stick to the json module.
- Unlike the json module, which serializes objects as a human-readable JSON string, the pickle module serializes data in a binary format.
- The json module allows us to serialize only the basic Python types (like int, str, dict, list, etc.). If you need to serialize custom objects, you have to supply your own serialization function. The pickle module, however, works with a wide variety of Python types right out of the box, including instances of the custom classes you define.
- Most of the pickle module is coded in C, so it can offer a performance boost over the json module when handling large data sets.
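The second and third differences are easy to check for yourself. The following is only a small sketch, assuming Python 3; the record dictionary is made-up sample data, not something from the earlier post.

```python
import json
import pickle
from datetime import datetime

# Hypothetical sample data; datetime is not a JSON-native type.
record = {"name": "Mike", "joined": datetime(2016, 5, 2)}

# json refuses the datetime unless we supply our own serializer.
try:
    json.dumps(record)
    json_ok = True
except TypeError:
    json_ok = False

# pickle handles it out of the box and produces bytes, not text.
pickled = pickle.dumps(record)
restored = pickle.loads(pickled)

print(json_ok)                 # False
print(type(pickled).__name__)  # bytes
print(restored == record)      # True
```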
The interface provided by the pickle module is the same as that of the json module and consists of the dump()/load() and dumps()/loads() functions.
To use the pickle module, import it as follows:
>>> import pickle
Let's now see how we can serialize and deserialize objects using the pickle module.
note:
Serialization and deserialization are also sometimes known as pickling and unpickling, respectively.
Pickling with dump()
Pickling data is done via the dump() function. It accepts data and a file object. The dump() function then serializes the data and writes it to the file. The syntax of dump() is as follows:
Syntax: dump(obj, file)
Argument | Description |
---|---|
obj | Object to be pickled. |
file | File object where pickled data will be written. |
Here is an example:
>>> import pickle
>>> from datetime import datetime
>>>
>>> f = open("my_pickle", "wb") # remember to open the file in binary mode
>>>
>>> pickle.dump(10, f)
>>> pickle.dump("a string", f)
>>> pickle.dump({'a': 1, 'b': 2}, f)
>>> pickle.dump(datetime.now(), f) # serialize datetime.datetime object
>>>
>>> f.close()
There are two things to notice here:
- First, we have opened the file in binary mode instead of text mode. This is necessary because pickle writes bytes; in Python 3, dumping to a file opened in text mode raises a TypeError.
- Second, the dump() function is able to serialize the datetime.datetime object without us supplying any custom serialization function.
Obviously, we are not limited to datetime.datetime objects. To give you an example, the following listing serializes some other types available in Python.
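Both points can be tried out in a few lines. The sketch below is one possible variation, assuming Python 3: it uses a with statement so the file is closed automatically, and it writes to a temporary directory (the path variable is a made-up location) so nothing is left behind in the working directory.

```python
import os
import pickle
import tempfile
from datetime import datetime

# Hypothetical file location inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "my_pickle")

# The with statement closes the file even if dump() raises.
with open(path, "wb") as f:
    pickle.dump(10, f)
    pickle.dump(datetime.now(), f)

# Opening the file in text mode makes dump() fail in Python 3,
# because pickle writes bytes and a text file only accepts str.
try:
    with open(path + "_text", "w") as f:
        pickle.dump(10, f)
    text_mode_error = None
except TypeError as exc:
    text_mode_error = exc

print(type(text_mode_error).__name__)  # TypeError
```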
>>> class My_class:
...     def __init__(self, name):
...         self.name = name
...
>>>
>>> def func(): return "func() called"
...
>>>
>>> f = open("other_pickles", "wb")
>>>
>>> pickle.dump(My_class, f) # serialize class object
>>>
>>> pickle.dump(2 + 3j, f) # serialize complex number
>>>
>>> pickle.dump(func, f) # serialize function object
>>>
>>> pickle.dump(bytes([1, 2, 3, 4, 5]), f) # serialize bytes object
>>>
>>> pickle.dump(My_class("name"), f) # serialize class instance
>>>
>>> f.close()
We have now pickled some data. At this point, if you try to read from the file, you will get the data as a bytes object.
>>> open("my_pickle", "rb").read()
b'\x80\x03K\n.\x80\x03X\x08\x00\x00\x00a stringq\x00.\x80\x03}q\x00(X\x01\x00\x00\x00bq\x01K\x02X\x01\x00\x00\x00aq\x02K\x01u.\x80\x03cdatetime\ndatetime\nq\x00C\n\x07\xe2\t\x1e\x10.\x1e\r9\x92q\x01\x85q\x02Rq\x03.'
>>>
>>> open("other_pickles", "rb").read()
b'\x80\x03c__main__\nMy_class\nq\x00.\x80\x03cbuiltins\ncomplex\nq\x00G@\x00\x00\x00\x00\x00\x00\x00G@\x08\x00\x00\x00\x00\x00\x00\x86q\x01Rq\x02.\x80\x03c__main__\nfunc\nq\x00.\x80\x03C\x05\x01\x02\x03\x04\x05q\x00.\x80\x03c__main__\nMy_class\nq\x00)\x81q\x01}q\x02X\x04\x00\x00\x00nameq\x03h\x03sb.'
That's not very readable, right?
To restore the pickled objects, we use the load() function.
Unpickling with load()
The load() function takes a file object, reconstructs the object from its pickled representation, and returns it.
Its syntax is as follows:
Syntax: load(file) -> obj
Argument | Description |
---|---|
file | File object from which the serialized data will be read. |
Let's now try reading the my_pickle file we created earlier in this post.
>>> f = open("my_pickle", "rb")
>>>
>>> pickle.load(f)
10
>>> pickle.load(f)
'a string'
>>>
>>> pickle.load(f)
{'b': 2, 'a': 1}
>>>
>>> pickle.load(f)
datetime.datetime(2018, 9, 30, 16, 46, 30, 866706)
>>>
>>> pickle.load(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
EOFError: Ran out of input
>>>
>>> f.close()
Notice that the objects are returned in the same order in which we pickled them. Also, notice that the file is opened in binary mode for reading. When there is no more data to return, the load() function raises EOFError.
Similarly, we can read the pickled data from the other_pickles file.
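Since load() signals the end of the data with EOFError, a small loop can collect every object without knowing in advance how many were pickled. The helper below is a sketch, not part of the pickle module: load_all is a hypothetical name, and io.BytesIO stands in for a real file opened in binary mode.

```python
import io
import pickle

def load_all(file):
    """Yield every pickled object in `file` until the data runs out."""
    while True:
        try:
            yield pickle.load(file)
        except EOFError:
            return

# io.BytesIO stands in for a file opened in binary mode.
buffer = io.BytesIO()
for item in (10, "a string", {"a": 1, "b": 2}):
    pickle.dump(item, buffer)
buffer.seek(0)

items = list(load_all(buffer))
print(items)  # [10, 'a string', {'a': 1, 'b': 2}]
```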
>>> f = open("other_pickles", "rb") # open the file for reading in binary mode
>>>
>>> My_class = pickle.load(f)
>>> My_class
<class '__main__.My_class'>
>>>
>>> c = pickle.load(f)
>>> c
(2+3j)
>>>
>>> func = pickle.load(f)
>>> func
<function func at 0x7f9aa6ab6488>
>>>
>>> b = pickle.load(f)
>>> b
b'\x01\x02\x03\x04\x05'
>>>
>>> my_class_obj = pickle.load(f)
>>> my_class_obj
<__main__.My_class object at 0x7f9aa74e61d0>
>>>
>>> pickle.load(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
EOFError: Ran out of input
>>>
>>> f.close()
Once you have unpickled the data you can use it like an ordinary Python object.
>>> func()
'func() called'
>>>
>>> c.imag, c.real
(3.0, 2.0)
>>>
>>> My_class("Tom")
<__main__.My_class object at 0x7f9aa74e6358>
>>>
>>> my_class_obj.name
'name'
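One detail worth knowing about the listings above: pickle stores functions and classes by reference, that is, by module and name, rather than by copying their code. The sketch below illustrates this with two standard-library objects (math.sqrt and datetime are used here only because they are importable everywhere; they are not from the listings above).

```python
import math
import pickle
from datetime import datetime

# Pickle the function and the class themselves, not their results.
pickled_func = pickle.dumps(math.sqrt)
pickled_class = pickle.dumps(datetime)

# The payload holds the module and attribute names, not the code.
print(b"math" in pickled_func and b"sqrt" in pickled_func)  # True

# Unpickling imports the module and looks the name up again,
# so we get back the very same objects.
print(pickle.loads(pickled_func) is math.sqrt)  # True
print(pickle.loads(pickled_class) is datetime)  # True
```

This is why a pickled class, function, or class instance can only be restored in a program where the same definition is available under the same module and name.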
Pickling and Unpickling with dumps() and loads()
The dumps() function works exactly like dump(), but instead of sending the output to a file, it returns the pickled data as a bytes object. Its syntax is as follows:
Syntax: dumps(obj) -> pickled_data
Argument | Description |
---|---|
obj | Object to be serialized. |
Similarly, the loads() function is the same as load(), but instead of reading pickled data from a file, it reads from a bytes object. Its syntax is as follows:
Syntax: loads(pickled_data) -> obj
Argument | Description |
---|---|
pickled_data | Pickled data (a bytes object). |
Here is an example:
>>> employee = {
...     "first_name": "Mike",
...     "designation": 'Manager',
...     "doj": datetime(year=2016, month=5, day=2), # date of joining
... }
>>>
>>> pickled_emp = pickle.dumps(employee) # pickle employee dictionary
>>>
>>> pickled_emp
b'\x80\x03}q\x00(X\x0b\x00\x00\x00designationq\x01X\x07\x00\x00\x00Managerq\x02X\x03\x00\x00\x00dojq\x03cdatetime\ndatetime\nq\x04C\n\x07\xe0\x05\x02\x00\x00\x00\x00\x00\x00q\x05\x85q\x06Rq\x07X\n\x00\x00\x00first_nameq\x08X\x04\x00\x00\x00Mikeq\tu.'
>>>
>>> pickle.loads(pickled_emp) # unpickle employee dictionary
{'designation': 'Manager', 'doj': datetime.datetime(2016, 5, 2, 0, 0), 'first_name': 'Mike'}
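The pickled output in this post always starts with \x80\x03. Those two bytes are the protocol marker followed by the protocol version, here 3. As a sketch, assuming Python 3, the optional protocol argument of dumps() lets you pick the version explicitly; the employee dictionary below is a shortened copy of the one above.

```python
import pickle
from datetime import datetime

employee = {"first_name": "Mike", "doj": datetime(2016, 5, 2)}

# Ask for protocol 3 explicitly, the version seen in this post's output.
v3_bytes = pickle.dumps(employee, protocol=3)
print(v3_bytes[:2])  # b'\x80\x03'

# HIGHEST_PROTOCOL selects the newest format this interpreter supports.
newest_bytes = pickle.dumps(employee, protocol=pickle.HIGHEST_PROTOCOL)

# loads() detects the protocol on its own; no argument is needed.
print(pickle.loads(v3_bytes) == employee)      # True
print(pickle.loads(newest_bytes) == employee)  # True
```

Newer Python versions default to a newer protocol, so your own output may begin with a different version byte than the listings here.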
Keep in mind that when you unpickle data, objects spring into life, so never unpickle data from untrusted sources. A malicious user can craft pickled data that executes arbitrary commands on your system when it is loaded.
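If you must load pickles whose origin you can't fully vouch for, one common mitigation is to restrict which classes and functions the unpickler may look up by overriding Unpickler.find_class(). The following is a sketch only: RestrictedUnpickler, restricted_loads, and the ALLOWED set are hypothetical names chosen for this example, and an allowlist reduces the risk rather than making pickle safe in general.

```python
import io
import os
import pickle
from collections import OrderedDict

# Hypothetical allowlist: the only globals this application expects.
ALLOWED = {("collections", "OrderedDict")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle asks for a class or function.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"{module}.{name} is forbidden")

def restricted_loads(data):
    """Like pickle.loads(), but only allowlisted globals are resolved."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# An allowlisted class is restored as usual.
allowed_obj = restricted_loads(pickle.dumps(OrderedDict(a=1)))
print(allowed_obj)

try:
    # This payload only references os.system; nothing is executed here.
    restricted_loads(pickle.dumps(os.system))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
print(blocked)  # True
```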