Reading and Writing JSON in Python
Updated on Jan 07, 2020
JSON (JavaScript Object Notation) is a language-neutral data interchange format. It was created and popularized by Douglas Crockford. In its short history, JSON has become a de facto standard for data transfer across the web.
JSON is a text-based format which is derived from JavaScript object syntax. However, it is completely independent of JavaScript, so you don't need to know any JavaScript to use JSON.
JSON is commonly used by web applications to transfer data between client and server. If you are using a web service, there is a good chance that data will be returned to you in JSON format by default.
Before the inception of JSON, XML was predominantly used to send and receive data between the client and the server. The problem with XML is that it is verbose, heavy and not easy to parse. However, this is not the case with JSON, as you will see soon.
The following is an example of an XML document describing a person.
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<root>
    <firstName>John</firstName>
    <lastName>Smith</lastName>
    <isAlive>true</isAlive>
    <age>27</age>
    <address>
        <streetAddress>21 2nd Street</streetAddress>
        <city>New York</city>
        <state>NY</state>
        <postalCode>10021-3100</postalCode>
    </address>
    <phoneNumbers>
        <type>home</type>
        <number>212 555-1234</number>
    </phoneNumbers>
    <phoneNumbers>
        <type>office</type>
        <number>646 555-4567</number>
    </phoneNumbers>
    <phoneNumbers>
        <type>mobile</type>
        <number>123 456-7890</number>
    </phoneNumbers>
    <spouse />
</root>
```
The same information can be represented using JSON as follows:
```json
{
    "firstName": "John",
    "lastName": "Smith",
    "isAlive": true,
    "age": 27,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021-3100"
    },
    "phoneNumbers": [
        {
            "type": "home",
            "number": "212 555-1234"
        },
        {
            "type": "office",
            "number": "646 555-4567"
        },
        {
            "type": "mobile",
            "number": "123 456-7890"
        }
    ],
    "children": [],
    "spouse": null
}
```
I am sure you will agree that the JSON counterpart is much easier to read and write.
Also, notice that the JSON format closely resembles dictionaries in Python.
Serialization and Deserialization
Serialization: The process of converting an object into a special format which is suitable for transmitting over the network or storing in a file or database.
Deserialization: The reverse of serialization. It converts the special format produced by serialization back into a usable object.
In the case of JSON, serializing means converting a Python object into a JSON string, and deserializing means building the Python object back from its JSON string representation.
Python provides a built-in module called json for serializing and deserializing objects. To use the json module, import it as follows:
```python
>>> import json
```
The json module mainly provides the following functions for serializing and deserializing:
- dump(obj, fileobj)
- dumps(obj)
- load(fileobj)
- loads(s)
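As a quick orientation before looking at each function in detail, here is a minimal sketch of how the four functions pair up. The `record` dictionary is made-up sample data, and `io.StringIO` is used here only as a stand-in for a real file.

```python
import io
import json

record = {"name": "John", "age": 27}  # hypothetical sample data

# dumps() and loads() work with strings.
text = json.dumps(record)
from_string = json.loads(text)

# dump() and load() work with file-like objects;
# io.StringIO stands in for a file opened with open().
buffer = io.StringIO()
json.dump(record, buffer)
buffer.seek(0)  # rewind before reading back
from_file = json.load(buffer)

print(from_string == record, from_file == record)
```

In both cases the object that comes back compares equal to the one that went in, because this sample contains only types that JSON can represent directly.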
Let's start with the dump() function.
Serializing with dump()
The dump() function is used to serialize data. It takes a Python object, serializes it and writes the output (which is a JSON string) to a file-like object.
The syntax of the dump() function is as follows:
Syntax: dump(obj, fp)
Argument | Description |
---|---|
obj | Object to be serialized. |
fp | A file-like object where the serialized data will be written. |
Here is an example:
```python
>>> import json
>>>
>>> person = {
...     'first_name': "John",
...     "isAlive": True,
...     "age": 27,
...     "address": {
...         "streetAddress": "21 2nd Street",
...         "city": "New York",
...         "state": "NY",
...         "postalCode": "10021-3100"
...     },
...     "hasMortgage": None
... }
>>>
>>> with open('person.json', 'w') as f:  # writing JSON object
...     json.dump(person, f)
...
>>>
>>> open('person.json', 'r').read()  # reading JSON object as string
'{"hasMortgage": null, "isAlive": true, "age": 27, "address": {"state": "NY", "streetAddress": "21 2nd Street", "city": "New York", "postalCode": "10021-3100"}, "first_name": "John"}'
>>>
>>> type(open('person.json', 'r').read())
<class 'str'>
```
Notice that while serializing the object, Python's None is converted to JSON's null.
The following table lists the conversions that happen between types when we serialize data.
Python Type | JSON Type |
---|---|
dict | object |
list, tuple | array |
int | number |
float | number |
str | string |
True | true |
False | false |
None | null |
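The table above can be checked with a short sketch. The `sample` dictionary and its key names are invented for illustration, and the output shown in the comment assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
import json

# One value per row of the table (key names are hypothetical).
sample = {
    "a_tuple": (1, 2),
    "a_list": [1, 2],
    "an_int": 3,
    "a_float": 1.5,
    "a_bool": True,
    "nothing": None,
}

encoded = json.dumps(sample)
print(encoded)
# {"a_tuple": [1, 2], "a_list": [1, 2], "an_int": 3, "a_float": 1.5, "a_bool": true, "nothing": null}
```

Note that the tuple and the list produce identical JSON arrays, so the distinction between them is lost once the data is serialized.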
When we deserialize an object, each JSON type is converted back to its equivalent Python type, as shown in the table below:
JSON Type | Python Type |
---|---|
object | dict |
array | list |
string | str |
number (int) | int |
number (real) | float |
true | True |
false | False |
null | None |
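A small sketch of the reverse direction, using a hand-written JSON string whose keys are made up for this example. It prints the Python type that each JSON value is decoded to.

```python
import json

# Hypothetical JSON document covering the rows of the table.
document = '{"count": 3, "ratio": 0.5, "tags": ["a", "b"], "ok": true, "extra": null}'

decoded = json.loads(document)
for key, value in decoded.items():
    print(key, type(value).__name__)

# A tuple does not survive a round trip: it comes back as a list.
round_tripped = json.loads(json.dumps((1, 2)))
print(round_tripped)
```

The last two lines are the practical consequence of the two tables taken together: arrays always decode to lists, whatever the original Python sequence type was.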
Here is another example which serializes a list of two persons:
```python
>>> persons = [
...     {
...         'first_name': "John",
...         "isAlive": True,
...         "age": 27,
...         "address": {
...             "streetAddress": "21 2nd Street",
...             "city": "New York",
...             "state": "NY",
...             "postalCode": "10021-3100"
...         },
...         "hasMortgage": None,
...     },
...     {
...         'first_name': "Bob",
...         "isAlive": True,
...         "age": 32,
...         "address": {
...             "streetAddress": "2428 O Conner Street",
...             "city": " Ocean Springs",
...             "state": "Mississippi",
...             "postalCode": "20031-9110"
...         },
...         "hasMortgage": True,
...     }
... ]
>>>
>>> with open('person_list.json', 'w') as f:
...     json.dump(persons, f)
...
>>>
>>> open('person_list.json', 'r').read()
'[{"hasMortgage": null, "isAlive": true, "age": 27, "address": {"state": "NY", "streetAddress": "21 2nd Street", "city": "New York", "postalCode": "10021-3100"}, "first_name": "John"}, {"hasMortgage": true, "isAlive": true, "age": 32, "address": {"state": "Mississippi", "streetAddress": "2428 O Conner Street", "city": " Ocean Springs", "postalCode": "20031-9110"}, "first_name": "Bob"}]'
```
Our Python objects are now serialized to the file. To deserialize them back into Python objects, we use the load() function.
Deserializing with load()
The load() function deserializes the JSON object from a file-like object and returns it.
Its syntax is as follows:
load(fp) -> a Python object
Argument | Description |
---|---|
fp | A file-like object from which the JSON string will be read. |
Here is an example:
```python
>>> with open('person.json', 'r') as f:
...     person = json.load(f)
...
>>>
>>> type(person)  # notice the type of data returned by load()
<class 'dict'>
>>>
>>> person
{'age': 27, 'isAlive': True, 'hasMortgage': None, 'address': {'state': 'NY', 'streetAddress': '21 2nd Street', 'city': 'New York', 'postalCode': '10021-3100'}, 'first_name': 'John'}
```
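The example above assumes the file contains valid JSON. As a hedged sketch of what happens otherwise: load() raises json.JSONDecodeError when the content cannot be parsed. The helper name `read_json` and the file names below are made up for this illustration; returning None on failure is just one possible policy.

```python
import json
import os
import tempfile


def read_json(path):
    """Return the decoded content of `path`, or None if it is not valid JSON."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except json.JSONDecodeError:
        return None


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.json")
    bad = os.path.join(tmp, "bad.json")

    with open(good, "w", encoding="utf-8") as f:
        json.dump({"first_name": "John"}, f)
    with open(bad, "w", encoding="utf-8") as f:
        f.write("{'first_name': 'John'}")  # single quotes are not valid JSON

    good_result = read_json(good)
    bad_result = read_json(bad)

print(good_result, bad_result)
```

The second file looks like a Python dictionary literal, which is a common mistake: JSON strings must use double quotes.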
Serializing and Deserializing with dumps() and loads()
The dumps() function works exactly like dump(), but instead of sending the output to a file-like object, it returns the output as a string.
Similarly, the loads() function is the same as load(), but instead of deserializing the JSON string from a file, it deserializes from a string.
Here are some examples:
```python
>>> person = {
...     'first_name': "John",
...     "isAlive": True,
...     "age": 27,
...     "address": {
...         "streetAddress": "21 2nd Street",
...         "city": "New York",
...         "state": "NY",
...         "postalCode": "10021-3100"
...     },
...     "hasMortgage": None
... }
>>>
>>> data = json.dumps(person)  # serialize
>>>
>>> data
'{"hasMortgage": null, "isAlive": true, "age": 27, "address": {"state": "NY", "streetAddress": "21 2nd Street", "city": "New York", "postalCode": "10021-3100"}, "first_name": "John"}'
>>>
>>> person = json.loads(data)  # deserialize from string
>>>
>>> type(person)
<class 'dict'>
>>>
>>> person
{'age': 27, 'isAlive': True, 'hasMortgage': None, 'address': {'state': 'NY', 'streetAddress': '21 2nd Street', 'city': 'New York', 'postalCode': '10021-3100'}, 'first_name': 'John'}
```
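note:
The output shown in this tutorial was produced on an older Python release, where dictionaries did not preserve the order of elements, so the keys appear in an arbitrary order. Since Python 3.7, dictionaries preserve insertion order, so on a current interpreter the keys appear in the order in which they were defined.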
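To see what key order your own interpreter produces, a minimal check like the following can help. It assumes Python 3.7 or later, where insertion order is guaranteed; the shortened `person` dictionary is sample data.

```python
import json

person = {"first_name": "John", "isAlive": True, "age": 27}

text = json.dumps(person)
print(text)
# {"first_name": "John", "isAlive": true, "age": 27}

# The order also survives deserialization.
keys_after_round_trip = list(json.loads(text))
print(keys_after_round_trip)
# ['first_name', 'isAlive', 'age']
```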
Customizing the Serializer
The following are some optional keyword arguments that can be passed to the dumps() or dump() function to customize the serializer.
Argument | Description |
---|---|
indent | A non-negative integer which determines the number of spaces used to indent key-value pairs at each nesting level. The indent argument comes in handy to prettify the output if you have deeply nested data structures. The default value of indent is None, which produces the most compact single-line output. |
sort_keys | A boolean flag. If set to True, the keys in the returned JSON string are sorted; otherwise they are emitted in the dictionary's own iteration order. Its default value is False. |
skipkeys | JSON requires keys to be strings. Keys of type str, int, float, bool and None are accepted (non-string ones are converted to strings), but any other key type, such as a tuple, raises a TypeError exception. To skip such keys instead of raising the exception, set the skipkeys argument to True. |
separators | A tuple of the form (item_separator, key_separator). The item_separator is a string which is used to separate items in a list and key-value pairs in a dictionary. The key_separator is also a string and is used to separate keys from values in a dictionary. By default, separators is (', ', ': ') when indent is None and (',', ': ') otherwise. |
Here are some examples which demonstrate how to use these arguments:
Example 1: Using indent
```python
>>> print(json.dumps(person))  # without indent
{"age": 27, "isAlive": true, "hasMortgage": null, "address": {"state": "NY", "streetAddress": "21 2nd Street", "city": "New York", "postalCode": "10021-3100"}, "first_name": "John"}
>>>
>>> print(json.dumps(person, indent=4))  # indent each level by 4 spaces
{
    "age": 27,
    "isAlive": true,
    "hasMortgage": null,
    "address": {
        "state": "NY",
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "postalCode": "10021-3100"
    },
    "first_name": "John"
}
```
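Keep in mind that indentation adds whitespace and newlines, which increases the size of the data. So avoid indent in a production environment when the JSON is sent over the network or stored in bulk; it is mainly useful for output that people will read.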
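The size difference is easy to measure. The following sketch compares the length of the default and the indented output for a small sample dictionary (the data is illustrative; the exact numbers depend on your data).

```python
import json

person = {
    "first_name": "John",
    "age": 27,
    "address": {"city": "New York", "state": "NY"},
}

compact = json.dumps(person)           # indent=None (the default)
pretty = json.dumps(person, indent=4)  # 4 spaces per nesting level

print(len(compact), len(pretty))

# Both strings decode to the same object; only the whitespace differs.
same_data = json.loads(compact) == json.loads(pretty)
print(same_data)
```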
Example 2: Using sort_keys
```python
>>> print(json.dumps(person, indent=4))  # keys in the dictionary's own order
{
    "address": {
        "state": "NY",
        "postalCode": "10021-3100",
        "city": "New York",
        "streetAddress": "21 2nd Street"
    },
    "hasMortgage": null,
    "first_name": "John",
    "isAlive": true,
    "age": 27
}
>>>
>>> print(json.dumps(person, indent=4, sort_keys=True))  # keys sorted alphabetically
{
    "address": {
        "city": "New York",
        "postalCode": "10021-3100",
        "state": "NY",
        "streetAddress": "21 2nd Street"
    },
    "age": 27,
    "first_name": "John",
    "hasMortgage": null,
    "isAlive": true
}
```
Example 3: Using skipkeys
```python
>>> data = {'one': 1, 'two': 2, (1,2): 3}
>>>
>>> json.dumps(data, indent=4)
Traceback (most recent call last):
  ...
TypeError: key (1, 2) is not a string
```
In this case, the key (1,2) is a tuple, which can't be converted to a string, so a TypeError exception is raised (the exact wording of the message varies between Python versions). To prevent the exception from being raised and skip such keys, use the skipkeys argument.
```python
>>> print(json.dumps(data, indent=4, skipkeys=True))
{
    "two": 2,
    "one": 1
}
```
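Not every non-string key is skipped. As a small sketch with invented sample data: keys of type int, bool and None are converted to strings, and only unsupported key types such as tuples are dropped by skipkeys.

```python
import json

# Hypothetical dictionary mixing supported and unsupported key types.
data = {"one": 1, 2: "two", True: "yes", None: "nothing", (1, 2): 3}

encoded = json.dumps(data, skipkeys=True)
print(encoded)
# {"one": 1, "2": "two", "true": "yes", "null": "nothing"}
```

Because the converted keys come back as strings, loading this output does not reproduce the original keys: 2 becomes "2", True becomes "true" and None becomes "null".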
Example 4: Using separators
```python
>>> employee = {
...     'first_name': "Tom",
...     "designation": 'CEO',
...     "Salary": '2000000',
...     "age": 35,
...     "cars": ['chevy cavalier', 'ford taurus', 'tesla model x']
... }
>>>
>>> print(json.dumps(employee, indent=4, skipkeys=True))
{
    "designation": "CEO",
    "age": 35,
    "cars": [
        "chevy cavalier",
        "ford taurus",
        "tesla model x"
    ],
    "Salary": "2000000",
    "first_name": "Tom"
}
```
There are three things to notice in the above output:
- Each key-value pair is separated using a comma (,).
- Items in the array (like cars) are also separated using a comma (,).
- The keys of the JSON object are separated from values using ': ' (i.e. a colon followed by a space).
The separator in the first two cases is controlled by the item_separator string, and the last one by the key_separator string. The following example changes item_separator and key_separator to the pipe (|) and dash (-) characters respectively. The result is no longer valid JSON; it is shown only to make the effect of each separator visible.
```python
>>> print(json.dumps(employee, indent=4, skipkeys=True, separators=('|', '-')))
{
    "designation"-"CEO"|
    "age"-35|
    "cars"-[
        "chevy cavalier"|
        "ford taurus"|
        "tesla model x"
    ]|
    "Salary"-"2000000"|
    "first_name"-"Tom"
}
```
Now that you know how separators work, we can make the output more compact by removing the space character from the key_separator string. For example:
```python
>>> print(json.dumps(employee, indent=4, skipkeys=True, separators=(',', ':')))
{
    "designation":"CEO",
    "age":35,
    "cars":[
        "chevy cavalier",
        "ford taurus",
        "tesla model x"
    ],
    "Salary":"2000000",
    "first_name":"Tom"
}
```
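The saving is larger when indent is left at None, because the default item separator then also contains a space. A minimal sketch with a shortened, illustrative `employee` dictionary (output comments assume Python 3.7 or later for key order):

```python
import json

employee = {"first_name": "Tom", "cars": ["chevy cavalier", "ford taurus"]}

default_text = json.dumps(employee)
compact_text = json.dumps(employee, separators=(",", ":"))

print(default_text)
# {"first_name": "Tom", "cars": ["chevy cavalier", "ford taurus"]}
print(compact_text)
# {"first_name":"Tom","cars":["chevy cavalier","ford taurus"]}
```

Spaces inside string values are untouched; only the whitespace between tokens is removed, so the compact form is still valid JSON and decodes to the same object.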
Serializing Custom Object
By default, the json module can only serialize the following basic types:
- int
- float
- str
- bool
- list
- tuple
- dict
- None
If you try to serialize a custom object or any other built-in type, a TypeError exception will be raised. For example:
```python
>>> from datetime import datetime
>>>
>>> now = datetime.now()
>>>
>>> now
datetime.datetime(2018, 9, 28, 22, 16, 46, 16944)
>>>
>>> d = {'name': 'bob', 'dob': now}
>>>
>>> json.dumps(d)
Traceback (most recent call last):
  ...
TypeError: datetime.datetime(2018, 9, 28, 22, 16, 46, 16944) is not JSON serializable
>>>
>>> class Employee:
...
...     def __init__(self, name):
...         self.name = name
...
>>>
>>> e = Employee('John')
>>>
>>> e
<__main__.Employee object at 0x7f20c82ee4e0>
>>>
>>> json.dumps(e)
Traceback (most recent call last):
  ...
TypeError: <__main__.Employee object at 0x7f20c82ee4e0> is not JSON serializable
```
To serialize custom objects or built-in types, we have to create our own serialization function.
```python
from datetime import datetime


def serialize_objects(obj):
    # serialize datetime object
    if isinstance(obj, datetime):
        return {
            '__class__': datetime.__name__,
            '__value__': str(obj)
        }

    # serialize Employee object
    #
    # if isinstance(obj, Employee):
    #     return {
    #         '__class__': 'Employee',
    #         '__value__': obj.name
    #     }

    raise TypeError(str(obj) + ' is not JSON serializable')
```
Here are a few things to notice about the function.
- The function takes a single argument named obj.
- We check the type of the object using the isinstance() function. Checking the type is not strictly necessary if your function serializes only a single type, but it makes it easy to add serialization for other types.
- For a datetime object, we return a dictionary with two keys: __class__ and __value__. The __class__ key stores the name of the original class and will be used to deserialize the data. The __value__ key stores the value of the object; in this case, we simply convert the datetime.datetime object to its string representation using the built-in str() function.
- In the last line, we raise a TypeError exception. This is necessary; otherwise our serialization function wouldn't report errors for objects it can't serialize.
Our serialization function is now ready to serialize datetime.datetime objects.
The next question is: how do we pass our custom serialization function to dumps() or dump()?
We can pass a custom serialization function to dumps() or dump() using the default keyword argument. Here is an example:
```python
>>> def serialize_objects(obj):
...     if isinstance(obj, datetime):
...         return {
...             '__class__': datetime.__name__,
...             '__value__': str(obj)
...         }
...     raise TypeError(str(obj) + ' is not JSON serializable')
...
>>>
>>> employee = {
...     'first_name': "Mike",
...     "designation": 'Manager',
...     "doj": datetime(year=2016, month=5, day=2),  # date of joining
... }
>>>
>>> emp_json = json.dumps(employee, indent=4, default=serialize_objects)
>>>
>>> print(emp_json)
{
    "designation": "Manager",
    "doj": {
        "__value__": "2016-05-02 00:00:00",
        "__class__": "datetime"
    },
    "first_name": "Mike"
}
```
Notice how the datetime.datetime object is serialized as a dictionary with two keys.
It is important to note that the serialize_objects() function is called only for objects that the json module cannot serialize on its own, that is, objects which are not one of the basic types listed earlier.
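This behaviour can be observed directly. The following sketch is a variant of serialize_objects() that records every object it receives in a list (the `seen` list is added purely for illustration and is not part of the original function).

```python
import json
from datetime import datetime

seen = []  # every object handed to the fallback function ends up here


def serialize_objects(obj):
    seen.append(obj)
    if isinstance(obj, datetime):
        return {"__class__": "datetime", "__value__": str(obj)}
    raise TypeError(str(obj) + " is not JSON serializable")


employee = {"first_name": "Mike", "age": 35, "doj": datetime(2016, 5, 2)}
json.dumps(employee, default=serialize_objects)

print(seen)
# [datetime.datetime(2016, 5, 2, 0, 0)]
```

The string and the integer never reach the function; only the datetime value does.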
We have now successfully serialized a datetime.datetime object. Let's see what happens if we try to deserialize it.
```python
>>> emp_dict = json.loads(emp_json)
>>>
>>> type(emp_dict)
<class 'dict'>
>>>
>>> emp_dict
{'designation': 'Manager', 'doj': {'__value__': '2016-05-02 00:00:00', '__class__': 'datetime'}, 'first_name': 'Mike'}
>>>
>>> emp_dict['doj']
{'__value__': '2016-05-02 00:00:00', '__class__': 'datetime'}
```
Notice that the value of the doj key is returned as a dictionary instead of a datetime.datetime object.
This happens because the loads() function doesn't know anything about the serialize_objects() function which serialized the datetime.datetime object in the first place.
What we need is the opposite of the serialize_objects() function: a function that takes a dictionary, checks for the existence of the __class__ key and builds the datetime.datetime object from the string representation stored in the __value__ key.
```python
from datetime import datetime


def deserialize_objects(obj):
    if '__class__' in obj:
        if obj['__class__'] == 'datetime':
            return datetime.strptime(obj['__value__'], "%Y-%m-%d %H:%M:%S")
        # if obj['__class__'] == 'Employee':
        #     return Employee(obj['__value__'])
    return obj
```
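The only thing to notice here is that we are using the datetime.strptime() function to convert the datetime string into a datetime.datetime object. Note that the format string "%Y-%m-%d %H:%M:%S" matches the output of str() only when the microsecond field is zero; a value such as datetime.now() usually includes microseconds and would need "%Y-%m-%d %H:%M:%S.%f" instead.
To pass our custom deserialization function to the loads() function, we use the object_hook keyword argument.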
```python
>>> def deserialize_objects(obj):
...     if '__class__' in obj:
...         if obj['__class__'] == 'datetime':
...             return datetime.strptime(obj['__value__'], "%Y-%m-%d %H:%M:%S")
...         # if obj['__class__'] == 'Employee':
...         #     return Employee(obj['__value__'])
...     return obj
...
>>>
>>> emp_dict = json.loads(emp_json, object_hook=deserialize_objects)
>>>
>>> emp_dict
{'designation': 'Manager', 'doj': datetime.datetime(2016, 5, 2, 0, 0), 'first_name': 'Mike'}
>>>
>>> emp_dict['doj']
datetime.datetime(2016, 5, 2, 0, 0)
```
As expected, this time the value of the doj key is a datetime.datetime object instead of a dictionary.
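Putting both halves together, here is one possible sketch of a full round trip that also handles the Employee class from the commented-out branches. It departs from the functions above in one respect, which is an assumption of this sketch rather than part of the original code: it uses datetime.isoformat() and datetime.fromisoformat() (available from Python 3.7) instead of str() and strptime(), so values with microseconds round-trip without a format string.

```python
import json
from datetime import datetime


class Employee:
    def __init__(self, name):
        self.name = name


def serialize_objects(obj):
    # Fallback for objects the json module cannot handle on its own.
    if isinstance(obj, datetime):
        return {"__class__": "datetime", "__value__": obj.isoformat()}
    if isinstance(obj, Employee):
        return {"__class__": "Employee", "__value__": obj.name}
    raise TypeError(str(obj) + " is not JSON serializable")


def deserialize_objects(obj):
    # Called for every decoded JSON object (dict), innermost first.
    if obj.get("__class__") == "datetime":
        return datetime.fromisoformat(obj["__value__"])
    if obj.get("__class__") == "Employee":
        return Employee(obj["__value__"])
    return obj


record = {
    "employee": Employee("John"),
    "doj": datetime(2018, 9, 28, 22, 16, 46, 16944),  # includes microseconds
}

text = json.dumps(record, default=serialize_objects)
restored = json.loads(text, object_hook=deserialize_objects)

print(restored["employee"].name, restored["doj"])
```

The design keeps the same __class__ and __value__ convention as before, so adding support for another type means adding one branch to each function.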