Reading and Writing JSON in Python

Updated on Jan 07, 2020


JSON (JavaScript Object Notation) is a language-neutral data interchange format. It was created and popularized by Douglas Crockford. In its short history, JSON has become a de facto standard for data transfer across the web.

JSON is a text-based format which is derived from JavaScript object syntax. However, it is completely independent of JavaScript, so you don't need to know any JavaScript to use JSON.

JSON is commonly used by web applications to transfer data between client and server. If you are using a web service, there is a good chance that it returns data in JSON format by default.

Before the inception of JSON, XML was predominantly used to send and receive data between the client and the server. The problem with XML is that it is verbose, heavy, and not easy to parse. This is not the case with JSON, as you will see soon.

The following is an example of an XML document describing a person.

<?xml version="1.0" encoding="UTF-8" ?>
<root>
    <firstName>John</firstName>
    <lastName>Smith</lastName>
    <isAlive>true</isAlive>
    <age>27</age>
    <address>
        <streetAddress>21 2nd Street</streetAddress>
        <city>New York</city>
        <state>NY</state>
        <postalCode>10021-3100</postalCode>
    </address>
    <phoneNumbers>
        <type>home</type>
        <number>212 555-1234</number>
    </phoneNumbers>
    <phoneNumbers>
        <type>office</type>
        <number>646 555-4567</number>
    </phoneNumbers>
    <phoneNumbers>
        <type>mobile</type>
        <number>123 456-7890</number>
    </phoneNumbers>
    <spouse />
</root>

The same information can be represented using JSON as follows:

{
    "firstName": "John",
    "lastName": "Smith",
    "isAlive": true,
    "age": 27,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021-3100"
    },
    "phoneNumbers": [
        {
            "type": "home",
            "number": "212 555-1234"
        },
        {
            "type": "office",
            "number": "646 555-4567"
        },
        {
            "type": "mobile",
            "number": "123 456-7890"
        }
    ],
    "children": [],
    "spouse": null
}

I am sure you will agree that the JSON counterpart is much easier to read and write.

Also, notice that the JSON format closely resembles Python dictionaries.
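To see the resemblance concretely, here is a small sketch that parses a trimmed-down version of the JSON document above with json.loads() (covered later in this tutorial) and compares the result with an equivalent Python dictionary literal. The only visible differences in the source text are JSON's true and null versus Python's True and None.

```python
import json

# A trimmed-down version of the JSON document above, as a string
json_text = """
{
    "firstName": "John",
    "isAlive": true,
    "age": 27,
    "phoneNumbers": [{"type": "home", "number": "212 555-1234"}],
    "spouse": null
}
"""

# The same data written as a Python dictionary literal
python_dict = {
    "firstName": "John",
    "isAlive": True,   # JSON true  -> Python True
    "age": 27,
    "phoneNumbers": [{"type": "home", "number": "212 555-1234"}],
    "spouse": None,    # JSON null  -> Python None
}

parsed = json.loads(json_text)
print(parsed == python_dict)  # True
```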

Serialization and Deserialization


Serialization: the process of converting an object into a format suitable for transmitting over a network or storing in a file or database.

Deserialization: the reverse of serialization. It converts the serialized data back into a usable object.

In the case of JSON, serialization converts a Python object into a JSON string, and deserialization rebuilds the Python object from its JSON string representation.

Python provides a built-in module called json for serializing and deserializing objects. To use the json module, import it as follows:

>>>
>>> import json
>>>

The json module mainly provides the following functions for serializing and deserializing.

  1. dump(obj, fileobj)
  2. dumps(obj)
  3. load(fileobj)
  4. loads(s)
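As a quick orientation before going through each function, the following sketch uses all four on the same dictionary. It uses io.StringIO, an in-memory file-like object from the standard library, so that nothing is written to disk; with a real file you would use open() instead, as the examples in this tutorial do.

```python
import io
import json

data = {"name": "John", "age": 27}

# dumps()/loads() work with strings (the trailing "s" can be read as "string")
text = json.dumps(data)
print(text)        # {"name": "John", "age": 27}
assert json.loads(text) == data

# dump()/load() work with file-like objects; io.StringIO stands in for a file
buffer = io.StringIO()
json.dump(data, buffer)
buffer.seek(0)     # rewind before reading the data back
restored = json.load(buffer)
print(restored)    # {'name': 'John', 'age': 27}
```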

Let's start with the dump() function.

Serializing with dump()


The dump() function is used to serialize data. It takes a Python object, serializes it, and writes the output (a JSON string) to a file-like object.

The syntax of the dump() function is as follows:

Syntax: dump(obj, fp)

Argument    Description
obj         Object to be serialized.
fp          A file-like object where the serialized data will be written.

Here is an example:

>>> 
>>> import json
>>>
>>> person = {
...     'first_name': "John",
...     "isAlive": True,
...     "age": 27,
...     "address": {
...         "streetAddress": "21 2nd Street",
...         "city": "New York",
...         "state": "NY",
...         "postalCode": "10021-3100"
...     },
...     "hasMortgage": None
... }
>>>
>>> 
>>> with open('person.json', 'w') as f:  # writing JSON object
...     json.dump(person, f)
... 
>>> 
>>>  
>>> open('person.json', 'r').read()   # reading JSON object as string
'{"hasMortgage": null, "isAlive": true, "age": 27, "address": {"state": "NY", "streetAddress": "21 2nd Street", "city": "New York", "postalCode": "10021-3100"}, "first_name": "John"}'
>>> 
>>> 
>>> type(open('person.json', 'r').read())   
<class 'str'>
>>> 
>>>

Notice that during serialization, Python's None is converted to JSON's null, and True is converted to true.

The following table lists how Python types are converted to JSON types when we serialize data.

Python Type    JSON Type
dict           object
list, tuple    array
int            number
float          number
str            string
True           true
False          false
None           null

When we deserialize data, each JSON type is converted back to its equivalent Python type, as shown in the table below:

JSON Type        Python Type
object           dict
array            list
string           str
number (int)     int
number (real)    float
true             True
false            False
null             None
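Because the two tables are not exact mirrors, a round trip through JSON does not always give back the same Python types. The sketch below shows two cases that follow from the tables and from JSON's requirement that object keys be strings: a tuple comes back as a list, and an integer dictionary key comes back as a string.

```python
import json

original = {"point": (1, 2), 10: "ten"}

text = json.dumps(original)
print(text)        # {"point": [1, 2], "10": "ten"}

restored = json.loads(text)
print(restored)    # {'point': [1, 2], '10': 'ten'}

# The tuple became a list, and the int key 10 became the string "10"
print(restored == original)  # False
```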

Here is another example which serializes a list of two persons:

>>> 
>>> 
>>> persons = \
... [
...     {
...         'first_name': "John",
...         "isAlive": True,
...         "age": 27,
...         "address": {
...             "streetAddress": "21 2nd Street",
...             "city": "New York",
...             "state": "NY",
...             "postalCode": "10021-3100"
...         },
...         "hasMortgage": None,
...     },
...     {
...         'first_name': "Bob",
...         "isAlive": True,
...         "age": 32,
...         "address": {
...             "streetAddress": "2428 O Conner Street",
...             "city": " Ocean Springs",
...             "state": "Mississippi",
...             "postalCode": "20031-9110"
...         },
...         "hasMortgage": True,
...     }
... 
... ]
>>> 
>>> with open('person_list.json', 'w') as f:
...     json.dump(persons, f)
... 
>>> 
>>> 
>>> open('person_list.json', 'r').read()
'[{"hasMortgage": null, "isAlive": true, "age": 27, "address": {"state": "NY", "streetAddress": "21 2nd Street", "city": "New York", "postalCode": "10021-3100"}, "first_name": "John"}, {"hasMortgage": true, "isAlive": true, "age": 32, "address": {"state": "Mississippi", "streetAddress": "2428 O Conner Street", "city": " Ocean Springs", "postalCode": "20031-9110"}, "first_name": "Bob"}]'
>>> 
>>>

Our Python objects are now serialized to the file. To deserialize them back into Python objects, we use the load() function.

Deserializing with load()


The load() function reads a JSON document from a file-like object, deserializes it, and returns the resulting Python object.

Its syntax is as follows:

Syntax: load(fp) -> a Python object

Argument    Description
fp          A file-like object from which the JSON string will be read.

Here is an example:

>>> 
>>> with open('person.json', 'r') as f:
...     person = json.load(f)
... 
>>> 
>>> type(person)  # notice the type of data returned by load()
<class 'dict'>
>>> 
>>> person
{'age': 27, 'isAlive': True, 'hasMortgage': None, 'address': {'state': 'NY', 'streetAddress': '21 2nd Street', 'city': 'New York', 'postalCode': '10021-3100'}, 'first_name': 'John'}
>>> 
>>>
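If the file does not contain valid JSON, load() raises json.JSONDecodeError, a subclass of ValueError available since Python 3.5. The following sketch writes a deliberately broken file (broken.json is a hypothetical file name) and shows one way to catch the error and report where parsing failed.

```python
import json

# Write a file containing invalid JSON (JSON does not allow single quotes)
with open('broken.json', 'w') as f:
    f.write("{'name': 'John'}")

try:
    with open('broken.json', 'r') as f:
        person = json.load(f)
except json.JSONDecodeError as e:
    # e.msg, e.lineno and e.colno describe where parsing failed
    print('Invalid JSON:', e.msg, 'at line', e.lineno, 'column', e.colno)
    person = None
```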

Serializing and Deserializing with dumps() and loads()


The dumps() function works exactly like dump() but instead of sending the output to a file-like object, it returns the output as a string.

Similarly, the loads() function is the same as load(), but instead of reading the JSON string from a file, it deserializes it from a string.

Here are some examples:

>>> 
>>> person = {
...     'first_name': "John",
...     "isAlive": True,
...     "age": 27,
...     "address": {
...         "streetAddress": "21 2nd Street",
...         "city": "New York",
...         "state": "NY",
...         "postalCode": "10021-3100"
...     },
...     "hasMortgage": None
... }
>>> 
>>> data = json.dumps(person)   # serialize
>>>
>>> data
'{"hasMortgage": null, "isAlive": true, "age": 27, "address": {"state": "NY", "streetAddress": "21 2nd Street", "city": "New York", "postalCode": "10021-3100"}, "first_name": "John"}'
>>> 
>>> 
>>> person = json.loads(data)  # deserialize from string
>>> 
>>> type(person)
<class 'dict'>
>>> 
>>> person
{'age': 27, 'isAlive': True, 'hasMortgage': None, 'address': {'state': 'NY', 'streetAddress': '21 2nd Street', 'city': 'New York', 'postalCode': '10021-3100'}, 'first_name': 'John'}
>>> 
>>>

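Note:

In Python versions before 3.7, dictionaries did not guarantee any order for their keys, so the order of keys in the output above may differ on your machine. Since Python 3.7, dictionaries preserve insertion order, and dumps() writes keys in that order.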
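On Python 3.7 or later, dictionaries keep insertion order and dumps() writes keys in that order rather than sorting them. A quick sketch to check this on your own interpreter:

```python
import json

d = {"b": 1, "a": 2}
d["c"] = 3  # added last, so it is written last

print(json.dumps(d))  # {"b": 1, "a": 2, "c": 3}
```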

Customizing the Serializer


The following are some optional keyword arguments that can be passed to the dumps() or dump() function to customize the serializer.

Argument    Description
indent      A non-negative integer that sets the number of spaces used to indent each nesting level. The indent argument comes in handy to prettify the output of deeply nested data structures. Its default value is None, which writes the whole output on a single line.
sort_keys   A boolean flag. If set to True, the keys of every JSON object in the output are sorted. Its default value is False, in which case keys appear in the dictionary's own order.
skipkeys    JSON requires object keys to be strings. Keys of type str, int, float, bool, or None are converted to strings automatically, but a key of any other type (such as a tuple) raises a TypeError exception. Set skipkeys to True to skip such keys instead of raising the exception. Its default value is False.
separators  A tuple of the form (item_separator, key_separator). The item_separator string separates items in an array as well as key-value pairs in an object; the key_separator string separates keys from their values. The default is (', ', ': ') when indent is None and (',', ': ') otherwise.

Here are some examples that demonstrate these arguments in action:

Example 1: Using indent

>>>
>>> print(json.dumps(person))  # without indent
{"age": 27, "isAlive": true, "hasMortgage": null, "address": {"state": "NY", "streetAddress": "21 2nd Street", "city": "New York", "postalCode": "10021-3100"}, "first_name": "John"}
>>> 
>>>
>>> print(json.dumps(person, indent=4))  # indent each level by 4 spaces
{
    "age": 27,
    "isAlive": true,
    "hasMortgage": null,
    "address": {
        "state": "NY",
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "postalCode": "10021-3100"
    },
    "first_name": "John"
}
>>> 
>>>

Keep in mind that indentation adds whitespace, so it also increases the size of the output. If size matters, for example when sending data over the network in production, consider leaving indent out.
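To get a feel for the overhead, the sketch below compares the length of the same dictionary serialized with and without indent. The dictionary is a cut-down version of person; the exact numbers depend on the data, but for nested data like this the indented string comes out longer, while both strings decode to the same object.

```python
import json

person = {
    "first_name": "John",
    "age": 27,
    "address": {"city": "New York", "state": "NY"},
}

compact = json.dumps(person)
pretty = json.dumps(person, indent=4)

print(len(compact), len(pretty))  # the indented string is longer
```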

Example 2: Using sort_keys

>>> 
>>> print(json.dumps(person, indent=4))   # unsorted: keys follow the dict's own order
{
    "address": {
        "state": "NY",
        "postalCode": "10021-3100",
        "city": "New York",
        "streetAddress": "21 2nd Street"
    },
    "hasMortgage": null,
    "first_name": "John",
    "isAlive": true,
    "age": 27
}
>>> 
>>> 
>>> print(json.dumps(person, indent=4, sort_keys=True))  # print JSON string in order by keys
{
    "address": {
        "city": "New York",
        "postalCode": "10021-3100",
        "state": "NY",
        "streetAddress": "21 2nd Street"
    },
    "age": 27,
    "first_name": "John",
    "hasMortgage": null,
    "isAlive": true
}
>>> 
>>>
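Sorting also makes the output independent of insertion order, which can be handy when you want to compare or hash JSON strings. In this sketch (assuming Python 3.7 or later), the two dictionaries hold the same data but were built in a different order.

```python
import json

a = {"name": "John", "age": 27}
b = {"age": 27, "name": "John"}

print(a == b)                                  # True: dicts compare by content
print(json.dumps(a) == json.dumps(b))          # False: keys follow insertion order
print(json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True))  # True
```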

Example 3: Using skipkeys

>>> 
>>> data = {'one': 1, 'two': 2, (1,2): 3}
>>> 
>>> json.dumps(data, indent=4)
Traceback (most recent call last):
  ...      
TypeError: key (1, 2) is not a string
>>> 
>>>

In this case, the tuple key (1, 2) can't be converted to a string, so a TypeError exception is raised. To skip such keys instead of raising the exception, set the skipkeys argument to True:

>>> 
>>> print(json.dumps(data, indent=4, skipkeys=True))
{
    "two": 2,
    "one": 1
}
>>> 
>>>
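Note that skipkeys only drops keys that JSON cannot represent. Keys of type int, float, bool, or None are not skipped; dumps() converts them to strings, as this sketch shows.

```python
import json

data = {1: 'int key', None: 'None key', (1, 2): 'tuple key'}

# Only the tuple key is skipped; 1 becomes "1" and None becomes "null"
print(json.dumps(data, skipkeys=True))
# {"1": "int key", "null": "None key"}
```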

Example 4: Using separators

>>>
>>> employee = {
...     'first_name': "Tom",
...     "designation": 'CEO',
...     "Salary": '2000000',
...     "age": 35,
...     "cars": ['chevy cavalier', 'ford taurus', 'tesla model x']
... }
>>> 
>>>
>>> print(json.dumps(employee, indent=4, skipkeys=True))
{
    "designation": "CEO",
    "age": 35,
    "cars": [
        "chevy cavalier",
        "ford taurus",
        "tesla model x"
    ],
    "Salary": "2000000",
    "first_name": "Tom"
}
>>> 
>>>

There are three things to notice in the above output:

  1. Each key-value pair is separated using a comma (,).
  2. Items in the array (like cars) are also separated using a comma (,).
  3. The keys of the JSON object are separated from values using ': ' (i.e., a colon followed by a space).

The separator in the first two cases is controlled by the item_separator string, and the one in the last case by the key_separator string. The following example changes item_separator and key_separator to the pipe (|) and dash (-) characters, respectively:

>>> 
>>> print(json.dumps(employee, indent=4, skipkeys=True, separators=('|', '-')))
{
    "designation"-"CEO"|
    "age"-35|
    "cars"-[
        "chevy cavalier"|
        "ford taurus"|
        "tesla model x"
    ]|
    "Salary"-"2000000"|
    "first_name"-"Tom"
}
>>> 
>>>

Now that you know how separators work, we can make the output more compact by removing the space character from the key_separator string. For example:

>>>
>>> print(json.dumps(employee, indent=4, skipkeys=True, separators=(',', ':')))
{
    "designation":"CEO",
    "age":35,
    "cars":[
        "chevy cavalier",
        "ford taurus",
        "tesla model x"
    ],
    "Salary":"2000000",
    "first_name":"Tom"
}
>>> 
>>>
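For the smallest possible output, drop indent as well. With indent left at None, everything is written on one line, so separators=(',', ':') removes every optional space outside the strings themselves. Here is a sketch with a cut-down version of the employee dictionary:

```python
import json

employee = {
    "first_name": "Tom",
    "age": 35,
    "cars": ["chevy cavalier", "tesla model x"],
}

print(json.dumps(employee))
# {"first_name": "Tom", "age": 35, "cars": ["chevy cavalier", "tesla model x"]}

print(json.dumps(employee, separators=(',', ':')))
# {"first_name":"Tom","age":35,"cars":["chevy cavalier","tesla model x"]}
```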

Serializing Custom Objects


By default, the json module can only serialize the following basic types:

  • int
  • float
  • str
  • bool
  • list
  • tuple
  • dict
  • None

If you try to serialize a custom object or any other built-in type (such as datetime), a TypeError exception will be raised. For example:

>>>
>>> from datetime import datetime
>>>
>>> now = datetime.now()
>>> 
>>> now
datetime.datetime(2018, 9, 28, 22, 16, 46, 16944)
>>>
>>> d  = {'name': 'bob', 'dob': now}
>>>
>>> json.dumps(d)
Traceback (most recent call last):
  ...
TypeError: datetime.datetime(2018, 9, 28, 22, 16, 46, 16944) is not JSON serializable
>>> 
>>>
>>> 
>>> 
>>> class Employee:
...     
...     def __init__(self, name):
...         self.name = name
... 
>>> 
>>> e = Employee('John')
>>> 
>>> e
<__main__.Employee object at 0x7f20c82ee4e0>
>>> 
>>>
>>> json.dumps(e)
Traceback (most recent call last):
  ...
TypeError: <__main__.Employee object at 0x7f20c82ee4e0> is not JSON serializable
>>> 
>>>

To serialize custom objects or unsupported built-in types such as datetime, we have to write our own serialization function.

def serialize_objects(obj):    

    # serialize datetime object

    if isinstance(obj, datetime):
        return {
            '__class__': datetime.__name__,
            '__value__': str(obj)
        }

    # serialize Employee object
    # 
    # if isinstance(obj, Employee):
    #     return {
    #         '__class__': 'Employee',
    #         '__value__': obj.name
    #     }
    raise TypeError(str(obj) + ' is not JSON serializable')

Here are a few things to notice about the function.

  1. The function takes a single argument named obj.

  2. The if statement checks the type of the object using the isinstance() function. Checking the type is not strictly necessary if your function serializes only a single type, but it makes it easy to add serialization for other types.

  3. If obj is a datetime object, the function returns a dictionary with two keys: __class__ and __value__. The __class__ key stores the name of the original class and will be used to deserialize the data. The __value__ key stores the value of the object; in this case, we simply convert the datetime.datetime object to its string representation using the built-in str() function.

  4. At the end, the function raises a TypeError exception. This is necessary; otherwise the function would silently return None for objects it can't serialize, and they would be written to the output as null instead of being reported as errors.

Our serialization function is now ready to serialize datetime.datetime objects.

The next question is: how do we pass our custom serialization function to dumps() or dump()?

We pass it using the default keyword argument. Here is an example:

>>> 
>>> def serialize_objects(obj):
...     if isinstance(obj, datetime):
...         return {
...             '__class__': datetime.__name__,
...             '__value__': str(obj)
...         }
...     raise TypeError(str(obj) + ' is not JSON serializable')
... 
>>>
>>> employee = {
...     'first_name': "Mike",
...     "designation": 'Manager',
...     "doj": datetime(year=2016, month=5, day=2),  # date of joining
... }
>>>
>>> 
>>> emp_json = json.dumps(employee, indent=4, default=serialize_objects)
>>> 
>>> 
>>> print(emp_json)
{
    "designation": "Manager",
    "doj": {
        "__value__": "2016-05-02 00:00:00",
        "__class__": "datetime"
    },
    "first_name": "Mike"
}
>>> 
>>>

Notice how the datetime.datetime object is serialized as a JSON object with two keys.

It is important to note that the serialize_objects() function is called only for objects that the json module can't serialize on its own, that is, objects that are not one of the basic types listed earlier.
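One way to confirm this is to pass a hypothetical record_and_serialize() function that records the type of every object it receives. In the sketch below, only the datetime value reaches it; the str, int, and list values are handled by the json module itself.

```python
import json
from datetime import datetime

seen_types = []  # names of the types passed to the default function

def record_and_serialize(obj):
    # Hypothetical helper: remember what we were asked to serialize
    seen_types.append(type(obj).__name__)
    if isinstance(obj, datetime):
        return str(obj)
    raise TypeError(str(obj) + ' is not JSON serializable')

data = {'name': 'Mike', 'age': 30, 'skills': ['python'], 'doj': datetime(2016, 5, 2)}

print(json.dumps(data, default=record_and_serialize))
# {"name": "Mike", "age": 30, "skills": ["python"], "doj": "2016-05-02 00:00:00"}
print(seen_types)  # ['datetime']
```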

We have now successfully serialized datetime.datetime object. Let's see what would happen if we try to deserialize it.

>>> 
>>> emp_dict = json.loads(emp_json)
>>> 
>>> type(emp_dict)
<class 'dict'>
>>> 
>>> emp_dict
{'designation': 'Manager', 'doj': {'__value__': '2016-05-02 00:00:00', '__class__': 'datetime'}, 'first_name': 'Mike'}
>>> 
>>> emp_dict['doj']
{'__value__': '2016-05-02 00:00:00', '__class__': 'datetime'}
>>> 
>>>

Notice that the value of the doj key is returned as a dictionary instead of a datetime.datetime object.

This happens because the loads() function knows nothing about the serialize_objects() function that serialized the datetime.datetime object in the first place.

What we need is the opposite of the serialize_objects() function: a function that takes a dictionary, checks for the __class__ key, and builds the datetime.datetime object from the string representation stored in the __value__ key.

def deserialize_objects(obj):   
    if '__class__' in obj:
        if obj['__class__'] == 'datetime':
            return datetime.strptime(obj['__value__'], "%Y-%m-%d %H:%M:%S")

        # if obj['__class__'] == 'Employee':
        #     return Employee(obj['__value__'])

    return obj

The only thing to notice here is that we use the datetime.strptime() function to convert the datetime string back into a datetime.datetime object. Dictionaries without a __class__ key are returned unchanged.
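One limitation of the format string "%Y-%m-%d %H:%M:%S" is that it has no field for fractional seconds. str() includes microseconds whenever they are nonzero (as with a value returned by datetime.now()), so strptime() raises a ValueError for such strings. A more robust sketch, assuming Python 3.7 or later, serializes with isoformat() and parses with datetime.fromisoformat(), which handles values with and without microseconds. The helper names encode_datetime and decode_datetime are hypothetical.

```python
from datetime import datetime

def encode_datetime(dt):
    # isoformat() keeps microseconds when they are nonzero
    return {'__class__': 'datetime', '__value__': dt.isoformat()}

def decode_datetime(obj):
    # fromisoformat() (Python 3.7+) parses the output of isoformat()
    if obj.get('__class__') == 'datetime':
        return datetime.fromisoformat(obj['__value__'])
    return obj

with_micro = datetime(2018, 9, 28, 22, 16, 46, 16944)
without_micro = datetime(2016, 5, 2)

# The strptime() format used above fails when microseconds are present
try:
    datetime.strptime(str(with_micro), "%Y-%m-%d %H:%M:%S")
except ValueError as e:
    print('strptime failed:', e)

# The isoformat()/fromisoformat() pair round-trips both values
for dt in (with_micro, without_micro):
    assert decode_datetime(encode_datetime(dt)) == dt
```

Under these assumptions, decode_datetime() could be passed as the object_hook in place of deserialize_objects(), provided the serializing side uses isoformat() as well.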

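To pass our custom deserialization function to loads(), we use the object_hook keyword argument. loads() calls this function with every JSON object it decodes (as a dict) and uses its return value in place of that dict.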

>>> 
>>> def deserialize_objects(obj):       
...     if '__class__' in obj:
...         if obj['__class__'] == 'datetime':
...             return datetime.strptime(obj['__value__'], "%Y-%m-%d %H:%M:%S")
...         # if obj['__class__'] == 'Employee':
...         #     return Employee(obj['__value__'])
...     return obj
... 
>>> 
>>> 
>>> emp_dict = json.loads(emp_json, object_hook=deserialize_objects)
>>> 
>>> emp_dict
{'designation': 'Manager', 'doj': datetime.datetime(2016, 5, 2, 0, 0), 'first_name': 'Mike'}
>>> 
>>> emp_dict['doj']
datetime.datetime(2016, 5, 2, 0, 0)
>>> 
>>>

As expected, this time the value of the doj key is a datetime.datetime object instead of a dictionary.
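The commented-out Employee branches in serialize_objects() and deserialize_objects() follow the same pattern. Here is a sketch with those branches enabled, using the same simple Employee class as before, so that an Employee object survives a round trip through JSON. Storing only obj.name in __value__ works because this class has a single attribute; a class with more state would need to store more.

```python
import json
from datetime import datetime

class Employee:
    def __init__(self, name):
        self.name = name

def serialize_objects(obj):
    if isinstance(obj, datetime):
        return {'__class__': datetime.__name__, '__value__': str(obj)}
    if isinstance(obj, Employee):
        return {'__class__': 'Employee', '__value__': obj.name}
    raise TypeError(str(obj) + ' is not JSON serializable')

def deserialize_objects(obj):
    if '__class__' in obj:
        if obj['__class__'] == 'datetime':
            return datetime.strptime(obj['__value__'], "%Y-%m-%d %H:%M:%S")
        if obj['__class__'] == 'Employee':
            return Employee(obj['__value__'])
    return obj  # ordinary dictionaries pass through unchanged

team = {'manager': Employee('Mike'), 'founded': datetime(2016, 5, 2)}

text = json.dumps(team, default=serialize_objects)
restored = json.loads(text, object_hook=deserialize_objects)

print(type(restored['manager']).__name__, restored['manager'].name)  # Employee Mike
print(restored['founded'])  # 2016-05-02 00:00:00
```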
