Reading and Writing JSON in Python


JSON (JavaScript Object Notation) is a language-neutral data interchange format. It was created and popularized by Douglas Crockford. In its short history, JSON has become a de facto standard for data transfer across the web.

JSON is a text-based format which is derived from JavaScript object syntax. However, it is completely independent of JavaScript, so you don’t need to know any JavaScript to use JSON.

JSON is commonly used by web applications to transfer data between client and server. If you are using a web service, there is a good chance that data will be returned to you in JSON format by default.

Before the inception of JSON, XML was predominantly used to send and receive data between the client and the server. The problem with XML is that it is verbose, heavy and not easy to parse. This is not the case with JSON, as you will see soon.

The following is an example of an XML document describing a person.

The same information can be represented using JSON as follows:
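As a rough illustration of the comparison, the sketch below holds one hypothetical person record (the names and values are made up for this example) as an XML string and as a JSON string, then parses both with the standard library.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical person record, written once as XML ...
xml_doc = """
<person>
    <name>John Smith</name>
    <age>25</age>
    <city>Springfield</city>
</person>
"""

# ... and once as JSON.
json_doc = """
{
    "name": "John Smith",
    "age": 25,
    "city": "Springfield"
}
"""

# Reading the XML means walking the element tree; every value comes back as text.
root = ET.fromstring(xml_doc.strip())
person_from_xml = {child.tag: child.text for child in root}

# Reading the JSON is a single call, and the number stays a number.
person_from_json = json.loads(json_doc)

print(person_from_xml)
print(person_from_json)
```

Note that the XML version yields the age as the string "25", while the JSON version yields the integer 25.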

I am sure you will agree that the JSON counterpart is much easier to read and write.

Also, notice that the JSON format closely resembles dictionaries in Python.

Serialization and Deserialization

Serialization: The process of converting an object into a format suitable for transmitting over a network or storing in a file or database.

Deserialization: The reverse of serialization. It converts the serialized format back into a usable object.

In the case of JSON, serializing means converting a Python object into a JSON string, and deserializing means building the Python object back from its JSON string representation.

Python provides a built-in module called json for serializing and deserializing objects. To use it, import the json module at the top of your script with the statement import json.

The json module mainly provides the following functions for serializing and deserializing.

  1. dump(obj, fp)
  2. dumps(obj)
  3. load(fp)
  4. loads(s)

Let’s start with the dump() function.

Serializing with dump()

The dump() function is used to serialize data. It takes a Python object, serializes it and writes the output (which is a JSON string) to a file-like object.

The syntax of the dump() function is as follows:

Syntax: dump(obj, fp)

obj: Object to be serialized.
fp: A file-like object where the serialized data will be written.

Here is an example:
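A minimal sketch of dump() in use. The person dictionary and the file name person.json are hypothetical, and the file is written to a temporary directory so the example leaves nothing behind.

```python
import json
import os
import tempfile

# Hypothetical record; the None value shows how missing data is written.
person = {
    "first_name": "John",
    "is_developer": False,
    "age": 25,
    "hobbies": ["guitar", "physics"],
    "car": None,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "person.json")

    # Serialize the dictionary straight into the open file.
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(person, fp)

    # Read the raw text back to see what was actually written.
    with open(path, encoding="utf-8") as fp:
        written = fp.read()

print(written)
```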

Notice that during serialization, Python's None is converted to JSON's null.

The following table lists the conversions that happen between types when we serialize data.

Python Type     JSON Type
dict            object
list, tuple     array
int             number
float           number
str             string
True            true
False           false
None            null

When we deserialize an object, each JSON type is converted back to its equivalent Python type, as shown in the table below:

JSON Type        Python Type
object           dict
array            list
string           str
number (int)     int
number (real)    float
true             True
false            False
null             None
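The two tables can be checked with a short round trip. This is a sketch with made-up keys and values; the point to watch is that a tuple goes out as a JSON array and comes back as a list, so the round trip is not always an exact copy.

```python
import json

original = {
    "point": (1, 2),   # tuple -> JSON array
    "ratio": 0.5,      # float -> JSON number
    "count": 3,        # int   -> JSON number
    "active": True,    # True  -> true
    "owner": None,     # None  -> null
}

encoded = json.dumps(original)
decoded = json.loads(encoded)

print(encoded)
print(decoded)
print(type(decoded["point"]))  # the array is read back as a list, not a tuple
```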

Here is another example which serializes a list of two persons:
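A sketch of that case, assuming two hypothetical person records and a file named persons.json in a temporary directory:

```python
import json
import os
import tempfile

# Two made-up person records stored in a list.
persons = [
    {"first_name": "John", "age": 25, "car": None},
    {"first_name": "Jane", "age": 23, "car": "Volvo"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "persons.json")

    # The whole list is written as a single JSON array.
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(persons, fp)

    with open(path, encoding="utf-8") as fp:
        written = fp.read()

print(written)
```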

Our Python objects are now serialized to the file. To deserialize them back into Python objects we use the load() function.

Deserializing with load()

The load() function deserializes the JSON object from a file-like object and returns it.

Its syntax is as follows:

Syntax: load(fp)

fp: A file-like object from which the JSON string will be read.

Here is an example:
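A minimal sketch of load(). To stay self-contained it first writes a hypothetical persons.json file to a temporary directory and then reads it back.

```python
import json
import os
import tempfile

persons = [
    {"first_name": "John", "age": 25, "car": None},
    {"first_name": "Jane", "age": 23, "car": "Volvo"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "persons.json")

    # Create the file that load() will read.
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(persons, fp)

    # Deserialize the file contents back into Python objects.
    with open(path, encoding="utf-8") as fp:
        loaded = json.load(fp)

print(loaded)
print(type(loaded))
```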

Serializing and Deserializing with dumps() and loads()

The dumps() function works exactly like dump() but instead of sending the output to a file-like object, it returns the output as a string.

Similarly, the loads() function is the same as load(), but instead of deserializing the JSON string from a file, it deserializes it from a string.

Here are some examples:
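A short sketch of both functions working on strings instead of files. The person dictionary is a made-up example.

```python
import json

person = {"first_name": "John", "age": 25, "car": None}

# dumps() returns the JSON text instead of writing it to a file.
json_text = json.dumps(person)
print(json_text)
print(type(json_text))

# loads() parses JSON text and returns the equivalent Python object.
restored = json.loads(json_text)
print(restored)
```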

Note: Since Python 3.7, dictionaries preserve insertion order, so the keys appear in the order in which they were inserted. In older versions of Python the order in which you get the keys may vary.

Customizing the Serializer

The following are some optional keyword arguments that can be passed to the dumps() or dump() function to customize the serializer.

indent: A non-negative integer (or a string) which determines the indentation of key-value pairs at each level. The indent argument comes in handy to prettify the output if you have deeply nested data structures. The default value of indent is None, which gives the most compact, single-line output.

sort_keys: A boolean flag. If set to True, the keys of each JSON object are written in sorted order instead of the order in which they appear in the dictionary. Its default value is False.

skipkeys: The JSON format expects keys to be strings. Keys of type str, int, float, bool and None are converted automatically, but if you try to use any other type (like a tuple) then a TypeError exception will be raised. To prevent the exception from being raised and skip such keys, set the skipkeys argument to True.

separators: A tuple of the form (item_separator, key_separator). The item_separator is a string which is used to separate items in a list and key-value pairs in an object. The key_separator is also a string and is used to separate keys from values in an object. By default, the separators are set to (', ', ': ') when indent is None and to (',', ': ') otherwise.

Here are some examples which demonstrate how to use these arguments:

Example 1: Using indent
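A minimal sketch with a made-up dictionary, comparing the default output with indent=4:

```python
import json

person = {"first_name": "John", "hobbies": ["guitar", "physics"]}

compact = json.dumps(person)
pretty = json.dumps(person, indent=4)

print(compact)
print(pretty)

# The indented version carries extra newlines and spaces.
print(len(compact), len(pretty))
```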

Keep in mind that increasing indentation also increases the size of the data, so avoid indent wherever payload size matters, such as responses served in a production environment.

Example 2: Using sort_keys
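A short sketch with a hypothetical dictionary whose keys are deliberately out of alphabetical order:

```python
import json

person = {"last_name": "Smith", "age": 25, "first_name": "John"}

# Default: keys are written in the order they were inserted.
unsorted_text = json.dumps(person)

# sort_keys=True: keys are written in sorted order.
sorted_text = json.dumps(person, sort_keys=True)

print(unsorted_text)
print(sorted_text)
```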

Example 3: Using skipkeys

A tuple key such as (1, 2) can't be converted to a string, so serializing a dictionary that contains it raises a TypeError exception. To prevent the exception from being raised and skip such keys, use the skipkeys argument.
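Both behaviours can be seen in one sketch. The dictionary below is a made-up example with one string key and one tuple key.

```python
import json

data = {"one": 1, (1, 2): "tuple key"}

# Without skipkeys, the tuple key makes dumps() raise a TypeError.
error_message = None
try:
    json.dumps(data)
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# With skipkeys=True, the offending key is silently left out.
skipped = json.dumps(data, skipkeys=True)
print(skipped)
```

Because the key is dropped without any warning, skipkeys is best reserved for data where losing those entries is acceptable.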

Example 4: Using separators
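As a starting point, here is a sketch of the output produced with the default separators. The person dictionary, including its cars list, is a made-up example.

```python
import json

person = {"first_name": "John", "age": 25, "cars": ["Volvo", "Mazda"]}

# No separators argument: the defaults for indent=None are used.
default_text = json.dumps(person)
print(default_text)
```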

There are three things to notice about output produced with the default separators:

  1. Each key-value pair is separated by a comma followed by a space (', ').
  2. Items in an array are also separated by a comma followed by a space (', ').
  3. The keys of the JSON object are separated from values by ': ' (i.e. a colon followed by a space).

The separator in the first two cases is controlled by the item_separator string and the last one is controlled by the key_separator. The following example changes item_separator and key_separator to the pipe (|) and dash (-) characters respectively:

Now that you know how separators work, we can make the output more compact by removing the space characters from the separator strings. For example:
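A sketch of that change, using a made-up dictionary. The result is only meant to make the role of each separator visible; it is no longer valid JSON.

```python
import json

person = {"first_name": "John", "age": 25, "cars": ["Volvo", "Mazda"]}

# item_separator="|" and key_separator="-"
custom_text = json.dumps(person, separators=("|", "-"))
print(custom_text)
```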
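A sketch of the compact form, assuming a made-up dictionary and the separators (',', ':'), which drop the space after both the comma and the colon:

```python
import json

person = {"first_name": "John", "age": 25, "cars": ["Volvo", "Mazda"]}

default_text = json.dumps(person)
compact_text = json.dumps(person, separators=(",", ":"))

print(compact_text)
print(len(default_text), len(compact_text))

# The compact text is still valid JSON and loads back unchanged.
restored = json.loads(compact_text)
```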

Serializing Custom Objects

By default, the json module only allows us to serialize the following basic types:

  • int
  • float
  • str
  • bool
  • list
  • tuple
  • dict
  • None

If you try to serialize a custom object or any other built-in type, a TypeError exception will be raised. For example:

To serialize custom objects or other built-in types, we have to create our own serialization function.
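A minimal sketch of the failure, using a hypothetical employee record whose doj (date of joining) value is a datetime.datetime object:

```python
import json
from datetime import datetime

employee = {"name": "John", "doj": datetime(2018, 5, 1, 9, 30)}

# datetime is not one of the basic types, so dumps() refuses it.
error_message = None
try:
    json.dumps(employee)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```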
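One way to write such a function is sketched below. It handles only datetime.datetime objects, and the exact layout of the returned dictionary is an assumption made for this example.

```python
from datetime import datetime


def serialize_objects(obj):
    """Return a JSON-friendly stand-in for objects json can't handle."""
    # Check the type so that support for other types can be added later.
    if isinstance(obj, datetime):
        return {
            "__class__": "datetime",  # remembers what the value used to be
            "__value__": str(obj),    # e.g. "2018-05-01 09:30:00"
        }

    # Anything we don't recognise is reported as an error.
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


print(serialize_objects(datetime(2018, 5, 1, 9, 30)))
```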

Here are a few things to notice about such a function.

  1. The function takes a single argument named obj.

  2. We check the type of the object using the isinstance() function. Checking the type is not strictly necessary if your function is serializing only a single type, but it makes it easy to add serialization for other types.

  3. We create a dictionary with two keys: __class__ and __value__. The __class__ key stores the original name of the class and will be used to deserialize the data. The __value__ key stores the value of the object; in this case, we simply convert the datetime.datetime object to its string representation using the built-in str() function.

  4. For any other type, we raise a TypeError exception. This is necessary, otherwise our serialization function wouldn't report errors for objects it can't serialize.

Our serialization function is now ready to serialize datetime.datetime objects.

The next question is: how do we pass our custom serialization function to dumps() or dump()?

We can pass a custom serialization function to dumps() or dump() using the default keyword argument. Here is an example:

Notice how the datetime.datetime object is serialized as a dictionary with two keys.
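A self-contained sketch, assuming a serialize_objects() function that turns datetime.datetime objects into a two-key dictionary and a hypothetical employee record with a doj key:

```python
import json
from datetime import datetime


def serialize_objects(obj):
    """Stand-in serializer for datetime.datetime objects (illustrative)."""
    if isinstance(obj, datetime):
        return {"__class__": "datetime", "__value__": str(obj)}
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


employee = {"name": "John", "doj": datetime(2018, 5, 1, 9, 30)}

# json calls serialize_objects() only for values it can't encode itself.
encoded = json.dumps(employee, default=serialize_objects)
print(encoded)
```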

It is important to note that the serialize_objects() function will only be called to serialize objects which are not one of the basic types in Python.

We have now successfully serialized a datetime.datetime object. Let's see what happens if we try to deserialize it.

Notice that the value of the doj key is returned as a dictionary instead of a datetime.datetime object.

This happens because the loads() function doesn't know anything about the serialize_objects() function which serialized the datetime.datetime object in the first place.

What we need is the opposite of the serialize_objects() function: a function that takes a dictionary, checks for the existence of the __class__ key and builds the datetime.datetime object from the string representation stored in the __value__ key.

The only thing to notice here is that we are using the datetime.strptime() function to convert the datetime string into a datetime.datetime object.

To pass our custom deserialization function to the loads() function we use the object_hook keyword argument.

As expected, this time the value of the doj key is a datetime.datetime object instead of a dictionary.
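A sketch of a plain loads() call on text of the kind produced above. The JSON string is written out by hand here so that the example is self-contained; its layout is an assumption.

```python
import json

encoded = (
    '{"name": "John", '
    '"doj": {"__class__": "datetime", "__value__": "2018-05-01 09:30:00"}}'
)

decoded = json.loads(encoded)

print(decoded["doj"])
print(type(decoded["doj"]))
```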
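A minimal sketch of such a function. The name deserialize_objects is hypothetical, and the format string assumes the value was produced by str() on a datetime without fractional seconds.

```python
from datetime import datetime


def deserialize_objects(obj):
    """Rebuild a datetime from its two-key dictionary form (illustrative)."""
    if obj.get("__class__") == "datetime":
        # Assumes the "YYYY-MM-DD HH:MM:SS" layout, with no microseconds.
        return datetime.strptime(obj["__value__"], "%Y-%m-%d %H:%M:%S")

    # Every other dictionary is returned untouched.
    return obj


print(deserialize_objects({"__class__": "datetime", "__value__": "2018-05-01 09:30:00"}))
```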
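The full round trip can be sketched as follows. Both helper functions are illustrative, deserialize_objects is a hypothetical name, and the employee record is made up.

```python
import json
from datetime import datetime


def serialize_objects(obj):
    """Turn a datetime into a two-key dictionary (illustrative)."""
    if isinstance(obj, datetime):
        return {"__class__": "datetime", "__value__": str(obj)}
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


def deserialize_objects(obj):
    """Turn the two-key dictionary back into a datetime (illustrative)."""
    if obj.get("__class__") == "datetime":
        return datetime.strptime(obj["__value__"], "%Y-%m-%d %H:%M:%S")
    return obj


employee = {"name": "John", "doj": datetime(2018, 5, 1, 9, 30)}

encoded = json.dumps(employee, default=serialize_objects)

# object_hook is called with every JSON object (dict) that loads() decodes.
decoded = json.loads(encoded, object_hook=deserialize_objects)

print(decoded)
print(type(decoded["doj"]))
```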
