Working with JSON data in Python


JSON is a very popular data format. Its popularity is probably due to its simplicity, flexibility and ease of use. Python provides some pretty powerful tools that make it very easy to work with JSON data.

Python has a built-in module called json which contains all the features you need to deal with exporting, importing and validating JSON data.

What is JSON?

JSON stands for JavaScript Object Notation. It comes from JavaScript, but can be used in any programming language. It can be used to transfer or store data in a simple human-readable format.

Its syntax is a subset of JavaScript's object literal syntax, so it can be executed with JavaScript's eval, but you should never do that, as it can lead to very serious security issues.

It is important to note that JSON is not a concrete technology; it is just a standard for describing data. So it does not define things like maximal string length, the biggest available integer or floating point accuracy. However, the underlying language or a particular implementation of a JSON parser will certainly have these kinds of limitations.
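To illustrate the point about implementation limits, here is a minimal sketch using Python's own json module. The sample values are made up for illustration, and the behavior shown is that of Python's parser; other parsers may handle the same input differently.

```python
import json

# Python integers have arbitrary precision, so a large JSON integer survives intact.
big_int = json.loads("123456789012345678901234567890")
print(big_int)

# JSON decimals become Python floats (64-bit), so the extra digits are rounded away.
rounded = json.loads("1.00000000000000000001")
print(rounded == 1.0)  # True: the parsed value is exactly 1.0
```

The JSON text is valid in both cases; what differs between implementations is how much of the number can be represented after parsing.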

  1. Generating and parsing JSON is easy for machines
  2. JSON is a human-readable data format
  3. It is extremely simple
  4. Despite its simplicity, it is still quite powerful and flexible

What does JSON data look like?

As I mentioned above, JSON is a subset of JavaScript, but it has some restrictions. Basically, you can define JSON objects the way you would define objects in JavaScript.

An example of a piece of JSON data:

{
    "exampleString": "hello",
    "exampleObject": {"field":  "value"},
    "exampleNumber": 1234,
    "exampleArray": ["aString", 1234, {"field2": "value2"}]
}

Note that the syntax is a bit stricter than in JavaScript:

  • JSON objects cannot have field names without the surrounding double quotes ({field: "value"} is invalid)
  • JSON strings must be enclosed in double quotes - single quotes are not allowed ({"field": 'value'} is invalid)
  • Trailing commas after the last field are not allowed in JSON objects ({"field": "value",} is invalid)
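These restrictions can be checked with Python's json module, which rejects all three kinds of input. The snippet below is a small sketch; the labels and the candidates dictionary are names chosen here for illustration.

```python
import json

# Each candidate breaks exactly one of the rules above, except the last one.
candidates = {
    "unquoted key": '{field: "value"}',
    "single-quoted string": "{\"field\": 'value'}",
    "trailing comma": '{"field": "value",}',
    "valid": '{"field": "value"}',
}

results = {}
for label, text in candidates.items():
    try:
        json.loads(text)
        results[label] = True
    except json.JSONDecodeError:
        # Raised by json.loads when the text is not valid JSON.
        results[label] = False

print(results)
```

Only the last candidate parses successfully; the other three raise json.JSONDecodeError.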

JSON data types

JSON defines four data types (string, number, object and array) and the three special literal values true, false and null, which are written without quotes. That’s all. Of course, arrays and objects can contain strings, numbers, the special values, or nested arrays and objects, so you can build arbitrarily complex data structures.

JSON strings

JSON strings consist of zero or more characters enclosed in double quotes.

Examples: "", "hello world"

JSON number

JSON numbers can be integers or decimals; scientific notation is also allowed.

Examples: 123, -10, 3.14, 1.23e-14

JSON object

Objects are collections of key-value pairs. Keys must be strings enclosed in double quotes. Keys and values are separated by colons and the pairs are separated by commas. Values can be of any valid JSON type. The object is enclosed in curly braces.

Example:

{"hello": "world", "numberField": 123}

JSON array

JSON arrays can contain zero or more items separated by commas. Items can be of any valid type.

Examples: [], ["a"], [1, 2, 3], ["abc", 1234, {"field": "value"}, ["nested", "list"]]
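As a preview of how these types look from Python, the following sketch parses one value of each kind with the json module (introduced in detail later in this article) and prints the resulting Python type. The key names in the sample document are arbitrary.

```python
import json

# One value of each JSON type, plus the literals true and null.
document = (
    '{"s": "hello", "i": 123, "f": 3.14, '
    '"o": {"k": "v"}, "a": [1, 2], "t": true, "n": null}'
)
parsed = json.loads(document)

# Print the name of the Python type each JSON value was converted to.
for key, value in parsed.items():
    print(key, type(value).__name__)
```

Strings become str, numbers become int or float, objects become dict, arrays become list, true and false become bool, and null becomes None.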

Where is JSON used?

JSON can be used to both transfer and store data.

JSON web APIs - JSON data transfer in HTTP REST APIs

JSON is commonly used in REST APIs in both the request body and the response body. Requests that carry JSON are usually marked with the Content-Type: application/json header. An HTTP client can also indicate that it expects a JSON response by using the Accept header.

Example HTTP request:

POST /hello HTTP/1.1
Content-Type: application/json
Accept: application/json

{"exampleData": "hello world"}

Example HTTP response:

HTTP/1.1 200 OK
Content-Type: application/json

{"exampleResponse": "hello"}
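A request like the one above could be prepared in Python roughly as follows. This is only a sketch using the standard library's urllib.request; the URL is a hypothetical local endpoint, so the code builds the request without sending it.

```python
import json
import urllib.request

payload = {"exampleData": "hello world"}

# Build (but do not send) a POST request with a JSON body.
request = urllib.request.Request(
    "http://localhost:8000/hello",  # hypothetical endpoint, for illustration only
    data=json.dumps(payload).encode("utf-8"),  # the body must be bytes
    headers={"Content-Type": "application/json", "Accept": "application/json"},
    method="POST",
)

# Sending is left commented out because the endpoint is hypothetical:
# with urllib.request.urlopen(request) as response:
#     reply = json.loads(response.read().decode("utf-8"))
```

Third-party HTTP clients offer more convenient interfaces, but the idea is the same: encode the data as JSON, send it as the body, and declare the format in the headers.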

NoSQL databases

JSON is commonly used for communicating with non-relational databases (such as MongoDB). NoSQL databases let you dynamically define the structure of your data, and JSON is perfect for the task because of its simplicity and flexibility.

JSON in Python - The json module

Working with JSON in Python is rather simple, as Python has a built-in module that does all the heavy lifting for you. With the help of the json module you can parse and generate JSON-encoded strings and also read or write JSON-encoded files directly.

Working with JSON strings

Exporting data to JSON format

You can turn basic Python data types into a JSON-encoded string with the help of json.dumps. The usage is pretty simple:

data = {
    "list": ["hello", "world"],
    "integer": 1234,
    "float": 3.14,
    "dir": {"a": "b"},
    "bool": False,
    "null": None
}

import json
json_encoded_data = json.dumps(data)
print(json_encoded_data)

Output (a single line; the keys appear in the order in which they were inserted into the dictionary):

{"list": ["hello", "world"], "integer": 1234, "float": 3.14, "dir": {"a": "b"}, "bool": false, "null": null}
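json.dumps also accepts optional keyword arguments that control the formatting of the result. The sketch below uses indent, sort_keys and separators on a small made-up dictionary to produce a pretty-printed and a compact version of the same data.

```python
import json

data = {"b": [1, 2], "a": 1}

# indent adds newlines and indentation; sort_keys orders the keys alphabetically.
pretty = json.dumps(data, indent=4, sort_keys=True)
print(pretty)

# separators without spaces give the most compact encoding.
compact = json.dumps(data, separators=(",", ":"))
print(compact)  # {"b":[1,2],"a":1}
```

The pretty-printed form is convenient for humans, while the compact form saves a few bytes when the data is only read by machines.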

Parsing a JSON string

The reverse, parsing a JSON-encoded string into Python objects, can be done by using the json.loads function, like so:

json_encoded_data = '''{
    "float": 3.14,
    "list": ["hello", "world"],
    "bool": false,
    "integer": 1234,
    "null": null,
    "dir": {"a": "b"}
}'''

import json
data = json.loads(json_encoded_data)
print(data)

Output:

{'float': 3.14, 'list': ['hello', 'world'], 'bool': False, 'integer': 1234, 'null': None, 'dir': {'a': 'b'}}
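Keep in mind that encoding and then decoding does not always give back exactly the object you started with, because JSON has fewer types than Python. The following sketch, with made-up sample data, shows two common cases: non-string dictionary keys and tuples.

```python
import json

original = {1: "one", "point": (3, 4)}

# Integer keys are converted to strings and tuples are written as arrays.
encoded = json.dumps(original)
print(encoded)  # {"1": "one", "point": [3, 4]}

# Parsing gives back string keys and lists, not the original int key and tuple.
decoded = json.loads(encoded)
print(decoded)  # {'1': 'one', 'point': [3, 4]}
print(decoded == original)  # False
```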

Validating a JSON string

The Python json module does not have a dedicated way to validate a piece of JSON data; however, you can use json.loads to do that. json.loads raises a JSONDecodeError exception when its input is not valid JSON, so you can use that to determine whether or not a string contains properly formatted JSON.

For example, you can define the following function to validate JSON strings:

import json

def is_valid_json(data: str) -> bool:
    try:
        json.loads(data)
    except json.JSONDecodeError:
        return False
    return True

This function accepts a string as its single argument and returns a boolean. It tries to load the string, and if the string is not valid JSON, it catches the raised exception and returns False. If the JSON is valid, no exception is raised, so the return value is True.
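When a plain True or False is not enough, the exception itself can tell you where the problem is. The sketch below uses the msg, lineno and colno attributes of JSONDecodeError; the function name describe_json_error is a hypothetical helper invented for this example.

```python
import json
from typing import Optional

def describe_json_error(text: str) -> Optional[str]:
    """Return None for valid JSON, otherwise a short description of the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        # msg is the parser's message; lineno and colno locate the problem.
        return f"{error.msg} (line {error.lineno}, column {error.colno})"
    return None

print(describe_json_error('{"field": "value"}'))  # None
print(describe_json_error('{"field": }'))
```

The exact wording of the message comes from the parser and may differ between Python versions, so it is best used for display rather than for program logic.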

Working with JSON files in Python

The json module also makes it possible for you to work with JSON files directly. Instead of loads and dumps you can use the load and dump functions. These work on file objects: instead of reading or writing strings in memory, json.load reads JSON data from the file object you pass, and json.dump takes the file object to write to as an extra argument.

Exporting data to a JSON file

Exporting JSON data can be done by using the json.dump function. It takes two arguments: the first is the Python object that you’d like to export, and the second is the file object where you want to write the encoded data.

Example usage:

data = {
    "list": ["hello", "world"],
    "integer": 1234,
    "float": 3.14,
    "dir": {"a": "b"},
    "bool": False,
    "null": None
}

import json
with open('output.json', 'w') as output_file:
    # json.dump writes to the file and returns None, so there is nothing to assign
    json.dump(data, output_file)

First we opened the file for writing and passed the file handle to json.dump as its second argument. output.json will contain something like this (whitespace added for readability):

{
    "list": ["hello", "world"],
    "integer": 1234,
    "float": 3.14,
    "dir": {"a": "b"},
    "bool": false,
    "null": null
}
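If you want the file itself to be human-readable, json.dump accepts the same formatting options as json.dumps. The sketch below writes to a temporary directory so that it does not touch your working directory; the file name pretty.json and the sample data are made up for illustration.

```python
import json
import os
import tempfile

data = {"name": "Zoë", "tags": ["a", "b"]}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "pretty.json")
    # indent pretty-prints the file; ensure_ascii=False keeps non-ASCII
    # characters as they are instead of writing \u escapes.
    with open(path, "w", encoding="utf-8") as output_file:
        json.dump(data, output_file, indent=4, ensure_ascii=False)
    # Read the raw text back to see what was actually written.
    with open(path, "r", encoding="utf-8") as input_file:
        written_text = input_file.read()

print(written_text)
```

Passing an explicit encoding to open is a good habit when ensure_ascii=False is used, because the default file encoding depends on the platform.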

Parsing a JSON file

Reading JSON data from a file into an in-memory Python object can be done very similarly, with the help of the json.load function.

This function takes a file object as its argument: the file that you’d like to read from.

For example, to parse the file that we created in the previous example, we can write:

import json
with open('output.json', 'r') as input_file:
    data = json.load(input_file)
print(data)

First we open the file for reading, and then pass the file handle to json.load.

Expected output:

{'list': ['hello', 'world'], 'integer': 1234, 'float': 3.14, 'dir': {'a': 'b'}, 'bool': False, 'null': None}

Validating a JSON file

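To validate that a file contains valid JSON data, we can use the json.load function and try to load the JSON contained in the file. On failure we can catch the JSONDecodeError raised by json.load. If no exception occurs, the file contains valid JSON. Note that the function below only catches parsing errors; a missing or unreadable file still raises an exception from open.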

import json

def is_valid_json_file(input_file: str) -> bool:
    try:
        with open(input_file, 'r') as f:
            json.load(f)
    except json.JSONDecodeError:
        return False
    return True
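If you would rather treat a missing or unreadable file as invalid too, a variant along the following lines could be used. This is only a sketch: the name is_readable_json_file is hypothetical, and the decision to catch OSError and UnicodeDecodeError is an assumption about what "invalid" should mean in your program.

```python
import json
import os
import tempfile

def is_readable_json_file(path: str) -> bool:
    """Return True only if the file can be opened, decoded and parsed as JSON."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            json.load(f)
    except (OSError, UnicodeDecodeError, json.JSONDecodeError):
        # OSError covers missing or unreadable files,
        # UnicodeDecodeError covers files that are not valid UTF-8.
        return False
    return True

with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "good.json")
    bad = os.path.join(folder, "bad.json")
    missing = os.path.join(folder, "missing.json")  # never created
    with open(good, "w", encoding="utf-8") as f:
        f.write('{"field": "value"}')
    with open(bad, "w", encoding="utf-8") as f:
        f.write('{"field": "value",}')
    checks = [is_readable_json_file(p) for p in (good, bad, missing)]

print(checks)  # [True, False, False]
```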