Is Python Memory Safe?



In this article I aim to explain the basics of memory handling in Python, and to answer some practical questions like: Is it possible to have a segfault in Python? Can you introduce memory leaks? Does Python have a garbage collector? How does Python’s GC work?

tl;dr: Python is a memory-safe language, so you don’t have to explicitly deal with low-level memory handling. You cannot easily introduce a segfault or a memory access violation.

Python is a garbage-collected language, so it doesn’t have memory leaks the way C does, but it is possible to create references in such an unfortunate way that the GC won’t be able to free the objects involved. I’ll explain all of this in more detail below.

What is Memory Safety?

A programming language is memory safe if it is impossible for its users to introduce memory access errors. A memory access error occurs when a program tries to access memory that it has not specifically allocated. Memory access errors are notoriously hard to track down, and at the same time they can have serious consequences, as they can lead to memory corruption, security breaches or crashing the program.

Types of Memory Access Errors

There are a few different ways a memory access error can occur:

  1. Use of uninitialized memory
  2. Null pointer dereference
  3. Buffer overflow
  4. Use after free
  5. Illegal free (calling free on non-allocated memory or double free)

You might notice that memory leaks are not included in this list, and that’s because a memory leak is not a memory access error as the program does not try to access unallocated memory. A memory leak occurs when a process fails to free an allocated piece of memory.

Memory leaks are just as hard to track down as memory access errors, but the consequences are slightly different. Memory leaks can go unnoticed without any serious impact on the system, especially if they occur in a short-lived process (allocated memory is released when a process terminates). In long-running processes, though, memory leaks can cause the system to run out of free memory. These errors can also be exploited by a malicious user and can lead to successful denial-of-service (DoS) attacks.

Memory Safety in Python

In low-level programming languages like C it is the developer’s responsibility to take care of properly allocating and freeing memory. If the developer makes a mistake and forgets to initialize a pointer or tries to access an illegal index in an array, it will lead to a memory access error. If the programmer forgets to free a chunk of dynamically allocated memory, it will lead to a memory leak.

In contrast, high-level languages like Python take care of most aspects of memory management. There are no pointers in Python: when an object is created, the interpreter takes care of allocating memory for it. Arrays and buffer objects are also checked at runtime, so the interpreter will raise an exception if there is an attempt to access an index outside the object. This is why Python is said to be memory safe.

The following example written in C demonstrates how easy it is to corrupt the contents of a variable.

#include <stdio.h>
int main()
{
    char a[1];
    char b[1];

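    // writing past the end of `a` is undefined behavior; with a typical
    // stack layout the write lands on `b[0]`
    a[1] = 'x';

    // often prints x, but the result depends on the compiler and its settings
    printf("%c", b[0]);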

    return 0;
}

It is not possible to make the same mistake in Python, as the following will raise an exception.

a = []
a[1] = 'x'   # IndexError: list assignment index out of range

Of course these examples are overly simplistic. In real programs these mistakes are much harder to spot.
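The same runtime check applies to buffer-like objects, not only to lists. The following is a minimal sketch using the built-in bytearray and memoryview types; the helper name try_read is made up for this illustration.

```python
buf = bytearray(b"abcd")   # 4 bytes, valid indexes are 0..3
view = memoryview(buf)     # a view over the same underlying buffer


def try_read(container, index):
    """Hypothetical helper: return the element, or the exception name."""
    try:
        return container[index]
    except IndexError as exc:
        return type(exc).__name__


in_bounds = try_read(buf, 3)      # last valid byte, ord("d")
past_bytearray = try_read(buf, 4)  # one past the end of the bytearray
past_view = try_read(view, 4)      # one past the end of the memoryview
```

In both out-of-range cases the interpreter refuses the access and raises IndexError instead of touching neighboring memory.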

Can Memory Access Errors Happen in Python?

There are a few exceptions to the above, though. The reference Python interpreter (CPython) is written in C, so bugs in the interpreter can lead to memory errors, and the same goes for native extensions.

It’s also worth noting that if you use foreign function libraries like cffi or ctypes, you basically circumvent the built-in memory safety features of Python, so your program becomes prone to memory access errors - that’s a trade-off for the extra flexibility and potential performance boost that comes with these libs.
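As a rough illustration of where the safety net ends, the sketch below uses ctypes to build a small C array and then casts it to a raw pointer. The variable names are arbitrary. The out-of-bounds pointer read is only described in a comment and deliberately not executed, because its behavior is undefined.

```python
import ctypes

# A 4-element C int array managed by ctypes.
numbers = (ctypes.c_int * 4)(10, 20, 30, 40)

# Indexing the array object itself is still bounds checked.
try:
    numbers[4]
    array_is_checked = False
except IndexError:
    array_is_checked = True

# Casting to a raw pointer discards the length information, as in C.
raw = ctypes.cast(numbers, ctypes.POINTER(ctypes.c_int))
last_valid = raw[3]  # still inside the allocation, reads 40

# raw[4] would read past the end of the array without raising anything;
# that is undefined behavior, so it is not executed here.
```

Once a raw pointer is involved, the interpreter has no way to know how large the underlying allocation is, so staying in bounds becomes the programmer's job again.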

Can Memory Leaks Happen in Python?

As you cannot explicitly allocate or free memory in Python, you cannot introduce memory leaks the way you could in a low-level programming language. However, there are some special cases where the Python interpreter is not able to free memory for you, and memory consumption can then grow indefinitely.

To understand how that can happen we need to clarify a few more things about Python’s memory management.

Reference Counting

Python also lifts the burden of releasing unused memory off your shoulders: it does that for you automatically. It keeps track of how many references point to each object and releases the memory of objects that are no longer accessible. This mechanism is called reference counting.

This automatic memory management makes Python programs safer and it also allows for a much more rapid development cycle as developers do not have to deal with low-level memory management details. It has a price though: this added housekeeping requires some extra CPU cycles and some additional memory space, so it does negatively impact performance.

Reference counting itself is a rather simple mechanism: it means that each object has a counter and this counter is updated whenever the object has a new reference to it or a reference is deleted.

A reference can be a variable, a function parameter, a list, dict or any other object pointing to the object in question. Whenever an object is assigned to a variable, passed as a parameter or added to a list, for example, the refcount increases. When the opposite happens (the function returns, or the variable, dict or holding object gets deleted) the refcount decreases. When there are no references left to an object, it can be safely deleted.

You can get the reference count for any object with the help of sys.getrefcount. The exact numbers are a CPython implementation detail and can differ between versions:

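from sys import getrefcount

# use a fresh object: string literals such as "hello" may be interned or
# shared by the interpreter, which distorts their counts
a = []

# typically returns 2: the variable a, plus the temporary reference
# held by the call to getrefcount itself
getrefcount(a)

b = a

# typically returns 3: the variables a and b, plus the call to getrefcount itself
getrefcount(a)

del b

# back to the first value, because b no longer refers to the object
getrefcount(a)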
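To observe the last step of that lifecycle, the moment the last reference disappears, a callback can be attached with weakref.finalize. This is only a sketch: the Resource class and the events list are invented for the example, and the immediate cleanup shown here relies on CPython's reference counting (other interpreters may free the object later).

```python
import weakref


class Resource:
    """Hypothetical placeholder class; any ordinary object would do."""


events = []

r = Resource()
alias = r  # second strong reference to the same object

# Register a callback that runs when the object is about to be freed.
weakref.finalize(r, events.append, "freed")

del r
after_first_del = list(events)   # alias still keeps the object alive

del alias
after_second_del = list(events)  # last reference gone, callback has run
```

After the first del the list is still empty; after the second one it contains the "freed" marker.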

Isolated Cyclic References and Generational Garbage Collection

There is the problem of isolated cyclic references though. A cyclic reference can be an object having a reference to itself, or a series of objects referencing each other. When there are no external references other than these cyclic ones, it is an isolated cyclic reference. In this case the objects are not accessible from the program code, but the refcount is not zero. To deal with cyclic references Python has a different mechanism called generational garbage collection. You can learn more about generational garbage collection in the official docs.
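A small sketch can make the difference between the two mechanisms visible. The Node class is invented for this example, and automatic collection is switched off temporarily so that the result does not depend on when the collector happens to run. A weak reference serves as a probe, since it does not keep the object alive.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only to build a two-object cycle."""

    def __init__(self):
        self.other = None


gc.disable()  # keep the demo deterministic

a = Node()
b = Node()
a.other = b
b.other = a  # a and b now reference each other

probe = weakref.ref(a)  # lets us check whether the object still exists

del a, b  # the cycle is now isolated, but both refcounts are still 1
alive_before_collect = probe() is not None

gc.collect()  # run the cycle detector explicitly
alive_after_collect = probe() is not None

gc.enable()
```

Reference counting alone leaves the isolated cycle in memory (alive_before_collect is True); only the explicit gc.collect() call frees it (alive_after_collect is False).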

When the GC Fails to Clean Up Inaccessible Memory - Strong and Weak References

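In some cases the Python garbage collector might fail to clean up inaccessible objects, and that might lead to ever-growing memory consumption. For example, before Python 3.4, if you had circular references between objects implementing a __del__ method, the generational GC detected the cycle but did not free it (the objects ended up in gc.garbage instead). Since Python 3.4 (PEP 442) such cycles are collected as well, but a cycle still stays in memory until the collector runs, or indefinitely if the collector has been disabled. You can use the weakref module to avoid creating such cycles in the first place.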

A weak reference works a bit differently than a strong reference. If there are only weak references to an object, but no strong references, then Python is free to delete that object. This comes in handy in some edge cases: you can use weak references to avoid isolated cycles of strong refs, so you don’t have to rely on the generational GC to clean up these objects.
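One common shape of this pattern is a parent that holds strong references to its children, while each child only holds a weak reference back to the parent. The sketch below assumes made-up Parent and Child classes, and the immediate cleanup after del is CPython behavior.

```python
import weakref


class Parent:
    """Hypothetical owner object holding strong references to children."""

    def __init__(self):
        self.children = []


class Child:
    """Hypothetical child that points back to its parent weakly."""

    def __init__(self, parent):
        self._parent = weakref.ref(parent)  # does not raise the refcount

    @property
    def parent(self):
        # Returns the parent, or None once the parent has been freed.
        return self._parent()


p = Parent()
c = Child(p)
p.children.append(c)

parent_visible = c.parent is p  # the back-reference works while p is alive

del p  # no strong cycle exists, so the parent is freed right away
parent_gone = c.parent is None
```

Because the back-reference is weak, there is no cycle of strong references, and plain reference counting is enough to free the parent.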

Summary

Python itself is memory safe as long as you stay away from native extensions and foreign function libraries. It is also safe from memory leaks, except for a few edge cases.

It is always useful to have a basic understanding of what’s going on under the hood, but if you stick to best practices and write clean, idiomatic Python, you will rarely have to worry about memory management.