Rezilion Validate in Depth: How We Analyze Python Runtime
by Tomer Shefi, Senior Researcher, Rezilion
At Rezilion, we eliminate friction in the DevSecOps process by identifying which vulnerabilities pose an actual risk to an organization. This dynamic approach allows us to filter out unloaded vulnerabilities and reduce the workload of the security and development teams.
Because we analyze the running process, we also need to understand its runtime environment (native, C#, Java, Python, etc.) and tailor the analysis to that runtime. This blog post summarizes Rezilion’s approach to analyzing the Python runtime using Rezilion Validate.
Let’s say one of our applications runs this code:
If we statically scan the code, we see that NumPy is being imported. The Python advisory database lists multiple vulnerabilities related to NumPy:
On closer inspection, however, the vulnerable code for PYSEC-2018-34 lives in numpy.f2py, which makes the vulnerability irrelevant if f2py is not loaded.
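The filtering decision described above can be sketched as a simple prefix check. This is a minimal illustration, not Rezilion Validate's implementation: the `ADVISORY_MODULES` table and the `is_loaded` helper are hypothetical names, and only the pairing of PYSEC-2018-34 with numpy.f2py comes from this post.

```python
from typing import Iterable

# Hypothetical advisory table: advisory ID -> module that contains the
# vulnerable code. Only the PYSEC-2018-34 -> numpy.f2py pairing comes from
# this post; the table itself is an assumption made for illustration.
ADVISORY_MODULES = {
    "PYSEC-2018-34": "numpy.f2py",
}


def is_loaded(advisory_id: str, loaded_modules: Iterable[str]) -> bool:
    """Return True if any loaded module is the advisory's module or a submodule of it."""
    prefix = ADVISORY_MODULES[advisory_id]
    return any(
        name == prefix or name.startswith(prefix + ".")
        for name in loaded_modules
    )
```

Given a list of loaded module names that contains `numpy` but nothing under `numpy.f2py`, `is_loaded("PYSEC-2018-34", ...)` returns `False`, so the finding could be deprioritized.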
How Do We Solve This?
After some research, we found a way to dynamically analyze Python programs and extract which files, classes, and functions are loaded into memory. This lets us differentiate between vulnerabilities that belong to the same package but whose code might not actually be loaded.
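The original listing is not reproduced here. As a minimal in-process sketch of the same idea, the script below asks the interpreter itself through `sys.modules`. Rezilion Validate inspects the process from the outside, which this sketch does not attempt. The helper name `loaded_module_files` is ours, and `json` is only a stand-in for the application's real imports.

```python
import sys
from typing import Dict


def loaded_module_files(prefix: str = "") -> Dict[str, str]:
    """Map each loaded module under `prefix` to the file it was loaded from."""
    result: Dict[str, str] = {}
    # Copy the items first: imports on other threads can mutate sys.modules.
    for name, module in list(sys.modules.items()):
        in_scope = prefix == "" or name == prefix or name.startswith(prefix + ".")
        if not in_scope:
            continue
        path = getattr(module, "__file__", None)
        if path:  # built-in and namespace modules have no file
            result[name] = path
    return result


if __name__ == "__main__":
    import json  # stand-in for the application's real imports

    for name, path in sorted(loaded_module_files("json").items()):
        print(f"{name}: {path}")
```

Calling the helper with a package name as the prefix narrows the listing to that package's loaded submodules.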
The script above prints the list of all files/modules loaded to memory. Because no f2py file is loaded to memory, we do not consider the PYSEC-2018-34 vulnerability loaded, which saves the security team precious time. On the other hand, if we had run a script like this:
We would see its files are loaded:
Before we found this solution, we considered several alternatives, but they all had issues.
For example, scanning the process memory with a regular expression could take up to 10 times longer, because the memory of a Python process quite commonly grows by hundreds of megabytes.
Moreover, the regex approach can only find the loaded files. To find the classes and functions, we would need to scan the memory multiple times, which makes the regex approach both less useful and less efficient.
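To make the limitation concrete, here is a rough sketch of what such a regex pass might look like over a raw memory buffer. The pattern and the `find_py_paths` helper are assumptions for illustration, not the expression we actually evaluated.

```python
import re
from typing import Set

# Deliberately simple, assumed pattern: a run of path-like bytes ending in ".py".
PY_PATH = re.compile(rb"[\w./\-]+\.py\b")


def find_py_paths(memory: bytes) -> Set[bytes]:
    """Return the distinct .py paths found anywhere in a raw memory buffer."""
    # The whole buffer has to be scanned, whatever its size.
    return set(PY_PATH.findall(memory))
```

The result is a set of file paths and nothing more: the scan says nothing about which classes or functions inside those files exist in memory.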
We therefore decided to dig deeper and look for an approach that minimizes the amount of data we process.
Using our technique, we are able to scan more than 4,500 loaded functions, 1,400 loaded classes, and 5,000 other objects (such as variables) from 218 loaded modules in less than one second. From each class, we extract its name, the classes it inherits from, and more.
A Little More In Depth
Our first step is finding our interpreters, sub-interpreters, and their states.
(Python supports multiple sub-interpreters, which can be used for threading and multi-core purposes; API support for them has existed since Python 1.5.)
Each interpreter (including sub-interpreters) instance holds a pointer to a hashmap (or dictionary in Python) of all the loaded modules and their instances.
This is, of course, very useful but we first need to find the interpreter objects.
The Block Starting Symbol (BSS) section holds the process's uninitialized (zero-initialized) global and static variables.
We locate the interpreter objects by resolving the addresses of the relevant symbols in the BSS section. By then reading data at known offsets and following the pointers stored there, we can identify the main interpreter instance.
Each interpreter node points to the next interpreter, which lets us walk across all interpreters and extract all loaded modules.
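The traversal can be modeled as a walk over a singly linked list. The sketch below is a toy model only: `InterpreterState` here is a plain Python stand-in with hypothetical `modules` and `next` fields, whereas the real analysis reads the equivalent structures out of process memory at version-specific offsets.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional, Set


@dataclass
class InterpreterState:
    """Toy stand-in for an interpreter state read out of process memory."""
    modules: Dict[str, object] = field(default_factory=dict)
    next: Optional["InterpreterState"] = None


def walk_interpreters(head: Optional[InterpreterState]) -> Iterator[InterpreterState]:
    """Yield each interpreter once, following the `next` pointers."""
    seen = set()
    node = head
    # Guard against cycles: pointers read from memory may be stale or corrupt.
    while node is not None and id(node) not in seen:
        seen.add(id(node))
        yield node
        node = node.next


def all_loaded_modules(head: Optional[InterpreterState]) -> Set[str]:
    """Union of the module names held by every interpreter in the list."""
    names: Set[str] = set()
    for interp in walk_interpreters(head):
        names.update(interp.modules)
    return names
```

The cycle guard is a defensive choice for this sketch: when the list is reconstructed from raw memory, a bad pointer should end the walk rather than loop forever.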
We can then parse out the internal objects of each module. This is equivalent to running the dir() function on a module, but because we also get a pointer reference to each object, we can see down to the class and function level.
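An in-process analogue of this step, using only the standard library, might look like the following. It is a sketch under the assumption that ordinary introspection is available; the `describe_module` helper is hypothetical, and `json` is just a convenient module to inspect.

```python
import inspect
import json
from types import ModuleType
from typing import Dict, List


def describe_module(module: ModuleType) -> Dict[str, object]:
    """Split a module's namespace into functions, classes (with bases), and others."""
    functions: List[str] = []
    classes: Dict[str, List[str]] = {}
    others: List[str] = []
    for name, obj in vars(module).items():
        if inspect.isclass(obj):
            # Record the class name together with the classes it inherits from.
            classes[name] = [base.__name__ for base in obj.__bases__]
        elif inspect.isroutine(obj):
            functions.append(name)
        else:
            others.append(name)
    return {"functions": functions, "classes": classes, "others": others}


if __name__ == "__main__":
    info = describe_module(json)
    print(sorted(info["functions"]))
    print(info["classes"])
```

Because each entry is resolved to the object itself rather than just its name, the same loop can be extended to collect further details per class or function.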
How can Rezilion Validate help you patch less?
Free yourself from the burden of your backlog and get back to building. Rezilion Validate reduces patching needs by 70% or more by aggregating vulnerability scan results and automatically filtering them to focus on what’s actually loaded and exploitable.