Case Study: The Compatibility Challenge of Using Outside Software in Product Development
By: Bareket Sarusi, Senior Researcher, Rezilion
Whoever you are, whether you’re a developer who intends to contribute to a project or a researcher who seeks to reveal how a project works, consider this: when facing a large codebase, understanding how the project resolves its package dependencies is one of the most important and most underrated steps, and also one of the easiest to skip.
As part of the research done for Rezilion’s platform, I’ve recently focused my efforts on studying the dependency resolution process in the .NET frameworks. Through this study, I’ve gained a deep understanding of the process, formed some essential steps for approaching this kind of research, and even discovered a creative way to find all existing packages on a host running .NET.
In this post, I’ll walk you through a guide for exploring your environment’s dependencies. The guide contains both the steps I formed and the results of my .NET research, which you can use as a case study as you read.
Dependency Resolution: What Is It?
Dependency resolution is a process that usually consists of two phases which are repeated until the dependency graph is complete:
Ultimately, this procedure tells you exactly which libraries or packages, and which versions of them, must be present for your code to work.
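To make the idea concrete, here is a minimal Python sketch of such a resolution loop, assuming a hypothetical in-memory registry (REGISTRY, with made-up packages App.Core, Json.Lib and Log.Lib) in place of a real feed. It alternates between discovering a package’s direct dependencies and selecting a version for each one, and it keeps going until no new constraints turn up. The selection rule, picking the lowest version that meets the highest minimum seen so far, is a simplification loosely modeled on NuGet’s "lowest applicable version" behavior, not the exact algorithm any package manager uses.

```python
from collections import deque

# Hypothetical in-memory registry: package -> version -> [(dependency, minimum version)].
# A real package manager would query a remote feed instead.
REGISTRY = {
    "App.Core": {"1.0.0": [("Json.Lib", (12, 0, 0)), ("Log.Lib", (2, 0, 0))]},
    "Json.Lib": {"12.0.3": [], "13.0.1": []},
    "Log.Lib": {"2.1.0": [("Json.Lib", (13, 0, 0))]},
}


def parse(version):
    """Turn '13.0.1' into (13, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(root, root_version):
    """Resolve a dependency graph, returning {package: chosen version}."""
    resolved = {root: root_version}
    queue = deque([root])
    while queue:
        name = queue.popleft()
        # Discover the direct dependencies of the version currently chosen for `name`.
        for dep, minimum in REGISTRY[name][resolved[name]]:
            current = resolved.get(dep)
            if current is not None and parse(current) >= minimum:
                continue  # the version already chosen satisfies this constraint
            # Select the lowest available version that meets the minimum.
            candidates = sorted(
                (v for v in REGISTRY[dep] if parse(v) >= minimum), key=parse
            )
            if not candidates:
                raise LookupError(f"no version of {dep} satisfies >= {minimum}")
            resolved[dep] = candidates[0]
            # Re-queue so the newly chosen version's own dependencies get walked.
            queue.append(dep)
    return resolved
```

Calling resolve("App.Core", "1.0.0") first picks Json.Lib 12.0.3, then upgrades it to 13.0.1 once Log.Lib’s constraint is discovered, which is the "repeat until the graph is complete" part in action. For brevity, the sketch never prunes dependencies of versions it has superseded.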
Why It’s Important to Understand How The Process Works
We live in a world where software platforms are regularly updated. This increases the risk that a package you depend on could change underneath your deployed software and cause previously working software to fail.
For example, if the package is available on a user’s machine but the required version is not, the code will fail to run due to a version conflict. This might happen during deployment (because the version isn’t available) or during execution (because the library’s API differs between versions). In many ways, a version failure during execution is worse, because the problem might go unnoticed for some time.
You may have developed a fully functional software system in your development environment, only to have deployment in the target environment yield compatibility problems and runtime errors that render your software useless. Without exact knowledge of your dependencies, resolving these issues can be incredibly time-consuming and stressful, particularly if deployment is left until the last minute!
I bet it’s obvious to you that a researcher should know the language the code is written in and how it functions. Similarly, it should seem pretty obvious to you that a researcher should know the code dependencies and how they are resolved.
Software research can reach a point where completing it requires a full white-box, reverse-engineering inspection. Although software is a whole lot greater than the sum of its parts, it is still the sum of its parts. So, inspecting the software itself might eventually lead to inspecting a particular library it depends on.
This kind of research is expensive and time-consuming on its own. Imagine thoroughly examining an external library, only to realize you picked a different version than the one the software actually uses, and having to return empty-handed to your starting point. Painfully frustrating.
It’s no secret that getting a handle on your software bill of materials (SBOM for short) is becoming a more pressing matter these days. Among other reasons, an SBOM helps you manage the implications of using external software and quickly determine whether that software is at risk from a newly discovered vulnerability.
Vulnerabilities are part of the software development process, because errors are bound to occur. So, it’s critical to be able to identify and address the most serious ones. What better way to start than by finding out for yourself which components the application depends on!
If you can connect all of these arguments into a compelling enough reason, I encourage you to continue reading.
How to Approach This Study
Overall, the key is to be aware of dependencies in your software and its intended operating environment and to take into account the changeability of the software’s operating environment over time.
Step 1 – Investigating the relevant package management system
The package manager and the package itself play a very important role in this study.
The package manager’s job is to build and resolve the dependency graph so that it knows which packages to fetch from the remote repository and which to skip.
The package’s job, other than equipping you with additional pieces of code, is to provide whatever the package manager needs to complete its job. Yet the fact that the package manager knows which packages are installed doesn’t change the fact that you don’t. Or could it?
The relationship between packages and their package manager creates a rather unique ecosystem that only those who understand its internal processes and have used its protocols can benefit from.
Let’s take a look at the .NET frameworks’ ecosystem and evaluate what it could provide for a user.
The mechanism for sharing code is called NuGet. Put simply, a NuGet package is a single ZIP file with the .nupkg extension. That ZIP file mainly contains a .nuspec file, which is the package manifest, and .dll files, which contain the actual code.
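To see the manifest for yourself, the sketch below opens a .nupkg with Python’s standard zipfile module and reads the id, version and declared dependencies from the .nuspec at the root of the archive. It assumes the usual nuspec layout (a <metadata> element containing <id>, <version> and optional <dependency> entries, possibly grouped per target framework) and matches element names without their XML namespace, since the namespace differs between nuspec schema versions. read_nuspec is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix; the nuspec namespace varies by schema version."""
    return tag.rsplit("}", 1)[-1]


def read_nuspec(nupkg_path):
    """Return the id, version and declared dependencies from a .nupkg's manifest."""
    with zipfile.ZipFile(nupkg_path) as archive:
        # The manifest is expected at the root of the archive as <id>.nuspec.
        name = next(
            (n for n in archive.namelist() if n.endswith(".nuspec") and "/" not in n),
            None,
        )
        if name is None:
            raise ValueError(f"no .nuspec found in {nupkg_path}")
        root = ET.fromstring(archive.read(name))

    metadata = next((el for el in root if _local(el.tag) == "metadata"), None)
    if metadata is None:
        raise ValueError("nuspec has no <metadata> element")

    info = {"id": None, "version": None, "dependencies": []}
    for child in metadata:
        if _local(child.tag) in ("id", "version") and child.text:
            info[_local(child.tag)] = child.text.strip()
    # Dependencies may be flat or grouped per target framework; collect them all.
    for el in metadata.iter():
        if _local(el.tag) == "dependency":
            info["dependencies"].append((el.get("id"), el.get("version")))
    return info
```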
In .NET, to use a package, you need to add it to a specific project. This can be done in several ways, but the result is mostly the same: the project’s dependencies are recorded in a package management format, and when the project is built, run or restored, the package manager works out the project’s dependency graph and downloads the dependencies accordingly.
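On a development machine, part of what the package manager knows is sitting right on disk: restored packages are cached in NuGet’s global packages folder, by default ~/.nuget/packages, which the NUGET_PACKAGES environment variable or the globalPackagesFolder setting in NuGet.Config can override. As a minimal sketch assuming the <package id>/<version>/ folder layout (global_packages_folder and list_cached_packages are hypothetical helpers), the following lists every cached package and version. Keep in mind this shows what was downloaded on this machine, not what any particular project resolved to.

```python
import os
from pathlib import Path


def global_packages_folder():
    """Default NuGet global packages folder, honoring the NUGET_PACKAGES override.

    NuGet.Config's globalPackagesFolder setting is not consulted here.
    """
    override = os.environ.get("NUGET_PACKAGES")
    return Path(override) if override else Path.home() / ".nuget" / "packages"


def list_cached_packages(root=None):
    """Map each package ID folder to the version folders found beneath it."""
    root = Path(root) if root is not None else global_packages_folder()
    found = {}
    if not root.is_dir():
        return found  # nothing restored here, or a custom folder is in use
    for package_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Layout assumed: <root>/<lowercase package id>/<version>/
        versions = sorted(v.name for v in package_dir.iterdir() if v.is_dir())
        if versions:
            found[package_dir.name] = versions  # sorted as strings, not as versions
    return found
```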
In order to keep track of the project requirements, .NET has provided several package management formats over the years:
These formats allow you to list the immediate project dependencies, without bothering yourself with what is already installed or which versions should satisfy every need.
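For PackageReference, the format used by SDK-style projects, the immediate dependencies live in the project file itself. Here is a minimal Python sketch, with a hypothetical package_references helper, that pulls them out of a .csproj, accepting the version either as a Version attribute or as a child <Version> element. It returns None for a version it can’t find in the file, which can happen when versions are managed centrally elsewhere.

```python
import xml.etree.ElementTree as ET


def package_references(csproj_text):
    """Return {package id: version} for the PackageReference items in a project file."""
    root = ET.fromstring(csproj_text)
    refs = {}
    for el in root.iter():
        # Older project files put MSBuild elements in a namespace; match on the local name.
        if el.tag.rsplit("}", 1)[-1] != "PackageReference":
            continue
        name = el.get("Include")
        if name is None:
            continue  # e.g. <PackageReference Update="..."> items modify, not add
        version = el.get("Version")
        if version is None:
            # The version can also be written as a child <Version> element.
            child = next((c for c in el if c.tag.rsplit("}", 1)[-1] == "Version"), None)
            version = child.text.strip() if child is not None and child.text else None
        refs[name] = version  # None when the version is defined elsewhere
    return refs
```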
After working out the dependency graph, .NET writes the result to a file called project.assets.json and, if enabled, to a file called packages.lock.json. These files describe the relationships between packages at every level and, most importantly, the version of each package that was chosen to satisfy the needs of your project and of the other packages.
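Because packages.lock.json is plain JSON, reading the resolved versions takes only a few lines. The sketch below assumes the lock-file shape NuGet writes when RestorePackagesWithLockFile is enabled: a top-level dependencies object keyed by target framework, whose entries carry a type (Direct, Transitive or Project) and a resolved version. The lock_file_packages name is hypothetical.

```python
import json


def lock_file_packages(lock_text):
    """Return {target framework: [(package, resolved version, type), ...]}."""
    data = json.loads(lock_text)
    result = {}
    # Top-level "dependencies" is keyed by target framework, e.g. "net6.0".
    for framework, packages in data.get("dependencies", {}).items():
        entries = [
            # "type" distinguishes Direct, Transitive and Project entries.
            (name, entry.get("resolved"), entry.get("type"))
            for name, entry in packages.items()
        ]
        result[framework] = sorted(entries)
    return result
```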
With this knowledge, this unique ecosystem is no longer just for the package manager to use, and you are more than capable of taking advantage of it!
Step 2 – Figuring out the given context
This step is important because it is one thing to know what the package manager provides in general, but it might provide something entirely different within the context you’re working in. Is it a single project? Or a whole production environment? What kind of functionality is available?
In terms of .NET, when the given context is a development environment, the .NET SDK is most likely installed. The SDK provides the functionality not only to run a .NET project but also to build it. This means the files that contain the dependency graph will be available, since they’re created when a project is built or restored.
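In such an environment, a minimal Python sketch could gather a package inventory by walking a directory tree for obj/project.assets.json files and reading their targets section, whose entries are keyed as <package>/<version> and tagged with a type. It assumes restore wrote the file to the default obj/ folder; assets_packages is a hypothetical helper, and only entries of type package are kept so that project-to-project references are skipped.

```python
import json
from pathlib import Path


def assets_packages(search_root):
    """Collect (package, version) pairs from every obj/project.assets.json under search_root."""
    inventory = set()
    for assets in Path(search_root).rglob("project.assets.json"):
        if assets.parent.name != "obj":
            continue  # restore writes the file into the project's obj/ folder by default
        data = json.loads(assets.read_text(encoding="utf-8-sig"))
        # "targets" maps each target framework to entries keyed "<name>/<version>".
        for libraries in data.get("targets", {}).values():
            for key, library in libraries.items():
                if library.get("type") != "package":
                    continue  # skip project-to-project references
                name, _, version = key.partition("/")
                inventory.add((name, version))
    return sorted(inventory)
```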
Rezilion’s Validate product runs on a client’s production environment, and it is concerned with all dependencies that exist in production, not just those of a single project. Therefore, it is reasonable to assume that the .NET SDK won’t be installed and that only the .NET Runtime will be. The runtime comes with far less functionality and will probably rely on different metadata to run applications successfully.
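One way to check which situation you are in is to look at the dotnet root folder (for example, C:\Program Files\dotnet on Windows): SDKs are installed under sdk/<version>/ and the core runtime under shared/Microsoft.NETCore.App/<version>/. The sketch below assumes that layout and that you already know where the root is; dotnet_install_kind is a hypothetical helper. If the dotnet host is on the PATH, dotnet --list-sdks and dotnet --list-runtimes report the same information.

```python
from pathlib import Path


def dotnet_install_kind(dotnet_root):
    """Report which SDKs and .NET runtimes are installed under a dotnet root folder."""
    root = Path(dotnet_root)

    def version_dirs(relative):
        folder = root / relative
        if not folder.is_dir():
            return []
        return sorted(p.name for p in folder.iterdir() if p.is_dir())

    sdks = version_dirs("sdk")  # assumed layout: <root>/sdk/<version>/
    # assumed layout: <root>/shared/Microsoft.NETCore.App/<version>/
    runtimes = version_dirs("shared/Microsoft.NETCore.App")
    return {"sdks": sdks, "runtimes": runtimes, "runtime_only": bool(runtimes) and not sdks}
```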
Keeping that in mind, it’s important to revisit the first step, since you can only use what exists within your given context, which in my case is a production environment.
Step 3 – Inspecting the existing manifests
This third step is basically connecting the dots of the first and second steps. It’s as if 1 + 2 = 3 🙂
Think – within the given context, what kinds of functionalities and data can be found and used?
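When it comes to the .NET research, parsing all project.assets.json files in the environment could have been the perfect solution. But alas, it doesn’t fit the context: those files are created when a project is built or restored, which requires the SDK rather than just the runtime.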
Usually a .NET production environment contains deployed projects.
There are two types of deployment:
- Framework-dependent deployment produces a cross-platform executable that uses the locally installed .NET runtime.
- Self-contained deployment produces a platform-specific executable and includes a local copy of the .NET runtime.
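If you need to tell the two apart on a host, a best-effort Python sketch might look at the app’s <app>.runtimeconfig.json and, failing that, check whether System.Private.CoreLib.dll ships alongside the app. Treat both signals as assumptions rather than a documented contract: the sketch expects self-contained builds to list includedFrameworks and framework-dependent ones to name a framework (or frameworks), and single-file publishing can hide these files altogether. deployment_type is a hypothetical helper.

```python
import json
from pathlib import Path


def deployment_type(app_dir, app_name):
    """Best-effort guess at whether a deployed app is self-contained or framework-dependent."""
    app_dir = Path(app_dir)
    config = app_dir / f"{app_name}.runtimeconfig.json"
    if config.is_file():
        options = json.loads(config.read_text(encoding="utf-8-sig")).get("runtimeOptions", {})
        # Assumption: self-contained builds list "includedFrameworks", while
        # framework-dependent builds name the shared framework(s) they need.
        if "includedFrameworks" in options:
            return "self-contained"
        if "framework" in options or "frameworks" in options:
            return "framework-dependent"
    # Fallback heuristic: a self-contained app carries the runtime's core library with it.
    if (app_dir / "System.Private.CoreLib.dll").is_file():
        return "self-contained"
    return "unknown"
```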
Bottom line: a deployed .NET environment will contain a whole bunch of .dll files. The .NET Runtime is composed of various .dll files, each installed package ultimately ships one or more .dll files, and each project compiles to a .dll file.
Listing all existing DLL files will result in a list of all existing projects, dependencies and internal runtime libraries. Yet that’s only half a solution, since dependency resolution needs both the package name and the package version.
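Listing the DLLs is a simple directory walk. The sketch below assumes you pass in the directories you care about (for example, application deployment folders and the runtime’s shared folder); find_dlls is a hypothetical helper. Note that the result may also include native DLLs that are not .NET assemblies at all.

```python
import os
from pathlib import Path


def find_dlls(roots):
    """Walk each root directory and return every .dll path found, sorted."""
    found = []
    for root in roots:
        # os.walk skips directories it cannot read by default, so the scan keeps going.
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                if filename.lower().endswith(".dll"):  # case-insensitive extension match
                    found.append(Path(dirpath) / filename)
    return sorted(found)
```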
How is it possible to resolve a package version, if no other metadata files exist?
It turns out that each DLL file embeds some metadata in the form of resources. Specifically, .NET DLLs contain a VERSIONINFO resource. So, by treating the DLL files as package manifests and parsing the VERSIONINFO resource they contain, it is possible to find out the package version.
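As a rough Python sketch of reading that resource, the code below skips proper PE parsing and simply scans the file for the VS_FIXEDFILEINFO signature (0xFEEF04BD), then decodes the numeric file and product versions that follow it. This is a heuristic under stated assumptions: a stray byte match is possible, the structure-version check is an extra filter that is assumed to hold, and the numeric version is not guaranteed to match the NuGet package version string exactly. A full PE parser, such as the third-party pefile package, can walk the resource directory properly and also read string entries like ProductVersion. fixed_versions is a hypothetical helper.

```python
import struct
from pathlib import Path

# VS_FIXEDFILEINFO begins with this signature (0xFEEF04BD, little-endian).
SIGNATURE = struct.pack("<I", 0xFEEF04BD)


def _fmt(ms, ls):
    """Split two 32-bit words into a major.minor.build.revision string."""
    return f"{ms >> 16}.{ms & 0xFFFF}.{ls >> 16}.{ls & 0xFFFF}"


def fixed_versions(path):
    """Heuristically locate VS_FIXEDFILEINFO in a DLL and return its numeric versions."""
    data = Path(path).read_bytes()
    offset = data.find(SIGNATURE)
    while offset >= 0:
        if offset + 24 <= len(data):
            # signature, struct version, file version MS/LS, product version MS/LS
            _sig, struc, file_ms, file_ls, prod_ms, prod_ls = struct.unpack_from(
                "<6I", data, offset
            )
            # Assumption: the struct version is 0x00010000; this filters stray matches.
            if struc == 0x00010000:
                return {
                    "file_version": _fmt(file_ms, file_ls),
                    "product_version": _fmt(prod_ms, prod_ls),
                }
        offset = data.find(SIGNATURE, offset + 1)
    return None  # no plausible VS_FIXEDFILEINFO found
```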
Interestingly, every .NET environment, regardless of whether the SDK or only the runtime is installed, will contain DLL files for the projects and their dependencies. So, DLLs are in fact the lowest common denominator, and relying on them works in every given context.
When your software needs more functionality, there’s no need to waste time reinventing the wheel. This is the very first lesson you learn when you start programming. As a result, many people give their software the functionality it needs by reusing other people’s software, such as code libraries or packages. However, relying on external software can have costly consequences when dependency problems arise.
Each language has its own package management mechanisms you could investigate and be surprised by, and .NET has a few nifty tricks up its sleeve. Now that I know them better, I feel much more confident working in a .NET runtime environment, as you can only fully trust something when it’s familiar to you.
Get to trust yours!