CVE Manager GitHub

CVE Manager is the backbone of most of our statistics and is essential for validating the CVE entries the miner finds. It is a lightweight tool that downloads CVE data from the MITRE Corporation’s website and stores most of it in a PostgreSQL database. The miner uses it to look up the CVE entries it finds, along with properties such as their ID, impact score, and severity.

Git Log Parser GitHub

The other important tool used by the miner is our git log parser. It simulates user commands using the Python subprocess module, which allows it to bypass some of git’s limitations. The script mines a local directory for data about the commits of the repository it contains. The parser first navigates to the path provided by the user on the command line, then issues the git log command, which lists every commit and its metadata.

It then saves this information into a list that is later written to a JSON file. This basic data is then extended with line and file change information by comparing each commit to its predecessor with the git diff command.
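The two steps above can be sketched roughly as follows. This is a minimal illustration, not the parser's actual code: the function names, the chosen fields, and the separator bytes are assumptions made for the example. It relies only on git's documented `--pretty=format:` placeholders and `git diff --numstat`.

```python
import subprocess

# Assumed separators: control bytes that are unlikely to appear in commit text.
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"
# %H hash, %an author name, %aI author date (ISO 8601), %s subject line.
LOG_FORMAT = "%H%x1f%an%x1f%aI%x1f%s%x1e"


def parse_git_log(raw):
    """Turn formatted `git log` output into a list of commit dictionaries."""
    commits = []
    for record in raw.split(RECORD_SEP):
        # Strip only newlines: Python treats \x1f as whitespace, so a bare
        # strip() would eat the separator in front of an empty subject.
        record = record.strip("\n")
        if not record:
            continue
        commit_hash, author, date, subject = record.split(FIELD_SEP, 3)
        commits.append(
            {"hash": commit_hash, "author": author, "date": date, "subject": subject}
        )
    return commits


def parse_numstat(raw):
    """Sum up `git diff --numstat` output (added, deleted, path per line)."""
    stats = {"files": 0, "added": 0, "deleted": 0}
    for line in raw.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        stats["files"] += 1
        if parts[0] != "-":  # binary files are reported as "-"
            stats["added"] += int(parts[0])
            stats["deleted"] += int(parts[1])
    return stats


def read_commits(repo_path):
    """Run `git log` inside repo_path, skipping merge commits."""
    result = subprocess.run(
        ["git", "log", "--no-merges", f"--pretty=format:{LOG_FORMAT}"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return parse_git_log(result.stdout)


def diff_stats(repo_path, commit_hash):
    """Compare a commit to its predecessor; fails for a root commit, which has none."""
    result = subprocess.run(
        ["git", "diff", "--numstat", f"{commit_hash}~1", commit_hash],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return parse_numstat(result.stdout)
```

Keeping the parsing functions separate from the subprocess calls makes them easy to test without a repository; the resulting list can be written out with `json.dump`.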

The reports generated by the parser can be useful in any situation similar to ours, where an external utility needs the logs of a specific git repository. Currently, merge commits are ignored by the parser.

CVE Miner GitHub

The main tool of our project is the miner, which uses both the CVE Manager and the Git Log Parser to create a JSON file and a database entry for each CVE found, and presumably fixed, in the given repository.

The miner requires some initial setup, since the CVE data needs to be downloaded and inserted into a local PostgreSQL database. This is done in two steps. In the first step, the data is collected into a local NVD directory as CSV files; in the second step, we read these files and upload their contents to the database.

There are multiple ways to start working with the CVE Miner. It can mine from both local and online sources. These options can be accessed using the command-line interface. When an online source is provided, a ‘repos’ directory will be created if one does not already exist and the given repository will be automatically downloaded into it. Then the miner will continue as if a local directory had been provided. Multiple targets can be specified at once using a JSON file and the appropriate command-line argument.
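As a rough sketch of how local and online targets could be told apart, the example below plans what to do with each target without actually cloning anything. The function names, the list of recognized URL schemes, and the assumption that the JSON file is a flat list of paths and URLs are all hypothetical; the miner's real command-line arguments and file layout may differ.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

REPOS_DIR = Path("repos")  # directory named in the text above


def is_online(target):
    """Guess whether a target is a remote repository (assumed heuristic)."""
    scheme = urlparse(target).scheme
    return scheme in ("http", "https", "git", "ssh") or target.startswith("git@")


def plan_target(target, repos_dir=REPOS_DIR):
    """Return (local_path, clone_command); clone_command is None for local targets."""
    if not is_online(target):
        return Path(target), None
    # Derive the directory name from the last URL segment, dropping ".git".
    name = target.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    local_path = repos_dir / name
    return local_path, ["git", "clone", target, str(local_path)]


def load_targets(json_text):
    """Plan every target from a JSON document assumed to be a list of strings."""
    return [plan_target(target) for target in json.loads(json_text)]
```

A caller would create the `repos` directory if needed (for example with `REPOS_DIR.mkdir(exist_ok=True)`), run the returned clone command through `subprocess.run`, and then treat the local path exactly like a user-supplied directory.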

The miner then processes the repositories by using Git Log Parser. After the JSON file is generated, the tool searches the messages attached to the commits for CVE entries. If a CVE is mentioned once, the miner assumes that the associated commit fixes the CVE. If it is mentioned multiple times, it is assumed that the CVE required multiple fixes or later reemerged. During this process, other data is collected, including but not limited to the contributors, the number of changed files, and the number of commits between the first and last mention of the CVE.
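The search step can be illustrated with a short sketch. It assumes each commit is a dictionary with `hash`, `author`, and `message` keys (hypothetical names) and interprets "mentioned once" as "mentioned in exactly one commit"; the miner's real data model may differ. The pattern follows the standard CVE identifier syntax: `CVE`, a four-digit year, and a sequence number of at least four digits.

```python
import re
from collections import defaultdict

CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def group_commits_by_cve(commits):
    """Map each CVE ID to the commits whose message mentions it."""
    mentions = defaultdict(list)
    for commit in commits:
        # A set avoids counting one commit twice when it repeats the same ID.
        ids = {match.upper() for match in CVE_PATTERN.findall(commit["message"])}
        for cve_id in ids:
            mentions[cve_id].append(commit)
    return dict(mentions)


def summarize(mentions):
    """Collect per-CVE details: commits, contributors, and a multiple-fix flag."""
    summary = {}
    for cve_id, commits in mentions.items():
        summary[cve_id] = {
            "commits": [commit["hash"] for commit in commits],
            "contributors": sorted({commit["author"] for commit in commits}),
            # More than one mentioning commit: several fixes or a re-emergence.
            "multiple_fixes": len(commits) > 1,
        }
    return summary
```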

The next step is the calculation of statistics. The miner uses the previously acquired information to calculate the average time between the first commit that mentioned the CVE and the last. The other part of our statistics is correlation testing. The tool calculates the correlation between a CVE entry’s severity and the time needed to fix it.
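Both statistics can be sketched in a few lines. The text does not say which correlation measure the miner uses, so this example assumes Pearson's coefficient; the function names and the choice of days as the time unit are likewise assumptions.

```python
from datetime import datetime
from math import sqrt


def fix_duration_days(first_mention, last_mention):
    """Time between the first and last commit mentioning a CVE, in days."""
    return (last_mention - first_mention).total_seconds() / 86400


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Illustrative, made-up values: severity scores and fix durations for three CVEs.
severities = [4.3, 7.5, 9.8]
durations = [
    fix_duration_days(datetime(2020, 1, 1), datetime(2020, 1, 3)),
    fix_duration_days(datetime(2020, 2, 1), datetime(2020, 2, 11)),
    fix_duration_days(datetime(2020, 3, 1), datetime(2020, 3, 2)),
]
average_fix_time = mean(durations)
severity_vs_time = pearson(severities, durations)
```

A coefficient close to 1 or -1 would suggest a strong linear relationship between severity and fix time, while a value near 0 would suggest none.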