VISaR: A Code Scanning Tool for Data Platform Engineers

From Client Question to Open-Source Tool

When a client asks whether a particular open-source library is safe to bring into their environment, the honest answer is rarely a simple yes or no. It depends on what vulnerabilities exist, how severe they are, and whether your organisation can live with the risk. For teams working in regulated industries — defense, aviation, healthcare, financial services — that question carries real compliance weight.

That is why I built VISaR (Vulnerability Identification, Scanning and Reporting): a free, open-source Python tool that automates this assessment and delivers a structured, actionable report you can put in front of a stakeholder or attach to a compliance dossier.

VISaR v1.0.0 is now available on GitHub: github.com/AtLongLastAnalytics/visar

The Problem VISaR Solves

Every data platform team eventually faces the same decision: a developer wants to add an open-source library. It is well-maintained, widely used, and solves the problem elegantly. But before it enters your approved software list, someone needs to answer: does it have known vulnerabilities?

In practice, that question is answered inconsistently. Some teams do a quick Google. Some check the GitHub issues tab. Some rely entirely on Dependabot alerts that only fire after the library is already in production. None of these approaches produces a systematic, repeatable record of what was assessed and when.

VISaR gives you a repeatable pipeline that runs the same checks every time, against authoritative sources, and outputs a structured report you can version-control.

Who It Is For

VISaR is designed for three audiences:

Data Engineers and Data Platform Teams — Evaluating open-source libraries and frameworks before they enter your data stack. This is the primary use case: run VISaR before you approve a new dependency, and keep the output as your evidence record.

Software Engineers — Validating your own codebase before a release. If you are preparing a production deployment or a public release, VISaR helps you confirm your dependency graph does not carry known vulnerabilities into that milestone.

Independent Developers and Hobbyists — Verifying code generated by AI assistants or sourced from the community. AI-generated code often pulls in third-party libraries without commentary on their security posture. VISaR gives you a fast, objective check.

How It Works

VISaR orchestrates two external components that are already trusted by the security community:

OSSF Scorecard is an open-source security health tool from the Open Source Security Foundation. It evaluates repositories against a range of security best practices and surfaces known vulnerability identifiers. VISaR runs Scorecard via Docker, so there is nothing proprietary in the scan itself — you are leveraging the same tooling used by Google, Microsoft, and other major OSS contributors.
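
As a rough illustration of what running Scorecard via Docker involves, the sketch below builds a command line from the image name and flags documented in the Scorecard README. It is not taken from VISaR's source: the function name, the detail_pass parameter, and the way the two passes map onto flags are assumptions made for this example.

```python
SCORECARD_IMAGE = "gcr.io/openssf/scorecard:stable"  # image published by the Scorecard project


def build_scorecard_command(repo_url: str, detail_pass: bool = False) -> list[str]:
    """Build a `docker run` argument list for a Scorecard scan (hypothetical helper)."""
    command = [
        "docker", "run", "--rm",
        "-e", "GITHUB_AUTH_TOKEN",  # forwards the variable from the host environment
        SCORECARD_IMAGE,
        f"--repo={repo_url}",
        "--format=json",
    ]
    if detail_pass:
        # Assumption: the detail pass limits the scan to the Vulnerabilities check.
        command += ["--checks=Vulnerabilities", "--show-details"]
    return command
```

Returning a list rather than a single string means the command can be handed to subprocess.run without shell quoting concerns.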

OSV (Open Source Vulnerabilities) is Google's open database of vulnerabilities affecting open-source packages. Once VISaR has the vulnerability IDs from Scorecard, it calls the OSV API to retrieve severity ratings and full descriptions for each finding.
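
The OSV lookup can be sketched in a few lines of standard-library Python. The endpoint below is OSV's public "get vulnerability by ID" route. The helper names are hypothetical, and reading the severity label from database_specific.severity is an assumption that holds for GitHub-sourced advisories but may not hold for every record, so treat this as an illustration rather than VISaR's implementation.

```python
import json
import urllib.request

OSV_VULN_URL = "https://api.osv.dev/v1/vulns/{vuln_id}"  # public OSV endpoint


def fetch_osv_record(vuln_id: str, timeout: float = 10.0) -> dict:
    """Fetch one vulnerability record from the OSV API (requires network access)."""
    url = OSV_VULN_URL.format(vuln_id=vuln_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarise_record(record: dict) -> dict:
    """Reduce an OSV record to the three fields the report needs."""
    # Assumption: the label lives in database_specific.severity; records
    # without it are reported as UNKNOWN rather than guessed.
    label = (record.get("database_specific") or {}).get("severity", "UNKNOWN")
    description = record.get("details") or record.get("summary") or ""
    return {
        "id": record.get("id", ""),
        "severity": str(label).upper(),
        "description": description.strip(),
    }
```

Keeping the network call separate from the parsing step makes the parsing easy to unit test with a hand-written record.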

The pipeline runs in six steps:

Prerequisite checks — Docker running, GitHub token valid, Scorecard image present
OSSF Scorecard scan (summary pass)
OSSF Scorecard scan (vulnerability detail pass)
Extract vulnerability IDs from Scorecard output
Enrich each ID via the OSV API (severity, description)
Write results to a CSV file in the data/ directory
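
Step 4 of the list above, extracting vulnerability IDs, can be approximated with a regular expression. This sketch assumes the identifiers appear as plain tokens somewhere in the Scorecard output and covers only three identifier families (GHSA, CVE, PYSEC). The pattern and function name are illustrative, not VISaR's own.

```python
import re

# Matches common identifier families: GitHub advisories, CVEs, PyPI advisories.
VULN_ID_PATTERN = re.compile(
    r"\b(GHSA(?:-[0-9a-z]{4}){3}|CVE-\d{4}-\d{4,}|PYSEC-\d{4}-\d+)\b"
)


def extract_vulnerability_ids(scan_output: str) -> list[str]:
    """Return unique vulnerability IDs in order of first appearance."""
    seen: dict[str, None] = {}  # a dict keeps insertion order and drops repeats
    for match in VULN_ID_PATTERN.finditer(scan_output):
        seen.setdefault(match.group(1), None)
    return list(seen)
```

Deduplicating at this stage avoids calling the OSV API twice for an ID that Scorecard reports more than once.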

The output is a CSV sorted by severity — CRITICAL first, then HIGH, MODERATE, and LOW — so the most important findings are always at the top.
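
A minimal sketch of that ordering, assuming each finding is a dict with id, severity, and description keys (the key names are an assumption for this example, and VISaR's real column headers may differ):

```python
import csv
import io

SEVERITY_ORDER = ["CRITICAL", "HIGH", "MODERATE", "LOW"]
_RANK = {label: index for index, label in enumerate(SEVERITY_ORDER)}


def sort_by_severity(rows: list[dict]) -> list[dict]:
    """Order rows from CRITICAL down to LOW; unrecognised labels sort last."""
    return sorted(rows, key=lambda row: _RANK.get(row["severity"].upper(), len(_RANK)))


def to_csv(rows: list[dict]) -> str:
    """Render the sorted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "severity", "description"])
    writer.writeheader()
    writer.writerows(sort_by_severity(rows))
    return buffer.getvalue()
```

Because Python's sort is stable, findings with the same severity keep the order in which they were discovered.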

What You Get

A single command produces a structured report:

cd src/
uv run python main.py https://github.com/matplotlib/matplotlib

The output CSV contains one row per vulnerability with the ID, severity, and a full description sourced directly from the OSV database. The file is named after the repository and placed in data/, ready to open in Excel, import into a ticketing system, or attach to a risk register. A typical scan completes in a couple of minutes, depending on repository size and network conditions.
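
Because the report is plain CSV, it is also easy to post-process. The sketch below counts findings per severity for a quick summary line in a ticket or risk register. It assumes the header row names a column called severity, which may not match the actual file, so adjust the key to whatever the report uses.

```python
import csv
from collections import Counter
from pathlib import Path


def count_by_severity(report_path: Path) -> Counter:
    """Count report rows per severity label."""
    with report_path.open(newline="", encoding="utf-8") as handle:
        # Assumption: the header row contains a "severity" column.
        return Counter(row["severity"].upper() for row in csv.DictReader(handle))
```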

If the scan finds nothing, VISaR says so and exits cleanly without writing an output file.

What Is Under the Hood

VISaR v1.0.0 ships with:

Retry logic for Docker commands, so transient failures do not abort a scan
Rotating log files in logs/ for every run — useful when debugging or auditing
Type hints throughout the codebase for readability and IDE support
71 unit tests with roughly 95% code coverage
Apache 2.0 licence — completely free to use, modify, and distribute
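
To make the retry idea concrete, here is one way to wrap a Docker command with a bounded number of attempts and a linear back-off. It is a sketch, not VISaR's actual code: the function name, defaults, and the injectable runner parameter (included so the logic can be tested without Docker) are all assumptions.

```python
import subprocess
import time
from collections.abc import Callable, Sequence


def run_with_retry(
    command: Sequence[str],
    attempts: int = 3,
    delay_seconds: float = 2.0,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Run a command, retrying when it exits non-zero or cannot be started."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # check=True raises CalledProcessError on a non-zero exit code.
            return runner(command, check=True, capture_output=True, text=True)
        except (subprocess.CalledProcessError, OSError) as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds * attempt)  # linear back-off between tries
    raise RuntimeError(f"command failed after {attempts} attempts") from last_error
```

Chaining the final RuntimeError to the last underlying error keeps the original failure visible in the traceback and in the logs.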

System requirements are modest: Python 3.12+, Docker Desktop with at least 2 GB available memory, and a GitHub personal access token with public_repo scope.
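
A quick preflight script can confirm the basics before a first run. The sketch below only checks that the pieces are present; it does not verify that the Docker daemon is running or that the token is valid. The environment variable name is an assumption (GITHUB_AUTH_TOKEN is the name Scorecard itself reads), and VISaR may expect the token to be supplied differently.

```python
import os
import shutil
import sys
from collections.abc import Callable, Mapping
from typing import Optional

MIN_PYTHON = (3, 12)
TOKEN_VARIABLE = "GITHUB_AUTH_TOKEN"  # assumption: the name VISaR reads may differ


def find_missing_prerequisites(
    python_version: tuple = sys.version_info[:2],
    environment: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return human-readable descriptions of unmet requirements (empty if none)."""
    problems: list[str] = []
    if python_version < MIN_PYTHON:
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required")
    if which("docker") is None:
        problems.append("docker executable not found on PATH")
    if not environment.get(TOKEN_VARIABLE):
        problems.append(f"{TOKEN_VARIABLE} is not set")
    return problems
```

Calling find_missing_prerequisites() with no arguments inspects the current interpreter and environment; the parameters exist so the checks can be exercised in tests.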

Why Open Source

I built VISaR as an open-source tool because the problem it solves is a shared one. Every team evaluating open-source software faces the same friction. Making VISaR free and open means any organisation can run it without procurement overhead, and contributors can extend it for their own contexts.

If VISaR is useful to you, I would welcome feedback, issues, and pull requests on GitHub.

Roadmap

The roadmap is public. The next releases will add compliance-ready evidence reports, pass/fail severity thresholds for CI/CD integration, local directory scanning, and eventually offline operation for air-gapped environments — all driven by the real needs of teams working in regulated domains.

VISaR is free and open-source under the Apache 2.0 licence. Contributions welcome.