I need to do a license compliance check on my dependencies

I’m struggling with the Pkg API. I want to generate a list of all installed Julia packages (and, I suppose, any artifact software they pull in), with each one's name, version, Git URL, and license name.

Is there an easy way to do this, or do I need to write custom readers to find the package directories, license files, etc.?

It seems there used to be ways to do this with METADATA, but that looks deprecated. How do I do this now?

Best Regards,
Allan Baker


The Manifest.toml file contains a complete list of every dependency (including all transitive dependencies) for the current package environment. To check all of the dependencies for a single package, you could:

  • Create a new folder and cd to it
  • Use pkg> activate . to activate a new environment in that folder
  • Use pkg> add YourPackageName to add your target package to that environment.

You should now have a Project.toml in the folder listing just your package and a Manifest.toml listing your package and all of its transitive dependencies.
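Once that Manifest.toml exists, it can be read with the TOML standard library (Julia 1.6+) to get each dependency's name, UUID, and version. The following is a minimal sketch; `manifest_entries` is a hypothetical helper name, and it assumes the two manifest layouts in common use: the v2 format (Julia 1.7+, which has a `manifest_format` key and nests entries under `deps`) and the older v1 format (entries at the top level).

```julia
using TOML

"""
    manifest_entries(manifest)

Hypothetical helper: flatten a parsed Manifest.toml into (name, uuid, version)
named tuples. Handles the v2 layout (entries under `deps`) and the older v1
layout (entries at the top level). `version` is `nothing` when the entry
records none, as is common for standard libraries.
"""
function manifest_entries(manifest::AbstractDict)
    # v2 manifests carry a `manifest_format` key and nest packages under `deps`
    deps = haskey(manifest, "manifest_format") ? get(manifest, "deps", Dict{String,Any}()) : manifest
    entries = NamedTuple[]
    for (name, infos) in deps
        infos isa AbstractVector || continue   # skip anything that is not an array of package tables
        for info in infos                      # one table per package registered under this name
            push!(entries, (name = name, uuid = info["uuid"],
                            version = get(info, "version", nothing)))
        end
    end
    return sort!(entries; by = e -> e.name)
end

# Example: print the dependencies of the environment in the current folder
if isfile("Manifest.toml")
    for e in manifest_entries(TOML.parsefile("Manifest.toml"))
        println(rpad(e.name, 30), something(e.version, "-"))
    end
end
```

Pkg also provides `Pkg.dependencies()`, which returns a `Dict` from UUID to a `PackageInfo` record (with fields such as `name`, `version`, and `source`) for the active environment; reading the file directly just avoids having to activate anything.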


But how do I query the registry to look up the Git repository URL and get the license type?

It seems like there should be a dictionary or command that lets me query the registry for the license name and the Git repository location in one fell swoop. I can’t find one.
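For the Git URL half of this, the registry files can be read directly. A registry laid out like General has a `Registry.toml` index whose `[packages]` table maps each UUID to a name and a path, and each package directory holds a `Package.toml` with a `repo` entry. As far as I know, General records no license field, so the license still has to come from the package's own files. This is a minimal sketch; `registry_repo` is a hypothetical helper, and it assumes the registry is unpacked on disk.

```julia
using TOML

"""
    registry_repo(registry_dir, pkg_name)

Hypothetical helper: return the `repo` URL recorded for `pkg_name` in an
unpacked registry laid out like General (a Registry.toml index plus one
Package.toml per package), or `nothing` if no package with that name is found.
"""
function registry_repo(registry_dir::AbstractString, pkg_name::AbstractString)
    index = TOML.parsefile(joinpath(registry_dir, "Registry.toml"))
    # [packages] maps each UUID to its name and its path inside the registry
    for (_uuid, entry) in index["packages"]
        entry["name"] == pkg_name || continue
        pkg = TOML.parsefile(joinpath(registry_dir, entry["path"], "Package.toml"))
        return pkg["repo"]   # Git URL; no license information is stored here
    end
    return nothing
end
```

For the General registry this would typically be called as `registry_repo(joinpath(DEPOT_PATH[1], "registries", "General"), "Example")`. Newer Julia versions may keep the registry as a single compressed file rather than a directory, in which case it has to be cloned or extracted first.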

I found this topic by searching for dependency licenses and Julia. In case anyone else lands here, I found the following packages useful:

https://knbrt.gitlab.io/LicenseGrabber.jl/

https://docs.juliahub.com/LicenseCheck

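Here’s an example that builds a data frame of the licenses found in all dependencies:

```julia
using DataFrames
using DataFramesMeta   # re-exports @chain
using CSV
using LicenseGrabber
using LicenseCheck
using Pkg

Pkg.activate("/path/to/project")

# package => license file paths found for that package (as used below)
license_locations = LicenseGrabber.getlicloc()

# Run licensecheck on every file; drop files in which no license text was recognized
license_checks = Dict(
    pkg => filter(nt -> nt.license_file_percent_covered > 0,
                  map(f -> licensecheck(read(f, String)), fs))
    for (pkg, fs) in license_locations
)

# A package passes only if at least one license file was recognized and every
# recognized file names at least one license and is OSI-approved.
# (`all` over an empty collection is `true`, hence the `isempty` guard.)
license_approvals = Dict(
    pkg => !isempty(nts) && all(nt -> !isempty(nt.licenses_found) && is_osi_approved(nt), nts)
    for (pkg, nts) in license_checks
)

packages  = collect(keys(license_locations))
licenses  = [map(nt -> nt.licenses_found, license_checks[p]) for p in packages]
coverages = [map(nt -> nt.license_file_percent_covered, license_checks[p]) for p in packages]

# One row per (package, license file), then one row per license within each file
df = @chain DataFrame(package = packages, license = licenses, coverage = coverages) begin
    flatten([:license, :coverage])
    flatten(:license)
end

# CSV.write("licenses.csv", df)   # optional: export the table
```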

Note that there may be multiple text files found per package, and multiple licenses found per text file.
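To see why one package can yield several files, here is a simplified license-file finder. It is a hypothetical stand-in, not LicenseGrabber's actual logic: it only looks at the top level of a package directory and matches names starting with LICENSE, LICENCE, or COPYING (case-insensitive, any extension).

```julia
"""
    find_license_files(pkg_dir)

Hypothetical helper: return every regular file directly inside `pkg_dir`
whose name starts with "license", "licence", or "copying" (case-insensitive).
Subdirectories are not searched.
"""
function find_license_files(pkg_dir::AbstractString)
    pattern = r"^(licen[cs]e|copying)"i
    return [joinpath(pkg_dir, f) for f in readdir(pkg_dir)
            if occursin(pattern, f) && isfile(joinpath(pkg_dir, f))]
end
```

A package directory can be obtained with `pkgdir(SomeModule)` for a loaded module, and each returned path can then be passed through `licensecheck(read(f, String))` as in the example above.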


How does this handle external Python or other libraries that may be brought in during the build process? Does it catch them all? Can this be done without installing first? Can a license be checked before the package is downloaded?