Secure coding in Julia?

gdalle · June 21, 2024, 11:57am

Hi all,
Someone asked a question on the ModernJuliaWorkflows repo, and I have no clue how to answer. They are looking for tools related to SAST (Static Application Security Testing) and SCA (Software Composition Analysis) in Julia (see issue below). Do they even exist? If so, I’ll happily add them to the blog.

github.com/modernjuliaworkflows/modernjuliaworkflows.github.io

Secure coding and tooling

opened 11:18AM - 21 Jun 24 UTC

deltamarnix

I am a rather recent solutions architect and some of my teams are using Julia. I have concerns about secure coding and the tools that can be used to perform SAST (Static Application Security Testing) and SCA (Software Composition Analysis). I found out that modernjuliaworkflows addresses the issue of linting and code quality, but security analysis doesn't seem to be a part of it. Are there any tools available that could aid me in my journey of writing secure Julia code? And if so, I would love to see this added to this extensive resource for Julia programmers. I can also mention that I have been in contact with JuliaHub, but they only offer some sort of firewall that project admins can alter: https://help.juliahub.com/juliahub/stable/tutorials/package_analytics/ Besides that I found one company that seems to support SAST for Julia: https://semgrep.dev/blog/2023/announcing-semgrep-s-experimental-support-for-julia If there are others with more serious experience in this field for Julia, I would love to hear their stories. To me it seems there isn't really a widely accepted solution yet, but I would hope that this could be addressed in this page.

foobar_lv2 · June 21, 2024, 1:10pm

I happen to work in the SAST/SCA space in my day job (ugh, so much java).

So far, I was not under the impression that there is significant demand for such tooling in the julia world, given that the corporate uptake of julia is rather limited.

However, if I am mistaken and there is interest, please comment here (but no promises).

If there is potentially paying interest, even better

TimG · June 21, 2024, 2:57pm

Could this causality possibly be back to front?

I wonder if there is not much corporate uptake of Julia because such tooling is absent/limited?

mihalybaci · June 21, 2024, 3:36pm

I think it’s a true catch-22. If there is no SAST, then a company may not allow Julia use. Since companies don’t use Julia, there is no monetary incentive to develop a SAST.

foobar_lv2 · June 21, 2024, 5:58pm

I don’t think that this is a big blocker for anyone, given python adoption, but I would be delighted to hear from other people.

Generally, the thing that julia is uniquely good at – “technical computing”, how the homepage used to describe it – doesn’t lend itself so well to corporate adoption.

Consider some examples: Simulations. The recent thing where facebook prototyped their low bitrate codec in julia.

None of these are settings where SCA / SAST is important.

Now consider the things julia is not good at, but SCA / SAST are important:

The production version of the FB low bitrate codec (cannot use julia, must have tiny linked library!).
Embedded applications (same reason: prototype algorithms in julia, but then you need to use C/C++).
“Services” (must have medium-low-latency GC, like e.g. golang or G1GC in java; sometimes need even low-latency like ZGC).
Business logic legacy morasses (want static language checking, IDEs with ease of refactoring, and limit the damage that incompetent devs can do).
Server-side UI things (want nodejs + typescript, easy to hire people for; also, the issue with GC latency).

Settings where I currently see a problem is when you want to use julia for what it’s good at, but work in an environment with pretty strict security policies. Say, finite-element simulations for military applications.

Your company / legal policy might require certain things like SCA / SBOM, regardless of how much sense it makes (SAST is completely meaningless, since your attack surface is nil. Not getting backdoored is always important, though. As is knowing what e.g. Chinese- or Russian-controlled dependencies are in your software supply chain).

If anyone has run into this problem, please talk about it here (as much as you are permitted)! Or DM me.

If there is money involved, SCA tooling for julia is certainly doable (i.e. even modest demand can induce existing SCA vendors to add julia support, and I would love to be the person tasked to add that support).

If no money is involved, SCA tooling for julia is also doable – but that’s not something I would do for fun if nobody needs it.

SAST, or static analysis in general, for julia is harder. Basically because julia is very dynamic.

I think it should be possible to build very good semi-static analysis engines for julia. But that would be a research project, breaking pretty new ground: You would build your view of the codebase not from source-code, but instead by taking typed SSA-IR, plus do some tracing on the remaining dynamic dispatches. That requires a test suite with good coverage.

That would be awesome, not just for security/SAST but for general static analysis / refactoring / development! But too big of a project for now, and SSA-IR is not very stable.

Also, almost surely commercially not viable

deltamarnix · June 25, 2024, 6:51am

Hi! I am the original poster of the question. I am a solutions architect at a research institute and therefore we are exactly in the niche market you describe. We make a lot of simulation software and need technical computing. But my colleagues are primarily researchers and know python. But python is not sufficient for the tasks that they are doing and C++ is too hard to grasp. Julia seems to be a nice fit, but I am hesitant because some essential tools seem to be lacking.

Now, why do we need security tooling? Our researchers tend to just grab any type of package from the web and use that. The software that we deliver in our research is used for calculations where human lives are at stake, which means that clients start asking for SBOMs and vulnerability assessments. SCA is of higher priority than SAST at the moment.

So the scenarios that you describe are correct, and especially since Julia is so community driven and no large companies seem to adopt it, I am also hesitant about continuing the efforts. On the other hand I see the enthusiasm in the teams that are working with it and I would love to keep that spirit high.

nsajko · June 25, 2024, 9:10am

There’s the Julia package PkgToSoftwareBOM.jl. That’s SCA, right?

oheil · June 25, 2024, 9:54am

Welcome @deltamarnix.
Reading about researchers grabbing every package from the web made me laugh, and I am in a field where life is at stake too (cancer research with patients), so here is my answer. First, you can read “Why is it reliable to use open source packages for research?” (#4 by oheil); perhaps the whole discussion is interesting to you (well, it has a few side tracks).

My main argument for using Julia when having security concerns is that Julia is NOT a black box. On the contrary, Julia code is so readable and comprehensible, and therefore so easy to validate, that any concerns become much smaller than when using libraries from some “reliable” manufacturer. The problem with security is not the evil developer who tries to drop a backdoor, for example (this is easy to spot, especially with Julia), but the unavoidable bugs which open a system up to malicious intruders in the future. For this scenario open source is better, and readable and comprehensible open source is best.

Datseris · June 25, 2024, 10:07am

Even simpler than malicious intruders: the unavoidable bugs that exist mean that when you are computing quantity X to save life Y, quantity X has an incorrect value due to a bug in the logic/algorithm implementation. So life Y is lost.

With Julia’s (and its packages) transparency of source code and test suite, I would always trust a Julia package more than a black box matlab GUI product I have to pay for when it comes to such bugs about the things I am actually computing.

EDIT: to attach some “proof” to my statement: in our recent article we reviewed software that is routinely used to analyze EEG, ECG and other physiological timeseries. None of them had any publicly accessible test suite, all of them are cited dozens of times, and all of them have passed “peer review” and have been published in reputable journals. ([2406.05011] ComplexityMeasures.jl: scalable software to unify and accelerate entropy and complexity timeseries analysis)

foobar_lv2 · June 25, 2024, 10:19am

^ this should do what you need to start @deltamarnix

Otherwise, two tips regarding julia security (none of them especially rare insights, but imo under-communicated):

The vscode julia extension will execute any code it finds with full user permissions. So the “trusted workspace” thing is really important! Dev clicks on a file from discourse to simply take a peek, opens it in vscode, clicks “go away” on the nagging permission screen, bam powned. For this reason, and because I cannot trust myself to keep the vscode workspace trust thingy separated without missteps, I have the vscode extension perma-disabled unless I need it for a couple hours.
Deserialization: Some julia deserialization packages try to be smart about deserializing full typed custom objects without custom object deserialization code. This is not safe and will never be safe to do on untrusted data. Same as in java object-stream or python unpickle (some utter clowns still believe that the problems are “gadgets”, i.e. things on the class-path / in your environment that permit arbitrary code-exec from malicious object-streams. This is nonsense of the same style as “I don’t need to care about buffer overflows, I have stack protector”. Lol nope, reading an object stream is functionally equivalent to curl | bash, and doing that on untrusted data is the vuln, clearing gadgets is a super leaky mitigation that may or may not be worthwhile).
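To make the deserialization point concrete, here is a minimal sketch in Python, since the post draws the comparison with python unpickle. It is not Julia code and says nothing about any specific Julia package; the class name `Innocent` and the harmless expression are made up for illustration. It shows that the byte stream itself names the function to call on load, so no "gadget" class needs to exist on the reader's side.

```python
import pickle


class Innocent:
    """Hypothetical class: looks like data, but dictates how it is rebuilt."""

    def __reduce__(self):
        # On load, pickle calls eval("6 * 7") and uses the result as the object.
        # An attacker would put a call to os.system or similar here instead.
        return (eval, ("6 * 7",))


# The "attacker" side: produce the bytes.
payload = pickle.dumps(Innocent())

# The "victim" side: only the bytes are needed. The stream references
# builtins.eval directly, not the Innocent class.
result = pickle.loads(payload)
print(result)
```

The loaded "object" is whatever the attacker-chosen call returned, which is why loading such a stream from an untrusted source is equivalent to running untrusted code.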

foobar_lv2 · June 25, 2024, 10:39am

That’s where SCA starts, and that’s the only julia-specific piece you should need. I haven’t looked into the internals of PkgToSoftwareBOM.jl, so I’m not sure how well it deals with foreign-language deps / binary artifacts. But that kind of detail belongs to that package’s github.

From that SBOM, you and your customers can match against CVEs, license policies, etc. Existing language-agnostic tools and processes can ingest the SPDX SBOM (potentially converting it to CycloneDX) and manage it and its history.
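As a rough sketch of what "matching against license policies" can look like once the SBOM exists: the snippet below walks the `packages` array of an SPDX 2.x JSON document and flags anything outside an allowlist. The field names (`packages`, `name`, `versionInfo`, `licenseDeclared`, `licenseConcluded`) come from the SPDX JSON format; the allowlist, the function name and the example packages are invented for illustration, and a real SBOM would be read from a file with `json.load`.

```python
import json

# Hypothetical policy: SPDX license identifiers this organisation accepts.
ALLOWED_LICENSES = {"MIT", "BSD-3-Clause", "Apache-2.0"}


def flag_license_violations(spdx_doc, allowed=ALLOWED_LICENSES):
    """Return (name, version, license) for packages outside the allowlist."""
    flagged = []
    for pkg in spdx_doc.get("packages", []):
        # Prefer the concluded license; fall back to the declared one.
        license_id = pkg.get("licenseConcluded", "NOASSERTION")
        if license_id in ("NOASSERTION", "NONE"):
            license_id = pkg.get("licenseDeclared", "NOASSERTION")
        if license_id not in allowed:
            flagged.append((pkg.get("name"), pkg.get("versionInfo"), license_id))
    return flagged


# Made-up SBOM fragment, not output of any real tool.
example_sbom = json.loads("""
{
  "spdxVersion": "SPDX-2.3",
  "packages": [
    {"name": "ExampleA", "versionInfo": "1.2.0",
     "licenseDeclared": "MIT", "licenseConcluded": "NOASSERTION"},
    {"name": "ExampleB", "versionInfo": "0.4.1",
     "licenseDeclared": "GPL-3.0-only", "licenseConcluded": "NOASSERTION"},
    {"name": "ExampleC", "versionInfo": "2.0.0"}
  ]
}
""")

violations = flag_license_violations(example_sbom)
for name, version, license_id in violations:
    print(f"{name} {version}: {license_id}")
```

This treats each license field as a single identifier. Compound SPDX expressions such as `MIT OR Apache-2.0` would need a proper expression parser, and packages with no license information are flagged rather than silently accepted.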

In many cases this is also the end of it (you attach the auto-generated SBOM to the code you deliver your customer, a checkbox on the compliance checklist can be checked, and your customer may or may not ever look at it).

nsajko · June 25, 2024, 10:59am

Shout out to the package author: @SamuraiAku

ericphanson · June 25, 2024, 11:38am

There is also Julia support in Trivy now (added by @Octogonapus in aquasecurity/trivy pull request #5635, “feat: Add Julia language analyzer support”).

DeepDattaX · June 25, 2024, 12:04pm

Hi @deltamarnix and all, there are efforts being made to bring more formal static code analysis and software composition analysis to Julia. We’ve actually already built a list of about 100 rules that can be used for static code analysis using Semgrep. You can see a video about that here: https://www.youtube.com/watch?v=RhYJ51jcf8Y&t=1352s

For SAST, the rules are here (sorry for the sign up wall): White Paper: Secure Julia Coding Best Practices - JuliaHub

For Software Composition Analysis, the best case scenario would be for a large security provider to work with the community on being a CVE authority. A lot of things in SCA need a community effort to be successful.

If anyone in the industry wants to discuss: you can email me. deep.datta (at) juliahub (dot) com

sgaure · June 25, 2024, 12:55pm

The main obstacle to using julia in such strict security environments is that one is not connected to the internet, and there is no centralized repository which can be mirrored. It’s all over github.

deltamarnix · June 25, 2024, 1:36pm

I agree on this for sure. Even if there were SCA support, things still need to be reported to the CVE authority. Otherwise there is nothing to compare against. Some companies, like Sonatype, Synopsys, Snyk, spend money on investigating packages in order to report what seems malicious or broken. Still it is better to have some scanning in place than nothing.

I also agree that open source software is a white box and therefore gives the opportunity to be transparent. But I don’t want to be in charge of reviewing all the code that I’m consuming, and I would rather have a company performing those tests.

Also, SCA testing often comes with license checks and operability risks. It can warn if a software license does not match your own license, and it can check whether the package you are using is actively maintained and developed. This can help developers choose which packages are good to use and which might pose a threat.

Semgrep seems promising, I will consider it.

I will also see if I can send Julia SBOMs into a more generic system to check on transitive dependencies to other language dependencies (C++).

foobar_lv2 · June 25, 2024, 3:44pm

I took a look (version 1.1 / july 2023).

Would you care to add the two points to the next whitepaper update, since neither of them was mentioned?

(deserialization vulnerabilities, including but not limited to the built-in deserialization, and vscode defaults)

SamuraiAku · June 25, 2024, 4:24pm

Hello, developer of PkgToSoftwareBOM.jl here.

To answer a few questions from earlier in the discussion and add some more details:

The SBOM does list all binary components (aka artifacts), including download location and tarball hash.
The artifacts list is tailored to the system you are running on. In other words it lists only the artifacts used on your macOS system, and if you generated the same SBOM on linux_x86 you’d get a different set of artifacts. There is an untested hook for specifying what platform you want to generate the SBOM for.
The SBOM does not include the source code that the binaries were generated from. In the general case that’s impossible and even for binaries hosted on Yggdrasil it’s too hard for me. It requires parsing the build scripts to extract the download information. Although if anyone wants to contribute a method for doing that, I’ll be happy to incorporate it.
The SBOM includes the git tree hash in its download location, so it’s possible for scripting to extract the exact version from the repository for analysis.
The SBOM also includes a best effort attempt at finding the software license declared by the package author for all packages and artifacts. It uses LicenseCheck.jl, which wraps a Go library, for the license scanning.
If a package has test dependencies other than the standard libraries (which are ignored in the SBOM), then I’m not sure what will show up in the SBOM. Certainly if they are in a separate Project.toml inside the test sub-folder they won’t show up. I’m not actually certain what happens for test dependencies listed in the main Project.toml. This is something I’m currently looking into.
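
The scripting mentioned in the list above (pulling the exact revision out of a download location) might look roughly like this. It is a sketch under an assumption: that the download location follows the SPDX VCS locator form `<vcs_tool>+<transport>://<host>/<path>@<revision>#<sub_path>`. Whether PkgToSoftwareBOM.jl emits exactly this form should be checked against a real SBOM; the function name, repository URL and hash below are placeholders.

```python
def split_download_location(location):
    """Split an SPDX-style VCS locator into (vcs_tool, repo_url, revision, sub_path).

    Returns None for NOASSERTION/NONE and for plain, non-VCS URLs.
    """
    if location in ("NOASSERTION", "NONE"):
        return None
    # Optional "#sub_path" suffix.
    location, _, sub_path = location.partition("#")
    # "git+https://..." -> tool "git", transport "https".
    scheme, sep, _ = location.partition("://")
    tool, plus, _transport = scheme.partition("+")
    if not sep or not plus:
        return None
    rest = location[len(tool) + 1:]
    # Optional "@revision" suffix; an "@" followed by a "/" is taken to be
    # part of user@host, so revisions containing "/" are not supported here.
    head, at, tail = rest.rpartition("@")
    if at and "/" not in tail:
        repo_url, revision = head, tail
    else:
        repo_url, revision = rest, None
    return tool, repo_url, revision, sub_path or None


# Placeholder values, not taken from a real SBOM.
example = (
    "git+https://github.com/ExampleOrg/Example.jl.git"
    "@0123456789abcdef0123456789abcdef01234567"
)
parsed = split_download_location(example)
print(parsed)
```

The returned revision can then be handed to whatever tool fetches the source for analysis.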

deltamarnix · July 3, 2024, 10:03am

I checked out the Julia support for Trivy, but the documentation only mentions SBOM generation. It does not perform SCA or License Scanning unfortunately.
