Julia losing popularity among Data Science users (KDnuggets Software Poll)

Imagine this scenario:

There is a machine M in your corporation with some Linux distribution installed.

M has no connection to the Internet. You can log into that machine and analyze, for example, some logs.

So Python is usable by default (there could be more restricted machines too, but you can see the levels of accessibility). You cannot use pandas, though, because there is no conda (it is usually not part of the distribution), and pip install pandas is useless (and probably not allowed either).

Python could be old (for example, v2.6.6 on Red Hat 6.9, with end of life in 2020-11)!
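To make the point concrete: even with pandas unavailable, the interpreter's standard library covers simple log analysis. The sketch below assumes each log line contains a severity word such as INFO or ERROR (the sample lines are made up). It sticks to re, plain dicts and open(), and avoids f-strings and collections.Counter, so it may also run on an old interpreter such as 2.6, though that has not been checked.

```python
import re

# Assumed log format: a severity word (INFO, WARN, ERROR, ...) appears somewhere in each line.
LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARN|WARNING|ERROR|CRITICAL)\b")


def count_levels(lines):
    """Count log lines per severity using only the standard library (no pandas)."""
    counts = {}
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            level = match.group(1)
            counts[level] = counts.get(level, 0) + 1
    return counts


def count_levels_in_file(path):
    """Stream a (possibly large) log file line by line instead of loading it into memory."""
    with open(path) as handle:
        return count_levels(handle)


# Made-up sample lines for illustration.
sample = [
    "2018-03-01 10:00:01 INFO service started",
    "2018-03-01 10:00:05 WARN disk 91% full",
    "2018-03-01 10:01:12 ERROR connection refused",
    "2018-03-01 10:02:40 ERROR connection refused",
]
counts = count_levels(sample)
```

Here counts ends up as {"INFO": 1, "WARN": 1, "ERROR": 2}; on the real machine you would call count_levels_in_file with the path of the log you are inspecting.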

Installing additional software on machine M could require an approval process that includes work by the security department.

I think it will be the same with Julia once it is good enough to be part of the standard installation :slight_smile:

Indeed it is a problem! See my thread about finding that Julia does not work with some corporate proxies. You cannot rule out that some corporate proxy will, in the future, require an authentication mechanism which Julia has not implemented.
I feel you should always be able to define the source of Julia packages as a local Git-style server or an NFS share, similar to the way Linux distros can add a local mirror of their Internet-based repositories.
Then you can carry your precious Julia packages on a thumb drive past the chain-link fence at that government secure site.


Although what you write is true, common sense applies here too.

Imagine you have an account at a bank where anybody could run Pkg.clone on a server holding secure data! Would that be common sense? :stuck_out_tongue:

(BTW, I wrote here on Discourse about the malicious Python packages discovered on PyPI several months ago.)

We did that a few years ago at Dynactionize (we set up our own METADATA.jl repository, that pointed to our local clones of the GitHub repos). I don’t recall it being that difficult to set up.

Having had experience of this both on UK defence sites (UK Secure in the jargon) and in industry, I do not agree. I would say that any company, if it needs certain software, will allow it to be copied to a local disk within the firewall. What they do not want is employees downloading random software which might contain malware, or software needing a license which the smart lone-wolf scientist has circumvented, leaving the company open to being sued.

On the defence side, I have only ever encountered one site (Atomic Weapons) where it is forbidden to carry media on your person past the gate. And woe betide you if you try to carry any out…
In the defence sphere an audit trail is more important. When you are asked “what software is installed on that secure computer system and where did it come from” you should have the answers.

My apologies for the huge quote here. I think this needs a separate topic.
The state of Python for commercial software is… interesting. I have seen several software suites which ship with their own entire Python tree, just so that the supplier can rely on a known version and a known set of modules which they need. This is certainly true of the PBS Pro batch system and Abaqus simulation.

I hope Julia does not get to this stage. Maybe it will, in fact: if commercial package A is shipped and certified to be compatible with Julia version 1.1, then commercial suppliers may once again want to ship a whole Julia tree.

I didn’t mean “always” as in “regardless of common sense”.
Having to toss toe-nail clippers when boarding a flight, was not common sense, for example.
Often “security” rules are more for show, to give people a feeling of safety, than for actually increasing security (like the questions they ask about whether you packed your bags yourself, which seem more for show compared to the grilling you get at Tel Aviv airport, where the agents are trained to spot inconsistencies and signs of untruthfulness or evasion).

But, for example, XY sometimes has to do a lot of (sometimes futile; you know, humans) work to convince people that some Julia package is necessary and that the Python alternative is not good enough.

Do we really want XY to be pushed into this unpleasant bureaucratic work when we could help them with good batteries in the stdlib? :wink:

Check out some of the cool stuff in the new package manager (thanks to Stefan and Kristoffer).
The Manifest.toml file can describe the exact set of packages used (with UUIDs and SHA-1 values recorded), so you can get a reproducible environment (IIUC).
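The idea behind recording hashes can be illustrated with a simplified sketch: fingerprint a package directory when it is vetted, and stop trusting it if the fingerprint later changes. The digest here is a made-up scheme (SHA-1 over relative paths, file lengths and bytes), not the git tree hash Pkg actually records in Manifest.toml, and the package name ExamplePkg and the changed_packages helper are hypothetical.

```python
import hashlib
import os
import tempfile


def tree_digest(root):
    """SHA-1 over every file's relative path, length and bytes, walked in a fixed order.

    Simplified illustration only: this is NOT the git tree hash recorded in Manifest.toml.
    """
    digest = hashlib.sha1()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place fixes the traversal order of subdirectories
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as handle:
                data = handle.read()
            # NUL-terminated path plus explicit length keeps file boundaries unambiguous.
            digest.update(rel.encode("utf-8") + b"\0")
            digest.update(str(len(data)).encode("ascii") + b"\0")
            digest.update(data)
    return digest.hexdigest()


def changed_packages(pinned, package_dirs):
    """Return the names of packages whose contents no longer match their pinned digest."""
    return sorted(name for name, expected in pinned.items()
                  if tree_digest(package_dirs[name]) != expected)


# Demo with a throwaway directory standing in for an installed package.
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = os.path.join(tmp, "ExamplePkg")
    os.makedirs(os.path.join(pkg_dir, "src"))
    source = os.path.join(pkg_dir, "src", "ExamplePkg.jl")
    with open(source, "w") as handle:
        handle.write("module ExamplePkg end\n")

    pins = {"ExamplePkg": tree_digest(pkg_dir)}  # recorded when the package was vetted
    before = changed_packages(pins, {"ExamplePkg": pkg_dir})

    with open(source, "a") as handle:  # simulate an unreviewed change to the package
        handle.write("# new code\n")
    after = changed_packages(pins, {"ExamplePkg": pkg_dir})
```

In the demo, before is an empty list and after flags ExamplePkg, because its contents no longer match the recorded digest.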


@ScottPJones Great! Julia is thinking about these things now, not when it is too late to change.

I’m coming from the same area as Austin (I’m a physician doing basic science), and I agree with most of Austin’s comments.

I still think that a trusted zone of peer-reviewed packages would be a great benefit for Julia usage inside companies.

I imagine that a scalable peer review process could be set up this way:
A package author who wants their package to enter the trusted zone agrees to review N (3?) other packages of similar size. The initial work is significant for a package author/team, but it should remain small afterwards, provided each author stays bound to the same 3 packages and reviews only their updates.
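A minimal sketch of how the review pairings could be generated, assuming hypothetical package names and sizes measured in lines of code: sorting by size keeps each author's N packages roughly comparable (except at the wrap-around), and a cyclic shift guarantees that every package gets exactly N reviewers and that nobody reviews their own package. Keeping the same plan between rounds would mean later reviews only cover updates.

```python
def assign_reviews(package_sizes, n=3):
    """Map each package to the n packages its author agrees to review.

    Packages are ordered by size so neighbours are of similar size; the cyclic
    shift gives every package exactly n reviewers and avoids self-review.
    """
    names = sorted(package_sizes, key=lambda name: (package_sizes[name], name))
    k = len(names)
    if k <= n:
        raise ValueError("need more than n packages to avoid self-review")
    return {names[i]: [names[(i + j) % k] for j in range(1, n + 1)] for i in range(k)}


# Hypothetical packages with made-up sizes (lines of code).
sizes = {"PkgA": 1200, "PkgB": 300, "PkgC": 5000, "PkgD": 450, "PkgE": 2500}
plan = assign_reviews(sizes, n=3)
```

With the sizes above, the author of PkgB would review PkgD, PkgA and PkgE.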

Otherwise, a company has four choices:

  • ignore the problem
  • review all package dependencies (and all updates!) itself
  • not use packages outside the stdlib (in which case the stdlib should be fairly complete)
  • not use package-based languages :wink:

After all, peer review is a well-established strategy in the scientific community.

If a package is mission-critical, someone in the company should be familiar enough with it to contribute issues/PRs, which is tantamount to a “review”.

If the company is unwilling to do or finance this, or contribute to the Julia community in some other way, then it is unclear to me why it is relevant whether or not they are using Julia.


I agree, but to me the problem does not arise when using a single package, or a few packages, outside the stdlib.
The problem arises if you intend to let your employees pull (let’s say through a proxy) all the packages they want, and more importantly all the package updates, from the registered Julia repositories without any control.

I am considering the case of a malicious package update (possibly written by its original author).
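One way to restore control over updates flowing through a proxy is to have it serve only name/version pairs that someone has reviewed. The check_request function below is a hypothetical sketch of that gate, not an existing Julia or Pkg feature. It deliberately refuses a new release of an already approved package until that release has been reviewed too, which covers the malicious-update case.

```python
def check_request(name, version, approved):
    """Decide whether an internal package proxy should serve a download (hypothetical gate).

    approved maps package name -> set of versions that have been reviewed.
    Anything not listed, including a new release of an approved package, is refused.
    """
    versions = approved.get(name)
    if versions is None:
        return False, "package not approved"
    if version not in versions:
        return False, "version not yet reviewed"
    return True, "ok"


# Hypothetical allowlist maintained by whoever reviews package updates.
approved = {"ExamplePkg": {"1.2.0", "1.2.1"}}
```

The design choice is to allowlist versions rather than packages: trusting a package once does not imply trusting whatever its author publishes next.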

I have heard this before, that what Julia needs is more contributors, not more users.

But more users => more contributors, right?

I can say that the low number of users is in itself stopping me from contributing. I certainly cannot use Julia for work (except for tiny personal projects); customers have never heard of it and are only now cautiously dipping their toes into Python. They would not know what to do with a bunch of Julia code. I cannot justify putting any real amount of work into Julia, so from that perspective I am basically useless to the Julia language.

Increasing the user base, increasing the number of people who would know how to even run the code, and increasing the number of companies who will accept Julia code, would almost certainly increase the number of people who are able and willing to contribute to the language.


Totally agree that

more users leads to more contributors

Just look at JavaScript and Python…


Thinking about JavaScript… and referring to my frothing rants about not needing full-time Internet connectivity.
Remember the JavaScript function (left-pad, on npm) which did some simple string padding in a few lines of code? It was used by thousands of projects worldwide. The developer had published it on npm and, for some reason, removed it. Builds of all those projects broke. Yes, they were quickly patched.
It shows the stupidity of pulling in a few lines of code as a dependency instead of cutting and pasting the code.
Well, not in Julia where functions are first class objects :slight_smile:
(that was a Julia joke I suppose)

But anyway, remember that code written in Julia might run in a particle accelerator, a wind tunnel or a race car simulator: facilities which are intentionally cut off from the Internet during safety-critical runs.
I KNOW Julia doesn’t do this, but that JavaScript example should be noted.

At my previous company, we 1) had a server with a clone of all the packages we used (and their dependencies), and 2) had our own “dynactionize” branch on our clone of METADATA.jl, which pointed to where we had the packages locally.
Every so often, someone would run a script which would check out a new branch off “dynactionize”, update it with any changes from the GitHub METADATA.jl, update all of the packages (on another server), and run all of our unit tests and some end-to-end tests for our product. If everything still worked correctly, the main server was updated, and then the “dynactionize” branch was also updated.
(We also used our own compiled version of Julia, because we had to disable USE_GPL_LIBS).

If a company needed to do “extreme vetting” of the code, they could review all of the changes to the code, after the unit tests, etc. had passed, before making that the new deployable version.
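The refresh procedure described above can be sketched as a script. This is only an illustration under assumptions, not the actual Dynactionize tooling: it assumes a git remote named upstream whose master branch is the public registry, an internal branch called dynactionize, and caller-supplied hypothetical hooks, run_tests (update the staging packages, run unit and end-to-end tests) and approve (the manual “extreme vetting” step).

```python
import subprocess


def git(repo_dir):
    """Return a runner that executes one git command in repo_dir; raises if git fails."""
    def run(args):
        subprocess.run(["git", "-C", repo_dir] + list(args), check=True)
    return run


def refresh_registry(run, run_tests, approve=lambda: True,
                     branch="dynactionize", upstream_remote="upstream"):
    """Merge upstream registry changes on a candidate branch; promote only if vetted.

    run: executes one git command given as a list of arguments
    run_tests: hypothetical hook that updates staging packages and runs the tests
    approve: hypothetical hook for a manual review of the changes
    Returns True if the internal branch was advanced to the candidate.
    """
    candidate = branch + "-candidate"
    run(["fetch", upstream_remote])
    run(["checkout", "-B", candidate, branch])  # start from the current internal branch
    run(["merge", "--no-edit", upstream_remote + "/master"])  # bring in upstream changes
    promoted = bool(run_tests() and approve())
    run(["checkout", branch])
    if promoted:
        # The candidate descends from branch, so this is normally a fast-forward.
        run(["merge", "--ff-only", candidate])
    return promoted
```

In practice run would be git(...) pointed at your registry clone, with the path depending on your setup; passing a list-recording function instead acts as a dry run that shows which git commands would be executed.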

I believe this process will only get easier with Pkg3, the Manifest, and support for alternate registries.
(thanks again, Stefan and Kris!)


I genuinely wonder why my concerns about package security triggered no response at all…

Is it because:

  • it is irrelevant (i.e. my assumption about the security issue is wrong)?
  • it is totally irrelevant for readers who are not concerned with security issues?
  • the proposed solution is not feasible?
  • there is no general solution?

Malicious software does exist. The problem is different depending on whether it relates to a commercial product (e.g. MATLAB) or to open source software (e.g. Debian and its package maintainers).

Thank you very much for any hint :wink:

I’m very concerned about all issues of security (in lots of places, including things like string handling!)

I’m hoping that Pkg3 will help with being able to lock down a specific set of packages.

Maybe you should open up a Discourse topic on “Security Issues in Julia”.
It’s much better to bring any/all concerns out into the open, rather than stick our heads in the sand.
:hear_no_evil::speak_no_evil::see_no_evil:
