Thank you. Also, on re-reading, my message does sound a bit snarky, so please accept my apologies as well. I wanted to highlight the fact that the data collected is a community resource, mainly to be used for making Julia and the package ecosystem better for all of us. Thus, while the concerns raised are valid, we are actively making trade-offs as a community about what we will let ourselves collect and analyze.
Just because they won’t store it doesn’t mean an advanced persistent threat won’t store both the UUID and the IP address together as it intercepts all data.
Perhaps the transmitted UUID and metadata should be encrypted as it is transferred over the internet, instead of using plain text, to prevent data collection from packet sniffing and so on.
And I mean encrypting all Pkg server traffic over the internet as a whole, not just the UUID. That way you can separate the IP and the UUID when it is decrypted and somewhat prevent lazy attacks listening in on plain text requests for Pkg data.
Pkg refuses to use HTTP unless the host is local, even if you explicitly use an http:// URL as your package server value. This prevents people from accidentally leaving themselves open to snooping or MITM attacks.
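To make the rule above concrete, here is a minimal sketch of that kind of check. It is written in Python purely for illustration and is not Pkg's actual code: the function names are hypothetical, and treating "local" as `localhost` or a loopback IP address is an assumption about what counts as a local host.

```python
import ipaddress
from urllib.parse import urlsplit


def is_local_host(host: str) -> bool:
    """Assumption: 'local' means the name localhost or a loopback IP literal."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so it is an ordinary (non-local) hostname.
        return False


def check_pkg_server_url(url: str) -> str:
    """Return the URL if it is acceptable, otherwise raise ValueError.

    HTTPS is always accepted; plain HTTP is accepted only for local hosts.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme == "https":
        return url
    if parts.scheme == "http" and is_local_host(host):
        return url
    raise ValueError(f"refusing to use unencrypted or unknown scheme for {url!r}")
```

Under these assumptions, `check_pkg_server_url("http://localhost:8000")` is accepted, while an `http://` URL pointing at any remote host raises an error instead of silently sending requests in plain text.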
The short TL;DR is that no UUIDs are sent in Julia 1.5. Julia 1.5 sends less information and is more protective of that information than Python is, and the information is only sent if and when you download packages from a package server. The package server is easily changed. The slightly longer version is the marked solution in this thread.
To be abundantly clear:
The data collected is not owned or managed by MIT, the Julia Lab, nor Julia Computing. It’s a community resource.
IP addresses are sent because they’re needed to send the packages. It’s kinda how the internet works.
IP addresses are only stored to help identify abuse and DDoS attacks (intentional or not) and thus are purged on a regular basis.
This data is not for targeting ads or emails.
If your sensitive research topic can be revealed through open source package usage, you may want to re-evaluate your security model.
If you don’t care about this issue, please don’t spread FUD about it.
Gotcha, I missed that this was marked as solved. All I did was click Summary (which did reduce the thread from 370+ posts to ~80). Basically, I was trying to install Julia 1.5 and remembered some of the discussion, then saw the notice on the website (didn’t coincide?), so I tried to check the thread, missed the solution, and saw a huge pile of text with lots of turns. My bad.
That page is still linked to if you’re using Julia master, so I’ve left it as-is. It is no longer linked to from Julia 1.5 or anywhere else, since the release does not use UUIDs or send any other data that can be used to track or profile users. There is a notice on the Julia downloads page about Pkg servers seeing IP addresses (like all servers). That page will be updated or deleted whenever master changes.
Those two quotes are consistent in that it’s a Julia community project behind the Pkg server, not MIT, the Julia Lab, or Julia Computing, even though the individual people doing a lot of the development have various overlapping affiliations with those groups. Discourse says this post has already been linked, but I think it’s worth sharing again: The Julia Project and Its Entities. Even when you say “Most companies would not have gone as far as this” it’s kind of off, because there isn’t a company doing this; it’s the Julia project. If we were downloading packages from Julia Computing, it would probably be in the context of Julia Team or one of their products. Here we are downloading open source projects from the Julia community, using Julia community infrastructure.
Thanks for explaining this. Eric, that link really clarifies most of my concerns about Julia/JuliaComputing/etc. I wish I had seen this like … 2 years ago. I won’t even share how I thought it was structured, because it could be misinterpreted later by a passerby.
So the language itself is more insulated than I thought, which is good.
The client UUID was removed from the release because of this discussion, so we don’t know how many users there are. Download statistics are being collected, however, and data about download numbers will be published once the infrastructure is in place to aggregate, process, and publish that data.