Pkg.jl telemetry should be opt-in

Thank you. Also, on re-reading, my message does sound a bit snarky, so please accept my apologies as well. I wanted to highlight the fact that the data collected is a community resource, mainly to be used for making Julia and the package ecosystem better for all of us. Thus, while the concerns raised are valid, we are actively making trade-offs as a community about what we will let ourselves collect and analyze.

Yes, that is quite a reasonable expectation.

-viral

9 Likes

Just because they won’t store it doesn’t mean an advanced persistent threat won’t store both the UUID and the IP address together as it intercepts all data.

Perhaps the transmitted UUID and metadata should be encrypted as it is transferred over the internet, instead of using plain text, to prevent data collection from packet sniffing and so on.

And I mean encrypting all Pkg server traffic over the internet as a whole, not just the UUID. That way you can separate the IP and the UUID when it is decrypted and somewhat prevent lazy attacks listening in on plain text requests for Pkg data.

I was under the impression from Pkg.jl#1377 that

Both protocols work over HTTPS, using only GET and HEAD requests

5 Likes

Do you just mean HTTPS?

1 Like

Yeah, HTTPS would do it. I wasn’t aware of what kind of data transmission would be used for this. It really should be the default for all of this.

Pkg refuses to use HTTP unless the host is local, even if you explicitly use an http:// URL as your package server value. This prevents people from accidentally leaving themselves open to snooping or MITM attacks.
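
For readers who want to see the shape of such a rule, here is a minimal sketch in Python. It is not Pkg's actual code: the function names `is_local_host` and `check_server_url` are hypothetical, and treating "local" as `localhost` or a loopback IP address is an assumption made for illustration only.

```python
from ipaddress import ip_address
from urllib.parse import urlsplit


def is_local_host(host: str) -> bool:
    """Assumed notion of "local": the name localhost or a loopback IP literal."""
    if host.lower() == "localhost":
        return True
    try:
        return ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so it is an ordinary (non-local) hostname.
        return False


def check_server_url(url: str) -> str:
    """Return the URL if it is acceptable, otherwise raise ValueError.

    HTTPS is always accepted; plain HTTP is accepted only for local hosts.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme == "https":
        return url
    if parts.scheme == "http" and is_local_host(host):
        return url
    raise ValueError(f"refusing insecure or unsupported server URL: {url}")
```

With this sketch, `check_server_url("http://localhost:8000")` is accepted, while an `http://` URL pointing at any remote host raises an error instead of silently sending traffic in plain text.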

34 Likes

I think that this is uncalled for.

4 Likes

The short TL;DR is that no UUIDs are sent in Julia 1.5. Julia 1.5 sends less information and is more protective of the information than Python is, and said information is only sent if and when you download packages from a package server. The package server is easily changed. The slightly longer version is the marked solution in this thread.

To be abundantly clear:

  • The data collected is not owned or managed by MIT, the Julia Lab, nor Julia Computing. It’s a community resource.
  • IP addresses are sent because they’re needed to send the packages. It’s kinda how the internet works.
  • IP addresses are only stored to help identify abuse and DDoS attacks (intentional or not) and thus are purged on a regular basis.
  • This data is not for targeting ads or emails.
  • If your sensitive research topic can be revealed through open source package usage, you may want to re-evaluate your security model.

If you don’t care about this issue, please don’t spread FUD about it.

22 Likes

Thanks for clarifying! Not trying to spread FUD, but having worked at universities, in security-like roles, and in general industrial environments, those are the things that come to mind.

I’ll remove my posts. It’s good to know how you all decided to handle the situation and have really thought about protecting the end users. Most companies would not have gone as far as this.

Would it be worth making some kind of official post stating the resolution of this thread because it’s huge and full of twists and turns?

There’s a summary post marked as the solution, no?

4 Likes

Gotcha, I missed that this was marked as solved. All I did was click Summary (which did reduce the thread from 370+ posts to ~80). Basically, I was trying to install Julia 1.5 and remembered some of the discussion, then saw the notice on the website (didn’t coincide?), so I tried to check the thread, missed the solution, and saw a huge pile of text with lots of turns. My bad.

1 Like

https://julialang.org/legal/data/ should be updated with the new information though (it still mentions UUIDs for example).

2 Likes

Yeah, that’s kinda what I’m saying: there are little tidbits here and there that don’t exactly line up.
E.g.:

Data sent to pkg.julialang.org is only accessible to a limited subset of core Julia developers and is not made public or shared with any third party.

vs

The data collected is not owned or managed by MIT, the Julia Lab, nor Julia Computing. It’s a community resource.

edit - world moves fast I get it, just trying to know what I’m supporting and signing up for hahaha.

1 Like

That page is still linked to if you’re using Julia master, so I’ve left it as-is. It is no longer linked to from Julia 1.5 or anywhere else, since the release does not use UUIDs or send any other data that can be used to track or profile users. There is a notice on the Julia downloads page about pkg servers seeing IP addresses (like all servers). That page will be updated or deleted whenever master changes.

3 Likes

Those two quotes are consistent in that it’s a Julia community project behind the Pkg server, not MIT, the Julia Lab, or Julia Computing, even though the individual people doing a lot of the development have various overlapping affiliations with those groups. Discourse says this post has already been linked, but I think it’s worth sharing again: The Julia Project and Its Entities. Even when you say “Most companies would not have gone as far as this,” it’s kind of off, because there isn’t a company doing this; it’s the Julia project. If we were downloading packages from Julia Computing, it would probably be in the context of Julia Team or one of their products. Here we are downloading open source projects from the Julia community, using Julia community infrastructure.

5 Likes

Thanks for explaining this. Eric, that link really clarifies most of my concerns about Julia/JuliaComputing/etc. I wish I had seen this like … 2 years ago. I won’t even share how I thought it was structured because it could be misinterpreted later by a passerby.

So the language itself is more insulated than I thought, which is good.

3 Likes

Will any data (number of users (by package?), etc.) be published? Or has this already happened?
I would be interested to see such statistics.

1 Like

The client UUID was removed from the release because of this discussion, so we don’t know how many users there are. Download statistics are being collected, however, and data about download numbers will be published once the infrastructure is in place to aggregate, process, and publish that data.

17 Likes

https://julialang.org/legal/data/

This page is returning a 404. Is there a new location for this page?

No, since the telemetry was removed.

5 Likes