Pkg.jl telemetry should be opt-in

Please stop. You know very well that what you are doing now is disingenuous. In no way was this at the “expense” of them (what expense did they suffer?). Anyway, for the harm I caused them, I also made a donation to their foundation. Message me if you want the receipt.

Edit: Replied to the wrong person.

4 Likes

This post was temporarily hidden by the community for possibly being off-topic, inappropriate, or spammy.

It’s really getting weird. Instead of reflecting on your own count of likes and then on the validity of your position, you are attacking others because they liked a post? What you show here makes me sad.

Don’t get me wrong: your opinion is fine and it’s OK to bring it in here. But it is an opinion, and it’s not you who gets to decide what’s wrong or right. Your opinion deserves respect, and you owe the same respect to the opinions of others.

In general, about this discussion and how it went: those who argue with the highest moral standards claim to have the right position. They aren’t part of the discussion anymore; they want to decide and end the discussion, because it’s clear who is right. This isn’t working, not here, not anywhere. Please, step back and think about the real issue: the real data that is gathered and how it is done. Then think about cameras in public, Google, Facebook, China and social scoring. Put everything into perspective and think again. You are fighting in the wrong place!

The facts:

  • no personal data collected
  • open source, clear explanation
  • early and open discussion before release
  • clear goal
  • nothing hidden
  • absolutely minimal data
  • opt-out well documented and relatively easy to do (we are software developers, even the new Julia users should be able to opt-out)

I am sure that this is the best I have ever seen in collecting data (especially the minimal approach), but that is also the reason why it is getting so much negative response from those who see their chance to be right. Against the holy inquisition there is no argument.

Finally, my position: I disagree that opt-in would not give enough data, so I am also on the opt-in side, but I would opt in for sure. Therefore, and given the facts above, I am fine with opt-out. The handling of the IP address should also be mentioned in the data document.

25 Likes

@dlakelan already explained the reasons for missing the link. That being said, I think that comments like “there isn’t yet an official readable document…” and others in the same line, including your question, must be contested.

There are questions that may be debated, but in this conversation there are also many comments implying that the developers have handled the telemetry feature obscurely, and that is really unfair. That document, which is a prominent example of transparency, is (twice) the top link of this conversation, not buried at all. Although everything can be improved, of course, I think that the developers have clearly shown that they are very sensitive to users’ privacy rights and that transparency in this issue is one of their top priorities.

2 Likes

ok, since I’ve been mentioned twice here, I figure I should chime in on this trainwreck.

I have no issues with respect to user privacy. My objection stems from the fact that code I freely contributed to the Julia language is now going to be used, without any prior discussion, as a way for Julia Computing – an organization that didn’t exist when my first registry PR was merged – to raise revenue. I object to this use of my code, though I realize there’s little I can do about it given its open-source license.

However legal it may be, in my opinion it’s not right to appropriate the work of volunteers who have taken time to learn and promote this language and use it for your own material benefit by changing the terms under which the work was originally submitted.

Had I known that my packages would be used to track individuals for the financial benefit of some group of other people, I never would have submitted my first PR in 2015.

This move, inasmuch as it engenders hostility from developers, is also short-sighted: You can own the painting; but you don’t own the artist.

4 Likes

Hi Kristoffer,

I believe you. There was no ill intention. Still, this is a serious foundation that deals with victims of domestic abuse, so, yes, please, I encourage people to make a donation. I just did :pray:

3 Likes

From a practical perspective, I think it is well-defined: people with commit rights to the repo. Like most open source projects, it is not democratic, but meritocratic: people who do the work get to define where the project goes. Note the intersection between the top contributors and people participating in this topic.

Since Julia is under the MIT license, note that it is super-easy to fork, so in this sense every person or group of people gets a “vote”. It just implies a lot of work if they want to make the fork viable.

Personally, I do not mind the telemetry too much: I am not very enthusiastic about any kind of data collection, but in case there are benefits, it should be well-designed and minimal. But if I happened to disagree with the whole thing vehemently, I still would not dream of asking that I get to “vote” about this as a “member of the community”. Since my contributions to Julia are pretty sporadic, this would be tantamount to me being involved in a decision about a project that others have dreamed up and devoted a significant amount of work to.

15 Likes

Because I have to feel ashamed, and because I feel pushed to donate somewhere in a completely different context, which makes me even more ashamed because I can’t compete with this high level of ethics and morals, I am leaving this thread (and because I have already articulated my opinion twice). The only thing I am good enough for is to give away this little set of data to the maintainers of Julia and its ecosystem, so I feel I am not good enough for the people here.

1 Like

I have no issues with respect to user privacy. My objection stems from the fact that code I freely contributed to the Julia language is now going to be used, without any prior discussion, as a way for Julia Computing – an organization that didn’t exist when my first registry PR was merged – to raise revenue. I object to this use of my code, though I realize there’s little I can do about it given its open-source license.

Wait a minute – the default package server is run by Julia, the open source project, not by Julia Computing.
The idea is that the open source project may publish package usage statistics, which the package authors in turn could use to apply for funding (see e.g. the JuMP example above).

10 Likes

Then s/Julia Computing/any other organization that is using my package as a way to apply for funding which logically includes Julia Computing as well.

It certainly does not include me.

1 Like

That can help Julia Computing to raise revenue as much as any other company that does business with Julia. Julia Computing does not have special rights with respect to the data that would be collected with Pkg:

1 Like

I’m not entirely sure what scenario you’re thinking of here – that the author of some third party package would use the popularity of a package they’re not an author of to apply for funding? That seems unlikely to work, no?

In any case: You are opposed to any (public or otherwise) statistics about Julia package downloads then?

4 Likes

It’s not Julia Computing who will get the data, but “a limited subset of core Julia developers”

I missed this. This makes it even worse in my mind. Cui bono?

I am opposed to telemetry in principle. I am particularly opposed to this implementation of telemetry: not because users’ privacy is at risk (it is, but for most people it’s not a huge increase), but because the developers whose contributions are being tracked were not consulted, nor given an opportunity to opt out of having their packages participate in this user tracking.

Consider the situation where someone is vehemently opposed to this sort of setup, and then associates the work that I’ve done with the tracking, because s/he sees the tracking request when my package is added. I don’t want the headache of trying to explain that not only am I not a part of this tracking effort, I’m actually against it but there’s nothing I can do about it because the policies under which I originally submitted my code were changed out from under me sometime in mid 2020 and there was no way to withdraw.

2 Likes

Fair enough.

Do consider that GitHub already has (probably very comprehensive) stats about Julia package downloads, simply by virtue of hosting something like 99% of the package ecosystem.

8 Likes

But every time you or anyone else have used git with GitHub (Microsoft) you have sent your (unique) public keys and every time someone has downloaded one of your packages their IP address has been sent to GitHub (Microsoft). This was never a problem for you before, even when these things are more intrusive than the potential UUID that will be sent? Could you elaborate on that a bit?

6 Likes

That argument doesn’t persuade me. I knew GitHub tracked activity on its site when I signed up. I knew going in what the deal was, and I accepted that in exchange for the service they provided.

I can also delete my repository at any time and stop the collection of data by hosting it in my own git repository.

The same cannot be said for the General Registry.

3 Likes

As a package developer, if I thought telemetry data would help me, I could simply add (opt-in) telemetry to my package. Does it really need to be baked into the language? The only organization that will benefit from telemetry baked into the language is JC, so it appears to me the language was modified for the benefit of JC. I’m not sure how I feel about that.

One solution that would make me feel a little better is if Pkg were moved out of Julia and hosted by Julia Computing. The data is valuable and I do think JC should benefit from it in some way. As I said in my first comment here, I think there is a right way to do this, but I don’t think we’ve found it yet.

Another solution: why not just host special / curated Julia packages on JuliaHub as an alternative to GitHub? Telemetry then becomes a non-issue, because of course you’ll have most of the data you need in that case. Package maintainers get the benefit of being “special” with some added visibility, and JC (or whoever runs JuliaHub.com) benefits from the data.

1 Like

What about an individual (maybe a data journalist?) who protects and separates his work identity (via VPN/Tor/…) from his regular online identity? If he installed a package both with and without VPN/Tor/…, overlooking the telemetry warning, the user UUID would link both online identities.
I guess one would be able to deanonymize a VPN/Tor/… user with access to such statistics.
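
To make the linkage concrete, here is a minimal sketch. It assumes a hypothetical server-side log that records a client UUID next to the source IP of each request; the field names and values are invented for illustration and are not the actual Pkg server log format.

```python
from collections import defaultdict

# Hypothetical request log: (client_uuid, source_ip).
# All values are made up; this is NOT the real Pkg server log format.
requests = [
    ("uuid-a", "203.0.113.7"),    # regular connection
    ("uuid-a", "198.51.100.23"),  # same client, now behind a VPN/Tor exit
    ("uuid-b", "192.0.2.44"),     # unrelated client
]

def linked_addresses(log):
    """Return the UUIDs that were seen from more than one source IP,
    mapped to the set of addresses they were seen from."""
    seen = defaultdict(set)
    for client_uuid, ip in log:
        seen[client_uuid].add(ip)
    return {u: ips for u, ips in seen.items() if len(ips) > 1}

links = linked_addresses(requests)
print(links)
```

Anyone holding such a log could run the same grouping, which is why the question of who has access to the raw data matters.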

Another deanonymization scenario involves a package developer. Say a developer installs a semi-obscure package and then, within an hour or so, adds a dependency on that package to one of his public projects on GitHub. If a developer does this 3 times, then there is a good chance of linking the two.

What about specifying who will have access to the full telemetry data (I guess it will be Julia Computing; maybe it is already mentioned, but I did not see it) and declaring that they will not make any deanonymization attempt? I trust Julia Computing to abide by such a statement. Additionally, they could include a warrant canary statement saying that so far they have not received a request from law enforcement to hand over the data, and remove this statement when necessary (see [1]).

But I am wondering whether we could count the install base of a package without this user UUID. Maybe the issue is preventing double-counting of a package installation when a user upgrades a package from v1.0 to v1.1, as these are two “install events” but just one user. But could this not be solved by counting “upgrade events” differently from “fresh install events”?
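
As a rough sketch of that idea, assume a hypothetical event stream in which the client labels each request as either a fresh install or an upgrade and sends no user identifier at all. The event names and the tuple layout are assumptions made for illustration, not anything Pkg currently sends.

```python
from collections import Counter

# Hypothetical event stream without any user identifier: (package, event_kind).
# The kinds "fresh_install" and "upgrade" are assumed labels, not part of Pkg.
events = [
    ("JuMP", "fresh_install"),
    ("JuMP", "upgrade"),        # an existing user moving from v1.0 to v1.1
    ("JuMP", "fresh_install"),
    ("Plots", "fresh_install"),
    ("Plots", "upgrade"),
]

def install_base(stream):
    """Estimate the install base per package by counting only fresh installs,
    so that upgrades do not inflate the number."""
    counts = Counter()
    for package, kind in stream:
        if kind == "fresh_install":
            counts[package] += 1
    return dict(counts)

base = install_base(events)
print(base)
```

This would still overcount a user who installs the same package on several machines or reinstalls it, so it is an estimate rather than an exact user count.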

Sorry if I sound paranoid, but I would really hate it if the telemetry in privacy-respecting open software like Julia revealed the identity of a whistleblower, as the printer IDs (encoded as a grid of dots) did in the case of Reality Winner [2]. Probably this is unrealistic, but maybe not.

[1] https://www.reuters.com/article/us-usa-cyber-reddit-idUSKCN0WX2YF
[2] https://www.theatlantic.com/technology/archive/2017/06/the-mysterious-printer-code-that-could-have-led-the-fbi-to-reality-winner/529350/

1 Like