First of all, thank you to everyone who has contributed constructively. It has been a struggle to keep up, and I made a promise to myself to at least try to maintain “the big perspective”, as it is all too easy to fall back on sloppy thinking and emotions.
Before I go on to write yet another long post, I do want to state that I only speak for myself and make no attempt to unify every voice on the “opt-in side”. As such, I will completely ignore some topics that have been discussed at length over the weekend (possible attack vectors, arbitrary code execution, etc.), as they do not interest me personally and have nothing to do with my insistence on opt-in as a matter of consent. Although this is ultimately beyond the scope of this discussion, as I did mention on Hacker News, I do feel the underlying issue causing this mess is that we lack a good way to ask for and give consent online, and that we are left with awful proxies (do contact me by e-mail if you know of good research on this matter, as it does interest me). Now, on to the matter at hand.
Thank you @StefanKarpinski for your response to my initial “mega post”; it did make me think, and the technical feedback was very useful. I see it as a satisfactory rebuttal of my “survey idea” and would only like to clarify one thing. I do agree that automated systems most likely have less of a right to privacy, but what I was aiming at in my initial post was that they may also be less interesting to us from a telemetry standpoint, and that without them we could possibly safely assume that any Julia installation we are interested in will be run interactively at least once. I will not allude to this again, though.
@jeff.bezanson really struck home when he stated that the UUIDs are really what cause the strongest opposition; this is absolutely true for me. (I also want to apologise to Jeff, as I feel somewhat guilty of causing this statement: “…if you only ‘warn’ people about [Julia’s] package manager and not anything else, you are sending the message that [Julia] is somehow uniquely nefarious[.]”. Sorry, my intention was never to scream from the rooftops to my students, just to state factually and objectively what the telemetry entailed and allow them to make their own decision.) As I did state previously, the UUIDs add capabilities that IP and HTTP lack in that they persist and, perhaps even more importantly, that they pretty much eliminate the plausible deniability that is inherent in IP and HTTP, as, for example, NAT is no longer a possibility. Generating a UUID on someone’s device and transmitting it really is where the rubber hits the road for me. This is good: now I finally have some clarity as to why I felt so strongly about this when I first encountered the topic, and hopefully I can work back from there to something actionable to “get me on board”, as I do want to be on board and I agree with the desiderata.
The clarification regarding IP logs and how they are to be separated from the Pkg logs is one of the greatest accomplishments of this thread so far, with @c42f also hinting that the UUID logs could be considered a “toxic asset”. Now, a couple of naive technical questions that, if answered, would actually sway me towards the opinion that opt-out can be justified in this specific case:
1.) Must one retain the 128-bit UUIDs in the logs in order to reach the desiderata? Is a lower-bound estimate in terms of usage numbers with some controllable confidence interval not sufficient so as to preserve plausible deniability and break the link between what is on the end-user system and the log? HyperLogLog springs to mind, but I am sure those of you less awful with stats and more familiar with the technical landscape know better than me.
2.) Is there any argument in favour of privacy that makes a persistent 128-bit UUID preferable to a transient 32-bit IPv4 address? Let us ignore IPv6 with its 128 bits, as it still seems to be at least a decade out…
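For readers unfamiliar with HyperLogLog, the idea behind question 1 can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not a description of how the Pkg servers work or would work: the class name `HyperLogLog`, the parameter `p`, and the choice of SHA-256 as the hash are all hypothetical. The point is that the server keeps only a small array of registers, each holding a maximum "rank", from which a distinct-client count can be estimated with a standard error of roughly 1.04/√m for m registers, without any UUID being written to the log.

```python
import hashlib
import math
import uuid


class HyperLogLog:
    """Minimal HyperLogLog sketch: estimates a distinct count without storing items."""

    def __init__(self, p: int = 12):
        self.p = p                     # number of hash bits used as register index
        self.m = 1 << p                # number of registers (4096 for p = 12)
        self.registers = [0] * self.m
        # Bias-correction constant, valid for m >= 128
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Take a 64-bit hash from the first 8 bytes of SHA-256 (illustrative choice)
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        # Only the maximum rank per register is kept; the item itself is discarded
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting)
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog(p=12)
    for i in range(50_000):
        client_id = str(uuid.UUID(int=i))   # stand-in for a client UUID
        for _ in range(3):                   # repeated requests do not inflate the count
            hll.add(client_id)
    print(f"estimated distinct clients: {hll.estimate():.0f}")
```

Note that a sketch like this only addresses what is retained in the logs; the UUID would still be generated on the client and transmitted, so it speaks to question 1 and not to the consent issue itself.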
If it can be argued that there are privacy benefits to the current UUID approach, under the assumption that the UUID is never stored in association with an IP, I am willing to concede. You will have won me over gradually and fairly. As I have a piece of software that is closely related to this issue, I really want this to be the case, as I would otherwise be perceived to be making a “political statement” when releasing it – hopefully before JuliaCon – and that really is not how I wanted my “return” to the community to look…
I guess ending with arguments I think we can leave by the wayside is now a tradition of mine:
@PetrKryslUCSD said: “[E]ach Julia executable would have a unique ID, and the telemetry would report usage tied to the executable. There would be no link between the user and the executable, hence complete privacy.” I am really sorry to pick on this one, but is it not the same as saying: “We did not track him, we tracked his car!” or “I did not kill him, the bullet did!” – it did make me smile and a judge would have a field day with this one. Again, you are a fine contributor and I am not trying to pick on you, just the argument itself.
“My code released as FOSS is served as a part of this and I object to the telemetry”: While I do respect this point of view, it is as you state indeed the case that FOSS does not take a stance on the purpose for which code is used – “The Software shall be used for Good, not Evil” springs to mind, which makes a license not FOSS. Thus, while I am sorry to hear that you feel uncomfortable about it, I am not sure how this adds much to the discussion.
“Pkg.jl is not ‘baked into’ the language as it is used for third-party code”: It comes with any release in the tarball; if that does not count as “baked in”, I honestly do not know what does.
Lastly, a lot of nonsense is written on Hacker News, but here is a comment I think many members deserve to feel proud of – assuming none of you wrote it, of course…