Pkg.jl telemetry should be opt-in

While that’s one of the top priorities, it’s not the only reason. Serving requests to CI processes is expensive: network bandwidth, not compute, is the primary cost of running a pkg server. Telemetry data from CI systems helps us understand what people are doing in those automated processes and mitigate those expenses, for example by deploying package servers that are colocated with CI services (so bandwidth is cheaper or even free). That’s why we check all those CI indicator variables: to help understand which services are making requests. If we see a huge deluge of new traffic (this is realistic and already happens for services we host) and all we have is IP addresses, it’s much harder to figure out what’s going on than if we also have CI indicator variables, Julia version numbers, and client UUIDs, which let us tell which requests are coming from the same instance and which are coming from different ones. Debugging these situations is hard, and doing it completely blind is much harder, so having more context when this happens really helps.
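To make the point about client UUIDs concrete, here is a rough sketch of the kind of grouping those fields allow. It is not how the pkg servers actually process logs: the record layout, field names, and values are all invented for illustration, and it is written in Python only to keep it short.

```python
from collections import defaultdict

# Hypothetical request records. The field names and values are assumptions
# made up for this example, not the real pkg server log format.
requests = [
    {"ip": "203.0.113.7", "ci": "travis", "julia_version": "1.5.0", "client_uuid": "a1"},
    {"ip": "203.0.113.7", "ci": "travis", "julia_version": "1.5.0", "client_uuid": "a1"},
    {"ip": "203.0.113.7", "ci": "travis", "julia_version": "1.4.2", "client_uuid": "b2"},
    {"ip": "203.0.113.7", "ci": "azure", "julia_version": "1.5.0", "client_uuid": "c3"},
    {"ip": "198.51.100.4", "ci": None, "julia_version": "1.5.0", "client_uuid": "d4"},
]


def summarize(records):
    """Return {service: (request_count, distinct_client_count)}."""
    counts = defaultdict(int)
    clients = defaultdict(set)
    for record in records:
        # Requests without a CI indicator are grouped under "not CI".
        service = record["ci"] or "not CI"
        counts[service] += 1
        clients[service].add(record["client_uuid"])
    return {service: (counts[service], len(clients[service])) for service in counts}


summary = summarize(requests)

# With IP addresses alone, the same traffic collapses into two sources.
distinct_ips = len({record["ip"] for record in requests})

for service, (n_requests, n_clients) in summary.items():
    print(f"{service}: {n_requests} requests from {n_clients} distinct clients")
print(f"distinct IPs: {distinct_ips}")
```

In this made-up data, four of the five requests share one IP address, yet the UUIDs show they come from three separate clients on two different CI services. That is the distinction that IP addresses alone cannot provide.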

Knowing which CI services people are using also helps us prioritize quality of support for those services. Right now we are collectively good at supporting Travis and AppVeyor because that’s what Julia itself uses, but if we find out from CI variables that a ton of people are using Azure Pipelines, for example, then it may be worth the time and effort to make sure that it works flawlessly in the Julia ecosystem. Without those telemetry headers, we can’t know to spend time and energy on that.
