Hi Viral @viralbshah, Stefan @StefanKarpinski, and Julia community,
OK, so I was just about to post this when I read Viral’s
post making essentially the same points, but more elegantly.
At any rate, I suppose I’ll reemphasize the points …
TL;DR, summarizing my point up front:
If the telemetry and other usage data are used to find ONLY the most historically successful projects, and there isn’t a policy in place to redistribute some of the funding secured on the strength of those older, successful projects to INCUBATE new, innovative community projects, there is a higher risk that Julia will lose its cutting-edge community projects and almost immediately suffer from a lack of innovation and actual progress.
IOW, as my economics professor wisely said, “Incentives matter,”
so we should be careful about exactly what is incentivized.
So I’m suggesting that Julia be sure to fund an incubation period for new, innovative packages like
EmpiricalCDFs.jl while they are just getting started. Below are some reasons why.
I’m not so much concerned with the privacy aspect
here (I’m sure it will be handled correctly) as with the fact that,
even if Pkg.jl telemetry succeeds
in counting users of Package-XYZ for funding and support,
there is no clear mechanism to support ALL the
novel, cutting-edge Packages-ABC, which are MUCH more interesting to me
and which NECESSARILY have lower usage volume INITIALLY.
To be specific, I see EmpiricalCDFs.jl (14 stars), from John @jlapeyre,
as a novel, cutting-edge package:
https://juliaobserver.com/packages/EmpiricalCDFs
I have starred it on GitHub (jlapeyre/EmpiricalCDFs.jl: Online empirical cumulative distribution functions),
where John @jlapeyre also notes the following:
I’m surprised that this module is not more popular (if stars are a good measure) because it’s rather generic, I use it frequently for new projects, and the functionality is not available elsewhere.
EmpiricalCDFs implements empirical CDFs; building, evaluating, random sampling, evaluating the inverse, etc. It is useful especially for examining the tail of the CDF obtained from streaming a large number of data, more than can be stored in memory. For this purpose, you specify a lower cutoff; data points below this value will be silently rejected, but the resulting CDF will still be properly normalized.
This ability to process and filter data ONLINE AT SCALE is ABSENT (emphasis from @marc.cox) from StatsBase, which I note has 277 stars and is therefore likely to overshadow EmpiricalCDFs in any usage-based funding IF Pkg.jl telemetry succeeds and Julia.org doesn’t intentionally redistribute (any needed) funds from " … funding/awards/etc from knowing how many users we have."
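To make the idea in John’s description a bit more concrete, here is a minimal Python sketch of an online empirical CDF with a lower cutoff. It is only my illustration of the concept, not the EmpiricalCDFs.jl API: the class name `StreamingTailCDF` and its methods are hypothetical, and I am assuming that normalization works by counting the rejected points without storing them.

```python
import bisect


class StreamingTailCDF:
    """Hypothetical illustration only; not the EmpiricalCDFs.jl API."""

    def __init__(self, lower_cutoff):
        self.lower_cutoff = lower_cutoff
        self.rejected = 0  # points below the cutoff: counted, never stored
        self.kept = []     # points at or above the cutoff, kept sorted

    def push(self, x):
        """Consume one data point from the stream."""
        if x < self.lower_cutoff:
            self.rejected += 1
        else:
            bisect.insort(self.kept, x)

    def total(self):
        """Number of points seen so far, stored or not."""
        return self.rejected + len(self.kept)

    def cdf(self, x):
        """Fraction of all points seen that are <= x (valid only for x >= cutoff)."""
        if x < self.lower_cutoff:
            raise ValueError("CDF is only known at or above the lower cutoff")
        n = self.total()
        if n == 0:
            raise ValueError("no data seen yet")
        # Every rejected point lies below the cutoff, hence below x as well.
        return (self.rejected + bisect.bisect_right(self.kept, x)) / n


# Usage: stream the values 1..10 but store only the tail from 8 upward.
tail = StreamingTailCDF(lower_cutoff=8)
for value in range(1, 11):
    tail.push(value)
```

Only three of the ten points are stored here, yet `tail.cdf(8)` still accounts for the seven discarded ones, which is the "properly normalized" property in the quote as I understand it.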
HTH,
Marc
P.S. Separately, I checked with John @jlapeyre about reposting his public quotes
and found out that he doesn’t require funding at the moment for
EmpiricalCDFs.jl (jlapeyre/EmpiricalCDFs.jl on GitHub), but I believe
the principle is still the same: Julia should continue to
focus on, and fund if necessary, innovation and actual progress, even for specialized customers.