Interesting observations about Julia language Google Trends time series

xiaodai · March 5, 2018, 9:02pm

Update: I removed the peak in a follow-up post later in this thread.

I have downloaded some Google Trends data about Julia and did some basic analysis, and it looks like Julia is on a slightly downward trend.

[Image: julia_ts_decomposed]

Also, for other languages there is typically a big dip on Google Trends in December. December is usually time off for people, so there is a big drop-off, which you can quantify by estimating the seasonality for each month. For example, for R the December seasonality is 89%, i.e. December search volume is usually 11% lower than what the trend predicts.
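To make "seasonality" concrete, here is a minimal pure-Julia sketch of the classical ratio-to-moving-average idea: estimate the trend with a centered 2x12 moving average, average the ratio of data to trend for each calendar month, and rescale the 12 factors so they average to 1. The function names `centered_ma12` and `seasonal_factors` and the synthetic series are assumptions made up for illustration; this is not the code that produced the numbers in this post.

```julia
using Statistics  # stdlib, provides mean

# Centered 2x12 moving average, a common trend estimate for monthly data.
# Entries where the window does not fit (first and last 6 months) stay NaN.
function centered_ma12(y::Vector{Float64})
    n = length(y)
    trend = fill(NaN, n)
    for t in 7:(n - 6)
        trend[t] = (0.5 * y[t - 6] + sum(y[(t - 5):(t + 5)]) + 0.5 * y[t + 6]) / 12
    end
    return trend
end

# Multiplicative seasonal factors: mean of y / trend per calendar month,
# rescaled so that the 12 factors average to 1.
function seasonal_factors(y::Vector{Float64}, first_month::Int)
    n = length(y)
    trend = centered_ma12(y)
    months = [mod1(first_month + t - 1, 12) for t in 1:n]
    raw = [mean([y[t] / trend[t] for t in 1:n
                 if months[t] == m && !isnan(trend[t])]) for m in 1:12]
    return raw ./ mean(raw)
end

# Synthetic data (an assumption): flat level of 50, five years starting in March,
# with December 11% below trend and the other months slightly above it.
season = [fill(1.01, 11); 0.89]
y = [50.0 * season[mod1(3 + t - 1, 12)] for t in 1:60]

factors = seasonal_factors(y, 3)
println("December factor: ", round(factors[12], digits = 4))
```

A December factor of 0.89 reads as "December is typically 11% below what the trend predicts", and a factor close to 1 means no visible December effect.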

If I don’t remove the spike in Feb 2015, then the December seasonality for Julia is 1.00036, so there is no drop-off. This could mean that Julia isn’t used much at work, so it is less affected when workers take time off, or it could mean that people are interested in Julia and use their free time to learn more. Interestingly, Julia’s seasonality factors for Jun, Jul, Aug, Sep and Oct are all below 1.

If I remove the spike then the December seasonality is 0.9936326.

Here is my code. I plan to expand this into a look at all popular languages.

using RCall
using GLM, DataFrames, DataFramesMeta, Lazy, Plots

langs = DataFrame(
    name = ["julia", "r"], 
    lang = ["/m/0j3djl7", "/m/0212jm"]
)

# jld = gtrends(keyword = "/m/0j3djl7") #julia
# jld = gtrends(keyword = "/m/0212jm") # r
# jld = gtrends(keyword = "/m/09gbxjr") # go
# jld = gtrends(keyword = "python")
# jld = gtrends(keyword = "/m/0n50hxv") #typescript
# jld = gtrends("/m/02js86") # groovy
# jld = gtrends("/m/0ncc1sv") #elm
# jld = gtrends(keyword = "/m/06ff5")
# jld = gtrends(keyword = "/m/02l0yf8", time="all") # sas

function analysis_trend(geo = "", lang = "/m/0j3djl7", title = "")
    res = R"""
    library(gtrendsR)
    gtrends(keyword = $lang, geo = $geo)[[1]] # keep the first element of the returned list
    """

    # trends dataset
    df = DataFrame(res)

    # create a daily average
    df1 = DataFrame(
        day = reduce(vcat, (a .- Dates.Day.(0:6) for a in Date.(df[:date]))),
        hits = repeat(df[:hits], inner = 7)
    )

    sort!(df1, cols = :day)

    # broadcast the accessors over the date column
    df1[:year] = Dates.year.(df1[:day])
    df1[:month] = Dates.month.(df1[:day])

    df2 = @> df1 begin
        @by([:year, :month], meanh = mean(:hits))
    end

    sort!(df2, cols=[:year, :month])

    df3 = deepcopy(df2)
    if lang == "/m/0j3djl7" # if julia then clean up
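        # there is a big spike in Feb 2015 so smooth that out
        mm = maximum(df2[:meanh])
        # row holding the maximum, for inspection only; the result is discarded inside the function
        @> df2 begin
            @where(:meanh .== mm)
        end

        # the neighbouring months, Jan 2015 and Mar 2015
        mdf2 = @> df2 begin
            @where((:year .== 2015) .& ((:month .== 1) .| (:month .== 3)))
        end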

        mmm = mean(mdf2[:meanh])

        idx = find(
            (df2[:year] .== 2015) .& (df2[:month] .== 2)
        )

        df3 = deepcopy(df2)
        df3[idx,:meanh] = mmm
        plot(df2[:meanh], label="original")
        plot!(df3[:meanh], label="removed spike")
        savefig("julia_trend.png")
    end

    @rput df3

    dt  = R"""
    png("julia_ts_decomposed.png")
    dt = decompose(ts(df3$meanh, deltat=1/12, start=c(2013, 3)), type="m")
    plot(dt)
    dev.off()
    dt
    """
end

analysis_trend("US") # Julia in the US
analysis_trend() # Julia in the world
analysis_trend("", "/m/0212jm") # R in the world

Mattriks · March 5, 2018, 9:28pm

Did you do the comparison with the first 5 years of a ‘new’ language? That would be statistically interesting.

ChrisRackauckas · March 5, 2018, 9:55pm

Academics don’t take December off. Scratch that, they don’t take time off. You just found proof. It’s 1.00036, not 1.0, because there are a few people who don’t publish in December (they indeed perished).

Tamas_Papp · March 6, 2018, 7:23am

Note that two-sided filters are notorious for introducing spurious patterns (e.g. Hamilton (2017) has some nice examples). There was some kind of blip in 2015, which is generating the whole hump. Otherwise, there is no apparent trend either way since 2014 (not that one can draw strong conclusions from Google Trends anyway).
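A minimal sketch of the mechanism, assuming a plain centered 13-term moving average as a stand-in for whatever two-sided filter the decomposition uses, and an invented flat series with a single blip (this does not reproduce Hamilton's examples):

```julia
# Flat series with one hypothetical one-month blip
y = fill(10.0, 61)
y[31] = 75.0

# Centered (two-sided) moving average: each output uses 6 points on either side
halfwidth = 6
smooth = [sum(y[(t - halfwidth):(t + halfwidth)]) / (2 * halfwidth + 1)
          for t in (1 + halfwidth):(length(y) - halfwidth)]

# Number of smoothed points lifted above the flat level by the single blip
raised = count(x -> x > 10.5, smooth)
println("points raised by one blip: ", raised)
println("peak of the smoothed series: ", maximum(smooth))
```

The single outlier turns into a plateau as wide as the filter window, starting before the blip itself, which is how one odd month can show up as a hump in the estimated trend.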

xiaodai · March 6, 2018, 10:36am

I updated the original analysis by first removing the peak in Feb 2015 (what event happened then?). Now the seasonality in Dec is 0.9936983, so there is a bit of slacking off there. I repeated the analysis for the US only and found the same pattern; I assume December is a big thing in the US due to Christmas.

[Image: julia_ts_decomposed]

ChrisRackauckas · March 6, 2018, 10:43am

It’s those gosh darn Millennials.

giordano · March 6, 2018, 10:43am

I guess Julia 1.0 should be released around Christmas then.

xiaodai · March 6, 2018, 10:51am

Now I just need a Google Trends package for Julia and a time series package in Julia to complete the analysis in pure Julia.

y4lu · March 14, 2018, 5:37am

A little attempt at an overall season-length trend.
Shown are 8 roughly balanced centroids, and the mean with ± 1 standard deviation.
The current long-term average appears to be an increase of about 5% of one standard deviation over 13 weeks.
[Image: season-trend]

Mash-up of sub-components:
[Image: season-trendb]

Fred · March 14, 2018, 7:49am

@xiaodai could you please explain the “@>” in your code?

xiaodai · March 14, 2018, 7:57am

It’s from Lazy.jl. It means: pipe the result of the previous line into the first argument of the next macro/function.

So

@> df begin
  FN(2)
  GN(7)
end

Is the same as

GN(FN(df,2),7)
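Here is a small runnable sketch of that equivalence without Lazy.jl. FN and GN are hypothetical stand-ins (any functions that take the data as their first argument would do), and a plain vector stands in for the data frame:

```julia
# Hypothetical stand-ins for FN and GN
FN(x, n) = x .+ n
GN(x, n) = x .* n

df = [1, 2, 3]  # plain vector standing in for the data frame

# The nested form that the @> block expands to
nested = GN(FN(df, 2), 7)

# The same computation step by step, in the order @> lets you write it
step1 = FN(df, 2)
step2 = GN(step1, 7)

# Base Julia's |> only pipes into one-argument functions, so it needs closures
piped = df |> (x -> FN(x, 2)) |> (x -> GN(x, 7))

println(nested == step2 == piped)
```

The point of @> is that the steps read top to bottom in execution order, without the closures that |> needs when the functions take extra arguments.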

xiaodai · March 14, 2018, 7:58am

In layman’s terms?

y4lu · March 14, 2018, 9:17am

I might conclude that it isn’t on too much of a downward trend just yet, but the tools used were a bit crusty
Or, on average over a given three months, approx 5% of the amount it jumps around translates to an increase, if that’s a sensible metric

balanced as in the centroids are roughly proportionate, or have a similar weight / number of units per group
season-length would be like seasonal data augmentation, so allowing overlapped offsets of 13 week sequences (~5x13 > 52x13)

Code for a couple of k-means variants, balanced (as above, prefix zl_) and balanced + floating (prefix zf_) which was the alg used
