Interesting observations about the Julia language Google Trends time series

Update: I removed the peak in a follow-up post in this thread.

I downloaded some Google Trends data about Julia and did some basic analysis, and it looks like Julia is on a slightly downward trend.

[Figure julia_ts_decomposed: decomposed time series of Julia search interest]

Also, other languages typically show a big dip in Google Trends in December. December is usually time off for people, so there is usually a big drop-off, which you can quantify by estimating the seasonality for each month. E.g. for R the December seasonality factor is 0.89, i.e. December search volume is usually 11% lower than what the trend predicts.
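(For concreteness, the monthly factors can be estimated as the average ratio of the observed series to a centered 12-month moving-average trend. Below is a minimal sketch of that idea, not the exact gtrendsR/decompose pipeline used later:)

using Statistics

# Minimal sketch: multiplicative monthly seasonality factors, estimated as
# the mean ratio of observed values to a centered 2x12 moving-average trend.
function monthly_seasonality(hits::Vector{Float64}, months::Vector{Int})
    n = length(hits)
    trend = fill(NaN, n)
    for i in 7:n-6   # centered 2x12 moving average
        trend[i] = (0.5 * hits[i-6] + sum(hits[i-5:i+5]) + 0.5 * hits[i+6]) / 12
    end
    ratio = hits ./ trend
    # a December factor of ~0.89 would mean December runs ~11% below trend
    Dict(m => mean(ratio[j] for j in 7:n-6 if months[j] == m) for m in 1:12)
end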

If I don’t remove the spike in Feb 2015, then for Julia the December seasonality is 1.00036, so there is no drop-off. This could mean that Julia isn’t used as much at work, so when workers take time off it’s not as affected, or it could mean that people are interested in Julia and want to use their free time to learn more :slight_smile:. Interestingly, Julia’s Jun, Jul, Aug, Sep, and Oct seasonality factors are all below 1.

If I remove the spike, then the December seasonality is 0.9936326.

Here is my code. I plan to expand this into a look at all popular languages.

using RCall
using Dates, Statistics
using GLM, DataFrames, DataFramesMeta, Lazy, Plots

langs = DataFrame(
    name = ["julia", "r"], 
    lang = ["/m/0j3djl7", "/m/0212jm"]
)

# jld = gtrends(keyword = "/m/0j3djl7") #julia
# jld = gtrends(keyword = "/m/0212jm") # r
# jld = gtrends(keyword = "/m/09gbxjr") # go
# jld = gtrends(keyword = "python")
# jld = gtrends(keyword = "/m/0n50hxv") #typescript
# jld = gtrends("/m/02js86") # groovy
# jld = gtrends("/m/0ncc1sv") #elm
# jld = gtrends(keyword = "/m/06ff5")
# jld = gtrends(keyword = "/m/02l0yf8", time="all") # sas

function analysis_trend(geo = "", lang = "/m/0j3djl7", title = "")
    # pull the interest-over-time table from Google Trends via R's gtrendsR
    res = R"""
    library(gtrendsR)
    gtrends(keyword = $lang, geo = $geo)[[1]]
    """

    # trends dataset
    df = rcopy(res)

    # expand each weekly observation into seven identical daily observations
    df1 = DataFrame(
        day = reduce(vcat, [a .- Dates.Day.(0:6) for a in Date.(df[!, :date])]),
        hits = repeat(df[!, :hits], inner = 7)
    )

    sort!(df1, :day)

    df1[!, :year] = Dates.year.(df1[!, :day])
    df1[!, :month] = Dates.month.(df1[!, :day])

    # monthly mean of the daily hits
    df2 = @> df1 begin
        @by([:year, :month], :meanh = mean(:hits))
    end

    sort!(df2, [:year, :month])

    df3 = deepcopy(df2)
    if lang == "/m/0j3djl7" # if Julia, clean up the series
        # there is a big spike in Feb 2015 (the series maximum), so smooth
        # it out by replacing it with the mean of Jan 2015 and Mar 2015
        neighbours = (df2[!, :year] .== 2015) .& in.(df2[!, :month], Ref((1, 3)))
        mmm = mean(df2[neighbours, :meanh])

        idx = findall((df2[!, :year] .== 2015) .& (df2[!, :month] .== 2))
        df3[idx, :meanh] .= mmm

        plot(df2[!, :meanh], label = "original")
        plot!(df3[!, :meanh], label = "removed spike")
        savefig("julia_trend.png")
    end

    @rput df3

    # classical multiplicative decomposition of the monthly series in R
    R"""
    png("julia_ts_decomposed.png")
    dt = decompose(ts(df3$meanh, deltat = 1/12, start = c(2013, 3)), type = "m")
    plot(dt)
    dev.off()
    dt
    """
end

analysis_trend("US") # Julia in the US
analysis_trend() # Julia in the world
analysis_trend("", "/m/0212jm") # R in the world

Did you do the comparison with the first 5 years of a ‘new’ language? That would be statistically interesting.


Academics don’t take December off. Scratch that, they don’t take time off. You just found proof :wink:. It’s 1.00036, not 1.0, because there are a few people who don’t publish in December (they indeed perished).


Note that two-sided filters are notorious for introducing spurious patterns (e.g. Hamilton (2017) has some nice examples). There was some kind of a blip in 2015, which is generating the whole hump. Otherwise, there is no apparent trend either way since 2014 (not that one can draw strong conclusions from Google Trends anyway).
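(To make the filter point concrete: the trend in a classical decomposition is a centered 2×12 moving average, a two-sided filter, so a single outlying month contaminates the estimated trend for the whole year around it. A minimal sketch:)

# One blip in a flat series inflates the two-sided trend estimate for the
# 13 months whose centered 2x12 moving-average window touches the blip.
function centered_ma12(x::Vector{Float64})
    n = length(x)
    t = fill(NaN, n)
    for i in 7:n-6
        t[i] = (0.5 * x[i-6] + sum(x[i-5:i+5]) + 0.5 * x[i+6]) / 12
    end
    t
end

x = ones(36); x[18] = 5.0     # three flat years with one spike
trend = centered_ma12(x)
count(>(1.0), trend)          # 13 trend estimates are pulled above 1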


I updated the original analysis by first removing the peak in Feb 2015 (what event happened then?). Now the seasonality in December is 0.9936983, so there is a bit of slacking off there :stuck_out_tongue_winking_eye:. I repeated the analysis for the US only and found the same pattern; I assume December is a big thing in the US due to Christmas.

[Figure julia_ts_decomposed: decomposed time series with the Feb 2015 peak removed]

It’s those gosh darn Millennials.


I guess Julia 1.0 should be released around Christmas then.

Now I just need a Google Trends Julia package and a time series package in Julia to complete the analysis in pure Julia.

A little attempt at an overall season-length trend. Shown are 8 roughly balanced centroids, and the mean with ± 1 standard deviation. There appears to be an increase of about 5% of a standard deviation over 13 weeks as the current long-term average.

[Figure season-trend: 8 balanced centroids with the mean ± 1 standard deviation]

Mash-up of sub-components:

[Figure season-trendb: mash-up of sub-components]

@xiaodai, please could you explain the “@>” in your code? :face_with_raised_eyebrow:

It’s from Lazy.jl. It pipes the result of each line into the first argument of the next macro/function call.

So

@> df begin
  FN(2)
  GN(7)
end

Is the same as

GN(FN(df,2),7)
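
For a concrete, runnable version of the same thing (using standard functions in place of the FN/GN placeholders):

using Lazy

# each line's result is threaded in as the FIRST argument of the next call
@> [3, 1, 2] begin
    sort()      # sort([3, 1, 2])     -> [1, 2, 3]
    push!(4)    # push!([1, 2, 3], 4) -> [1, 2, 3, 4]
    sum()       # sum([1, 2, 3, 4])   -> 10
end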

In layman’s terms?

I might conclude that it isn’t on too much of a downward trend just yet, but the tools used were a bit crusty.
Or, on average over a given three months, approximately 5% of the amount it jumps around translates to an increase, if that’s a sensible metric.

  • balanced as in the centroids are roughly proportionate, i.e. they have a similar weight / number of units per group
  • season-length would be like seasonal data augmentation, allowing overlapped offsets of 13-week sequences (~5x13 > 52x13); see the sketch below
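
(Reading the augmentation bullet literally, it takes every overlapping 13-week window of the weekly series; a minimal sketch of that idea, with hypothetical names:)

# Hedged sketch of the overlapped-window augmentation as I read it:
# every 13-week subsequence of a weekly series becomes one unit,
# giving far more sequences than non-overlapping seasons would.
windows(x::Vector, w::Int = 13) = [x[i:i+w-1] for i in 1:length(x)-w+1]

hits = rand(52 * 5)      # e.g. five years of weekly data
seqs = windows(hits)     # 248 overlapping 13-week sequences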

Code for a couple of k-means variants: balanced (as above, prefix zl_) and balanced + floating (prefix zf_), which was the algorithm used.
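
As a rough sketch of the balanced idea only (illustrative, not the actual zl_/zf_ code):

using LinearAlgebra, Random, Statistics

# Rough illustrative sketch, NOT the zl_/zf_ code from the post:
# Lloyd-style k-means where each cluster may hold at most ceil(n/k) points,
# keeping group sizes roughly balanced.
function balanced_kmeans(X::Matrix{Float64}, k::Int; iters::Int = 50)
    n = size(X, 2)                        # observations are columns
    cap = cld(n, k)                       # per-cluster capacity
    centroids = X[:, randperm(n)[1:k]]    # random initial centroids
    assign = zeros(Int, n)
    for _ in 1:iters
        d = [norm(X[:, i] - centroids[:, j]) for i in 1:n, j in 1:k]
        counts = zeros(Int, k)
        # greedy assignment: points with the tightest best match first,
        # each to its nearest centroid that still has spare capacity
        for i in sortperm(vec(minimum(d, dims = 2)))
            for j in sortperm(d[i, :])
                if counts[j] < cap
                    assign[i] = j
                    counts[j] += 1
                    break
                end
            end
        end
        # move each centroid to the mean of its assigned points
        for j in 1:k
            members = findall(==(j), assign)
            isempty(members) || (centroids[:, j] = vec(mean(X[:, members], dims = 2)))
        end
    end
    centroids, assign
end

# e.g. 8 roughly balanced groups of 13-week sequences:
# centroids, groups = balanced_kmeans(reduce(hcat, seqs), 8)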