I was curious about which languages researchers who program in academia use, so I took the public list of reviewers of the Journal of Open Source Software (JOSS), available here, and grabbed the data.
Here are the results. Please note that although JOSS publishes any open source software, whether or not the underlying language is itself open source, I strongly suspect that its activity is biased toward languages that are themselves open source.
*** The 20 most "best known" languages...
- python ( 68.74 %)
- r ( 27.52 %)
- c++ ( 18.85 %)
- c ( 13.91 %)
- matlab ( 8.3 %)
- java ( 7.26 %)
- fortran ( 5.76 %)
- javascript ( 4.79 %)
- julia ( 4.71 %)
- bash ( 3.07 %)
- go ( 2.02 %)
- perl ( 1.65 %)
- c# ( 1.57 %)
- rust ( 1.5 %)
- php ( 1.5 %)
- ruby ( 1.27 %)
- sql ( 1.12 %)
- scala ( 0.9 %)
- haskell ( 0.82 %)
- cuda ( 0.75 %)
*** The 20 most "known" languages...
- python ( 79.43 %)
- r ( 33.88 %)
- c++ ( 31.41 %)
- c ( 27.3 %)
- matlab ( 17.88 %)
- java ( 16.45 %)
- javascript ( 12.86 %)
- fortran ( 10.62 %)
- julia ( 8.45 %)
- bash ( 6.36 %)
- perl ( 4.49 %)
- php ( 3.89 %)
- c# ( 3.66 %)
- go ( 3.14 %)
- rust ( 2.99 %)
- ruby ( 2.84 %)
- sql ( 2.24 %)
- scala ( 2.09 %)
- html ( 1.72 %)
- haskell ( 1.5 %)
*** The 4 most common sectors for the 10 most "known" languages...
python : machine learning, bioinformatics, physics, statistics,
r : bioinformatics, machine learning, statistics, genomics,
c++ : machine learning, bioinformatics, physics, statistics,
c : machine learning, bioinformatics, astrophysics, statistics,
matlab : machine learning, image processing, statistics, physics,
java : machine learning, bioinformatics, software engineering, data science,
javascript : machine learning, bioinformatics, data science, statistics,
fortran : physics, astrophysics, computational fluid dynamics, computational chemistry,
julia : machine learning, statistics, physics, data science,
bash : bioinformatics, genomics, machine learning, computational biology,
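Since a reviewer can list several languages, the percentages above are shares of reviewers mentioning each language, so they do not sum to 100%. A minimal sketch of that counting on made-up data (the three `reviewers` entries are hypothetical and not taken from the JOSS list):

```julia
# Hypothetical toy data: each entry is one reviewer's comma-separated answer.
reviewers = ["python,r", "python,c++", "julia"]

# Count how many reviewers mention each language (at most once per reviewer).
counts = Dict{String,Int}()
for answer in reviewers
    for lang in unique(String.(strip.(split(answer, ','))))
        lang == "" && continue
        counts[lang] = get(counts, lang, 0) + 1
    end
end

# Share of reviewers mentioning each language, in percent.
shares = Dict(lang => round(100 * c / length(reviewers), digits=2) for (lang, c) in counts)

# Print the shares from the most to the least mentioned language.
for (lang, share) in sort(collect(shares), by = last, rev = true)
    println("- $(rpad(lang, 12)) ( $share %)")
end
```

Here two of the three toy reviewers mention python, and the four shares together add up to well over 100%, which is the same effect seen in the two lists above.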
Source code
# Source: reviewer database of JOSS at https://docs.google.com/spreadsheets/d/1PAPRJ63yq9aPC1COLjaQp8mHmEq3rZUzwUYxTulyu78/edit#gid=856801822
using OdsIO
# Loading data..
dataFile = "joss_reviewers_20200724.ods"
db = ods_read(dataFile,range=((4,2),(1340,9)))
# removing email
db = hcat(db[:,1:2],db[:,5:end])
# Replacing "nothing"...
# ...with an empty string in the first three columns...
for r in eachrow(db)
    for cidx in 1:3
        r[cidx] = isnothing(r[cidx]) ? "" : r[cidx]
    end
end
# ...and with zero in the number-of-reviews columns
for r in eachrow(db)
    for cidx in 4:6
        r[cidx] = isnothing(r[cidx]) ? 0 : r[cidx]
    end
end
# Converting the first 3 columns to strings and the last 3 to integers
db = convert(Array{Union{String,Int64},2},db)
# Cleaning: turn the various separators into commas, then normalise
for r in eachrow(db)
    for cidx in 1:3
        s = r[cidx]
        # Note: "and" is also replaced when it occurs inside a longer word
        for sep in ('/', '(', ')', '\n', "and")
            s = replace(s, sep => ',')
        end
        s = lowercase(strip(s))
        s = replace(s, ", " => ',') # to avoid empty data
        s = replace(s, " ," => ',') # to avoid empty data
        r[cidx] = replace(s, r",$" => "") # remove ending comma
    end
end
# Establishing vocabularies
vocLangs = Set{String}()
vocActivities = Set{String}()
for (ridx,r) in enumerate(eachrow(db))
    ##if ridx > 20 break end
    for cidx in 1:2
        #=
        debug = strip.(split(r[cidx],','))
        for l in debug
            if l == ""
                println(l)
                println(ridx)
                println(cidx)
            end
        end
        =#
        if r[cidx] == "" continue end
        push!(vocLangs,strip.(split(r[cidx],','))...)
    end
    for cidx in 3:3
        if r[cidx] == "" continue end
        push!(vocActivities,strip.(split(r[cidx],','))...)
    end
end
vocLangs = collect(vocLangs)
vocActivities = collect(vocActivities)
# Map each vocabulary entry to its position in the vocabulary vector
langIdx = Dict{String,Int64}(l => id for (id,l) in enumerate(vocLangs))
actIdx = Dict{String,Int64}(a => id for (id,a) in enumerate(vocActivities))
nLangs = length(vocLangs)
nActs = length(vocActivities)
nRecords = size(db,1)
preferredLangCount = zeros(Int64,nLangs)
competentLangCount = zeros(Int64,nLangs)
actCountByLang = zeros(Int64,nLangs,nActs)
# Let's count!
for r in eachrow(db)
    plangs = strip.(split(r[1],','))
    olangs = strip.(split(r[2],','))
    langs = union(Set(plangs),Set(olangs))
    acts = strip.(split(r[3],','))
    for l in plangs
        if l != "" preferredLangCount[langIdx[l]] += 1 end
    end
    for l in langs
        if l == "" continue end
        competentLangCount[langIdx[l]] += 1
        for a in acts
            if a != "" actCountByLang[langIdx[l],actIdx[a]] += 1 end
        end
    end
end
# Let's report:
n = 20
println("*** The $n most \"best known\" languages...")
sortIdx = reverse(sortperm(preferredLangCount))[1:n]
[println("- $(rpad(vocLangs[i],12))\t ( $(round(100*preferredLangCount[i]/nRecords,digits=2)) %)") for i in sortIdx]
n = 20
println("*** The $n most \"known\" languages...")
sortIdx = reverse(sortperm(competentLangCount))[1:n]
[println("- $(rpad(vocLangs[i],12))\t ( $(round(100*competentLangCount[i]/nRecords,digits=2)) %)") for i in sortIdx]
n = 10
n2 = 4
println("*** The $n2 most common sectors for the $n most \"known\" languages...")
sortIdx = reverse(sortperm(competentLangCount))[1:n]
for i in sortIdx
    lang = vocLangs[i]
    sortIdxActs = reverse(sortperm(actCountByLang[i,:]))[1:n2]
    print("$(rpad(lang,12)): \t")
    [print("$(vocActivities[j]), ") for j in sortIdxActs]
    print("\n")
end
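One caveat of the cleaning step above is that it replaces every occurrence of "and", including those inside longer words. A possible variant, which is not the one used to produce the figures in this post, does the whole normalisation with a single regular expression and word boundaries. The helper name `normalize_field` is hypothetical, and unlike the script above it lowercases the text before looking for separators:

```julia
# Hypothetical helper: turn a free-text answer into a clean list of tokens.
function normalize_field(s::AbstractString)
    # Treat '/', '(', ')', newlines and the standalone word "and" as separators.
    s = replace(lowercase(s), r"[/()\n]|\band\b" => ",")
    # Split on commas, trim surrounding whitespace, and drop empty tokens.
    tokens = String.(strip.(split(s, ',')))
    return filter(!isempty, tokens)
end

println(normalize_field("Python/R and C++ (some Julia)"))
println(normalize_field("pandas, Fortran"))
```

With the word boundaries, an entry such as "pandas" stays in one piece, while "R and C++" is still split into two languages.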