Not sure if this is the correct category to post in; let me know if I should change it.
Alright, so I am looking for an accurate way to count the lines of code in a Julia .jl file. Ideally I’d like a count of actual source code lines, comments, and documentation string lines.
Does such a thing exist? So far I have not been able to find something that really can count all three…
We want this to make a strong selling point for our Julia packages, which, in contrast to competitors, have a really low count of actual source code lines. But because they are thoroughly documented, they contain lots of documentation lines, so just counting the lines of the Julia files isn’t really helpful to make the point clear!
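function countlines_docs(filename)
    n = i = 0
    ixd = Tuple{Int64, Int64}[]    # (first, last) line numbers of each docstring block
    open(filename) do io
        while !eof(io)
            i += 1
            # A docstring opens on a line that starts, after optional whitespace, with """
            # (assumes the opening and closing delimiters sit on separate lines)
            if occursin(r"^\s*\"\"\"", readline(io))
                i1 = i
                n += 1
                # Count every following line up to and including the closing """;
                # the eof check avoids an endless loop on an unterminated docstring
                while !eof(io)
                    i += 1
                    n += 1
                    occursin(r"^\s*\"\"\"", readline(io)) && break
                end
                push!(ixd, (i1, i))
            end
        end
    end
    return n, ixd
end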
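function countlines_comm(filename, ixd)
    n = i = 0
    ixc = Tuple{Int64, Int64}[]    # (first, last) line numbers of each #= ... =# block
    # First pass: multi-line comment blocks
    open(filename) do io
        while !eof(io)
            i += 1
            line = readline(io)
            if occursin(r"^\s*#=", line)
                i1 = i
                n += 1
                # Read on until a line ends with =# (this may be the opening line itself)
                while !occursin(r"=#\s*$", line) && !eof(io)
                    line = readline(io)
                    i += 1
                    n += 1
                end
                push!(ixc, (i1, i))
            end
        end
    end
    # Second pass: single-line comments outside docstrings and comment blocks
    open(filename) do io
        i = 0
        while !eof(io)
            i += 1
            if occursin(r"^\s*#(?!=)", readline(io)) && !any(t[1] ≤ i ≤ t[2] for t in ixd) && !any(t[1] ≤ i ≤ t[2] for t in ixc)
                n += 1
            end
        end
    end
    return n, ixc
end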
function countlines_blank(filename, ixd, ixc)
    n = 0
    open(filename) do io
        i = 0
        while !eof(io)
            i += 1
            if occursin(r"^\s*$", readline(io)) && !any(t[1]≤i≤t[2] for t in ixd) && !any(t[1]≤i≤t[2] for t in ixc)
                n += 1
            end
        end
    end
    return n
end
using PrettyTables
function countlines_juliafile(filename)
    n_tot = countlines(filename)
    n_docs, ixd = countlines_docs(filename)
    n_comm, ixc = countlines_comm(filename, ixd)
    n_blank = countlines_blank(filename, ixd, ixc)
    n_code = n_tot - n_docs - n_comm - n_blank
    header = ["File", "#code", "#comments", "#doc", "#blanks", "total"]
    # Column order must match the header: comments come before docs
    data = [basename(filename) n_code n_comm n_docs n_blank n_tot]
    pretty_table(data, header=header, header_crayon=crayon"blue bold", alignment=:c, formatters=ft_printf("%i",1:6))
    return n_code, n_docs, n_comm, n_blank, n_tot
end
# TEST EXAMPLE:
filename = raw"C:\Users\jrafa\.julia\config\startup.jl"
countlines_juliafile(filename)
@rafael.guerra This is fantastic and absolutely what I needed!!! (PackageAnalyzer.jl isn’t good enough for me: it doesn’t take into account docstrings. Or, maybe I have misunderstood its docs if it does…)
@rafael.guerra Do you mind if I improve on your code, put it into a DataFrames.jl analysis pipeline, and make it run on a Package, so that it gives details about all directories (src, docs, test) and all files of the package, and then also gives ratios at the end? I can publish it as a simple package and add you as a co-owner in the MIT license.
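As a rough idea of what the package-level part could look like, here is a minimal sketch using only the standard library. The function name `package_linecounts` and the `dirs` keyword are made up for illustration, and it uses plain `countlines` totals as a stand-in; a per-category counter could be swapped in at that spot.

```julia
# Hypothetical helper (not from any package): total line count of all .jl files
# below each top-level directory of a package, plus each directory's share.
function package_linecounts(pkgroot::AbstractString; dirs = ("src", "docs", "test"))
    totals = Dict{String, Int}()
    for d in dirs
        path = joinpath(pkgroot, d)
        n = 0
        if isdir(path)
            # walkdir descends into subdirectories recursively
            for (root, _, files) in walkdir(path), f in files
                endswith(f, ".jl") && (n += countlines(joinpath(root, f)))
            end
        end
        totals[d] = n   # missing directories are reported as 0 lines
    end
    grand = sum(values(totals))
    ratios = Dict(d => (grand == 0 ? 0.0 : n / grand) for (d, n) in totals)
    return totals, ratios
end

# Small demo on a throwaway directory layout
demo_totals, demo_ratios = mktempdir() do root
    mkpath(joinpath(root, "src"))
    mkpath(joinpath(root, "test"))
    write(joinpath(root, "src", "A.jl"), "f(x) = x\ng(x) = 2x\nh(x) = 3x\n")
    write(joinpath(root, "test", "runtests.jl"), "using Test\n")
    package_linecounts(root)
end
```

The two dictionaries could then be turned into table rows for a DataFrames.jl pipeline.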
Just came back here to say that PackageAnalyzer.jl v3 now handles docstrings correctly, so it fits my goals perfectly and reports an excellent summary of package code stats. E.g.:
julia> @time analyze(ComplexityMeasures)
0.204804 seconds (89.49 k allocations: 16.343 MiB, 3.73% gc time)
PackageV1 ComplexityMeasures:
  * repo:
  * uuid: ab4b797d-85ee-42ba-b621-05d793b346a2
  * version: missing
  * is reachable: true
  * tree hash: bf6898c5ef0f416a90664c997f0de46b0c5dcb7f
  * Julia code in `src`: 3813 lines
  * Julia code in `ext`: 0 lines (0.0% of `test` + `src` + `ext`)
  * Julia code in `test`: 2357 lines (38.2% of `test` + `src` + `ext`)
  * documentation in `docs`: 1493 lines (28.1% of `docs` + `src` + `ext`)
  * documentation in README & docstrings: 3943 lines (50.8% of README + `src`)
  * has license(s) in file: MIT
    * filename: LICENSE
    * OSI approved: true
  * has `docs/make.jl`: true
  * has `test/runtests.jl`: true
  * has continuous integration: true
    * GitHub Actions
Glad it’s working! I tried a bunch of tools like cloc, tokei, etc., and none of them handled Julia docstrings correctly, so starting in v3 PackageAnalyzer has its own line-counting implementation based on JuliaSyntax so we can try to handle things correctly. (We still use tokei for other stuff like TOML files.)
If you want to see how particular lines are categorized, you can use PackageAnalyzer.LineCategories. For example, taking a look into the implementation code:
julia> using PackageAnalyzer
julia> file = joinpath(pkgdir(PackageAnalyzer), "src", "LineCategories.jl")
"/Users/eph/.julia/packages/PackageAnalyzer/ddM8Z/src/LineCategories.jl"
julia> PackageAnalyzer.LineCategories(file)
1 | Comment | # Here, we assign a category to every line of a file, with help from JuliaSyntax
2 | Comment | # Module to make it easier w/r/t/ import clashes
3 | Code | module CategorizeLines
4 | Code | export LineCategories, LineCategory, Blank, Code, Docstring, Comment, categorize_lines!
5 | Blank |
6 | Code | using JuliaSyntax: GreenNode, is_trivia, haschildren, is_error, children, span, SourceFile, Kind, kind, @K_str, source_line
7 | Blank |
8 | Comment | # Every line will have a single category. This way the total number across all categories
...
Note that I made the implementation choice to give every line exactly one category, which means we have to choose sometimes, since there can be comments on lines with code and so forth.
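To illustrate the "exactly one category per line" idea, here is a small line-based sketch. It is not PackageAnalyzer's implementation (which works on the JuliaSyntax parse tree); the names `LineKind` and `categorize` are made up, and the tie-breaking rule (a line with code plus a trailing comment counts as code) is just one assumed choice.

```julia
# Hypothetical sketch, NOT PackageAnalyzer's implementation.
@enum LineKind BlankLine CodeLine DocLine CommentLine

# Assign exactly one category to each line using simple heuristics.
# Assumption: a line with code and a trailing comment is categorized as code.
function categorize(lines::AbstractVector{<:AbstractString})
    kinds = LineKind[]
    in_doc = false
    for line in lines
        s = strip(line)
        if in_doc
            # everything inside a docstring, including blank lines, is DocLine
            push!(kinds, DocLine)
            endswith(s, "\"\"\"") && (in_doc = false)
        elseif startswith(s, "\"\"\"")
            push!(kinds, DocLine)
            # stay inside the docstring unless this same line also closes it
            in_doc = !(length(s) >= 6 && endswith(s, "\"\"\""))
        elseif isempty(s)
            push!(kinds, BlankLine)
        elseif startswith(s, "#")
            push!(kinds, CommentLine)
        else
            push!(kinds, CodeLine)
        end
    end
    return kinds
end

lines = ["# helper", "\"\"\"", "    f(x)", "\"\"\"", "f(x) = x + 1  # trailing comment", ""]
kinds = categorize(lines)
```

Because every line gets a single category, `count(==(CodeLine), kinds)` and the counts for the other categories always add up to the total number of lines.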