Accurate counter of source files: source lines, comments, documentation strings

Not sure if this is the correct category to post in; let me know if I should change it.

Alright, so I am looking for an accurate way to count the lines of code in a Julia .jl file. Ideally I’d like a count of actual source code lines, comments, and documentation string lines.

Does such a thing exist? So far I have not been able to find something that really can count all three…

We want this in order to make a strong selling point for our Julia packages, which, in contrast to competitors, have a really low count of actual source code lines. But because the packages are thoroughly documented, they contain a lot of docs, so just counting the lines of the Julia files isn’t really helpful for making the point clear!

E.g., cloc counts source, comment, and blank lines separately for many languages, including Julia. I'm not sure how it handles separate doc files and directories.

PackageAnalyzer.jl might be of interest.

FWIW, you may check and improve the basic code below, which processes a single *.jl file and prints a table with the number of code, comment, docstring, and blank lines, along with the total:

Basic code
function countlines_docs(filename)
    n = i = 0
    ixd = Tuple{Int64, Int64}[]
    open(filename) do io
        while !eof(io)
            i += 1
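            line = readline(io)
            if occursin(r"^\s*\"\"\"", line)
                i1 = i
                n += 1  # opening delimiter line
                # the docstring may already close on its opening line: """text"""
                closed = occursin(r"^\s*\"\"\".*\"\"\"", line)
                # otherwise count every line up to and including the closing """
                while !closed && !eof(io)
                    i += 1
                    n += 1
                    closed = occursin("\"\"\"", readline(io))
                end
                push!(ixd, (i1, i))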
            end
        end
    end
    return n, ixd
end

function countlines_comm(filename, ixd)
    n = i = 0
    ixc = Tuple{Int64, Int64}[]
    open(filename) do io
        while !eof(io)
            i += 1
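            line = readline(io)
            if occursin(r"^\s*#=", line)
                i1 = i
                n += 1  # opening line of the block comment
                # the block may already close on its opening line: #= text =#
                closed = occursin(r"=#\s*$", line)
                # otherwise count every line up to and including the closing =#
                while !closed && !eof(io)
                    i += 1
                    n += 1
                    closed = occursin(r"=#\s*$", readline(io))
                end
                push!(ixc, (i1, i))
            end
        end
    end
    # second pass: single-line comments outside docstrings and block comments
    open(filename) do io
        i = 0
        while !eof(io)
            i += 1
            if occursin(r"^\s*#(?!=)", readline(io)) && !any(t[1] ≤ i ≤ t[2] for t in ixd) && !any(t[1] ≤ i ≤ t[2] for t in ixc)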
                n += 1
            end
        end
    end
    return n, ixc
end

function countlines_blank(filename, ixd, ixc)
    n = 0
    open(filename) do io
        i = 0
        while !eof(io)
            i += 1
            if occursin(r"^\s*$", readline(io)) && !any(t[1]≤i≤t[2] for t in ixd) && !any(t[1]≤i≤t[2] for t in ixc)
                n += 1
            end
        end
    end
    return n
end


using PrettyTables

function countlines_juliafile(filename)
    n_tot  = countlines(filename)
    n_docs, ixd = countlines_docs(filename)
    n_comm, ixc = countlines_comm(filename, ixd)
    n_blank = countlines_blank(filename, ixd, ixc)
    n_code = n_tot - n_docs - n_comm - n_blank

    header = ["File", "#code", "#comments", "#doc", "#blanks", "total"]
    # column order must match the header: code, comments, docs, blanks, total
    data = [basename(filename) n_code n_comm n_docs n_blank n_tot]
    pretty_table(data, header=header, header_crayon=crayon"blue bold", alignment=:c, formatters=ft_printf("%i",1:6))

    return n_code, n_docs, n_comm, n_blank, n_tot
end


# TEST EXAMPLE:
filename = raw"C:\Users\jrafa\.julia\config\startup.jl"
countlines_juliafile(filename)
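
The same classification can also be sketched as a single pass that remembers whether the current line sits inside a docstring or a block comment. This is only a rough sketch under simplifying assumptions: `classify_lines` is a hypothetical name, docstrings are assumed to open with `"""` at the start of a line, block comments are assumed not to be nested, and a line that mixes code with a trailing comment is counted as code.

```julia
# Hypothetical helper: classify every line of Julia source in one pass.
# Blank lines inside a docstring or block comment count toward that category.
function classify_lines(io::IO)
    counts = Dict(:code => 0, :comment => 0, :doc => 0, :blank => 0)
    state = :normal                      # :normal, :doc, or :block
    for line in eachline(io)
        s = strip(line)
        if state === :doc
            counts[:doc] += 1
            occursin("\"\"\"", s) && (state = :normal)
        elseif state === :block
            counts[:comment] += 1
            endswith(s, "=#") && (state = :normal)
        elseif isempty(s)
            counts[:blank] += 1
        elseif startswith(s, "\"\"\"")
            counts[:doc] += 1
            # stay in docstring mode unless it also closes on this line
            occursin("\"\"\"", s[4:end]) || (state = :doc)
        elseif startswith(s, "#=")
            counts[:comment] += 1
            # stay in block-comment mode unless it also closes on this line
            endswith(s, "=#") || (state = :block)
        elseif startswith(s, "#")
            counts[:comment] += 1
        else
            counts[:code] += 1
        end
    end
    return counts
end

# Convenience method for a file path
classify_lines(path::AbstractString) = open(classify_lines, path)
```

Calling `classify_lines(filename)` returns a `Dict` with the keys `:code`, `:comment`, `:doc`, and `:blank`, so the four counts always add up to the total number of lines.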

@rafael.guerra This is fantastic and absolutely what I needed!!! (PackageAnalyzer.jl isn’t good enough for me: it doesn’t take into account docstrings. Or, maybe I have misunderstood its docs if it does…)

@rafael.guerra Do you mind if I improve on your code, put it into a DataFrames.jl analysis pipeline, and make it run on a Package, so that it gives details about all directories (src, docs, test) and all files of the package, and then also gives ratios at the end? I can publish it as a simple package and add you as a co-owner in the MIT license.
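
A rough sketch of what the per-directory walk could look like, using only Base and leaving the DataFrames.jl part out. It assumes a per-file counter that returns a `Dict` of category counts; `count_package`, `blank_vs_nonblank`, and `ratio` are hypothetical names, and the stand-in counter only separates blank from non-blank lines so that the snippet runs on its own. Only `.jl` files are visited, so Markdown files under `docs` are ignored here.

```julia
# Stand-in counter (assumption): swap in a finer per-file classifier.
function blank_vs_nonblank(path)
    counts = Dict(:blank => 0, :nonblank => 0)
    for line in eachline(path)
        counts[isempty(strip(line)) ? :blank : :nonblank] += 1
    end
    return counts
end

# Hypothetical helper: apply `counter` to every .jl file below each of the
# given subdirectories of `root` and sum the results per directory.
function count_package(root::AbstractString, counter; dirs = ("src", "docs", "test"))
    totals = Dict{String, Dict{Symbol, Int}}()
    for d in dirs
        dirpath = joinpath(root, d)
        isdir(dirpath) || continue       # skip directories the package lacks
        acc = Dict{Symbol, Int}()
        for (folder, _, files) in walkdir(dirpath), f in files
            endswith(f, ".jl") || continue
            mergewith!(+, acc, counter(joinpath(folder, f)))
        end
        totals[d] = acc
    end
    return totals
end

# Share of one category among all counted lines of a directory
ratio(acc, key) = get(acc, key, 0) / max(sum(values(acc); init = 0), 1)
```

With a counter that also distinguishes docstrings and comments, something like `ratio(totals["src"], :doc)` would give the docs-to-total ratio for `src`.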

@Datseris, I’m glad I did something useful with my limited means. By all means, I would be very happy if you improve the code. Cheers.