Sure enough — it's simple, and it works.
I’m going to work on it a bit. In Go I’d typically have a goroutine per file or per batch of files. I imagine I’d have to rework it quite a bit if I wanted to run a channel across multiple CPUs on large sets of files, per the Discourse post here: reading-and-processing-data-files-concurrently
Thank you again.
I did notice that if I put the `Channel() do`
block inside the scope of the `for (root, dirs, files) in walkdir(path)`
block, I get an error.
ERROR: MethodError: no method matching start(::Void)
Closest candidates are:
start(::SimpleVector) at essentials.jl:258
start(::Base.MethodList) at reflection.jl:560
start(::ExponentialBackOff) at error.jl:107
Here is the refactored bit:
# readFile(path) -> Channel
#
# Walk `path` recursively and produce a Channel that yields
# `(fullname, content)` tuples, where `content` is the message body —
# everything after the first blank line (i.e. past the mail header) —
# joined with NEWLINE.
#
# Files listed in SKIPFILES and hidden (dot) files are skipped.
# Lines that are not valid UTF-8 are assumed to be LATIN1 and are
# re-decoded (requires StringEncodings' `decode` to be in scope —
# TODO confirm that is where `decode` comes from).
function readFile(path::String)
    Channel() do chan
        for (root, dirs, files) in walkdir(path)
            for filename in files
                # Skip known junk files and hidden (dot) files.
                # `startswith` avoids raw byte-indexing into the name.
                if !(filename in SKIPFILES) && !startswith(filename, ".")
                    fullname = joinpath(root, filename)
                    if isfile(fullname)
                        pastHeader, lines = false, Vector{String}()
                        open(fullname) do f
                            for line in eachline(f)
                                if !isvalid(line)
                                    # Fall back to LATIN1 for non-UTF-8 input.
                                    line = decode(convert(Array{UInt8,1}, line), "LATIN1")
                                end
                                line = chomp(line)
                                if pastHeader
                                    push!(lines, line)
                                elseif isempty(line)
                                    # The first blank line marks the end of the header;
                                    # everything after it is the body.
                                    pastHeader = true
                                end
                            end
                        end
                        content = join(lines, NEWLINE)
                        put!(chan, (fullname, content))
                    end
                end
            end
        end
    end
end
# Append one row per file found under `path` to `df`, in the column
# order (text, classification, filename).
function addData!(df::DataFrame, path::String, classification::String)
    for (fname, body) in readFile(path)
        row = @data([body, classification, fname])
        push!(df, row)
    end
end
# Build a fresh DataFrame from `sources`, an iterable of
# (relative path, classification) pairs rooted at SPAMROOT.
function buildDataSet(sources)
    dataset = DataFrame(text = String[], class = String[], index = String[])
    for (relpath, label) in sources
        addData!(dataset, joinpath(SPAMROOT, relpath), label)
    end
    return dataset
end