I have a text file in which the user specifies an ordered list of “operations” to perform.
For example, the text file contains operation names, one per line:
Diff
Zscore
Variance
In my code, I define those operations as subtypes of an abstract type Operation:
struct Diff <: Operation
...
end
struct Zscore <: Operation
...
end
struct Variance <: Operation
...
end
function compute(o::Operation, data::Matrix{Float64})
...
end
Now I want to create an array of Operations by reading the user-specified file.
The program should then apply these operations sequentially, in order, and print the final result.
Question 1:
How can I link the string I read from the text file to a struct I have already defined?
Basically, when I read the line “Diff” from the file, how can I let Julia know that it maps to the struct Diff, so that I can create an instance?
Question 2:
Because I need to sequentially perform a few operations, I need to store an array of those Operations:
e.g.
oa = Vector{Operation}(...)
Then something like:
for o in oa
result += compute(o, data)
end
Because the data is huge (>5GB), I need to care about performance.
But the Julia Performance Tips say I should avoid arrays with an abstract element type.
So will creating an array of the abstract type “Operation” slow down my program?
What is the best solution?
Thank you.
How many operations are there? If there are three or a similar number, maybe plain multiple dispatch over the types will be sufficient. Where is the data, in a separate file?
Number of operations: around 10 to 20 in the long term. It won't be many, and the list will be carefully managed by me.
maybe just multiple dispatch over the types will be sufficient
Do you mean this?
op1 = Diff(...)
op2 = Zscore(...)
op3 = Variance(...)
# Read one line from user's parameter file
if thisline == "Diff"
    compute(op1, data)
elseif thisline == "Zscore"
    compute(op2, data)
elseif thisline == "Variance"
    compute(op3, data)
else
    throw(ErrorException(...))
end
I still need an answer for Question 1, i.e. how to convert each line of the file into one of my structs,
for example how to convert the string “Diff” to the Diff struct in my code.
The data is a simple matrix of floating-point numbers, stored separately using Serialization. It is 5GB, but loading the binary file takes only 2 seconds.
I wrote this because the problem is very interesting. It may not be the best solution, but it works:
abstract type MyOperation end #abstract type to dispatch
struct MyDiff <: MyOperation end #struct
struct MyZscore <: MyOperation end #struct
struct MyVariance <: MyOperation end #struct
struct MyNothing <: MyOperation
str::String #to store the incorrect string
end #struct
function create_operation(x::String) # a string enters, an operation exits
    x == "MyDiff" && (return MyDiff()) # short-circuit evaluation
    x == "MyZscore" && (return MyZscore())
    x == "MyVariance" && (return MyVariance())
    return MyNothing(x) # catch-all case: an operation that does nothing and keeps the bad string
end
# using Julia's dispatch system
function compute!(o::MyDiff, data::Matrix{Float64}) # the data matrix is modified in place,
    # so the function name ends with !
    data .+= 1 # placeholder: adds 1 to every element
end
function compute!(o::MyZscore, data::Matrix{Float64}) # the data matrix is modified in place
    data .*= 10 # placeholder: multiplies every element by 10
end
function compute!(o::MyVariance, data::Matrix{Float64}) # the data matrix is modified in place
    data ./= 4 # placeholder: divides every element by 4
end
function compute!(o::MyNothing, data::Matrix{Float64}) # leaves the data unchanged
    errorstring = o.str
    println("an operation was incorrectly typed: $errorstring") # you can throw an error too
end
ops_string = readlines("testrun.txt") #this reads a file and gives a Array with all the lines
ops = create_operation.(ops_string) #this transforms the array of string to an array of operations
data1 = rand(100,100) #matrix test data
#looping over the ops
for i = 1:length(ops)
compute!(ops[i],data1)
end
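x1 = sum(data1)/(100*100) # mean of the transformed data
x2 = ((0.5+1)*10)/4 # expected mean, since rand averages 0.5
println(x1-x2) # small random deviation, e.g. -0.002315376904970634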
The file testrun.txt has the following lines:
MyDiff
MyVariance
MyZscore
One assumption I made about the data is that the result of each operation is the input to the next one, because this code performs the operations in place. If an operation changes the dimensions of the data it may not work as written, but it is a starting point.
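The chain of comparisons in create_operation could also be written as a lookup table. This is only a minimal sketch, assuming every operation struct has a zero-argument constructor. The names OPERATION_TYPES and parse_operation are hypothetical, the struct definitions are repeated just to keep the block self-contained, and an IOBuffer stands in for the text file:

```julia
abstract type MyOperation end
struct MyDiff <: MyOperation end
struct MyZscore <: MyOperation end
struct MyVariance <: MyOperation end

# Hypothetical lookup table: operation name => type.
# A type is an ordinary value in Julia, so it can be stored and called later.
const OPERATION_TYPES = Dict{String,DataType}(
    "MyDiff"     => MyDiff,
    "MyZscore"   => MyZscore,
    "MyVariance" => MyVariance,
)

# Hypothetical helper: turn one line of the file into an operation instance.
function parse_operation(name::AbstractString)
    key = String(strip(name))        # tolerate stray whitespace
    haskey(OPERATION_TYPES, key) || throw(ArgumentError("unknown operation: $key"))
    return OPERATION_TYPES[key]()    # call the zero-argument constructor
end

# The IOBuffer is a stand-in for the user's file (e.g. testrun.txt).
io = IOBuffer("MyDiff\nMyVariance\nMyZscore\n")
ops = MyOperation[parse_operation(line) for line in eachline(io)]
```

Adding a new operation then means adding one struct and one entry in the table. Here an unknown name throws an error instead of producing a do-nothing operation, which is a design choice of this sketch and not of the code above.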
Maybe some form of a Dict{String, Any} would work.
I am not sure why you are using structs instead of functions directly; either way should work.
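function MyDiff(result)
    diff(result, dims=1) # diff on a Matrix requires dims; dims=1 is an assumption
end
# one entry per operation name, in the form "textString"=>OperationCommand
mapping = Dict("diff"=>MyDiff)
# fid is assumed to be the already opened file, result the already loaded data
while !eof(fid)
    text = readline(fid)
    global result = mapping[text](result) # global is needed at the top level of a script
end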
That shouldn’t be a problem if the array just holds the 10 operations and is iterated in an outer loop. It would only be a problem if the data itself were stored as a matrix of 1G elements of type Any, each of which in turn points to a float.
The abstract vector of operations could become a problem if you write an element-wise version, where for each element you run through the vector of operations (so that it becomes the inner loop).
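To make the distinction concrete, here is a small sketch of both loop orders. The operations AddOne and TimesTen and the functions apply, run_outer! and run_inner! are hypothetical placeholders, not the real Diff, Zscore or Variance. With the operations in the outer loop, the abstract element type is resolved once per operation and the heavy work happens inside a method compiled for a concrete type. With the operations in the inner loop, that lookup may have to happen for every element.

```julia
abstract type Operation end
struct AddOne <: Operation end     # hypothetical placeholder operation
struct TimesTen <: Operation end   # hypothetical placeholder operation

# Whole-matrix methods: each is specialized for one concrete operation type.
compute!(::AddOne, data::Matrix{Float64}) = (data .+= 1; data)
compute!(::TimesTen, data::Matrix{Float64}) = (data .*= 10; data)

# Scalar methods, used only by the element-wise variant below.
apply(::AddOne, x::Float64) = x + 1
apply(::TimesTen, x::Float64) = x * 10

# Operations in the outer loop: one dispatch per operation,
# then a fast loop over the matrix inside compute!.
function run_outer!(ops::Vector{Operation}, data::Matrix{Float64})
    for o in ops
        compute!(o, data)
    end
    return data
end

# Operations in the inner loop: the abstract element type may force
# a dispatch for every element and every operation.
function run_inner!(ops::Vector{Operation}, data::Matrix{Float64})
    for i in eachindex(data)
        for o in ops
            data[i] = apply(o, data[i])
        end
    end
    return data
end

ops = Operation[AddOne(), TimesTen()]
a = fill(0.5, 1000, 1000)
b = copy(a)
run_outer!(ops, a)
run_inner!(ops, b)
```

Both variants compute the same values. How large the difference in run time is depends on the Julia version and on the number of operation types, so it is worth measuring on the real data before restructuring anything.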