CC @pdeffebach
But I think the only thing you can do is to add @transform to the expression and then evaluate it.
You can parse :c = 2 * :a + :b into a src => fun => dest pair that you can pass around and use later. Use the (currently unexported) DataFramesMeta.@col feature:
julia> using DataFramesMeta
julia> df = DataFrame(a = 1, b = 2);
julia> t = DataFramesMeta.@col :c = 2 * :a + :b
[:a, :b] => (var"#3#4"() => :c)
julia> transform(df, t)
1×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼────────────────────
   1 │     1      2      4
I've been meaning to export this feature, since it is useful.
The thing is, in this case I have the transformation as an expression stored in a variable, which I can't pass as an argument to @col.
If that's the case, I think you should rethink your whole approach. Use functions to store transformations, not expressions. You should not be passing around expressions like that.
Perhaps if you provided more details about the problem you face, more suggestions might come.
Could the Symbolics.jl package be useful in your case?
Certainly. I'm not sure it will be of much help (compared to what I mentioned in the posts above), given the generic nature of what I'm trying to achieve, but let me do it anyway.
I have DataFrames of time series data, over which the end user of the program (very often, me) performs a large number of transformations for further analysis. These transformations vary and are best stored as lists of Julia Expr objects (made from strings). I realize that this essentially means I am exposing a subset of the Julia language to the end user as my own language, hence there is no way around eval() unless I describe my transformations ahead of time in very generic terms and reduce flexibility.
So with the example DataFrame above, I want the user to be able to apply, say, the four basic operations, sine/cosine, exp, log, and more (some defined in my module), in any arbitrary combination. I'm essentially recreating a small calculator that is applied column-wise to a DataFrame. I hope this makes sense. I appreciate you reading my convoluted problem.
This is absolutely feasible with functions. And you should definitely be using them instead of expressions.
julia> df = DataFrame(a = [1, 2, 3]; b = [4, 5, 6]);
julia> function operate_on_col(fun, col1, col2)
           DataFramesMeta.@col @byrow :newcol = begin
               $col1 * 2 + fun($col2)
           end
       end;
julia> t1 = operate_on_col(sin, :a, :b); t2 = operate_on_col(cos, :a, :b);
julia> transform(df, t1)
3×3 DataFrame
 Row │ a      b      newcol
     │ Int64  Int64  Float64
─────┼──────────────────────
   1 │     1      4  1.2432
   2 │     2      5  3.04108
   3 │     3      6  5.72058
julia> transform(df, t2)
3×3 DataFrame
 Row │ a      b      newcol
     │ Int64  Int64  Float64
─────┼──────────────────────
   1 │     1      4  1.34636
   2 │     2      5  4.28366
   3 │     3      6  6.96017
Thank you for the suggestion. I think my issue is that fun is not known ahead of time, but let me take a closer look at your code to see if I can leverage it. (And by "not known" I mean "not defined".)
In your example, I would pretty much need to have the user pass the definition of fun as a string and eval() its definition, which takes me back to square one.
Could you not have the user pass fun? Can't they make it themselves?
The user lives outside of Julia, meaning they can only pass a string, which would have to be parsed and eval'd?
Yeah, that would most likely have to be parsed and eval'd, which isn't ideal. Maybe someone else can chime in on the best way to do that particular task.
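For what it's worth, one way to accept formulas as strings without calling eval() on them might be to parse the string with Meta.parse and then walk the resulting Expr yourself, only allowing calls to functions on an explicit allowlist. The sketch below uses only Base; the names ALLOWED, evaluate and compile_formula are made up for illustration and are not part of DataFrames.jl or DataFramesMeta.jl, and the allowlist is just a guess at the kind of operations mentioned above.

```julia
# Hypothetical allowlist: only these names may be called from a user formula.
const ALLOWED = Dict{Symbol,Function}(
    :+ => +, :- => -, :* => *, :/ => /, :^ => ^,
    :sin => sin, :cos => cos, :exp => exp, :log => log,
)

# Numeric literals evaluate to themselves.
evaluate(x::Number, vars) = x

# Symbols are looked up among the variables supplied by the caller.
function evaluate(s::Symbol, vars)
    haskey(vars, s) || error("unknown variable: $s")
    return vars[s]
end

# Only plain function calls to allowlisted names are accepted.
function evaluate(ex::Expr, vars)
    ex.head === :call || error("unsupported syntax: $(ex.head)")
    f = get(ALLOWED, ex.args[1], nothing)
    f === nothing && error("function not allowed: $(ex.args[1])")
    return f((evaluate(a, vars) for a in ex.args[2:end])...)
end

# Parse once, then return an ordinary function of the named arguments.
function compile_formula(str::AbstractString, argnames::Symbol...)
    ex = Meta.parse(str)
    return (vals...) -> evaluate(ex, Dict(zip(argnames, vals)))
end

f = compile_formula("2 * a + sin(b)", :a, :b)
f(1, 4)   # same value as 2 * 1 + sin(4)
```

Since f is a plain function, it could presumably be used like any other, e.g. transform(df, [:a, :b] => ByRow(f) => :c). Walking the tree on every call is slow compared with a compiled function, so this is a safety-oriented sketch rather than a fast one.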
julia> using DataFrames, Symbolics
julia> df_flows = DataFrame(;
           from = ["p1", "p1", "p2", "p2", "p1", "p2"],
           to = ["d", "d", "d", "d", "d", "d"],
           rp = [1, 1, 1, 1, 2, 2],
           tb = [3, 5, 7, 4, 6, 8],
           index = 1:6,
       )
6×5 DataFrame
 Row │ from    to      rp     tb     index
     │ String  String  Int64  Int64  Int64
─────┼────────────────────────────────────
   1 │ p1      d           1      3      1
   2 │ p1      d           1      5      2
   3 │ p2      d           1      7      3
   4 │ p2      d           1      4      4
   5 │ p1      d           2      6      5
   6 │ p2      d           2      8      6
julia> @variables x y z
3-element Vector{Num}:
x
y
z
julia> w=x^2+y*sqrt(z*x)
x^2 + y*sqrt(x*z)
julia> transform(df_flows,[3,4,5]=>ByRow((r,t,i)->substitute(w, Dict(x=>r,y=>t,z=>i))))
6×6 DataFrame
 Row │ from    to      rp     tb     index  rp_tb_index_function
     │ String  String  Int64  Int64  Int64  Num
─────┼──────────────────────────────────────────────────────────
   1 │ p1      d           1      3      1                   4.0
   2 │ p1      d           1      5      2               8.07107
   3 │ p2      d           1      7      3               13.1244
   4 │ p2      d           1      4      4                   9.0
   5 │ p1      d           2      6      5               22.9737
   6 │ p2      d           2      8      6               31.7128
julia> w=3x-2y
3x - 2y
julia> transform(df_flows,[3,4,5]=>ByRow((r,t,i)->substitute(w, Dict(x=>r,y=>t,z=>i))))
6×6 DataFrame
 Row │ from    to      rp     tb     index  rp_tb_index_function
     │ String  String  Int64  Int64  Int64  Num
─────┼──────────────────────────────────────────────────────────
   1 │ p1      d           1      3      1                    -3
   2 │ p1      d           1      5      2                    -7
   3 │ p2      d           1      7      3                   -11
   4 │ p2      d           1      4      4                    -5
   5 │ p1      d           2      6      5                    -6
   6 │ p2      d           2      8      6                   -10
PS
I saw several old posts discussing how to derive a function from a string. Apart from the solution with parse() and eval(), I have seen getfield applied to the current module to obtain the function from its name.
But this applies only to functions already defined in the module.
An equivalent would be, in my opinion, a Dict with strings as keys and functions as values.
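To make the comparison concrete, here is a minimal sketch of both lookups using only Base. The variable names (name, f_mod, registry, f_reg) are just for illustration, and Base stands in for "the current module".

```julia
name = "sin"

# Module lookup: works for any function already defined in the given module.
f_mod = getfield(Base, Symbol(name))

# Explicit registry: only the names listed here are reachable from a string.
registry = Dict("sin" => sin, "cos" => cos, "exp" => exp, "log" => log)
f_reg = registry[name]

f_mod === f_reg   # true: both lookups return the same function object
f_reg(0.0)        # 0.0
```

The Dict version has the side benefit that an unknown or unwanted name fails at the lookup (or returns a default via get) instead of reaching arbitrary functions in the module.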
To get suggestions for the current problem (about the use of an input string), it might be more useful to open a new topic with a more specific title.
In principle a "simple calculator" seems feasible.
There's a lot of work to do to make it truly functional, but just to plant a seed:
df=Dict("log"=>log, "sin"=>sin, "*"=>*,"+"=>+,"-"=>-,"^"=>^, "∘"=>∘)
function str2func(str)
    lff=findfirst('(', str)
    op=df[str[1:lff-1]]
    if !occursin('(',str[lff+1:end])
        par=split(str[lff+1:end-1],',')
        tp=tryparse.(Int,par)
        if all(isnothing,tp)
            return ((x,y)->(a->(b->(c->op(a,c))(b)))(x)(y))
        else
            n=only(filter(!isnothing,tp))
            return z->((x,y)->(a->(b->(c->op(a,c))(b)))(x)(y))(n,z)
        end
    else
        return (x...)->"not yet"
    end
end
julia> df_flows
6×5 DataFrame
 Row │ from    to      rp     tb     index
     │ String  String  Int64  Int64  Int64
─────┼────────────────────────────────────
   1 │ p1      d           1      3      1
   2 │ p1      d           1      5      2
   3 │ p2      d           1      7      3
   4 │ p2      d           1      4      4
   5 │ p1      d           2      6      5
   6 │ p2      d           2      8      6
julia> transform(df_flows,[3,4]=>ByRow(str2func("*(x,y)")))
6×6 DataFrame
 Row │ from    to      rp     tb     index  rp_tb_function
     │ String  String  Int64  Int64  Int64  Int64
─────┼────────────────────────────────────────────────────
   1 │ p1      d           1      3      1               3
   2 │ p1      d           1      5      2               5
   3 │ p2      d           1      7      3               7
   4 │ p2      d           1      4      4               4
   5 │ p1      d           2      6      5              12
   6 │ p2      d           2      8      6              16
julia> transform(df_flows,[3,4]=>ByRow(str2func("+(x,y)")))
6×6 DataFrame
 Row │ from    to      rp     tb     index  rp_tb_function
     │ String  String  Int64  Int64  Int64  Int64
─────┼────────────────────────────────────────────────────
   1 │ p1      d           1      3      1               4
   2 │ p1      d           1      5      2               6
   3 │ p2      d           1      7      3               8
   4 │ p2      d           1      4      4               5
   5 │ p1      d           2      6      5               8
   6 │ p2      d           2      8      6              10
julia> transform(df_flows,[4,3]=>ByRow(str2func("^(x,y)")))
6×6 DataFrame
 Row │ from    to      rp     tb     index  tb_rp_function
     │ String  String  Int64  Int64  Int64  Int64
─────┼────────────────────────────────────────────────────
   1 │ p1      d           1      3      1               3
   2 │ p1      d           1      5      2               5
   3 │ p2      d           1      7      3               7
   4 │ p2      d           1      4      4               4
   5 │ p1      d           2      6      5              36
   6 │ p2      d           2      8      6              64
julia> transform(df_flows,[4,3]=>ByRow(str2func("-(x,y)")))
6×6 DataFrame
 Row │ from    to      rp     tb     index  tb_rp_function
     │ String  String  Int64  Int64  Int64  Int64
─────┼────────────────────────────────────────────────────
   1 │ p1      d           1      3      1               2
   2 │ p1      d           1      5      2               4
   3 │ p2      d           1      7      3               6
   4 │ p2      d           1      4      4               3
   5 │ p1      d           2      6      5               4
   6 │ p2      d           2      8      6               6
julia> transform(df_flows,[4]=>ByRow(str2func("*(3,y)")))
6×6 DataFrame
 Row │ from    to      rp     tb     index  tb_function
     │ String  String  Int64  Int64  Int64  Int64
─────┼─────────────────────────────────────────────────
   1 │ p1      d           1      3      1            9
   2 │ p1      d           1      5      2           15
   3 │ p2      d           1      7      3           21
   4 │ p2      d           1      4      4           12
   5 │ p1      d           2      6      5           18
   6 │ p2      d           2      8      6           24
julia> transform(df_flows,[4,3]=>ByRow(str2func("+(log(x),∘(sin, *(10,y)))")))
6×6 DataFrame
 Row │ from    to      rp     tb     index  tb_rp_function
     │ String  String  Int64  Int64  Int64  String
─────┼────────────────────────────────────────────────────
   1 │ p1      d           1      3      1  not yet
   2 │ p1      d           1      5      2  not yet
   3 │ p2      d           1      7      3  not yet
   4 │ p2      d           1      4      4  not yet
   5 │ p1      d           2      6      5  not yet
   6 │ p2      d           2      8      6  not yet
The functions defined this way can, in appropriate cases (associative operators), be applied to more than two elements.
In particular, for operators that also have methods defined on vectors, you can do without wrapping everything in ByRow:
transform(df_flows,[3,4, 5]=>ByRow((x...)->reduce(str2func("+(x,y)"),x)))
transform(df_flows,[3,4, 5]=>(x...)->foldl(str2func("+(x,y)"),x))
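As a small Base-only illustration of why the ByRow wrapper can be dropped here, the sketch below folds a binary function first over the scalars of one row and then over whole columns. The anonymous function add is only a stand-in for what str2func("+(x,y)") returns, and the vectors mimic the rp, tb and index columns.

```julia
add = (x, y) -> x + y   # stand-in for str2func("+(x,y)")

# Row-wise: fold the binary function over the scalars of a single row.
row = (1, 3, 1)
reduce(add, row)        # 5

# Column-wise: `+` also has methods for vectors, so folding over the
# columns themselves gives the per-row sums in one go.
cols = ([1, 1, 1], [3, 5, 7], [1, 2, 3])
foldl(add, cols)        # [5, 8, 11]
```

This only works for operators that accept whole vectors. With "*", for example, vector * vector has no method, so the row-wise form (or broadcasting) would still be needed.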
I realize that the idea, and above all the implementation, is really naive and, I fear, not very efficient.
But let's play with Julia's expressions, and while waiting for ideas for improvement, here is a small step forward(?).
If you illustrate the typical formulas used, perhaps something can be added.
julia> function str2func(str)
           lff=findfirst('(', str)
           op=df[str[1:lff-1]]
           if !occursin('(',str[lff+1:end])
               # flat call such as "+(x,y)" or "*(3,y)"
               par=split(str[lff+1:end-1],',')
               tp=tryparse.(Int,par)
               # keep the argument order of the string: opxy(x,y) == op(x,y)
               opxy=(x,y)->(b->(c->op(b,c)))(x)(y)
               if all(isnothing,tp)
                   return opxy
               else
                   n=only(filter(!isnothing,tp))
                   return z->opxy(n,z)
               end
           else
               # nested call: split into sub-formulas and recurse on each
               par=split(str[lff+1:end-1],"),")
               par[1:end-1] .*=')'
               return (x...)->op([str2func(p)(var) for (p,var) in zip(par,x)]...)
           end
       end
str2func (generic function with 1 method)
julia> str= "+(*(3,x),*(2,y))"
"+(*(3,x),*(2,y))"
julia> str2func(str)(3,3)
15
julia> str= "*(+(3,x),+(2,y))"
"*(+(3,x),+(2,y))"
julia> str2func(str)(2,3)
25
julia> str= "+(3,x)"
"+(3,x)"
julia> str2func(str)(-3)
0
I posted an updated version of the script here