Hello, all.
I have a DataFrame column, df.polygons, holding strings of ten comma-separated numbers, as shown below:
julia> df.polygons
400000-element Vector{String}:
"79,8,79.09999999999999,8,79.09999999999999,7.9,79,7.9,79,8"
"79.09999999999999,8,79.2,8,79.2,7.9,79.09999999999999,7.9,79.09999999999999,8"
"79.2,8,79.3,8,79.3,7.9,79.2,7.9,79.2,8"
"79.3,8,79.40000000000001,8,79.40000000000001,7.9,79.3,7.9,79.3,8"
"79.40000000000001,8,79.5,8,79.5,7.9,79.40000000000001,7.9,79.40000000000001,8"
"79.5,8,79.59999999999999,8,79.59999999999999,7.9,79.5,7.9,79.5,8"
"79.59999999999999,8,79.7,8,79.7,7.9,79.59999999999999,7.9,79.59999999999999,8"
"79.7,8,79.8,8,79.8,7.9,79.7,7.9,79.7,8"
"79.8,8,79.90000000000001,8,79.90000000000001,7.9,79.8,7.9,79.8,8"
"79.90000000000001,8,80,8,80,7.9,79.90000000000001,7.9,79.90000000000001,8"
"79,7.9,79.09999999999999,7.9,79.09999999999999,7.8,79,7.8,79,7.9"
⋮
"12,29.1,12.1,29.1,12.1,29,12,29,12,29.1"
I want to place each number (excluding the last two), as a Float64, into its own DataFrame column (either in the original DataFrame or a new one, as done below).
I’ve tried two methods (both work, with the second variation being 300 ms slower than the first), but I feel like there’s a more efficient way.
Can you come up with something more clever or efficient?
using DataFrames, Chain

# Empty target DataFrame: one Float64 column per coordinate of the first four corners
dfpoly = DataFrame(
    :corner1lat => Float64[],
    :corner1lon => Float64[],
    :corner2lat => Float64[],
    :corner2lon => Float64[],
    :corner3lat => Float64[],
    :corner3lon => Float64[],
    :corner4lat => Float64[],
    :corner4lon => Float64[]
)
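# VARIATION 1
# Split each string, parse only the first eight fields, and push them as one row.
for i in eachindex(df.polygons)
    @chain split(df[i, :polygons], ",") begin
        # j (not i) so the comprehension does not shadow the row index
        [parse(Float64, _[j]) for j in 1:8]
        # a plain vector is accepted as a row, so no transpose is needed
        push!(dfpoly, _)
    end
end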
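# VARIATION 2
# Parse all ten fields of each string, then keep the first eight.
for stringvector in df.polygons
    # split, not split.: each element is a single string, so broadcasting adds nothing
    @chain split(stringvector, ",") begin
        parse.(Float64, _)
        push!(dfpoly, _[1:8])
    end
end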
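One column-oriented direction, as a minimal sketch and not a benchmarked answer: fill a preallocated Float64 matrix and build the DataFrame once at the end, instead of pushing 400000 rows one at a time. The helper name parse_corners and its ncols keyword are hypothetical, not from the code above. It assumes Julia 1.8 or later (for eachsplit) and that every string has at least eight fields. Whether it actually beats either variation would need to be measured on the real data.

```julia
# Hypothetical helper (not from the original post): parse the first `ncols`
# comma-separated fields of each string into a preallocated matrix.
# Assumes Julia >= 1.8 (eachsplit) and at least `ncols` fields per string.
function parse_corners(polygons::AbstractVector{<:AbstractString}; ncols::Int = 8)
    out = Matrix{Float64}(undef, length(polygons), ncols)
    for (row, s) in enumerate(polygons)
        # Iterators.take stops after `ncols` fields, so the closing point
        # (the last two numbers) is never parsed.
        for (col, field) in enumerate(Iterators.take(eachsplit(s, ','), ncols))
            out[row, col] = parse(Float64, field)
        end
    end
    return out
end

# Column names in the same order as dfpoly: corner1lat, corner1lon, ..., corner4lon
colnames = [Symbol("corner", c, axis) for c in 1:4 for axis in ("lat", "lon")]

# Two rows taken from the output shown above
sample = [
    "79,8,79.09999999999999,8,79.09999999999999,7.9,79,7.9,79,8",
    "12,29.1,12.1,29.1,12.1,29,12,29,12,29.1",
]
corners = parse_corners(sample)

# With DataFrames.jl loaded, the matrix can be wrapped in one call:
# dfpoly = DataFrame(corners, colnames)
```

On the full column this would be called as parse_corners(df.polygons). The matrix is only an intermediate; the same loop could write into eight separate vectors if a matrix copy is a concern.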