Hi all, here is my MWE together with its output:
julia> people = DataFrame(name=["john", "marry", "tim"], goodat=["gardening", "teaching,negotiation", ""])
3×2 DataFrame
│ Row │ name   │ goodat               │
│     │ String │ String               │
├─────┼────────┼──────────────────────┤
│ 1   │ john   │ gardening            │
│ 2   │ marry  │ teaching,negotiation │
│ 3   │ tim    │                      │
julia> transform!(people, :goodat => ByRow(p -> split(p, ',')) => :goodatarray)
3×3 DataFrame
│ Row │ name   │ goodat               │ goodatarray                 │
│     │ String │ String               │ Array{SubString{String},1}  │
├─────┼────────┼──────────────────────┼─────────────────────────────┤
│ 1   │ john   │ gardening            │ ["gardening"]               │
│ 2   │ marry  │ teaching,negotiation │ ["teaching", "negotiation"] │
│ 3   │ tim    │                      │ [""]                        │
julia> neededSkils = DataFrame(field=["teaching", "gardening"])
2×1 DataFrame
│ Row │ field     │
│     │ String    │
├─────┼───────────┤
│ 1   │ teaching  │
│ 2   │ gardening │
julia> goodat = people[!, :goodatarray] |>
           Iterators.flatten |>
           @map(_) |>
           DataFrame
4×3 DataFrame
│ Row │ string               │ offset │ ncodeunits │
│     │ String               │ Int64  │ Int64      │
├─────┼──────────────────────┼────────┼────────────┤
│ 1   │ gardening            │ 0      │ 9          │
│ 2   │ teaching,negotiation │ 0      │ 8          │
│ 3   │ teaching,negotiation │ 9      │ 11         │
│ 4   │                      │ 0      │ 0          │
julia> DataFrames.innerjoin(neededSkils, goodat, on = :field => :string)
1×3 DataFrame
│ Row │ field     │ offset │ ncodeunits │
│     │ String    │ Int64  │ Int64      │
├─────┼───────────┼────────┼────────────┤
│ 1   │ gardening │ 0      │ 9          │
This example is artificial, but suppose there are
- people with skills, and
- skills needed on the market.
I need to join these two DataFrames to find which needs can be fulfilled.
What’s wrong? `DataFrames.innerjoin` should find both `gardening` and `teaching`, but it finds only `gardening`.
Clearly `goodat` is constructed incorrectly. I expected it to be just a flattened array of strings.
What is the idiomatic way to do that?
For faster review, here is the same example without the output:
using Query
using DataFrames
people = DataFrame(name=["john", "marry", "tim"], goodat=["gardening", "teaching,negotiation", ""])
transform!(people, :goodat => ByRow(p -> split(p, ',')) => :goodatarray)
neededSkils = DataFrame(field=["teaching", "gardening"])
goodat = people[!, :goodatarray] |>
    Iterators.flatten |>
    @map(_) |>
    DataFrame
DataFrames.innerjoin(neededSkils, goodat, on = :field => :string)
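One possible direction, offered only as a sketch: the columns `string`, `offset` and `ncodeunits` in the output above match the fields of `SubString`, which suggests that each `SubString` produced by `split` is being treated as a row with three fields rather than as a plain string. Assuming a DataFrames version that provides `flatten` for data frames, the version below avoids Query entirely by converting the pieces to `String` and flattening the column directly. The column name `:skill` is a hypothetical name chosen for illustration.

```julia
using DataFrames

people = DataFrame(name=["john", "marry", "tim"],
                   goodat=["gardening", "teaching,negotiation", ""])
neededSkils = DataFrame(field=["teaching", "gardening"])

# split returns SubString values; convert them to String so that the
# new column (hypothetical name :skill) holds vectors of plain strings.
transform!(people, :goodat => ByRow(p -> String.(split(p, ','))) => :skill)

# DataFrames.flatten repeats each row once per element of the :skill vector.
skills = flatten(people, :skill)

# Join the needed skills against the flattened per-person skills.
matches = innerjoin(neededSkils, skills, on = :field => :skill)
```

With this approach the `name` column is carried through the join, so `matches` also shows who can fulfil each need. If only the flat list of skills is wanted, something like `DataFrame(field = String.(collect(Iterators.flatten(people.skill))))` should also work.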