UPDATE
To make a long story short, you can jump to the UPDATE section further down.
Original question:
I'm reading a set of PDF files, and based on their content I add one or more tags to a dict that maps each file name to its tags. Some files end up with zero tags and some with many. I want to convert the resulting dict into a DataFrame.
dict = Dict{String, Array{String}}() #(name, tags)
println(dict)
Dict{String,Array{String,N} where N}("عصام محمد فايز الشهري - Mechanical engineering.pdf" => [],"إبراهيم وليد أحمد العريفج - Mechanical engineering.pdf" => ["Erp"],"محمد علي محمد المحمد علي - Mechanical engineering.pdf" => [],"Nawaf Al Yousef.pdf" => [],"محمد ناصر حسين الحارثي - Mechanical engineering.pdf" => [],"مهدي عبدالله مهدي آل شهاب - Mechanical engineering.pdf" => [],"يوسف إبراهيم محمد ال قريشي - Supply chain management.pdf" => ["Erp", "Supply Chain"],"Noufal Al Zaher.pdf" => [],"Ziyad Al Essa.pdf" => ["Erp", "Shipping Documents", "Supply Chain"],"Majed Al Zahrani - General Secondary.pdf" => [])
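The question doesn't show how the tags are assigned, so the following is only a sketch of building such a dict. It assumes the PDF text has already been extracted into a hypothetical `texts` dict and that tags come from a hypothetical keyword list `rules`. Using `get!` guarantees that every file gets an entry even when no tag matches, which is how zero-tag entries like the ones above arise.

```julia
# Hypothetical inputs: the question does not show how text is extracted from the
# PDFs, so assume it is already available as file name => plain text.
texts = Dict(
    "Ziyad Al Essa.pdf"   => "ERP rollout, supply chain, shipping documents",
    "Nawaf Al Yousef.pdf" => "No matching keywords here",
)

# Hypothetical tag rules: tag => lowercase keyword to look for.
rules = ["Erp" => "erp", "Supply Chain" => "supply chain",
         "Shipping Documents" => "shipping documents"]

# file name => tags. Vector{String} is concrete; Array{String} leaves the
# number of dimensions unspecified.
tags_by_file = Dict{String, Vector{String}}()
for (name, text) in texts
    # get! inserts an empty tag list the first time a file is seen,
    # so files with zero matching tags still get an entry.
    tags = get!(tags_by_file, name, String[])
    for (tag, keyword) in rules
        # Case-insensitive match: lowercase the text, keywords are already lowercase.
        occursin(keyword, lowercase(text)) && push!(tags, tag)
    end
end
println(tags_by_file)
```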
Then I tried to convert it to a DataFrame as follows:
using DataFrames
df = DataFrame(dict)
But I got this error:
DimensionMismatch("column length 0 for column(s) Majed Al Zahrani - General Secondary.pdf, Nawaf Al Yousef.pdf, Noufal Al Zaher.pdf, عصام محمد فايز الشهري - Mechanical engineering.pdf, محمد علي محمد المحمد علي - Mechanical engineering.pdf, محمد ناصر حسين الحارثي - Mechanical engineering.pdf and مهدي عبدالله مهدي آل شهاب - Mechanical engineering.pdf is incompatible with column length 3 for column(s) Ziyad Al Essa.pdf is incompatible with column length 1 for column(s) إبراهيم وليد أحمد العريفج - Mechanical engineering.pdf, and is incompatible with column length 2 for column(s) يوسف إبراهيم محمد ال قريشي - Supply chain management.pdf")
Stacktrace:
[1] (::getfield(DataFrames, Symbol("##DataFrame#91#94")))(::Bool, ::Type{DataFrame}, ::Array{Any,1}, ::DataFrames.Index) at C:\Users\hasan.DESKTOP-HU2FQ29\.julia\packages\DataFrames\XuYBH\src\dataframe\dataframe.jl:121
[2] Type at .\array.jl:0 [inlined]
[3] #DataFrame#101(::Bool, ::Type{DataFrame}, ::Dict{String,Array{String,N} where N}) at C:\Users\hasan.DESKTOP-HU2FQ29\.julia\packages\DataFrames\XuYBH\src\dataframe\dataframe.jl:155
[4] DataFrame(::Dict{String,Array{String,N} where N}) at C:\Users\hasan.DESKTOP-HU2FQ29\.julia\packages\DataFrames\XuYBH\src\dataframe\dataframe.jl:147
[5] top-level scope at In[34]:1
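The error comes from how `DataFrame(::AbstractDict)` interprets a dict: every key becomes a column name and every value becomes that column, so all values must have the same length. A minimal reproduction with toy data, assuming a recent DataFrames.jl (1.x); the stack trace above is from an older release, whose error message is worded differently.

```julia
using DataFrames

# DataFrame(::AbstractDict) makes each key a column name and each value that
# column's data, so every value must have the same length.
same_length = Dict("a.pdf" => ["Erp", "Excel"], "b.pdf" => ["Bachelor", "Erp"])
df_ok = DataFrame(same_length)        # 2 rows, one column per file

# Tag lists of different lengths (0 and 1) cannot be lined up as columns.
ragged = Dict("a.pdf" => String[], "b.pdf" => ["Erp"])
err = try
    DataFrame(ragged)
    nothing
catch e
    e
end

println(size(df_ok))   # (rows, columns)
println(typeof(err))
```

In the question's dict the tag lists have 0, 1, 2 and 3 elements, so there is no common column length and the constructor throws a `DimensionMismatch`.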
- Is this the correct way to convert a Dict to a DataFrame?
- Is there a way to iterate over the Dict and remove the entries whose tag list is empty?
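For the second question: in Julia 1.x, `filter` on a `Dict` calls the predicate with a single `key => value` `Pair` rather than two arguments. A sketch that keeps only files with at least one tag, using toy data in place of the real dict:

```julia
# Toy stand-in for the question's dict (file name => tags).
dict = Dict{String, Vector{String}}(
    "Nawaf Al Yousef.pdf" => String[],
    "Noufal Al Zaher.pdf" => String[],
    "Ziyad Al Essa.pdf"   => ["Erp", "Shipping Documents", "Supply Chain"],
)

# The predicate receives one `name => tags` Pair; `((name, tags),)` destructures it.
# `!isempty(tags)` states the intent directly (`v > []` only works because
# vectors compare lexicographically, so any non-empty vector is "greater" than []).
shortlisted = filter(((name, tags),) -> !isempty(tags), dict)

# Equivalent predicate written against the Pair's fields.
shortlisted_alt = filter(p -> !isempty(p.second), dict)

# In-place variant: removes the zero-tag entries from `dict` itself.
filter!(p -> !isempty(p.second), dict)

println(collect(keys(shortlisted)))
```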
UPDATE
I tried the below:
shortlisted = filter((k, v) -> v > [], dict)
But got the below:
┌ Warning: In `filter(f, dict)`, `f` is now passed a single pair instead of two arguments.
│   caller = top-level scope at In[65]:1
└ @ Core In[65]:1
Dict{String,Array{String,N} where N} with 7 entries:
  "إبراهيم وليد أحمد العري… => ["Bachelor", "English", "Erp"]
  "محمد علي محمد المحمد عل… => ["Bachelor", "English", "Follow Up"]
  "محمد ناصر حسين الحارثي … => ["Ba", "Bas", "Excel"]
  "Amal Al-Wabel CV.PDF" => ["Bachelor", "Chemicals Permits", "Customs", "En…
  "مهدي عبدالله مهدي آل شه… => ["Ba", "Bsc", "Follow Ups"]
  "يوسف إبراهيم محمد ال قر… => ["Ba", "Bachelor", "Erp", "Supply Chain"]
  "Ziyad Al Essa.pdf" => ["Bachelor", "Bas", "Erp", "Shipping Documents",…
Then I used:
df = DataFrame(shortlisted)
And got:
DimensionMismatch("column length 13 for column(s) Amal Al-Wabel CV.PDF is incompatible with column length 6 for column(s) Ziyad Al Essa.pdf is incompatible with column length 3 for column(s) إبراهيم وليد أحمد العريفج - Mechanical engineering.pdf, محمد علي محمد المحمد علي - Mechanical engineering.pdf, محمد ناصر حسين الحارثي - Mechanical engineering.pdf and مهدي عبدالله مهدي آل شهاب - Mechanical engineering.pdf, and is incompatible with column length 4 for column(s) يوسف إبراهيم محمد ال قريشي - Supply chain management.pdf")
Stacktrace:
[1] (::getfield(DataFrames, Symbol("##DataFrame#91#94")))(::Bool, ::Type{DataFrame}, ::Array{Any,1}, ::DataFrames.Index) at C:\Users\hasan.DESKTOP-HU2FQ29\.julia\packages\DataFrames\XuYBH\src\dataframe\dataframe.jl:121
[2] Type at .\array.jl:0 [inlined]
[3] #DataFrame#101(::Bool, ::Type{DataFrame}, ::Dict{String,Array{String,N} where N}) at C:\Users\hasan.DESKTOP-HU2FQ29\.julia\packages\DataFrames\XuYBH\src\dataframe\dataframe.jl:155
[4] DataFrame(::Dict{String,Array{String,N} where N}) at C:\Users\hasan.DESKTOP-HU2FQ29\.julia\packages\DataFrames\XuYBH\src\dataframe\dataframe.jl:147
[5] top-level scope at In[66]:1
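Filtering doesn't remove the underlying problem: the remaining tag lists still have different lengths (13, 6, 4 and 3 in the error above), so they cannot be the columns of one table. A sketch that makes the file name a column instead of a column name, using toy data and assuming DataFrames.jl 1.x (`select`, `ByRow`, `flatten`):

```julia
using DataFrames

# Toy stand-in for the filtered dict: tag lists of different lengths.
shortlisted = Dict{String, Vector{String}}(
    "Ziyad Al Essa.pdf"    => ["Bachelor", "Bas", "Erp"],
    "Amal Al-Wabel CV.PDF" => ["Bachelor", "Customs"],
)

# File names become a column (not column names), tags a vector-valued column.
files = sort(collect(keys(shortlisted)))
wide = DataFrame(file = files, tag = [shortlisted[f] for f in files])

# Option 1: one row per file, tags joined into a single readable string.
per_file = select(wide, :file, :tag => ByRow(t -> join(t, ", ")) => :tags)

# Option 2: one row per (file, tag) pair; `flatten` repeats the file name per tag.
long = flatten(wide, :tag)

println(per_file)
println(long)
```

Option 2 is usually the easier shape to filter or count by tag; a file with an empty tag list would simply contribute no rows to it.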