Unable to create a dict from split


#1

Hello !

I am relatively new in the Julia world, but it appears to be a cool language I work with more and more.
I have run into an issue that Google and the wikis haven’t been able to resolve; maybe this is really easy, but I am coming from PHP and some concepts are hard to fit in my brain.

Here’s the issue, with an example:
I have a file with 40000 CSV lines, where every line is divided into several columns:

79373,37336,something,here,also,again

To manage these 40000 lines, I would like to run a loop on them, using eachline(). It works fine, I am now able to explode every single line using the split() function. But, for an easier understanding, I would like to change the key names from 1,2,3,4… to something more like “id”, “subscription”, creating a new variable.

In PHP, I would do something like this :

$elements = explode(",", $line);
$data = [
  "id" => $elements[0],
  "subscription" => $elements[1],
];

But in Julia, this appears to be impossible:

data["id"] = elements[1]
ERROR: ArgumentError: invalid index: id
Stacktrace:
 [1] setindex!(::Array{SubString{String},1}, ::SubString{String}, ::String) at ./abstractarray.jl:968

I heard about Dict(), but I cannot find any working method using it.
Do you have any idea how to do this kind of operation?

Regards :slight_smile:


#2

The documentation of Dict would be a good starting point. Also, please post a complete working example, especially how you create data.

That said, I would just use an existing package to read CSV, e.g. CSV.jl.


#3

Do you mean something like this?

julia> file = IOBuffer("""1,A
       2,B
       3,C
       3,D""");  # I just want to simulate file here

julia> [Dict((:id=>i[1],:elements=>i[2])) for k in eachline(file) for i in [split(k,',')]]
4-element Array{Dict{Symbol,SubString{String}},1}:
 Dict(:id=>"1",:elements=>"A")
 Dict(:id=>"2",:elements=>"B")
 Dict(:id=>"3",:elements=>"C")
 Dict(:id=>"3",:elements=>"D")
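
For readers coming from PHP, the same idea may be easier to follow as an explicit loop. This is only a sketch: the sample data, the `rows` name and the "subscription" key are assumptions taken from the original question, and with a real file you would pass its path to eachline() instead of an IOBuffer.

```julia
# Simulated file; with a real file, use eachline("path/to/file.csv") instead.
file = IOBuffer("79373,37336,something\n79374,37337,other\n")

rows = Dict{String,String}[]    # will hold one Dict per CSV line
for line in eachline(file)
    parts = split(line, ',')
    # String keys, as in the PHP example; values are converted from SubString to String
    push!(rows, Dict("id" => String(parts[1]), "subscription" => String(parts[2])))
end

println(rows[1]["id"])    # prints 79373
```

Each entry of rows can then be indexed by name, e.g. rows[2]["subscription"].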

#4

Hi,

Thanks for your answer, great community :wink:
Regarding the working example asked, here we go :

julia> line = string("this,is,an,example")
julia> elements = split(line, ",")
4-element Array{SubString{String},1}:
 "this"   
 "is"     
 "an"     
 "example"
julia> elements[1]
"this"

What I would really like, and maybe I didn’t explain the “issue” the right way, is to be able to define a better key than [1] to access the related value “this”. In my mind, I would imagine something like this, which is currently impossible:

julia> line = string("this,is,an,example")
julia> elements = split(line, ",")
4-element Array{SubString{String},1}:
 "this"   
 "is"     
 "an"     
 "example"
julia> data["id"] = elements[1]
julia> data["id"]
"this"

I followed the Dict() documentation, and I finally found a way to do it (needed to be focused…) :

julia> data = Dict("id" => elements[1])

But it seems I cannot add more elements (in different lines in my code) to the data variable without erasing the “id” key.

julia> data = Dict("id" => elements[1])
julia> data["id"]
  "this"
julia> data = Dict("something" => elements[2])
julia> data
Dict{String,SubString{String}} with 1 entry:
  "something" => "is"

From this point, is there any way to add new key => value pairs to a dictionary without erasing it? The merge!() function looks fine, but is massive and verbose.


#5

First, it looks like you may be looking for something like a DataFrame. These can be used with the previously mentioned CSV.jl. Unfortunately Julia has a bit of a discoverability problem right now (in my opinion): there are a lot of really great and well-maintained packages but if you are coming in cold I can imagine that it’s probably really difficult to know where to look. Some suggestions to that end are browsing here and (probably the best resource at the moment) at JuliaObserver.

Anyway, Julia has a couple of standard naming conventions. One of those is that when you see a “CamelCase” (capitalized words) name such as Dict or AbstractArray, this is a type. When you call a type, as in Dict(k=>v), you are calling a special kind of function called a constructor, which returns a new object of that type (this is the same as how it works in object-oriented languages). What it looks like you are trying to do is create a new instance of a Dict and add objects to it. Objects are added to Dicts using indexing semantics as described in its documentation. So, for example, you could do

data = Dict()  # this creates a new empty Dict
data["id"] = elements[1]
data["something"] = elements[2]

alternatively you could do

Dict("id"=>elements[1], "something"=>elements[2])
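
To make the difference between filling a Dict and rebuilding it concrete, here is a small sketch. The explicit Dict{String,String} type is my own choice for the illustration (a plain Dict() works too), and the sample line is the one from post #4.

```julia
elements = split("this,is,an,example", ',')

data = Dict{String,String}()       # empty Dict with String keys and values
data["id"] = elements[1]           # adds the key "id"
data["something"] = elements[2]    # adds a second key; "id" is kept
data["id"] = "replaced"            # assigning to an existing key overwrites its value

# By contrast, `data = Dict(...)` binds the name `data` to a brand-new Dict,
# which is why the earlier key seemed to be erased in post #4.
println(length(data))              # prints 2
println(haskey(data, "id"))        # prints true
```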

Taking this to the next level of abstraction you might arrive at something like

elname(i) = i == 1 ? "id" : "something"  # define a function which gets the element name
Dict(elname(i)=>elements[i] for i in eachindex(elements))  # create a dict from the elements, iterating over their indices

Of course this is just an example to give you an idea of what commonly encountered programming structures in Julia might look like (an incomplete example: with more than 2 elements, every element after the first gets the key “something”, and since a Dict cannot hold the same key twice, later values overwrite earlier ones).
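
One way to give every column its own key is to pair a list of names with the split values using zip. This is a sketch under assumptions: the column names in `colnames` are hypothetical, and the sample line is shortened from the one in the original question.

```julia
line = "79373,37336,something,here"
colnames = ["id", "subscription", "label", "place"]    # hypothetical column names

elements = split(line, ',')
# zip pairs each name with the value at the same position
data = Dict(zip(colnames, elements))

println(data["subscription"])    # prints 37336
```

If the line has more values than names (or the reverse), zip stops at the shorter of the two, so the extra items are silently dropped.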

Again, I encourage you to take a bit of a look through the available packages and look at some examples that they provide in their documentation. If you give us an idea of what you expect your final output to look like, we might be able to give you better suggestions as far as where to look to learn more.


#6

Oh ! It was so simple using this method ! Creating first an empty Dict to fill it with defined keys, great ! Thanks a lot.

Regarding the DataFrame, I’ll take a careful look at it and follow all your links, since they appear to be related to my misunderstanding. Since I have not been learning Julia for long, I hadn’t read up on the different available packages, and I was Googling for hours instead of going to JuliaObserver.

Again, thanks a lot :smile:


#7

Note that in the upcoming v0.7 version of Julia, you will also be able to use named tuples to create small collections:

julia> VERSION
v"0.7.0-alpha.20"

julia> line = "id,stuff"
"id,stuff"

julia> NamedTuple{(:id, :stuff)}(split(line, ','))
(id = "id", stuff = "stuff")
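
In Julia 1.0 and later, a named tuple can also be written literally, and its fields are read with dot access. A minimal sketch, assuming the two-column "id"/"subscription" layout from the original question:

```julia
line = "79373,37336"
parts = split(line, ',')

# Literal named tuple syntax
row = (id = parts[1], subscription = parts[2])

# Equivalent construction from a tuple of values
row2 = NamedTuple{(:id, :subscription)}(Tuple(parts))

println(row.id)               # prints 79373
println(row2.subscription)    # prints 37336
```

Unlike a Dict, a named tuple is immutable, so it suits rows whose set of fields is known up front.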