Looking for a clean way to add a row in a DataFrame

Hello,

I am new to Julia, so please have mercy.
I am reading multiple CSV files, where the rows are the same. But in order to distinguish the different files I’m reading, I have to add a row to indicate that.
Here is a minimalistic example:

header = ["Col1", "Col2", "Set_number"]
observation1 = [1, 2]
observation2 = [3, 4]
# ...
observationX = [7, 8]
relevant_data = DataFrame([[], [], []], header)

# sets is an array with filepaths
for set in sets
        filtered_data = CSV.read(set, header)
        filtered_data.Set .= set_number
        append!(relevant_data, filtered_data)
    end
end

So in order to add the missing row, I am reading the files intentionally with a given header, that is longer than the data set. The missing data is filled with β€œmissing” and I replace it later with the intentional value, depending on the current set in the loop.

This leads to the problem, that the terminal is spammed with the warning that missing data is filled in.

Is there a neater way to do what I want to do?
It feels like I am not doing it quite right, although it works for me just fine.

What happens to those observationN variables?

The clean version would be to add the column Set_number to your filtered_data data frame and append the filtered_data to relevant_data after this step.

This will solve the warning issue (and it seems compatible with the logic in your for loop).

P. S. You seem to be already on the right path, but you assign the value to the Set instead of Set_number (see the @pdeffebach comment below).

I’m a little confused, but

        filtered_data.Set .= set_number

contradicts that you named the column Set_number in the beginning.

Maybe that’s your issue?

2 Likes

Given data.csv:

a,b,c
1,2,3
4,5,6

You can do this:

CSV.read(
    ["data.csv", "data.csv"],
    DataFrame;
    source = :Set => [1, 2]
)

to get

4Γ—4 DataFrame
 Row β”‚ a      b      c      Set   
     β”‚ Int64  Int64  Int64  Int64 
─────┼────────────────────────────
   1 β”‚     1      2      3      1
   2 β”‚     4      5      6      1
   3 β”‚     1      2      3      2
   4 β”‚     4      5      6      2
10 Likes

Thank you everyone!
Sorry that my example was a bit confusing. I didn’t find how to edit it.
Jules gave a beautiful solution :slight_smile:

2 Likes

image

Clicking on that little pencil icon at the bottom of your post will allow you to edit it.

That’s the weird thing, I don’t have it :frowning:

julia> vcat((CSV.read.(["f1.csv", "f2.csv"],DataFrame))..., source=:sour=>[:f1,:f2])
4Γ—4 DataFrame
 Row β”‚ a      b      c      sour   
     β”‚ Int64  Int64  Int64  Symbol
─────┼─────────────────────────────
   1 β”‚     1      2      3  f1
   2 β”‚     4      5      6  f1
   3 β”‚    11     21     31  f2
   4 β”‚    41     51     61  f2
1 Like
julia> f1="""a,b,c
       1,2,3
       4,5,6"""
"a,b,c\n1,2,3\n4,5,6"

julia> f2="""a,c,d
       11,21,31
       41,51,61"""
"a,c,d\n11,21,31\n41,51,61"

julia> f3="""a,b,d
       11,21,31
       41,51,61"""
"a,b,d\n11,21,31\n41,51,61"

julia> write("f1.csv", f1)
17

julia> write("f2.csv", f2)
23

julia> write("f3.csv", f3)
23

julia> using DataFrames, CSV

julia> vcat((CSV.read.(["f1.csv", "f2.csv", "f3.csv"],DataFrame))...,cols=:union, source=:sour=>[:f1,:f2,:f3]) 
6Γ—5 DataFrame
 Row β”‚ a      b        c        d        sour   
     β”‚ Int64  Int64?   Int64?   Int64?   Symbol 
─────┼──────────────────────────────────────────
   1 β”‚     1        2        3  missing  f1
   2 β”‚     4        5        6  missing  f1
   3 β”‚    11  missing       21       31  f2
   4 β”‚    41  missing       51       61  f2
   5 β”‚    11       21  missing       31  f3
   6 β”‚    41       51  missing       61  f3


julia> CSV.read(
           ["f1.csv", "f2.csv",    "f3.csv"],
           DataFrame;
           source = :Set => Symbol.("f".*string.(1:3)))
6Γ—4 DataFrame
 Row β”‚ a      b        c        Set    
     β”‚ Int64  Int64?   Int64?   Symbol
─────┼─────────────────────────────────
   1 β”‚     1        2        3  f1
   2 β”‚     4        5        6  f1
   3 β”‚    11  missing       21  f2
   4 β”‚    41  missing       51  f2
   5 β”‚    11       21  missing  f3
   6 β”‚    41       51  missing  f3

Just to point out that in a case like this using CSV alone does not provide the expected result.
I don’t know if there is a different form of CSV.read() that can handle this situation.

I think you can’t edit when you’re this new to the community (welcome by the way!). It’s an anti-spam thing. Don’t sweat it! It was clear enough that you got to an answer :slightly_smiling_face:

Don’t worry, people here tend to be quite kind, as long as you don’t toss bombs and are making an effort :wink:

4 Likes