Looking for a clean way to add a row in a DataFrame

dobby · December 16, 2024, 3:20pm

Hello,

I am new to Julia, so please have mercy.
I am reading multiple CSV files that all have the same columns. To tell the files apart after combining them, I need to add a column that indicates which file each row came from.
Here is a minimalistic example:

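using CSV, DataFrames

header = ["Col1", "Col2", "Set_number"]
observation1 = [1, 2]
observation2 = [3, 4]
# ...
observationX = [7, 8]
relevant_data = DataFrame([[], [], []], header)

# sets is an array with filepaths; enumerate provides the set number
for (set_number, set) in enumerate(sets)
    # read with the longer header, so the extra column is filled with missing
    filtered_data = CSV.read(set, DataFrame; header = header)
    filtered_data.Set .= set_number
    append!(relevant_data, filtered_data)
end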

So, in order to add the missing column, I intentionally read the files with a given header that has more columns than the data. The missing column is filled with "missing", and I later replace it with the intended value for the current set in the loop.

The problem is that the terminal gets spammed with warnings that missing data was filled in.

Is there a neater way to do what I want to do?
It feels like I am not doing it quite right, although it works for me just fine.

g-gundam · December 16, 2024, 3:39pm

What happens to those observationN variables?

algunion · December 16, 2024, 3:44pm

The clean version would be to add the column Set_number to your filtered_data data frame and append the filtered_data to relevant_data after this step.

This will solve the warning issue (and it seems compatible with the logic in your for loop).

P.S. You seem to be on the right path already, but you assign the value to Set instead of Set_number (see @pdeffebach's comment below).
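A minimal sketch of this suggestion, assuming each file has a header line and only the columns Col1 and Col2. The two temporary files are hypothetical stand-ins for the paths in sets, and the set number comes from enumerate. Because each file is read with its own header, there is no extra column for CSV.jl to fill with missing values, so the warnings should not appear.

```julia
using CSV, DataFrames

# Hypothetical example files standing in for the real paths in `sets`
dir = mktempdir()
sets = [joinpath(dir, "set1.csv"), joinpath(dir, "set2.csv")]
write(sets[1], "Col1,Col2\n1,2\n3,4\n")
write(sets[2], "Col1,Col2\n5,6\n7,8\n")

# Start with no columns; the first append! copies in the columns of the first file
relevant_data = DataFrame()

for (set_number, set) in enumerate(sets)
    # Read only the columns that are actually in the file
    filtered_data = CSV.read(set, DataFrame)
    # Add the identifying column (created here, since it does not exist yet)
    filtered_data[!, :Set_number] .= set_number
    # Column names now match, so the rows can be appended
    append!(relevant_data, filtered_data)
end

println(relevant_data)
```

Starting from an empty DataFrame() lets the first append! define the column names, so the hand-written header vector is no longer needed.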

pdeffebach · December 16, 2024, 3:44pm

I’m a little confused, but

        filtered_data.Set .= set_number

contradicts that you named the column Set_number in the beginning.

Maybe that’s your issue?

jules · December 16, 2024, 3:47pm

Given data.csv:

a,b,c
1,2,3
4,5,6

You can do this:

CSV.read(
    ["data.csv", "data.csv"],
    DataFrame;
    source = :Set => [1, 2]
)

to get

4×4 DataFrame
 Row │ a      b      c      Set   
     │ Int64  Int64  Int64  Int64 
─────┼────────────────────────────
   1 │     1      2      3      1
   2 │     4      5      6      1
   3 │     1      2      3      2
   4 │     4      5      6      2
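Applied to the original question, this approach might look like the sketch below. The file names and contents are hypothetical; the pair :Set_number => collect(1:length(sets)) supplies one value per input file, in the same order as sets, so no loop or append! is needed.

```julia
using CSV, DataFrames

# Hypothetical example files standing in for the real paths in `sets`
dir = mktempdir()
sets = [joinpath(dir, "set1.csv"), joinpath(dir, "set2.csv")]
write(sets[1], "Col1,Col2\n1,2\n3,4\n")
write(sets[2], "Col1,Col2\n5,6\n7,8\n")

# Parse all files in one call and stack them; `source` adds a column
# holding one value per input file (here the set numbers 1, 2, ...)
relevant_data = CSV.read(sets, DataFrame; source = :Set_number => collect(1:length(sets)))

println(relevant_data)
```

If the file names themselves are a good enough label, the CSV.jl documentation also describes passing only a column name (for example source = :file), in which case the new column holds the input file name of each row.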

dobby · December 17, 2024, 8:10am

Thank you everyone!
Sorry that my example was a bit confusing. I couldn't find a way to edit it.
Jules gave a beautiful solution.

g-gundam · December 17, 2024, 2:47pm

Clicking on that little pencil icon at the bottom of your post will allow you to edit it.

dobby · December 17, 2024, 3:29pm

That’s the weird thing: I don’t have it.

rocco_sprmnt21 · December 17, 2024, 4:15pm

julia> vcat((CSV.read.(["f1.csv", "f2.csv"],DataFrame))..., source=:sour=>[:f1,:f2])
4×4 DataFrame
 Row │ a      b      c      sour   
     │ Int64  Int64  Int64  Symbol
─────┼─────────────────────────────
   1 │     1      2      3  f1
   2 │     4      5      6  f1
   3 │    11     21     31  f2
   4 │    41     51     61  f2

rocco_sprmnt21 · December 18, 2024, 1:52pm

julia> f1="""a,b,c
       1,2,3
       4,5,6"""
"a,b,c\n1,2,3\n4,5,6"

julia> f2="""a,c,d
       11,21,31
       41,51,61"""
"a,c,d\n11,21,31\n41,51,61"

julia> f3="""a,b,d
       11,21,31
       41,51,61"""
"a,b,d\n11,21,31\n41,51,61"

julia> write("f1.csv", f1)
17

julia> write("f2.csv", f2)
23

julia> write("f3.csv", f3)
23

julia> using DataFrames, CSV

julia> vcat((CSV.read.(["f1.csv", "f2.csv", "f3.csv"],DataFrame))...,cols=:union, source=:sour=>[:f1,:f2,:f3]) 
6×5 DataFrame
 Row │ a      b        c        d        sour   
     │ Int64  Int64?   Int64?   Int64?   Symbol 
─────┼──────────────────────────────────────────
   1 │     1        2        3  missing  f1
   2 │     4        5        6  missing  f1
   3 │    11  missing       21       31  f2
   4 │    41  missing       51       61  f2
   5 │    11       21  missing       31  f3
   6 │    41       51  missing       61  f3

julia> CSV.read(
           ["f1.csv", "f2.csv", "f3.csv"],
           DataFrame;
           source = :Set => Symbol.("f".*string.(1:3)))
6×4 DataFrame
 Row │ a      b        c        Set    
     │ Int64  Int64?   Int64?   Symbol
─────┼─────────────────────────────────
   1 │     1        2        3  f1
   2 │     4        5        6  f1
   3 │    11  missing       21  f2
   4 │    41  missing       51  f2
   5 │    11       21  missing  f3
   6 │    41       51  missing  f3

Just to point out that when the files have different columns, using CSV alone does not give the expected result: column d, which is not in f1.csv, is dropped entirely.
I don’t know if there is a different form of CSV.read() that can handle this situation.
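One workaround, sketched here on the assumption that reading the files one at a time is acceptable, is to parse each file separately and let DataFrames do the concatenation, as in the vcat example above. read_sets is a hypothetical helper name, not part of CSV.jl or DataFrames.jl; it uses reduce(vcat, ...) instead of splatting, with cols = :union to keep every column and source to label each file.

```julia
using CSV, DataFrames

# Hypothetical helper: read each file on its own, then stack them,
# keeping every column that appears in any of the files
function read_sets(paths::AbstractVector{<:AbstractString}; source = :Set_number)
    dfs = [CSV.read(p, DataFrame) for p in paths]
    # cols = :union fills columns absent from a file with missing;
    # source adds a column with one value per file (1, 2, 3, ...)
    return reduce(vcat, dfs; cols = :union, source = source => collect(1:length(paths)))
end

# Same example data as above, written to a temporary directory
dir = mktempdir()
paths = [joinpath(dir, "f$i.csv") for i in 1:3]
write(paths[1], "a,b,c\n1,2,3\n4,5,6\n")
write(paths[2], "a,c,d\n11,21,31\n41,51,61\n")
write(paths[3], "a,b,d\n11,21,31\n41,51,61\n")

result = read_sets(paths)
println(result)
```

Reading the files individually gives up the single multi-file CSV.read call, but each file keeps its own header, so no column is lost.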

kevbonham · December 18, 2024, 2:44pm

I think you can’t edit when you’re this new to the community (welcome, by the way!). It’s an anti-spam thing. Don’t sweat it! It was clear enough that you got to an answer.

Don’t worry, people here tend to be quite kind, as long as you don’t toss bombs and are making an effort.
