Combine multiple Arrays into one large dataframe matrix:

GeodeticR · April 24, 2024, 12:09am

I’ve attempted multiple ways to populate the “missing” values of my DataFrame with the appropriate values, but none of them have worked.

I created the shell of a DataFrame that looks like this:

ResVert_matrix
1826×372 DataFrame
  Row │ decyr    AGMT     AHID     ALAM     ALBH  ...   
      │ Float64  Any      Any      Any      Any   ...
──────┼────────────────────────────────────────────
    1 │ 2001.0   missing  missing  missing  missing ...
    2 │ 2001.0   missing  missing  missing  missing ...
    .
    .
    .

What I’m trying to do is fill the “missing” values with the residual vertical positions that I have stored in individual files, one per station name, at the matching decimal year value in ResVert_matrix.
But I’ve either been focusing on the problem for too long and have over-complicated and confused myself, or I’m just going about this incorrectly in general.

I won’t copy and paste the entire code here, since I think the only issue is the loop I’m trying to implement. But, I will include the variables that I use in the loop to help avoid any confusion, where my ‘for loop’ is implemented just below where this code ends:

# glob to correct file:
glob_matrix = glob("*.tenv3", Dstat_path)

# extract decyr column from ResVert_matrix:
all_years = ResVert_matrix.decyr

# extract station names (every header except decyr) from ResVert_matrix:
stations = names(ResVert_matrix)[2:end]

# create (maybe nested?) loop to iterate through files and append to ResVert_matrix
for file in glob_matrix
    file_hold = readdlm(file)
    file_df = DataFrame(file_hold, [
        :station,       # 1
        :date,          # 2
        :decyr,         # 3
        :MJD,           # 4
        :east_m_frac,   # 5
        :vert_m_frac,   # 6
        :east_sigma_m,  # 7
        :vert_sigma_m,  # 8
        :d_pred_east,   # 9
        :d_pred_vert,   # 10
        :res_east,      # 11
        :res_vert,      # 12
        :ol_east,       # 13
        :ol_vert,       # 14
        :YYYY,          # 15
        :mm,            # 16
        :dd,            # 17
        :hh,            # 18
        :day,           # 19
        :week,          # 20
        :day_of_week,   # 21
        :J20000_sec,    # 22
    ])
    file_decyr = file_df.decyr
    file_resvert = file_df.res_vert
    file_stat = file_df.station

I’ve tried a few versions, but I keep getting either:

ERROR: MethodError: no method matching keys(::DataFrame)

Closest candidates are:
  keys(::Pkg.Registry.RegistryInstance)
   @ Pkg ~/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Registry/registry_instance.jl:447
  keys(::IndexLinear, ::GroupedDataFrame)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/groupeddataframe.jl:574
  keys(::Attributes)
   @ MakieCore ~/.julia/packages/MakieCore/UAwps/src/attributes.jl:36
  ...

Stacktrace:
 [1] pairs(collection::DataFrame)
   @ Base ./abstractdict.jl:172
 [2] findfirst(testf::var"#147#148"{String}, A::DataFrame)
   @ Base ./array.jl:2199
 [3] top-level scope
   @ ~/Documents/julia/epochs.jl:84

when I try this method:

for (i, stat) in enumerate(stations)
        match_Vidx = findfirst(x -> x[:decyr] == stat, ResVert_matrix) 
        
        if match_Vidx !== nothing
            set!(ResVert_matrix, :res_vert, match_Vidx, file_resvert[i])
        end
    end

And I will get this error:

ERROR: MethodError: no method matching setindex!(::DataFrame, ::Vector{Any}, ::typeof(!), ::Vector{Symbol})

Closest candidates are:
  setindex!(::DataFrame, ::AbstractMatrix, ::typeof(!), ::AbstractVector)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/dataframe/dataframe.jl:777
  setindex!(::DataFrame, ::AbstractVector, ::typeof(!), ::Union{AbstractString, Signed, Symbol, Unsigned})
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/dataframe/dataframe.jl:674
  setindex!(::DataFrame, ::AbstractDataFrame, ::typeof(!), ::AbstractVector)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/dataframe/dataframe.jl:757
  ...

Stacktrace:
 [1] top-level scope
   @ ~/Documents/julia/epochs.jl:85

If I try this method:

  for (i, stat) in enumerate(stations)
        vIdx = [tt in file_decyr for tt in all_years];
        ResVert_matrix[!, [:decyr, :stat, :vIdx]] = file_resvert
    end
end

The troubleshooting I have attempted has been variations on those two methods. I would convert certain variables, or swap out variables in:

 match_Vidx = findfirst(x -> x[:decyr] == stat, ResVert_matrix)

That is where the code keeps getting stuck in that method. It produces an output if I remove findfirst(), so I think I don’t quite understand how to use it, even though I have read its documentation.
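
For reference, a minimal sketch of pointing `findfirst` at a single column instead of the whole DataFrame. `findfirst(f, A)` walks over a collection, and the first MethodError above says a DataFrame does not provide the `keys` it needs for that; a column such as `df.decyr` is an ordinary vector, so it works. Note also that the predicate above compares a decimal year with `stat`, which is a station name, so it could never match. The toy `df`, `target_year`, and the residual value are invented stand-ins, not the real data.

```julia
using DataFrames

# Toy stand-in for ResVert_matrix (values are invented for illustration).
df = DataFrame(decyr = [2001.0, 2001.5, 2002.0],
               AGMT  = Union{Missing, Float64}[missing, missing, missing])

target_year = 2001.5            # hypothetical epoch to look up

# Search the column vector, not the DataFrame itself.
row = findfirst(==(target_year), df.decyr)

# findfirst returns `nothing` when there is no match, so guard the write.
if row !== nothing
    df[row, :AGMT] = -0.0021    # invented residual value
end
```

This only locates one row per call; filling a whole station column is usually easier with a row mask or an index vector than with repeated `findfirst` calls.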

I don’t remember exactly what I changed to get this error:

ERROR: MethodError: no method matching getindex(::Float64, ::Vector{Any})

Closest candidates are:
  getindex(::Number, ::Integer)
   @ Base number.jl:96
  getindex(::Number)
   @ Base number.jl:95
  getindex(::Number, ::Integer...)
   @ Base number.jl:101
  ...

Stacktrace:
 [1] (::var"#141#143"{DataFrame, String})(x::Float64)
   @ Main ~/Documents/julia/epochs.jl:85
 [2] findnext(testf::var"#141#143"{DataFrame, String}, A::Vector{Float64}, start::Int64)
   @ Base ./array.jl:2155
 [3] findfirst(testf::Function, A::Vector{Float64})
   @ Base ./array.jl:2206
 [4] top-level scope
   @ ~/Documents/julia/epochs.jl:85

I tried converting stat using convert(Vector{Any}, stat), but that didn’t change anything.

So, I’m kind of at a loss with how to accomplish this. Any insight and help would be greatly appreciated. I’m sure there is some simple solution and I just haven’t come across it in my attempts to find a solution.

Thank you in advance from a weary grad student…

GeodeticR · April 24, 2024, 4:55am

Okay, I feel like I’m a bit closer now, but now I get an “Argument Error”:

ERROR: ArgumentError: New columns must have the same length as old columns
Stacktrace:
 [1] insert_single_column!(df::DataFrame, v::Vector{Any}, col_ind::Symbol)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/dataframe/dataframe.jl:643
 [2] setindex!(df::DataFrame, v::Vector{Any}, ::typeof(!), col_ind::Symbol)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/dataframe/dataframe.jl:675
 [3] hcat!(df1::DataFrame, df2::DataFrame; makeunique::Bool, copycols::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/dataframe/dataframe.jl:1211
 [4] hcat!
   @ ~/.julia/packages/DataFrames/58MUJ/src/dataframe/dataframe.jl:1204 [inlined]
 [5] #hcat#134
   @ ~/.julia/packages/DataFrames/58MUJ/src/abstractdataframe/abstractdataframe.jl:1607 [inlined]
 [6] top-level scope
   @ ~/Documents/julia/epochs.jl:61

When I try to use this method - which is just an adaptation from the previous methods:

for (i, stat) in enumerate(stations)
    file = "/home/rob/Documents/julia/dy_comb_tenv3/" * stat * "_with_decyr.tenv3"
    data_hold = readdlm(file)
    file_resvert = data_hold[:, 12]
    file_decyr = data_hold[:, 3]
    Vidx = [tt in all_years for tt in file_decyr]
    
    # append to ResVert_matrix
    ResVert_matrix = hcat(ResVert_matrix, DataFrame(file_resvert[Vidx, :], [stat]), makeunique=true)
end

But, I think I’m almost there…

TheLateKronos · April 24, 2024, 11:33am

It seems like the errors you show generally arise from simply attempting invalid operations. For example, in the last one you appear to have tried to add a column whose length does not match the number of rows in the dataframe.

I cannot help you with the specifics, but I think the following advice might be a little helpful:

Read the errors carefully. Understand why the error was thrown, and on which line. It is all in the stacktrace.
In general, one should build up a DataFrame either row by row, or column by column. Perhaps as individual vectors that are assembled into a dataframe all at once. Ensure that the lengths of all vectors match.
If you are still facing problems, ask questions here. The questions should be “boiled down” as much as possible to a specific operation/problem, and you should always provide example code that demonstrates the problem, and tell us what you wanted/expected to happen. At least, that is the best way to get actionable help quickly, instead of general recommendations like the points above.
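
The column-by-column advice can be sketched roughly as follows: allocate every station column at the full length of the time axis, then write each station's values into the rows where it has data. This is only an illustration under assumed names; `all_years`, `station_data`, the station names, and the numbers are all invented. Matching relies on exact floating-point equality of the epochs.

```julia
using DataFrames

all_years = [2001.0, 2001.5, 2002.0, 2002.5]   # common time axis (invented)

# Hypothetical per-station data: (epochs, values), with different lengths.
station_data = Dict(
    "AAAA" => ([2001.0, 2002.0], [0.1, 0.2]),
    "BBBB" => ([2001.5, 2002.0, 2002.5], [0.3, 0.4, 0.5]),
)

df = DataFrame(decyr = all_years)

for name in sort(collect(keys(station_data)))
    years, vals = station_data[name]
    # Every column is allocated with the same length as the time axis.
    col = Vector{Union{Missing, Float64}}(missing, length(all_years))
    # indexin gives, for each epoch, its position in all_years (or nothing).
    rows = indexin(years, all_years)
    for (r, v) in zip(rows, vals)
        r === nothing || (col[r] = v)
    end
    # The column length always matches nrow(df), so this cannot fail on length.
    df[!, name] = col
end
```

Because each column starts out as all `missing`, stations with gaps simply keep `missing` at the epochs they did not observe.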

GeodeticR · April 24, 2024, 7:20pm

Thanks for the advice!

In general, one should build up a DataFrame either row by row, or column by column. Perhaps as individual vectors that are assembled into a dataframe all at once. Ensure that the lengths of all vectors match.

Overall, this is what I was trying to do, but, it became a little convoluted since my data is all of various lengths due to collection rate over the time frame I’m working with.

BUT! About 5 minutes ago I finally freaking figured it out and have my matrix with all the values in accordance with their date of collection and station ID.

I hate how long it takes me to get these things right, but, at least I know how to do it moving forward.
I was on the right track, but in the end, it only took a few lines of code to accomplish - which is also frustrating to me. lol

Anyway, here is the code if anyone runs into a similar issue of populating a DataFrame with individual vectors from different files:

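using DelimitedFiles  # provides readdlm

# function to extract the decimal year (column 3) and vertical residual (column 12):
function read_stat(file)
    data = readdlm(file)
    return data[:, 3], data[:, 12]
end

for stat in stations
    station_file = "/path/to/files/" * stat * ".filetype"
    station_year, station_vert = read_stat(station_file)

    # mark the rows of ResVert_matrix whose decyr appears in this station's file
    idx = [tt in station_year for tt in all_years]

    # write the station's residuals into those rows of its column
    ResVert_matrix[idx, stat] = station_vert
end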
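
One thing to keep in mind with the masked assignment above: it needs the number of `true` entries in `idx` to equal `length(station_vert)`, and it fills the rows in `all_years` order, so it assumes each station file is sorted the same way and contains no epochs that are absent from the matrix. A hedged sketch of the same approach with an explicit check is below; `fill_station!` is a hypothetical helper name, not part of DataFrames, and the toy data is invented.

```julia
using DataFrames

# Hypothetical helper: same masked assignment, plus a length check up front.
function fill_station!(df, stat, station_year, station_vert)
    idx = [tt in station_year for tt in df.decyr]
    count(idx) == length(station_vert) ||
        error("station $stat: $(length(station_vert)) values for $(count(idx)) matching rows")
    df[idx, stat] = station_vert
    return df
end

# Toy stand-in for ResVert_matrix with one station column.
df = DataFrame(decyr = [2001.0, 2001.5, 2002.0],
               AAAA  = Union{Missing, Float64}[missing, missing, missing])

fill_station!(df, "AAAA", [2001.0, 2002.0], [0.1, 0.2])
```

With the check in place, a station file containing an epoch outside the matrix fails with a message naming the station rather than a generic length error.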

Which returned exactly what I wanted:

 ResVert_matrix
1826×372 DataFrame
  Row │ decyr    AGMT          AHID          ALAM          ALBH          ALGO         AMC2          AOA1          APEX          ARGU         ARP7          AVRY         AZCN          AZRY          BALD          BARH          BBDM        ⋯
      │ Any      Any           Any           Any           Any           Any          Any           Any           Any           Any          Any           Any          Any           Any           Any           Any           Any         ⋯
──────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    1 │ 2001.0   -0.000411744  0.0016273     -0.000398521  -0.00154369   -0.00367479  -0.00377606   0.000589432   0.000119521   0.000294808  -0.00626774   -0.00458825  -0.00100321   -0.000928375  0.00160554    -0.00497022   0.00193155  ⋯
    2 │ 2001.0   -0.00587653   missing       0.000811201   -0.00580306   -0.00445074  -0.00780656   -0.00304909   -0.0048819    -0.00348866  -0.00227948   -0.00847696  0.00251856    0.000304095   -0.00155505   -0.00807116   -0.00098337
    3 │ 2001.01  -0.00250843   -0.00830314   -0.00640587   -0.00445854   -0.00628841  -0.00446863   -0.00831175   -0.00622012   -0.00754699  -0.000226333  -0.00480355  -0.00336224   -0.00424658   -0.00208271   -0.00481267   -0.00020949
    4 │ 2001.01  -0.00335193   -0.00411124   -0.00830572   -0.00246617   -0.00299993  -0.00302353   -0.000524398  -0.00503077   -0.00398079  0.0026784     -0.0082891   -0.00412552   -0.0047916    -0.00087102   -0.00139525   0.00189696
    5 │ 2001.01  -0.000724291  -0.00113138   -0.00221356   -0.00948098   -0.00768421  -0.00455579   0.00365335    -0.000681856  -0.00105213  0.00489513    -0.00761672  -0.00574064   -0.00232982   0.00213982    0.000119854   0.00454819  ⋯
    6 │ 2001.02  -0.00366692   0.00122687    -0.00376233   -0.00386293   -0.0076843   -0.00573267   0.000592682   -0.000672925  -0.00225953  0.00617666    -0.00770947  -0.00382143   -0.00347736   0.00170033    0.00113218    0.00400803
    7 │ 2001.02  0.00202142    -0.000370334  0.00127478    0.00212072    -0.00191051  -0.00345651   -0.0012234    0.00127418    0.00144519   -0.00283256   -0.00372445  -0.000904967  -0.00259769   0.00461467    -0.00454286   0.00242802
    8 │ 2001.02  0.002528      0.00681549    0.005107      0.00855641    -0.00354736  0.00468631    0.0218228     0.00289191    0.00930342   0.0111294     0.0132683    0.00219077    0.00509488    0.0120312     -0.00647434   0.00452286
  ⋮   │    ⋮          ⋮             ⋮             ⋮             ⋮             ⋮            ⋮             ⋮             ⋮             ⋮            ⋮             ⋮            ⋮             ⋮             ⋮             ⋮             ⋮      ⋱
 1820 │ 2005.98  -0.00278326   -0.000849939  -0.00194529   -0.000584743  0.00940156   0.00434791    0.000536809   -0.00307833   -0.00165011  0.00413097    0.00324508   -0.000559732  0.00297       0.000897028   -0.00371917   0.00367155  ⋯
 1821 │ 2005.98  0.00158751    -0.00139395   -0.00376292   0.000999944   missing      -0.000284147  -0.000457211  -0.00602245   -0.013211    0.00193471    -0.0037809   -0.00391347   -0.00143682   -0.00151329   -0.0029101    0.00210767
 1822 │ 2005.99  -0.00213419   -0.00300425   -0.0055404    -0.00378373   0.0197481    0.00539431    -0.00210714   -0.00282093   -0.00239019  0.007889      -0.00856227  0.0016841     0.000547202   0.000295151   0.00601662    -0.00213741
 1823 │ 2005.99  -0.0030068    -0.00705258   -0.000928956  -0.000576093  0.0187108    0.000278443   -0.00807216   -0.0053058    -0.00563386  -0.00453603   -0.00234492  -0.00314532   -0.00504702   -0.00117701   0.00225252    0.00335422
 1824 │ 2005.99  0.00228285    -0.000625412  -0.00605059   0.000920983   0.00956733   0.001654      -0.00277833   -0.000958274  -9.41359e-5  0.00132424    -0.00843808  0.00114439    -0.000827684  -0.00127774   -0.00147909   0.000944353 ⋯
 1825 │ 2006.0   -0.00237043   -0.0039793    -0.00508331   0.00410247    0.00760786   -0.0044748    -0.00982475   -0.00200118   -0.00413993  0.00147126    0.00211928   -0.000359883  0.00207839    -0.00749317   0.00220135    missing    
 1826 │ 2006.0   -0.00197461   -0.00434953   -0.00374166   0.0038909     0.00654577   0.00100091    -0.004524     -0.000261687  -0.00585236  0.00376626    -0.00731099  -0.000302615  0.000627537   0.00646965    -0.00630255   missing    
                                                                                                                                                                                                            356 columns and 1811 rows omitted

Now I can just replace my missing values with NaN, or something else that won’t affect my calculations.
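
A minimal sketch of that last step, assuming the station columns should end up as plain `Float64`: `coalesce` swaps `missing` for a fallback value, and broadcasting applies it to a whole column. The toy frame below is invented. Keep in mind that `NaN` propagates through sums and means, so it still affects those calculations unless it is filtered out; leaving the values as `missing` and wrapping a column in `skipmissing` is an alternative.

```julia
using DataFrames

# Toy stand-in; the station column has eltype Any, as in the output above.
df = DataFrame(decyr = [2001.0, 2001.5, 2002.0],
               AAAA  = Any[0.1, missing, 0.2])

# Replace missing with NaN in every column except decyr and narrow to Float64.
for name in names(df)[2:end]
    df[!, name] = Float64.(coalesce.(df[!, name], NaN))
end
```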
