Indices in file names?

Hi,

If I’m writing data to a CSV file inside a for loop, I need a separate file for each iteration, but how do I make sure the file names are different so that nothing gets overwritten? I thought of putting the loop index in the file name, but I don’t know how. For example:

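using CSV, DataFrames

for i = 1:n
    # ... build `df` for this iteration ...
    CSV.write("Solution Table.csv", df)   # same file name every iteration, so each write overwrites the last
end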

So I want a separate solution file for each iteration, but right now each write overwrites the previous one and only one file is generated. Any help would be appreciated, thanks!

You can interpolate the index:

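for i in 1:n
    # ... build `df` for this iteration ...
    CSV.write("Solution Table$i.csv", df)   # "$i" inserts the loop index: Solution Table1.csv, Solution Table2.csv, ...
end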
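A small Base-only sketch of the same idea, so it runs without CSV.jl: it writes three tiny files into a temporary directory (mktempdir) with plain open/println instead of CSV.write. It also zero-pads the index with lpad, an optional tweak (an assumption, not part of the answer above) so that names keep their numeric order when sorted as text.

```julia
dir = mktempdir()                        # throwaway folder for the demo
for i in 1:3
    # "$(...)" interpolates the padded index: "001", "002", "003"
    path = joinpath(dir, "Solution Table$(lpad(i, 3, '0')).csv")
    open(path, "w") do io
        println(io, "i,value")           # header row
        println(io, i, ",", i^2)         # one data row per iteration
    end
end
files = sort(readdir(dir))               # three distinct files, none overwritten
```

Without padding, "Solution Table10.csv" would sort before "Solution Table2.csv" in a plain text sort.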

Sweet! Thank you sir

Another approach:

const filecount = Ref(0)   # counter that persists between calls

const default_prefix = "data"   # not `basename`, which would shadow Base.basename
const default_ext = "csv"

function nextfilename(fileprefix=default_prefix, ext=default_ext)
    filecount[] += 1   # advance the shared counter
    return string(fileprefix, "_", filecount[], ".", ext)
end

It runs like this:

julia> nextfile = nextfilename()
"data_1.csv"

julia> nextfile = nextfilename()
"data_2.csv"

julia> nextfile = nextfilename()
"data_3.csv"
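If you would rather not keep a global counter, a closure can own it. This is a sketch of a variant, not part of the answer above; make_namer and its start keyword are hypothetical names. Passing start also lets you resume numbering from an earlier session.

```julia
# Returns a function that yields "<prefix>_1.<ext>", "<prefix>_2.<ext>", ... on successive calls.
function make_namer(prefix::AbstractString = "data", ext::AbstractString = "csv"; start::Integer = 0)
    n = start                            # captured by the closure below
    return function ()
        n += 1                           # each call advances this namer's own counter
        return string(prefix, "_", n, ".", ext)
    end
end

nextname = make_namer()
first_two = [nextname(), nextname()]     # ["data_1.csv", "data_2.csv"]
resumed = make_namer(start = 41)
after_resume = resumed()                 # "data_42.csv"
```

Each namer keeps its own count, so several independent sequences (e.g. different prefixes) can run side by side.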

To include the full directory path, use:

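const filecount = Ref(0)   # counter that persists between calls

const default_prefix = "data"   # not `basename`, which would shadow Base.basename
const default_ext = "csv"
const default_dir = "/home/working/data"

function nextfilename(fileprefix=default_prefix, ext=default_ext, dir=default_dir)
    filecount[] += 1   # advance the shared counter
    newfilename = string(fileprefix, "_", filecount[], ".", ext)
    return joinpath(dir, newfilename)   # prepend the directory
end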

julia> nextfile = nextfilename()
"/home/working/data/data_1.csv"

julia> nextfile = nextfilename()
"/home/working/data/data_2.csv"

julia> nextfile = nextfilename()
"/home/working/data/data_3.csv"

And, of course, if you generate many files over several sessions, you can initialize filecount to wherever you left off.
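A sketch of finding where you left off, assuming the earlier files follow the data_N.csv pattern produced by nextfilename. last_file_index is a hypothetical helper, and the demo creates empty placeholder files in a temporary directory rather than touching real data.

```julia
const filecount = Ref(0)

# Hypothetical helper: largest N among files named "<prefix>_<N>.<ext>" in `dir` (0 if none).
# Assumes `prefix` and `ext` contain no regex special characters.
function last_file_index(dir::AbstractString; prefix::AbstractString = "data", ext::AbstractString = "csv")
    pattern = Regex("^" * prefix * "_(\\d+)\\." * ext * "\$")
    maxidx = 0
    for name in readdir(dir)
        m = match(pattern, name)
        m === nothing && continue        # skip files that don't follow the pattern
        maxidx = max(maxidx, parse(Int, m.captures[1]))
    end
    return maxidx
end

# Demo: pretend data_1.csv and data_7.csv exist from an earlier session.
dir = mktempdir()
for name in ("data_1.csv", "data_7.csv", "notes.txt")
    touch(joinpath(dir, name))
end
filecount[] = last_file_index(dir)       # numbering resumes after data_7.csv
```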


That’s pretty interesting, thanks for the alternative.

Just a quick question: would it be possible to do this for variable names as well?
I know you can use ordinary indexing like a[i] = x + i, but it doesn’t seem to work for things like CSV or DataFrame variables.

For example:

for i = 1:2
    csv[i] = CSV.File(File_Df[i,1])   # `csv` must already exist (e.g. as a Vector) before the loop
end

However, the code above doesn’t work, or at least the way I wrote it doesn’t. The $ interpolation only works inside strings, so I can’t use it for these names. Any advice would be appreciated.

I am not sure what you want to achieve with the file names, but you can access DataFrame columns by interpolating the column name:

using DataFrames

df = DataFrame(A1 = 1:4, A2 = 1:4)
for i in 1:2
    println(df[!, Symbol("A$i")])   # Symbol("A$i") builds :A1, then :A2
end
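The key step is Symbol("A$i"), which turns an interpolated string into a name at run time. The same trick works with any name-keyed container; in this Base-only sketch a NamedTuple stands in for the DataFrame (an assumption made only so the example runs without packages).

```julia
cols = (A1 = 1:4, A2 = 5:8)              # stand-in for the DataFrame's columns
picked = Vector{Int}[]
for i in 1:2
    name = Symbol("A$i")                 # :A1, then :A2
    push!(picked, collect(getproperty(cols, name)))
end
```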

Oh okay, basically what I’m trying to do is name DataFrames with indices. In a for loop, I want each DataFrame named after the corresponding index, for example DataFrame1 when i = 1, DataFrame2 when i = 2, etc. So it’s the same as my initial question, but instead of naming an actual file, I’m trying to name a DataFrame variable. I tried the $ interpolation method but it didn’t work.

Hope that somewhat makes it clearer. Thanks!

It cannot be done directly, as DataFrame only supports symbols for column names.

You can convert to a string, then to a Symbol, e.g.

julia> DataFrame(Symbol("1") => 1:2, Symbol("2") => 3:4)
2×2 DataFrame
│ Row │ 1     │ 2     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 3     │
│ 2   │ 2     │ 4     │

but I would advise against this; there is probably a less convoluted way.

Why do you need this?

Does the object need to be a DataFrame, or would a Dict{Int,Vector} or Vector{Vector} work better?
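A minimal sketch of that suggestion: keep one container and index it by the loop variable instead of creating DataFrame1, DataFrame2, ... as separate variables. The per-iteration data below is made up; in the real code each entry would be a DataFrame.

```julia
# Keyed by iteration number: results[i] plays the role of "DataFrame$i".
results = Dict{Int, Vector{Float64}}()
for i in 1:3
    results[i] = [i * 1.0, i * 2.0]      # placeholder data for iteration i
end

# Or, when the keys are simply 1:n, a vector of vectors built in one go.
results_vec = [[i * 1.0, i * 2.0] for i in 1:3]
```

A Dict is handy when the keys are sparse or not integers; a Vector is simpler when they are exactly 1:n.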

There is an old (given Julia’s age) discussion here: https://groups.google.com/forum/#!topic/julia-users/us7abUVhDHw

You can read the pros and cons there, along with some suggestions for alternative approaches. I agree with @Tamas_Papp that one should think about why this is preferable to a Dict or even an Array. In any case, you can try something like this:

for n in 1:10
    @eval $(Symbol("DataFrame_$n")) = $n
end
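One practical cost of the @eval route: the generated globals are not a collection, so reading them back in a loop means rebuilding each name and looking it up in the module. A sketch, using getfield on @__MODULE__ as one way to do that lookup:

```julia
for n in 1:3
    @eval $(Symbol("DataFrame_$n")) = $n       # defines globals DataFrame_1 ... DataFrame_3
end

# Reading them back requires reconstructing every name.
vals = [getfield(@__MODULE__, Symbol("DataFrame_$n")) for n in 1:3]

# With a plain vector, the same data is just indexed.
vals_vec = [n for n in 1:3]
```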

I’m quite new to Julia, so honestly I’m not sure what is better, but I’ll describe what I’m trying to do (forgive me if it’s not concise). I have a set of input CSV files that I want Julia to read, a pair of files for each iteration. I put the paths of these files into another CSV file (Input files.csv). I then created a DataFrame with string columns of these file paths (File_Df), and in the for loop I use that DataFrame to create more DataFrames for variable assignment later. I know this probably sounds confusing and inefficient, so here is part of the code; hopefully it makes sense, and I hope there is a better way to do it.

Input_Files = CSV.File("C:\\Users\\Muadh\\.atom\\Input files.csv", header = false)
File_Df = DataFrame(Input_Files)

for i = 1:length(Input_Files)
    Df1 = DataFrame(CSV.File(File_Df[i,1]))
    Df2 = DataFrame(CSV.File(File_Df[i,2]))

    kE = Df1[1,2]
    d = Df1[2,2]

I’m guessing this is a very inefficient way, so please let me know if there is a better way to do it. Regarding the names, this is where I wanted to name the DataFrames in the for loop with indices, so that each iteration gets its own DataFrame instead of overwriting the previous one.

Sorry for the long post once again, any advice would be greatly appreciated.

Your example code was probably cut off while pasting. Do you need the kE and the d as pairs?

The key issue here is not really efficiency, but in what format you want your data eventually. This depends on what you want to do with it.

Oh no, they’re just variables that are assigned values from the DataFrame. I didn’t paste everything because I thought it wasn’t relevant, but here it is.

using CSV, DataFrames

Input_Files = CSV.File("C:\\Users\\Muadh\\.atom\\Input files.csv", header = false)
File_Df = DataFrame(Input_Files)

for i = 1:length(Input_Files)
    Df1 = DataFrame(CSV.File(File_Df[i,1]))
    Df2 = DataFrame(CSV.File(File_Df[i,2]))

    kE = Df1[1,2]
    d = Df1[2,2]
    x = Df1[3,2]
    T = round(Int, Df1[4,2])
    CT = Df1[5,2]
    D = Df1[6,2]
    k_yearly = Df1[7,2]
    # ... constraints of the optimization model use these values ...
end

So these are all variables that take their values from the CSV files via the DataFrame. My code builds an optimization model, and these are just values that go into its constraints.
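If the values only need to travel together into the model, one option (a suggestion, not from the thread) is a small struct built from Df1's second column and stored in a vector with one entry per input pair. RunParams is a hypothetical name, and a plain Vector{Float64} stands in for Df1[:, 2] so the sketch runs without CSV files; it assumes the rows are in the order used above.

```julia
# Hypothetical container for the per-run values listed above.
struct RunParams
    kE::Float64
    d::Float64
    x::Float64
    T::Int
    CT::Float64
    D::Float64
    k_yearly::Float64
end

# Build RunParams from the second column of Df1 (rows 1 to 7, same order as above).
RunParams(col::AbstractVector{<:Real}) =
    RunParams(col[1], col[2], col[3], round(Int, col[4]), col[5], col[6], col[7])

runs = RunParams[]
for col in ([0.5, 2.0, 1.0, 10.0, 3.0, 4.0, 0.1],     # made-up values for run 1
            [0.6, 2.5, 1.2, 12.4, 3.0, 4.0, 0.2])     # made-up values for run 2
    push!(runs, RunParams(col))
end
```

Each run is then runs[i], with fields like runs[i].kE and runs[i].T, and nothing is overwritten between iterations.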

If Input files.csv contains the paths like this:

/..../path1.csv
/..../path2.csv
/..../path.csv
...

then you can try:

Input_Files = CSV.File("C:\\Users\\Muadh\\.atom\\Input files.csv", header = false) |> DataFrame
df = DataFrame.(CSV.File.(Input_Files[!,:Column1]))

You obtain an n-element Array{DataFrame,1} (a vector of DataFrames), and you can access them via
df[1], df[2], …, df[n]. There is no need to create all these variable names dynamically.
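The dot in DataFrame.(CSV.File.(...)) is broadcasting: the function is applied to every element, and the results come back as a vector, just like map. A Base-only sketch of that pattern, where read_table is a hypothetical stand-in for DataFrame(CSV.File(p)) and the two files are written to a temporary directory just for the demo:

```julia
dir = mktempdir()
paths = [joinpath(dir, "path$i.csv") for i in 1:2]
for (i, p) in enumerate(paths)
    write(p, "name,value\nkE,$(i * 0.5)\nd,$(i * 2.0)\n")   # tiny two-row parameter file
end

read_table(p) = readlines(p)        # hypothetical stand-in: returns the file's lines

tables = read_table.(paths)         # one result per path, like DataFrame.(CSV.File.(paths))
same = tables == map(read_table, paths)
```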


I tried that, but I’m getting

ERROR: LoadError: MethodError: no method matching getindex(::DataFrame, ::typeof(!), ::Symbol)
Closest candidates are:
  getindex(::DataFrame, ::Integer, ::Symbol) at C:\Users\Muadh\.julia\packages\DataFrames\0Em9Q\src\dataframe\dataframe.jl:327
  getindex(::DataFrame, ::AbstractArray{T,1} where T, ::Union{Signed, Symbol, Unsigned}) at C:\Users\Muadh\.julia\packages\DataFrames\0Em9Q\src\dataframe\dataframe.jl:337
  getindex(::DataFrame, ::Colon, ::Union{Signed, Symbol, Unsigned}) at C:\Users\Muadh\.julia\packages\DataFrames\0Em9Q\src\dataframe\dataframe.jl:358
  ...
Stacktrace:
 [1] top-level scope at none:0
in expression starting at untitled-811a74ece6c83e1b6ec45450a45a95fe:5

Try this one:

Input_Files = CSV.File("C:\\Users\\Muadh\\.atom\\Input files.csv", header = false) |> DataFrame
df = DataFrame.(CSV.File.(Input_Files[:Column1]))

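Maybe you have an older version of DataFrames? Try updating it (up in the package REPL), because the DataFrames indexing syntax changed slightly very recently: the Input_Files[!, :Column1] form needs a newer release, while older versions use Input_Files[:Column1].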

Thank you very much! I updated the pkg and it worked.