Indices in file names?

MuadhF · July 15, 2019, 8:05am

Hi,

If I’m writing data to a CSV file inside a for loop, I need a separate file for each iteration, but how do I make sure the file names differ so that nothing gets overwritten? I thought of putting the loop index in the file name, but I don’t know how. For example:

for i = 1:n
    # ... compute df for this iteration ...
    CSV.write("Solution Table.csv", df)   # same name every time, so each iteration overwrites the last
end

So for each iteration I want a separate solution file, but right now the file just gets overwritten and only one is generated. Any help would be appreciated, thanks!

pmarg · July 15, 2019, 8:13am

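You can interpolate the index into the file name:

using CSV

for i in 1:n
    # ... compute df for this iteration ...
    CSV.write("Solution Table$i.csv", df)   # $i inserts the loop index into the name
end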
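A package-free sketch of the same idea, for trying it out: Base write stands in for CSV.write, and the temporary directory, n = 12 and the file contents are made up for the demo. It also zero-pads the index with lpad, an optional extra that keeps the files in numeric order when listed alphabetically once there are ten or more.

```julia
# Write one file per loop iteration, with the index interpolated into the name.
# `write` stands in for CSV.write; the directory and contents are illustrative only.
dir = mktempdir()
n = 12
for i in 1:n
    # lpad(i, 2, '0') turns 1 into "01", so "Solution Table10" sorts after "Solution Table09"
    path = joinpath(dir, "Solution Table$(lpad(i, 2, '0')).csv")
    write(path, "iteration,$i\n")
end

files = sort(readdir(dir))
println(first(files), " ... ", last(files))
```

Without the padding, plain "Solution Table$i.csv" works just as well; the names simply sort as 1, 10, 11, 12, 2, ... in a file browser.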

MuadhF · July 15, 2019, 8:21am

Sweet! Thank you sir

JeffreySarnoff · July 15, 2019, 8:22am

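Another approach:

const filecount = Ref(0)   # counter shared by all calls; a Ref lets a const global hold changing state

const default_prefix = "data"   # named so as not to shadow Base.basename
const default_ext = "csv"

function nextfilename(fileprefix=default_prefix, ext=default_ext)
    filecount[] += 1   # advance the counter, then build e.g. "data_1.csv"
    return string(fileprefix, "_", filecount[], ".", ext)
end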

It runs like this:

julia> nextfile = nextfilename()
"data_1.csv"

julia> nextfile = nextfilename()
"data_2.csv"

julia> nextfile = nextfilename()
"data_3.csv"

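A variation, not from the post above: the counter can live inside a closure instead of a global Ref, so several independent sequences can run side by side. make_namer and its start keyword are hypothetical names chosen for this sketch.

```julia
# Hypothetical alternative: each call to make_namer gets its own private counter.
function make_namer(prefix::AbstractString, ext::AbstractString; start::Integer = 0)
    counter = start
    return function ()
        counter += 1                    # the closure updates its captured counter
        return string(prefix, "_", counter, ".", ext)
    end
end

next_csv = make_namer("data", "csv")
next_log = make_namer("run", "log"; start = 41)

println(next_csv())   # data_1.csv
println(next_csv())   # data_2.csv
println(next_log())   # run_42.log
```

The start keyword plays the same role as setting filecount[] by hand: it lets a new session continue numbering where an earlier one stopped.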
To include the full directory path, use:

const filecount = Ref(0)

const default_prefix = "data"
const default_ext = "csv"
const default_dir = "/home/working/data"

function nextfilename(fileprefix=default_prefix, ext=default_ext, dir=default_dir)
    filecount[] += 1
    newfilename = string(fileprefix, "_", filecount[], ".", ext)
    return joinpath(dir, newfilename)   # joinpath inserts the platform's path separator
end

julia> nextfile = nextfilename()
"/home/working/data/data_1.csv"

julia> nextfile = nextfilename()
"/home/working/data/data_2.csv"

julia> nextfile = nextfilename()
"/home/working/data/data_3.csv"

And, of course, you can start filecount wherever you left off (e.g. filecount[] = 100) when generating many files over several sessions.
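Rather than remembering where you stopped, one option is to scan the output directory for the highest number already used. A minimal sketch, assuming the files follow the prefix_N.ext pattern produced by nextfilename; last_index is a hypothetical helper, and the temporary directory stands in for a previous session's output.

```julia
# Hypothetical helper: scan `dir` for files named "<prefix>_<N>.<ext>"
# and return the largest N found, or 0 if there are none.
function last_index(dir::AbstractString, prefix::AbstractString, ext::AbstractString)
    head = prefix * "_"
    tail = "." * ext
    best = 0
    for name in readdir(dir)
        if startswith(name, head) && endswith(name, tail)
            # chop removes length(head) characters from the front and length(tail) from the back
            n = tryparse(Int, chop(name; head = length(head), tail = length(tail)))
            n === nothing || (best = max(best, n))
        end
    end
    return best
end

# Demo: a throwaway directory standing in for an earlier session's output.
dir = mktempdir()
for k in (1, 2, 7)
    touch(joinpath(dir, "data_$k.csv"))
end
touch(joinpath(dir, "notes.txt"))   # ignored: does not match the pattern

const filecount = Ref(0)
filecount[] = last_index(dir, "data", "csv")   # the next name would then be data_8.csv
println(filecount[])   # 7
```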

MuadhF · July 15, 2019, 8:44am

That’s pretty interesting, thanks for the alternative.

MuadhF · July 17, 2019, 11:59am

Just a quick question: would it be possible to do this for variable names as well?
I know you can use regular indexing like a[i] = x + i, but for things like CSV or DataFrame variables it doesn’t seem to work.

For example

for i = 1:2
    csv[i] = CSV.File(File_Df[i,1])   # presumably errors: `csv` was never created as a container
end

The above code doesn’t work, or at least the way I wrote it doesn’t. The $ symbol only works inside strings, so I can’t use it for these names. Any advice would be appreciated.
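A likely culprit is that csv does not exist yet when the loop assigns csv[i]: indexed assignment only works on a container that was created beforehand. A package-free sketch, where load is a hypothetical stand-in for CSV.File and the paths are made up:

```julia
# `load` is a hypothetical stand-in for CSV.File so this runs without packages.
load(path) = "contents of " * path

paths = ["first.csv", "second.csv"]

# Create the container first; `csv[i] = ...` fails if `csv` was never defined.
csv = Vector{String}(undef, length(paths))
for i in eachindex(paths)
    csv[i] = load(paths[i])
end

# Equivalent one-liner: a comprehension builds the vector directly.
csv2 = [load(p) for p in paths]
println(csv == csv2)   # true
```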

pmarg · July 17, 2019, 1:03pm

I am not sure what you want to achieve with the file names, but you can access DataFrame columns by building the column name with string interpolation:

using DataFrames

df = DataFrame(A1 = 1:4, A2 = 1:4)
for i in 1:2
    println(df[!, Symbol("A$i")])   # Symbol("A1"), then Symbol("A2")
end
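The key step is Symbol("A$i"), which builds a name at run time from an interpolated string. The same lookup works on a plain NamedTuple, used here in place of the DataFrame (an assumption made so the sketch needs no packages):

```julia
# A NamedTuple with fields A1 and A2 stands in for the DataFrame's columns.
cols = (A1 = 1:4, A2 = 5:8)
for i in 1:2
    name = Symbol("A$i")        # :A1, then :A2
    println(name, " => ", collect(cols[name]))
end
```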

MuadhF · July 18, 2019, 5:13am

Oh okay, basically what I’m trying to do is name DataFrames with indices. In a for loop, I want to name each DataFrame after the corresponding index, for example DataFrame1 when i = 1, DataFrame2 when i = 2, etc. So it’s the same as my initial question in this topic, but instead of naming a file, I’m trying to name a DataFrame. I tried the $ interpolation method but it didn’t work.

Hope that somewhat makes it clearer. Thanks!

Tamas_Papp · July 18, 2019, 5:25am

It cannot be done directly, as DataFrame only supports symbols for column names.

You can convert to a string, then to a Symbol, e.g.

julia> DataFrame(Symbol("1") => 1:2, Symbol("2") => 3:4)
2×2 DataFrame
│ Row │ 1     │ 2     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 3     │
│ 2   │ 2     │ 4     │

but I would advise against this; there is probably a less convoluted way.

Why do you need this?

Does the object need to be a DataFrame, or would a Dict{Int,Vector} or Vector{Vector} work better?
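To make the suggestion concrete, a minimal sketch of both containers; plain integer vectors stand in for the DataFrames, and all names and values are illustrative:

```julia
# Two ways to keep "one table per index" without creating numbered variable names.
results_vec = Vector{Vector{Int}}()       # entry i lives at position i
results_dict = Dict{Int,Vector{Int}}()    # entry i lives under key i (keys need not be 1:n)

for i in 1:3
    data = collect(i:i+2)                 # placeholder for the real per-iteration data
    push!(results_vec, data)
    results_dict[i] = data
end

println(results_vec[2])    # [2, 3, 4]  (what "DataFrame2" would have held)
println(results_dict[3])   # [3, 4, 5]
```

A Vector is the natural choice when the indices are exactly 1:n; a Dict also handles gaps or non-integer keys.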

pmarg · July 18, 2019, 5:36am

There is an old (given Julia’s age) discussion of this on Google Groups.

You can read the pros and cons there, along with some suggested alternative approaches. I agree with @Tamas_Papp that one should think about why this would be preferable to a Dict or even an Array. In any case, you can try something like this:

for n in 1:10
    @eval $(Symbol("DataFrame_$n")) = $n
end
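To see what that loop does: @eval evaluates the assignment at the global scope of the current module, so it defines globals DataFrame_1 through DataFrame_10 (holding integers, as in the snippet). Reading one back by a computed index then needs a lookup in the module, which is one reason a Dict is usually simpler. A sketch of both:

```julia
# Define globals DataFrame_1 ... DataFrame_10 at run time, as in the snippet above.
for n in 1:10
    @eval $(Symbol("DataFrame_$n")) = $n
end

# Reading one back by a computed index requires a lookup in the module.
k = 7
println(getfield(@__MODULE__, Symbol("DataFrame_$k")))

# The Dict alternative: same data, no new global names.
tables = Dict(n => n for n in 1:10)
println(tables[k])
```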

MuadhF · July 18, 2019, 6:13am

I’m quite new to Julia, so right now I honestly am not sure what is better, but I’ll tell you what I’m trying to do; forgive me if it’s not concise. I have a set of input CSV files that I want Julia to read, a pair of files for each iteration. So I put the paths of these files into another CSV file (Input files.csv). I then created a DataFrame (File_Df) whose string columns hold these paths, and in the for loop I use that DataFrame to create more DataFrames, which I later use for variable assignment. I know this probably sounds confusing and inefficient, so I’ll provide part of the code; hopefully it’ll make sense, and I hope there is a better way to do it.

Input_Files = CSV.File("C:\\Users\\Muadh\\.atom\\Input files.csv", header = false)
File_Df = DataFrame(Input_Files)

for i = 1:length(Input_Files)
    Df1 = DataFrame(CSV.File(File_Df[i,1]))
    Df2 = DataFrame(CSV.File(File_Df[i,2]))

    kE = Df1[1,2]
    d = Df1[2,2]

I’m guessing this is a very inefficient way, so please advise if there is a better way to do it. Regarding the names, this is where I wanted to name the DataFrames in the for loop with indices, so that I can keep separate DataFrames instead of overwriting them on each iteration.

Sorry for the long post once again, any advice would be greatly appreciated.

Tamas_Papp · July 18, 2019, 6:20am

Your example code was probably cut off while pasting. Do you need the kE and the d as pairs?

The key issue here is not really efficiency, but in what format you want your data eventually. This depends on what you want to do with it.

MuadhF · July 18, 2019, 6:26am

Oh no, they’re just variables that are assigned values from the DataFrame. I didn’t paste everything because I thought it wasn’t relevant, but here it is.

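using CSV, DataFrames

Input_Files = CSV.File("C:\\Users\\Muadh\\.atom\\Input files.csv", header = false)
File_Df = DataFrame(Input_Files)

for i = 1:length(Input_Files)
    # each row of File_Df holds the pair of input file paths for this iteration
    Df1 = DataFrame(CSV.File(File_Df[i,1]))
    Df2 = DataFrame(CSV.File(File_Df[i,2]))

    kE = Df1[1,2]
    d = Df1[2,2]
    x = Df1[3,2]
    T = round(Int, Df1[4,2])
    CT = Df1[5,2]
    D = Df1[6,2]
    k_yearly = Df1[7,2]
    # ... these values go into the optimization model's constraints (not shown)
end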

So these are all variables that take values from the CSV files via the DataFrame. The objective of my code is an optimization model, so these are just variables that go into constraints.
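One way to keep every iteration's values instead of overwriting them is to bundle them into a NamedTuple per iteration and collect those in a vector. In this sketch, read_params is a hypothetical helper and plain vectors of made-up numbers stand in for Df1[:, 2]; the row order follows the code above.

```julia
# Hypothetical helper: turn the second column of one parameter table into a NamedTuple.
# Row order follows the snippet above: kE, d, x, T, CT, D, k_yearly.
function read_params(col)
    return (kE = col[1], d = col[2], x = col[3], T = round(Int, col[4]),
            CT = col[5], D = col[6], k_yearly = col[7])
end

# Made-up stand-ins for Df1[:, 2] from two input files.
columns = [
    [0.5, 1.0, 2.0, 10.0, 3.0, 4.0, 0.1],
    [0.6, 1.5, 2.5, 12.4, 3.5, 4.5, 0.2],
]

all_params = [read_params(col) for col in columns]   # one entry per iteration

println(all_params[2].T)   # 12
```

Each iteration's values are then available as all_params[i].kE, all_params[i].d, and so on, without creating numbered variable names.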

pmarg · July 18, 2019, 7:20am

If Input files.csv contains the paths like this:

/..../path1.csv
/..../path2.csv
/..../path.csv
...

then you can try:

using CSV, DataFrames

Input_Files = CSV.File("C:\\Users\\Muadh\\.atom\\Input files.csv", header = false) |> DataFrame
df = DataFrame.(CSV.File.(Input_Files[!,:Column1]))   # one DataFrame per path in the first column

This gives an n-element Array{DataFrame,1} (a vector of DataFrames), which you can access as df[1], df[2], …, df[n]. There is no need to create all these variable names dynamically.
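The dots in DataFrame.(CSV.File.(...)) are Julia's broadcast syntax: f.(xs) applies f to each element, and nested dotted calls fuse into a single loop. A package-free illustration with Base string functions on made-up input:

```julia
raw = ["  Alpha ", "BETA", " Gamma"]

# Nested dotted calls fuse: strip, then lowercase, applied to each element in one pass.
cleaned = lowercase.(strip.(raw))

# The same result written as a comprehension.
cleaned2 = [lowercase(strip(s)) for s in raw]

println(cleaned)   # ["alpha", "beta", "gamma"]
```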

MuadhF · July 18, 2019, 7:27am

I tried that, but I’m getting

ERROR: LoadError: MethodError: no method matching getindex(::DataFrame, ::typeof(!), ::Symbol)
Closest candidates are:
  getindex(::DataFrame, ::Integer, ::Symbol) at C:\Users\Muadh\.julia\packages\DataFrames\0Em9Q\src\dataframe\dataframe.jl:327
  getindex(::DataFrame, ::AbstractArray{T,1} where T, ::Union{Signed, Symbol, Unsigned}) at C:\Users\Muadh\.julia\packages\DataFrames\0Em9Q\src\dataframe\dataframe.jl:337
  getindex(::DataFrame, ::Colon, ::Union{Signed, Symbol, Unsigned}) at C:\Users\Muadh\.julia\packages\DataFrames\0Em9Q\src\dataframe\dataframe.jl:358
  ...
Stacktrace:
 [1] top-level scope at none:0
in expression starting at untitled-811a74ece6c83e1b6ec45450a45a95fe:5

pmarg · July 18, 2019, 7:29am

Try this one:

Input_Files = CSV.File("C:\\Users\\Muadh\\.atom\\Input files.csv", header = false) |> DataFrame
df = DataFrame.(CSV.File.(Input_Files[:Column1]))

Maybe you have an older version of DataFrames? The form above works on older versions; otherwise try updating (] up in the REPL), because the DataFrames indexing syntax changed slightly very recently.

MuadhF · July 18, 2019, 7:51am

Thank you very much! I updated the pkg and it worked.
