DataFrame : eltypes with variable length

Fred · September 21, 2017, 8:39am

Hi,

The function
readtable(filename, [keyword options])

has an optional keyword

eltypes::Vector – Specify the types of all columns. Defaults to [].

When the exact size of the table is known, it is possible to specify the types of the columns:

 x = readtable("data/data.csv", separator = '\t' , eltypes = [String, Float64, Float64, Float64, Float64])
4×5 DataFrames.DataFrame
│ Row │ id   │ s1       │ s2       │ s3       │ s4       │
├─────┼──────┼──────────┼──────────┼──────────┼──────────┤
│ 1   │ "g1" │ 0.134978 │ 0.231912 │ 0.479582 │ 0.134978 │
│ 2   │ "g2" │ 0.972158 │ 0.437821 │ NA       │ 0.848548 │
│ 3   │ "g3" │ 0.152925 │ NA       │ 0.848548 │ 0.152925 │
│ 4   │ "g4" │ 0.813864 │ 0.972158 │ 0.917429 │ 0.813864 │

But if the size of the table is not known and I only know that the first column is of type String, how can I set eltypes? Something like this (pseudocode):

eltypes = [String, Float64...]

Thanks!

quinnj · September 21, 2017, 9:29am

With the CSV.jl package, you can just do

CSV.read("data/data.csv", delim='\t', types=Dict(1=>String))

This will return a DataFrame by default.
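
If you would rather stay with readtable, one possible workaround is to count the columns from the header line and build the eltypes vector from that. This is only a sketch: first_string_eltypes is a hypothetical helper (it is not part of DataFrames or CSV), and it assumes the first line is a header whose fields do not contain the separator.

```julia
# Hypothetical helper (not part of DataFrames): build an eltypes vector with
# String for the first column and Float64 for every remaining column.
function first_string_eltypes(file, sep)
    header = open(readline, file)        # read only the first line of the file
    ncols = length(split(header, sep))   # assumes one header field per column
    return DataType[i == 1 ? String : Float64 for i in 1:ncols]
end

# Possible usage with readtable (assumes a DataFrames version that still has it):
# x = readtable("data/data.csv", separator = '\t',
#               eltypes = first_string_eltypes("data/data.csv", '\t'))
```

The file is opened twice (once for the header, once by the reader), which should be negligible next to parsing the whole table.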

Fred · September 21, 2017, 10:02am

Thank you very much, quinnj!

The reason I stay with DataFrames and readtable is that readtable is faster for me, even when I don't specify the eltypes. But it is possible that I have done something wrong.

$ julia DataCSV.jl 

WARNING: Method definition ==(Base.Nullable{S}, Base.Nullable{T}) in module Base at nullable.jl:238 overwritten in module NullableArrays at /home/fred/.julia/v0.6/NullableArrays/src/operators.jl:99.
Reading...	data.csv
Reading...	data2.csv
elapsed time: 3.95377051 seconds

$ julia DataFrames.jl 
Reading...	data.csv
Reading...	data2.csv
elapsed time: 1.566749483 seconds

DataCSV.jl

using CSV

##########################################
# read dataframe
function readTable(file, sep, h)
    println("Reading...\t", file)
    x = CSV.read(file ; delim = sep, types=Dict(1=>String), header = h, null="NA") # read data file
    return x
end

function main()
    sep = '\t'       # table separator
    h = true         # table header
  
    # process data
    f = ["data.csv", "data2.csv"]
    
    for file in f
        tab = readTable(file, sep, h)
    end
end

##########################################

tic()
main()
toc()

DataFrames.jl

using DataFrames

# read dataframe
function readTable(file, sep, h)
    println("Reading...\t", file)
    x = readtable(file , separator = sep, header = h) # read data file
    return x
end

function main()
    sep = '\t'       # table separator
    h = true         # table header
  
    # process data
    f = ["data.csv", "data2.csv"]
    
    for file in f
        tab = readTable(file, sep, h)
    end
end

##########################################

tic()
main()
toc()

data.csv (tab separator)
id	s1	s2	s3	s4
g1	0.1349779443	0.2319120248	0.4795815343	0.1349779443
g2	0.9721584522	0.4378209082	0.8485481786
g3	0.1529253099	0.8485481786	0.1529253099
g4	0.8138636984	0.9721584522	0.9174289651	0.8138636984

data2.csv
id	s1	s2	s3	s4
g1	0.2235082715	0.726445808	0.3964289063	0.2169791684
g2	0.6151192371	0.7863019568	0.6236194363	
g3	0.9810212048	0.2967554158	0.5556356032
g4	0.0347811024	0.5602313542	0.1317892775	0.4228049423
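
One thing that may affect the comparison: both scripts call main() only once, so the time between tic() and toc() includes the JIT compilation triggered by the first call, not only the time spent reading the files. A minimal sketch of a warm-up-then-measure helper follows; time_after_warmup is a hypothetical name and not something provided by CSV or DataFrames.

```julia
# Hypothetical helper: run f once to trigger compilation, then time a second run.
function time_after_warmup(f)
    f()                    # first call: includes JIT compilation
    return @elapsed f()    # second call: elapsed seconds for the run itself
end

# Possible usage with the scripts above:
# println("elapsed time: ", time_after_warmup(main), " seconds")
```

This may or may not change which reader comes out ahead on files this small, but it separates compile time from read time.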
