Hello,
I would like to know if there is a better way to stack multidimensional arrays, the way vcat does. Below is the code I am working with.
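```julia
using BenchmarkTools  # provides @btime

data1 = rand(10000, 3072);
data2 = rand(10000, 3072);
data3 = rand(10000, 3072);
data4 = rand(10000, 3072);
data5 = rand(10000, 3072);

# Collects the five arrays into a vector (no stacking)
function stack_Array_1()
    Xaxis = [];
    for i = 1:5
        push!(Xaxis, eval(Symbol("data$i")))   # look up global data1 ... data5 by name
    end
    Xaxis
end

# Stacks the arrays by calling vcat once per batch
function stack_Array_2()
    Yaxis = Array{Float64,2}(undef, 0, 3072)   # empty 0×3072 matrix
    for i = 1:5
        Yaxis = vcat(Yaxis, eval(Symbol("data$i")))
    end
    Yaxis
end
```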
```
@btime stack_Array_1();
2.319 μs (33 allocations: 1.61 KiB)
Any[5]
10000×3072 Array{Float64,2}:
10000×3072 Array{Float64,2}:
10000×3072 Array{Float64,2}:
10000×3072 Array{Float64,2}:
10000×3072 Array{Float64,2}:

@btime stack_Array_2();
1.973 s (75 allocations: 3.43 GiB)
50000×3072 Array{Float64,2}:
0.635014 0.462685 0.185559 0.295888 … 0.123956 0.971991 0.559269
0.409272 0.57993 0.820811 0.993251 0.12653 0.553527 0.577177
0.660994 0.114057 0.702278 0.119705 0.354153 0.681063 0.057957
0.11004 0.729124 0.25563 0.717678 … 0.554839 0.800087 0.779025
0.854646 0.217248 0.834483 0.49127 0.245325 0.748648 0.725246
0.733918 0.799065 0.349517 0.917985 0.619041 0.0812406 0.321144
⋮ ⋱ ⋮
0.397121 0.578101 0.832732 0.508987 0.85815 0.61081 0.735447
0.355548 0.771022 0.584872 0.0710232 0.901111 0.567234 0.735604
0.281855 0.117889 0.164787 0.719332 0.359149 0.668798 0.570658
0.0783613 0.947521 0.327537 0.722403 … 0.152016 0.173811 0.346503
0.905858 0.611356 0.158429 0.0897009 0.788216 0.790752 0.968152
0.293524 0.558019 0.123042 0.221605 0.325241 0.666398 0.310829
```
I would like the arrays to be vertically concatenated, which is what stack_Array_2() does, but its performance is poor. stack_Array_1() performs well, but I am not sure whether there is a way to stack the arrays with it. Kindly let me know if there is any way to get the desired result.
Thank you.
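stack_Array_1 is fast because it never builds a stacked matrix: push! only stores references to the five existing arrays, so its result is a vector of matrices rather than one 50000×3072 matrix. The sketch below, with hypothetical toy-sized stand-ins for data1 ... data5 and assuming Julia 1.x, illustrates the difference between collecting references and actually concatenating.

```julia
# Hypothetical toy stand-ins for data1 ... data5 (the real ones are 10000×3072)
batches = [rand(4, 3) for _ in 1:5]

# Like stack_Array_1: push! stores references to the existing arrays, copying no data
collected = []
for b in batches
    push!(collected, b)
end
println(collected[1] === batches[1])   # true: the same array object, not a copy

# vcat, by contrast, allocates one new 20×3 matrix and copies every element into it
stacked = vcat(batches...)
println(size(stacked))                 # (20, 3)
```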
```julia
dataarrays = [rand(10000, 3072) for _ in 1:5]
vcat(dataarrays...)
```
Don’t use eval here. Don’t vcat recursively, use the ... operator.
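The cost of vcat-ing recursively can be estimated with a little arithmetic. Each 10000×3072 Float64 batch takes 8 bytes per element; in the loop of stack_Array_2, step i allocates a fresh matrix holding i batches, so the total is 1 + 2 + 3 + 4 + 5 = 15 batches' worth of memory, while a single vcat(xs...) allocates the 5-batch result once. This sketch (assuming Julia 1.x) reproduces the 3.43 GiB that @btime reported for stack_Array_2, and the 1.14 GiB that a single vcat call should need.

```julia
# Size of one batch as in the benchmark: 10000×3072 Float64 values, 8 bytes each
rows, cols, nbatches = 10_000, 3072, 5
batch_bytes = rows * cols * sizeof(Float64)      # 245_760_000 bytes

# Recursive vcat: step i allocates a new matrix holding i batches (1 + 2 + ... + 5)
recursive_bytes = sum(i * batch_bytes for i in 1:nbatches)

# A single vcat(xs...) allocates the 5-batch result only once
single_bytes = nbatches * batch_bytes

gib(x) = x / 2^30
println(round(gib(recursive_bytes), digits=2))   # 3.43
println(round(gib(single_bytes), digits=2))      # 1.14
```

Beyond the extra allocation, every step also copies all previously stacked rows again, which is where most of the run time goes.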
@Tamas_Papp
dataarrays = [rand(10000, 3072) for _ in 1:5]
I am working with the CIFAR-10 Python pickle dataset, which has 5 batch files, each holding 10000 records (an Array{UInt8,2} of size 10000×3072). That is why I used 5 different rand arrays as an example. Thank you for the list comprehension above; it will be useful for testing.
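For testing, the list comprehension can produce stand-ins that match the CIFAR-10 batches in both shape and element type. A minimal sketch, assuming Julia 1.x; the variable names are illustrative. Each of the 3072 columns is one pixel value (32×32 pixels × 3 colour channels), and keeping the data as UInt8 uses one byte per value instead of the eight that Float64 needs.

```julia
# Hypothetical stand-ins shaped like the five CIFAR-10 batches:
# 10000 records × 3072 values (32×32 pixels × 3 channels), stored as UInt8
nbatches, nrecords, nfeatures = 5, 10_000, 3072
batches = [rand(UInt8, nrecords, nfeatures) for _ in 1:nbatches]

stacked = vcat(batches...)          # one 50000×3072 Matrix{UInt8}
println(size(stacked))              # (50000, 3072)
println(eltype(stacked))            # UInt8
println(sizeof(stacked) / 2^20)     # 146.484375 (MiB); Float64 would need 8× as much
```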
Don’t use eval here
Could you elaborate? Since I had to loop over 5 different datasets, I thought eval was my only option for looping over all of them. I would be glad to learn if that's the wrong way to go about it.
use the ... operator.
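Thank you for pointing out the splat operator. With it, performance has improved: stack_Array_3 below runs in about 668 ms with 1.14 GiB allocated, versus 2.27 s and 3.43 GiB for stack_Array_2. Please let me know if I am doing anything wrong.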
```
@btime final_data1 = stack_Array_1();
2.353 μs (33 allocations: 1.61 KiB)
@btime final_data2 = stack_Array_2();
2.274 s (75 allocations: 3.43 GiB)
@btime final_data3 = stack_Array_3();
668.435 ms (49 allocations: 1.14 GiB)
```
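```julia
# Collects the five arrays into a vector (no stacking)
function stack_Array_1()
    Xaxis = [];
    for i = 1:5
        push!(Xaxis, eval(Symbol("data$i")))   # look up global data1 ... data5 by name
    end
    Xaxis
end

# Stacks the arrays by calling vcat once per batch
function stack_Array_2()
    Yaxis = Array{Float64,2}(undef, 0, 3072)   # empty 0×3072 matrix
    for i = 1:5
        Yaxis = vcat(Yaxis, eval(Symbol("data$i")))
    end
    Yaxis
end

# Collects the arrays first, then stacks them with a single splatted vcat call
function stack_Array_3()
    Xaxis = [];
    for i = 1:5
        push!(Xaxis, eval(Symbol("data$i")))
    end
    vcat(Xaxis...)
end
```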
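Besides splatting, Julia 1.x Base has a specialized method of reduce(vcat, xs) for a vector whose element type is a vector or matrix type; it allocates the result once, like vcat(xs...), and avoids passing one argument per batch. The fast path depends on the container being typed: a Vector{Any} such as Xaxis = [] falls back to a generic reduction. A small sketch with hypothetical toy-sized batches:

```julia
# Toy stand-ins for the five batches (hypothetical sizes)
batches = Matrix{Float64}[rand(4, 3) for _ in 1:5]   # typed, not Vector{Any}

# Splatting passes each batch as a separate argument to vcat
a = vcat(batches...)

# reduce(vcat, ...) on a vector of matrices uses a specialized Base method
# that allocates the result once
b = reduce(vcat, batches)

println(a == b)      # true
println(size(b))     # (20, 3)
```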
I told you above, yet you are posting the same code (with eval and recursive vcat).
eval is not necessary here. Generally, you should not touch eval unless you are generating code.
I understand that rand is for the MWE, but just read whatever data structure you have into a vector of arrays, if that’s the most convenient.
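Here is one way to follow that advice: a minimal sketch, assuming Julia 1.x, in which each batch is read straight into a typed vector, so no data1 ... data5 variables (and no eval) are needed. load_batch is a hypothetical stand-in for whatever actually reads batch i; here it just returns toy-sized random data.

```julia
# Hypothetical loader standing in for reading batch i from disk;
# here it just returns random data of a small, fixed shape
load_batch(i) = rand(4, 3)

function stack_batches(n)
    # Collect batches in a typed vector instead of variables data1, data2, ...
    batches = Matrix{Float64}[]
    for i in 1:n
        push!(batches, load_batch(i))
    end
    return vcat(batches...)
end

X = stack_batches(5)
println(size(X))   # (20, 3)
```

The loop could equally be written as batches = map(load_batch, 1:n).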
@Tamas_Papp
you should not touch eval unless for generated code.
Understood, I will keep this in mind. I didn't know any other way to iterate over variables in a for loop, hence I used eval.
Below is the code I am working on; I am trying to fix the performance issue with stacking the arrays.
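```julia
using PyCall
@pyimport pickle   # Python's pickle module

# Loads the five CIFAR-10 training batches found in ROOT
function load_pickle_data(ROOT)
    xs = []
    ys = []
    for b = 1:5
        f = joinpath(ROOT, "data_batch_$b")
        X, Y = pickle_batch(f)
        push!(xs, X)
        push!(ys, Y)
    end
    (vcat(xs...), ys)   # stack the image data; labels stay a vector of per-batch vectors
end

# Reads one pickled batch file and returns its image data and labels
function pickle_batch(file)
    bytes = read(file)   # reads the whole file and closes it again
    datadict = pickle.loads(pybytes(bytes))
    X = datadict["data"]
    Y = datadict["labels"]
    (X, Y)
end
```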
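Since every CIFAR-10 batch has the same shape, another option is to allocate the full output once and copy each batch into its block of rows as soon as it is loaded, so the per-batch arrays do not all have to be kept alive until the final vcat. This is a sketch under assumptions: load_preallocated and fake_batch are hypothetical names, fake_batch merely stands in for pickle_batch with toy sizes, and the labels are assumed to be integers.

```julia
# Hypothetical stand-in for pickle_batch: returns (data, labels) for batch b.
# Sizes are shrunk for illustration; CIFAR-10 batches are 10000×3072 UInt8.
fake_batch(b) = (fill(UInt8(b), 4, 3), fill(b, 4))

function load_preallocated(nbatches, nrecords, nfeatures; loader=fake_batch)
    X = Matrix{UInt8}(undef, nbatches * nrecords, nfeatures)   # allocated once
    Y = Int[]
    for b in 1:nbatches
        data, labels = loader(b)
        rows = (b - 1) * nrecords + 1 : b * nrecords
        X[rows, :] = data          # copy this batch into its block of rows
        append!(Y, labels)         # labels as one flat vector
    end
    return X, Y
end

X, Y = load_preallocated(5, 4, 3)
println(size(X))        # (20, 3)
println(length(Y))      # 20
```

With the real loader this might be called as load_preallocated(5, 10000, 3072; loader = b -> pickle_batch(joinpath(ROOT, "data_batch_$b"))), provided each batch really is 10000×3072 UInt8; otherwise the assignment into X throws a DimensionMismatch.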
Don’t iteratively create variables in a loop in normal code. Use some other data structure (an array, a dictionary, etcetera).
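If the batches need to be looked up by name, a dictionary does the job that eval(Symbol("data$i")) was doing. A minimal sketch with toy sizes (Julia 1.x assumed); the key names simply mirror the original data1 ... data5 variables. Dict iteration order is not guaranteed, so the stacking step walks the keys in an explicit order.

```julia
# Keyed by name instead of creating variables data1, data2, ... in a loop
batches = Dict{String,Matrix{Float64}}()
for i in 1:5
    batches["data$i"] = rand(4, 3)     # toy sizes for illustration
end

# Look up a batch by name at run time, no eval needed
first_batch = batches["data1"]

# Stack in a well-defined order (Dict iteration order is not guaranteed)
stacked = vcat((batches["data$i"] for i in 1:5)...)
println(size(stacked))   # (20, 3)
```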