Stack multidimensional Array

Hello,

I would like to know if there is a better way to stack multidimensional arrays, for example with vcat. Below is the code I am working with.

data1 = rand(10000,3072);
data2 = rand(10000,3072);
data3 = rand(10000,3072);
data4 = rand(10000,3072);
data5 = rand(10000,3072);

function stack_Array_1()
  Xaxis = [];
  for i=1:5
    push!(Xaxis,eval(Symbol("data$i")))
  end
  Xaxis
end

function stack_Array_2()
  Yaxis = Array{Float64,2}(0,3072)
  for i=1:5
    Yaxis = vcat(Yaxis,eval(Symbol("data$i")))
  end
  Yaxis
end
@btime stack_Array_1();
  2.319 μs (33 allocations: 1.61 KiB)

Any[5]
10000×3072 Array{Float64,2}:
10000×3072 Array{Float64,2}:
10000×3072 Array{Float64,2}:
10000×3072 Array{Float64,2}:
10000×3072 Array{Float64,2}:
@btime stack_Array_2();
   1.973 s (75 allocations: 3.43 GiB)

50000×3072 Array{Float64,2}:
 0.635014   0.462685   0.185559   0.295888   …  0.123956  0.971991   0.559269
 0.409272   0.57993    0.820811   0.993251      0.12653   0.553527   0.577177
 0.660994   0.114057   0.702278   0.119705      0.354153  0.681063   0.057957
 0.11004    0.729124   0.25563    0.717678   …  0.554839  0.800087   0.779025
 0.854646   0.217248   0.834483   0.49127       0.245325  0.748648   0.725246
 0.733918   0.799065   0.349517   0.917985      0.619041  0.0812406  0.321144
 ⋮                                           ⋱            ⋮                  
 0.397121   0.578101   0.832732   0.508987      0.85815   0.61081    0.735447
 0.355548   0.771022   0.584872   0.0710232     0.901111  0.567234   0.735604
 0.281855   0.117889   0.164787   0.719332      0.359149  0.668798   0.570658
 0.0783613  0.947521   0.327537   0.722403   …  0.152016  0.173811   0.346503
 0.905858   0.611356   0.158429   0.0897009     0.788216  0.790752   0.968152
 0.293524   0.558019   0.123042   0.221605      0.325241  0.666398   0.310829

I would like the arrays to be vertically concatenated, which is what stack_Array_2() does, but its performance is poor. stack_Array_1() performs well, but it only collects the arrays into a vector, and I am not sure how to stack them from there. Kindly let me know if there is any way to get the desired result.

Thank you.

dataarrays = [rand(10000, 3072) for _ in 1:5]

vcat(dataarrays...)

Don’t use eval here. Don’t call vcat repeatedly in a loop; splat the whole collection into a single call with the ... operator.
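A minimal sketch of the difference, using small matrices so it runs quickly (the sizes and the name `parts` are illustrative, not from the original post). Growing the result inside a loop copies everything accumulated so far on every pass, whereas splatting hands all blocks to a single `vcat` call. `reduce(vcat, parts)` should give the same result without splatting.

```julia
# Five small blocks stand in for the 10000×3072 batches (illustrative sizes).
parts = [rand(4, 3) for _ in 1:5]

# Splat: one vcat call receives all five matrices as separate arguments.
stacked = vcat(parts...)

# Same result without splatting.
stacked2 = reduce(vcat, parts)

size(stacked)  # (20, 3)
```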


@Tamas_Papp

dataarrays = [rand(10000, 3072) for _ in 1:5]

I am working on the CIFAR10 Python pickle dataset, which has 5 batch files with 10000 records each (every batch is an Array{UInt8,2} of size 10000×3072). That is why I used 5 different rand arrays as an example. But thank you for the list comprehension above; it will be useful for testing.

Don’t use eval here

Kindly elaborate. Since I had to loop over 5 different datasets, I thought eval was my only option for looping over all of them. I will be glad to learn if that’s the wrong way to go about it.

use the ... operator.

Thank you for the splat operator. Performance has improved a bit with the code below. Please let me know if I am doing anything wrong.

@btime final_data1 = stack_Array_1();
  2.353 μs (33 allocations: 1.61 KiB)
@btime final_data2 = stack_Array_2();
  2.274 s (75 allocations: 3.43 GiB)
@btime final_data3 = stack_Array_3();
  668.435 ms (49 allocations: 1.14 GiB)
function stack_Array_1()
  Xaxis = [];
  for i=1:5
    push!(Xaxis,eval(Symbol("data$i")))
  end
  Xaxis
end

function stack_Array_2()
  Yaxis = Array{Float64,2}(0,3072)
  for i=1:5
    Yaxis = vcat(Yaxis,eval(Symbol("data$i")))
  end
  Yaxis
end

function stack_Array_3()
  Xaxis = [];
  for i=1:5
    push!(Xaxis,eval(Symbol("data$i")))
  end
  vcat(Xaxis...)
end

I told you above, yet you are posting the same code (with eval and recursive vcat).

eval is not necessary here. Generally, you should not touch eval except when generating code.
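For contrast, here is a small sketch of the kind of code generation where `@eval` is commonly used: defining several similar functions at top level. The names `double_sin` and `double_cos` are hypothetical and exist only for this illustration.

```julia
# Generate double_sin(x) and double_cos(x) at top level (hypothetical names).
for f in (:sin, :cos)
    name = Symbol("double_", f)
    @eval $name(x) = 2 * $f(x)
end

double_sin(0.0)  # 0.0
double_cos(0.0)  # 2.0
```

Here `@eval` runs once, when the definitions are created, rather than on every call of a function that merely needs to look up data.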

I understand that rand is for the MWE, but just read whatever data structure you have into a vector of arrays, if that’s the most convenient.
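As a rough sketch of that suggestion: `load_batch` below is a hypothetical stand-in for whatever actually reads one batch from disk, and the sizes are shrunk for illustration. The point is that a comprehension collects the batches into a vector of arrays, so no numbered variables are needed.

```julia
# Hypothetical stand-in for the real loader; it just fabricates a small UInt8 matrix.
load_batch(i) = rand(UInt8, 4, 6)

# The comprehension yields a vector of matrices,
# so no data1 ... data5 variables and no eval are needed.
batches = [load_batch(i) for i in 1:5]

X = vcat(batches...)
size(X)  # (20, 6)
```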

@Tamas_Papp

you should not touch eval unless for generated code.

Understood. I will keep this in mind. I didn’t know any other way to iterate over variables in a for loop, hence I used eval.

Below is the code I am working on. I am trying to fix the performance issue with stacking the arrays.

using PyCall
@pyimport pickle

function load_pickle_data(ROOT)
	xs=[]
	ys=[]

 	for b=1:5
		f=joinpath(ROOT, "data_batch_$b")
		X,Y = pickle_batch(f)

		push!(xs,X)
		push!(ys,Y)

	end
	(vcat(xs...),ys)
end

function pickle_batch(file)
	# The do-block closes the file handle even if unpickling throws.
	datadict = open(file,"r") do fo
		pickle.loads(pybytes(read(fo)))
	end
	X=datadict["data"]
	Y=datadict["labels"]
	(X,Y)
end
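If a single `vcat(xs...)` is still too slow, one possible alternative is to allocate the result once and copy each batch into its row range. This is only a sketch under assumptions: the helper name `stack_rows` is hypothetical, all blocks are assumed to have the same size, and the `undef` constructor assumes Julia 0.7 or later.

```julia
# Hypothetical helper: stack equally sized matrices into one preallocated result.
function stack_rows(parts::Vector{Matrix{T}}) where {T}
    nrows, ncols = size(parts[1])
    # Allocate the full output once (requires Julia 0.7+ for `undef`).
    out = Matrix{T}(undef, nrows * length(parts), ncols)
    for (k, p) in enumerate(parts)
        rows = ((k - 1) * nrows + 1):(k * nrows)
        out[rows, :] = p   # copy block k into its row range
    end
    out
end

# Usage with small illustrative blocks.
parts = [fill(UInt8(i), 2, 3) for i in 1:3]
out = stack_rows(parts)
size(out)  # (6, 3)
```

Whether this beats `vcat` in practice would need to be measured with `@btime` on the real data.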

Don’t iteratively create variables in a loop in normal code. Use some other data structure (an array, a dictionary, etcetera).
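A small sketch of the dictionary variant, with hypothetical keys and tiny matrices: the lookup `datasets["data$i"]` takes the place of `eval(Symbol("data$i"))`.

```julia
# A Dict replaces the separate variables data1 ... data5 (hypothetical keys).
datasets = Dict("data$i" => fill(Float64(i), 2, 3) for i in 1:5)

# Look up by key instead of evaluating a variable name.
ordered = [datasets["data$i"] for i in 1:5]
stacked = vcat(ordered...)

size(stacked)   # (10, 3)
stacked[3, 1]   # 2.0, the first row of the second block
```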
