Question about Float64 and Int?

Dear all,

I have an Int array and a Float64 array, and I want to merge them.

julia> a = [1, 2, 3];

julia> b = [1.0, 2.0, 3.0];

julia> A = hcat(a, b)
3×2 Matrix{Float64}:
 1.0  1.0
 2.0  2.0
 3.0  3.0

Here I get a Matrix{Float64}, but I want the first column to stay Int64. The result I want is:

1  1.0
2  2.0
3  3.0

How can I write this in Julia?

Thanks in advance.

julia> A = Union{Int,Float64}[a b]
3×2 Matrix{Union{Float64, Int64}}:
 1  1.0
 2  2.0
 3  3.0
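
For anyone reading later, here is a minimal sketch of why the two forms differ. Plain hcat promotes Int and Float64 to a common element type, while the typed form keeps each value as it is. The variable names `promoted`, `A_plain`, and `A_union` are only for this illustration.

```julia
a = [1, 2, 3]
b = [1.0, 2.0, 3.0]

# hcat picks a common element type via promotion, so the Ints become Float64.
promoted = promote_type(Int, Float64)
A_plain = hcat(a, b)

# The typed form fixes the element type up front, so each value keeps its own type.
A_union = Union{Int,Float64}[a b]

println(eltype(A_plain), " vs ", eltype(A_union))
println(typeof(A_union[1, 1]), " and ", typeof(A_union[1, 2]))
```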

It works well. Thanks so much.

The Union array is going to be slower than a homogeneous one, since it’s heterogeneous, so if you can make do with what hcat gave you, I would consider that. You might also want to look into a DataFrame (from DataFrames.jl): it’s a kind of matrix-like structure where each column has its own type.
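
To illustrate the per-column-type idea without extra packages, here is a minimal sketch that stores the two columns in a NamedTuple of vectors. This is only a stand-in, not DataFrames.jl itself, and the names `cols`, `x`, `y`, and `row` are made up for the example. With DataFrames.jl installed, `DataFrame(x = a, y = b)` should keep the same per-column element types.

```julia
a = [1, 2, 3]
b = [1.0, 2.0, 3.0]

# Column-wise storage: each column keeps its own concrete element type.
cols = (x = a, y = b)

# Element type of every column, returned as a NamedTuple with the same names.
coltypes = map(eltype, cols)

# Hypothetical helper: assemble row i by taking one value from each column.
row(cols, i) = map(c -> c[i], cols)
second_row = row(cols, 2)

println(coltypes)
println(second_row)
```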

I did try to make a fully heterogeneous array (but failed), just to find out how I would do it:

julia> A = hcat(Vector{Any}(a), Vector{Any}(b))
3×2 Matrix{Real}:
 1  1.0
 2  2.0
 3  3.0

That, I believe, would be as slow as the Any array had I succeeded (i.e. with Any instead of Real there). I’m not sure whether the Union array will be slightly faster. It likely is, since Real is a bit more general.
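
A rough way to check the intuition about element types is to ask Julia how each one can be stored. This is a sketch only: `Base.isbitsunion` is an internal, unexported helper and may change between versions, and the variable names are just for illustration.

```julia
# Float64 is a plain bits type, so a Matrix{Float64} stores its values inline.
float_inline = isbitstype(Float64)

# A small union of bits types can also be stored inline, plus a type tag per element.
# Note: Base.isbitsunion is internal and not part of the public API.
union_inline = Base.isbitsunion(Union{Int,Float64})

# Real and Any are abstract, so their arrays hold references to boxed values.
real_concrete = isconcretetype(Real)
any_concrete = isconcretetype(Any)

println((float_inline, union_inline, real_concrete, any_concrete))
```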


Thanks for your help. I tried comparing the two methods.

julia> @time begin
       dataset = Union{Int,Float64}[Tshock_dataset idn time_dataset k_dataset a_dataset zt_dataset zp_dataset l_dataset y_dataset mrpk_dataset rev_dataset constrained_dataset r_dataset logTFPobserved_dataset MIS_dataset RR_dataset yy_dataset YY_dataset KK_dataset LL_dataset AA_dataset mrpl_dataset Psi_dataset c_dataset effective_lambda_dataset CC_dataset COSTRAINED_dataset STD_MRPK_dataset q_dataset]
       end
  0.062220 seconds (666 allocations: 92.163 MiB)
370000×29 Matrix{Union{Float64, Int64}}:
 400      1  390  0.132245  0.132041   -0.0129621  …    1.08316  2027.64  0.8913  0.257829  0
 400      1  391  0.132245  0.132041    0.0836439       1.08316  2025.47  0.892   0.25472   0
 400      1  392  0.132245  0.132041   -0.0129621       1.08316  2019.69  0.8967  0.256161  0
 400      1  393  0.132245  0.132041    0.0836439       1.08316  2021.93  0.896   0.257575  0
 400      1  394  0.132245  0.132041    0.0836439       1.08316  2021.36  0.893   0.258118  0
   ⋮                                    ⋮          ⋱                ⋮                       
 400  10000  423  5.02204   0.01       -0.109568      802.09     2358.2   0.8324  0.463745  0
 400  10000  424  5.02204   0.01        0.0836439     128.534    2366.12  0.8302  0.467081  0
 400  10000  425  5.14429   0.0710204  -0.0129621     145.205    2372.26  0.8277  0.468225  0
 400  10000  426  5.26653   0.0710204   0.18025        60.966    2366.87  0.826   0.472988  0

julia> @time begin
       dataset = hcat(Matrix{Any}(Tshock_dataset), idn, time_dataset, k_dataset, a_dataset, zt_dataset, zp_dataset, l_dataset, y_dataset, mrpk_dataset, rev_dataset, constrained_dataset, r_dataset, logTFPobserved_dataset, MIS_dataset, RR_dataset, yy_dataset, YY_dataset, KK_dataset, LL_dataset, AA_dataset, mrpl_dataset, Psi_dataset, c_dataset, effective_lambda_dataset, CC_dataset, COSTRAINED_dataset, STD_MRPK_dataset, q_dataset)
       end
  0.254061 seconds (9.60 M allocations: 231.259 MiB)
370000×29 Matrix{Any}:
 400      1  390  0.132245  0.132041   -0.0129621  …    1.08316  2027.64  0.8913  0.257829  0
 400      1  391  0.132245  0.132041    0.0836439       1.08316  2025.47  0.892   0.25472   0
 400      1  392  0.132245  0.132041   -0.0129621       1.08316  2019.69  0.8967  0.256161  0
 400      1  393  0.132245  0.132041    0.0836439       1.08316  2021.93  0.896   0.257575  0
 400      1  394  0.132245  0.132041    0.0836439       1.08316  2021.36  0.893   0.258118  0
   ⋮                                    ⋮          ⋱                ⋮                       
 400  10000  423  5.02204   0.01       -0.109568      802.09     2358.2   0.8324  0.463745  0
 400  10000  424  5.02204   0.01        0.0836439     128.534    2366.12  0.8302  0.467081  0
 400  10000  425  5.14429   0.0710204  -0.0129621     145.205    2372.26  0.8277  0.468225  0
 400  10000  426  5.26653   0.0710204   0.18025        60.966    2366.87  0.826   0.472988  0

In my code, it seems that the Union version is faster than the hcat version with Matrix{Any}.

What about:

Real[a b]

julia> Any[a b]
3×2 Matrix{Any}:
 1  1.0
 2  2.0
 3  3.0

EDIT: Alternatively (if the hcat syntactic sugar needs to be avoided for some reason):

julia> A = Matrix{Any}(undef, 3, 2); A[:,1] = a; A[:,2] = b; A
3×2 Matrix{Any}:
 1  1.0
 2  2.0
 3  3.0

julia> @time begin
           dataset = Real[Tshock_dataset idn time_dataset k_dataset a_dataset zt_dataset zp_dataset l_dataset y_dataset mrpk_dataset rev_dataset constrained_dataset r_dataset logTFPobserved_dataset MIS_dataset RR_dataset yy_dataset YY_dataset KK_dataset LL_dataset AA_dataset mrpl_dataset Psi_dataset c_dataset effective_lambda_dataset CC_dataset COSTRAINED_dataset STD_MRPK_dataset q_dataset]
       end
  0.490252 seconds (9.60 M allocations: 228.431 MiB, 59.31% gc time)
370000×29 Matrix{Real}:
 400      1  390  0.132245  0.132041   -0.0129621  …    1.08316  2027.64  0.8913  0.257829  0
 400      1  391  0.132245  0.132041    0.0836439       1.08316  2025.47  0.892   0.25472   0
 400      1  392  0.132245  0.132041   -0.0129621       1.08316  2019.69  0.8967  0.256161  0
 400      1  393  0.132245  0.132041    0.0836439       1.08316  2021.93  0.896   0.257575  0
 400      1  394  0.132245  0.132041    0.0836439       1.08316  2021.36  0.893   0.258118  0
   ⋮                                    ⋮          ⋱                ⋮                       
 400  10000  423  5.02204   0.01       -0.109568      802.09     2358.2   0.8324  0.463745  0
 400  10000  424  5.02204   0.01        0.0836439     128.534    2366.12  0.8302  0.467081  0
 400  10000  425  5.14429   0.0710204  -0.0129621     145.205    2372.26  0.8277  0.468225  0
 400  10000  426  5.26653   0.0710204   0.18025        60.966    2366.87  0.826   0.472988  0

julia> @time begin
           dataset = Any[Tshock_dataset idn time_dataset k_dataset a_dataset zt_dataset zp_dataset l_dataset y_dataset mrpk_dataset rev_dataset constrained_dataset r_dataset logTFPobserved_dataset MIS_dataset RR_dataset yy_dataset YY_dataset KK_dataset LL_dataset AA_dataset mrpl_dataset Psi_dataset c_dataset effective_lambda_dataset CC_dataset COSTRAINED_dataset STD_MRPK_dataset q_dataset]
       end
  0.188767 seconds (9.60 M allocations: 228.431 MiB)
370000×29 Matrix{Any}:
 400      1  390  0.132245  0.132041   -0.0129621  …    1.08316  2027.64  0.8913  0.257829  0
 400      1  391  0.132245  0.132041    0.0836439       1.08316  2025.47  0.892   0.25472   0
 400      1  392  0.132245  0.132041   -0.0129621       1.08316  2019.69  0.8967  0.256161  0
 400      1  393  0.132245  0.132041    0.0836439       1.08316  2021.93  0.896   0.257575  0
 400      1  394  0.132245  0.132041    0.0836439       1.08316  2021.36  0.893   0.258118  0
   ⋮                                    ⋮          ⋱                ⋮                       
 400  10000  423  5.02204   0.01       -0.109568      802.09     2358.2   0.8324  0.463745  0
 400  10000  424  5.02204   0.01        0.0836439     128.534    2366.12  0.8302  0.467081  0
 400  10000  425  5.14429   0.0710204  -0.0129621     145.205    2372.26  0.8277  0.468225  0
 400  10000  426  5.26653   0.0710204   0.18025        60.966    2366.87  0.826   0.472988  0

It seems that Union{Int, Float64} is the fastest way.
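
A single @time call can be noisy (the Real run above spent 59.31% of its time in GC), so here is a small sketch of how the construction could be timed a bit more carefully. The columns `ints` and `floats` and the helpers `build_union` and `build_any` are hypothetical stand-ins, not the real dataset.

```julia
# Hypothetical stand-ins for one Int column and one Float64 column.
ints = collect(1:100_000)
floats = rand(100_000)

# Wrapping the construction in functions keeps the timed code out of global scope.
build_union(i, x) = Union{Int,Float64}[i x]
build_any(i, x) = Any[i x]

# Warm-up calls, so compilation is not part of the measurements below.
U = build_union(ints, floats)
D = build_any(ints, floats)

# Time only the construction itself.
t_union = @elapsed build_union(ints, floats)
t_any = @elapsed build_any(ints, floats)

println("Union: ", t_union, " s, Any: ", t_any, " s")
```

For more stable numbers, BenchmarkTools’ @btime (used further down in this thread) repeats the measurement many times.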

True, but I was under the impression that small type unions were still pretty fast these days. See, e.g., “Union-splitting: what it is, and why you should care”.
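
As a rough picture of what union-splitting means, here is a hand-written version of the branch that the compiler can generate on its own for small unions. It is only an illustration of the idea, not what the compiler literally emits, and `split_sum` is a made-up name.

```julia
# Sum a vector whose elements are either Int or Float64.
function split_sum(v::AbstractVector{Union{Int,Float64}})
    total = 0.0
    for x in v
        if x isa Int
            total += x   # in this branch x is known to be an Int
        else
            total += x   # here x can only be a Float64
        end
    end
    return total
end

v = Union{Int,Float64}[1, 2.5, 3]
s = split_sum(v)
println(s)
```

Each branch works on a concrete type, which is why a two-member union can stay close to the speed of a homogeneous array.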

By the way @Raymond, it appears you are timing the creation of dataset, whereas I believe @Palli was talking about how fast methods that operate on dataset would be.

EDIT: I’m referring to this quote:

EDIT: Here’s an illustration.

julia> A = randn(1000, 100); typeof(A)
Matrix{Float64} (alias for Array{Float64, 2})

julia> B = Matrix{Union{Int,Float64}}(A); typeof(B)
Matrix{Union{Float64, Int64}} (alias for Array{Union{Float64, Int64}, 2})

julia> C = Matrix{Real}(A); typeof(C)
Matrix{Real} (alias for Array{Real, 2})

julia> D = Matrix{Any}(A); typeof(D)
Matrix{Any} (alias for Array{Any, 2})

julia> A == B == C == D
true

julia> function f(mat)
           map(mat) do x
               x^2 + sin(13 * x)
           end
       end
f (generic function with 1 method)

julia> using BenchmarkTools

julia> X1 = @btime f($A);
  1.678 ms (2 allocations: 781.30 KiB)

julia> X2 = @btime f($B);
  1.798 ms (2 allocations: 781.30 KiB)

julia> X3 = @btime f($C);
  3.503 ms (100005 allocations: 2.29 MiB)

julia> X4 = @btime f($D);
  3.487 ms (100005 allocations: 2.29 MiB)

julia> X1 == X2 == X3 == X4
true

So, f is about as fast on Matrix{Union{Int,Float64}} as on Matrix{Float64} (1.798 ms vs. 1.678 ms), but roughly twice as slow on Matrix{Real} or Matrix{Any} inputs (about 3.5 ms, with 100005 allocations instead of 2).

EDIT (this will be the last one!): If I use a for loop instead of map, f(C) and f(D) take about 3 times longer, and if I use broadcasting instead of a loop, they take another 2 times longer! Meanwhile, f(A) and f(B) take approximately the same amount of time in each case. My guess is that map is faster for f(C) and f(D) because I am introducing a function barrier, though I’m not entirely sure that’s the reason.
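
The loop and broadcast versions were not posted, so the following is only a guess at what the three variants might look like. The names `kernel`, `f_map`, `f_loop`, and `f_bcast` are made up, and preallocating a Float64 output in the loop is my assumption.

```julia
# The per-element computation from f above, pulled out into its own function.
kernel(x) = x^2 + sin(13 * x)

# map version, equivalent to f above.
f_map(mat) = map(kernel, mat)

# Loop version (assumed form): fill a preallocated Float64 output.
function f_loop(mat)
    out = similar(mat, Float64)
    for i in eachindex(mat, out)
        out[i] = kernel(mat[i])
    end
    return out
end

# Broadcast version.
f_bcast(mat) = kernel.(mat)

A = randn(20, 5)
C = Matrix{Real}(A)
println(f_map(C) ≈ f_loop(C) ≈ f_bcast(C))
```

In all three, the call to `kernel` is where an element of unknown type meets a specialized method, so timing each with @btime on the A, B, C, and D matrices above would be one way to probe the function-barrier guess.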
