Question about Float64 and Int?

Raymond · May 24, 2022, 5:14pm

Dear all,

I have an Int array and a Float64 array, and I want to merge them into a single matrix.

julia> a =[1, 2, 3];

julia> b =[1.0, 2.0, 3.0];

julia> A =hcat(a, b)
3×2 Matrix{Float64}:
 1.0  1.0
 2.0  2.0
 3.0  3.0

Here I get a Matrix{Float64}, but I want the first column to stay Int64. The result I want is:

1  1.0
2  2.0
3  3.0

How can I write this in Julia?

Thanks in advance.

StevenWhitaker · May 24, 2022, 5:22pm

julia> A = Union{Int,Float64}[a b]
3×2 Matrix{Union{Float64, Int64}}:
 1  1.0
 2  2.0
 3  3.0
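For anyone landing here later: `T[a b]` is typed concatenation. It is the same as `hcat(a, b)`, except that the result's element type is the `T` you give instead of the promoted type. A small sketch comparing the two (the variable names `promoted` and `mixed` are just for illustration):

```julia
# Typed concatenation: T[a b] behaves like hcat(a, b), but the result's
# element type is T instead of the promoted type.
a = [1, 2, 3]          # Vector{Int64}
b = [1.0, 2.0, 3.0]    # Vector{Float64}

promoted = hcat(a, b)               # eltype promoted to Float64
mixed = Union{Int,Float64}[a b]     # eltype is the small Union

println(eltype(promoted))           # Float64
println(eltype(mixed))              # Union{Float64, Int64}

# Each element keeps its own concrete type in the Union matrix.
println(typeof(mixed[1, 1]))        # Int64 (on a 64-bit system)
println(typeof(mixed[1, 2]))        # Float64
```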

Raymond · May 24, 2022, 5:24pm

It works well. Thanks so much.

Palli · May 24, 2022, 5:40pm

The Union array is going to be slower, since its elements are of different types, so if you can work with what hcat gave you, I would consider doing that. You might also want to look into a DataFrame (from DataFrames.jl): it's a table structure, a bit like a Matrix, in which each column has its own type.
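To illustrate the "one type per column" idea without adding a package, here is a minimal sketch that keeps each column as its own concretely typed vector inside a plain NamedTuple. This is only a stand-in for a DataFrame, not how DataFrames.jl works internally (its `DataFrame(a = a, b = b)` constructor does accept the same keyword form). The column names `id`/`value` and the helper `getrow` are hypothetical.

```julia
a = [1, 2, 3]
b = [1.0, 2.0, 3.0]

# Column-oriented "table": each column keeps its own concrete element type.
tbl = (id = a, value = b)

println(eltype(tbl.id))      # Int64 (on a 64-bit system)
println(eltype(tbl.value))   # Float64

# Hypothetical helper: build a NamedTuple for row i from the columns.
getrow(t, i) = map(col -> col[i], t)

row2 = getrow(tbl, 2)
println(row2)                # (id = 2, value = 2.0)

# Column operations run on concretely typed vectors.
total = sum(tbl.value)
println(total)               # 6.0
```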

I did try to make a fully heterogeneous array (but failed), just to find out how I would do it:

julia> A = hcat(Vector{Any}(a), Vector{Any}(b))
3×2 Matrix{Real}:
 1  1.0
 2  2.0
 3  3.0

That, I believe, would be as slow as the Any array had I succeeded (i.e. with Any instead of Real there). I'm not sure whether the Union array will be faster; likely it will be, slightly, since Real is a more general (abstract) type.
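One way to see why a small Union can behave differently from Real or Any is to ask whether Julia stores the element type inline in an array. As I understand it, a Union of bits types (such as Union{Int,Float64}) is stored inline, with a small per-element type tag, while abstract types like Real and Any are stored as pointers to boxed values. The sketch below uses `Base.allocatedinline`, which is an internal, unexported Base function in recent Julia versions, so treat it as a diagnostic rather than a stable API.

```julia
# Check, for each candidate element type, whether it is concrete and
# whether an Array with that eltype stores the values inline
# (as opposed to storing pointers to individually boxed values).
for T in (Float64, Union{Int,Float64}, Real, Any)
    println(rpad(string(T), 24),
            " concrete=", isconcretetype(T),
            "  inline=", Base.allocatedinline(T))
end

# The exported-ish helper for the Union case specifically:
println(Base.isbitsunion(Union{Int,Float64}))   # true
```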

Raymond · May 24, 2022, 7:37pm

Thanks for your help. I tried comparing the two methods:

julia> @time begin
       dataset = Union{Int,Float64}[Tshock_dataset idn time_dataset k_dataset a_dataset zt_dataset zp_dataset l_dataset y_dataset mrpk_dataset rev_dataset constrained_dataset r_dataset logTFPobserved_dataset MIS_dataset RR_dataset yy_dataset YY_dataset KK_dataset LL_dataset AA_dataset mrpl_dataset Psi_dataset c_dataset effective_lambda_dataset CC_dataset COSTRAINED_dataset STD_MRPK_dataset q_dataset]
       end
  0.062220 seconds (666 allocations: 92.163 MiB)
370000×29 Matrix{Union{Float64, Int64}}:
 400      1  390  0.132245  0.132041   -0.0129621  …    1.08316  2027.64  0.8913  0.257829  0
 400      1  391  0.132245  0.132041    0.0836439       1.08316  2025.47  0.892   0.25472   0
 400      1  392  0.132245  0.132041   -0.0129621       1.08316  2019.69  0.8967  0.256161  0
 400      1  393  0.132245  0.132041    0.0836439       1.08316  2021.93  0.896   0.257575  0
 400      1  394  0.132245  0.132041    0.0836439       1.08316  2021.36  0.893   0.258118  0
   ⋮                                    ⋮          ⋱                ⋮                       
 400  10000  423  5.02204   0.01       -0.109568      802.09     2358.2   0.8324  0.463745  0
 400  10000  424  5.02204   0.01        0.0836439     128.534    2366.12  0.8302  0.467081  0
 400  10000  425  5.14429   0.0710204  -0.0129621     145.205    2372.26  0.8277  0.468225  0
 400  10000  426  5.26653   0.0710204   0.18025        60.966    2366.87  0.826   0.472988  0

julia> @time begin
       dataset = hcat(Matrix{Any}(Tshock_dataset), idn, time_dataset, k_dataset, a_dataset, zt_dataset, zp_dataset, l_dataset, y_dataset, mrpk_dataset, rev_dataset, constrained_dataset, r_dataset, logTFPobserved_dataset, MIS_dataset, RR_dataset, yy_dataset, YY_dataset, KK_dataset, LL_dataset, AA_dataset, mrpl_dataset, Psi_dataset, c_dataset, effective_lambda_dataset, CC_dataset, COSTRAINED_dataset, STD_MRPK_dataset, q_dataset)
       end
  0.254061 seconds (9.60 M allocations: 231.259 MiB)
370000×29 Matrix{Any}:
 400      1  390  0.132245  0.132041   -0.0129621  …    1.08316  2027.64  0.8913  0.257829  0
 400      1  391  0.132245  0.132041    0.0836439       1.08316  2025.47  0.892   0.25472   0
 400      1  392  0.132245  0.132041   -0.0129621       1.08316  2019.69  0.8967  0.256161  0
 400      1  393  0.132245  0.132041    0.0836439       1.08316  2021.93  0.896   0.257575  0
 400      1  394  0.132245  0.132041    0.0836439       1.08316  2021.36  0.893   0.258118  0
   ⋮                                    ⋮          ⋱                ⋮                       
 400  10000  423  5.02204   0.01       -0.109568      802.09     2358.2   0.8324  0.463745  0
 400  10000  424  5.02204   0.01        0.0836439     128.534    2366.12  0.8302  0.467081  0
 400  10000  425  5.14429   0.0710204  -0.0129621     145.205    2372.26  0.8277  0.468225  0
 400  10000  426  5.26653   0.0710204   0.18025        60.966    2366.87  0.826   0.472988  0

Raymond · May 24, 2022, 7:39pm

In my code, it seems that Union is faster than hcat.

rafael.guerra · May 24, 2022, 8:07pm

What about:

Real[a b]

StevenWhitaker · May 24, 2022, 8:22pm

julia> Any[a b]
3×2 Matrix{Any}:
 1  1.0
 2  2.0
 3  3.0

EDIT: Alternatively (if the hcat syntactic sugar needs to be avoided for some reason):

julia> A = Matrix{Any}(undef, 3, 2); A[:,1] = a; A[:,2] = b; A
3×2 Matrix{Any}:
 1  1.0
 2  2.0
 3  3.0
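With many columns (29 in the dataset above), the preallocate-and-fill approach can be wrapped in a small function. This is a sketch only; `columns_to_matrix` is a hypothetical helper, not something from Base, and the typed concatenation `T[a b ...]` already does the same job in one line.

```julia
# Hypothetical helper: build a matrix with a chosen element type T from
# any number of equal-length column vectors, by preallocating and
# filling column by column.
function columns_to_matrix(::Type{T}, cols::AbstractVector...) where {T}
    n = length(first(cols))
    all(c -> length(c) == n, cols) ||
        throw(DimensionMismatch("all columns must have the same length"))
    M = Matrix{T}(undef, n, length(cols))
    for (j, c) in enumerate(cols)
        M[:, j] = c   # each value is converted to T on assignment
    end
    return M
end

a = [1, 2, 3]
b = [1.0, 2.0, 3.0]

M = columns_to_matrix(Union{Int,Float64}, a, b)
println(typeof(M))   # Matrix{Union{Float64, Int64}}
```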

Raymond · May 24, 2022, 8:23pm

julia> @time begin
           dataset = Real[Tshock_dataset idn time_dataset k_dataset a_dataset zt_dataset zp_dataset l_dataset y_dataset mrpk_dataset rev_dataset constrained_dataset r_dataset logTFPobserved_dataset MIS_dataset RR_dataset yy_dataset YY_dataset KK_dataset LL_dataset AA_dataset mrpl_dataset Psi_dataset c_dataset effective_lambda_dataset CC_dataset COSTRAINED_dataset STD_MRPK_dataset q_dataset]
       end
  0.490252 seconds (9.60 M allocations: 228.431 MiB, 59.31% gc time)
370000×29 Matrix{Real}:
 400      1  390  0.132245  0.132041   -0.0129621  …    1.08316  2027.64  0.8913  0.257829  0
 400      1  391  0.132245  0.132041    0.0836439       1.08316  2025.47  0.892   0.25472   0
 400      1  392  0.132245  0.132041   -0.0129621       1.08316  2019.69  0.8967  0.256161  0
 400      1  393  0.132245  0.132041    0.0836439       1.08316  2021.93  0.896   0.257575  0
 400      1  394  0.132245  0.132041    0.0836439       1.08316  2021.36  0.893   0.258118  0
   ⋮                                    ⋮          ⋱                ⋮                       
 400  10000  423  5.02204   0.01       -0.109568      802.09     2358.2   0.8324  0.463745  0
 400  10000  424  5.02204   0.01        0.0836439     128.534    2366.12  0.8302  0.467081  0
 400  10000  425  5.14429   0.0710204  -0.0129621     145.205    2372.26  0.8277  0.468225  0
 400  10000  426  5.26653   0.0710204   0.18025        60.966    2366.87  0.826   0.472988  0

Raymond · May 24, 2022, 8:25pm

julia> @time begin
           dataset = Any[Tshock_dataset idn time_dataset k_dataset a_dataset zt_dataset zp_dataset l_dataset y_dataset mrpk_dataset rev_dataset constrained_dataset r_dataset logTFPobserved_dataset MIS_dataset RR_dataset yy_dataset YY_dataset KK_dataset LL_dataset AA_dataset mrpl_dataset Psi_dataset c_dataset effective_lambda_dataset CC_dataset COSTRAINED_dataset STD_MRPK_dataset q_dataset]
       end
  0.188767 seconds (9.60 M allocations: 228.431 MiB)
370000×29 Matrix{Any}:
 400      1  390  0.132245  0.132041   -0.0129621  …    1.08316  2027.64  0.8913  0.257829  0
 400      1  391  0.132245  0.132041    0.0836439       1.08316  2025.47  0.892   0.25472   0
 400      1  392  0.132245  0.132041   -0.0129621       1.08316  2019.69  0.8967  0.256161  0
 400      1  393  0.132245  0.132041    0.0836439       1.08316  2021.93  0.896   0.257575  0
 400      1  394  0.132245  0.132041    0.0836439       1.08316  2021.36  0.893   0.258118  0
   ⋮                                    ⋮          ⋱                ⋮                       
 400  10000  423  5.02204   0.01       -0.109568      802.09     2358.2   0.8324  0.463745  0
 400  10000  424  5.02204   0.01        0.0836439     128.534    2366.12  0.8302  0.467081  0
 400  10000  425  5.14429   0.0710204  -0.0129621     145.205    2372.26  0.8277  0.468225  0
 400  10000  426  5.26653   0.0710204   0.18025        60.966    2366.87  0.826   0.472988  0

Raymond · May 24, 2022, 8:26pm

It seems that Union{Int, Float64} is the fastest way.

StevenWhitaker · May 24, 2022, 8:30pm

True, but I was under the impression that small type unions are still pretty fast these days. See, e.g., "Union-splitting: what it is, and why you should care".
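Roughly speaking, union-splitting means the compiler, when it sees a value of type Union{Int,Float64}, emits a branch on the runtime type and runs specialized code in each branch. The function below spells that branch out by hand, purely as an illustration of the idea; normally you would just write `sum(xs)` and let the compiler do this. `sum_split` is a hypothetical name.

```julia
# Union-splitting written out by hand: branch on the runtime type, then
# run code specialized for that concrete type in each branch.
function sum_split(xs::AbstractVector{Union{Int,Float64}})
    s = 0.0
    for x in xs
        if x isa Int
            s += Float64(x)   # x is known to be Int in this branch
        else
            s += x::Float64   # and known to be Float64 here
        end
    end
    return s
end

xs = Union{Int,Float64}[1, 2.5, 3]
println(sum_split(xs))   # 6.5
```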

StevenWhitaker · May 24, 2022, 8:35pm

By the way, @Raymond, it appears you are timing the creation of dataset, whereas I believe @Palli was talking about how fast methods that operate on dataset would be.

EDIT: I’m referring to this quote:

EDIT: Here’s an illustration.

julia> A = randn(1000, 100); typeof(A)
Matrix{Float64} (alias for Array{Float64, 2})

julia> B = Matrix{Union{Int,Float64}}(A); typeof(B)
Matrix{Union{Float64, Int64}} (alias for Array{Union{Float64, Int64}, 2})

julia> C = Matrix{Real}(A); typeof(C)
Matrix{Real} (alias for Array{Real, 2})

julia> D = Matrix{Any}(A); typeof(D)
Matrix{Any} (alias for Array{Any, 2})

julia> A == B == C == D
true

julia> function f(mat)
           map(mat) do x
               x^2 + sin(13 * x)
           end
       end
f (generic function with 1 method)

julia> using BenchmarkTools

julia> X1 = @btime f($A);
  1.678 ms (2 allocations: 781.30 KiB)

julia> X2 = @btime f($B);
  1.798 ms (2 allocations: 781.30 KiB)

julia> X3 = @btime f($C);
  3.503 ms (100005 allocations: 2.29 MiB)

julia> X4 = @btime f($D);
  3.487 ms (100005 allocations: 2.29 MiB)

julia> X1 == X2 == X3 == X4
true

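So, f is fast with Matrix{Float64} or Matrix{Union{Int,Float64}} inputs (about 1.7 to 1.8 ms, 2 allocations) but roughly twice as slow with Matrix{Real} or Matrix{Any} inputs (about 3.5 ms and 100,005 allocations, i.e. about one per element of the 1000×100 matrix).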
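If a Matrix{Real} or Matrix{Any} in fact holds only Int and Float64 values, converting it to the small Union element type should bring back the inline storage that the fast cases in the benchmark above rely on. A minimal sketch, assuming the data really contain only those two types (otherwise the conversion throws):

```julia
# Narrow an abstractly typed matrix to the small Union element type.
C = Real[1 1.0; 2 2.0; 3 3.0]           # Matrix{Real}
B = Matrix{Union{Int,Float64}}(C)       # converts each element

println(typeof(B))           # Matrix{Union{Float64, Int64}}
println(B == C)              # true
println(typeof(B[1, 1]))     # Int64 (on a 64-bit system)
```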

EDIT (this will be the last one!): If I use a for loop instead of map, f(C) and f(D) take about 3 times longer, and if I use broadcasting instead of a loop, they take an additional 2 times longer! Meanwhile, f(A) and f(B) take approximately the same time in each case. My guess as to why map is faster for f(C) and f(D) is that it introduces a function barrier, though I'm not entirely sure that's the reason.
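For reference, here is a generic sketch of the function-barrier pattern from the Julia performance tips, applied to a Matrix{Any}: the outer function narrows each column to its actual element type, then hands it to an inner function that gets compiled for that concrete type, so dynamic dispatch happens once per column rather than once per element. This is not a claim about what map does internally; `kernel`, `apply_column!` and `f_barrier` are hypothetical names.

```julia
# Same per-element computation as f above.
kernel(x::Real) = x^2 + sin(13 * x)

# Inner function: specialized on the concrete column type
# (e.g. Vector{Float64}), so its loop is type-stable.
function apply_column!(out::AbstractVector, col::AbstractVector)
    for i in eachindex(out, col)
        out[i] = kernel(col[i])
    end
    return out
end

# Outer function: D may be a Matrix{Any}. Each column is narrowed to its
# actual element type, then passed through the barrier.
function f_barrier(D::AbstractMatrix)
    out = Matrix{Float64}(undef, size(D))
    for j in axes(D, 2)
        col = identity.(view(D, :, j))        # e.g. Any -> Vector{Float64}
        apply_column!(view(out, :, j), col)   # one dynamic dispatch per column
    end
    return out
end

A = randn(1000, 100)
D = Matrix{Any}(A)
R = f_barrier(D)
println(typeof(R))   # Matrix{Float64}
```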
