Julia multithreading is running slower than serial, can someone please explain why…?

Dear Julia community this is my third post after my previous post1, post2.

I was unable to replicate my problem in my previous posts. Here, I am going to share my entire algorithm. Please let me know what the problem is.

The algorithm assigns orders to vehicles to minimize the total distance while satisfying the capacity constraints for each vehicle.

Explaining the function definitions

  • function generate_data Creates coordinates randomly in 2d space for the orders placed by customers and the volume of each order. The first coordinate corresponds to the depot location.

  • function main Takes in the generated data and calls switch! number_of_turns times

  • function initial_ans! Assigns orders to vehicles in such a way that it satisfies the capacity constraint

  • function switch! We take an order from one vehicle and put it into another vehicle and accept the solution based on an if condition
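The "if condition" in switch! is the test `rand() <= min(1, exp(-(change / T_c)))` from the code further down. As a minimal sketch of that rule in isolation (the helper names `accept_probability` and `accept_move` are hypothetical and do not appear in the posted code):

```julia
# Acceptance rule mirroring the condition used in switch!.
# `change` is the new total distance minus the old one; `T_c` is the current temperature.
accept_probability(change, T_c) = min(1.0, exp(-change / T_c))

# Accept when a uniform random number falls below the acceptance probability.
accept_move(change, T_c) = rand() <= accept_probability(change, T_c)

accept_probability(-3.0, 15.0)   # improving move: probability is 1.0
accept_probability(15.0, 15.0)   # worsening move: exp(-1), roughly 0.37
```

Moves that shorten the total distance are always accepted, while moves that lengthen it are accepted less and less often as T_c decreases.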

Detailed explanation:

  1. All the order locations are indexed 1:num_of_customers
  2. We declare a matrix vehicle_stops of maximum size num_of_customers x num_of_vehicles
  3. function initial_ans! fills vehicle_stops with stops.
  4. We select vehicle pairs randomly without replacement, and function switch! takes an order from one vehicle (one column of vehicle_stops) and puts it in another vehicle (another column of vehicle_stops). For example, if we have 10 vehicles, we can do a maximum of 5 switches, one per pair.

Our focus is completely on function switch! where the number of switches among all vehicle pairs can be parallelized(one thread for each pair).
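A small sketch of the pairing step may help show why each pair can be handled independently. It assumes `randperm` from the Random standard library as a stand-in for the `StatsBase.sample(...; replace=false)` call used in the posted code, and `draw_vehicle_pairs` is a hypothetical helper name:

```julia
using Random

# Draw `number_of_switches` disjoint vehicle pairs; each column is one (truck1, truck2) pair.
# randperm replaces StatsBase.sample(...; replace=false) here to avoid an extra dependency.
function draw_vehicle_pairs(num_of_vehicles, number_of_switches)
    chosen = randperm(num_of_vehicles)[1:2*number_of_switches]
    return reshape(chosen, 2, number_of_switches)
end

pairs_matrix = draw_vehicle_pairs(10, 5)

# No vehicle appears in two pairs, so each pair touches its own two columns of vehicle_stops.
allunique(pairs_matrix)
```

Because the pairs are disjoint, two loop iterations never write to the same column, which is the property the one-thread-per-pair idea relies on.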

Parallel Results with 5 threads

BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 11.582 s (6.35% GC) to evaluate,
 with a memory estimate of 3.84 GiB, over 36451492 allocations.

Serial Results

BenchmarkTools.Trial: 7 samples with 1 evaluation.
 Range (min … max):  745.871 ms … 865.928 ms  ┊ GC (min … max): 6.14% … 8.47%
 Time  (median):     790.421 ms               ┊ GC (median):    8.14%
 Time  (mean ± σ):   804.462 ms ±  45.499 ms  ┊ GC (mean ± σ):  7.88% ± 1.18%

  █      █             ██                 █              █    █  
  █▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁█ ▁
  746 ms           Histogram: frequency by time          866 ms <

 Memory estimate: 389.25 MiB, allocs estimate: 5000432.

There is no data race; I included the changes suggested in my previous posts. Can someone please let me know what the problem is…?

Complete code

using DataFrames, DelimitedFiles, Distances, LinearAlgebra, StatsBase, Random, Distributions, Plots, BenchmarkTools, CSV
using FLoops, FoldsThreads, ThreadsX, Revise
###########################################################################


export generate_data, initial_ans!, switch!, main

function generate_data(num_of_customers, num_of_vehicles, max_positions, volume, min_demand_volume, max_demand_volume)
    num_of_positions = num_of_customers + 1

    positions = Array{Float64,2}(undef, 2, num_of_positions)
    foreach(x -> positions[x, :] = sample(0:max_positions, num_of_positions, replace=true), axes(positions, 1))

    demand_volume = sample(min_demand_volume:max_demand_volume, num_of_positions, replace=true)

    demand_volume[1] = 0

    return positions, demand_volume
end

function initial_ans!(num_of_vehicles, num_of_customers, demand_volume, vehicle_stops, vehicles_distances, vehicle_num_stops, vehicles_present_inventory, order_index, distance_matrix)

    temp_vehicle_num_stops = deepcopy(vehicle_num_stops)
    temp_vehicles_present_inventory = deepcopy(vehicles_present_inventory)
    temp_vehicle_stops = deepcopy(vehicle_stops)

    @label start_again
    for i in 1:num_of_customers
        temp = vehicles_present_inventory .- demand_volume[order_index[i]]
        free_vehicles = findall(temp .>= 0)

        if sizeof(free_vehicles) > 0
            chosen_vehicle = rand(free_vehicles)

            vehicle_num_stops[chosen_vehicle] = vehicle_num_stops[chosen_vehicle] + 1
            vehicles_present_inventory[chosen_vehicle] = vehicles_present_inventory[chosen_vehicle] - demand_volume[order_index[i]]
            vehicle_stops[vehicle_num_stops[chosen_vehicle],chosen_vehicle] = i
        else

            # Restore in place so the caller's arrays are reset too (rebinding would only change local names)
            vehicle_num_stops .= temp_vehicle_num_stops
            vehicles_present_inventory .= temp_vehicles_present_inventory
            vehicle_stops .= temp_vehicle_stops

            @goto start_again
        end
    end

    for i in axes(vehicle_stops, 2)
        len = vehicle_num_stops[i]
        for j in 2:len
            vehicles_distances[i] = vehicles_distances[i] + distance_matrix[order_index[vehicle_stops[j-1,i]], order_index[vehicle_stops[j,i]]]
        end

        #############################################################
        if len > 0
            vehicles_distances[i] = vehicles_distances[i] + distance_matrix[1, order_index[vehicle_stops[1,i]]]
            vehicles_distances[i] = vehicles_distances[i] + distance_matrix[1, order_index[vehicle_stops[len,i]]]
        end
        #############################################################
    end

end

function switch!(random_vehicle_pairs,vehicle_stops_temp, vehicle_stops, vehicles_distances, vehicle_num_stops, vehicles_present_inventory, order_index, volume, demand_volume, distance_matrix, T_c)

    vehicle_stops_temp .= vehicle_stops

    @floop for idx in axes(random_vehicle_pairs, 2)
        truck1 = random_vehicle_pairs[1, idx]
        truck2 = random_vehicle_pairs[2, idx]

        len1 = vehicle_num_stops[truck1]
        len2 = vehicle_num_stops[truck2]

        vol1 = vehicles_present_inventory[truck1]
        vol2 = vehicles_present_inventory[truck2]

        if (len1 > 0)

            temp_index = rand(1:len1)

            if (vol2 >= demand_volume[order_index[vehicle_stops_temp[temp_index,truck1]]])
                temp = vehicle_stops_temp[temp_index,truck1]

                for j in temp_index:len1
                    vehicle_stops_temp[j,truck1] = vehicle_stops_temp[j+1,truck1]
                end

                len1 = len1 - 1
                vol1 = vol1 + demand_volume[order_index[temp]]

                temp_index2 = 1

                if len2 != 0
                    temp_index2 = rand(1:len2)
                end

                for j in len2:-1:temp_index2
                    vehicle_stops_temp[j+1,truck2] = vehicle_stops_temp[j,truck2]
                end
                
                len2 = len2 + 1

                vehicle_stops_temp[temp_index2,truck2] = temp

                vol2 = vol2 - demand_volume[order_index[temp]]

                distance1 = 0.0
                for i in 2:len1
                    distance1 = distance1 + distance_matrix[order_index[vehicle_stops_temp[i-1,truck1]], order_index[vehicle_stops_temp[i,truck1]]]
                end

                distance2 = 0.0
                for i in 2:len2
                    distance2 = distance2 + distance_matrix[order_index[vehicle_stops_temp[i-1,truck2]], order_index[vehicle_stops_temp[i,truck2]]]
                end

                ##############################################################
                if len1 > 0
                    distance1 = distance1 + distance_matrix[1, order_index[vehicle_stops_temp[1,truck1]]]
                    distance1 = distance1 + distance_matrix[1, order_index[vehicle_stops_temp[len1,truck1]]]
                end

                if len2 > 0
                    distance2 = distance2 + distance_matrix[1, order_index[vehicle_stops_temp[1,truck2]]]
                    distance2 = distance2 + distance_matrix[1, order_index[vehicle_stops_temp[len2,truck2]]]
                end
                ##############################################################

                change = (distance1 + distance2) - (vehicles_distances[truck1] + vehicles_distances[truck2])

                if rand() <= min(1, exp(-(change / T_c)))

                    vehicle_num_stops[truck1] = len1
                    vehicle_num_stops[truck2] = len2

                    vehicles_present_inventory[truck1] = vol1
                    vehicles_present_inventory[truck2] = vol2

                    vehicle_stops[1:len1+1, truck1] .= @view(vehicle_stops_temp[1:len1+1,truck1])
                    vehicle_stops[1:len2, truck2] .= @view(vehicle_stops_temp[1:len2,truck2])

                    vehicles_distances[truck1] = distance1
                    vehicles_distances[truck2] = distance2
                end
            end
        end
    end
end

function main(number_of_turns, T_i, num_of_vehicles, num_of_customers, volume, positions, demand_volume, number_of_switches)

    ################# Declaring the Truck Variables#############################
    vehicle_stops = -ones(Int64, num_of_customers, num_of_vehicles)
    vehicles_distances = zeros(Float64, num_of_vehicles)
    vehicle_num_stops = zeros(Int, num_of_vehicles)
    vehicles_present_inventory = fill(volume, num_of_vehicles)

    order_index = collect(1:num_of_customers) .+ 1

    ########Calculating Distance Matrix####################

    distance_matrix = Symmetric(pairwise(Euclidean(), positions, dims=2))

    ################Creating Initial Solution###################################

    initial_ans!(num_of_vehicles, num_of_customers, demand_volume, vehicle_stops, vehicles_distances, vehicle_num_stops, vehicles_present_inventory, order_index, distance_matrix)

    ################Annealing Parameters########################################

    T_f = 0
    T_c = T_i

    costs = zeros(AbstractFloat, number_of_turns)

    γ = (T_i - T_f) / number_of_turns
    ################Loop###########################################

    vehicle_stops2 = deepcopy(vehicle_stops)

    println("Solver Started")

    for j in 1:number_of_turns

        random_vehicle_pairs = reshape(sample(1:num_of_vehicles, number_of_switches * 2; replace=false), 2, number_of_switches)

        switch!(random_vehicle_pairs, vehicle_stops2, vehicle_stops, vehicles_distances, vehicle_num_stops, vehicles_present_inventory, order_index, volume, demand_volume, distance_matrix, T_c)

        T_c = T_c - γ

        costs[j] = sum(vehicles_distances)
    end

    return costs, vehicle_num_stops, vehicle_stops
end

################Reading Data###############################################

num_of_customers :: Int =100
num_of_vehicles :: Int = 10
max_positions :: Int = 100
volume :: Int = 100
min_demand_volume, max_demand_volume = 1, 10

positions, demand_volume = generate_data(num_of_customers, num_of_vehicles, max_positions, volume, min_demand_volume, max_demand_volume)

###########Input Parameters####################################
number_of_turns::Int = 1000000
T_i::Int = 15
number_of_switches = floor(Int, num_of_vehicles / 2)
##############################################################

@benchmark begin
    costs, vehicle_num_stops, vehicle_stops = main(number_of_turns, T_i, num_of_vehicles, num_of_customers, volume, positions, demand_volume, number_of_switches)
end

Multithreading efficiency can be severely affected by high memory use. Check out the performance tips and try to perform more operations in-place instead of allocating new arrays, that should already improve the situation.

Profiling is a good idea to find the memory and CPU bottlenecks
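As a rough first check before reaching for a full profiler, `@allocated` from Base can compare how much memory two variants of a function allocate. The sketch below uses two hypothetical toy functions (`sum_with_copy`, `sum_in_place`) and a hypothetical wrapper `measure_allocations`; none of them come from the posted code:

```julia
# Allocating variant: builds a temporary array on every call.
function sum_with_copy(v)
    doubled = v .* 2
    return sum(doubled)
end

# In-place variant: accumulates in a scalar, no temporary array.
function sum_in_place(v)
    total = zero(eltype(v))
    for x in v
        total += 2 * x
    end
    return total
end

# Measure inside a function so global-scope effects do not distort the numbers.
measure_allocations(f, v) = @allocated f(v)

v = rand(1000)
measure_allocations(sum_with_copy, v)   # warm-up call (compilation)
measure_allocations(sum_in_place, v)    # warm-up call (compilation)

bytes_with_copy = measure_allocations(sum_with_copy, v)
bytes_in_place  = measure_allocations(sum_in_place, v)
```

Here `bytes_with_copy` should be at least the size of the temporary array (8000 bytes for 1000 Float64 values), while `bytes_in_place` is typically zero after warm-up.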


Of course @gdalle, but in function switch! no new arrays are allocated; in fact, I am using the vehicle_stops_temp variable and the @view() macro for exactly that purpose. If I am wrong, please kindly let me know which line I should be looking at.

Could you profile the code to see if the switch! function has any red descendants in the flame graph? Indeed it seems like it doesn’t allocate, so the problem must come from elsewhere. I have this vague memory that multithreading with @threads or @floop can introduce type instabilities (although IIRC @floop at least warns you about it). What put me on this track is that type instability often results in excessive allocations.

See Type-instability because of @threads boxing variables - #11 by lmiq

From the amount of allocations that smells like a type instability. Check what @code_warntype says about it.
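To illustrate the kind of boxing-induced instability mentioned above, here is a minimal sketch that is independent of the posted code. The function names are hypothetical, and `Base.return_types` is used instead of `@code_warntype` only so the result can be inspected programmatically:

```julia
# A captured variable that is reassigned inside a closure gets boxed (Core.Box),
# so the compiler loses track of its type.
function boxed_sum(xs)
    total = 0.0
    foreach(x -> (total += x), xs)   # reassigns the captured `total`
    return total
end

# Holding the accumulator in a Ref avoids reassigning the captured binding.
function ref_sum(xs)
    total = Ref(0.0)
    foreach(x -> (total[] += x), xs)
    return total[]
end

boxed_type = only(Base.return_types(boxed_sum, (Vector{Float64},)))  # Any on current Julia versions
ref_type   = only(Base.return_types(ref_sum, (Vector{Float64},)))    # Float64
```

Both functions return the same value; only the inferred return type differs, and an `Any` like this is what usually shows up in red in `@code_warntype` output.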


Dear @Imiq,

I ran @code_warntype on the switch! call with number_of_turns = 1, and the results are as follows:

for j in 1:number_of_turns

        random_vehicle_pairs = reshape(sample(1:num_of_vehicles, number_of_switches * 2; replace=false), 2, number_of_switches)

        @code_warntype switch!(random_vehicle_pairs, vehicle_stops2, vehicle_stops, vehicles_distances, vehicle_num_stops, vehicles_present_inventory, order_index, volume, demand_volume, distance_matrix, T_c)

        T_c = T_c - γ

        costs[j] = sum(vehicles_distances)
    end

For Serial

MethodInstance for switch!(::Matrix{Int64}, ::Matrix{Int64}, ::Matrix{Int64}, ::Vector{Float64}, ::Vector{Int64}, ::Vector{Int64}, ::Vector{Int64}, ::Int64, ::Vector{Int64}, ::Symmetric{Float64, Matrix{Float64}}, ::Float64)
  from switch!(random_vehicle_pairs, vehicle_stops_temp, vehicle_stops, vehicles_distances, vehicle_num_stops, vehicles_present_inventory, order_index, volume, demand_volume, distance_matrix, T_c) in Main at c:\Users\Public\Documents\logistics\CVRPVARCHANGED\CVRP2.jl:67
Arguments
  #self#::Core.Const(switch!)
  random_vehicle_pairs::Matrix{Int64}
  vehicle_stops_temp::Matrix{Int64}
  vehicle_stops::Matrix{Int64}
  vehicles_distances::Vector{Float64}
  vehicle_num_stops::Vector{Int64}
  vehicles_present_inventory::Vector{Int64}
  order_index::Vector{Int64}
  volume::Int64
  demand_volume::Vector{Int64}
  distance_matrix::Symmetric{Float64, Matrix{Float64}}
  T_c::Float64
Locals
  @_13::Union{Nothing, Tuple{Int64, Int64}}
  @_14::Union{Nothing, Tuple{Int64, Int64}}
  @_15::Union{Nothing, Tuple{Int64, Int64}}
  @_16::Union{Nothing, Tuple{Int64, Int64}}
  @_17::Union{Nothing, Tuple{Int64, Int64}}
  idx::Int64
  change::Float64
  distance2::Float64
  distance1::Float64
  vol2::Int64
  len2::Int64
  temp_index2::Int64
  vol1::Int64
  len1::Int64
  temp::Int64
  temp_index::Int64
  truck2::Int64
  truck1::Int64
  j@_31::Int64
  j@_32::Int64
  i@_33::Int64
  i@_34::Int64
  @_35::SubArray{Int64, 1, Matrix{Int64}, Tuple{UnitRange{Int64}, Int64}, true}
  @_36::SubArray{Int64, 1, Matrix{Int64}, Tuple{UnitRange{Int64}, Int64}, true}
Body::Nothing
1 ── %1   = Base.broadcasted(Base.identity, vehicle_stops)::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(identity), Tuple{Matrix{Int64}}}
│           Base.materialize!(vehicle_stops_temp, %1)
│    %3   = Main.axes(random_vehicle_pairs, 2)::Base.OneTo{Int64}
│           (@_13 = Base.iterate(%3))
│    %5   = (@_13 === nothing)::Bool
│    %6   = Base.not_int(%5)::Bool
└───        goto #30 if not %6
2 ┄─        Core.NewvarNode(:(@_14))
│           Core.NewvarNode(:(@_15))
│           Core.NewvarNode(:(@_16))
│           Core.NewvarNode(:(@_17))
│           Core.NewvarNode(:(change))
│           Core.NewvarNode(:(distance2))
│           Core.NewvarNode(:(distance1))
│           Core.NewvarNode(:(temp_index2))
│           Core.NewvarNode(:(temp))
│           Core.NewvarNode(:(temp_index))
│    %18  = @_13::Tuple{Int64, Int64}
│           (idx = Core.getfield(%18, 1))
│    %20  = Core.getfield(%18, 2)::Int64
│           (truck1 = Base.getindex(random_vehicle_pairs, 1, idx))
│           (truck2 = Base.getindex(random_vehicle_pairs, 2, idx))
│           (len1 = Base.getindex(vehicle_num_stops, truck1))
│           (len2 = Base.getindex(vehicle_num_stops, truck2))
│           (vol1 = Base.getindex(vehicles_present_inventory, truck1))
│           (vol2 = Base.getindex(vehicles_present_inventory, truck2))
│    %27  = (len1 > 0)::Bool
└───        goto #28 if not %27
3 ── %29  = (1:len1)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│    %30  = Main.rand(%29)::Int64
│    %31  = Base.convert(Main.Int, %30)::Int64
│           (temp_index = Core.typeassert(%31, Main.Int))
│    %33  = vol2::Int64
│    %34  = Base.getindex(vehicle_stops_temp, temp_index, truck1)::Int64
│    %35  = Base.getindex(order_index, %34)::Int64
│    %36  = Base.getindex(demand_volume, %35)::Int64
│    %37  = (%33 >= %36)::Bool
└───        goto #28 if not %37
4 ──        (temp = Base.getindex(vehicle_stops_temp, temp_index, truck1))
│    %40  = (temp_index:len1)::UnitRange{Int64}
│           (@_17 = Base.iterate(%40))
│    %42  = (@_17 === nothing)::Bool
│    %43  = Base.not_int(%42)::Bool
└───        goto #7 if not %43
5 ┄─ %45  = @_17::Tuple{Int64, Int64}
│           (j@_31 = Core.getfield(%45, 1))
│    %47  = Core.getfield(%45, 2)::Int64
│    %48  = (j@_31 + 1)::Int64
│    %49  = Base.getindex(vehicle_stops_temp, %48, truck1)::Int64
│           Base.setindex!(vehicle_stops_temp, %49, j@_31, truck1)
│           (@_17 = Base.iterate(%40, %47))
│    %52  = (@_17 === nothing)::Bool
│    %53  = Base.not_int(%52)::Bool
└───        goto #7 if not %53
6 ──        goto #5
7 ┄─        (len1 = len1 - 1)
│    %57  = vol1::Int64
│    %58  = Base.getindex(order_index, temp)::Int64
│    %59  = Base.getindex(demand_volume, %58)::Int64
│           (vol1 = %57 + %59)
│           (temp_index2 = 1)
│    %62  = (len2 != 0)::Bool
└───        goto #9 if not %62
8 ── %64  = (1:len2)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
└───        (temp_index2 = Main.rand(%64))
9 ┄─ %66  = (len2:-1:temp_index2)::Core.PartialStruct(StepRange{Int64, Int64}, Any[Int64, Core.Const(-1), Int64])
│           (@_16 = Base.iterate(%66))
│    %68  = (@_16 === nothing)::Bool
│    %69  = Base.not_int(%68)::Bool
└───        goto #12 if not %69
10 ┄ %71  = @_16::Tuple{Int64, Int64}
│           (j@_32 = Core.getfield(%71, 1))
│    %73  = Core.getfield(%71, 2)::Int64
│    %74  = Base.getindex(vehicle_stops_temp, j@_32, truck2)::Int64
│    %75  = (j@_32 + 1)::Int64
│           Base.setindex!(vehicle_stops_temp, %74, %75, truck2)
│           (@_16 = Base.iterate(%66, %73))
│    %78  = (@_16 === nothing)::Bool
│    %79  = Base.not_int(%78)::Bool
└───        goto #12 if not %79
11 ─        goto #10
12 ┄        (len2 = len2 + 1)
│           Base.setindex!(vehicle_stops_temp, temp, temp_index2, truck2)
│    %84  = vol2::Int64
│    %85  = Base.getindex(order_index, temp)::Int64
│    %86  = Base.getindex(demand_volume, %85)::Int64
│           (vol2 = %84 - %86)
│           (distance1 = 0.0)
│    %89  = (2:len1)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(2), Int64])
│           (@_15 = Base.iterate(%89))
│    %91  = (@_15 === nothing)::Bool
│    %92  = Base.not_int(%91)::Bool
└───        goto #15 if not %92
13 ┄ %94  = @_15::Tuple{Int64, Int64}
│           (i@_33 = Core.getfield(%94, 1))
│    %96  = Core.getfield(%94, 2)::Int64
│    %97  = distance1::Float64
│    %98  = (i@_33 - 1)::Int64
│    %99  = Base.getindex(vehicle_stops_temp, %98, truck1)::Int64
│    %100 = Base.getindex(order_index, %99)::Int64
│    %101 = Base.getindex(vehicle_stops_temp, i@_33, truck1)::Int64
│    %102 = Base.getindex(order_index, %101)::Int64
│    %103 = Base.getindex(distance_matrix, %100, %102)::Float64
│           (distance1 = %97 + %103)
│           (@_15 = Base.iterate(%89, %96))
│    %106 = (@_15 === nothing)::Bool
│    %107 = Base.not_int(%106)::Bool
└───        goto #15 if not %107
14 ─        goto #13
15 ┄        (distance2 = 0.0)
│    %111 = (2:len2)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(2), Int64])
│           (@_14 = Base.iterate(%111))
│    %113 = (@_14 === nothing)::Bool
│    %114 = Base.not_int(%113)::Bool
└───        goto #18 if not %114
16 ┄ %116 = @_14::Tuple{Int64, Int64}
│           (i@_34 = Core.getfield(%116, 1))
│    %118 = Core.getfield(%116, 2)::Int64
│    %119 = distance2::Float64
│    %120 = (i@_34 - 1)::Int64
│    %121 = Base.getindex(vehicle_stops_temp, %120, truck2)::Int64
│    %122 = Base.getindex(order_index, %121)::Int64
│    %123 = Base.getindex(vehicle_stops_temp, i@_34, truck2)::Int64
│    %124 = Base.getindex(order_index, %123)::Int64
│    %125 = Base.getindex(distance_matrix, %122, %124)::Float64
│           (distance2 = %119 + %125)
│           (@_14 = Base.iterate(%111, %118))
│    %128 = (@_14 === nothing)::Bool
│    %129 = Base.not_int(%128)::Bool
└───        goto #18 if not %129
17 ─        goto #16
18 ┄ %132 = (len1 > 0)::Bool
└───        goto #20 if not %132
19 ─ %134 = distance1::Float64
│    %135 = Base.getindex(vehicle_stops_temp, 1, truck1)::Int64
│    %136 = Base.getindex(order_index, %135)::Int64
│    %137 = Base.getindex(distance_matrix, 1, %136)::Float64
│           (distance1 = %134 + %137)
│    %139 = distance1::Float64
│    %140 = Base.getindex(vehicle_stops_temp, len1, truck1)::Int64
│    %141 = Base.getindex(order_index, %140)::Int64
│    %142 = Base.getindex(distance_matrix, 1, %141)::Float64
└───        (distance1 = %139 + %142)
20 ┄ %144 = (len2 > 0)::Bool
└───        goto #22 if not %144
21 ─ %146 = distance2::Float64
│    %147 = Base.getindex(vehicle_stops_temp, 1, truck2)::Int64
│    %148 = Base.getindex(order_index, %147)::Int64
│    %149 = Base.getindex(distance_matrix, 1, %148)::Float64
│           (distance2 = %146 + %149)
│    %151 = distance2::Float64
│    %152 = Base.getindex(vehicle_stops_temp, len2, truck2)::Int64
│    %153 = Base.getindex(order_index, %152)::Int64
│    %154 = Base.getindex(distance_matrix, 1, %153)::Float64
└───        (distance2 = %151 + %154)
22 ┄ %156 = (distance1 + distance2)::Float64
│    %157 = Base.getindex(vehicles_distances, truck1)::Float64
│    %158 = Base.getindex(vehicles_distances, truck2)::Float64
│    %159 = (%157 + %158)::Float64
│           (change = %156 - %159)
│    %161 = Main.rand()::Float64
│    %162 = (change / T_c)::Float64
│    %163 = -%162::Float64
│    %164 = Main.exp(%163)::Float64
│    %165 = Main.min(1, %164)::Float64
│    %166 = (%161 <= %165)::Bool
└───        goto #28 if not %166
23 ─        Base.setindex!(vehicle_num_stops, len1, truck1)
│           Base.setindex!(vehicle_num_stops, len2, truck2)
│           Base.setindex!(vehicles_present_inventory, vol1, truck1)
│           Base.setindex!(vehicles_present_inventory, vol2, truck2)
│    %172 = (len1 + 1)::Int64
│    %173 = (1:%172)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│    %174 = Base.dotview(vehicle_stops, %173, truck1)::Core.PartialStruct(SubArray{Int64, 1, Matrix{Int64}, Tuple{UnitRange{Int64}, Int64}, true}, Any[Matrix{Int64}, Core.PartialStruct(Tuple{UnitRange{Int64}, Int64}, Any[Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64]), Int64]), Int64, Core.Const(1)])
│           Core.typeassert(true, Core.Bool)
│    %176 = (len1 + 1)::Int64
│    %177 = (1:%176)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│           (@_35 = (view)(vehicle_stops_temp, %177, truck1))
└───        goto #25
24 ─        Core.Const(:(@_35 = false))
25 ┄ %181 = @_35::Core.PartialStruct(SubArray{Int64, 1, Matrix{Int64}, Tuple{UnitRange{Int64}, Int64}, true}, Any[Matrix{Int64}, Core.PartialStruct(Tuple{UnitRange{Int64}, Int64}, Any[Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64]), Int64]), Int64, Core.Const(1)])
│    %182 = Base.broadcasted(Base.identity, %181)::Core.PartialStruct(Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(identity), Tuple{SubArray{Int64, 1, Matrix{Int64}, Tuple{UnitRange{Int64}, Int64}, true}}}, Any[Core.Const(identity), Core.PartialStruct(Tuple{SubArray{Int64, 1, Matrix{Int64}, Tuple{UnitRange{Int64}, Int64}, true}}, Any[Core.PartialStruct(SubArray{Int64, 1, Matrix{Int64}, Tuple{UnitRange{Int64}, Int64}, true}, Any[Matrix{Int64}, Core.PartialStruct(Tuple{UnitRange{Int64}, Int64}, Any[Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64]), Int64]), Int64, Core.Const(1)])]), Core.Const(nothing)])
│           Base.materialize!(%174, %182)
│    %184 = (1:len2)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│    %185 = Base.dotview(vehicle_stops, %184, truck2)::Core.PartialStruct(SubArray{Int64, 1, Matrix{Int64}, Tuple{UnitRange{Int64}, Int64}, true}, Any[Matrix{Int64}, Core.PartialStruct(Tuple{UnitRange{Int64}, Int64}, Any[Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64]), Int64]), Int64, Core.Const(1)])
│           Core.typeassert(true, Core.Bool)
│    %187 = (1:len2)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│           (@_36 = (view)(vehicle_stops_temp, %187, truck2))
└───        goto #27
26 ─        Core.Const(:(@_36 = false))
27 ┄ %191 = @_36::Core.PartialStruct(SubArray{Int64, 1, Matrix{Int64}, Tuple{UnitRange{Int64}, Int64}, true}, Any[Matrix{Int64}, Core.PartialStruct(Tuple{UnitRange{Int64}, Int64}, Any[Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64]), Int64]), Int64, Core.Const(1)])
│    %192 = Base.broadcasted(Base.identity, %191)::Core.PartialStruct(Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(identity), Tuple{SubArray{Int64, 1, Matrix{Int64}, Tuple{UnitRange{Int64}, Int64}, true}}}, Any[Core.Const(identity), Core.PartialStruct(Tuple{SubArray{Int64, 1, Matrix{Int64}, Tuple{UnitRange{Int64}, Int64}, true}}, Any[Core.PartialStruct(SubArray{Int64, 1, Matrix{Int64}, Tuple{UnitRange{Int64}, Int64}, true}, Any[Matrix{Int64}, Core.PartialStruct(Tuple{UnitRange{Int64}, Int64}, Any[Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64]), Int64]), Int64, Core.Const(1)])]), Core.Const(nothing)])
│           Base.materialize!(%185, %192)
│           Base.setindex!(vehicles_distances, distance1, truck1)
└───        Base.setindex!(vehicles_distances, distance2, truck2)
28 ┄        (@_13 = Base.iterate(%3, %20))
│    %197 = (@_13 === nothing)::Bool
│    %198 = Base.not_int(%197)::Bool
└───        goto #30 if not %198
29 ─        goto #2
30 ┄        return nothing

For Parallel

MethodInstance for switch!(::Matrix{Int64}, ::Matrix{Int64}, ::Matrix{Int64}, ::Vector{Float64}, ::Vector{Int64}, ::Vector{Int64}, ::Vector{Int64}, ::Int64, ::Vector{Int64}, ::Symmetric{Float64, Matrix{Float64}}, ::Float64)
  from switch!(random_vehicle_pairs, vehicle_stops_temp, vehicle_stops, vehicles_distances, vehicle_num_stops, vehicles_present_inventory, order_index, volume, demand_volume, distance_matrix, T_c) in Main at c:\Users\Public\Documents\logistics\CVRPVARCHANGED\CVRP2.jl:67
Arguments
  #self#::Core.Const(switch!)
  random_vehicle_pairs::Matrix{Int64}
  vehicle_stops_temp::Matrix{Int64}
  vehicle_stops::Matrix{Int64}
  vehicles_distances::Vector{Float64}
  vehicle_num_stops::Vector{Int64}
  vehicles_present_inventory::Vector{Int64}
  order_index::Vector{Int64}
  volume::Int64
  demand_volume::Vector{Int64}
  distance_matrix::Symmetric{Float64, Matrix{Float64}}
  T_c::Float64
Locals
  result#351::Any
  context_function#352::var"##context_function#352#58"
  __##combine_function#350::var"#__##combine_function#350#57"
  __##reducing_function#349::var"#__##reducing_function#349#56"{Matrix{Int64}, Matrix{Int64}, Matrix{Int64}, Vector{Float64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Symmetric{Float64, Matrix{Float64}}, Float64}
  __##oninit_function#348::var"#__##oninit_function#348#55"
Body::Any
1 ─ %1  = Base.broadcasted(Base.identity, vehicle_stops)::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(identity), Tuple{Matrix{Int64}}}
│         Base.materialize!(vehicle_stops_temp, %1)
│         (__##oninit_function#348 = %new(Main.:(var"#__##oninit_function#348#55")))
│   %4  = Main.:(var"#__##reducing_function#349#56")::Core.Const(var"#__##reducing_function#349#56")
│   %5  = Core.typeof(random_vehicle_pairs)::Core.Const(Matrix{Int64})
│   %6  = Core.typeof(vehicle_stops_temp)::Core.Const(Matrix{Int64})
│   %7  = Core.typeof(vehicle_stops)::Core.Const(Matrix{Int64})
│   %8  = Core.typeof(vehicles_distances)::Core.Const(Vector{Float64})
│   %9  = Core.typeof(vehicle_num_stops)::Core.Const(Vector{Int64})
│   %10 = Core.typeof(vehicles_present_inventory)::Core.Const(Vector{Int64})
│   %11 = Core.typeof(order_index)::Core.Const(Vector{Int64})
│   %12 = Core.typeof(demand_volume)::Core.Const(Vector{Int64})
│   %13 = Core.typeof(distance_matrix)::Core.Const(Symmetric{Float64, Matrix{Float64}})
│   %14 = Core.typeof(T_c)::Core.Const(Float64)
│   %15 = Core.apply_type(%4, %5, %6, %7, %8, %9, %10, %11, %12, %13, %14)::Core.Const(var"#__##reducing_function#349#56"{Matrix{Int64}, Matrix{Int64}, Matrix{Int64}, Vector{Float64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Symmetric{Float64, Matrix{Float64}}, Float64})
│         (__##reducing_function#349 = %new(%15, random_vehicle_pairs, vehicle_stops_temp, vehicle_stops, vehicles_distances, vehicle_num_stops, vehicles_present_inventory, order_index, demand_volume, distance_matrix, T_c))
│         (__##combine_function#350 = %new(Main.:(var"#__##combine_function#350#57")))
│         (context_function#352 = %new(Main.:(var"##context_function#352#58")))
│         (FLoops.verify_no_boxes)(__##reducing_function#349, context_function#352)
│   %20 = __##oninit_function#348::Core.Const(var"#__##oninit_function#348#55"())
│   %21 = (Transducers.whencombine)(__##combine_function#350, __##reducing_function#349)::Transducers.AdHocRF{Nothing, typeof(identity), var"#__##reducing_function#349#56"{Matrix{Int64}, Matrix{Int64}, Matrix{Int64}, Vector{Float64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Symmetric{Float64, Matrix{Float64}}, Float64}, typeof(identity), typeof(identity), var"#__##combine_function#350#57"}
│   %22 = (Transducers.wheninit)(%20, %21)::Transducers.AdHocRF{var"#__##oninit_function#348#55", typeof(identity), var"#__##reducing_function#349#56"{Matrix{Int64}, Matrix{Int64}, Matrix{Int64}, Vector{Float64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Symmetric{Float64, Matrix{Float64}}, Float64}, typeof(identity), typeof(identity), var"#__##combine_function#350#57"}
│   %23 = Main.axes(random_vehicle_pairs, 2)::Base.OneTo{Int64}
│         (result#351 = (FLoops._fold)(%22, %23, nothing, Val{false}()))
│   %25 = (result#351 isa FLoops.Return)::Bool
└──       goto #3 if not %25
2 ─ %27 = Base.getproperty(result#351::FLoops.Return, :value)::Any
└──       return %27
3 ─ %29 = Main.nothing::Core.Const(nothing)
└──       return %29

I am not sure how to interpret the results. For example,

  1. In the serial results, @_13::Union{Nothing, Tuple{Int64, Int64}} seems to be bad because it was shown in yellow, but I’m not sure which line it corresponds to.
  2. In parallel
Locals
  result#351::Any
__##oninit_function#348::var"#__##oninit_function#348#55"
Body::Any
2 ─ %27 = Base.getproperty(result#351::FLoops.Return, :value)::Any

seem to be horrible since they are shown in red, but I’m not sure what they point to either. Please let me know if this new info is of any help.

Yes, exactly, those horrible Anys mean that the compiler cannot infer the type of the variables.

The most probable causes are the use of some non-constant global variable in your code, or a struct field declared with an abstract type (or with no type at all).

This is certainly related:

costs = zeros(AbstractFloat, number_of_turns)

You want zeros(number_of_turns), which is the same as zeros(Float64, number_of_turns)

Maybe fixing that solves the whole thing, maybe there are other places where abstract types are used (the code is quite long).
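To make the difference concrete, here is a minimal sketch (the variable names are illustrative and not taken from the posted code). It only compares the element types of the two arrays; an array with an abstract element type has to store references to boxed values, which is probably also where the allocations on the `costs[j] = sum(vehicles_distances)` line come from.

```julia
# Abstract element type: each entry is a reference to a separately boxed value.
costs_abstract = zeros(AbstractFloat, 4)

# Concrete element type: the Float64 values are stored inline.
costs_concrete = zeros(4)

println(eltype(costs_abstract), " is concrete? ", isconcretetype(eltype(costs_abstract)))
println(eltype(costs_concrete), " is concrete? ", isconcretetype(eltype(costs_concrete)))
```

Both arrays hold the same values, so switching to `zeros(number_of_turns)` should not change the results of the algorithm, only how the values are stored.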


Serially, each call to switch! takes less than a microsecond. Itโ€™s called 1 million times in this MWE. It could very well be that the parallel overhead is as large or larger than the runtime, so it dominates the total run time.


Dear @Imiq, there is no improvement even after avoiding abstract types.

  1. In the serial code,
  @_13::Union{Nothing, Tuple{Int64, Int64}}
  @_14::Union{Nothing, Tuple{Int64, Int64}}
  @_15::Union{Nothing, Tuple{Int64, Int64}}
  @_16::Union{Nothing, Tuple{Int64, Int64}}
  @_17::Union{Nothing, Tuple{Int64, Int64}}

refer to the loops
for idx in axes(random_vehicle_pairs, 2), for j in temp_index:len1, for j in len2:-1:temp_index2, for i in 2:len1, and for i in 2:len2, respectively.

  2. Coming to the parallel code, even an empty loop like this gives the same results:
@floop for idx in axes(random_vehicle_pairs, 2)

end

This suggests there is no type instability in function switch!. There are no non-constant global variables, and I replaced all the abstract types, but the results do not change: the parallel code is still slow.

I haven't looked so closely, but are you sure there are no data races? Is this because you see identical results with and without threading (if so, then sure, maybe there are none)? I'm suspicious of that, mostly because of lines like

for j in temp_index:len1
    vehicle_stops_temp[j, truck1] = vehicle_stops_temp[j+1, truck1]
end

where vehicle_stops_temp is defined outside of the loop. For example, if I just run the code three times:

costs1, vehicle_num_stops1, vehicle_stops1 = main(number_of_turns, T_i, num_of_vehicles, num_of_customers, volume, positions, demand_volume, number_of_switches)
costs2, vehicle_num_stops2, vehicle_stops2 = main(number_of_turns, T_i, num_of_vehicles, num_of_customers, volume, positions, demand_volume, number_of_switches)
costs3, vehicle_num_stops3, vehicle_stops3 = main(number_of_turns, T_i, num_of_vehicles, num_of_customers, volume, positions, demand_volume, number_of_switches)
julia> hcat(costs1, costs2, costs3)
1000000×3 Matrix{Float64}:
 5739.4   5914.26  6034.77
 5587.15  5758.69  5908.48
 5587.15  5805.08  5797.41
 5585.36  5801.06  5827.54
 5568.91  5808.86  5779.2
 5386.0   5785.66  5721.19
 5340.41  5796.75  5736.34
 5307.99  5789.35  5700.48
 5232.2   5650.29  5701.17
 5211.24  5641.77  5616.38
 5242.52  5604.95  5550.35
 5285.75  5594.49  5439.69
 5264.75  5307.34  5414.39
 5260.68  5308.1   5366.54
 5260.68  5199.42  5335.99
 5252.42  5079.96  5234.36
 5049.14  5069.46  5166.94
 4932.98  5018.88  5242.17
 4880.07  4960.97  5196.72
 4880.07  4949.89  5126.05
 4832.5   4939.42  5205.64
 4847.91  4905.74  5146.35
 4793.72  4901.13  5146.26
 4815.22  4910.15  5162.53
 4795.01  4868.45  5068.94
    ⋮
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97
 1716.92  1634.65  1632.97

If I now do the same, but without any threading:

julia> hcat(costs1, costs2, costs3)
1000000×3 Matrix{Float64}:
 5998.52  6155.59  5586.12
 5849.63  6080.77  5565.52
 5683.05  5849.73  5560.81
 5673.56  5598.09  5551.8
 5675.97  5528.28  5460.11
 5572.2   5508.27  5417.2
 5516.84  5455.25  5401.11
 5519.79  5370.76  5164.08
 5471.5   5329.46  5077.7
 5470.84  5220.71  5001.69
 5286.13  5183.38  5008.48
 5279.32  5152.49  4978.25
 5294.28  5102.16  4909.94
 5289.8   5037.05  4780.39
 5316.29  4987.82  4780.39
 5316.29  4942.05  4795.06
 5323.78  4949.77  4788.56
 5259.22  4971.42  4780.47
    ⋮
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54
 1647.54  1644.91  1665.54

Should the results be so different with threading?

Also, as mentioned before and by @sgaure, switch! is already so fast that I don't see the point in trying to use multithreading here. Multithreading is not always going to help you. Consider this very simple example as a case where multithreading makes things slower:

julia> using BenchmarkTools, ThreadsX

julia> x = rand(100000);

julia> @btime sum($x)
  7.100 μs (0 allocations: 0 bytes)
50088.02082567231

julia> @btime ThreadsX.sum($x)
  48.100 μs (2212 allocations: 180.92 KiB)
50088.02082567231

I would just keep your serial code, refine it as needed, and stop there.


Dear @DanielVandH,

  1. There is no data race because every thread works on a different column pair of vehicle_stops_temp.
  2. The results don't match because this algorithm randomly searches the solution space. The aim is to reduce the cost at every iteration.
  3. I want to think the same, but my benchmarks for the parallel code show too many allocations happening, which should not be the case. I am trying very hard to understand the reason for these allocations.
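The first point can be checked on a toy case. The following is only a sketch under the assumption that each loop iteration writes to nothing but its own two columns; `update_pair!` and `update_all!` are hypothetical stand-ins, not the posted switch!. If the columns are disjoint, the threaded and serial versions should produce the same matrix.

```julia
using Base.Threads

# Hypothetical per-pair work: touches only the two columns named by pair `idx`.
function update_pair!(A, pairs, idx)
    c1, c2 = pairs[1, idx], pairs[2, idx]
    for row in axes(A, 1)
        A[row, c1] += idx
        A[row, c2] -= idx
    end
    return nothing
end

function update_all!(A, pairs; threaded=false)
    if threaded
        @threads for idx in axes(pairs, 2)
            update_pair!(A, pairs, idx)
        end
    else
        for idx in axes(pairs, 2)
            update_pair!(A, pairs, idx)
        end
    end
    return A
end

pairs = reshape(collect(1:10), 2, 5)     # every column index appears exactly once
serial_result   = update_all!(zeros(Int, 4, 10), pairs)
threaded_result = update_all!(zeros(Int, 4, 10), pairs; threaded=true)
println(serial_result == threaded_result)
```

This only covers the writes to the matrix. Shared vectors indexed by truck number follow the same reasoning as long as no truck appears in two pairs.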

In that case, follow the suggestion of @gdalle and profile the parallel code to see what the main problem is. @gdalle's link is useful to look at, and the allocation profiler is described here: Profiling · The Julia Language.
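As a rough sketch of what such a profiling session could look like, using the `Profile` standard library: `toy_work` is a hypothetical stand-in for the real `main(...)` call, and the allocation profiler part assumes Julia 1.8 or newer.

```julia
using Profile

# Hypothetical stand-in for the real workload: allocates a small vector per iteration.
function toy_work(n)
    total = 0.0
    for _ in 1:n
        v = rand(8)            # deliberate allocation so the profilers have something to report
        total += sum(v)
    end
    return total
end

toy_work(10)                   # run once so compilation is not profiled

# CPU-time profile: shows where the samples land.
Profile.clear()
@profile toy_work(10^6)
Profile.print(maxdepth=12)

# Allocation profile: sample_rate=1 records every allocation.
Profile.Allocs.clear()
Profile.Allocs.@profile sample_rate=1 toy_work(100)
results = Profile.Allocs.fetch()
println("recorded allocations: ", length(results.allocs))
```

Replacing `toy_work(...)` with the threaded call should show whether the extra allocations come from the loop body or from the threading machinery around it.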

You are trying to parallelize this loop:

for idx in axes(random_vehicle_pairs, 2)

but you have:

size(random_vehicle_pairs) = (2, 5)

and each iteration of the loop is fast. So it is very unlikely that this parallelization will help at all, unless your "real" examples have a much greater number of loop iterations. And, in that case, you should try the parallelization with the real numbers.
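The effect can be reproduced with a small sketch. It uses `Threads.@spawn` directly rather than FLoops, and `tiny_work` is a hypothetical workload chosen to be very cheap, so it illustrates the scheduling cost in general rather than what `@floop` does internally.

```julia
using Base.Threads

# Hypothetical tiny workload, standing in for one fast loop iteration.
tiny_work(i) = sqrt(i) + 1.0

serial_total(n) = sum(tiny_work(i) for i in 1:n)

# One task per iteration: the scheduling cost is paid n times.
function spawned_total(n)
    tasks = [Threads.@spawn tiny_work(i) for i in 1:n]
    return sum(fetch(t) for t in tasks)
end

n = 5                                   # same count as size(random_vehicle_pairs, 2)
serial_total(n); spawned_total(n)       # warm up to exclude compilation

t_serial  = @elapsed for _ in 1:10_000; serial_total(n); end
t_spawned = @elapsed for _ in 1:10_000; spawned_total(n); end
println("serial:  ", t_serial, " s")
println("spawned: ", t_spawned, " s")
```

The timings depend on the machine and the number of threads, so no numbers are quoted here. Threading can only pay off once the work per task clearly exceeds the cost of creating and scheduling that task.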


One idea to test for data races is to preset the random number generators with a specific seed. Unless I'm mistaken, if you added something like

rng = MersenneTwister(54321)

to the beginning of every function that needs a random value and then called

rand(rng, 1)  #as an example

then you should get the same result every time.

It would probably be worth trying this out in the serial version first to make sure you're getting identical output on each run, and then trying it with the parallel version. If the results are different, then you probably have a data race.
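A minimal sketch of the serial half of that check follows. As a variation on the suggestion, the generator is created once and passed in as an argument instead of being recreated inside each function; `random_walk` is a hypothetical stand-in for a stochastic search step.

```julia
using Random

# Hypothetical stochastic routine: all randomness is drawn from the `rng` argument.
function random_walk(rng, steps)
    position = 0
    for _ in 1:steps
        position += rand(rng, -1:1)
    end
    return position
end

# Same seed means the same sequence of draws, so both runs must agree.
run1 = random_walk(MersenneTwister(54321), 1_000)
run2 = random_walk(MersenneTwister(54321), 1_000)
println(run1 == run2)
```

For the threaded version, a single `MersenneTwister` should not be shared between tasks. One option, stated here as an assumption rather than a tested recipe, is to create one seeded generator per loop index so that each pair always sees the same random stream.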

It might also be worth it to do @views function switch!(...) to make sure that you're not unnecessarily copying data.
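The difference between a slice and a view can be seen in a few lines. This is a small illustrative sketch, independent of the posted code.

```julia
A = collect(reshape(1:12, 3, 4))

col_copy = A[:, 2]            # plain slicing allocates a new vector
@views col_view = A[:, 2]     # with @views the same syntax yields a SubArray

col_copy[1] = -1              # changes only the copy
col_view[2] = -2              # writes through to A

println(A[1, 2], " ", A[2, 2])
println(col_view isa SubArray)
```

In the posted switch!, the indexing is scalar apart from the two lines that already use @view, so the gain from @views there may be small.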

By the way, if you increase the number of vehicles and profile the code, you'll see that most of the time is spent in this line:

    vehicle_stops_temp .= vehicle_stops

(even in the serial version). Thus, parallelizing the loop, which comes after that, won't help much.


Here is the allocation file for the algorithm.

For Serial

        - using DataFrames, DelimitedFiles, Distances, LinearAlgebra, StatsBase, Random, Distributions, Plots, BenchmarkTools, CSV
        - using FLoops, FoldsThreads, ThreadsX, Revise
        - ###########################################################################
        - 
        - using DataFrames, Distances, LinearAlgebra, Random, Distributions, CSV
        - 
        - using FLoops, FoldsThreads, ThreadsX, Revise
        - 
        - export generate_data, initial_ans!, switch!, main
        - 
        - function generate_data(num_of_customers, num_of_vehicles, max_positions, volume, min_demand_volume, max_demand_volume)
        0     num_of_positions = num_of_customers + 1
        - 
     1808     positions = Array{Float64,2}(undef, 2, num_of_positions)
        0     foreach(x -> positions[x, :] = sample(0:max_positions, num_of_positions, replace=true), axes(positions, 1))
        - 
      896     demand_volume = sample(min_demand_volume:max_demand_volume, num_of_positions, replace=true)
        - 
        0     demand_volume[1] = 0
        - 
        0     return positions, demand_volume
        - end
        - 
        - function initial_ans!(num_of_vehicles, num_of_customers, demand_volume, vehicle_stops, vehicles_distances, vehicle_num_stops, vehicles_present_inventory, order_index, distance_matrix)
        - 
      336     temp_vehicle_num_stops = deepcopy(vehicle_num_stops)
      336     temp_vehicles_present_inventory = deepcopy(vehicles_present_inventory)
      336     temp_vehicle_stops = deepcopy(vehicle_stops)
        - 
        -     @label start_again
        0     for i in 1:num_of_customers
    14400         temp = vehicles_present_inventory .- demand_volume[order_index[i]]
    24000         free_vehicles = findall(temp .>= 0)
        - 
        0         if sizeof(free_vehicles) > 0
        0             chosen_vehicle = rand(free_vehicles)
        - 
        0             vehicle_num_stops[chosen_vehicle] = vehicle_num_stops[chosen_vehicle] + 1
        0             vehicles_present_inventory[chosen_vehicle] = vehicles_present_inventory[chosen_vehicle] - demand_volume[order_index[i]]
        0             vehicle_stops[vehicle_num_stops[chosen_vehicle],chosen_vehicle] = i
        -         else
        - 
        0             vehicle_num_stops = deepcopy(temp_vehicle_num_stops)
        0             vehicles_present_inventory = deepcopy(temp_vehicles_present_inventory)
        0             vehicle_stops = deepcopy(temp_vehicle_stops)
        - 
        -             @goto start_again
        -         end
        0     end
        - 
        0     for i in axes(vehicle_stops, 2)
        0         len = vehicle_num_stops[i]
        0         for j in 2:len
        0             vehicles_distances[i] = vehicles_distances[i] + distance_matrix[order_index[vehicle_stops[j-1,i]], order_index[vehicle_stops[j,i]]]
        0         end
        - 
        -         #############################################################
        0         if len > 0
        0             vehicles_distances[i] = vehicles_distances[i] + distance_matrix[1, order_index[vehicle_stops[1,i]]]
        0             vehicles_distances[i] = vehicles_distances[i] + distance_matrix[1, order_index[vehicle_stops[len,i]]]
        -         end
        -         #############################################################
        0     end
        - 
        - end
        - 
        - function switch!(random_vehicle_pairs,vehicle_stops_temp, vehicle_stops, vehicles_distances, vehicle_num_stops, vehicles_present_inventory, order_index, volume, demand_volume, distance_matrix, T_c)
        - 
        0     vehicle_stops_temp .= vehicle_stops
        - 
        -     # @floop 
        0     for idx in axes(random_vehicle_pairs, 2)
        0         truck1 = random_vehicle_pairs[1, idx]
        0         truck2 = random_vehicle_pairs[2, idx]
        - 
        0         len1 = vehicle_num_stops[truck1]
        0         len2 = vehicle_num_stops[truck2]
        - 
        0         vol1 = vehicles_present_inventory[truck1]
        0         vol2 = vehicles_present_inventory[truck2]
        - 
        0         if (len1 > 0)
        - 
        0             temp_index = rand(1:len1)
        - 
        0             if (vol2 >= demand_volume[order_index[vehicle_stops_temp[temp_index,truck1]]])
        0                 temp = vehicle_stops_temp[temp_index,truck1]
        - 
        0                 for j in temp_index:len1
        0                     vehicle_stops_temp[j,truck1] = vehicle_stops_temp[j+1,truck1]
        0                 end
        - 
        0                 len1 = len1 - 1
        0                 vol1 = vol1 + demand_volume[order_index[temp]]
        - 
        -                 temp_index2 = 1
        - 
        0                 if len2 != 0
        0                     temp_index2 = rand(1:len2)
        -                 end
        - 
        0                 for j in len2:-1:temp_index2
        0                     vehicle_stops_temp[j+1,truck2] = vehicle_stops_temp[j,truck2]
        0                 end
        -                 
        0                 len2 = len2 + 1
        - 
        0                 vehicle_stops_temp[temp_index2,truck2] = temp
        - 
        0                 vol2 = vol2 - demand_volume[order_index[temp]]
        - 
        -                 distance1 = 0.0
        0                 for i in 2:len1
        0                     distance1 = distance1 + distance_matrix[order_index[vehicle_stops_temp[i-1,truck1]], order_index[vehicle_stops_temp[i,truck1]]]
        0                 end
        - 
        0                 distance2 = 0.0
        0                 for i in 2:len2
        0                     distance2 = distance2 + distance_matrix[order_index[vehicle_stops_temp[i-1,truck2]], order_index[vehicle_stops_temp[i,truck2]]]
        0                 end
        - 
        -                 ##############################################################
        0                 if len1 > 0
        0                     distance1 = distance1 + distance_matrix[1, order_index[vehicle_stops_temp[1,truck1]]]
        0                     distance1 = distance1 + distance_matrix[1, order_index[vehicle_stops_temp[len1,truck1]]]
        -                 end
        - 
        0                 if len2 > 0
        0                     distance2 = distance2 + distance_matrix[1, order_index[vehicle_stops_temp[1,truck2]]]
        0                     distance2 = distance2 + distance_matrix[1, order_index[vehicle_stops_temp[len2,truck2]]]
        -                 end
        -                 ##############################################################
        - 
        0                 change = (distance1 + distance2) - (vehicles_distances[truck1] + vehicles_distances[truck2])
        - 
        0                 if rand() <= min(1, exp(-(change / T_c)))
        - 
        0                     vehicle_num_stops[truck1] = len1
        0                     vehicle_num_stops[truck2] = len2
        - 
        0                     vehicles_present_inventory[truck1] = vol1
        0                     vehicles_present_inventory[truck2] = vol2
        - 
        0                     vehicle_stops[1:len1+1, truck1] .= @view(vehicle_stops_temp[1:len1+1,truck1])
        0                     vehicle_stops[1:len2, truck2] .= @view(vehicle_stops_temp[1:len2,truck2])
        - 
        0                     vehicles_distances[truck1] = distance1
        0                     vehicles_distances[truck2] = distance2
        -                 end
        -             end
        -         end
        0     end
        - end
        - 
        - function main(number_of_turns, T_i, num_of_vehicles, num_of_customers, volume, positions, demand_volume, number_of_switches)
        - 
        -     ################# Declaring the Truck Variables#############################
     8128     vehicle_stops = -ones(Int64, num_of_customers, num_of_vehicles)
      144     vehicles_distances = zeros(Float64, num_of_vehicles)
      144     vehicle_num_stops = zeros(Int, num_of_vehicles)
      144     vehicles_present_inventory = fill(volume, num_of_vehicles)
        - 
     1792     order_index = collect(1:num_of_customers) .+ 1
        - 
        -     ########Calculating Distance Matrix####################
        - 
        0     distance_matrix = Symmetric(pairwise(Euclidean(), positions, dims=2))
        - 
        -     ################Creating Initial Solution###################################
        - 
        0     initial_ans!(num_of_vehicles, num_of_customers, demand_volume, vehicle_stops, vehicles_distances, vehicle_num_stops, vehicles_present_inventory, order_index, distance_matrix)
        - 
        -     ################Annealing Parameters########################################
        - 
        -     T_f = 0
        -     T_c = T_i
        - 
  8000048     costs = zeros(AbstractFloat, number_of_turns)
        - 
        0     γ = (T_i - T_f) / number_of_turns
        -     ################Loop###########################################
        - 
      336     vehicle_stops2 = deepcopy(vehicle_stops)
        - 
       48     println("Solver Started")
        - 
        0     for j in 1:number_of_turns
        - 
240000000         random_vehicle_pairs = reshape(sample(1:num_of_vehicles, number_of_switches * 2; replace=false), 2, number_of_switches)
        - 
        0         switch!(random_vehicle_pairs, vehicle_stops2, vehicle_stops, vehicles_distances, vehicle_num_stops, vehicles_present_inventory, order_index, volume, demand_volume, distance_matrix, T_c)
        - 
        0         T_c = T_c - γ
        - 
 16000000         costs[j] = sum(vehicles_distances)
        0     end
        - 
        0     return costs, vehicle_num_stops, vehicle_stops
        - end
        - 
        - ################Reading Data###############################################
        - 
        - num_of_customers :: Int =100
        - num_of_vehicles :: Int = 10
        - max_positions :: Int = 100
        - volume :: Int = 100
        - min_demand_volume, max_demand_volume = 1, 10
        - 
        - positions, demand_volume = generate_data(num_of_customers, num_of_vehicles, max_positions, volume, min_demand_volume, max_demand_volume)
        - 
        - ###########Input Parameters####################################
        - number_of_turns::Int = 1000000
        - T_i::Int = 15
        - number_of_switches = floor(Int, num_of_vehicles / 2)
        - ##############################################################
        - 
        - # @benchmark begin
        -     costs, vehicle_num_stops, vehicle_stops = main(number_of_turns, T_i, num_of_vehicles, num_of_customers, volume, positions, demand_volume, number_of_switches)
        - # end

For Parallel

        - using DataFrames, DelimitedFiles, Distances, LinearAlgebra, StatsBase, Random, Distributions, Plots, BenchmarkTools, CSV
        - using FLoops, FoldsThreads, ThreadsX, Revise
        - ###########################################################################
        - 
        - using DataFrames, Distances, LinearAlgebra, Random, Distributions, CSV
        - 
        - using FLoops, FoldsThreads, ThreadsX, Revise
        - 
        - export generate_data, initial_ans!, switch!, main
        - 
        - function generate_data(num_of_customers, num_of_vehicles, max_positions, volume, min_demand_volume, max_demand_volume)
        0     num_of_positions = num_of_customers + 1
        - 
     1808     positions = Array{Float64,2}(undef, 2, num_of_positions)
        0     foreach(x -> positions[x, :] = sample(0:max_positions, num_of_positions, replace=true), axes(positions, 1))
        - 
      896     demand_volume = sample(min_demand_volume:max_demand_volume, num_of_positions, replace=true)
        - 
        0     demand_volume[1] = 0
        - 
        0     return positions, demand_volume
        - end
        - 
        - function initial_ans!(num_of_vehicles, num_of_customers, demand_volume, vehicle_stops, vehicles_distances, vehicle_num_stops, vehicles_present_inventory, order_index, distance_matrix)
        - 
      336     temp_vehicle_num_stops = deepcopy(vehicle_num_stops)
      336     temp_vehicles_present_inventory = deepcopy(vehicles_present_inventory)
      336     temp_vehicle_stops = deepcopy(vehicle_stops)
        - 
        -     @label start_again
        0     for i in 1:num_of_customers
    14400         temp = vehicles_present_inventory .- demand_volume[order_index[i]]
    23984         free_vehicles = findall(temp .>= 0)
        - 
        0         if sizeof(free_vehicles) > 0
        0             chosen_vehicle = rand(free_vehicles)
        - 
        0             vehicle_num_stops[chosen_vehicle] = vehicle_num_stops[chosen_vehicle] + 1
        0             vehicles_present_inventory[chosen_vehicle] = vehicles_present_inventory[chosen_vehicle] - demand_volume[order_index[i]]
        0             vehicle_stops[vehicle_num_stops[chosen_vehicle],chosen_vehicle] = i
        -         else
        - 
        0             vehicle_num_stops = deepcopy(temp_vehicle_num_stops)
        0             vehicles_present_inventory = deepcopy(temp_vehicles_present_inventory)
        0             vehicle_stops = deepcopy(temp_vehicle_stops)
        - 
        -             @goto start_again
        -         end
        0     end
        - 
        0     for i in axes(vehicle_stops, 2)
        0         len = vehicle_num_stops[i]
        0         for j in 2:len
        0             vehicles_distances[i] = vehicles_distances[i] + distance_matrix[order_index[vehicle_stops[j-1,i]], order_index[vehicle_stops[j,i]]]
        0         end
        - 
        -         #############################################################
        0         if len > 0
        0             vehicles_distances[i] = vehicles_distances[i] + distance_matrix[1, order_index[vehicle_stops[1,i]]]
        0             vehicles_distances[i] = vehicles_distances[i] + distance_matrix[1, order_index[vehicle_stops[len,i]]]
        -         end
        -         #############################################################
        0     end
        - 
        - end
        - 
        - function switch!(random_vehicle_pairs,vehicle_stops_temp, vehicle_stops, vehicles_distances, vehicle_num_stops, vehicles_present_inventory, order_index, volume, demand_volume, distance_matrix, T_c)
        - 
        0     vehicle_stops_temp .= vehicle_stops
        - 
        0     @floop for idx in axes(random_vehicle_pairs, 2)
        -         truck1 = random_vehicle_pairs[1, idx]
        -         truck2 = random_vehicle_pairs[2, idx]
        - 
        -         len1 = vehicle_num_stops[truck1]
        -         len2 = vehicle_num_stops[truck2]
        - 
        -         vol1 = vehicles_present_inventory[truck1]
        -         vol2 = vehicles_present_inventory[truck2]
        - 
        -         if (len1 > 0)
        - 
        -             temp_index = rand(1:len1)
        - 
        -             if (vol2 >= demand_volume[order_index[vehicle_stops_temp[temp_index,truck1]]])
        -                 temp = vehicle_stops_temp[temp_index,truck1]
        - 
        -                 for j in temp_index:len1
        -                     vehicle_stops_temp[j,truck1] = vehicle_stops_temp[j+1,truck1]
        -                 end
        - 
        -                 len1 = len1 - 1
        -                 vol1 = vol1 + demand_volume[order_index[temp]]
        - 
        -                 temp_index2 = 1
        - 
        -                 if len2 != 0
        -                     temp_index2 = rand(1:len2)
        -                 end
        - 
        -                 for j in len2:-1:temp_index2
        -                     vehicle_stops_temp[j+1,truck2] = vehicle_stops_temp[j,truck2]
        -                 end
        -                 
        -                 len2 = len2 + 1
        - 
        -                 vehicle_stops_temp[temp_index2,truck2] = temp
        - 
        -                 vol2 = vol2 - demand_volume[order_index[temp]]
        - 
        -                 distance1 = 0.0
        -                 for i in 2:len1
        -                     distance1 = distance1 + distance_matrix[order_index[vehicle_stops_temp[i-1,truck1]], order_index[vehicle_stops_temp[i,truck1]]]
        -                 end
        - 
        -                 distance2 = 0.0
        -                 for i in 2:len2
        -                     distance2 = distance2 + distance_matrix[order_index[vehicle_stops_temp[i-1,truck2]], order_index[vehicle_stops_temp[i,truck2]]]
        -                 end
        - 
        -                 ##############################################################
        -                 if len1 > 0
        -                     distance1 = distance1 + distance_matrix[1, order_index[vehicle_stops_temp[1,truck1]]]
        -                     distance1 = distance1 + distance_matrix[1, order_index[vehicle_stops_temp[len1,truck1]]]
        -                 end
        - 
        -                 if len2 > 0
        -                     distance2 = distance2 + distance_matrix[1, order_index[vehicle_stops_temp[1,truck2]]]
        -                     distance2 = distance2 + distance_matrix[1, order_index[vehicle_stops_temp[len2,truck2]]]
        -                 end
        -                 ##############################################################
        - 
        -                 change = (distance1 + distance2) - (vehicles_distances[truck1] + vehicles_distances[truck2])
        - 
        -                 if rand() <= min(1, exp(-(change / T_c)))
        - 
        -                     vehicle_num_stops[truck1] = len1
        -                     vehicle_num_stops[truck2] = len2
        - 
        -                     vehicles_present_inventory[truck1] = vol1
        -                     vehicles_present_inventory[truck2] = vol2
        - 
        -                     vehicle_stops[1:len1+1, truck1] .= @view(vehicle_stops_temp[1:len1+1,truck1])
        -                     vehicle_stops[1:len2, truck2] .= @view(vehicle_stops_temp[1:len2,truck2])
        - 
        -                     vehicles_distances[truck1] = distance1
        -                     vehicles_distances[truck2] = distance2
        -                 end
        -             end
        -         end
        -     end
        - end
        - 
        - function main(number_of_turns, T_i, num_of_vehicles, num_of_customers, volume, positions, demand_volume, number_of_switches)
        - 
        -     ################# Declaring the Truck Variables#############################
     8128     vehicle_stops = -ones(Int64, num_of_customers, num_of_vehicles)
      144     vehicles_distances = zeros(Float64, num_of_vehicles)
      144     vehicle_num_stops = zeros(Int, num_of_vehicles)
      144     vehicles_present_inventory = fill(volume, num_of_vehicles)
        - 
     1792     order_index = collect(1:num_of_customers) .+ 1
        - 
        -     ########Calculating Distance Matrix####################
        - 
        0     distance_matrix = Symmetric(pairwise(Euclidean(), positions, dims=2))
        - 
        -     ################Creating Initial Solution###################################
        - 
        0     initial_ans!(num_of_vehicles, num_of_customers, demand_volume, vehicle_stops, vehicles_distances, vehicle_num_stops, vehicles_present_inventory, order_index, distance_matrix)
        - 
        -     ################Annealing Parameters########################################
        - 
        -     T_f = 0
        -     T_c = T_i
        - 
  8000048     costs = zeros(AbstractFloat, number_of_turns)
        - 
        0     γ = (T_i - T_f) / number_of_turns
        -     ################Loop###########################################
        - 
      336     vehicle_stops2 = deepcopy(vehicle_stops)
        - 
       48     println("Solver Started")
        - 
        0     for j in 1:number_of_turns
        - 
240000000         random_vehicle_pairs = reshape(sample(1:num_of_vehicles, number_of_switches * 2; replace=false), 2, number_of_switches)
        - 
        0         switch!(random_vehicle_pairs, vehicle_stops2, vehicle_stops, vehicles_distances, vehicle_num_stops, vehicles_present_inventory, order_index, volume, demand_volume, distance_matrix, T_c)
        - 
        0         T_c = T_c - γ
        - 
 16000000         costs[j] = sum(vehicles_distances)
        0     end
        - 
        0     return costs, vehicle_num_stops, vehicle_stops
        - end
        - 
        - ################Reading Data###############################################
        - 
        - num_of_customers :: Int =100
        - num_of_vehicles :: Int = 10
        - max_positions :: Int = 100
        - volume :: Int = 100
        - min_demand_volume, max_demand_volume = 1, 10
        - 
        - positions, demand_volume = generate_data(num_of_customers, num_of_vehicles, max_positions, volume, min_demand_volume, max_demand_volume)
        - 
        - ###########Input Parameters####################################
        - number_of_turns::Int = 1000000
        - T_i::Int = 15
        - number_of_switches = floor(Int, num_of_vehicles / 2)
        - ##############################################################
        - 
        - # @benchmark begin
        -     costs, vehicle_num_stops, vehicle_stops = main(number_of_turns, T_i, num_of_vehicles, num_of_customers, volume, positions, demand_volume, number_of_switches)
        - # end

It seems there are no allocations happening in function switch!, as expected. I am beginning to think that the slowness comes from the overhead of launching threads, as suggested by @sgaure. I hope someone has a better explanation.

Yes, it is the overhead of threading, because the loop being threaded is too fast for the dimensions given. If you increase the number of vehicles, that gets better, but you are limited by the fact that simply copying one array is more costly than the work of the threaded loop, as I mentioned above.

Yes, @Imiq you seem to be correct, after changing the lines

for j in 1:number_of_turns

        random_vehicle_pairs = reshape(sample(1:num_of_vehicles, number_of_switches * 2; replace=false), 2, number_of_switches)

to

random_vehicle_pairs = reshape(collect(1:num_of_vehicles), 2, number_of_switches)

    for j in 1:number_of_turns

        shuffle!(random_vehicle_pairs)

and
vehicle_stops_temp .= vehicle_stops to ThreadsX.copyto!(vehicle_stops_temp, vehicle_stops)
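The first change can be sketched in isolation as follows. The sizes are illustrative, and the approach assumes that number_of_switches * 2 equals num_of_vehicles, so that shuffling all vehicle indices is equivalent to sampling them without replacement.

```julia
using Random

num_of_vehicles = 10                              # illustrative size
number_of_switches = div(num_of_vehicles, 2)

# Allocate the pair matrix once; every vehicle index appears exactly once.
random_vehicle_pairs = reshape(collect(1:num_of_vehicles), 2, number_of_switches)

for turn in 1:3
    shuffle!(random_vehicle_pairs)                # permutes in place, no new matrix per turn
    println(random_vehicle_pairs)
end
```

Each column still holds two distinct vehicles after every shuffle, which is the property the loop over pairs relies on.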

After changing the problem size to

num_of_customers :: Int =10000
num_of_vehicles :: Int = 1000
max_positions :: Int = 100

number_of_turns::Int = 1000

The following are the results
for Serial

BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 7.971 s (0.10% GC) to evaluate,
 with a memory estimate of 1.24 GiB, over 125631 allocations.

for Parallel

BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 7.505 s (0.06% GC) to evaluate,
 with a memory estimate of 1.26 GiB, over 239763 allocations.
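
For reference, the speedup implied by these two single-sample timings can be computed directly. This is just the arithmetic on the numbers above and comes to roughly 1.06.

```julia
t_serial   = 7.971   # seconds, serial benchmark above
t_parallel = 7.505   # seconds, parallel benchmark above

speedup = t_serial / t_parallel
println("speedup: ", round(speedup, digits=3))
```

With one sample per configuration, a difference of this size is within what run-to-run variation could plausibly explain.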