You can get a nice speed-up by declaring a precise type for the dictionary with
d = Dict{eltype(labels), eltype(array)}()
In my small test this divides the time by two.
I get another 30% savings by using DataFrames instead:
using DataFrames
a = rand(1000, 1000)
b = rand(["a", "b", "c", "d"], size(a))
df = DataFrame(array=vec(a), labels=vec(b), copycols=false)
combine(groupby(df, :labels), :array => median)
Not sure if you will get the same savings with your real dataset though…