For an algorithm of mine, I use hashes to reseed (`srand`) the random number generator.
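Below is a minimal sketch of reseeding from a hash. It assumes Julia 1.x, where 0.6's `srand` was replaced by `Random.seed!`. To keep the demo self-contained, it builds a fresh `MersenneTwister` instead of reseeding the global generator. The helper name `seed_from` is hypothetical. The point is that an identical seed reproduces an identical stream, so the seed itself has to be reproducible across sessions.

```julia
using Random

# Hypothetical helper: build a fresh generator from a hash value.
# In Julia 0.6 this would be srand(h); since Julia 0.7/1.0 it is Random.seed!(h).
seed_from(h::UInt) = MersenneTwister(h)

# Any value whose hash is content-based works as a seed source.
h = hash("some reproducible input", zero(UInt))

a = rand(seed_from(h), 3)
b = rand(seed_from(h), 3)
println(a == b)  # true: same seed, same stream
```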
I apply `hash` iteratively, i.e. `hash(x, hash(y, hash(z, hash(a))))`, but the results differ even though the individual hashes are the same.
More precisely, I have two concurrent Julia sessions running, and the same hash expression gives different results in each. Is this expected?
I read somewhere that the address of an object matters. Is that the case?
Notably, in my case `x` is a DataFrame.
I can get what I want by iterating over all columns of the DataFrame (last lines of my code), but this is more cumbersome.
Why is `doesnotmatch` not the same in both sessions?
Please see the yellow markings on the screenshot.
[EDIT: I cannot share the data. I could try to find an MWE if that is needed to answer my question.]
```julia
julia> hw=0x220926c324d7bb27
0x220926c324d7bb27
julia> hd=0xcbb7fba98e0ff596
0xcbb7fba98e0ff596
julia> hn=0xe1c9b71b1ae3b261
0xe1c9b71b1ae3b261
julia> hf=0x213b20190172ee15
0x213b20190172ee15
julia>
julia> @assert hash(dtmtable.weight)==hw
julia> @assert hash(dtmtable.denominator)==hd
julia> @assert hash(dtmtable.numerator)==hn
julia> @assert hash(dtmtable.features)==hf
julia>
julia> hash(hash(dtmtable.numerator,hash(dtmtable.denominator,hash(dtmtable.weight))))
0x18566e5352609be5
julia> doesnotmatch=hash(dtmtable.features,hash(dtmtable.numerator,hash(dtmtable.denominator,hash(dtmtable.weight))))
0x9e53050d52b56580
julia> typeof(dtmtable.features)
DataFrames.DataFrame
julia> hash(dtmtable.features)
0x213b20190172ee15
julia> versioninfo()
Julia Version 0.6.1
Commit 0d7248e2ff* (2017-10-24 22:15 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
julia> Pkg.installed("DataFrames")
v"0.11.2"
julia>
julia> s=hash(dtmtable.numerator,hash(dtmtable.denominator,hash(dtmtable.weight)))
0xc4d8482b839696fe
julia> for x=1:size(dtmtable.features,2)
           s=hash(dtmtable.features[x],s)
       end
julia> s #matches which is fine
0x207ad236952c3bdb
```
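The column loop in the transcript generalizes to a fold over any sequence of values. Below is a minimal sketch, assuming Julia 1.x (for `foldl` with an `init` keyword). The helper `chainhash` and the sample vectors are hypothetical stand-ins for the real columns. It only gives session-independent results if every element has a content-based two-argument `hash` method. That is why passing the DataFrame's columns works here while passing the DataFrame itself did not.

```julia
# Hypothetical helper: mix the hashes of several values into one,
# equivalent to hash(xs[end], ... hash(xs[2], hash(xs[1], seed)) ...).
chainhash(xs, seed::UInt = zero(UInt)) = foldl((h, x) -> hash(x, h), xs; init = seed)

# Stand-ins for the real data (hypothetical values).
weight      = [1.0, 2.0, 3.0]
denominator = [4, 5, 6]
numerator   = [7, 8, 9]
features    = ["a", "b", "c"]

# Explicit nesting with the two-argument method throughout.
nested = hash(features, hash(numerator, hash(denominator, hash(weight, zero(UInt)))))
folded = chainhash((weight, denominator, numerator, features))
println(nested == folded)  # true
```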
I’m having trouble following your example, but could this simply be because `hash()` falls back to hashing the object ID, which is different for every instance of an object and will differ across Julia sessions?
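Here is a minimal sketch of that fallback, assuming Julia 1.x and a hypothetical mutable type `Box` that defines no `hash` method of its own. A mutable object's `objectid` reflects its identity rather than its contents, so two `Box`es with equal contents hash differently. The value of `hash(a, zero(UInt))` will generally also change from one session to the next.

```julia
# Hypothetical mutable type with no hash method: Base's generic fallback
# derives the hash from objectid, i.e. from identity, not content.
mutable struct Box
    value::Vector{Int}
end

a = Box([1, 2, 3])
b = Box([1, 2, 3])   # same contents, different object

println(objectid(a) == objectid(b))                  # false: distinct objects
println(hash(a, zero(UInt)) == hash(b, zero(UInt)))  # false: hash follows objectid
println(hash(a, zero(UInt)) == hash(a, zero(UInt)))  # true: stable within one session
```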
Good catch, that’s a bug in DataFrames. Note that for the initial @assert calls, you are calling the one-argument hash method, while for the combined computation below you are using the two-argument version, which can differ. You should call hash(dtmtable.weight, zero(UInt)) and so on to ensure you are calling the same functions.
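The mismatch can be reproduced without DataFrames. Below is a minimal sketch, assuming Julia 1.x. The type `OneArgTable` is hypothetical and only mimics the situation described, where a type defines the one-argument method alone. Calling both methods explicitly, as suggested above, exposes the difference.

```julia
# Hypothetical type that, like the reported bug, defines only hash(x).
mutable struct OneArgTable
    cols::Vector{Vector{Int}}
end
Base.hash(t::OneArgTable) = hash(t.cols)   # content-based, but one-argument only

t1 = OneArgTable([[1, 2], [3, 4]])
t2 = OneArgTable([[1, 2], [3, 4]])

# One-argument call: dispatches to the method above, so equal contents agree.
println(hash(t1) == hash(t2))                          # true
# Two-argument call (what nested hashing uses): no specific method exists,
# so Base's objectid-based fallback is used and equal contents disagree.
println(hash(t1, zero(UInt)) == hash(t2, zero(UInt)))  # false
```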
In the present case, the bug is that DataFrames only defines the one-argument version of hash. See this pull request, which you can try via Pkg.checkout("DataFrames", "nl/hash").
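The general remedy is to implement the two-argument method, which is what the docstring of `Base.hash` asks new types to do. The one-argument call then falls back to it automatically. Below is a minimal sketch of that pattern, assuming Julia 1.x, with a hypothetical type `TwoArgTable`. It is not the code from the pull request.

```julia
# Hypothetical type implementing the two-argument hash method.
mutable struct TwoArgTable
    cols::Vector{Vector{Int}}
end

# Types that define hash should define a matching ==.
Base.:(==)(a::TwoArgTable, b::TwoArgTable) = a.cols == b.cols

# Mix a type tag into the seed first, then the contents, so the result
# depends only on content and on h (never on object identity).
Base.hash(t::TwoArgTable, h::UInt) = hash(t.cols, hash("TwoArgTable", h))

t1 = TwoArgTable([[1, 2], [3, 4]])
t2 = TwoArgTable([[1, 2], [3, 4]])

println(hash(t1) == hash(t2))                      # true: one-argument call uses the method above
println(hash(t1, UInt(42)) == hash(t2, UInt(42)))  # true: any seed works
```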
Apologies, my problem was not well formulated (I did not fully understand the difference between the one- and two-argument `hash` methods). Actually, I still do not quite understand how the two arguments are mixed.
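The docstring of `Base.hash` describes the second argument only as a hash code to be mixed with the result. The exact mixing function is an internal detail that may differ between Julia versions. What a user can rely on are its observable properties, sketched below under the assumption of Julia 1.x, with arbitrary sample strings:

- the result is deterministic for a given value and seed;
- a different seed generally gives a different result;
- chaining is order-dependent.

```julia
x, y = "weight", "features"
h0 = zero(UInt)

# Deterministic: the same value and seed always give the same result.
same = hash(x, h0) == hash(x, h0)

# The seed matters: a different h generally gives a different result.
seed_matters = hash(x, h0) != hash(x, UInt(1))

# Order matters when chaining: hash(y, hash(x)) is not hash(x, hash(y)).
order_matters = hash(y, hash(x, h0)) != hash(x, hash(y, h0))

println((same, seed_matters, order_matters))
```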
Luckily, nalimilan understood my problem and also fixed the underlying issue.
For completeness, below is the MWE. If you start Julia several times (or run several parallel sessions), you will get a different result each time. This does not happen when `hash` is correctly defined; you could try it with `PooledArray`, for instance.
I think in your example the hash varies because it has not been explicitly defined (in what I consider a meaningful way) for the new type.