Hi everyone, do you know a way to get this working in the Julia REPL?

```python
>>> a = [1,2,3]
>>> a + [4]
[1, 2, 3, 4]
```
```julia
julia> a = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> push!(a, 4)
4-element Vector{Int64}:
 1
 2
 3
 4
```
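In Python, `a + [4]` concatenates and returns a new list. In Julia, `+` on vectors is elementwise arithmetic, so the closest equivalent of that Python line is concatenation with `[a; 4]` or `vcat`, while `push!` is the in-place counterpart of Python's `a.append(4)`. A small sketch of the difference:

```julia
a = [1, 2, 3]
b = [a; 4]           # new vector, like Python's a + [4]; `a` itself is unchanged
c = vcat(a, [4])     # the same concatenation written as a function call

# `+` is elementwise in Julia, so adding vectors of lengths 3 and 1 throws.
threw = try
    a + [4]
    false
catch err
    err isa DimensionMismatch
end

push!(a, 4)          # in-place growth, like Python's a.append(4)
```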
Thank you for your reply, but in that case I can’t use `append!` or `push!`, because I’m iterating over the array. Let me explain with an example:

```julia
julia> a = [(1,2), (3,4)]

julia> for (i, t) in enumerate(a)
           push!(a, reverse(t))
       end
```

It will generate an infinite loop.
What do you want to obtain? Is it `a = [(1,2), (3,4), (2,1), (4,3)]`?
For this example, yes, but I want a generic solution. For this particular case, I know I can solve it with the code below (just renaming the variables and separating them into different lists):
```julia
function compound(SMILES::String)
    benzene = smilestomol(SMILES)
    benzene_ed = []
    i_vert = []
    j_vert = []
    for edge in benzene.edges
        push!(benzene_ed, edge, reverse(edge))
    end
    for ed in benzene_ed
        push!(i_vert, ed[1])
        push!(j_vert, ed[2])
    end
    neighbor_list = tuple(i_vert, j_vert)
    return neighbor_list
end
```
I would really like to know if there is a way of adding elements to arrays inside an iteration without using `push!` or `append!`.
Without loops?

```julia
append!(a, reverse.(a))
```

There are options other than `append!` and `push!`, but they are generally less efficient. There is `a = [a; reverse.(a)]` or `a = vcat(a, reverse.(a))`, etc.
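A quick check of the three forms mentioned above on the tuple example. Note that `reverse.(a)` is built in full before `append!` starts copying, which is why this one-liner terminates even though it reads from and writes to `a`:

```julia
a = [(1, 2), (3, 4)]

b = [a; reverse.(a)]        # new vector; `a` is not modified
c = vcat(a, reverse.(a))    # the same concatenation, function-call form

# In place: reverse.(a) is materialized first, then appended in one go.
append!(a, reverse.(a))
```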
No, I want to use it inside an iterative loop.
If `push!` and the like are expensive for your application, you can pre-allocate the output array once and fill the individual elements inside the loop using the indices. Like:

```julia
i_vert = Vector{eltype(benzene_ed)}(undef, 100)
...
i_vert[i] = benzene_ed[i]
```
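A runnable version of the pre-allocation idea, applied to the tuple example from earlier in the thread. `with_reversed` is a hypothetical helper name, and it assumes the input is a `Vector` of homogeneous tuples, so that `reverse(t)` has the same type as `t` and fits in the same output vector:

```julia
# Hypothetical helper: returns a new vector holding a's tuples followed by their reverses.
function with_reversed(a::Vector{NTuple{N,T}}) where {N,T}
    n = length(a)
    out = Vector{NTuple{N,T}}(undef, 2n)   # allocate the result once
    for i in eachindex(a)                 # for a Vector these are the indices 1:n
        out[i] = a[i]
        out[n + i] = reverse(a[i])        # fill by index, no push! needed
    end
    return out
end

result = with_reversed([(1, 2), (3, 4)])
```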
I’m not sure if I understood correctly.
`enumerate` isn’t normally what you want to use on vectors: it does not give you indices, it just counts. It’s better to use `pairs` or `eachindex`, depending on what you need. It’s unfortunate that `enumerate` is so widely known and (mis)used on vectors, while `pairs` is apparently so little known.
```julia
1.7.1> a = [(1,2), (3,4)]
2-element Vector{Tuple{Int64, Int64}}:
 (1, 2)
 (3, 4)

1.7.1> for (i, t) in pairs(a)
           push!(a, reverse(t))
       end

1.7.1> a
4-element Vector{Tuple{Int64, Int64}}:
 (1, 2)
 (3, 4)
 (2, 1)
 (4, 3)
```

Or

```julia
1.7.1> for i in eachindex(a)
           push!(a, reverse(a[i]))
       end
```
BTW, these are very bad for performance:

```julia
benzene_ed = []
i_vert = []
j_vert = []
```

Untyped containers are a big red flag. If you ever have `x = []` somewhere in your code, it’s a bad sign, since it creates a `Vector{Any}`, for which the compiler cannot generate specialized (fast) code.
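A sketch of the same neighbor-list construction with concretely typed containers. `edge_list` is a hypothetical stand-in for `benzene.edges`, assumed here to be a vector of `(Int, Int)` tuples; the output order matches the original loop (each edge followed by its reverse):

```julia
# `[]` has element type Any; `Int[]` has a concrete element type.
untyped = []
typed = Int[]

# Hypothetical stand-in for `benzene.edges`: a triangle.
edge_list = [(1, 2), (2, 3), (3, 1)]

function neighbor_lists(edges::Vector{Tuple{Int,Int}})
    i_vert = Int[]
    j_vert = Int[]
    sizehint!(i_vert, 2 * length(edges))   # optional: reserve room up front
    sizehint!(j_vert, 2 * length(edges))
    for (u, v) in edges
        push!(i_vert, u, v)   # the edge, then its reverse
        push!(j_vert, v, u)
    end
    return i_vert, j_vert
end

i_vert, j_vert = neighbor_lists(edge_list)
```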
You could also avoid the infinite loop by looping over a view of `a`:

```julia
a = [(1,2), (3,4)]
for t in view(a, :)
    push!(a, reverse(t))
end
```
It is not clear what you don’t like about those. Is it only the syntax?
I think it’s probably a misunderstanding. The problem isn’t with `push!` or `append!`, but with `enumerate` and with `for val in itr`.
I think this is clearly an instance of the XY problem: you are asking for what you think is the solution instead of just asking how to solve your real problem. It was not clear (maybe it still is not, but I think I understand now) what property this function that is not `push!` or `append!` should have to satisfy your requirements.
It seems to me that your problem has nothing to do with `push!` or `append!`, but with the fact that the algorithm (or at least the loop inside it) is badly designed. If you want to grow a `Vector` by adding an element for each element originally inside it, then just looping over the `Vector` will not work; as you noticed, this ends up as an infinite loop. The first solutions I can think of are either:

1. Make a copy of the `Vector`, iterate over the copy, but make changes to the original. This way the loop will stop after all the original elements were processed.
2. Store the length of the `Vector` in a variable and iterate from 1 to this length. This way the loop will also stop after all the original elements were processed, even if the `Vector` grew in the meantime. This is the same as using `pairs` or `eachindex`, as suggested upthread, but those will also handle generic indexing (like zero-based).
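The two approaches, sketched on the tuple example from the start of the thread (the function names are hypothetical):

```julia
# 1. Iterate over a snapshot (copy) while growing the original.
function add_reversed_copy!(a::Vector)
    for t in copy(a)
        push!(a, reverse(t))
    end
    return a
end

# 2. Iterate over an index range that is computed once, before the loop starts.
#    For a Vector, eachindex(a) is the same range as 1:length(a).
function add_reversed_indices!(a::Vector)
    for i in eachindex(a)
        push!(a, reverse(a[i]))
    end
    return a
end

x = add_reversed_copy!([(1, 2), (3, 4)])
y = add_reversed_indices!([(1, 2), (3, 4)])
```

The copy costs one extra allocation of the original size; the index range avoids that but still reads from the vector that is being grown.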
For me it was a novelty that `enumerate` works that way. The documentation does not make clear that it has such a fundamental implementation difference relative to `pairs`, etc. I’m not sure how to search for `for val in vec` in the docs; I think it is not clear there either.

It should not be that rare to iterate over an array while adding elements to it. I’m surprised nobody came up with that issue before (particularly for the `for val in vec` syntax, which is very common).
Yes, I know, but why these are a solution was not explained anywhere in the thread. I tried to explain the source of the problem and give basic, understandable alternatives, as it seemed to me OP needed that more than “just use `eachindex` or `pairs`”.
> Not sure how to search for `for val in vec` in the docs, I think that is not clear there either.
The structure `for val in vec` calls `Base.iterate`. There is also “Manual > Control Flow > Repeated Evaluation: Loops”, but while it shows the `for val in vec` syntax, it does not mention `Base.iterate` or the detail that the original length is not cached. Only “Manual > Interfaces > Iteration” gets closer to giving answers.
> For me it was a novelty that `enumerate` works that way. It is not clear in the documentation that it has such a fundamental implementation difference relative to `pairs`, etc.
Yes, I think a documentation PR would be adequate. Maybe I will start one later if nobody else wants dibs. But it seems that everyone (including myself) was under a false impression about `pairs`, because the `pairs` documentation explicitly says:

> Mutation of the bounds of the underlying array will invalidate this iterator.

So it should not be used in the case above. The `eachindex` documentation does not make the same comment.
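A quick way to see why `eachindex` is unaffected: for a `Vector` it returns an ordinary range value (such as `Base.OneTo(2)`), computed from the length at the moment of the call, so growing the vector afterwards does not change it:

```julia
a = [(1, 2), (3, 4)]
r = eachindex(a)        # a range value, not a live view of `a`
push!(a, (5, 6))
# r still covers only the original indices 1:2, while `a` now has 3 elements
```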
> It should not be that rare to iterate over an array adding elements to it, I’m surprised nobody came with that issue before (particularly for the `for val in vec` syntax, which is very common).
Yes, I agree. But almost every time I do it, I want to iterate over the newly added elements too (almost every graph algorithm will want that), so this behaviour was always what I wanted and expected.
Okay, perhaps my post was less useful than I hoped. But I would not recommend `1:length()`; that’s a bad habit that should be ditched as soon as possible. The natural alternative to `enumerate` is `pairs`.
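A small illustration of “`enumerate` just counts, `pairs` gives indices”, using a `String`, whose valid indices are byte offsets and therefore stop being consecutive once a multi-byte character appears:

```julia
s = "héllo"                                  # 'é' takes 2 bytes in UTF-8
counts  = [i for (i, c) in enumerate(s)]     # just a running count
indices = [i for (i, c) in pairs(s)]         # the actual string indices
# s[3] would throw a StringIndexError: index 3 falls inside 'é'
```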
My real problem is finding the cycles of a chemical structure recursively via DFS; the iteration problem I reported is in the `find_cycles` function. I want `dumb_cycle_detection` to return an array of arrays, each holding the vertices that make up one cycle.
```julia
using Pkg
using LinearAlgebra
using PyCall
using Statistics
using DataStructures

function convert_neighbor_list(nl)
    n_vert = maximum.(nl)[1] + 1
    K = []
    V = []
    for i in 1:n_vert
        #make two different lists and merge it to a dict
        push!(K, i)
        push!(V, Int64[])
    end
    new = OrderedDict(zip(K, V))
    for (i_v, j_v) in zip(nl[1], nl[2])
        push!(new[i_v + 1], j_v)
    end
    return new
end

function find_cycles(i_vert, cnl, max_length, cur_parth, passed_edges)
    if length(cur_parth) == max_length
        return []
    end
    acc_cycles = []
    sort_cycles = []
    res = []
    neighbs = cnl[i_vert+1]
    for n in neighbs
        edge = (minimum([i_vert, n]), maximum([i_vert, n]))
        if edge ∉ passed_edges
            if n in cur_parth[2:end]
                return []
            end
        end
    end
    for n in neighbs
        edge = (minimum([i_vert, n]), maximum([i_vert, n]))
        if edge ∉ passed_edges
            if n == cur_parth[1]
                return [cur_parth]
            end
        end
    end
    for n in neighbs
        edge = (minimum([i_vert, n]), maximum([i_vert, n]))
        if edge ∉ passed_edges
            cycs = find_cycles(n, cnl, max_length, append!(cur_parth, [n]), append!(passed_edges, [edge]))
            for cyc in cycs
                sorted_cyc = tuple(cyc)
                if sorted_cyc ∉ sort_cycles
                    append!(sort_cycles, sorted_cyc)
                    append!(acc_cycles, cyc)
                end
            end
        end
    end
    return acc_cycles
end

function dumb_cycle_detection(ase_atoms_no_h, max_length)
    neighborlist = pyimport("ase.neighborlist")
    neighbor_list = neighborlist.neighbor_list("ij", ase_atoms_no_h, 2.0)
    cycles = []
    sorted_cycles = []
    n_vert = maximum(neighbor_list[1])
    cnl = convert_neighbor_list(neighbor_list)
    for i_vert in range(1, n_vert+1)
        cycs = find_cycles(i_vert-1, cnl, max_length, [i_vert-1], [])
        for cyc in cycs
            sorted_cyc = tuple(cyc)
            if sorted_cyc ∉ sorted_cycles
                append!(sorted_cycles, sorted_cyc)
                append!(cycles, cyc)
            end
        end
    end
    return cycles
end
```
```julia
ase_io = pyimport("ase.io")
ase_atoms = ase_io.read("donut-6-b3lyp-opt.xyz")
ase = pyimport("ase")
# "H" must be a String: PyCall returns a.symbol as a String, which never equals the Char 'H'
ase_atoms_no_h = ase.Atoms([a for a in ase_atoms if a.symbol != "H"])
a = dumb_cycle_detection(ase_atoms_no_h, 6) ## max_length is the maximum size of the cycle that I want to find
print(a)
```
The output should be:
[[0, 2, 3, 4, 5, 1], [1, 5, 9, 8, 7, 6], [4, 5, 9, 10, 11, 12], [6, 7, 24, 25, 26, 27], [7, 8, 20, 22, 23, 24], [8, 9, 10, 19, 21, 20], [10, 11, 16, 17, 18, 19], [11, 12, 13, 14, 15, 16], [15, 16, 17, 163, 164, 165], [23, 24, 25, 43, 44, 45], [28, 29, 33, 32, 31, 30], [29, 33, 37, 36, 35, 34], [32, 40, 39, 38, 37, 33], [34, 35, 52, 53, 54, 55], [35, 36, 48, 50, 51, 52], [36, 37, 38, 47, 49, 48], [38, 39, 44, 45, 46, 47], [39, 40, 41, 42, 43, 44], [51, 73, 72, 71, 53, 52], [56, 57, 61, 60, 59, 58], [57, 61, 65, 64, 63, 62], [60, 68, 67, 66, 65, 61], [62, 63, 80, 81, 82, 83], [63, 80, 79, 78, 76, 64], [64, 76, 77, 75, 66, 65], [66, 67, 72, 73, 74, 75], [67, 68, 69, 70, 71, 72], [79, 80, 81, 99, 100, 101], [84, 85, 89, 88, 87, 86], [85, 90, 91, 92, 93, 89], [88, 89, 93, 94, 95, 96], [90, 91, 108, 109, 110, 111], [91, 92, 104, 106, 107, 108], [92, 93, 94, 103, 105, 104], [94, 103, 102, 101, 100, 95], [95, 96, 97, 98, 99, 100], [107, 129, 128, 127, 109, 108], [112, 113, 117, 116, 115, 114], [113, 117, 121, 120, 119, 118], [116, 124, 123, 122, 121, 117], [118, 119, 136, 137, 138, 139], [119, 120, 132, 134, 135, 136], [120, 121, 122, 131, 133, 132], [122, 123, 128, 129, 130, 131], [123, 124, 125, 126, 127, 128], [135, 136, 137, 155, 156, 157], [140, 141, 145, 144, 143, 142], [141, 146, 147, 148, 149, 145], [144, 145, 149, 150, 151, 152], [146, 147, 164, 165, 166, 167], [147, 164, 163, 162, 160, 148], [148, 160, 161, 159, 150, 149], [150, 151, 156, 157, 158, 159], [151, 152, 153, 154, 155, 156]]
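For comparison, a minimal sketch of the same DFS with the apparent Python-to-Julia slips in `find_cycles` corrected. It is an illustration under stated assumptions, not the poster's final code. `append!(v, [x])` mutates `v` in place, so every recursive call shares and keeps growing the same `cur_parth` and `passed_edges`; `append!(acc_cycles, cyc)` splices the vertices of `cyc` in one by one instead of storing the ring as one element (that is what `push!` does, like Python's `list.append`); and `tuple(cyc)` wraps the vector in a 1-tuple rather than sorting it. The sketch assumes 0-based vertex ids and an `(i, j)` neighbor list listing every bond in both directions (as the original `convert_neighbor_list` expects), replaces the `OrderedDict` with a `Vector{Vector{Int}}`, takes the neighbor list directly instead of calling ASE through PyCall, and makes one deliberate change, flagged in a comment: the length cut-off runs after the ring-closure test, because in the original order a ring of exactly `max_length` atoms can never close. The test graph (two fused six-membered rings) is hypothetical.

```julia
# Hypothetical rewrite for illustration. Assumptions: 0-based vertex ids, a neighbor
# list `nl = (i, j)` with every bond in both directions, max_length = largest ring wanted.

# Adjacency as a typed Vector{Vector{Int}}: adj[v + 1] holds the neighbors of vertex v.
function convert_neighbor_list(nl)
    n_vert = maximum(nl[1]) + 1
    adj = [Int[] for _ in 1:n_vert]
    for (i_v, j_v) in zip(nl[1], nl[2])
        push!(adj[i_v + 1], j_v)
    end
    return adj
end

function find_cycles(i_vert::Int, adj::Vector{Vector{Int}}, max_length::Int,
                     cur_path::Vector{Int}, passed_edges::Vector{Tuple{Int,Int}})
    cycles = Vector{Int}[]
    neighbs = adj[i_vert + 1]
    # An unused bond back into the middle of the path is a chord: give up, as in the original.
    for n in neighbs
        if minmax(i_vert, n) ∉ passed_edges && n in cur_path[2:end]
            return cycles
        end
    end
    # An unused bond back to the start vertex closes a ring.
    for n in neighbs
        if minmax(i_vert, n) ∉ passed_edges && n == cur_path[1]
            push!(cycles, cur_path)
            return cycles
        end
    end
    # Deliberate change: cut off only after the closure check, so rings of exactly
    # max_length atoms are still reported.
    length(cur_path) >= max_length && return cycles
    seen = Set{Vector{Int}}()
    for n in neighbs
        edge = minmax(i_vert, n)
        edge in passed_edges && continue
        # [cur_path; n] and [passed_edges; [edge]] build NEW vectors, so sibling branches
        # never see each other's steps (append! would mutate the caller's vectors).
        for cyc in find_cycles(n, adj, max_length, [cur_path; n], [passed_edges; [edge]])
            key = sort(cyc)              # canonical form: the ring's sorted vertex list
            if key ∉ seen
                push!(seen, key)
                push!(cycles, cyc)       # push! stores the ring as ONE element
            end
        end
    end
    return cycles
end

# Takes the (i, j) neighbor list directly instead of calling ASE through PyCall.
function dumb_cycle_detection(nl, max_length::Int)
    adj = convert_neighbor_list(nl)
    cycles = Vector{Int}[]
    seen = Set{Vector{Int}}()
    for v in 0:length(adj) - 1
        for cyc in find_cycles(v, adj, max_length, [v], Tuple{Int,Int}[])
            key = sort(cyc)
            if key ∉ seen
                push!(seen, key)
                push!(cycles, cyc)
            end
        end
    end
    return cycles
end

# Hypothetical test graph: two fused six-membered rings sharing bond 4-5.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (5, 6), (6, 7), (7, 8), (8, 9), (9, 4)]
nl = ([first.(bonds); last.(bonds)], [last.(bonds); first.(bonds)])
rings = dumb_cycle_detection(nl, 6)
```

With this graph, `dumb_cycle_detection(nl, 6)` should report the two six-membered rings but not the ten-membered perimeter, and a `max_length` of 5 should report nothing.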