Add elements to tuples or arrays

Hi everyone, do you know a way to get the following (shown here in Python) working in the Julia REPL?

>>> a = [1,2,3]
>>> a + [4]
[1, 2, 3, 4]
julia> a = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> push!(a, 4)
4-element Vector{Int64}:
 1
 2
 3
 4

Thank you for your reply, but in that case I can’t use append! or push!, because I’m iterating over the array. Let me explain with an example:

julia> a = [(1,2), (3,4)]

julia> for (i, t) in enumerate(a)
           push!(a, reverse(t))
       end

It will generate an infinite loop

What do you want to obtain? Is it a = [(1,2), (3,4), (2,1), (4,3)]?

For this example, yes, but I want a generic solution. For example, in this case I know I can solve it with the code below (just renaming the variables and separating into different lists):

function compound(SMILES::String)
    benzene = smilestomol(SMILES)
    benzene_ed = []
    i_vert = []
    j_vert = []
    for edge in benzene.edges
        push!(benzene_ed, edge, reverse(edge))
    end
    for ed in benzene_ed
        push!(i_vert, ed[1])
        push!(j_vert, ed[2])
    end
    
    neighbor_list = tuple(i_vert, j_vert)
    return neighbor_list
end

I would really like to know if there is a way of adding elements to arrays inside an iteration, without using push! or append!.

Without loops?

append!(a, reverse.(a))

There are options other than append! and push! but they are generally less efficient. There is a = [a; reverse.(a)] or a = vcat(a, reverse.(a)), …etc.
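For instance, a minimal sketch of the concatenation approach on the tuple example from above:

```julia
a = [(1, 2), (3, 4)]
a = vcat(a, reverse.(a))   # builds a new vector rather than growing `a` in place
# a == [(1, 2), (3, 4), (2, 1), (4, 3)]
```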

No, I want to use it inside an iterative loop.

If push! and the like are expensive for your application, you can pre-allocate the output array once and fill the individual elements inside the loop using the indices. Like:

i_vert = Vector{eltype(benzene_ed)}(undef,100)
...
i_vert[i] = benzene_ed[i]
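A self-contained sketch of that pre-allocation pattern (the `edges` input here is illustrative, not from the original code):

```julia
edges = [(1, 2), (3, 4), (5, 6)]              # illustrative input
i_vert = Vector{Int}(undef, length(edges))    # allocated once, up front
for k in eachindex(edges)
    i_vert[k] = edges[k][1]                   # fill in place, no push! inside the loop
end
# i_vert == [1, 3, 5]
```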

I’m not sure if I understood correctly.

enumerate isn’t normally what you want to use on vectors: it does not give you indices, it just counts. It’s better to use pairs or eachindex, depending on what you need. It’s unfortunate that enumerate is so widely known and (mis)used on vectors, while pairs is apparently so little known.

1.7.1> a = [(1,2), (3,4)]
2-element Vector{Tuple{Int64, Int64}}:
 (1, 2)
 (3, 4)

1.7.1> for (i, t) in pairs(a)
       push!(a, reverse(t))
       end

1.7.1> a
4-element Vector{Tuple{Int64, Int64}}:
 (1, 2)
 (3, 4)
 (2, 1)
 (4, 3)

Or

1.7.1> for i in eachindex(a)
       push!(a, reverse(a[i]))
       end

BTW, these are very negative for performance:

benzene_ed = []
i_vert = []
j_vert = []

Untyped containers are a big red flag: if you ever have x = [] somewhere in your code, it’s a bad sign, since it creates a Vector{Any} for which the compiler cannot generate specialized (fast) code.
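To illustrate the difference, a small sketch:

```julia
v_any = []        # Vector{Any}: the compiler cannot specialize on the element type
v_int = Int[]     # Vector{Int64}: concretely typed, so code using it can be fast
push!(v_int, 1, 2, 3)
eltype(v_any)     # Any
eltype(v_int)     # Int64
```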


You could also avoid the infinite loop, by looping over a view of a:

a = [(1,2), (3,4)]
for t in view(a,:)
    push!(a, reverse(t))
end

It is not clear what you don’t like about those, is it the syntax only?


I think it’s probably a misunderstanding. The problem isn’t with push! or append!, but with enumerate and with for val in itr.


I think this is clearly an instance of the XY problem: you are asking for something that you think is the solution, instead of asking how to solve your real problem. It was not clear (maybe it still isn’t, but I think I understand now) what property this function that is not push! or append! should have to satisfy your requirements.

It seems to me that your problem has nothing to do with push! or append!, but with the fact that the algorithm (or at least the loop inside it) is badly designed. If you want to grow a Vector by adding an element for each element originally inside it, then just looping over the Vector will not work; as you noticed, this ends up as an infinite loop. The first solutions I can think of are:

  1. Make a copy of the original Vector, iterate over the copy, but make changes to the original; this way the loop will stop after all the original elements were processed.
  2. (Probably the best.) Save the length of the original Vector in a variable and iterate from 1 to this length; this way the loop will also stop after all the original elements were processed, even if the Vector grew in the meantime.
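Option 2 could look like this, using the tuple example from upthread:

```julia
a = [(1, 2), (3, 4)]
n = length(a)               # remember the original length
for i in 1:n                # the bound is fixed even though `a` grows inside the loop
    push!(a, reverse(a[i]))
end
# a == [(1, 2), (3, 4), (2, 1), (4, 3)]
```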

This is the same as using pairs or eachindex, as suggested upthread, but those will also handle generic indexing (like zero-based).

For me it was a novelty that enumerate works that way. The documentation does not make it clear that it has such a fundamental implementation difference relative to pairs, etc. I’m not sure how to search for `for val in vec` in the docs; I think that is not clear there either.

It should not be that rare to iterate over an array while adding elements to it; I’m surprised nobody has run into this issue before (particularly with the `for val in vec` syntax, which is very common).


Yes, I know, but why those are a solution was not explained anywhere in the thread. I tried to explain the source of the problem and give basic, understandable alternatives, as it seemed to me the OP needed that more than “just use eachindex or pairs”.

Not sure how to search for for val in vec in the docs, I think that is not clear there either.

The structure `for val in vec` calls Base.iterate. There is also “Manual > Control Flow > Repeated Evaluation: Loops” but, while it shows the `for val in vec` syntax, it does not mention Base.iterate or the detail that the original length is not cached. Only “Manual > Interfaces > Iteration” gets closer to giving answers.
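To make that concrete, `for val in vec` is roughly equivalent to the following sketch of the lowering:

```julia
vec = [10, 20, 30]
next = iterate(vec)             # returns (value, state) or nothing
while next !== nothing
    (val, state) = next
    # ... loop body uses `val` ...
    next = iterate(vec, state)  # checks the *current* length, so elements
end                             # pushed during the loop are also visited
```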

For me it was a novelty that enumerate works that way. It is not clear in the documentation that it has such a fundamental implementation difference relative to pairs, etc.

Yes, I think a documentation PR would be adequate. Maybe I will start one later if nobody else wants dibs. But it seems that everyone (including myself) was under a false impression about pairs, because the pairs documentation explicitly says:

Mutation of the bounds of the underlying array will invalidate this iterator.

So it should not be used in the case above. The eachindex documentation does not make the same comment.

It should not be that rare to iterate over an array adding elements to it, I’m surprised nobody came with that issue before (particularly for the for val in vec syntax, which is very common).

Yes, I agree. But almost every time I do it, I want to iterate over the newly added elements too (almost every graph algorithm will want that) so this behaviour was always what I wanted and expected.


Okay, perhaps my post was less useful than I hoped. But I would not recommend 1:length(), that’s a bad habit that should be ditched as soon as possible. The natural alternative to enumerate is pairs.

My real problem is finding the cycles of a chemical structure recursively via DFS; the iteration problem I reported is in the find_cycles function. I want the dumb_cycle_detection function to return an array composed of the arrays of vertices that make up each cycle.

using Pkg
using LinearAlgebra
using PyCall
using Statistics
using DataStructures
function convert_neighbor_list(nl)
    n_vert = maximum.(nl)[1] + 1
    K = []
    V = []

    for i in 1:n_vert
        #make two different lists and merge it to a dict
        push!(K, i)
        push!(V, Int64[])
    end
    new = OrderedDict(zip(K, V))

    for (i_v, j_v) in zip(nl[1], nl[2])
        push!(new[i_v + 1], j_v)
    end

    return new
end
function find_cycles(i_vert, cnl, max_length, cur_parth, passed_edges)

    if length(cur_parth) == max_length
        return []
    end

    acc_cycles = []
    sort_cycles = []
    res = []

    neighbs = cnl[i_vert+1]
    for n in neighbs
        edge = (minimum([i_vert, n]), maximum([i_vert, n]))
        
        if edge ∉ passed_edges
            if n in cur_parth[2:end]
                return []
            end
        end
    end

    for n in neighbs
        edge = (minimum([i_vert, n]), maximum([i_vert, n]))
        
        if edge ∉ passed_edges
            if n == cur_parth[1]
                return [cur_parth]
            end
        end
    end

    for n in neighbs
        edge = (minimum([i_vert, n]), maximum([i_vert, n]))

        if edge ∉ passed_edges
            cycs = find_cycles(n, cnl, max_length, append!(cur_parth, [n]), append!(passed_edges, [edge]))
            for cyc in cycs
                sorted_cyc = tuple(cyc)
                if sorted_cyc ∉ sort_cycles
                    append!(sort_cycles, sorted_cyc)
                    append!(acc_cycles, cyc)
                end
            end
        end
    end

    return acc_cycles
end
function dumb_cycle_detection(ase_atoms_no_h, max_length)
    neighborlist = pyimport("ase.neighborlist")
    neighbor_list = neighborlist.neighbor_list("ij", ase_atoms_no_h, 2.0)
    cycles = []
    sorted_cycles = []
    n_vert = maximum(neighbor_list[1])
    cnl  = convert_neighbor_list(neighbor_list)
    for i_vert in range(1, n_vert+1)
        cycs = find_cycles(i_vert-1, cnl, max_length, [i_vert-1], [])
        for cyc in cycs
            sorted_cyc = tuple(cyc)
            if sorted_cyc ∉ sorted_cycles
                append!(sorted_cycles, sorted_cyc)
                append!(cycles, cyc)
            end
        end
    end
    return cycles
end
ase_io = pyimport("ase.io")
ase_atoms = ase_io.read("donut-6-b3lyp-opt.xyz")
ase = pyimport("ase")
ase_atoms_no_h = ase.Atoms([a for a in ase_atoms if a.symbol != 'H'])
a = dumb_cycle_detection(ase_atoms_no_h, 6) ## max_length is the maximum size of the cycle that I wanna find
print(a)

The output should be:

[[0, 2, 3, 4, 5, 1], [1, 5, 9, 8, 7, 6], [4, 5, 9, 10, 11, 12], [6, 7, 24, 25, 26, 27], [7, 8, 20, 22, 23, 24], [8, 9, 10, 19, 21, 20], [10, 11, 16, 17, 18, 19], [11, 12, 13, 14, 15, 16], [15, 16, 17, 163, 164, 165], [23, 24, 25, 43, 44, 45], [28, 29, 33, 32, 31, 30], [29, 33, 37, 36, 35, 34], [32, 40, 39, 38, 37, 33], [34, 35, 52, 53, 54, 55], [35, 36, 48, 50, 51, 52], [36, 37, 38, 47, 49, 48], [38, 39, 44, 45, 46, 47], [39, 40, 41, 42, 43, 44], [51, 73, 72, 71, 53, 52], [56, 57, 61, 60, 59, 58], [57, 61, 65, 64, 63, 62], [60, 68, 67, 66, 65, 61], [62, 63, 80, 81, 82, 83], [63, 80, 79, 78, 76, 64], [64, 76, 77, 75, 66, 65], [66, 67, 72, 73, 74, 75], [67, 68, 69, 70, 71, 72], [79, 80, 81, 99, 100, 101], [84, 85, 89, 88, 87, 86], [85, 90, 91, 92, 93, 89], [88, 89, 93, 94, 95, 96], [90, 91, 108, 109, 110, 111], [91, 92, 104, 106, 107, 108], [92, 93, 94, 103, 105, 104], [94, 103, 102, 101, 100, 95], [95, 96, 97, 98, 99, 100], [107, 129, 128, 127, 109, 108], [112, 113, 117, 116, 115, 114], [113, 117, 121, 120, 119, 118], [116, 124, 123, 122, 121, 117], [118, 119, 136, 137, 138, 139], [119, 120, 132, 134, 135, 136], [120, 121, 122, 131, 133, 132], [122, 123, 128, 129, 130, 131], [123, 124, 125, 126, 127, 128], [135, 136, 137, 155, 156, 157], [140, 141, 145, 144, 143, 142], [141, 146, 147, 148, 149, 145], [144, 145, 149, 150, 151, 152], [146, 147, 164, 165, 166, 167], [147, 164, 163, 162, 160, 148], [148, 160, 161, 159, 150, 149], [150, 151, 156, 157, 158, 159], [151, 152, 153, 154, 155, 156]]
