Connecting DataFrames with different columns

I am a moron who does not know how to connect DataFrames with different columns.

So essentially, the issue is:
offspring1 = hcat(parent1[1:crossover_point], parent2[(crossover_point + 1):end])

The result should be a DataFrame containing the selected columns from DataFrame parent1 and DataFrame parent2. Instead, I get a matrix. The solution must be obvious, pointing clearly at my lack of intelligence.

Well, it turns out I really am an idiot. It seems that parent1[1:crossover_point] and parent2[(crossover_point + 1):end] return DataFrameRow objects rather than DataFrames.

Therefore, the solution is:
offspring1 = hcat(DataFrame(parent1[1:crossover_point]), DataFrame(parent2[(crossover_point + 1):end]))
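A minimal sketch of why the fix works, assuming DataFrames.jl 1.x and a hypothetical two-row toy frame with only four columns (not the thread's data). Taking a row with an integer index gives a DataFrameRow, and selecting columns from that row gives another DataFrameRow, so each half is wrapped in `DataFrame(...)` before `hcat`:

```julia
using DataFrames

# Toy population (hypothetical values): one candidate solution per row
parent_pop = DataFrame(Position_1 = [1, 5], Position_2 = [2, 6],
                       Position_3 = [3, 7], Position_4 = [4, 8])

parent1 = parent_pop[1, :]  # integer row index -> DataFrameRow
parent2 = parent_pop[2, :]
crossover_point = 2

# Selecting columns of a DataFrameRow gives another DataFrameRow, not a DataFrame
left = parent1[1:crossover_point]
right = parent2[(crossover_point + 1):end]

# Wrap each half in DataFrame(...) so hcat joins two DataFrames column-wise
offspring1 = hcat(DataFrame(left), DataFrame(right))
```

Because the two halves carry different column names (Position_1 and Position_2 versus Position_3 and Position_4), `hcat` does not need `makeunique`.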

Let my stupidity be a warning to future generations, as I am removed by the real Genetic Algorithm that I am trying to recreate.

Your sense of humour will carry you a long way in the real Genetic Algorithm, but I still don’t understand what type parent1 is, or how the subsetting parent1[1:crossover_point] works, as it errored on Julia 1.9 when I tried it.

The DataFrames parent1 and parent2 each contain essentially one row and 15 columns, holding 15 different index values.

For example, parent1 looks like this:
8762 11968 2472 10832 4342 14746 5286 18075 5779 1915 9584 9051 16086 1530 15590

And parent2 looks like this:
4423 16075 13175 16537 18392 14211 12010 47 12065 8230 13449 16864 8410 6076 15923

The column names are Position_1 to Position_15.
The code above is intended to create one offspring by mixing two candidate solutions in a crossover. The second offspring is created the opposite way. At this point, before mutation, the next generation is created.
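The crossover described above, producing both offspring, can be sketched as a small function. This assumes DataFrames.jl 1.x and parents passed as DataFrameRow objects, as in the thread; the function name `one_point_crossover` and the fixed crossover point of 5 are hypothetical choices for illustration:

```julia
using DataFrames

# One-point crossover between two candidate solutions stored as DataFrameRows.
# Returns two one-row DataFrames; the second offspring swaps the parents' roles.
function one_point_crossover(parent1::DataFrameRow, parent2::DataFrameRow,
                             crossover_point::Integer)
    n = length(parent1)
    1 <= crossover_point < n ||
        throw(ArgumentError("crossover_point must leave genes on both sides"))
    first_part(p) = DataFrame(p[1:crossover_point])
    second_part(p) = DataFrame(p[(crossover_point + 1):n])
    offspring1 = hcat(first_part(parent1), second_part(parent2))
    offspring2 = hcat(first_part(parent2), second_part(parent1))
    return offspring1, offspring2
end

# Data from the thread: one row per parent, columns Position_1 ... Position_15
names15 = ["Position_$i" for i in 1:15]
p1 = [8762, 11968, 2472, 10832, 4342, 14746, 5286, 18075, 5779, 1915, 9584, 9051, 16086, 1530, 15590]
p2 = [4423, 16075, 13175, 16537, 18392, 14211, 12010, 47, 12065, 8230, 13449, 16864, 8410, 6076, 15923]
parent_pop = DataFrame(permutedims(hcat(p1, p2)), names15)

child1, child2 = one_point_crossover(parent_pop[1, :], parent_pop[2, :], 5)
```

In a real run the crossover point would typically be drawn at random, for example with `rand(1:14)` for 15 genes.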

Still, Dan makes a good point, which is unrelated to the Julia version but relevant to the DataFrames version: indexing as df[1:something], i.e. with one dimension only, was deprecated in DataFrames a long time ago, so if that works for you, you might be on a very outdated (pre-1.0) version of DataFrames.
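A short sketch of the indexing rules in question, assuming DataFrames.jl 1.x and a hypothetical three-column frame. A DataFrame needs both a row and a column index, while a DataFrameRow is one-dimensional and accepts a single column index, which is why the original code worked on rows:

```julia
using DataFrames

df = DataFrame(Position_1 = [1, 5], Position_2 = [2, 6], Position_3 = [3, 7])

# On a DataFrame, columns are selected with two indices: rows, then columns
sub_copy = df[:, 1:2]   # copies the selected columns
sub_view = df[!, 1:2]   # reuses the existing column vectors without copying

# Single-index column selection such as df[1:2] throws in DataFrames 1.x
single_index_fails = try
    df[1:2]
    false
catch
    true
end

# A DataFrameRow is one-dimensional, so a single index selects its columns
row_part = df[1, :][1:2]
```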


I have DataFrames 1.5.0 and Julia 1.9.0, so it is unlikely to be that outdated. Perhaps it works because parent1 and parent2 are created as instances of DataFrameRow instead of DataFrame.
The parents are created like this:
parent_pop[c, :], parent_pop[c + 1, :],
where, for example, c = 1 and parent_pop is a regular DataFrame.
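A possible alternative, sketched under the assumption of DataFrames.jl 1.x and a hypothetical four-column toy frame: indexing rows with a one-element range (`c:c`) instead of an integer (`c`) returns a one-row DataFrame rather than a DataFrameRow, so the halves can be concatenated without any conversion:

```julia
using DataFrames

parent_pop = DataFrame(Position_1 = [1, 5], Position_2 = [2, 6],
                       Position_3 = [3, 7], Position_4 = [4, 8])
c = 1
crossover_point = 2

row_view = parent_pop[c, :]     # integer row index -> DataFrameRow
row_frame = parent_pop[c:c, :]  # one-element range -> one-row DataFrame

# With range row indices both halves are already DataFrames, so hcat needs no DataFrame(...) wrap
offspring1 = hcat(parent_pop[c:c, 1:crossover_point],
                  parent_pop[(c + 1):(c + 1), (crossover_point + 1):end])
```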


You may actually want to avoid DataFrames and use matrices, and when using matrices, make each agent/creature a column rather than a row. Why? Both DataFrames and Julia matrices store each column in a contiguous block of memory (matrices are column-major), and when cutting and stitching you want to work on contiguous memory rather than on many far-flung locations. In addition, DataFrames shine when the data has disparate types (different column types), whereas here every gene is an integer.
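A minimal sketch of the matrix layout suggested above, using only Base Julia and the two parents from the thread. The function name `one_point_crossover` and the crossover point of 5 are hypothetical; each agent is a column, so each parent's genes are read from one contiguous slice:

```julia
# Population as a Matrix: one column per agent, one row per gene position.
# Julia matrices are column-major, so each agent's genes are contiguous in memory.
p1 = [8762, 11968, 2472, 10832, 4342, 14746, 5286, 18075, 5779, 1915, 9584, 9051, 16086, 1530, 15590]
p2 = [4423, 16075, 13175, 16537, 18392, 14211, 12010, 47, 12065, 8230, 13449, 16864, 8410, 6076, 15923]
population = hcat(p1, p2)  # 15×2 matrix

# One-point crossover between agent columns i and j, cutting after gene k
function one_point_crossover(pop::AbstractMatrix, i::Integer, j::Integer, k::Integer)
    a = @view pop[:, i]  # views avoid copying the whole parent columns
    b = @view pop[:, j]
    child1 = vcat(a[1:k], b[(k + 1):end])
    child2 = vcat(b[1:k], a[(k + 1):end])
    return child1, child2
end

child1, child2 = one_point_crossover(population, 1, 2, 5)
next_generation = hcat(child1, child2)  # again one column per agent
```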
Having said that, an even better way might be to have a DataFrame with all the agent metadata and a column of ‘genes’, in which each entry holds the vector of that agent’s traits.
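The metadata-plus-genes layout might look like the sketch below, assuming DataFrames.jl 1.x. The column names `id` and `generation`, the helper `crossover_genes`, and the truncation of each gene vector to its first four values are all hypothetical choices for brevity:

```julia
using DataFrames

# Hypothetical layout: one row per agent, scalar metadata columns,
# and a `genes` column whose entries are whole vectors of traits
agents = DataFrame(id = [1, 2],
                   generation = [0, 0],
                   genes = [[8762, 11968, 2472, 10832], [4423, 16075, 13175, 16537]])

# One-point crossover on two gene vectors, cutting after position k
function crossover_genes(g1::AbstractVector, g2::AbstractVector, k::Integer)
    return vcat(g1[1:k], g2[(k + 1):end]), vcat(g2[1:k], g1[(k + 1):end])
end

child1, child2 = crossover_genes(agents.genes[1], agents.genes[2], 2)

# Append the offspring as new rows; each agent's genes stay together in one vector
push!(agents, (id = 3, generation = 1, genes = child1))
push!(agents, (id = 4, generation = 1, genes = child2))
```

Crossover then works on plain vectors, and the DataFrame only does the bookkeeping.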
Doing this refactoring would also be a good way to gain experience points with Julia.


That is a very useful idea. Perhaps it will speed up the process. My current knowledge of DataFrames is based on this course: 1. Environment Setup | JuliaAcademy