How to export data in arrays form?

Hello
in the database I have this data:

16Γ—3 Array{Any,2}:
 "P1_BloEco_m"   1  0.01
 "P1_BloEco_s"   2  0.06
 "P1_R"          3  0.22
 "P1_R"          4  0.242
 "P2_R"          5  0.015
 "P2_R"          6  0.06
 "P3_A"          7  0.225
 "P3_A"          8  0.245
 "P4_B"          9  0.5
 "P4_B"         10  0.52
 "P5_C"         11  0.53
 "P5_C"         12  0.54
 "P6_H"         13  0.2
 "P6_H"         14  0.21
 "P7_T"         15  0.22
 "P7_T"         16  0.23

Would it be possible to read this data in this format?

4Γ—4 Array{Float64,2}:
 0.01   0.015  0.5   0.2
 0.06   0.06   0.52  0.21
 0.22   0.225  0.53  0.22
 0.242  0.245  0.54  0.23

Thanks!!

Something like this?

julia> a = [ "P1_BloEco_m"   1  0.01
        "P1_BloEco_s"   2  0.06
        "P1_R"          3  0.22
        "P1_R"          4  0.242
        "P2_R"          5  0.015
        "P2_R"          6  0.06
        "P3_A"          7  0.225
        "P3_A"          8  0.245
        "P4_B"          9  0.5
        "P4_B"         10  0.52
        "P5_C"         11  0.53
        "P5_C"         12  0.54
        "P6_H"         13  0.2
        "P6_H"         14  0.21
        "P7_T"         15  0.22
        "P7_T"         16  0.23]

julia> reshape(a[:, 3], 4, 4)
4Γ—4 Array{Any,2}:
 0.01   0.015  0.5   0.2
 0.06   0.06   0.52  0.21
 0.22   0.225  0.53  0.22
 0.242  0.245  0.54  0.23
4 Likes

If your data is in the format you show then reshaping after the fact is the way to go. Your title asks for how to export (rather than import) data in arrays form, so here’s a simple example using the DelimitedFiles standard library:

julia> using DelimitedFiles

julia> a = rand(5, 5)
5Γ—5 Matrix{Float64}:
 0.382475  0.371708   0.248178   0.469148  0.978764
 0.537508  0.224802   0.670567   0.303162  0.309597
 0.362615  0.990693   0.793947   0.047375  0.514432
 0.159678  0.0254657  0.0519728  0.601986  0.73002
 0.971006  0.914345   0.427885   0.238975  0.479552

julia> writedlm("matrix.txt", a)

julia> readdlm("matrix.txt")
5Γ—5 Matrix{Float64}:
 0.382475  0.371708   0.248178   0.469148  0.978764
 0.537508  0.224802   0.670567   0.303162  0.309597
 0.362615  0.990693   0.793947   0.047375  0.514432
 0.159678  0.0254657  0.0519728  0.601986  0.73002
 0.971006  0.914345   0.427885   0.238975  0.479552
2 Likes

Hello

This example looks a little like what I posted above, but in this case I would like to know if it is possible to manipulate the data so that the fields in which they do not have data are filled in by one.

I have this data in my database

b	    g	Cbg
CM_c	1	0,01
CM_c	2	0,06
CM_c	3	0,22
CM_c	4	0,24
CM_c	5	0,29
CM_c	6	0,35
CM_c	7	0,39
CM_cf	1	0,003
CM_cf	2	0,02
CM_cf	3	0,07
CM_cf	4	0,09
CM_cf	5	0,14
CM_cf	6	0,20
CM_cf	7	0,25
Eq_c	1	0,20
Eq_cf	1	0,10
Eq_u	1	0,05
Eq_m	1	0,10
Eq_s	1	0,10
Eq_c	2	0,60
Eq_cf	2	0,50
Eq_u	2	0,20
Eq_m	2	0,30
Eq_s	2	0,20
Eq_c	3	0,80
Eq_cf	3	0,70
Eq_u	3	0,50
Eq_m	3	0,50
Eq_s	3	0,40
Eq_c	4	0,90
Eq_cf	4	0,80
Eq_u	4	0,90
Eq_m	4	0,70
Eq_s	4	0,60
Eq_c	5	1,00
Eq_cf	5	1,00
Eq_u	5	1,00
Eq_m	5	1,00
Eq_s	5	1,00

Would it be possible to manipulate the data so that I can read this in Julia?

0.01   0.06  0.22  0.24  0.29  0.35  0.29
0.003  0.2   0.07  0.09  0.14  0.2   0.25
0.2    0.6   0.8   0.9   1.0   1.0   1.0
0.05   0.5   0.7   0.8   1.0   1.0   1.0

Hey @RaquelSantos - I think what you are looking for is the reshape command. I would suggest taking your column named Cbg, converting that to an array (if it is in a dataframe, should be pretty easy to do something like df["Cbg"]), and then put your array into reshape like this: reshape(yourList, (4, 5)).

Example:

mylist = rand(20)
reshape(mylist, (4, 5)

...

4Γ—5 Matrix{Float64}:
 0.0511541  0.767957  0.806415  0.972731  0.262882
 0.795663   0.936699  0.673377  0.879979  0.515648
 0.870645   0.583758  0.509925  0.465228  0.716269
 0.102783   0.368073  0.46795   0.324086  0.0286889

Hope that helps and I am not misunderstanding you!

P.S. Whoops - missed @Satvik 's answer. I think that should work for your purposes!

1 Like

Hi @TheCedarPrince, Thank you for your help
I tried to use the reshape command but a dimension error appears because the database is not complete. As I put in the example above, β€œg” goes from 1 to 7 for types CM_c …, but when it comes to types Eq_c … it goes up to 5.

Hey Raquel,

Anytime! This is an unoptimized solution but is what came to my mind first. Let me know if you have any questions (and anyone who comes here, please don’t hesitate to improve my answer!):

Using your data exactly as shown here (copied and pasted to a file I called discourse_data.txt), here is how I coded it:

using CSV
using DataFrames

df = CSV.File("discourse_data.txt") |> DataFrame
grouped_df = groupby(df, Symbol("    g"))

And then you can easily query your data like this based on g:

grouped_df[(5,)]

7Γ—3 SubDataFrame
 Row β”‚ b           g  Cbg
     β”‚ String  Int64  String
─────┼───────────────────────
   1 β”‚ CM_c        5  0,29
   2 β”‚ CM_cf       5  0,14
   3 β”‚ Eq_c        5  1,00
   4 β”‚ Eq_cf       5  1,00
   5 β”‚ Eq_u        5  1,00
   6 β”‚ Eq_m        5  1,00
   7 β”‚ Eq_s        5  1,00

The reason why I thought not to exactly do what you proposed is that it is not always guaranteed you have the same number of groups based on your b column. Hopefully that helps somewhat!

Your data appear to be somewhat strangely organized:

  • First, you have CM_c values for g= 1,…, 7
  • Next, you have CM_cf values for g= 1,…,7
  • Then you have values for g= 1 for Eq_c, Eq_cf, Eq_u, Eq_m, Eq_s
  • Then you have values for g= 2 for …
  • Etc… until g= 5 for …

So you miss data for g=6, 7 for Eq_c, Eq_cf, Eq_u, Eq_m, Eq_s

Anyway, organization of data is probably out of your control.

I took your data into Excel, removed some spacing around g in the header, and replaced decimal symbol , with decimal symbol .. Then I exported the Excel-file as a CSV file Test.csv.

The following Julia code should generate your matrix (??):

# Packages
julia> using DataFrames, CSV
# Read in *.csv file into DataFrame
julia> df = CSV.File("Test.csv",silencewarnings=true) |> DataFrame
# Find unique b values... not the most elegant way perhaps, but works
julia> b_val = Set(df.b) |> collect |> sort
7-element Vector{String}:
 "CM_c"
 "CM_cf"
 "Eq_c"
 "Eq_cf"
 "Eq_m"
 "Eq_s"
 "Eq_u"

Next, the following code generates the matrix:

# Populate a 7-by-7 matrix with NaN (not-a-number)
C_bg = ones(7,7)*NaN
# Create a row for each value in b_val
for i in 1:length(b_val)
    # find the indices idx in column df.b which has value b_val[i]
    idx = df.b .== b_val[i]
    # pick out the rows of df.g and df.Cbg corresponding to b_val[i]
    g_val = df.g[idx]
    Cbg_val = df.Cbg[idx]
    # Insert the values Cbg_val into matrix Cbg
    C_bg[i,g_val] .= Cbg_val
end

which results in:

julia> C_bg
7Γ—7 Matrix{Float64}:
 0.01  0.06  0.22  0.24  0.29    0.35    0.39
 0.0   0.02  0.07  0.09  0.14    0.2     0.25
 0.2   0.6   0.8   0.9   1.0   NaN     NaN
 0.1   0.5   0.7   0.8   1.0   NaN     NaN
 0.1   0.3   0.5   0.7   1.0   NaN     NaN
 0.1   0.2   0.4   0.6   1.0   NaN     NaN
 0.05  0.2   0.5   0.9   1.0   NaN     NaN

OK – the code is somewhat clumsy in that I use that I know the size of the resulting 7-by-7 matrix. So there is room for some improvement.

It may be possible to do the same thing with reshape. However, use of reshape is complicated by the fact that some data are missing, and that the order of the information varies in the columns.

2 Likes

Hi @BLI , thanks for your help
That’s exactly what I wanted. I tested your example here and it worked, but I have data in my database that looks like this:

#"i" and "a" are of type string
i	       a	       APia
PJ01_INT  PJ01_INT_c	1
PJ01_D	  PJ01_D_c 	    1
PJ01_F	  PJ01_F_c	    1
PJ01_H	  PJ01_H_c	    1
PJ01_J	  PJ01_J_c	    1
PJ01_INT  PJ01_INT_cf	1
PJ01_D	  PJ01_D_cf	    1
PJ01_F	  PJ01_F_cf	    1
PJ01_H	  PJ01_H_cf	    1

The logic is exactly the same, to visualize the data in the form of a matrix as you did in the example like this, but when I try to compile the for the following error message appears: ArgumentError: unable to check bounds for indices of type String

Hm. In your original problem, you had:

b	    g	Cbg
CM_c	1	0,01
CM_c	2	0,06
CM_c	3	0,22

So I made a sorted list of the elements in b and made a matrix with one row for each element in b (say, i), with the column number given by the value of g (say, j). Then I put the number of Cbg into this matrix location (i, j). So, essentially I mapped the strings of the first column into positive integers which defined the row number, and then used the positive integers of the second column as column number in the matrix. Observe that in the second column (g), there is no mapping from string to integers, because the values are already integers.

In your new, and more general case, the first column elements are still strings, but the second column (a) elements are now also strings, and not positive integers. So you would need to map both the elements of the first column (i) into positive integers, and the elements of the second column (a) into positive integers in order to make the positions in the matrix.

Let me know if this makes sense?

So… a little thinking after supper: again, I took your example data into Excel, removed some white space, and stored the result – this time in csv file Test_new.csv:

julia> using DataFrames, CSV
#
julia> df_n = CSV.File("Test_new.csv") |> DataFrame
julia> i_val = Set(df_n.i) |> collect |> sort
5-element Vector{String}:
 "PJ01_D"
 "PJ01_F"
 "PJ01_H"
 "PJ01_INT"
 "PJ01_J"
julia> a_val = Set(df_n.a) |> collect |> sort
9-element Vector{String}:
 "PJ01_D_c"
 "PJ01_D_cf"
 "PJ01_F_c"
 "PJ01_F_cf"
 "PJ01_H_c"
 "PJ01_H_cf"
 "PJ01_INT_c"
 "PJ01_INT_cf"
 "PJ01_J_c"

Then I do:

ni = length(i_val)
na = length(a_val)
#
AP_ia = Matrix(undef,ni,na) # alternative: AP_ia = fill(NaN,ni,na)
#
row_no = collect(1:na)
for j in 1:ni
    idx_i = df_n.i .== i_val[j]
    a_val_j = df_n.a[idx_i]
    APia_j = df_n.APia[idx_i]
    for k in 1:length(a_val_j)
        idx_a = a_val_j[k] .== a_val
        AP_ia[j,row_no[idx_a][1]] = APia_j[k]
    end
end

The result is:

julia> AP_ia
5Γ—9 Matrix{Any}:
   1       1     #undef  #undef  #undef  #undef  #undef  #undef  #undef
 #undef  #undef    1       1     #undef  #undef  #undef  #undef  #undef
 #undef  #undef  #undef  #undef    1       1     #undef  #undef  #undef
 #undef  #undef  #undef  #undef  #undef  #undef    1       1     #undef
 #undef  #undef  #undef  #undef  #undef  #undef  #undef  #undef    1

If you accept that the elements of AP_ia can be floats, then use the alternative initialization of AP_ia, which leads to:

5Γ—9 Matrix{Float64}:
   1.0    1.0  NaN    NaN    NaN    NaN    NaN    NaN    NaN
 NaN    NaN      1.0    1.0  NaN    NaN    NaN    NaN    NaN
 NaN    NaN    NaN    NaN      1.0    1.0  NaN    NaN    NaN
 NaN    NaN    NaN    NaN    NaN    NaN      1.0    1.0  NaN
 NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN      1.0

Two comments:

  1. Check that this is what you want, and that my simple loop works ok.
  2. In this case, your APia values may be integers (all examples in your table are 1). In that case, you may not want to use NaN (not-a-number) for those values that are not defined, since that leads to matrix AP_ia holding float values.

Thank you for the help @BLI :smiley: