Writing data to csv

Hi guys, I am working with a data set and writing my results back to a CSV file. My code is:

Result = [permutedims(Names); K2Score; ShortestPath; score; cum_weight]
save_file = save_dialog("Save Hierarchy as...")
CSV.write(save_file, Tables.table(Result::AbstractMatrix),  writeheader = false)

But in my CSV, the second row is not printed completely. Please see the attachment. I am not able to figure out why this problem is occurring. Please suggest a fix.


Instead of this line, create a DataFrame

DataFrame([K2Score; ShortestPath; score; cum_weight], Names)

then try saving the CSV.


Hi @pdeffebach, thanks for your response. I tried to implement the change you suggested and got the following error.

MethodError: no method matching sortslices(::DataFrame; dims=2, by=var"#5#6"())
Closest candidates are:
  sortslices(::AbstractArray; dims, kws...) at multidimensional.jl:1751

 [1] top-level scope at In[7]:16
 [2] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091

@ashwanimalviya, please read the guidelines here and post your MWE again in the proper format. Thank you.

Hi @rafael.guerra, thanks for the suggestion. I have edited the post. I hope it is now in the right format, as far as I understand the guidelines at that link.

Sorry but this still isn’t an MWE - save_dialog is not defined. Also, none of the variables in Result are defined.

Can you copy the output you get when you run Peter’s suggestion (without any writing to csv, just what’s printed in the REPL):

julia> DataFrame([K2Score; ShortestPath; score; cum_weight], Names)

Hi @nilshg
Thanks for your reply. I will try my best to explain the problem.

I am using the following packages in Julia version 1.5.3

import Pkg


I have written some functions of my own to do all the calculations, as follows.

# Application of functions
DAG, Graph, Names, BS, K2Score  = main(20,true);
PlotNames = convertNames2string(Names);
GRAPH = TikzGraphs.plot(Graph, PlotNames, node_style="draw, rounded corners, fill=blue!10")

K2Score = replaceZeros(K2Score)
distmx = replace_K2S_in_DAG(DAG, K2Score)
root = Find_root(DAG)
ShortestPath = Shortest_Path(Graph, root, distmx)
# Calculation of final result
score = Weight(ShortestPath)
total_weight = sum(score)
cum_weight = score./total_weight
Result = DataFrame([K2Score; ShortestPath; score; cum_weight], permutedims(Names))
# Making hierarchy by arranging the result in ascending order.
xs = Result
Result = sortslices(xs; dims=2, by=x->x[3])
#Exporting the result into .csv file
save_file = save_dialog("Save Hierarchy as...")
CSV.write(save_file, Tables.table(Result::AbstractMatrix),  writeheader = false)

I have used permutedims() to transpose Names.

save_dialog is from the Gtk package and is used to choose where the final result is saved.

The following is the error message from the REPL. It is the same with and without the CSV step.

MethodError: no method matching DataFrame(::Array{Float64,2}, ::Array{String,2})

You might have used a 2d row vector where a 1d column vector was required.
Note the difference between 1d column vector [1,2,3] and 2d row vector [1 2 3].
You can convert to a column vector with the vec() function.
Closest candidates are:
  DataFrame(::AbstractArray{T,2} where T) at deprecated.jl:70
  DataFrame(::AbstractArray{T,2} where T, ::AbstractArray{Symbol,1}; makeunique) at C:\Users\tecnico2\.julia\packages\DataFrames\3mEXm\src\dataframe\dataframe.jl:322
  DataFrame(::AbstractArray{T,2} where T, ::AbstractArray{var"#s45",1} where var"#s45"<:AbstractString; makeunique) at C:\Users\tecnico2\.julia\packages\DataFrames\3mEXm\src\dataframe\dataframe.jl:327

 [1] top-level scope at In[16]:9
 [2] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091

Thanks again for taking the time. I hope this is clearer. Please write if you need any more information.


Names is not defined in your code afaict, but that’s where your problem lies. Peter’s suggestion is asking you to do this:

julia> using DataFrames

julia> x = rand(3, 3)
3×3 Matrix{Float64}:
 0.568797  0.879866  0.178767
 0.699596  0.925057  0.318236
 0.673037  0.847415  0.22209

julia> names = ["a", "b", "c"]
3-element Vector{String}:
 "a"
 "b"
 "c"

julia> DataFrame(x, names)
3×3 DataFrame
 Row │ a         b         c        
     │ Float64   Float64   Float64  
   1 │ 0.568797  0.879866  0.178767 
   2 │ 0.699596  0.925057  0.318236 
   3 │ 0.673037  0.847415  0.22209 

As you can see from your error, your Names seems to be a matrix rather than a vector, maybe because you are calling permutedims. Just make sure that Names is a vector and you should be good to go.
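To illustrate the shape difference, here is a minimal sketch with made-up names (`names_vec` and `names_row` are hypothetical, not the OP's actual `Names`). It only shows that `permutedims` on a vector produces a 2d row, and that `vec` flattens it back into the 1d vector that the `DataFrame` constructor expects:

```julia
# Hypothetical column names, standing in for the OP's `Names`.
names_vec = ["FC111", "FC1141", "FC1142"]   # 1d: Vector{String}
names_row = permutedims(names_vec)          # 2d: 1×3 Matrix{String}

println(size(names_vec))                 # a 1-tuple: one dimension
println(size(names_row))                 # a 2-tuple: two dimensions
println(names_row isa AbstractVector)    # the row is a matrix, not a vector

# vec() flattens the 2d row back into a 1d vector.
flat = vec(names_row)
println(flat == names_vec)
```

So if `Names` is already a vector, passing it directly should be enough; if it is a matrix, `vec(Names)` is the conversion the error message is hinting at.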


I just mean make it a dataframe for printing to CSV, not for your whole analysis.

I mean, Tables.table should work, and it's not great that it doesn’t. But it's easier to diagnose what’s going on when you call DataFrame. And it’s more likely to solve your immediate problem and have it “just work”.

Dear @rafael.guerra and @pdeffebach, thank you very much for the time you have taken on this issue.

@rafael.guerra's suggestion worked up to line 9, by using vec() for Names.

K2Score = replaceZeros(K2Score)
distmx = replace_K2S_in_DAG(DAG, K2Score)
root = Find_root(DAG)
ShortestPath = Shortest_Path(Graph, root, distmx)
# Calculation of final result
score = Weight(ShortestPath)
total_weight = sum(score)
cum_weight = score./total_weight
Result = DataFrame([K2Score; ShortestPath; score; cum_weight], vec(Names))
The REPL output is:

4 rows × 82 columns (omitted printing of 75 columns)

   FC111       FC1141      FC1142      FC113       FC1321      FC214       FC2161
   Float64     Float64     Float64     Float64     Float64     Float64     Float64
1  -280.721    -314.661    -278.807    -290.976    -295.954    -262.717    -198.453
2  389.652     1362.99     1048.33     680.628     769.524     1032.24     473.57
3  7.45589     1.10078     3.15526     5.55605     4.97564     3.26031     6.90797
4  0.0128894   0.00190298  0.00545468  0.00960508  0.00860168  0.00563629  0.0119422

I need to sort the columns of this output by the values in row 2, in ascending order. When I use

# Making hierarchy by arranging the result in ascending order.
xs = Result
Result = sortslices(xs; dims=2, by=x->x[2])

I am getting the following error.

MethodError: no method matching sortslices(::DataFrame; dims=2, by=var"#17#18"())
Closest candidates are:
  sortslices(::AbstractArray; dims, kws...) at multidimensional.jl:1751

 [1] top-level scope at In[30]:12
 [2] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091

Is sortslices() defined for DataFrames? Can you please suggest the best way to sort by a row in DataFrames?

This is a different problem though. It looks like you now want to sort a DataFrame, which you can do with just

sort!(Result, :FC1141)

(assuming that that’s the column you want to sort by).

Hi @nilshg, thanks, but I want to sort by the values obtained in row 2. I guess the sort!() method only works for sorting by columns.

Ah right, that is not a good use case for DataFrames. In that case I would sort the matrix before turning it into a DataFrame. But again, all that is likely unrelated to your issue with writing a CSV.
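A rough sketch of that idea, using made-up numbers and hypothetical variable names (`data`, `col_names`): compute the column order from row 2 with `sortperm`, then apply the same order to both the matrix and the names so the labels stay attached to their columns.

```julia
# Made-up values; rows stand in for K2 score, shortest path, and weight.
col_names = ["FC111", "FC1141", "FC1142"]
data = [-280.721  -314.661  -278.807;
         389.652  1362.99   1048.33;
         7.45589  1.10078   3.15526]

# Column order that puts row 2 in ascending order.
perm = sortperm(data[2, :])

# Reorder the columns and the names with the same permutation.
sorted_data  = data[:, perm]
sorted_names = col_names[perm]

println(sorted_names)
println(sorted_data[2, :])

# sortslices gives the same matrix, but does not reorder the names for you.
println(sorted_data == sortslices(data; dims = 2, by = col -> col[2]))
```

The sorted matrix and the sorted names can then be handed to `DataFrame(sorted_data, sorted_names)` just before writing, which avoids mixing strings and numbers in one matrix.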

OP, please read my post earlier. I am suggesting only making a DataFrame for the purposes of writing to CSV, not for your whole analysis. I’m just trying to help you debug your problem with writing to CSV.


Hi @pdeffebach, I applied your idea of turning it into a DataFrame just before writing the CSV, like below:

Result = [permutedims(Names); K2Score; ShortestPath; score; cum_weight]
xs = Result
Result = sortslices(xs; dims=2, by=x->x[3])
Result = DataFrame(Result)
save_file = save_dialog("Save Hierarchy as...")
CSV.write(save_file, Result,  writeheader = false)

I am not getting any error, but the old problem persists.
Any other suggestions?

Hi @nilshg and @pdeffebach, I found that there is something wrong with the variable K2Score. I tried to export it separately to CSV, but not all columns are printed. I also tried converting it to a DataFrame before writing, but the problem persists. The REPL prints all columns; the problem appears only in the CSV file. I cannot understand what is going wrong. typeof(K2Score) is Array{Float64,2}.

How are you writing it to CSV? If it’s an Array, presumably you are doing something else to it beforehand, because

julia> using CSV

julia> CSV.write("test.csv", rand(10, 10))
ERROR: ArgumentError: a 'Matrix{Float64}' is not a table; see `?Tables.table` for ways to treat an AbstractMatrix as a table
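For comparison, here is a small sketch of a call that should work: wrap the matrix in `Tables.table` before passing it to `CSV.write`. The values, column names and file path are invented for illustration, and negative numbers are included on purpose so that reading the file back shows whether they survive the round trip.

```julia
using CSV, Tables

# Made-up values in the style of the K2Score / ShortestPath rows.
scores = [-280.721  -314.661  -278.807;
           389.652  1362.99   1048.33]
col_names = ["FC111", "FC1141", "FC1142"]

# Hypothetical output location in a temporary directory.
path = joinpath(mktempdir(), "scores.csv")

# Tables.table wraps the matrix so that CSV.write accepts it as a table.
CSV.write(path, Tables.table(scores; header = Symbol.(col_names)))

# Print the raw file text to see exactly what was written.
print(read(path, String))

# Read the file back and compare it with the original matrix.
roundtrip = Tables.matrix(CSV.File(path))
println(size(roundtrip))
println(roundtrip ≈ scores)
```

If the raw text printed here has every column but the spreadsheet view does not, the problem is more likely in how the file is being opened than in how it was written.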

Hi @nilshg, I figured out the problem. K2Score contains only negative Float64 values. When I changed these values to positive, the problem was solved.
This leaves me with a serious doubt: will the problem occur every time there are negative values in the calculation?
Has anyone else faced a similar problem, where a CSV containing negative values is written without all its columns printed correctly?

That is a very odd error, and likely what’s going on is slightly different.

If you can show us a little bit of K2Score we will be able to help you better understand the problem.

Please wrap it in three backticks so it is formatted as code.
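In the meantime, one way to check whether the file itself is missing columns, or whether the program you open it with is just displaying it oddly, is to count the fields on each raw line. This is only a sketch: `fields_per_line` is a hypothetical helper, the stand-in file below is made up, and it assumes a comma-delimited file with no quoted fields that contain commas.

```julia
# Count comma-separated fields on each line of a text file.
# Assumes no quoted fields with embedded commas.
fields_per_line(path) = [length(split(line, ',')) for line in eachline(path)]

# Small stand-in file; replace `path` with the CSV you actually wrote.
path = joinpath(mktempdir(), "check.csv")
write(path, "FC111,FC1141,FC1142\n-280.721,-314.661,-278.807\n389.652,1362.99,1048.33\n")

counts = fields_per_line(path)
println(counts)

# true when every line has the same number of fields
println(all(==(first(counts)), counts))
```

If every line reports the same count, the negative values were written correctly and the truncation is happening at display time.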

Hi, @pdeffebach I guess you would like to see the code of K2Score function.

function k2(DataObj,Order,u)

    # implementation of the K2 algorithm.
    # input is the DataObj which is the julia version of an Object. Contains all the information of the dataset
    # Order is the given topological sort for network
    # u is the max number of parents for a given node

    # copying the object to a new variable

    LG = DataObj;

    #number of variables in the dataset
    Dim = LG.VarNumber;

    # initial graph structure. We assume fully unconnected graph
    DAG = zeros(Dim,Dim);

    # initialize the K2Score; this is dependent on the G-function and provides the search strategy for
    # finding the network structure
    K2Score = zeros(1,Dim);

    for p = 2:Dim

        # initialize a parent vector for a given node. The while loop below means we are going to try
        # a different number of parents for the node

        parent = zeros(Dim,1);

        # Ok is a helper flag so that when we do not find a better score we break out of the loop and move on

        Ok = 1;
        P_old = -Inf; #the initiate state

        # sum of parents must be less than or equal to u since u is our max number of parents for a given node

        while Ok == 1 && sum(parent) <= u

            #initial local max
            LocalMax = -Inf;

            # our initial node
            LocalNode = 0;

            # q is going to be a node from the topological sort to adjust
            for q = (p-1):(-1):1

                if parent[Order[q]] == 0

                    # Don't forget that Order is our topological sort so when we index order we are
                    # selecting a node. Now by indexing parent, we are seeing if there is a parent for this node.

                    # greedily add a parent to this node
                    parent[Order[q]] = 1;

                    # finder finds (shocker haha) the indices of parent which correspond to the children

                    temp_parent = reshape(parent,length(parent),)

                    finder = reshape(findall(x -> x==1,temp_parent),1,length(findall(x -> x==1,temp_parent)));

                    # calculate the G-function on this neighborhood operation to see if it improves our score
                    LocalScore = GFunction(DataObj, Order[p],finder);

                    # conditional statement to say which neighborhood option to choose
                    if LocalScore > LocalMax
                        LocalMax = LocalScore;
                        LocalNode = Order[q]
                    end
                    parent[Order[q]] = 0;
                end
            end

            # update our state to the new value
            P_new = LocalMax;
            if P_new > P_old
                P_old = P_new;

                # if the score is better then update our parent to now have the child
                parent[LocalNode] = 1;
            else
                # if it is not then break out of the loop and move on
                Ok = 0;
            end
        end
        K2Score[Order[p]] = P_old;
        DAG[:,Order[p]] = parent;
    end

    # how you return values in julia
    DAG, K2Score
end

Please let me know if you need anything more.