Selecting best DAG from the family of DAGs based on AIC score

Thank you @Jeff Erlich for your suggestion to post my question on Discourse. To the Julia community here: I posted a question on Slack (linked below) and I am stuck on one more step of the same issue: https://julialang.slack.com/archives/C6821M4KE/p1623076255061000

I have a family of DAGs saved in a directory. This time I have 3 of them, named Graph_…lg (in the screenshot). I wrote some code to calculate the AIC for each of the graphs:

# open_dialog is assumed to come from Gtk.jl.
using CSV, DataFrames, LightGraphs, BayesNets, Gtk

# Load the data.
inputFile = open_dialog("Select data file from Input data for BN")
NameOfData = basename(inputFile)
data = CSV.read(inputFile, DataFrame)
# Load the graph obtained from K2.
inputGraph = open_dialog("Select Graph from Auxiliry file")
NameOfGraph = basename(inputGraph)
Graph = loadgraph(inputGraph)  # full path, so it also works outside the current directory

bn5 = fit(DiscreteBayesNet, data, Graph)
LogL = logpdf(bn5, data)
M, N = size(data)
k = N  # number of columns, used here as the parameter count
AIC = 2k - 2LogL

(I wanted to attach some of my graphs and test data, but I don’t know how to attach files on Discourse. If anyone is interested in trying it, please feel free to ask.)

Since this is a manual process, the user has to select each graph in turn to see its AIC score. I want to loop the process so that the graph with the best AIC stays and the rest are deleted automatically. Is this possible in Julia, or am I being too ambitious? Please suggest a way forward. Failing that, what can I do to at least identify the best graph in the family automatically, based on the calculated AIC? Thank you very much in advance for taking the time to reply.
Best Regards - Ashwani

I want to loop the process so that the graph with the best AIC stays and the rest are deleted automatically.

Yes, this sounds like a very straightforward thing to do, perhaps to the point where I don’t understand what the issue is, as it seems like it would be easier than what you are doing above.

Wouldn’t a loop like this be sufficient:

bestscore = 0
for file in readdir(dir_with_the_graphs)
    graph = loadgraph(file)
    score = fit_and_calc_score(graph)
    if score > bestscore
        bestscore = score
    else
        # Deletes the file on disk. Don't do this until you are sure the stuff above
        # works, unless you can easily recreate the graphs.
        rm(file)
    end
end

It sounds like you are doing some form of hyperparameter search or even neural architecture search. There are many methods and a couple of packages to do this.

I’m not sure what the general opinion about it is, but I think it is considered a bit of a rabbit hole where you’ll easily get stuck tuning and reasoning about hyper-hyper parameters for marginal benefits.

Dear @DrChainsaw, thanks for your reply. It looks like a solution, but I am getting an error. I modified your code like this:

bestscore = 0
for file in readdir("C:\\Users\\tecnico2\\Desktop\\AHP+BN\\Final Hierarchy UC-IV\\2 BN\\Auxilary Functions")
    graph = loadgraph(file)
    score = fit_and_calc_score(graph)
    if score > bestscore
        bestscore = score
    else
        # Deletes the file on disk. Don't do this until you are sure the stuff above
        # works, unless you can easily recreate the graphs.
        rm(file)
    end
end

But I got a SystemError:

SystemError: opening file ".ipynb_checkpoints": Permission denied

Stacktrace:
 [1] systemerror(::String, ::Int32; extrainfo::Nothing) at .\error.jl:168
 [2] #systemerror#48 at .\error.jl:167 [inlined]
 [3] systemerror at .\error.jl:167 [inlined]
 [4] open(::String; lock::Bool, read::Bool, write::Nothing, create::Nothing, truncate::Nothing, append::Nothing) at .\iostream.jl:284
 [5] open(::String, ::String; lock::Bool) at .\iostream.jl:346
 [6] open(::String, ::String) at .\iostream.jl:346
 [7] open(::LightGraphs.var"#117#118"{String,LGFormat}, ::String, ::Vararg{String,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at .\io.jl:323
 [8] open at .\io.jl:323 [inlined]
 [9] loadgraph at C:\Users\tecnico2\.julia\packages\LightGraphs\IgJif\src\persistence\common.jl:14 [inlined]
 [10] loadgraph at C:\Users\tecnico2\.julia\packages\LightGraphs\IgJif\src\persistence\common.jl:18 [inlined]
 [11] top-level scope at .\In[18]:3
 [12] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091

I should mention that I do not have admin access on the system I am working on. Might that be the reason for this error? Or is it trying to open the first entry in the directory?

The code I posted was just a gist and not a full solution. As a general rule, never run code which has permanent effects (such as deleting files) without making sure it does what you think it does. In this case you might have lucked out as trying to read a directory with notebook checkpoints (.ipynb_checkpoints) as a graph caused the program to crash, or else you might have wiped the whole directory.

If you can read the graphs from within Julia with your original script the access rights should not be a problem. Just make sure that whatever enters the loop actually is a graph file and not something else.

I would recommend that you put the graph files in a separate dir and only work with that.
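A minimal sketch of that check, assuming the graph files all end in .lg as in the file names mentioned in this thread. The helper name graph_files is made up for illustration. It relies on readdir(dir; join=true), which needs Julia 1.4 or later and returns full paths instead of bare file names, and it skips directories such as .ipynb_checkpoints.

```julia
# Hypothetical helper: list only regular files ending in ".lg", as full paths.
function graph_files(dir::AbstractString; ext::AbstractString = ".lg")
    paths = readdir(dir; join = true)   # join=true needs Julia 1.4 or later
    return filter(p -> isfile(p) && endswith(p, ext), paths)
end

# Self-check in a temporary directory, so no real files are touched.
demo_files = mktempdir() do dir
    touch(joinpath(dir, "Graph_1.lg"))           # a stand-in graph file
    touch(joinpath(dir, "notes.txt"))            # some unrelated file
    mkdir(joinpath(dir, ".ipynb_checkpoints"))   # the directory that broke the loop
    basename.(graph_files(dir))
end
@show demo_files
```

Only what graph_files returns would then be passed to loadgraph, so nothing else in the directory can enter the loop.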

The problem is that when I save the graphs obtained in each iteration using the savegraph() function, it always saves them in the current directory. I tried to use the method suggested at Reading / Writing Graphs · LightGraphs, using the function savegraph(file, g, d, format=LGFormat) like savegraph("C:\\Users\\tecnico2\\Desktop\\Ready\\Final Hierarchy UC-IV\\BN Result\\Graphs", "Graph_$BS.lg", Graph); but I always get a MethodError.

MethodError: no method matching savegraph(::String, ::String, ::SimpleDiGraph{Int64})
Closest candidates are:
  savegraph(::AbstractString, ::LightGraphs.SimpleGraphs.AbstractSimpleGraph; compress) at C:\Users\tecnico2\.julia\packages\LightGraphs\IgJif\src\persistence\common.jl:95
  savegraph(::AbstractString, ::AbstractMetaGraph) at C:\Users\tecnico2\.julia\packages\MetaGraphs\NpVqv\src\persistence.jl:17
  savegraph(::AbstractString, ::AbstractGraph, ::AbstractString, ::LightGraphs.AbstractGraphFormat; compress) at C:\Users\tecnico2\.julia\packages\LightGraphs\IgJif\src\persistence\common.jl:70
  ...

Stacktrace:
 [1] top-level scope at In[12]:2
 [2] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091
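The closest candidates in the error message all take a single file path as the first argument, so one possible way around the MethodError is to combine the directory and the file name with joinpath before calling savegraph. The sketch below is only an illustration: outdir, the value of BS, and the three-node graph are placeholders, not the real directory or K2 output.

```julia
using LightGraphs

# Placeholders for illustration: a temporary output directory, a counter, a tiny DAG.
outdir = mktempdir()
BS = 1
g = SimpleDiGraph(3)
add_edge!(g, 1, 2)
add_edge!(g, 2, 3)

# savegraph expects one path, so build "directory + file name" first.
path = joinpath(outdir, "Graph_$BS.lg")
savegraph(path, g)

# Load with the same full path; a bare file name is resolved against pwd().
g2 = loadgraph(path)
@show isfile(path) ne(g2)
```

With the real directory, the same pattern would read savegraph(joinpath(dir, "Graph_$BS.lg"), Graph).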

I am still getting SystemError: opening file "Graph1.lg": No such file or directory. I am almost there. I have created a directory exclusively for the graphs; I guess the folder is being opened but not the file. Any suggestions? @DrChainsaw @pdeffebach @cormullion @nilshg

You should add some @show or @info lines so you can see what the values of each thing are as you run them.
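As a rough illustration of that kind of check, assuming the loop iterates over readdir(dir) without join=true: readdir then returns bare file names, which are looked up relative to the current working directory rather than dir. The temporary directory and the file name Graph1.lg below are stand-ins.

```julia
# Stand-in for the graph directory, with one empty placeholder file.
dir = mktempdir()
touch(joinpath(dir, "Graph1.lg"))

@show pwd()   # where a bare file name would be looked up
for file in readdir(dir)
    fullpath = joinpath(dir, file)
    # `file` is only the name; `fullpath` is what loadgraph would need.
    @show file fullpath isfile(fullpath)
end
```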

Your problems (at this point) are pretty basic IO / looping stuff, unrelated to the stats question of how to evaluate graphs.

Deleting the graphs as you go is very strange. Instead, you should build a DataFrame (which you can then save as CSV) with the name of each graph and the AIC of that graph (and the BIC, cross-validated likelihood, etc.). That will allow you to compare the graphs.
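A sketch of what such a table could look like, under stated assumptions: score_table is a hypothetical helper, and aic_of stands for any function that maps a graph file path to its AIC, for example one wrapping the loadgraph, fit and logpdf steps from the first post. The scores in fake_aic are made-up placeholders so the example runs on its own; they are not real results.

```julia
using DataFrames, CSV

# Hypothetical helper: one row per graph file, sorted so the lowest AIC comes first.
function score_table(paths, aic_of)
    df = DataFrame(graph = basename.(paths), AIC = aic_of.(paths))
    return sort!(df, :AIC)   # lower AIC is better
end

# Made-up placeholder scores, for illustration only.
fake_aic = Dict("Graph_1.lg" => 120.5, "Graph_2.lg" => 98.2, "Graph_3.lg" => 143.0)
paths = joinpath.("graphs", collect(keys(fake_aic)))
table = score_table(paths, p -> fake_aic[basename(p)])

# Save the comparison; a temporary directory is used here instead of a real one.
CSV.write(joinpath(mktempdir(), "scores.csv"), table)
best = first(table.graph)
@show best
```

Keeping every graph on disk and choosing from the table means a mistake in the scoring code cannot destroy any files.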
