Dictionary of dictionaries as decision tree

Hi!

I am trying to build a decision tree using my implementation of ID3, but I am having some problems with one of my functions.

For example, I have the following data frame:

julia> df
14×6 DataFrame
 Row │ pa       as       ic       aa       oa       af      
     │ String7  String7  String7  String3  String3  String3 
─────┼──────────────────────────────────────────────────────
   1 │ alta     alto     alto     no       no       si
   2 │ alta     alto     alto     si       no       si
   3 │ baja     alto     bajo     no       no       si
   4 │ media    alto     alto     no       si       no
   5 │ media    bajo     alto     si       si       no
   6 │ baja     bajo     alto     si       si       si
   7 │ alta     bajo     alto     si       no       si
   8 │ alta     bajo     bajo     no       si       si
   9 │ alta     alto     bajo     si       si       no
  10 │ baja     bajo     alto     si       si       si
  11 │ media    bajo     bajo     si       si       si
  12 │ alta     bajo     alto     si       si       no
  13 │ baja     alto     alto     si       si       si
  14 │ baja     alto     bajo     no       no       si

The class (target) column is af. In my code I wrote a function that “creates” a tree; it is not done properly, but it gets the job done for my purposes.
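ID3 chooses each split by information gain. As a rough check of why pa ends up at the root, here is a minimal plain-Julia sketch that computes the gain of every attribute on the table above. No packages are needed; the columns are copied by hand, and entropy and info_gain are hypothetical helpers, not functions from DecisionTrees.jl:

```julia
# Minimal sketch of ID3's split criterion (information gain) in plain Julia.
# Columns copied by hand from the table; `entropy` and `info_gain` are
# hypothetical helpers, not part of DecisionTrees.jl.
cols = Dict(
    "pa" => ["alta","alta","baja","media","media","baja","alta","alta","alta","baja","media","alta","baja","baja"],
    "as" => ["alto","alto","alto","alto","bajo","bajo","bajo","bajo","alto","bajo","bajo","bajo","alto","alto"],
    "ic" => ["alto","alto","bajo","alto","alto","alto","alto","bajo","bajo","alto","bajo","alto","alto","bajo"],
    "aa" => ["no","si","no","no","si","si","si","no","si","si","si","si","si","no"],
    "oa" => ["no","no","no","si","si","si","no","si","si","si","si","si","si","no"],
)
af = ["si","si","si","no","no","si","si","si","no","si","si","no","si","si"]

# Shannon entropy (in bits) of a vector of class labels.
function entropy(labels)
    n = length(labels)
    counts = Dict{eltype(labels),Int}()
    for l in labels
        counts[l] = get(counts, l, 0) + 1
    end
    return -sum((c / n) * log2(c / n) for c in values(counts))
end

# Entropy of `labels` minus the weighted entropy after splitting on `attr`.
function info_gain(attr, labels)
    n = length(labels)
    remainder = 0.0
    for v in unique(attr)
        mask = attr .== v                       # rows where the attribute equals v
        remainder += count(mask) / n * entropy(labels[mask])
    end
    return entropy(labels) - remainder
end

gains = Dict(name => info_gain(col, af) for (name, col) in cols)
# ID3 splits on the attribute with the largest gain.
root = sort(collect(keys(gains)), by = k -> gains[k], rev = true)[1]
```

On this table pa should come out on top (roughly 0.27 bits, against roughly 0.23 for oa and 0 for as), which is consistent with the root of the tree shown below.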

First, I load the database file:

julia> df = dt.read_database("data/administar_farmaco.csv"; dropcols=[:n])

Then I can just generate the tree as:

julia> tree = dt.tree("af", df)
4-element Vector{Any}:
 "pa"
 Any[InlineStrings.String7("alta"), "oa", Any[InlineStrings.String3("no"), InlineStrings.String3["si"]], Any[InlineStrings.String3("si"), "aa", Any[InlineStrings.String3("no"), InlineStrings.String3["si"]], Any[InlineStrings.String3("si"), InlineStrings.String3["no"]]]]
 Any[InlineStrings.String7("baja"), InlineStrings.String3["si"]]
 Any[InlineStrings.String7("media"), "ic", Any[InlineStrings.String7("alto"), InlineStrings.String3["no"]], Any[InlineStrings.String7("bajo"), InlineStrings.String3["si"]]]

I also wrote a function that takes that output, tree, and prints it as an indented outline:

julia> dt.preetyprint(tree)
pa
alta
	oa
	no	si
	si
		aa
		no	si
		si	no
baja	si
media
	ic
	alto	no
	bajo	si

This is easier to understand if I format it a little by hand:

pa
 |--- alta
       |--- oa
            |--- no --- si
            |--- si
                 |--- aa
                      |--- no --- si
                      |--- si --- no
 |--- baja --- si
 |--- media
       |--- ic
            |--- alto --- no
            |--- bajo --- si

It’s not pretty, haha, but it lets me visualize the decision tree.

The problem is that I wanted to build a dictionary of dictionaries in the first place, where each entry is itself another dictionary.

I found something similar to what I want to achieve in this Kaggle notebook written in Python.

The author of that notebook managed to make each key of each dictionary a node of a tree like the one I showed above. The image attached below shows what I mean.

[image]
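For comparison, this is roughly what the desired dictionary of dictionaries could look like in Julia for the tree above, hand-written as a literal. The name tree_dict_target is hypothetical, and the nesting attribute => value => subtree is an assumption modelled on the description of the notebook:

```julia
# Hypothetical target structure: the tree above written as nested Dicts.
# An internal node maps an attribute name to a Dict of value => subtree;
# a leaf is a plain class label ("si" / "no").
tree_dict_target = Dict(
    "pa" => Dict(
        "alta" => Dict("oa" => Dict(
            "no" => "si",
            "si" => Dict("aa" => Dict("no" => "si", "si" => "no")),
        )),
        "baja" => "si",
        "media" => Dict("ic" => Dict("alto" => "no", "bajo" => "si")),
    ),
)

# Following one path of keys reaches a class label:
tree_dict_target["pa"]["alta"]["oa"]["si"]["aa"]["si"]  # returns "no"
```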

Here is my code attached: DecisionTrees.jl

Currently, I have a function that attempts to create a dictionary of dictionaries, but… well, I couldn’t figure it out. I called that function tree_dict for lack of a better name.

Can anyone point me in the right direction on how I could implement this with my current code?

Just skimming your code, it seems that preetyprint already has the structure you could use, except that it prints instead of creating a nested data structure.
Start from there: adapt the base case to return the values you want at the leaves, and collect the results of the recursive calls in the for-loop into a dictionary (I would use a comprehension for that).
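A minimal sketch of that advice, assuming the nested-Vector format that dt.tree returns above, i.e. [attribute, [value, subtree...], ...] with leaf branches like [value, ["si"]]. The names isleaf and to_dict are hypothetical, not part of DecisionTrees.jl:

```julia
# Same recursion as a printer, but the base case returns the leaf label and
# the loop over branches becomes a Dict comprehension.
# A branch is a leaf when its last element is a vector of strings; the
# AbstractString bound also accepts InlineStrings such as String3.
isleaf(branch) = last(branch) isa AbstractVector{<:AbstractString}

function to_dict(node)
    attr = node[1]                                   # attribute tested here
    children = Dict(
        # leaf: unwrap the single label; otherwise recurse on the subtree
        first(b) => (isleaf(b) ? only(last(b)) : to_dict(b[2:end]))
        for b in node[2:end]                         # one entry per value
    )
    return Dict(attr => children)
end
```

Applied to the tree literal in the next reply, to_dict(tree) should give the same nested structure as recdic. Note that only assumes exactly one label per leaf.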

Try this:

tree = [
 "pa",
 Any["alta", "oa", Any["no", ["si"]], Any["si", "aa", Any["no", ["si"]], Any["si", ["no"]]]],
 Any["baja", ["si"]],
 Any["media", "ic", Any["alto", ["no"]], Any["bajo", ["si"]]]]


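# the leaf test AbstractVector{<:AbstractString} also matches InlineStrings leaves (e.g. String3["si"]) as returned by dt.tree
recdic(tree)=Dict(tree[1]=>Dict([first(vv)=>(last(vv) isa AbstractVector{<:AbstractString} ? only(last(vv)) : recdic(vv[2:end])) for vv in tree[2:end]]))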
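Once the tree is a nested Dict, following a path of keys is enough to classify an example. A minimal sketch, assuming the Dict(attribute => Dict(value => subtree)) shape that recdic produces; classify is a hypothetical helper, and row is assumed to map attribute names to values:

```julia
# Walk the nested Dict until a leaf label is reached.
function classify(node, row)
    node isa AbstractDict || return node     # a non-Dict node is a class label
    attr, branches = only(node)              # each node holds one attribute key
    return classify(branches[row[attr]], row)
end

# Example with a literal for the "media" subtree only:
media = Dict("ic" => Dict("alto" => "no", "bajo" => "si"))
classify(media, Dict("ic" => "bajo"))  # returns "si"
```

A value never seen during training (a missing key) throws a KeyError here; handling that case is left out of the sketch.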


julia> walk3(tree, lev=0) = join(['\n'*"     "^lev*"|.."*tree[1],join(['\n'*"     "^(lev+1)*"|.."*first(vv)*(last(vv) isa AbstractVector{<:AbstractString} ? "---"*only(last(vv)) : walk3(vv[2:end],lev+2)) for vv in tree[2:end]])])
walk3 (generic function with 2 methods)

julia> println(walk3(tree))

|..pa
     |..alta
          |..oa
               |..no---si
               |..si
                    |..aa
                         |..no---si
                         |..si---no
     |..baja---si
     |..media
          |..ic
               |..alto---no
               |..bajo---si
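If the nested Dict becomes the main structure, it can be printed in the same layout. A sketch, assuming the shape produced by recdic; walkdict is a hypothetical name, and keys are sorted because Dict iteration order is not guaranteed:

```julia
# Print a nested Dict(attribute => Dict(value => subtree)) in walk3's layout.
function walkdict(io::IO, node, lev = 0)
    attr, branches = only(node)                    # one attribute per node
    print(io, '\n', "     "^lev, "|..", attr)
    for value in sort(collect(keys(branches)))     # deterministic order
        sub = branches[value]
        print(io, '\n', "     "^(lev + 1), "|..", value)
        if sub isa AbstractDict
            walkdict(io, sub, lev + 2)             # subtree: recurse deeper
        else
            print(io, "---", sub)                  # leaf: append the label
        end
    end
end

# Convenience method returning the outline as a String.
walkdict(node) = sprint(walkdict, node)
```

With the Dict built from the tree above, println(walkdict(recdic(tree))) should print the same outline as walk3, because the values in that tree are already in alphabetical order.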