PureFun.jl: implementations from Okasaki's *Purely Functional Data Structures*

Hello, I’ve been implementing the data structures presented in Chris Okasaki’s book Purely Functional Data Structures in Julia, and wanted to share what I’ve got with you: PureFun.jl. It’s not quite every structure presented in the book, but quite a few of them. There are also some rudimentary benchmarks here. Beyond being a (hopefully) fun introduction to to a (definitely) fun book, there’s a couple of features I want to highlight:

  • “Chunky lists”, which take a normal list type plus a “chunk” type that stores data contiguously, and use them to assemble a “chunky” list that has the same interface as other lists but internally stores data in cache-friendly contiguous chunks. There are chunk types available backed by both StaticArrays.SVector and Base.Vector, the list benchmarks show some of the resulting performance improvements.

  • Growing support for multi-threading via the juliafolds packages, here are a couple of examples

  • Composable dictionary types: In the same way that chunky lists build general purpose lists from small fast lists, the @trie macro builds dictionaries over complex keys from component dictionaries defined over simple keys. PureFun.jl defines small and fast dictionary types (in the Contiguous module), allowing you to build up dictionary types with customized performance characteristics. This tutorial walks through a handful of examples in the process of assembling a hash array mapped trie

Anyways, hope you’ll check it out and find it as fun as I have. I’m new to the community, and relatively new to Julia (we mainly use R and Python in my day job), and undertook this project as a learning exercise. Very happy for feedback or tips. Thank you.

42 Likes

I love the performance graphs! The results are more “textbook” than I would have expected—I would have expected noisier results but you get nice flat lines for constant time operations and mostly linear slopes for linear operations with some cache-size-related artifacts at the small sizes. They clearly demonstrate what operations standard vectors excel at (fast indexing), and what operations these purely functional data structures are better at (head insertion, etc.). Nice work!

10 Likes

Also, on the workload of the case of the linked list and random access lists, it might be worth mentioning that Okasaki’s random access linked list has easy support for a very interesting kind of memoization trick, for the case where you memoize a fold from the bottom of the stack to the top.

If the fold memoizes the nodes of the linked list portion of the list (i.e. not the btree nodes) you need only ~log_2(N) space for the memoization (assuming old versions of the list get GCed and that your memoizer holds weak references), but the memoization still makes append/drop-and-recompute-fold an O(N) operation instead of O(N^2) for appending N elements and then dropping N elements.

It isn’t perfect because any log(N) space approach to memoize a stack fold will only have amortized O(1) instead of O(1) complexity to recompute the reduce, which can be nonideal for cases where you are using the persistent stack to do backtracking, but it can still be very useful in many applications. Finger trees also support a deque version of this where you memoize reduces for the spine nodes, which is incredibly useful for window functions.

2 Likes

If you do not mind. Could you please advise me how to install this Package?

julia> 
(PureFun) pkg> add PureFun
[ Info: Use `./PureFun` to add or develop the local directory at `~/juliascript/PureFun`.
    Updating registry at `~/.julia/registries/General`
    Updating git-repo `https://github.com/JuliaRegistries/General.git`
ERROR: The following package names could not be resolved:
 * PureFun (not found in project, manifest or registry)
   Suggestions: PressureFieldContact

(PureFun) pkg> 

@StevenSiew I haven’t registered the package yet, so currently you have to use the full github url to add it:

julia>
(test-purefun) pkg> add https://github.com/tarakc02/PureFun.jl

Thank you , done that but now I got another problem

# add https://github.com/tarakc02/PureFun.jl
using PureFun
using PureFun.Lazy: Stream, @stream

println("Start of program")

positive_integers = Stream(Iterators.countfrom(1))

println("length of positive_integers is ",length(positive_integers))

with the result

Starting Julia...
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.5 (2023-01-08)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
 project at `~/juliascript/PureFun`
[ Info: Precompiling PureFun [5c24d6e1-cb81-4e79-8836-67b5b43306d6]
ERROR: LoadError: UndefVarError: childrentype not defined
Stacktrace:
 [1] getproperty(x::Module, f::Symbol)
   @ Base ./Base.jl:31
 [2] top-level scope
   @ ~/.julia/packages/PureFun/ZzlJA/src/lists-streams/skew-binary-ral.jl:303
 [3] include(mod::Module, _path::String)
   @ Base ./Base.jl:419
 [4] include(x::String)
   @ PureFun ~/.julia/packages/PureFun/ZzlJA/src/PureFun.jl:1
 [5] top-level scope
   @ ~/.julia/packages/PureFun/ZzlJA/src/PureFun.jl:7
 [6] include
   @ ./Base.jl:419 [inlined]
 [7] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::String)
   @ Base ./loading.jl:1554
 [8] top-level scope
   @ stdin:1
in expression starting at /Users/ssiew/.julia/packages/PureFun/ZzlJA/src/lists-streams/skew-binary-ral.jl:1
in expression starting at /Users/ssiew/.julia/packages/PureFun/ZzlJA/src/PureFun.jl:1
in expression starting at stdin:1
ERROR: LoadError: Failed to precompile PureFun [5c24d6e1-cb81-4e79-8836-67b5b43306d6] to /Users/ssiew/.julia/compiled/v1.8/PureFun/jl_Zqc85A.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool)
    @ Base ./loading.jl:1707
  [3] compilecache
    @ ./loading.jl:1651 [inlined]
  [4] _require(pkg::Base.PkgId)
    @ Base ./loading.jl:1337
  [5] _require_prelocked(uuidkey::Base.PkgId)
    @ Base ./loading.jl:1200
  [6] macro expansion
    @ ./loading.jl:1180 [inlined]
  [7] macro expansion
    @ ./lock.jl:223 [inlined]
  [8] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:1144
  [9] eval
    @ ./boot.jl:368 [inlined]
 [10] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1428
in expression starting at /Users/ssiew/juliascript/PureFun/lazy01.jl:3
julia> 

I wanted to get this working so that I can build my Hilbert Hotel

Hmmn, not sure why you’re getting a precompile error. What OS are you using? I tried installing the package from github in a fresh environment on MacOS and Ubuntu, and was able to use it without seeing that error. childrentype is exported by the AbstractTrees package, which is a listed dependency of PureFun.jl . . … If you add AbstractTrees beforehand does that fix it? All of that said, once you do get PureFun compiled, your script will result in an infiinite loop and won’t ever return. I should probably fix that to throw an error instead.

julia> 
(PureFun) pkg> st
Status `~/juliascript/PureFun/Project.toml`
  [5c24d6e1] PureFun v0.0.1 `https://github.com/tarakc02/PureFun.jl#main`

(PureFun) pkg> add AbstractTrees
    Updating registry at `~/.julia/registries/General`
    Updating git-repo `https://github.com/JuliaRegistries/General.git`
   Resolving package versions...
    Updating `~/juliascript/PureFun/Project.toml`
  [1520ce14] + AbstractTrees v0.4.4
  No Changes to `~/juliascript/PureFun/Manifest.toml`

Please try this

Julia has exited.
Press Enter to start a new session.
Starting Julia...
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _  |  |
  | | |_| | | | (_| |  |  Version 1.8.5 (2023-01-08)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
 project at `~/juliascript/PureFun`
julia> 
julia> using AbstractTrees

julia> AbstractTrees.childrentype(::Type{<:Digit{T}}) where T = Tuple{ Tree{T},Tree{T} }
ERROR: UndefVarError: childrentype not defined
Stacktrace:
 [1] getproperty(x::Module, f::Symbol)
   @ Base ./Base.jl:31
 [2] top-level scope
   @ none:1


julia> import AbstractTrees: childrentype
WARNING: could not import AbstractTrees.childrentype into Main

julia> using AbstractTrees: childrentype
ERROR: UndefVarError: childrentype not defined

It appears at

$ cat /Users/ssiew/.julia/packages/PureFun/ZzlJA/src/lists-streams/skew-binary-ral.jl | grep childrentype
AbstractTrees.childrentype(::Type{<:Digit{T}}) where T = Tuple{ Tree{T},Tree{T} }
AbstractTrees.childrentype(::Type{<:Node{T}}) where T = Tuple{ Tree{T},Tree{T} }
AbstractTrees.childrentype(::Type{<:Leaf}) = Tuple{}