Reading/Writing struct instance to file as binary

Hi everyone, I am looking into possible solutions for persisting struct instances on disk as binary data. I’m using credentials data as an MVP example.

The following code works as I expect. I create a struct with some fields and instantiate a couple of instances of the struct with some faux content. I write the struct instances to disk wrapped in a Ref. I can then subsequently open the file and each iteration of read!(<IOStream>, Ref(blueprint)) yields a unique entry where blueprint is just a form of the Struct that matches the size/structure of each entry.

struct SimpleCred
    username::String
    password::String
end

creds = [
    SimpleCred("someusername", "somepassword"), 
    SimpleCred("someusername", "adiffpassword")]

open("user1.bin", "w") do io
    for cred in creds
        write(io, Ref(cred))
    end
end

blueprint = SimpleCred("", "")
io = open("user1.bin", "r")

@show cred = read!(io, Ref(blueprint))[]
#> cred = (read!(io, Ref(blueprint)))[] = SimpleCred("someusername", "somepassword")
@show cred = read!(io, Ref(blueprint))[]
#> cred = (read!(io, Ref(blueprint)))[] = SimpleCred("someusername", "adiffpassword")

When I try to play the same game with a more complicated struct, problems ensue. The thing which is confusing me is that the first read returns a properly parsed entry from the bin file, but subsequent reads return what appears to be uncollected garbage. I am not sure if the issue is related to the Ref pointer or if there is maybe some sort of dynamic size component to some of the types I’m using in the more complex struct which is throwing off the bit unpacking? Any help would be greatly appreciated!

For the following trouble example, I included a couple of the content generator functions I’m using in case that helps shed any light on what I might be doing wrong.

using Dates

# Type Aliases
Source = String
Word = String
Words = Array{String, 1}

# Objects
struct TwoFAWords
    source::Source
    words::Words
end

struct UserCredential
    source::Source
    username::Word
    password::Word
    lastupdated::DateTime
    active::Bool
    backupwords::TwoFAWords
end

# Content Generating Helpers
const alphanumerics = reduce(
    (acc, iter) -> acc = vcat(acc, collect(iter)
        ), ['a':'z', 'A':'Z', '0':'9'], init=[]);

const words = readlines(joinpath("/usr/share/dict", "words"));

generaterandomstr(;length=2^6) = [ rand(alphanumerics) for _ in 1:length ] |> join;
generaterandomwords(;count=1) = map(lowercase, [ rand(words) for _ in 1:count ]);
genrandbool() = rand(Bool)

function generateusercredential()
    source = generaterandomstr(length=2^4)
    username = generaterandomstr(length=2^3)
    password = generaterandomstr(length=2^5)
    lastupdated = Dates.now()
    active = genrandbool()
    words = TwoFAWords(source, generaterandomwords(count=10))
    
    UserCredential(source, username, password, lastupdated, active, words)
end

# Create some faux-credential objects
credentials = [ generateusercredential() for _ in 1:5 ]

# Write Credential objects to `user.bin`
open("user.bin", "w") do io
    for credential in credentials
        write(io, Ref(credential))
    end
end

blueprint = UserCredential("", "", "", Dates.now(), true, TwoFAWords("", String[]));
io = open("user.bin", "r")

@show credential = read!(io, Ref(blueprint))[]
#> credential = (read!(io, Ref(blueprint)))[] = UserCredential("gbLzNXMfmDwZEKo0", "93gAQQ4g", "eF6K2bpABIM3HSBrnmc5f5CO8UIDPBuV", DateTime("2021-06-29T22:46:28.412"), true, TwoFAWords("gbLzNXMfmDwZEKo0", ["unsymbolical", "unprejudicialness", "semiform", "untirability", "pyrotic", "niota", "carbonification", "atophan", "nunatak", "numud"]))

@show credential = read!(io, Ref(blueprint))[]
#> credential = (read!(io, Ref(blueprint)))[] = UserCredential((Core.MethodMatch(Tuple{typeof(isequal), Vector{String}, Missing}, svec(), isequal(::Any, ::Missing) in Base at missing.jl:82, false), 2), (Core.Const(nothing), MethodInstance for print(::IOContext{IOBuffer}, ::String)), svec(Union{Nothing, NamedTuple{names, T} where T<:Tuple}, Any, #undef, Union{}, Any, #undef), DateTime("2021-06-29T22:46:28.412"), true, TwoFAWords((Core.MethodMatch(Tuple{typeof(isequal), Vector{String}, Missing}, svec(), isequal(::Any, ::Missing) in Base at missing.jl:82, false), 2), Any[Core.Compiler.VarState(Core.Const(Base.ht_keyindex), false), Core.Compiler.VarState(Core.Const(Dict{Union{Int64, Symbol}, String}(56 => "\e[38;5;56m", 35 => "\e[38;5;35m", 60 => "\e[38;5;60m", 220 => "\e[38;5;220m", :blink => "\e[5m", 67 => "\e[38;5;67m", 215 => "\e[38;5;215m", 73 => "\e[38;5;73m", 251 => "\e[38;5;251m", 115 => "\e[38;5;115m", 112 => "\e[38;5;112m", 185 => "\e[38;5;185m", 86 => "\e[38;5;86m", 168 => "\e[38;5;168m", 207 => "\e[38;5;207m", 242 => "\e[38;5;242m", 183 => "\e[38;5;183m", 224 => "\e[38;5;224m", 177 => "\e[38;5;177m", 12 => "\e[38;5;12m", 75 => "\e[38;5;75m", 23 => "\e[38;5;23m", 111 => "\e[38;5;111m", 41 => "\e[38;5;41m", 68 => "\e[38;5;68m", 82 => "\e[38;5;82m", 130 => "\e[38;5;130m", 125 => "\e[38;5;125m", 77 => "\e[38;5;77m", 172 => "\e[38;5;172m", 71 => "\e[38;5;71m", 66 => "\e[38;5;66m", 103 => "\e[38;5;103m", 59 => "\e[38;5;59m", 208 => "\e[38;5;208m", 26 => "\e[38;5;26m", 211 => "\e[38;5;211m", 127 => "\e[38;5;127m", 116 => "\e[38;5;116m", 100 => "\e[38;5;100m", 79 => "\e[38;5;79m", 230 => "\e[38;5;230m", 195 => "\e[38;5;195m", :white => "\e[37m", :light_cyan => "\e[96m", 141 => "\e[38;5;141m", 135 => "\e[38;5;135m", 138 => "\e[38;5;138m", 222 => "\e[38;5;222m", 107 => "\e[38;5;107m", 46 => "\e[38;5;46m", 57 => "\e[38;5;57m", 152 => "\e[38;5;152m", 247 => "\e[38;5;247m", 170 => "\e[38;5;170m", 129 => "\e[38;5;129m", 238 => "\e[38;5;238m", 250 => "\e[38;5;250m", 78 => 
"\e[38;5;78m", 133 => "\e[38;5;133m", 72 => "\e[38;5;72m", 184 => "\e[38;5;184m", 252 => "\e[38;5;252m", 1 => "\e[38;5;1m", 137 => "\e[38;5;137m", 22 => "\e[38;5;22m", 154 => "\e[38;5;154m", 237 => "\e[38;5;237m", 206 => "\e[38;5;206m", :light_red => "\e[91m", 33 => "\e[38;5;33m", 40 => "\e[38;5;40m", 113 => "\e[38;5;113m", 231 => "\e[38;5;231m", 245 => "\e[38;5;245m", 254 => "\e[38;5;254m", 165 => "\e[38;5;165m", 142 => "\e[38;5;142m", 5 => "\e[38;5;5m", 55 => "\e[38;5;55m", 114 => "\e[38;5;114m", :blue => "\e[34m", 136 => "\e[38;5;136m", 117 => "\e[38;5;117m", 45 => "\e[38;5;45m", 145 => "\e[38;5;145m", :cyan => "\e[36m", :magenta => "\e[35m", :black => "\e[30m", 158 => "\e[38;5;158m", 218 => "\e[38;5;218m", 176 => "\e[38;5;176m", 28 => "\e[38;5;28m", 148 => "\e[38;5;148m", 92 => "\e[38;5;92m", 36 => "\e[38;5;36m", :light_magenta => "\e[95m", 118 => "\e[38;5;118m", 162 => "\e[38;5;162m", 84 => "\e[38;5;84m", 7 => "\e[38;5;7m", 25 => "\e[38;5;25m", 95 => "\e[38;5;95m", 203 => "\e[38;5;203m", 232 => "\e[38;5;232m", 93 => "\e[38;5;93m", 18 => "\e[38;5;18m", 240 => "\e[38;5;240m", 147 => "\e[38;5;147m", 157 => "\e[38;5;157m", :default => "\e[39m", 16 => "\e[38;5;16m", 19 => "\e[38;5;19m", 44 => "\e[38;5;44m", 31 => "\e[38;5;31m", 217 => "\e[38;5;217m", 146 => "\e[38;5;146m", 74 => "\e[38;5;74m", :light_yellow => "\e[93m", 61 => "\e[38;5;61m", 29 => "\e[38;5;29m", 212 => "\e[38;5;212m", 228 => "\e[38;5;228m", 159 => "\e[38;5;159m", 193 => "\e[38;5;193m", 226 => "\e[38;5;226m", 101 => "\e[38;5;101m", 105 => "\e[38;5;105m", 223 => "\e[38;5;223m", 17 => "\e[38;5;17m", 166 => "\e[38;5;166m", 89 => "\e[38;5;89m", 198 => "\e[38;5;198m", 214 => "\e[38;5;214m", 80 => "\e[38;5;80m", 51 => "\e[38;5;51m", :nothing => "", 246 => "\e[38;5;246m", 143 => "\e[38;5;143m", 48 => "\e[38;5;48m", 15 => "\e[38;5;15m", 97 => "\e[38;5;97m", 134 => "\e[38;5;134m", 110 => "\e[38;5;110m", 30 => "\e[38;5;30m", :hidden => "\e[8m", 6 => "\e[38;5;6m", 234 => "\e[38;5;234m", 219 => "\e[38;5;219m", 
182 => "\e[38;5;182m", 164 => "\e[38;5;164m", 153 => "\e[38;5;153m", 186 => "\e[38;5;186m", 253 => "\e[38;5;253m", 64 => "\e[38;5;64m", 90 => "\e[38;5;90m", 139 => "\e[38;5;139m", 4 => "\e[38;5;4m", 13 => "\e[38;5;13m", :red => "\e[31m", 104 => "\e[38;5;104m", 52 => "\e[38;5;52m", 179 => "\e[38;5;179m", 43 => "\e[38;5;43m", 11 => "\e[38;5;11m", 69 => "\e[38;5;69m", 171 => "\e[38;5;171m", 85 => "\e[38;5;85m", 119 => "\e[38;5;119m", 39 => "\e[38;5;39m", 216 => "\e[38;5;216m", 126 => "\e[38;5;126m", 108 => "\e[38;5;108m", 156 => "\e[38;5;156m", 2 => "\e[38;5;2m", 10 => "\e[38;5;10m", 27 => "\e[38;5;27m", 124 => "\e[38;5;124m", 144 => "\e[38;5;144m", 200 => "\e[38;5;200m", 20 => "\e[38;5;20m", 81 => "\e[38;5;81m", 187 => "\e[38;5;187m", 0 => "\e[38;5;0m", 213 => "\e[38;5;213m", 9 => "\e[38;5;9m", 189 => "\e[38;5;189m", 227 => "\e[38;5;227m", 109 => "\e[38;5;109m", 161 => "\e[38;5;161m", 249 => "\e[38;5;249m", 241 => "\e[38;5;241m", 88 => "\e[38;5;88m", 209 => "\e[38;5;209m", 236 => "\e[38;5;236m", 120 => "\e[38;5;120m", 24 => "\e[38;5;24m", 8 => "\e[38;5;8m", 37 => "\e[38;5;37m", 83 => "\e[38;5;83m", 190 => "\e[38;5;190m", 201 => "\e[38;5;201m", 99 => "\e[38;5;99m", 121 => "\e[38;5;121m", :light_black => "\e[90m", 14 => "\e[38;5;14m", 174 => "\e[38;5;174m", 123 => "\e[38;5;123m", 32 => "\e[38;5;32m", 197 => "\e[38;5;197m", 233 => "\e[38;5;233m", 196 => "\e[38;5;196m", :light_blue => "\e[94m", 210 => "\e[38;5;210m", 151 => "\e[38;5;151m", 239 => "\e[38;5;239m", :normal => "\e[0m", 54 => "\e[38;5;54m", 63 => "\e[38;5;63m", 191 => "\e[38;5;191m", 91 => "\e[38;5;91m", 62 => "\e[38;5;62m", 205 => "\e[38;5;205m", 244 => "\e[38;5;244m", :light_green => "\e[92m", 150 => "\e[38;5;150m", 122 => "\e[38;5;122m", 58 => "\e[38;5;58m", 199 => "\e[38;5;199m", :green => "\e[32m", 173 => "\e[38;5;173m", 188 => "\e[38;5;188m", 98 => "\e[38;5;98m", 235 => "\e[38;5;235m", 204 => "\e[38;5;204m", 76 => "\e[38;5;76m", 34 => "\e[38;5;34m", 50 => "\e[38;5;50m", 243 => "\e[38;5;243m", 194 => 
"\e[38;5;194m", 167 => "\e[38;5;167m", 42 => "\e[38;5;42m", 87 => "\e[38;5;87m", 132 => "\e[38;5;132m", 140 => "\e[38;5;140m", 202 => "\e[38;5;202m", 248 => "\e[38;5;248m", 169 => "\e[38;5;169m", 180 => "\e[38;5;180m", 255 => "\e[38;5;255m", 160 => "\e[38;5;160m", 49 => "\e[38;5;49m", 106 => "\e[38;5;106m", :bold => "\e[1m", 94 => "\e[38;5;94m", 225 => "\e[38;5;225m", 102 => "\e[38;5;102m", 128 => "\e[38;5;128m", 70 => "\e[38;5;70m", 21 => "\e[38;5;21m", 229 => "\e[38;5;229m", 38 => "\e[38;5;38m", 163 => "\e[38;5;163m", 131 => "\e[38;5;131m", 192 => "\e[38;5;192m", 221 => "\e[38;5;221m", 53 => "\e[38;5;53m", 47 => "\e[38;5;47m", 175 => "\e[38;5;175m", 3 => "\e[38;5;3m", 178 => "\e[38;5;178m", 96 => "\e[38;5;96m", 149 => "\e[38;5;149m", 155 => "\e[38;5;155m", :yellow => "\e[33m", 181 => "\e[38;5;181m", 65 => "\e[38;5;65m", :reverse => "\e[7m", :underline => "\e[4m")), false), Core.Compiler.VarState(Core.Const(:bold), false), Core.Compiler.VarState(Union{}, true), Core.Compiler.VarState(Union{}, true), Core.Compiler.VarState(Union{}, true), Core.Compiler.VarState(Union{}, true), Core.Compiler.VarState(Core.Const(0), false), Core.Compiler.VarState(Int64, false), Core.Compiler.VarState(Union{}, true)]))

@show credential = read!(io, Ref(blueprint))[]
#> credential = (read!(io, Ref(blueprint)))[] = UserCredential(Core.LineInfoNode(Base, :-, Symbol("int.jl"), 86, 0), Core.Compiler.BitSet(UInt64[], -1152921504606846976), svec(Union{}, LinearAlgebra.AbstractRotation, #undef, Union{}, Any, #undef), DateTime("2021-06-29T22:46:28.412"), true, TwoFAWords(Core.LineInfoNode(Base, :-, Symbol("int.jl"), 86, 0), Core.Compiler.ResolvedInliningSpec(291 1 ─ %1 = $(Expr(:foreigncall, :(:jl_object_id), UInt64, svec(Any), 0, :(:ccall), Core.Argument(2)))::UInt64
    └──      return %1
, true)))

close(io)

Thanks in advance!
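
On the "dynamic size" hunch: one quick check is `isbitstype`. The sketch below assumes that `write(io, Ref(x))` copies only the struct's own bytes. For a struct with `String` fields those bytes are references, not the text, which would explain why the first example only appears to work while the original strings are still alive in the same session. `FixedRecord` is a hypothetical type made up for illustration, and the exact behavior of `write`/`read!` on a `Ref` may differ between Julia versions.

```julia
using Dates

struct SimpleCred
    username::String
    password::String
end

# Hypothetical fixed-size record (not from the post): every field is a plain bits type.
struct FixedRecord
    id::Int64
    lastupdated::DateTime
    active::Bool
end

# String fields are held by reference, so the struct's size does not depend on the text.
@show isbitstype(SimpleCred)
@show isbitstype(FixedRecord)
@show sizeof(SimpleCred) == 2 * sizeof(Ptr{Cvoid})

# A bits type can be round-tripped through raw bytes.
rec = FixedRecord(42, DateTime(2021, 6, 29, 22, 46, 28), true)
io = IOBuffer()
write(io, Ref(rec))
seekstart(io)
back = read!(io, Ref{FixedRecord}())[]
@show back == rec
```

If `isbitstype` returns `false` for a type, as it should for both `SimpleCred` and `UserCredential`, a raw byte dump is probably not a safe way to persist it.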

It seems like you might be looking for the Serialization standard library (see "Serialization" in the Julia documentation).
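
A minimal sketch of what that could look like with the structs from the question, using the `Serialization` standard library. The sample values and the temporary path are made up for illustration. Note that the Serialization docs warn that the format is not guaranteed to stay readable across Julia versions, so this is more of a cache than a long-term archive.

```julia
using Serialization, Dates

struct TwoFAWords
    source::String
    words::Vector{String}
end

struct UserCredential
    source::String
    username::String
    password::String
    lastupdated::DateTime
    active::Bool
    backupwords::TwoFAWords
end

# Made-up sample data for illustration.
credentials = [
    UserCredential("site$i", "user$i", "pass$i", DateTime(2021, 6, 29), isodd(i),
                   TwoFAWords("site$i", ["alpha", "beta", "gamma"]))
    for i in 1:3
]

# Write the whole vector in one call, then read it back.
path = joinpath(mktempdir(), "user.bin")
serialize(path, credentials)
restored = deserialize(path)

@show length(restored)
@show restored[2].username
@show restored[2].backupwords.words
```

Comparing `restored == credentials` directly would likely report `false`, because the default `==` on structs does not look inside the `Vector{String}` field, so compare field by field or define `==` for the types.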

There are various threads on Discourse discussing related problems.
My experience is that there is no fully reliable solution. JLD2.jl works well for me most of the time, but not always. Sometimes files just cannot be read anymore, for no reason that I can identify.
For structs that I really need to be able to read again, I convert them into nested Dicts and write those to both JLD2 and JSON formats. The reasoning is that the JSON files can always be read.
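
The struct-to-nested-Dict step could be sketched roughly like this. `todict` and `fromdict` are hypothetical helper names, not from any package, and the sketch stops at the Dict: handing the result to JSON.jl or JLD2 is left out so that it runs without extra dependencies.

```julia
using Dates

struct TwoFAWords
    source::String
    words::Vector{String}
end

struct UserCredential
    source::String
    username::String
    password::String
    lastupdated::DateTime
    active::Bool
    backupwords::TwoFAWords
end

# Hypothetical helpers: reduce everything to strings, numbers, vectors and Dicts.
todict(x::Union{AbstractString,Number}) = x
todict(x::DateTime) = string(x)
todict(x::AbstractVector) = [todict(v) for v in x]
todict(x::Union{TwoFAWords,UserCredential}) =
    Dict{String,Any}(string(f) => todict(getfield(x, f)) for f in fieldnames(typeof(x)))

# Rebuild the structs from the plain nested Dict.
fromdict(::Type{TwoFAWords}, d) = TwoFAWords(d["source"], String.(d["words"]))
fromdict(::Type{UserCredential}, d) = UserCredential(
    d["source"], d["username"], d["password"],
    DateTime(d["lastupdated"]), d["active"],
    fromdict(TwoFAWords, d["backupwords"]))

cred = UserCredential("example.org", "someusername", "somepassword",
                      DateTime(2021, 6, 29, 22, 46, 28), true,
                      TwoFAWords("example.org", ["nunatak", "niota"]))

d = todict(cred)
roundtrip = fromdict(UserCredential, d)
@show d["backupwords"]["words"]
@show roundtrip.username
```

Because the Dict only contains strings, numbers, booleans, vectors and other Dicts, it should map onto JSON without any custom type handling.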

Unrelated to the serialization question per se, but I would recommend not storing credentials in plaintext on disk. I know you’ve mentioned that this is just an MVP, but from my experience these kinds of “temporary solutions” tend to linger, waiting to blow up.

For some information on alternatives, check out the OWASP Recommendations for storing passwords. If you have more data, it really sounds like you want to use a proper database (e.g. Postgres or SQLite if portability is a concern, both have Julia wrappers available as far as I know) instead of writing to disk directly.

I’ve been doing something similar, but with the built-in TOML library. For each struct I have code that converts it into a form that TOML can read and write. So far so good. TOML understands infinity, NaN, booleans, integers, and floats, and you can nest information in a few ways.
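
Roughly, the per-struct conversion could look like the sketch below. `TomlCred`, `totable` and `fromtable` are hypothetical names invented for this example. Only `TOML.print` and `TOML.parsefile` come from the standard library.

```julia
using TOML

# Hypothetical record type for illustration.
struct TomlCred
    username::String
    password::String
    active::Bool
    words::Vector{String}
end

# Hand-written conversion to and from a TOML-friendly table.
totable(c::TomlCred) = Dict{String,Any}(
    "username" => c.username,
    "password" => c.password,
    "active"   => c.active,
    "words"    => c.words)

fromtable(d) = TomlCred(d["username"], d["password"], d["active"], d["words"])

creds = [
    TomlCred("someusername", "somepassword", true, ["nunatak", "niota"]),
    TomlCred("someusername", "adiffpassword", false, ["semiform"]),
]

# Write all records as an array of tables under one key, then parse them back.
path = joinpath(mktempdir(), "user.toml")
open(path, "w") do io
    TOML.print(io, Dict("credentials" => [totable(c) for c in creds]))
end

restored = [fromtable(d) for d in TOML.parsefile(path)["credentials"]]
@show restored[2].password
```

The resulting file is plain text, so it can be inspected or repaired by hand if something goes wrong.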

Definitely agreed! For a more serious go at credential storage I would employ some sort of encryption, either encrypting the strings before storing them as binary or encrypting the entire file after the fact. Thanks for the recommendations, though!

I see what you mean. I will try using JLD2 as it looks like it is “good enough” for most stuff.

But it is important to understand that you will occasionally lose files. This is at least my experience, and it may have to do with the fact that I read and write files on different operating systems.
