Is there something similar to grep -q String Filename in Julia? That is, a command that just checks whether a string is present in a text file and returns true or false?
Alternatively (this is actually what I need), is there a way to check if a JSON file has a field without actually loading it into memory? I’m using JSON3 to read/write JSON files with nested structs, and I need to know whether one of the nested structs has a specific field before reading the JSON file into the appropriate data structure.
I have a long single-line file (that’s how the JSON3 package saves my struct). So now I’ve put the relevant field as the first field in the line, something like:
and I’m reading char-by-char (my goal is to get that version number), with:
julia> function get_version(filename)
           i = 0
           str = ""
           file = open(filename, "r")
           while true
               c = read(file, Char)
               i += 1
               (c == ',' || i > 100) && break
               str = str*c
           end
           close(file)
           str = split(str, ":")
           v = if occursin("Version", str[1])
               parse(VersionNumber, string(str[2][begin+1:end-1]))
           else
               v"0.0.0"
           end
           return v
       end
I think this is not reading the whole file in memory (let me know if I’m wrong). I’m open to better ideas as well.
You can do memory mapping with StringViews.jl. Given a StringView of the memory mapped file, you can then call functions like contains or match and it will then load only the file pages that are needed. (Of course, if the string isn’t found, these functions will still traverse the whole file, but if the file is larger than what fits in memory the VM should page things out when they are no longer needed.)
For example:
using StringViews, Mmap
open(filename, "r") do io
    s = StringView(mmap(io))
    v = match(r"\"Version\":\"([^\"]*)\"", s)
    return isnothing(v) ? v"0.0.0" : VersionNumber(v[1])
end
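For the plain grep -q part of the question, here is a minimal sketch that needs only Base. The name file_contains is made up for illustration. It streams the file line by line and stops at the first match, so it holds only one line in memory at a time. Note that for a file consisting of a single long line, that one line is still read in full, which is where the memory-mapped approach is the better fit.

```julia
# Hypothetical helper (not part of any package): returns true if `needle`
# occurs on some line of the file, roughly like `grep -q -F needle filename`.
function file_contains(filename::AbstractString, needle::AbstractString)
    open(filename, "r") do io
        # `any` short-circuits, so reading stops at the first matching line.
        any(line -> occursin(needle, line), eachline(io))
    end
end
```

Since occursin also accepts a Regex as its first argument, relaxing the type annotation on needle would allow pattern searches as well.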
If you know it’s the first field, and you can bound the length of the version, you could also just read the first few bytes of the file, e.g. with:
s = open(io -> String(read(io, 256)), filename, "r")
and then do regex extraction on that. This should be more efficient than reading character by character.
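Put together, the read-a-prefix idea could look like the sketch below. The function name version_from_prefix and the nbytes keyword are invented for this example, and it assumes the file starts with something like {"Version":"x.y.z",... and that the whole field fits within the first nbytes bytes.

```julia
# Hypothetical helper: extract the version from the first `nbytes` bytes only.
# Assumes the "Version" field appears near the start of the file.
function version_from_prefix(filename::AbstractString; nbytes::Integer=256)
    # read(io, nbytes) reads at most nbytes bytes and returns a Vector{UInt8}.
    prefix = open(io -> String(read(io, nbytes)), filename, "r")
    m = match(r"\"Version\":\"([^\"]*)\"", prefix)
    # Fall back to v"0.0.0" when the field is not found in the prefix.
    return m === nothing ? v"0.0.0" : VersionNumber(m[1])
end
```

Only a fixed number of bytes is ever read, regardless of how large the file is or whether it contains a comma.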
(Building a string this way is especially inefficient because it allocates a new string for each character, which is O(n^2) for n characters. If you are going to iteratively build strings in Julia, do it by writing into an IOBuffer.)
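If you do want to keep the character-by-character loop, a sketch of the IOBuffer variant might look like this. The name read_prefix and its keyword arguments are made up here; the 100-character limit mirrors the one in the question.

```julia
# Hypothetical helper: collect characters up to the first `stop` character,
# giving up after `maxchars` characters or at end of file.
function read_prefix(filename::AbstractString; stop::Char=',', maxchars::Integer=100)
    buf = IOBuffer()
    open(filename, "r") do io
        n = 0
        while !eof(io) && n < maxchars
            c = read(io, Char)
            c == stop && break
            write(buf, c)   # appends in place; no new string per character
            n += 1
        end
    end
    # Materialize the string once, at the end.
    return String(take!(buf))
end
```

The eof check also avoids an EOFError on files that are shorter than the limit and contain no comma.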
I was going to suggest readuntil, but then I saw that @lmiq wanted to give up after 100 characters if a comma isn’t found, which readuntil doesn’t support. Whether that matters depends on how badly you want to avoid reading in the whole file in the worst case of a comma-free file.
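For comparison, the unbounded readuntil version would be roughly the following one-liner (first_field is a made-up name). It is the simplest option, with the caveat already mentioned: if the file has no comma, it reads all the way to the end.

```julia
# Hypothetical helper: everything before the first comma (delimiter not included).
# With no comma in the file, this returns the entire file contents.
first_field(filename::AbstractString) = open(io -> readuntil(io, ','), filename, "r")
```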