I would like to extract a float from a string of the form strg = "file_t=0.234". In Python, I would use a package called scanf, with functionality similar to what is offered in C++. I might also use regular expressions, but only as a last resort. What solutions does Julia offer besides regular expressions, if any? Thanks.
strg = "file_t=0.234"
parse(Float64,split(strg,'=')[2])
Thank you! I knew I’d use parse, and split, but never thought of using the [2] selector.
Oops. My minimal example was too minimal. How about extracting the float from the string "time t=0.234_ext.bson"?
I thought that parse would extract the first float it finds, similar to scanf in C++, rather than require that the whole parsed string be a float.
Thanks.
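To illustrate the point about parse needing the whole string to be a number, here is a minimal sketch using tryparse, which returns nothing instead of throwing when the string is not a valid float. The variable names whole and piece are only for illustration.

```julia
strg = "time t=0.234_ext.bson"

# tryparse returns nothing when the entire string is not a valid Float64
whole = tryparse(Float64, strg)

# it succeeds only once the numeric part has been isolated
piece = tryparse(Float64, "0.234")

println(whole === nothing)
println(piece)
```

So some splitting or matching step is still needed to isolate the numeric portion before parsing.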
If we stick with split and a fixed format we could do:
julia> strg="time t=0.234_ext.bson"
"time t=0.234_ext.bson"
julia> parse(Float64,split(split(strg,'=')[2],'_')[1])
0.234
If the format isn’t as fixed I would do a regular expression. Why do you exclude them?
For the first example there is another option (among several others):
julia> strg = "file_t=0.234"
"file_t=0.234"
julia> eval(Meta.parse(strg))
0.234
julia> file_t
0.234
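Note that eval runs whatever code the string contains and defines a global variable, so it is probably best kept to trusted input. As a hedged alternative sketch, the parsed expression can be inspected without evaluating it; this assumes the string has exactly the form name=number.

```julia
strg = "file_t=0.234"

# Meta.parse builds an expression without running it:
# an assignment Expr with head :(=) and args [:file_t, 0.234]
ex = Meta.parse(strg)

# Pull the name and the value out of the expression's arguments
name, value = ex.args

println(name)
println(value)
```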
Any particular reason you want to avoid regexps? They are often a good solution for tasks like this.
To compare with @oheil’s nice solution,
using BenchmarkTools
strg="time t=0.234_ext.bson"
f1(str) = parse(Float64, split(split(str, '=')[2], '_')[1])
f2(str) = parse(Float64, match(r".*=(.*)_.*", str).captures[1])
julia> @btime f1($strg)
592.303 ns (6 allocations: 448 bytes)
0.234
julia> @btime f2($strg)
389.634 ns (4 allocations: 288 bytes)
0.234
In addition, it is easier to do more validation with regexps.
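As one possible sketch of such validation, the pattern can require the expected "t=" prefix and "_" suffix, and the function can return nothing when the string does not fit. The helper name extract_t and the stricter pattern are assumptions made for illustration, not part of the benchmark above.

```julia
# Hypothetical helper: returns the value after "t=", or nothing if the
# string does not follow the expected "t=<digits>.<digits>_" layout.
function extract_t(str::AbstractString)
    m = match(r"t=(\d+\.\d+)_", str)
    m === nothing && return nothing      # pattern not found: report it instead of throwing
    return parse(Float64, m.captures[1])
end

println(extract_t("time t=0.234_ext.bson"))
println(extract_t("no number here") === nothing)
```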
I have more than enough information now, thanks.
I tend to avoid regular expressions because I never learned them really well. I have used them in several languages but always forget the subtleties. I recognize their value.
But I also feel that when efficiency is not an issue (for example, reading a small amount of data once), regex is overkill and simpler solutions should be available.
Of course, nothing prevents me from writing specialized routines that are easy to use and tuned to my workflow. It all takes time.
I appreciate the input and the help to make me more proficient.
Perhaps a bit simpler would be to just look for the first float-like portion in the string:
f3(str) = parse(Float64, match(r"\d*\.?\d+", str).match)
Happens to be a bit faster, but more importantly (IMO) I think it’s easier to read and should be more robust.
If your string may contain numbers in exponential notation, you might modify the inner part as follows:
match(r"\d*\.?\d+(e[+|-]\d*)?", str).match
(I hope this is correct, I have not tested all possible cases)
It doesn’t accurately match this:
julia> match(r"\d*\.?\d+(e[+|-]\d*)?", "2.3e6").match
"2.3"
Nor Float32:
julia> match(r"\d*\.?\d+(e[+|-]\d*)?", "2.3f-6").match
"2.3"
Changing "e[+|-]" to "[ef][+-]?" helps:
julia> match(r"\d*\.?\d+([ef][+-]?\d*)?", "2.3e6").match
"2.3e6"
julia> match(r"\d*\.?\d+([ef][+-]?\d*)?", "2.3f-6").match
"2.3f-6"
It needs -? in order to catch negative floats as well:
match(r"-?\d*\.?\d+([ef][+-]?\d*)?", "-2.3f-6").match
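Putting the pieces of this thread together, a sketch of a small helper might look like the following. The name first_float is hypothetical, and two choices here are my own assumptions: the exponent uses \d+ rather than \d* so that a bare trailing "e" or "f" is not swallowed into the match, and an "f" exponent is rewritten to "e" before calling parse.

```julia
# Hypothetical helper: find the first float-like token in a string and
# return it as a Float64, or nothing if there is none.
function first_float(str::AbstractString)
    # optional sign, optional integer part, optional dot, digits,
    # then an optional e/f exponent with at least one digit
    m = match(r"-?\d*\.?\d+([ef][+-]?\d+)?", str)
    m === nothing && return nothing
    # swap a Float32-style 'f' exponent for 'e' so that parse accepts it
    return parse(Float64, replace(m.match, 'f' => 'e'))
end

println(first_float("time t=0.234_ext.bson"))
println(first_float("-2.3f-6"))
println(first_float("2.3e6"))
```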
A comment: parse cannot handle the “f+/-xx” exponent form:
parse(Float64, "-1.234f-04")
ERROR: ArgumentError: cannot parse "-1.234f-04" as Float64
That’s probably because it’s not a Float64, but a Float32.
parse(Float32, "-1.234f-04") fails as well.
See https://regexr.com (I believe that is the site I had in mind), but also consider ReadableRegex.jl (jkrumbiegel/ReadableRegex.jl on GitHub): regexes for people who don't really want to learn or read regexes.
I’m stumped. A parser bug?
See this issue here, confirming that strings like "-1.234f04" cannot be parsed as Float32 using parse(Float32, "-1.234f04"), whereas Meta.parse("-1.234f04") works.
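If going through Meta.parse is not desirable, one possible workaround (a sketch, assuming the only difference from the accepted format is the "f" exponent marker) is to rewrite the marker to "e" before parsing:

```julia
s = "-1.234f04"

# parse accepts the 'e' exponent form, so replace 'f' with 'e' first
x = parse(Float32, replace(s, 'f' => 'e'))

println(x)
println(typeof(x))
```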