Extracting a float from a string

I would like to extract a float from a string of the form
strg = "file_t=0.234". In Python, I would use a package called scanf, which offers functionality similar to scanf in C++. I could also use regular expressions, but that is my last resort. What solutions, if any, does Julia offer besides regular expressions? Thanks.

strg = "file_t=0.234"
parse(Float64,split(strg,'=')[2])
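A slightly more defensive variant of the same idea, sketched with a hypothetical helper name `value_after_equals` (not from the thread): `split` with `limit = 2` cuts only at the first '=', and `tryparse` returns `nothing` instead of throwing when the remainder is not a valid float.

```julia
# Hypothetical helper: parse the text after the first '=' as a Float64.
function value_after_equals(str::AbstractString)
    parts = split(str, '='; limit = 2)   # split at the first '=' only
    length(parts) == 2 || return nothing # no '=' in the string
    return tryparse(Float64, parts[2])   # `nothing` if not a valid float
end

value_after_equals("file_t=0.234")   # 0.234
value_after_equals("file_t=abc")     # nothing
```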

Thank you! I knew I’d use parse and split, but never thought of indexing the result with [2].

Oops, my minimal example was too minimal. How about extracting the float from the string "time t=0.234_ext.bson"?
I thought that parse would extract the first float it finds, similar to scanf in C++, rather than requiring the entire string to be a float.
Thanks.

If we stick with split and a fixed format we could do:

julia> strg="time t=0.234_ext.bson"
"time t=0.234_ext.bson"
julia> parse(Float64,split(split(strg,'=')[2],'_')[1])
0.234
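Since `split` also accepts a collection of characters as the delimiter, the two nested splits can be collapsed into one call. This is a sketch that assumes neither '=' nor '_' occurs before the number; it would pick the wrong field for the first example, "file_t=0.234".

```julia
strg = "time t=0.234_ext.bson"
# Split on either delimiter: ["time t", "0.234", "ext.bson"]
fields = split(strg, ['=', '_'])
t = parse(Float64, fields[2])   # 0.234
```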

If the format isn’t as fixed I would do a regular expression. Why do you exclude them?

For the first example there is yet another approach (among several others):

julia> strg = "file_t=0.234"
"file_t=0.234"
julia> eval(Meta.parse(strg))
0.234
julia> file_t
0.234
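Note that `eval` defines a global variable `file_t` and would run whatever code the string contains. If only the number is needed, a sketch that inspects the parsed expression without evaluating it, assuming the string is a single `name=number` assignment:

```julia
ex = Meta.parse("file_t=0.234")   # :(file_t = 0.234); nothing is evaluated
ex.head                            # :(=), an assignment expression
value = ex.args[2]                 # 0.234, already a Float64 literal
```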

Any particular reason you want to avoid regexps? They are often a good solution for tasks like this.

To compare with @oheil’s nice solution,

using BenchmarkTools
strg="time t=0.234_ext.bson"
f1(str) = parse(Float64, split(split(str, '=')[2], '_')[1])
f2(str) = parse(Float64, match(r".*=(.*)_.*", str).captures[1])

julia> @btime f1($strg)
  592.303 ns (6 allocations: 448 bytes)
0.234

julia> @btime f2($strg)
  389.634 ns (4 allocations: 288 bytes)
0.234

In addition, it is easier to do more validation with regexps.
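As a sketch of that kind of validation (the helper name `extract_t` and the exact pattern are illustrative assumptions, not from the thread): the pattern requires the literal `t=`, a number with an optional sign and exponent, and a trailing `_`, and the function returns `nothing` when the string does not have that shape instead of throwing.

```julia
# Hypothetical helper: extract the number between "t=" and "_",
# or return `nothing` if the string does not match the expected format.
function extract_t(str::AbstractString)
    m = match(r"t=([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)_", str)
    m === nothing && return nothing
    return parse(Float64, m.captures[1])
end

extract_t("time t=0.234_ext.bson")   # 0.234
extract_t("time t=abc_ext.bson")     # nothing
```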


I have more than enough information now, thanks.
I tend to avoid regular expressions because I never learned them really well. I have used them in several languages but always forget the subtleties. I recognize their value.

But when efficiency is not an issue (for example, reading small amounts of data once), I feel that regexes are overkill and simpler solutions should be available.
Of course, nothing prevents me from writing specialized routines that are easy to use and tuned to my workflow. It all takes time.
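A specialized routine of that kind can be quite short. The sketch below uses a hypothetical helper `first_float` that needs no regex: it starts at the first digit, extends over digits and decimal points, and parses that run. It assumes plain unsigned decimals; signs, exponents, and a leading "." (as in ".5") are not handled.

```julia
# Hypothetical helper: parse the first run of digits/decimal points in `str`.
# Returns `nothing` if there is no digit or the run is not a valid float.
function first_float(str::AbstractString)
    isnum(c) = isdigit(c) || c == '.'
    i = findfirst(isdigit, str)            # start at the first digit
    i === nothing && return nothing
    j = findnext(!isnum, str, i)           # first character after the run
    stop = j === nothing ? lastindex(str) : prevind(str, j)
    return tryparse(Float64, str[i:stop])
end

first_float("time t=0.234_ext.bson")   # 0.234
```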

I appreciate the input and the help to make me more proficient.


Perhaps a bit simpler would be to just look for the first float-like portion in the string:

f3(str) = parse(Float64, match(r"\d*\.?\d+", str).match)

It happens to be a bit faster, but more importantly, IMO, it’s easier to read and should be more robust.
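If negative numbers or scientific notation may appear, the same first-match idea can be extended. This is a sketch with a hypothetical name `f4`: the pattern adds an optional sign and exponent, and the function returns `nothing` rather than erroring when no number is found. Like `f3`, it still picks up digits embedded in names (e.g. the 2 in "v2").

```julia
# Hypothetical variant of f3: optional sign and exponent,
# returns `nothing` when the string contains no number.
function f4(str::AbstractString)
    m = match(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?", str)
    return m === nothing ? nothing : parse(Float64, m.match)
end

f4("time t=0.234_ext.bson")   # 0.234
f4("offset=-1.5e2.dat")       # -150.0
```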
