Extracting a float from a string

I would like to extract a float from a string of the form
strg = "file_t=0.234". In Python, I would use a package called scanf, which offers functionality similar to what is available in C++. I might also use regular expressions, but that is the last resort. What solutions does Julia offer besides regular expressions, if any? Thanks.

strg = "file_t=0.234"
parse(Float64,split(strg,'=')[2])

Thank you! I knew I’d use parse, and split, but never thought of using the [2] selector.

Oops. My minimal example was too minimal. How about extracting the float from the string "time t=0.234_ext.bson"?
I thought that parse would extract the first float it finds, similar to scanf in C++, rather than require that the string parsed was a float.
Thanks.
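
To illustrate the point about parse: it requires the whole string to be a valid float, so trailing characters make it throw. A minimal sketch using tryparse from Base, which returns nothing instead of throwing (the variable names here are only for illustration):

```julia
# parse(Float64, s) throws an ArgumentError if s is not entirely a float.
# tryparse returns `nothing` in that case, which is easier to check.
s = "0.234_ext.bson"

r1 = tryparse(Float64, s)        # trailing "_ext.bson" makes this fail
r2 = tryparse(Float64, "0.234")  # the isolated number parses fine

println(r1 === nothing)
println(r2)
```

So the number has to be isolated first (with split, a regex, or similar) before it is handed to parse.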

If we stick with split and a fixed format we could do:

julia> strg="time t=0.234_ext.bson"
"time t=0.234_ext.bson"
julia> parse(Float64,split(split(strg,'=')[2],'_')[1])
0.234

If the format isn’t as fixed, I would use a regular expression. Why do you exclude them?

For the first example there is another option (among several others):

julia> strg = "file_t=0.234"
"file_t=0.234"
julia> eval(Meta.parse(strg))
0.234
julia> file_t
0.234
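
Note that eval runs whatever code the string contains and defines file_t as a global, so it is only reasonable for trusted input. As a hedged alternative, here is a sketch that inspects the expression returned by Meta.parse without evaluating it. The helper name literal_from_assignment is made up for this example:

```julia
strg = "file_t=0.234"
ex = Meta.parse(strg)   # builds an Expr; nothing is executed

# Accept only the shape `name = <floating-point literal>` and return the literal.
# Anything else (function calls, other expressions) yields `nothing`.
function literal_from_assignment(ex)
    if ex isa Expr && ex.head == :(=) && length(ex.args) == 2 &&
       ex.args[2] isa AbstractFloat
        return ex.args[2]
    end
    return nothing
end

value = literal_from_assignment(ex)
rejected = literal_from_assignment(Meta.parse("x = f(1)"))
```

This only works when the string happens to be valid Julia syntax, so it would not help with a name like "time t=0.234_ext.bson".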

https://github.com/JuliaIO/Formatting.jl/issues/21


https://gist.github.com/c42f/9999dc6f9b63a9bd4ea4237a95876475

Any particular reason you want to avoid regexps? They are often a good solution for tasks like this.

To compare with @oheil’s nice solution,

using BenchmarkTools
strg="time t=0.234_ext.bson"
f1(str) = parse(Float64, split(split(str, '=')[2], '_')[1])
f2(str) = parse(Float64, match(r".*=(.*)_.*", str).captures[1])

julia> @btime f1($strg)
  592.303 ns (6 allocations: 448 bytes)
0.234

julia> @btime f2($strg)
  389.634 ns (4 allocations: 288 bytes)
0.234

In addition, it is easier to do more validation with regexps.
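
As a rough sketch of what such validation could look like: the pattern below is stricter than the one in f2 and only accepts digits, a dot, and more digits between "t=" and "_". The function name time_from_name and the exact pattern are assumptions for illustration, not something fixed by the file format:

```julia
# Return the time encoded as "t=<digits>.<digits>_" in a file name,
# or `nothing` if the name does not have that shape.
function time_from_name(name)
    m = match(r"t=(?<t>\d+\.\d+)_", name)
    m === nothing && return nothing   # no match: the name is not in the expected format
    return parse(Float64, m[:t])      # named capture group `t`
end

good = time_from_name("time t=0.234_ext.bson")
bad = time_from_name("time t=abc_ext.bson")
```

Because the capture group can only contain a valid decimal number, parse cannot fail on it, and malformed names are reported as nothing rather than as an exception.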


I have more than enough information now, thanks.
I tend to avoid regular expressions because I never learned them really well. I have used them in several languages but always forget the subtleties. I recognize their value.

But I also believe that when efficiency is not an issue (for example, reading small amounts of data once), regex is overkill and simpler solutions should be available.
Of course, nothing prevents me from writing specialized routines that are easy to use and tuned to my workflow. It all takes time.

I appreciate the input and the help to make me more proficient.


Perhaps a bit simpler would be to just look for the first float-like portion in the string:

f3(str) = parse(Float64, match(r"\d*\.?\d+", str).match)

Happens to be a bit faster, but more importantly (IMO) I think it’s easier to read and should be more robust.
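
One thing to watch with f3: match returns nothing when the string contains no number, and then .match throws. A small sketch of a guarded variant, with the hypothetical name first_float:

```julia
# Like f3, but returns `nothing` instead of throwing when no number is found.
function first_float(str)
    m = match(r"\d*\.?\d+", str)
    return m === nothing ? nothing : parse(Float64, m.match)
end

a = first_float("time t=0.234_ext.bson")
b = first_float("no digits here")
c = first_float("file2_t=0.234")   # picks up the "2" in "file2" first
```

The last call shows the trade-off of taking the first float-like portion: any digit earlier in the name wins, so this approach assumes the number of interest is the first one in the string.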


If your string may contain numbers in exponential notation, you might modify the inner part as follows:

match(r"\d*\.?\d+(e[+|-]\d*)?", str).match

(I hope this is correct, I have not tested all possible cases)

It doesn’t accurately match this:

julia> match(r"\d*\.?\d+(e[+|-]\d*)?", "2.3e6").match
"2.3"

Nor Float32:

julia> match(r"\d*\.?\d+(e[+|-]\d*)?", "2.3f-6").match
"2.3"

Changing "e[+|-]" to "[ef][+-]?" helps:

julia> match(r"\d*\.?\d+([ef][+-]?\d*)?", "2.3e6").match
"2.3e6"

julia> match(r"\d*\.?\d+([ef][+-]?\d*)?", "2.3f-6").match
"2.3f-6"

It needs -?, in order to catch negative floats as well:

match(r"-?\d*\.?\d+([ef][+-]?\d*)?", "-2.3f-6").match
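
To check the pattern against several cases at once, here is a small sketch using eachmatch. The constant and function names are made up for this example, and the stricter variant at the end is a suggestion rather than something tested exhaustively:

```julia
# The pattern from the post above.
const FLOAT_RE = r"-?\d*\.?\d+([ef][+-]?\d*)?"

# Collect every float-like substring in a string.
all_matches(re, str) = [m.match for m in eachmatch(re, str)]

found = all_matches(FLOAT_RE, "a=-2.3f-6, b=2.3e6, c=.5")

# Caveat: the exponent digits are optional (\d*), so a digit followed by a
# word starting with 'e' or 'f' yields a string that parse cannot handle.
loose = match(FLOAT_RE, "5ext").match

# Requiring at least one exponent digit (\d+) avoids that.
const STRICT_FLOAT_RE = r"-?\d*\.?\d+(?:[ef][+-]?\d+)?"
strict = match(STRICT_FLOAT_RE, "5ext").match
```

With the stricter pattern, a trailing "e" or "f" is only consumed when it is followed by actual exponent digits.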

A comment: parse cannot handle the “f±xx” exponent form:

parse(Float64, "-1.234f-04")
ERROR: ArgumentError: cannot parse "-1.234f-04" as Float64

That’s probably because it’s not a Float64, but a Float32.

parse(Float32, "-1.234f-04")

fails as well.

See https://regexr.com (I believe that is the site I had in mind), but also consider ReadableRegex.jl (jkrumbiegel/ReadableRegex.jl on GitHub): regexes for people who don't really want to learn or read regexes.

I’m stumped. A parser bug?

See this issue here, which confirms that parse(Float32, "-1.234f04") cannot parse strings like "-1.234f04" as Float32.


Meta.parse("-1.234f04") works.
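
If you would rather stay with parse/tryparse than go through Meta.parse, one possible workaround is to rewrite the f exponent marker as e first. This is only a sketch, assuming the input is a plain decimal literal (it would mangle strings such as "Inf"), and parse_float32_literal is a made-up name:

```julia
# Turn a Float32-style literal such as "-1.234f-04" into something
# tryparse understands by replacing the 'f' exponent marker with 'e'.
# Returns `nothing` if the result still cannot be parsed.
function parse_float32_literal(s)
    return tryparse(Float32, replace(s, 'f' => 'e'))
end

x = parse_float32_literal("-1.234f04")
y = parse_float32_literal("-1.234f-04")
z = parse_float32_literal("not a number")
```

Unlike Meta.parse, this never interprets the string as Julia code, so it is a safer fit for untrusted input.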
