Extracting a float from a string

erlebach · July 15, 2020, 12:45pm

I would like to extract a float from a string of the form
strg = "file_t=0.234". In Python, I would use a package called scanf, with functionality similar to what is offered in C++. I might also use regular expressions, but that is the last resort. What solutions, if any, does Julia offer besides regular expressions? Thanks.

oheil · July 15, 2020, 12:49pm

strg = "file_t=0.234"
parse(Float64,split(strg,'=')[2])

erlebach · July 15, 2020, 12:56pm

Thank you! I knew I’d use parse, and split, but never thought of using the [2] selector.

erlebach · July 15, 2020, 12:59pm

Oops. My minimal example was too minimal. How about extracting the float from the string "time t=0.234_ext.bson"?
I thought that parse would extract the first float it finds, similar to scanf in C++, rather than require that the string parsed was a float.
Thanks.
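
A small sketch of the behaviour described here: parse expects the entire string to be a valid number, and tryparse does the same check but returns nothing instead of throwing. The variable names are only for illustration.

```julia
# parse/tryparse need the whole string to be a number; trailing text is rejected.
leftover = tryparse(Float64, "0.234_ext.bson")   # nothing: trailing text is not allowed
clean    = tryparse(Float64, "0.234")            # 0.234

# So the numeric part has to be isolated first, then parsed.
println(leftover === nothing)
println(clean)
```

Using tryparse makes it possible to test for failure with `=== nothing` rather than catching an ArgumentError.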

oheil · July 15, 2020, 1:04pm

If we stick with split and a fixed format we could do:

julia> strg="time t=0.234_ext.bson"
"time t=0.234_ext.bson"
julia> parse(Float64,split(split(strg,'=')[2],'_')[1])
0.234
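
As a possible variation on the nested split, split also accepts a collection of characters as the delimiter, so both cuts can be made in one call. This is only a sketch for the same fixed format; the name parts is made up.

```julia
strg = "time t=0.234_ext.bson"

# Split on either '=' or '_' in a single pass.
parts = split(strg, ['=', '_'])   # ["time t", "0.234", "ext.bson"]

# The number is the second piece for this particular format.
t = parse(Float64, parts[2])
println(t)
```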

If the format isn’t as fixed I would do a regular expression. Why do you exclude them?

For the first example there is another one (besides multiple others):

julia> strg = "file_t=0.234"
"file_t=0.234"
julia> eval(Meta.parse(strg))
0.234
julia> file_t
0.234

oheil · July 15, 2020, 1:15pm

https://github.com/JuliaIO/Formatting.jl/issues/21

oheil · July 15, 2020, 1:16pm

https://gist.github.com/c42f/9999dc6f9b63a9bd4ea4237a95876475

Tamas_Papp · July 15, 2020, 1:24pm

Any particular reason you want to avoid regexps? They are often a good solution for tasks like this.

To compare with @oheil’s nice solution,

using BenchmarkTools
strg="time t=0.234_ext.bson"
f1(str) = parse(Float64, split(split(str, '=')[2], '_')[1])
f2(str) = parse(Float64, match(r".*=(.*)_.*", str).captures[1])

julia> @btime f1($strg)
  592.303 ns (6 allocations: 448 bytes)
0.234

julia> @btime f2($strg)
  389.634 ns (4 allocations: 288 bytes)
0.234

In addition, it is easier to do more validation with regexps.
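
To illustrate the validation point, here is a minimal sketch with a stricter pattern that only accepts digits, a dot, and digits between "t=" and "_". The helper name extract_t and the exact pattern are assumptions for this example, not something from the posts above.

```julia
# Hypothetical helper: returns the float after "t=", or nothing if the
# string does not have the expected shape.
function extract_t(str::AbstractString)
    m = match(r"t=(\d+\.\d+)_", str)
    m === nothing && return nothing      # pattern not found: report it instead of throwing
    return parse(Float64, m.captures[1])
end

good = extract_t("time t=0.234_ext.bson")
bad  = extract_t("time t=abc_ext.bson")
println(good)
println(bad === nothing)
```

Because match returns nothing when the pattern is absent, a malformed file name can be detected before parse is ever called.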

erlebach · July 15, 2020, 1:52pm

I have more than enough information now, thanks.
I tend to avoid regular expressions because I never learned them really well. I have used them in several languages but always forget the subtleties. I recognize their value.

But I also feel that if efficiency is not an issue (for example, reading small amounts of data once), regex is overkill and simpler solutions should be available.
Of course, nothing prevents me from writing specialized routines that are easy to use and tuned to my workflow. It all takes time.

I appreciate the input and the help to make me more proficient.

mbauman · July 15, 2020, 2:18pm

Perhaps a bit simpler would be to just look for the first float-like portion in the string:

f3(str) = parse(Float64, match(r"\d*\.?\d+", str).match)

Happens to be a bit faster, but more importantly (IMO) I think it’s easier to read and should be more robust.
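
One thing to keep in mind with this pattern is that it returns the first digit run anywhere in the string. The sketch below shows this with a made-up file name, plus a hypothetical variant (f3_anchored) that uses a lookbehind so the number must directly follow "t=".

```julia
# f3 as posted above: takes the first float-like portion of the string.
f3(str) = parse(Float64, match(r"\d*\.?\d+", str).match)

# Hypothetical variant: only accept a number that directly follows "t=".
f3_anchored(str) = parse(Float64, match(r"(?<=t=)\d*\.?\d+", str).match)

name = "run2_t=0.234_ext.bson"   # made-up name with an earlier digit
first_hit = f3(name)             # picks up the "2" in "run2"
anchored  = f3_anchored(name)    # picks up the value after "t="
println(first_hit)
println(anchored)
```

For the string in the original question both functions give the same result, since there are no digits before "t=".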

ellocco · October 13, 2022, 8:25am

If your string may contain exponential numbers, you might modify the inner part as follows:

match(r"\d*\.?\d+(e[+|-]\d*)?", str).match

(I hope this is correct, I have not tested all possible cases)

DNF · October 13, 2022, 9:14am

It doesn’t accurately match this:

julia> match(r"\d*\.?\d+(e[+|-]\d*)?", "2.3e6").match
"2.3"

Nor Float32:

julia> match(r"\d*\.?\d+(e[+|-]\d*)?", "2.3f-6").match
"2.3"

Changing "e[+|-]" to "[ef][+-]?" helps:

julia> match(r"\d*\.?\d+([ef][+-]?\d*)?", "2.3e6").match
"2.3e6"

julia> match(r"\d*\.?\d+([ef][+-]?\d*)?", "2.3f-6").match
"2.3f-6"

rafael.guerra · October 13, 2022, 9:40am

It needs -?, in order to catch negative floats as well:

match(r"-?\d*\.?\d+([ef][+-]?\d*)?", "-2.3f-6").match
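
Putting the pieces together, a sketch of collecting every float in a string with eachmatch might look like this. It deviates from the pattern above in two ways, both assumptions on my part: the exponent requires at least one digit (\d+ instead of \d*), since otherwise a bare "e" after a number, as in "0.234ext", would be swallowed, and the f form is left out because the result is passed to parse(Float64, ...). The name all_floats is made up.

```julia
# Signed decimal with an optional e/E exponent that must contain digits.
float_re = r"-?\d*\.?\d+(?:[eE][+-]?\d+)?"

# Hypothetical helper: parse every match in the string as Float64.
all_floats(str) = [parse(Float64, m.match) for m in eachmatch(float_re, str)]

vals = all_floats("a=-2.3e6, b=0.5, c=7")
println(vals)

# A letter 'e' right after the number is not taken as an exponent.
println(all_floats("t=0.234ext"))
```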

ellocco · October 13, 2022, 1:00pm

comment:
parse cannot handle “f+/-xx”

parse(Float64, "-1.234f-04")
ERROR: ArgumentError: cannot parse "-1.234f-04" as Float64

DNF · October 13, 2022, 1:02pm

That’s probably because it’s not a Float64, but a Float32.

ellocco · October 13, 2022, 1:03pm

parse(Float32, "-1.234f-04")

fails as well

Palli · October 13, 2022, 1:06pm

See https://regexr.com (I believe that is the site I had in mind), but also consider ReadableRegex.jl (jkrumbiegel/ReadableRegex.jl on GitHub): regexes for people who don't really want to learn or read regexes.

DNF · October 13, 2022, 1:12pm

I’m stumped. A parser bug?

rafael.guerra · October 13, 2022, 1:15pm

See this issue here, confirming that we cannot parse as Float32 strings like "-1.234f04" using parse(Float32, "-1.234f04")

DNF · October 13, 2022, 1:26pm

Meta.parse("-1.234f04") works.
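
If going through Meta.parse is undesirable (it parses arbitrary Julia code, not just numbers), one possible workaround is to rewrite the f exponent marker as e before calling parse. This is a sketch only; parse_f32 is a made-up name, and it assumes the string holds nothing but the number, so the only 'f' is the exponent marker.

```julia
# Hypothetical helper: turn the Float32 literal marker 'f' into 'e',
# which parse does accept, then parse as Float32.
parse_f32(s::AbstractString) = parse(Float32, replace(s, 'f' => 'e'))

x = parse_f32("-1.234f-04")
println(x)
println(typeof(x))
```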
