Converting strings of numbers to numbers?


#1

I have strings of numbers (possibly a vector of strings of numbers) which can be either integer or floating point – I may not know the type in advance – and I want to preserve the type. I understand that I should use the parse function to convert a string to a number. One possible (but ugly?) strategy is the following (e.g., with s="200" or s="2.0e3"):

s = "200";
x = try
    parse(Int, s)
catch
    parse(Float64, s)
end;

Is there a better way to do this? Something like parse(Number,s), or something?


#2

Look at tryparse.
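
A minimal sketch of what the tryparse route could look like, assuming Julia 1.0 or later, where tryparse returns nothing on failure (on v0.6 it returned a Nullable instead). The helper name parse_number is made up for this illustration:

```julia
# Hypothetical helper: try Int first, then fall back to Float64.
# Assumes Julia >= 1.0, where tryparse returns `nothing` on failure.
function parse_number(s::AbstractString)
    n = tryparse(Int, s)
    n !== nothing && return n
    return parse(Float64, s)   # throws ArgumentError if s is not a number at all
end

x1 = parse_number("200")      # an Int
x2 = parse_number("2.0e3")    # a Float64

# Broadcasting handles a vector of strings; the element type is then abstract.
xs = parse_number.(["1", "2.5"])
```

This avoids using exceptions for control flow, but the return type still depends on the input string, so the result is type-unstable by design.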


#3

You can consider parsing as Float64 and then checking if it is an integer:

y = parse(Float64, "200")
x = isinteger(y) ? Int(y) : y

#4

Thanks for the two suggestions!


#5

I don’t understand what is wrong with plain old parse.

julia> s1 = "200"
"200"

julia> s2 = "2.0e3"
"2.0e3"

julia> parse(s1)
200

julia> parse(s2)
2000.0


#6

It will no longer work in v0.7/master.
parse(string) is really designed for parsing Julia code, so it has been moved to Meta.parse.

Unfortunately, you can’t do parse(Number, string) or parse(Unsigned, string) either. That would be useful if you don’t mind the type instability and simply want the value returned as a type that can represent the number in the string (if it is in a valid format), possibly as a BigInt or BigFloat.

I may just have to add that functionality to my Strs package :grinning:
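
Something along those lines can be sketched in plain Julia by trying a cascade of types. This is only an illustration under assumptions: it targets Julia 1.0 or later, the name parse_any_number is hypothetical, and it stops at Float64 rather than deciding when a BigFloat would be needed:

```julia
# Hypothetical stand-in for parse(Number, s): return the first type that fits.
# Assumes Julia >= 1.0 (tryparse returns `nothing` on failure).
function parse_any_number(s::AbstractString)
    for T in (Int, BigInt, Float64)
        v = tryparse(T, s)
        v !== nothing && return v
    end
    throw(ArgumentError("cannot parse $(repr(s)) as a number"))
end

a = parse_any_number("200")                              # fits in Int
b = parse_any_number("123456789012345678901234567890")   # overflows Int, so BigInt
c = parse_any_number("2.0e3")                            # not an integer, so Float64
```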


#7

There is a problem with that though, if the integer needs more than 53 bits (the precision of a Float64 significand) to be represented exactly.
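
A small illustration of the precision issue, using 2^53 + 1, the first integer a Float64 cannot hold exactly. Int64 is written out explicitly so the sketch does not depend on the platform's default Int:

```julia
s = "9007199254740993"                  # 2^53 + 1

exact     = parse(Int64, s)             # parsed directly as an integer
as_float  = parse(Float64, s)           # rounded to the nearest Float64
via_float = Int64(as_float)             # converted back to an integer

# isinteger(as_float) is true, so the Float64-first approach would accept
# the value without complaint even though one unit has been lost.
lost = exact - via_float
```

Here lost comes out as 1: the round trip through Float64 silently changes the number.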


#8

  1. It invokes the Julia parser, which is a much more generic solution, and consequently less efficient for a specific parsing task. E.g., on v0.6.2, I find a ~400x speed difference between parse(...) and parse(Int, ...).
  2. For the same reason, it does not restrict or validate input (as long as it is valid Julia code). This can lead to unintended consequences.

Parsing text as a given type and parsing it as code are very different tasks; consequently v0.7 distinguishes Meta.parse.
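
To make the validation point concrete, here is a short sketch assuming Julia 0.7/1.0 or later (where Meta.parse exists and tryparse returns nothing on failure). The input string is just an example of something that is valid code but not a number:

```julia
untrusted = "1 + 1"

# The code parser happily accepts this and returns an expression object.
ex = Meta.parse(untrusted)

# The typed parser rejects it, because it is not an integer literal.
val = tryparse(Int, untrusted)
```

ex is an Expr representing the call, while val is nothing, so only the typed parser tells you that the input was not a plain number.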


#9

I tried to do parse(Rational,"2//3") but that didn’t work.
Anyway, I assume that parse(Int,string) and parse(Float64,string) will work in v0.7, and that it is parse(string) that won’t work in v0.7??
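
For the rational case, one workaround is to split the string by hand. The sketch below does not assume any built-in parse method for Rational; parse_rational is a hypothetical helper name, and the accepted format ("num//den") is an assumption of this example:

```julia
# Hypothetical helper: parse "num//den" into a Rational{T}.
function parse_rational(::Type{T}, s::AbstractString) where {T<:Integer}
    parts = split(s, "//")
    length(parts) == 2 ||
        throw(ArgumentError("expected \"num//den\", got $(repr(s))"))
    # parse(T, ...) throws an ArgumentError if either part is not an integer.
    return parse(T, parts[1]) // parse(T, parts[2])
end

r1 = parse_rational(Int, "2//3")
r2 = parse_rational(Int, "4//6")   # normalized by the // operator
```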


#10

Technically parse(string) will work in v0.7, but you get a deprecation warning.


#11

Just a minor comment. Sorry that’s a bit off topic.

Would it be better at this point to say “It will no longer work in v1.0”? :sweat_smile: That way the discussion stays focused on the technical problem and doesn’t generate confusion about what is v0.7, what is v1.0, and so on.


#12

Ok: will parse(Int,string) work in v1.0?


#13

Yes. AFAIK, anything that doesn’t show a deprecation warning in v0.7 will work in v1.0 too. Anything that still works in v0.7 but prints a deprecation warning should be changed to its replacement, since it won’t work in v1.0.

Go for parse(Int, str) :yum:


#14

Yes (unless, of course, a last minute PR changes that, but this is unlikely :wink:)

However, generally, if you are asking questions in the “first steps” category, you should not need to worry about v0.7 at this point. People are trying to be helpful by pointing these things out, but this is not an immediate concern if you are just learning the language.


#15

One possibility would also be to replace parse(Int,str) with Int(str), parse(Float64,str) with Float64(str), introduce Rational(str), etc. – unless some of these methods are taken.


#16

Conversion (convert(T, ...)), construction (T(...)), and parsing (parse(T, ...)) are usually distinguished (the manual talks about the first two), even if the distinction isn’t always clear-cut. Parsing usually involves ambiguities that have to be resolved/decided (e.g. the base), and the user should always be prepared for an error (parse(T, ...) throws an ArgumentError on invalid input) or preferably use tryparse. Because of this, I like the current arrangement.
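
A short sketch of both points, the base ambiguity and the failure handling, assuming Julia 1.0 or later (where base is a keyword argument and tryparse returns nothing on failure):

```julia
# The same digits mean different numbers depending on the base.
a = parse(Int, "ff", base = 16)
b = parse(Int, "10", base = 2)
c = parse(Int, "10")               # base 10 by default

# tryparse signals bad input with `nothing` instead of throwing.
d = tryparse(Int, "abc")

# parse throws on the same input; record whether it was an ArgumentError.
threw = try
    parse(Int, "abc")
    false
catch err
    err isa ArgumentError
end
```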


#17

Might it not be possible to speed up the generic Meta.parse method by first using a regex to check whether the string is only digits, or has a decimal point, and then branching to parse(Int,...) or whatever? This might help with the 400x performance gap by at least giving reasonable performance for floats and ints, while the generic algorithm still handles everything else.


#18

For short strings like that, I’m afraid that using Regex might be slower than just some optimized tests for those conditions.


#19

Not really, because the check will fail in all cases where Meta.parse is supposed to be used and for the other cases there are already optimized versions. Also, the regex lookup itself will dominate the time so this would still be much slower than using the correct function.


#20

The main goal of my post was not to use regexes specifically or to try to replace the specialized methods. There are real world scenarios where completely arbitrary code has to be parsed, and a large percentage of the time you might be dealing with integers or floats, but you might have to be prepared to deal with algebraic expressions and other functional lisp forms. Therefore it is desirable to have a meta-method that selects the optimal parse method when parsing arbitrary code, since the input may contain a large percentage of integers or floats but must also handle expressions when they occur.
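
One way such a meta-method could be sketched, without regexes, is to try the typed parsers first and only fall back to the code parser when they fail. This assumes Julia 1.0 or later; parse_fast is a hypothetical name, and no claim is made here about how much time it actually saves. The typed parsers and Meta.parse may also disagree on some inputs (for example special float names or prefixed literals), so this is not a drop-in replacement:

```julia
# Hypothetical dispatcher: cheap typed parsers first, full code parser last.
function parse_fast(s::AbstractString)
    v = tryparse(Int, s)
    v !== nothing && return v
    w = tryparse(Float64, s)
    w !== nothing && return w
    return Meta.parse(s)           # general Julia expressions end up here
end

p1 = parse_fast("200")
p2 = parse_fast("2.0e3")
p3 = parse_fast("x + 1")           # not a number, so an Expr is returned
```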