Foolproof method for converting to Float64

I am reading some data from a CSV file which is intended to be Float64, but it does not always parse as Float64: sometimes its type is Any, sometimes String, sometimes Integer, etc.

To convert it to Float64 I use parse.(Float64, DataFrame.colname), but it does not always work: the data may contain a currency symbol or a comma separator.

Imagine that the data in that particular column could be in one of the formats given below.

Is there a way of converting it to Float64 that always works? I am looking for a simple function which converts it into Float64.

A1 = ["£131,222.23","£223,422"]   # I can't use $ symbol here as it throws an error.
A2 = ["131,222.23","223,422"]
A3 = ["£131222.23","£223422"]
A4 = ["131222.23","223422"]
A5 = [131222.23,223422]

For the strings, remove the £ and comma characters before parsing. For the values that are already floats, don’t do anything. If you’re not sure which type you have, either redesign your system so that you always have the same type by telling CSV to read everything as strings, or dispatch on the type.
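A minimal sketch of that dispatch approach (the helper name tofloat is mine, not from any package):

# Dispatch on the element type; clean strings before parsing.
tofloat(x::Real) = Float64(x)   # already numeric: just convert
tofloat(x::AbstractString) =    # strip the currency symbol and thousands separators
    parse(Float64, replace(x, "£" => "", "," => ""))   # multiple pairs need Julia 1.7+

A1 = ["£131,222.23", "£223,422"]
A5 = [131222.23, 223422]

tofloat.(A1)   # [131222.23, 223422.0]
tofloat.(A5)   # [131222.23, 223422.0]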

The CSV file has several fields of different types. I am not sure whether one can specify a particular type for one specific column. I gave an example of only a part of the data.

You can

https://csv.juliadata.org/stable/#Providing-Types
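For example (a sketch; "data.csv" and :price are placeholder names, not from your data):

using CSV, DataFrames

# The `types` keyword can force a single column to a given type,
# e.g. read the currency column as String so you can clean it yourself.
df = CSV.read("data.csv", DataFrame; types = Dict(:price => String))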

The problem here is an expectation that data that isn’t quite numerical (contains units, commas, etc.) should still be treated numerically. One could argue that certain patterns like these should be caught by the CSV reader; however, doing that pattern matching by default is expensive, and might not actually be intended (some users might want the currency symbol, or the value isn’t actually a number, it just looks like one). In such a case, it’s best to write your own conversion routine that strips out such characters (with replace or something similar) and then converts the resulting strings to Float64.
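One possible version of such a routine (a sketch with names of my own choosing; tryparse returns nothing for anything that still isn’t numeric, which is mapped to missing here):

function cleanparse(x)
    x isa Real && return Float64(x)
    ismissing(x) && return missing
    s = replace(String(x), "£" => "", "," => "")   # strip symbols and separators (multiple pairs need Julia 1.7+)
    v = tryparse(Float64, s)
    v === nothing ? missing : v
end

df.colname = cleanparse.(df.colname)   # broadcast over the whole column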


Small aside: it seems like you’re parsing currency into floating point. Note that this is usually not a good idea, since floating point cannot accurately represent every decimal value you may get, and may introduce round-off errors (which is the last thing you want when dealing with currency). I’d recommend modeling your currency explicitly (e.g. with a struct containing two fields, one for the whole part and the other for the two-digit fraction, with appropriate definitions for +, -, *, / etc.; another option would be to store whole cents instead, removing the need for fractional parts in the first place) and instructing CSV.jl to parse the relevant columns into your format instead.
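A rough sketch of the whole-pence idea (the struct name and parsing helper are illustrative only, not an existing package API; for real use there are packages that already implement fixed-point decimals, e.g. FixedPointDecimals.jl):

struct Pence
    value::Int   # whole pence, e.g. £131,222.23 stored as 13122223
end

Base.:+(a::Pence, b::Pence) = Pence(a.value + b.value)
Base.:-(a::Pence, b::Pence) = Pence(a.value - b.value)

# Parse an already-cleaned string like "131222.23" exactly, with no round-off.
function Pence(s::AbstractString)
    whole, frac = occursin('.', s) ? split(s, '.') : (s, "00")
    Pence(parse(Int, whole) * 100 + parse(Int, rpad(frac, 2, '0')))
end

Pence("131222.23").value   # 13122223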

In either case, you still have to canonicalize the to-be-parsed element.


Unless the OP is working in high-performance finance where these round-off errors matter, this advice is likely overkill.

Uhh… I hope my bank’s frontend doesn’t aggregate my bank statements using floating point, even though that’s not high-performance finance :grimacing: It’s good practice, since it prevents hard-to-track bugs caused by the order in which things are summed.


The data is originally contained in an Excel file, so it is in one of the formats that Excel regards as numerical. This data is then copied over to a CSV file as-is, without any changes. I think Excel limits floats to 15-16 significant digits, so accuracy is already reduced.

There is also Tutorial · XLSX.jl if you want to skip the CSV step.
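An untested sketch of reading the sheet directly ("data.xlsx" and "Sheet1" are placeholders):

using XLSX, DataFrames

# Recent XLSX.jl versions return a Tables.jl-compatible object;
# on older versions you may need to splat: DataFrame(XLSX.readtable(...)...)
df = DataFrame(XLSX.readtable("data.xlsx", "Sheet1"))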


You may want to read this part of the docs. Also (disclaimer: I haven’t tried it myself), this could be useful.