Parse String to DataType

I can do

julia> s = "Complex{Float64}"
"Complex{Float64}"

julia> x = eval(parse(s))
Complex{Float64}

julia> typeof(x)
DataType

Is there a neater way than using eval to interpret the string as a datatype? Why is there no parse(DataType, s)?

x-ref: https://github.com/JuliaLang/julia/issues/24349

Perhaps include_string(s) ?

Thanks for your replies!

Is there still no better way to parse a DataType than to use an eval statement? include_string is essentially an eval, isn’t it?

Why would you provide data types as strings outside of a code interpreter?

For example, as part of a header for a data file that says what the element type of the file is.

My specific use case is this: I have a bunch of large arrays that I would like to work with, but they can have different element types (Float32, Int16) and different sizes. I’m generating the arrays as binary files on one computer, and memory mapping them on a second computer. Right now the files are just binary values, and the metadata (identifier, element type, size) are in the file name. I would like to read the file name, parse it with a regular expression, and use the capture groups to memory map the file.

I’ve wanted to convert strings into datatypes for similar reasons (producing data artefacts in julia that I want to store and use later) a number of times now. Perhaps there’s some easier way that I’m missing?

You definitely don’t want to use eval for this, because that could inject arbitrary code into your program via the data file.

Most likely you only support a relatively small set of element types, in which case you can use a dictionary mapping supported type strings to the corresponding types. Dict("Float32"=>Float32,"Int16"=>Int16,...)
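
A minimal sketch of the dictionary approach applied to the file-name use case described above. The naming scheme `<id>_<eltype>_<d1>x<d2>.bin`, the regular expression, and the names `SUPPORTED_ELTYPES`, `parse_array_filename` and `mmap_array` are all assumptions made up for illustration; only `Mmap.mmap` comes from the standard library.

```julia
using Mmap

# Whitelist of element types the application accepts (extend as needed).
const SUPPORTED_ELTYPES = Dict("Float32" => Float32, "Int16" => Int16)

# Hypothetical naming scheme: <id>_<eltype>_<d1>x<d2>x....bin
function parse_array_filename(filename::AbstractString)
    m = match(r"^(\w+?)_([A-Za-z0-9]+)_(\d+(?:x\d+)*)\.bin$", filename)
    m === nothing && throw(ArgumentError("unrecognised file name: $filename"))
    typename = String(m.captures[2])
    # Plain table lookup: nothing from the file name is ever evaluated.
    haskey(SUPPORTED_ELTYPES, typename) ||
        throw(ArgumentError("unsupported element type: $typename"))
    dims = Tuple(parse(Int, d) for d in split(m.captures[3], 'x'))
    return (id = String(m.captures[1]), eltype = SUPPORTED_ELTYPES[typename], dims = dims)
end

# Memory map the file using the metadata recovered from its name.
function mmap_array(path::AbstractString)
    meta = parse_array_filename(basename(path))
    return Mmap.mmap(path, Array{meta.eltype, length(meta.dims)}, meta.dims)
end
```

An unknown type string simply fails the lookup, so a malicious or mistyped file name can at worst raise an error.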

Yeah, I might end up doing something like the dictionary.

I don’t really understand why DataType isn’t already a parse target, since they’re such an important part of Julia. Simple data types (Float32 etc…) usually parse to Symbols, so I could alternatively just eval only if the string parses into a symbol. Sadly, things like "Complex{T}" don’t parse into Symbols.

How about

julia> macro datatype(str); :($(Symbol(str))); end
@datatype (macro with 1 method)

julia> t = @datatype("Float64")
Float64

julia> typeof(t)
DataType

julia> x = "Float64"
"Float64"

julia> @datatype x
"Float64"

julia> @macroexpand @datatype(x)
:(Main.x)

There’s no way around eval if you want to handle arbitrary types. Many people seem to think macros give you magic powers to accomplish things (edit: at runtime) that are not possible without them. They don’t; they are just a fancy way to save repeated typing.

Going from string to value is not parsing, it’s evaluating (ok, parsing (to AST) plus evaluating (to value)). The only way to evaluate something from a runtime expression is eval; everything else is just eval wrapped in one way or another, including include, include_string etc.
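
To make the two steps concrete, here is a small sketch that separates them. It assumes a current Julia version, where parsing a string of code is spelled `Meta.parse` (the `parse(s)` used earlier in this thread is the older spelling); the variable names are just for illustration.

```julia
# Step 1: parsing turns the string into an AST; no names are looked up yet.
parsed_ex = Meta.parse("Complex{Float64}")
@assert parsed_ex isa Expr
@assert parsed_ex.head === :curly
@assert parsed_ex.args == [:Complex, :Float64]

# A bare name parses to a Symbol, which is still not a type.
@assert Meta.parse("Float32") isa Symbol

# Step 2: evaluation resolves the names in the current module to values.
evaluated_T = eval(parsed_ex)
@assert evaluated_T === Complex{Float64}
```

Only the second step needs to know what `Complex` and `Float64` are bound to, and that is also the step where arbitrary code would run.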

The real question is what property you want. Do you want to handle all possible types? If yes, then you have to use eval. Do you want to avoid code injection attacks? Then you can’t use eval; you can use a table (Dict) or an arbitrary set of operations that you support and that are safe for your application.
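
As a rough illustration of the "set of operations you support" option, the sketch below parses the string and then walks the AST by hand, accepting only whitelisted names and `Name{Params...}` applications of them. The table `ALLOWED_TYPE_NAMES` and the functions `parse_allowed_type` and `resolve_allowed` are hypothetical, and the choice of what to allow is an assumption about one particular application, not a general answer.

```julia
# Names the application is willing to resolve; everything else is rejected.
const ALLOWED_TYPE_NAMES = Dict{Symbol,Any}(
    :Float32 => Float32, :Float64 => Float64,
    :Int16 => Int16, :Int32 => Int32, :Int64 => Int64,
    :Complex => Complex, :Vector => Vector,
)

# Entry point: parse to an AST, then resolve it without ever calling eval.
parse_allowed_type(s::AbstractString) = resolve_allowed(Meta.parse(s))

function resolve_allowed(sym::Symbol)
    haskey(ALLOWED_TYPE_NAMES, sym) ||
        throw(ArgumentError("unsupported type name: $sym"))
    return ALLOWED_TYPE_NAMES[sym]
end

function resolve_allowed(ex::Expr)
    # Only `Name{Params...}` is supported; calls, dots, etc. are rejected.
    ex.head === :curly || throw(ArgumentError("unsupported expression: $ex"))
    base = resolve_allowed(ex.args[1])
    params = map(resolve_allowed, ex.args[2:end])
    return base{params...}
end

# Literals, quoted symbols and anything else are outside the supported set.
resolve_allowed(x) = throw(ArgumentError("unsupported syntax: $(repr(x))"))
```

With this, `parse_allowed_type("Complex{Float64}")` resolves through the table, while a string such as `"println(\"Mwahahaha\")"` parses to a call expression and is rejected before anything runs.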

They are not. Not for the parser, and not any more than basically anything else. Types are values; they are special in dispatch, but that has nothing to do with syntax/parsing. Designing a system that gives them more coupling is possible (C++…), but that’s usually not a desired property and makes parsing so much harder.

As an example, what do you think the last B in the following code should parse to?

a = 2 + 2
struct B
    x::NTuple{a,Int}
end
B

It’s obviously a type (which is clear in this case but does not have to be in general in Julia), but how do you want the parser to figure out all its properties before the code is evaluated?

I must have stumbled on a corner case:

julia> macro datatype(str); :($(Symbol(str))); end
@datatype (macro with 1 method)

julia> @datatype "Float64"
Float64

julia> @datatype "String"
String

julia> @datatype "Int"
Int64

Looks like “runtime magic” to me.
However

@macroexpand @datatype "Float64"
:(Main.Float64)

@macroexpand @datatype "Int"
:(Main.Int)

makes it look as if the macro itself is being “re-compiled” at runtime. What is going on here?

No. All of that is accomplished at compile time.

The trick is you are passing a string to the macro, but when you pass a variable to the macro it doesn’t get the value of the variable, it just gets the symbol of the variable name.

julia> macro mydump(x)
       dump(x)
       end
@mydump (macro with 1 method)

julia> x="Int"
"Int"

julia> @mydump x
Symbol x

As you can see, @datatype "Float64" is basically the same as writing Main.Float64. It doesn’t allow you to parse a runtime string and turn that into a value without eval. It just allows you to spell something differently.

Also, FWIW, :($(...)) is always the same as ... itself…

Thanks for the responses guys!

Ideally I would like something that takes a string, a module (defaulting to Main), and returns the type if the string matches an existing type in the specified module, and errors if the string does not match an existing type.

Yes, but there is no easy and safe way to parse strings into those values, unlike Int or other values.

Yeah, I understand why it’s impossible to parse an arbitrary string into an arbitrary datatype, but I do think parsing into existing DataTypes in some specified module seems possible, without having to roll your own lookup table.

Something like this I guess, except it’d be nice if it handled parametric types:

macro datatype(name, namespace = Core)
    name_sym = Symbol(name)
    quote
        if Symbol($name) in names($namespace) && isa($namespace.$name_sym, DataType)
            $namespace.$name_sym
        else
            error("Not a datatype")
        end
    end
end

foo = @datatype "Float32"
foo == Float32 # true

foo = @datatype "println(\"Mwahahaha\")" # errors 

First, again, macros buy you absolutely nothing here. You’ll also have a hard time passing in the namespace parameter to the macro.
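
For comparison, the same check can be written as an ordinary function, which takes the module as a normal argument and works on runtime strings. This is only a sketch of the simple, non-parametric case: `lookup_datatype` is a made-up name, and it deliberately accepts nothing but a bare name bound to a `DataType` in the given module.

```julia
# Look up `name` as a binding in `mod` and return it only if it is a DataType.
# No parsing or eval is involved, so strings like "Complex{Float64}" are rejected.
function lookup_datatype(name::AbstractString, mod::Module = Main)
    sym = Symbol(name)
    isdefined(mod, sym) ||
        throw(ArgumentError("`$name` is not defined in module $mod"))
    val = getfield(mod, sym)
    val isa DataType ||
        throw(ArgumentError("`$name` in module $mod is not a DataType"))
    return val
end
```

Something like `lookup_datatype("Float32", Base)` returns the type, while `lookup_datatype("println(\"Mwahahaha\")", Base)` fails the `isdefined` check because no binding has that name. What to do about parametric types is exactly the part this leaves open.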

See the examples I gave; this does not answer the question specifically enough. In particular, what does “existing type in the specified module” mean? Especially for parametric types, what values do you want to allow as type parameters?

Types are nothing special here. There are infinitely many types whose values cannot be parsed from a string. The exceptions are actually the types that have literal representations. Only a few types have this property (Int, UInt*, Float*, String; are there any others that I missed?). It’s possible for these types because they have a context-independent representation. There are indeed many other types that have a context-independent representation without a literal syntax, but a context-independent representation is a hard requirement for having a literal syntax. (By literal syntax I mean that it can be parsed to a value in the AST. I’m not counting [] as a literal here.) DataTypes do not have this property. You cannot express a type without referencing where it’s coming from, and what you think of as a type is merely a name that the type is bound to.

No. For one, there’s absolutely nothing special about types here. Also, although scope resolution can be done at parse time, value resolution can’t. It’s not useful as an optimization and it’s not useful as an API for exactly the reason I mentioned above: the expectation for the feature is way too vague.

As another example of why your request is way too vague, consider which of the following types you want to support:

Val{()}
Val{(1,)}
Val{(1,:a)}
Val{(1,:a,pi)} # which prints as `Val{(1, :a, π = 3.1415926535897...)}` btw

These are still fairly simple types, and with values in the type parameters that are even harder to construct, you quickly need a full interpreter to handle all of those.
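
A quick look at what the parser produces for one of these shows where the difficulty comes from. This sketch assumes a current Julia version with `Meta.parse`; the variable names are only for illustration.

```julia
val_ex = Meta.parse("Val{(1, :a, pi)}")
@assert val_ex.head === :curly

# The single type parameter is a tuple expression, not a value.
val_params = val_ex.args[2]
@assert val_params.head === :tuple

@assert val_params.args[1] === 1          # a literal: already a value in the AST
@assert val_params.args[2] isa QuoteNode  # a quoted symbol: needs unwrapping
@assert val_params.args[3] === :pi        # a name: needs a lookup in some module
```

Each kind of parameter needs its own rule for turning syntax into a value, and a name like `pi` could be bound to anything, which is why supporting more of these cases drifts toward a full interpreter.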

Put another way, it’s obviously possible to create a version that handles only the simple cases that are useful for you, as long as you precisely define what simple means. You can create a version with an arbitrarily defined limitation, though the more feature-complete you want it to be, the harder it is to write. Since the definition of simple (the limitation you can work with) is in general very arbitrary, it’s not useful to have any of them as a Base function. The two exceptions are the empty set, i.e. only parsing, and the full set, i.e. the interpreter, which are special, well defined, useful, and are indeed implemented in Base.

I see, thanks for taking the time to explain it.
