Boxing obvious cases where the type is clear

I'm new to Julia, but this whole world of dynamic and static typing is becoming very disconcerting. Take a look at this:

Regardless of what I do, I get:
Locals
  #15::var"#15#17"{String, Int64, Int64, Int64}
  n_markets::Core.Box
  current_files::Core.Box
  current_sizes::Core.Box
  market_data::Core.Box
  market_dates::Core.Box
There's no reason for those to be boxed whatsoever, from what I can tell from this code.

The bigger-picture question: is there a compiler flag or mode where I can just opt out of dynamic types altogether? So that when the compiler tries to convert something to Any, it fails instead of doing it transparently and leaving me to check every type in the code_warntype output? Even better if the language server did it: essentially treat my deployment as statically typed and warn wherever a type can't be deduced by static analysis. (I'm not sure who really uses Julia the opposite way.)

As an added bonus, the debugger doesn't work in open do blocks; it just skips over them. I opened a separate question for that.

Another subtle issue that arises only in Julia, and not in Rust or C++, is how floats are parsed:
20110628,63.73093027151999,63.99925613403319,63.58644751856259,63.9992561340332
This line represents date, open, high, low, close, and this is the check:

if high < open || high < close || low > open || low > close
    println("\nERROR... Open or close outside high/low bounds in market file $file_path line $(line_number)")
    return
end

This check passes in C++ and Rust but not in Julia. Technically Julia's behavior is correct, but what I don't get is how the results can differ if they all implement the same floating-point standard.

FYI, do creates an anonymous function, so
open("path") do fpReport <insert logic here> end is equivalent to open(fpReport -> <insert logic here>, "path")

Thus you are capturing an outer variable in an anonymous function and then reassigning it, hence the Box.
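
To make that concrete, here is a minimal sketch of the two equivalent forms. The function names `count_lines_do` and `count_lines_lambda` are made up for illustration and are not from the original code; `mktemp` is only used to have a file to read. In both versions `n` is captured and reassigned inside the closure, which is the pattern that leads to a Core.Box.

```julia
# Hypothetical helpers for illustration only.

# `do` form: the block becomes an anonymous function passed as the first argument.
function count_lines_do(path)
    n = 0                      # captured by the closure below...
    open(path) do io
        for _ in eachline(io)
            n += 1             # ...and reassigned inside it
        end
    end
    return n
end

# Explicit form: the same call with the anonymous function written out.
function count_lines_lambda(path)
    n = 0
    open(io -> begin
        for _ in eachline(io)
            n += 1
        end
    end, path)
    return n
end

# Small temporary file with three lines.
path, io = mktemp()
write(io, "a\nb\nc\n")
close(io)

println(count_lines_do(path))      # prints 3
println(count_lines_lambda(path))  # prints 3
```

Running `@code_warntype count_lines_do(path)` on either version should show `n` as a boxed local.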

is there a compiler flag or mode where i can just opt out of dynamic types altogether?

Unfortunately no, not really, at least not anything user-friendly. This is a pretty commonly requested feature though, so chances are nonzero that it will happen in the not-too-distant future.

Re: the last question, I think it will be difficult to answer without an example reproducer.

So how can the variables be passed into the do blocks in an idiomatic way such that they're captured correctly? I would imagine this is a super common scenario.

Re the repro, here's a file, let's call it XYZ.csv:

20110609,62.85717263410146,63.593349900812,62.60260470107144,63.269981384277344
20110610,62.91223916924107,63.22872617325794,62.36182379320736,62.42374801635742
20110613,62.70579471253141,63.20804883745886,62.54067229780943,62.815879821777344
20110614,63.49012397931031,63.97861477357416,63.325001587382886,63.703407287597656
20110615,63.09106590160633,63.35251534437989,62.320491331273054,62.630096435546875
20110616,62.65766266527435,63.21495435250544,62.37557862884048,63.00855255126953
20110617,63.503889483210116,63.730930811698215,62.97411413539633,63.1461181640625
20110620,62.89844689691775,63.89607364985452,62.89157051352164,63.68278503417969
20110621,63.854784857686504,64.52904307515169,63.524540005613694,64.28135681152344
20110622,64.21254810781124,64.61847511399317,63.79285783919967,63.847900390625
20110623,63.325019923987846,63.58646422433981,62.423720264348994,63.53142166137695
20110624,63.53142745844823,63.66215224512635,62.38932304306761,62.53380584716797
20110627,62.66450488022188,63.74468989954973,62.49937724412941,63.469482421875
20110628,63.73093027151999,63.99925613403319,63.58644751856259,63.9992561340332

Creating a file file_list_names that has only a single line containing the path to XYZ.csv should make the example work when executing julia> @code_warntype choose("data/markets.txt", 1000, 100, 100).

The float error is on the last line of that sample CSV. The debugging problem has something to do with that lambda too; it's completely ignored by the debugger :frowning:

In this case you don't use them outside the do block, so the easiest fix would be to move the initialization inside the inner do.
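
A minimal sketch of that fix, using a made-up function `count_lines_local` rather than the original code: the counter is created inside the do block, so nothing is shared with the enclosing function and there is nothing to box. The result leaves the block as its return value.

```julia
# Hypothetical example, not the original `choose` function.
function count_lines_local(path)
    # `open(f, path)` returns whatever `f` returns, so the result can leave
    # the block as a return value instead of through a captured variable.
    return open(path) do io
        n = 0                  # local to the closure: not captured, not boxed
        for _ in eachline(io)
            n += 1
        end
        n                      # value of the do block
    end
end

# Small temporary file with two lines.
tmp_path, tmp_io = mktemp()
write(tmp_io, "first\nsecond\n")
close(tmp_io)

println(count_lines_local(tmp_path))  # prints 2
```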

However, I do feel like I should ask: have you actually observed these Core.Box instances causing any relevant performance issues here? I would be surprised if that's your bottleneck.

The float error is on the last line in that sample CSV.

It appears Julia does the right thing here, so you might have to help me understand the problem better.

it’s completely ignored by the debugger :frowning:

Very unfortunate, but I don't doubt it. I have also not had a reliable experience with the debugger.

Personally I just don't use do blocks. They're unnecessary syntax sugar anyway, and arguably make the code more difficult to follow.


@nsajko is the solution explicit lambdas or something else? Lambdas as a means of RAII everywhere seem rather ugly and not very programmer-friendly either.

@adienes well, yes, in this case that's not a performance issue, but it's annoying that things can switch under the hood with zero warning, so what you end up writing is absolutely not what you expect. In pretty much every other programming language this would be a clear stack allocation. Once it's explained and you look at it after the fact it does make sense, but it is completely non-obvious, with detrimental performance in pretty much every case, so I'm just trying to find a way to proactively prevent it.

With regard to the float: yes, Julia is doing the correct thing, I'm just wondering why. Reading the string with high = (float) atof in C++ (and doing the equivalent in Rust) and then running the same comparison results in identical behavior with lost precision, while in Julia it does not, when all of them should have the same float implementation, especially Rust.

Yeah. Your biggest problem is that your function is way too huge; try decomposing it.

float is single precision. The corresponding type in Julia is Float32. You are probably parsing the numbers as Float64 in Julia.
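
A small sketch of the difference, using the high and close values from the last line of the sample CSV. The variable names are chosen here for illustration. Parsed as Float64 the two values are distinct and high is slightly below close, so the check fires; parsed as Float32 they round to the same number, so the check does not fire.

```julia
# High and close fields from the last CSV line, as strings.
high_str  = "63.99925613403319"
close_str = "63.9992561340332"

# Double precision, which is what Julia gives you with parse(Float64, ...).
high64  = parse(Float64, high_str)
close64 = parse(Float64, close_str)

# Single precision, matching a C/C++ `float`.
high32  = parse(Float32, high_str)
close32 = parse(Float32, close_str)

println(high64 < close64)    # true:  the bounds check fires
println(high32 < close32)    # false: the bounds check stays silent
println(high32 == close32)   # true:  both round to the same Float32
```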

Not so for closures, which is what you’re doing here. An excerpt from the link:

A language implementation cannot easily support full closures if its run-time memory model allocates all automatic variables on a linear stack. In such languages, a function’s automatic local variables are deallocated when the function returns. However, a closure requires that the free variables it references survive the enclosing function’s execution. Therefore, those variables must be allocated so that they persist until no longer needed, typically via heap allocation, rather than on the stack, and their lifetime must be managed so they survive until all closures referencing them are no longer in use.

It might appear that the do block closure only executes within the enclosing function’s scope and thus does not need to allocate outside its stack, but in general a higher order function could cache the closure somewhere that outlives the scope, so there’s no way around this.
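
A minimal sketch of such an escaping closure, with made-up names (`saved_callbacks`, `register_counter`): the closure is stored in a global container, so the captured variable has to outlive the function that created it.

```julia
# Hypothetical global container that a higher-order function might cache closures in.
saved_callbacks = Function[]

function register_counter()
    count = 0
    # The closure captures and reassigns `count`, then escapes into the global.
    push!(saved_callbacks, () -> (count += 1))
    return nothing
end

register_counter()                 # the enclosing call has returned...
counter = saved_callbacks[end]
println(counter())                 # ...but `count` is still alive: prints 1
println(counter())                 # prints 2
```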

The type inference limitation is orthogonal to this and would arise under both static and dynamic typing. I've written about this before, but long story short: it's impossible to infer a finite set of types for a captured variable that is reassigned by a closure, because a method can be compiled for infinitely many call signatures and thus assign instances of infinitely many types to the captured variable. Hypothetically a typed box could be used for the variables you annotated, but right now they are stored in an untyped Core.Box, and type conversions and assertions are inserted to enforce the annotated type.

For now, there are a couple of things you can do.

  1. Refactor to store the captured data in a Ref{T} and mutate that instead of reassigning the variable directly. If you never reassign the variable, the captured variable will be inferred as Ref{T} and its contents as T. This isn't conceptually different from annotating and reassigning the variable, but it's better than a Core.Box surrounded by type conversions and assertions.
  2. Refactor so the variables aren't captured, but provided to the closure as arguments. It could be easier to avoid unintended captures if you wrote a globally scoped function to pass into open instead of a locally scoped closure via a do block. This is probably the cleaner option given how large and nested these do blocks are.
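
A minimal sketch of both options, with made-up names (`sum_with_ref`, `count_lines`, `count_lines_in`) that are not from the original code:

```julia
# Option 1: capture a Ref and mutate its contents. The variable `total` itself
# is assigned only once, so it can be inferred as Base.RefValue{Int}.
function sum_with_ref(xs)
    total = Ref(0)
    foreach(xs) do x
        total[] += x           # mutates the Ref's contents, no reassignment of `total`
    end
    return total[]
end

# Option 2: no capture at all. A top-level function receives the stream as an
# argument and is passed to `open` directly, which still closes the file.
count_lines(io::IO) = count(_ -> true, eachline(io))

count_lines_in(path) = open(count_lines, path)

println(sum_with_ref([1, 2, 3]))   # prints 6

# Small temporary file with three lines.
ref_path, ref_io = mktemp()
write(ref_io, "x\ny\nz\n")
close(ref_io)
println(count_lines_in(ref_path))  # prints 3
```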

Thanks @Benny, that's a great description. I did just go back to no closures and manually closing resources. It's still a pain in the ass, as I guess it then all has to be wrapped in try/catch every time, and exceptions and dealing with them are another thing I'm not fond of.

One side question: the only remaining type instability now is what's inside row as returned by the CSV parser. I'm not sure if this is the recommended way to parse a CSV in Julia, but it definitely is the most intuitive. There are several issues: why doesn't the data inside row default to bytes or at least String? Why is there no way to reuse the row buffer, and maybe even the byte buffers for each cell, or at least opt into it? What's a recommended fast streaming parser that doesn't allocate?

I also changed it to take this form now:
for (line_number, row) in enumerate(CSV.Rows(file_path, header=false, reusebuffer=true, types=[Int, Float64, Float64, Float64, Float64]))

but row as well as


            open = row[2]
            high = row[3]
            low = row[4]
            close = row[5]

are still Any:

  row::Any
  line_number::Int64
  close::Any
  low::Any
  high::Any
  open::Any

Removing the enumerate slightly fixes the issue and we get:

 row::CSV.Row2
  close::Any
  low::Any
  high::Any
  open::Any

Just as puzzling.

After experimenting for a while and trying to consider Julia as a viable option for porting my Rust code in a few instances, I'm starting to lean towards no. The compiler does way too many things that make very little sense to me as a developer with a fair bit of experience. Worse, they are completely hidden and there are no warnings, and I find that I have to keep running code_warntype every time I make a structural change to make sure it didn't rewrite my code in an unexpected way, which basically erodes any kind of productivity gain over Rust + evcxr. :frowning:

It’s true that the compiler takes some time to get used to, but you will learn. In this case, your issue is that the Row structure isn’t parameterized based on the types you provide, so there’s no way for the compiler to know what the types will be.

You can provide type assertions to fix this:

open = row[2]::Float64
high = row[3]::Float64
low = row[4]::Float64
close = row[5]::Float64
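
As a self-contained sketch of the effect, the example below uses a plain `Vector{Any}` as a stand-in for a row whose element types are hidden from the compiler. It does not use CSV.jl, and `high_low_range` is a made-up function.

```julia
# Stand-in for a row with compiler-invisible element types (values from the sample CSV).
row = Any[20110628, 63.73093027151999, 63.99925613403319, 63.58644751856259, 63.9992561340332]

function high_low_range(row)
    high = row[3]::Float64   # throws a TypeError at runtime if the value is not a Float64
    low  = row[4]::Float64
    return high - low        # inferred as Float64 thanks to the assertions
end

println(high_low_range(row) > 0)   # prints true
```

Without the two assertions, the return type of this function would be inferred as Any.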

By the way, Cthulhu.jl can show the types in the source code itself, so you might want to consider that as an alternative to code_warntype.


For something with ergonomics closer to those of a static compiler/linter, JET.jl (GitHub: aviatesk/JET.jl, an experimental code analyzer for Julia that needs no additional type annotations) may be preferable.

julia> using JET

julia> function f()
           x = 1
           x = 2 # causes boxing
           () -> x
       end
f (generic function with 1 method)

julia> JET.@report_opt f() # usually a top-level function call of your choice
═════ 1 possible error found ═════
┌ f() @ Main ./REPL[2]:2
│ captured variable `x` detected
└────────────────────

These two are unrelated, you can pass local or global functions that don’t capture variables into open so you don’t have to manually close files.

It does, or rather it defaults to a PosLenString. You opted out of that by passing types. Instead, you could index row to get a PosLenString and then parse it to a specified type; the docs suggest Parsers.parse(type, string) from Parsers.jl. Bear in mind these are runtime types, not what the compiler infers. There's no such thing as an instance of Any; that's just the narrowest supertype the compiler can narrow things down to.

Despite type inference being good enough to reach “as good as static typing” in most cases, like I said before, type inference has limits. In this case, it’s built into CSV.Rows. This is actually where dynamic typing comes into play: some languages handle types at runtime instead of erroring at compile-time if a concrete type is not known at compile-time. The advantage is that this opens up more kinds of programs without repeated compilation or complicated boilerplate; one quintessential example is runtime eval ( Meta.parse then Core.eval in Julia), which must be type-unstable because it starts at a string and can result in anything. The disadvantage is doing more at runtime, hence worse performance. You are right to think that this is fundamentally different from statically typed languages, even those with type inference and its limitations.
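
A tiny sketch of the runtime eval case mentioned above. The helper name `evaluate` is made up, and the code is evaluated in `Main` as an assumption about where it runs. The same function returns values of different types depending on the string, so its return type can only be inferred as Any.

```julia
# Hypothetical helper: parse a string into an expression, then evaluate it in Main.
evaluate(code::String) = Core.eval(Main, Meta.parse(code))

println(evaluate("1 + 2"))         # prints 3
println(evaluate("string(1, 2)"))  # prints 12
println(typeof(evaluate("1.5")))   # prints Float64
```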

Fortunately there are practices to help the compiler in Julia. As mentioned earlier, annotating variables inserts convert calls that steer values to the right type at runtime, plus typeasserts that inform the compiler. "Function barriers" are more particular to Julia's call-wise compilation scheme; the strategy is to isolate the inherently unstable code from the possibly stable code in separate functions. Once the type-unstable part is done, the data can be passed into runtime-dispatched function calls that are internally type-stable. However, if the possibly stable code is short and doesn't run in a hot loop, it wouldn't be worth the barrier.
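
A minimal sketch of a function barrier, with made-up names (`sum_of_squares`, `process`) and a `Dict{Symbol,Any}` standing in for any source of values whose types the compiler cannot see:

```julia
# Type-stable kernel: compiled separately for each concrete argument type.
function sum_of_squares(values::AbstractVector{T}) where {T<:Real}
    acc = zero(T)
    for v in values
        acc += v * v
    end
    return acc
end

# Type-unstable outer function: the compiler cannot know what `data[:highs]` holds.
function process(data::Dict{Symbol,Any})
    highs = data[:highs]          # inferred as Any
    return sum_of_squares(highs)  # one runtime dispatch, then specialized code inside
end

data = Dict{Symbol,Any}(:highs => [1.0, 2.0, 3.0])
println(process(data))  # prints 14.0
```

The hot loop lives entirely inside the kernel, so the cost of the unknown type is paid once per call to `process` rather than once per element.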
