Difference b/w include("foo.jl") and eval(parse(readstring(open( "foo.jl" ))))

djsegal · December 16, 2017, 10:31pm

What exactly are the differences between the following calls?

Are some of them actually the same?
Or do they have different underlying C-language representations?

# setup
file_name = "foo.jl"
open_file = open(file_name)
cur_string = readstring(open_file)

# similar calls
include(file_name)
include_string(cur_string)
eval(parse(cur_string))

seekstart(open_file)  # readstring left the stream at EOF, so rewind before parsing
cur_parse = parse(open_file)
while cur_parse != nothing
  eval(cur_parse) 
  # could also do something like:
  #   include_string(string(cur_parse))
  cur_parse = parse(open_file)
end

One thing I’ve noticed for sure is there are different levels of error checking done between include and eval.
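A minimal sketch of one visible difference, assuming Julia 1.x (where `parse` became `Meta.parse`, `readstring(f)` became `read(f, String)`, and `include_string` takes a module as its first argument). The source string and the module name `sandbox` are made up for illustration. `Meta.parse` accepts exactly one expression, so a two-statement string raises a parse error, while `include_string` evaluates the statements in sequence.

```julia
src = "x = 1\ny = x + 1"    # hypothetical two-statement "file"

sandbox = Module(:sandbox)  # scratch module so Main is not touched

# include_string parses and evaluates each top-level expression in turn
# and returns the value of the last one.
last_value = include_string(sandbox, src)

# Meta.parse expects a single expression; the second statement counts as
# extra input, so parsing fails before anything is evaluated.
parse_failed = try
    Meta.parse(src)
    false
catch err
    err isa Meta.ParseError
end
```

So `eval(parse(cur_string))` only matches `include_string` when the file holds a single expression.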

ihnorton · December 17, 2017, 1:21am

include and include_string go through basically the same code path and call jl_parse_eval_all, which loops over expressions one at a time in a JL_TRY block, calling jl_toplevel_eval_flex.

eval goes through jl_toplevel_eval_in first and calls jl_toplevel_eval_flex directly.

So two differences are: parse and open can throw directly, whereas include/_string call some lower-level i/o functions directly; and I think the while loop over expressions could surface errors earlier.

(I’d also suggest reading through the devdocs for some additional information and more grep keywords.)
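The one-expression-at-a-time behaviour can be observed from Julia itself. This is a rough sketch assuming Julia 1.x's `include_string(module, code)`; the `sandbox` module and the three-line source are invented for the demonstration and say nothing about the C internals beyond what is described above.

```julia
sandbox = Module(:sandbox)   # scratch module for the experiment

src = """
a = 1
error("boom")
b = 2
"""

# The second statement throws, so include_string stops there.
failed = try
    include_string(sandbox, src)
    false
catch
    true
end

a_defined = isdefined(sandbox, :a)   # first statement already took effect
b_defined = isdefined(sandbox, :b)   # third statement was never reached
```

Statements before the failing one have already been evaluated by the time the error surfaces, which is consistent with a parse-then-eval loop rather than a parse-everything-first pass.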

djsegal · December 17, 2017, 4:53am

Thanks for the info! Two followup questions before I delve any further though:

why is there not an include_expression function?
what’s the best way to hook into the julia C code?

ihnorton · December 17, 2017, 5:09am

I’m not sure I understand the question – include from where? Wouldn’t that just be eval?

ccall.

(get cozy with grep, it is your friend – you can see many example ccalls into libjulia, including for some of the functions I mentioned above, in base/)
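As a small illustration of a `ccall` into libjulia, here is a sketch that queries the runtime's version. It assumes the exported C functions `jl_ver_major`, `jl_ver_minor`, and `jl_ver_string`, which take no arguments; the variable names are arbitrary.

```julia
# ccall(function_name, return_type, argument_types_tuple, args...)
major = ccall(:jl_ver_major, Cint, ())
minor = ccall(:jl_ver_minor, Cint, ())

# jl_ver_string returns a C string owned by the runtime;
# unsafe_string copies it into a Julia String.
version_text = unsafe_string(ccall(:jl_ver_string, Cstring, ()))

println("libjulia reports ", major, ".", minor, " (", version_text, ")")
```

The same pattern (symbol, return type, tuple of argument types) is what the ccalls in base/ use for the functions mentioned earlier in the thread.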

djsegal · December 17, 2017, 5:17am

Let’s say you first entirely parse a file stream into an array of expressions.

Then you eval the parsed expressions one at a time.
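Those two steps might look roughly like this in Julia 1.x. `parse_all` is a hypothetical helper (not a Base function) built on the two-argument `Meta.parse(str, pos)`, which returns the next expression together with the position to resume from; the source text and `sandbox` module are made up.

```julia
# Hypothetical helper: split source text into its top-level expressions.
function parse_all(src::AbstractString)
    exprs = Any[]
    pos = 1
    while pos <= ncodeunits(src)
        ex, pos = Meta.parse(src, pos)      # (expression, next position)
        ex === nothing || push!(exprs, ex)  # `nothing`: only whitespace/comments were left
    end
    return exprs
end

src = "struct Point\n    x::Int\nend\norigin = Point(0)\n"
exprs = parse_all(src)

# Evaluate the parsed expressions one at a time in a scratch module.
sandbox = Module(:sandbox)
for ex in exprs
    Core.eval(sandbox, ex)
end
```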

There are some cases where macro nodes get compiled with currently undefined structs, generated functions, inlined functions, etc.

If you tried to do an include_string on a string composed of the same data (as the expression), an error would be raised.

So is it sometimes preferable to load the expression naively (expecting the worst) instead of calling eval?

edit: this might deal with the state of a package’s compiler (and stale parsed expression trees)?

maybe it just goes back to:

Way to get string representation of an Expression?

ihnorton · December 17, 2017, 5:36am

I figured as much… I guess what you want is to pre-parse all the files in a directory and do a dependency analysis to decide which file is a precursor, so you can load that one first? Hmm. It’s kind of an interesting problem, but I don’t have a good suggestion right now. Maybe a minimal example would help, but if I think of something I’ll post again. Or correct me if I misunderstand what you want to do.

(I agree with all of the “what, why?” responses to Boot.jl but I’ll still try to answer your questions if I can grok them. I have strong reservations about encouraging people to actually use something like that, but as a learning exercise, knock yourself out.)

djsegal · December 17, 2017, 6:08am

Fair enough. Thanks for your help though! And I’ll work on a simple example

Also, the approach is a little more brute force than that:

Keep cycling through all the files’ expression arrays, loading every parsed block it can
Terminate when no block is added or no blocks remain (in any file’s expr array)

To keep every file happy (and to prevent func overwrite errors):

you remove consts, functions, and structs from all non-module nodes and their subnodes
but keep local variables in place (i.e. you load whole files up to the current expression shard)

// obviously some optimizations are in place (e.g. skipping files with prev undef vars that are still undefined)
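For readers trying to picture the cycle-until-stuck loop, here is a simplified sketch in Julia 1.x. It is not Boot.jl's code: `load_until_stuck!` is a hypothetical name, it works on one flat list instead of per-file arrays, and it skips the cleanup of consts, functions, and structs described above.

```julia
# Hypothetical helper: keep retrying pending expressions until a full pass
# loads nothing new or nothing is left. Returns whatever never loaded.
function load_until_stuck!(mod::Module, pending::Vector{Any})
    progress = true
    while progress && !isempty(pending)
        progress = false
        still_pending = Any[]
        for ex in pending
            try
                Core.eval(mod, ex)
                progress = true
            catch
                push!(still_pending, ex)  # prerequisites missing; retry next pass
            end
        end
        empty!(pending)
        append!(pending, still_pending)
    end
    return pending
end

sandbox = Module(:sandbox)
shards = Any[
    :(area(s::Square) = s.side^2),          # needs Square, which comes later
    :(struct Square; side::Float64; end),
    :(broken = not_defined_anywhere + 1),   # can never load
]
leftover = load_until_stuck!(sandbox, shards)
```

The `area` method fails on the first pass and succeeds on the second, while `broken` is what finally triggers the "no block is added" exit.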

And to settle some nerves, i think of Boot.jl as more of a tool than a solution:

it gives you a hook for loading folders robustly.
but still allows efficient loading when needed (obviously with more include/_folder calls)

// one use case is an MVC web framework that wants to load all controllers into its workspace

ihnorton · December 20, 2017, 1:45am

You might be interested in reading about Include What You Use – they’re solving kind of the opposite problem: figure out what headers are not necessary for a C++ file (it’s roughly “what headers are already transitively included,” but there are some weird details and complications due to the preprocessor).

djsegal · December 26, 2017, 8:29am

This seems like a next step item?

I think you have to get down a brute-force (robust) loading scheme before you can move onto minimal loading.

Besides telling developers about unused blocks of code, this could also cache optimum load orders for files.

// this is all conjecture at this point. i just think it’s an interesting problem

Also, in terms of progress on the project, loading modules inside packages has been a difficult process

(especially within Bio.jl where you need to determine if all prerequisites are loaded and everything is defined)

This loading of submodules, though, could at some point be brought up to the package level and probably accomplish what you’re talking about.

ihnorton · December 26, 2017, 6:00pm

I’m not sure if there is a question here, so please let me know if I missed it!

FWIW, I would expect this to be fairly difficult, if it is possible. You probably need to really constrain the problem or do some semantic analysis and figure out how to determine ordering for minimal examples (like only type definitions). It’s not quite clear to me if the Boot package is trying to:

a. discover implicit file load-order dependencies? (there is a necessary internal ordering – you just don’t want to have to write it down)
b. eliminate ordering entirely (so anything can be defined in any file and Boot figures out what comes first – this sounds impractical to me. Are you aware of any other languages which do something like this?)

djsegal · December 26, 2017, 6:47pm

Doesn’t rails load everything? Or at least give you the means to efficiently load everything (simply):

edit: from that article, maybe they’ve constrained the problem to be solvable. i will say though that i’ve had some success adding Boot.jl to PyCall, JuAFEM, Tensors, Gadfly, IJulia and Mocha (i.e. the tests pass)

Also to tie some stuff together, this is why I asked:

Is there a way to make module in anon workspace & then ingest it into a package?

If there was a way to load submodules outside of the parent module and inject them, the problem might be easier

ihnorton · December 26, 2017, 7:47pm

Neat - TIL! (I’ve only written a few lines of Ruby, ever, and it was last week trying to hack up something in Jekyll).

If I understand correctly, the constraint there is one-class-per-file, at least by-convention, which is probably a bit easier because every class is a unitary namespace.

Isn’t this effectively doing (a) in my post – discovering the file ordering which already exists by necessity of the design of “default code-loading” (because those packages were otherwise written without Boot).

ihnorton · December 26, 2017, 8:11pm

Which is neat, btw! What I’m wondering is whether that’s the main goal. Clearly you’ve had success, so discover-ordering seems doable (it’s (b) that I think would be hard).

djsegal · January 3, 2018, 10:19pm

Sorry, some of the underpinning load process might have been lost in translation!

All *.jl files are parsed into expression arrays (named unloaded_shards)
Each file is loaded incrementally till an error appears in it
Two arrays are kept for each file: loaded_shards and unloaded_shards
- unloaded_shards are where all expressions start
- loaded_shards is a collection of cleaned up shards that have successfully been loaded*
  - innocuous lines like a = 1 are kept
  - but structs, methods, macros, etc are removed (to prevent reloading)

*loaded_shards are reloaded every attempt at loading from unloaded_shards
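The "cleaned up shards" step could be approximated with a predicate over expression heads. This is only a guess at the idea, not Boot.jl's implementation: `is_definition` is a hypothetical name, and it inspects top-level expressions only (no subnodes, no return-type annotations on short-form methods).

```julia
# Hypothetical predicate: does this top-level expression introduce a type,
# function/method, macro, or const that should not be evaluated twice?
function is_definition(ex)
    ex isa Expr || return false
    ex.head in (:struct, :abstract, :primitive, :function, :macro, :const) && return true
    # short-form method definition: f(x) = ...  or  f(x::T) where T = ...
    if ex.head === :(=) && ex.args[1] isa Expr
        return ex.args[1].head in (:call, :where)
    end
    return false
end

shards = Any[:(a = 1), :(struct Foo end), :(f(x) = x + a), :(const K = 2)]

# Keep only the innocuous lines that are safe to replay on every attempt.
replayable = filter(!is_definition, shards)
```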

This brings us to the interesting part:

what specifically do you think is impossible in the (b) approach?
if the loading was flipped on its head to attempt (a),
- do you have any gut instincts on ways to attack the problem?

// I thought the (b) approach would be easier, even though it’s (a) that we’re after

edit: thought it might be nice to include another reference to autoloading:

ihnorton · January 4, 2018, 4:33pm

Think about the worst-case complexity of unordered “shards”. Think about the semantics of commands with side-effects (it seems like the order would be essentially non-deterministic from load to load).

Each file reduces to a set of definitions (types, functions, constants) paired with a set of uses. The simplest comparator is “a given file’s definitions must come before another file’s uses” (assuming no methods are actually called during loading…).
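To make that comparator concrete, here is a sketch of a definitions-before-uses ordering as a plain topological sort. The function name `load_order`, the input shape (one set of defined names and one set of used names per file), and the file names are all assumptions for the example; extracting those sets from real code is the hard part and is not shown.

```julia
# Hypothetical: order files so that each file's definitions come before
# any other file's uses of them. Errors out on cyclic dependencies.
function load_order(defs::Dict{String,Set{Symbol}}, uses::Dict{String,Set{Symbol}})
    files = sort!(collect(keys(defs)))
    # deps[f] = files that define something f uses
    deps = Dict{String,Set{String}}()
    for f in files
        used = get(uses, f, Set{Symbol}())
        deps[f] = Set{String}(g for g in files if g != f && !isempty(intersect(defs[g], used)))
    end
    order = String[]
    remaining = Set(files)
    while !isempty(remaining)
        # a file is ready once none of its dependencies are still waiting
        ready = sort!(String[f for f in remaining if isempty(intersect(deps[f], remaining))])
        isempty(ready) && error("cyclic dependency among: ", join(sort!(collect(remaining)), ", "))
        append!(order, ready)
        setdiff!(remaining, ready)
    end
    return order
end

defs = Dict("a.jl" => Set([:A]), "b.jl" => Set([:B]), "c.jl" => Set([:C]))
uses = Dict("a.jl" => Set([:B]), "b.jl" => Set{Symbol}(), "c.jl" => Set([:A, :B]))
order = load_order(defs, uses)
```

With these inputs `b.jl` has no prerequisites, `a.jl` needs `b.jl`, and `c.jl` needs both.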

In that case it seems much easier to just write the ordering down. Otherwise you and the loader are playing a weird game of telephone for uncertain benefit.

That page seems kind of circular to me. Basically it reads as:

“files” define a specific interface (a class for PHP; a function for IDL and older Matlab; a +x file for the Unix shell; etc.)
you tell your compiler where to find things (load path)

But by that definition, Julia and Python have auto-loading for modules (assuming you have your load path set correctly, of course). C even seems like it has auto-loading for includes (again assuming you’ve told the compiler where to look!).

In that case, for your loader or packages you could enforce a rule similar to IDL in that example (and older Matlab versions), which is “one global per file with matching filename”. Then the problem is easy. But that sounds really annoying to use (and it was really annoying in Matlab). Personally I would rather just write down the ordering explicitly, but I suppose it’s a matter of taste.
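Under the "one global per file with matching filename" rule, resolving a name reduces to a path lookup. A minimal sketch, assuming a hypothetical `find_definition` helper and a load path given as a list of directories (the `Widget` and `Gadget` names are invented):

```julia
# Hypothetical resolver: find the file that, by convention, defines `name`.
function find_definition(name::Symbol, load_path::Vector{String})
    for dir in load_path
        candidate = joinpath(dir, string(name, ".jl"))
        isfile(candidate) && return candidate
    end
    return nothing   # no directory on the load path has a matching file
end

# Usage with a throwaway directory.
dir = mktempdir()
write(joinpath(dir, "Widget.jl"), "struct Widget end\n")

found = find_definition(:Widget, [dir])
not_found = find_definition(:Gadget, [dir])
```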