Best way to specify a filename inside a function

I’m writing a package containing a function that will load a data file, let’s call it data.txt. I’m assuming the user has already obtained data.txt from some other source, and the file exists somewhere on their computer.

What’s the best way to tell myfunction() where on the user’s computer to find data.txt?

Here’s how I would do it if I knew where users put their data:

function myfunction(input1, input2)

    filename = "path/to/dataset/data.txt"

    f = open(filename)
    # Do some cool stuff...
    close(f)
end

I could alternatively make filename one of the inputs to myfunction; however, I don’t want to force the user to type a long and difficult-to-remember filename every time they use my function.

Is there an elegant solution that would allow users to point myfunction() to the correct filename one time, then never have to think about it again?


I don’t think it’s very elegant, but a simple solution would be to have the user declare the file path in an environment variable. See the Environment Variables section of the Julia manual.
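A minimal sketch of the environment-variable approach, assuming the variable is named MyFilename (a name chosen here only for illustration). The function reads the path from ENV by default, still accepts an explicit filename keyword as an override, and fails with a readable message when neither is available.

```julia
# Hypothetical example: the environment variable name "MyFilename" is an assumption.
# The keyword default is evaluated on every call, so it picks up the current ENV value.
function myfunction(input1, input2; filename = get(ENV, "MyFilename", nothing))
    if filename === nothing
        error("No data file given. Pass filename=... or set ENV[\"MyFilename\"].")
    end
    isfile(filename) || error("Data file not found: $filename")
    return open(filename) do f
        length(readlines(f))  # placeholder for the real "cool stuff": count the lines
    end
end
```

Note that a value assigned with ENV["MyFilename"] = ... only lasts for the current Julia session; it has to be set again (for example from a startup file or the shell) in each new session.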

Preferences.jl allows something like this:

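using Preferences  # Preferences must be listed as a dependency of your package

# These definitions must live inside your package's module: the macros use the
# calling module to decide which package the preference belongs to.
function set_datapath(path::String)
    @set_preferences!("datapath" => path)
end

function get_datapath()
    return @load_preference("datapath")  # returns `nothing` if it has not been set
end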

If data.txt can be downloaded from a fixed location over https, DataDeps offers a more robust & elegant solution.


Tell your users to define a variable filename = ... and then pass this to myfunction(input1, input2, filename). That way, they only have to type it once. (You could even put this in a TOML file or similar and load a whole dictionary of preferences at once, à la Preferences.jl as suggested above … but I would still recommend passing that preference dictionary explicitly rather than using globals.)
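A minimal sketch of the "load a dictionary of preferences and pass it explicitly" idea, using Julia's TOML standard library (available since Julia 1.6). The config file name and the key data_file are hypothetical; the point is that myfunction itself never reads global state.

```julia
using TOML  # standard library, no extra installation needed

# The analysis takes the data path as an ordinary argument.
function myfunction(input1, input2, filename::AbstractString)
    return open(filename) do f
        length(readlines(f))  # placeholder for the real "cool stuff"
    end
end

# The user writes the path once, in a small TOML file such as
#   data_file = "path/to/dataset/data.txt"
# and loads all settings at the start of a session:
load_config(path::AbstractString) = TOML.parsefile(path)
```

Usage would then look like config = load_config("myconfig.toml") followed by myfunction(1, 2, config["data_file"]).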

Of course, you could alternatively use global state, e.g. an environment variable as suggested above, or better yet:

default_filename = nothing
function myfunction(input1,input2; filename=default_filename)
    global default_filename = filename # remember it for next time
    # ... do stuff
end

so that you only have to pass the filename once and it “remembers” it on future calls. Or the user could even just set default_filename directly.

But this style, while tempting, leads to fragile software. It means that the behavior of myfunction can be unexpectedly altered by changes in hidden global state. It makes it hard to compose different pieces of code because they can suddenly interfere with one another. Google “global variables bad” for lots of discussions of why you should be cautious about using global state in any language.

This is not to say that you should never use global state. There are situations where it is impossible to avoid, e.g. system-configuration details that can only have one set of values in a given Julia run. But it is good to be wary of globals, and it is good to have an “escape hatch” where the user can pass an explicit parameter to override the global default (as in my example above; see also the rng parameter to Julia’s random-number routines).
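If you do keep a package-level default, one common Julia pattern is to store it in a typed const Ref instead of an untyped global, while still letting every call override it explicitly. A sketch, with DEFAULT_FILENAME and set_default_filename! as hypothetical names:

```julia
# Package-level default; `nothing` means "not set yet".
const DEFAULT_FILENAME = Ref{Union{Nothing,String}}(nothing)

set_default_filename!(path::AbstractString) = (DEFAULT_FILENAME[] = String(path); nothing)

# The keyword argument is the "escape hatch": an explicit value always wins.
function myfunction(input1, input2; filename = DEFAULT_FILENAME[])
    filename === nothing && error("Call set_default_filename!(path) or pass filename=...")
    return open(filename) do f
        length(readlines(f))  # placeholder for the real work
    end
end
```

Unlike the version above, passing filename here does not overwrite the stored default, so one call cannot silently change the behavior of later calls. Like any global, the default is lost when Julia restarts.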


Have you thought about using a pop-up window to navigate and choose the file? For example:

using NativeFileDialog

function myfunction(input1, input2)
    # code
    filename = pick_file(filterlist = "txt, TXT")
    # more code
end

myfunction(1, 2)

Thanks for the suggestions, but as I understand it, users would then need to manually enter "path/to/dataset/data.txt" every time they call myfunction. The goal is to let users enter this file path just once, and store that information somewhere, even if the user restarts their computer, or if the package repo gets updated.

If I could, I would simply tell users to “Edit the filename on line XX of myfunction.jl”, but the source files of installed Julia packages are read-only, and I don’t entirely understand how future changes I make to the package might (or might not) overwrite any manual changes the user makes.

If I add a line to Project.toml like:

filename_for_myfunction = "path/to/dataset.txt"
  1. Can users manually edit the path?
  2. Will users’ manual edits be overwritten if the package repo gets updated?
  3. How do I get f = open(filename) to point to filename_for_myfunction in Project.toml?

This sounds like a configuration file. If there’s not enough state to justify saving to a file, perhaps the user can set an environment variable.
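A minimal sketch of such a configuration file, assuming it is kept in a user-writable location outside the package directory. The location ~/.myfunction/config.toml and the key datapath are hypothetical choices; both functions accept the config path as a keyword so they can be pointed elsewhere.

```julia
using TOML

# Hypothetical default location, outside the (read-only) package directory.
const CONFIG_PATH = joinpath(homedir(), ".myfunction", "config.toml")

# Save the data path once; the file survives restarts and package updates.
function save_datapath(path::AbstractString; config = CONFIG_PATH)
    mkpath(dirname(config))
    open(config, "w") do io
        TOML.print(io, Dict("datapath" => String(path)))
    end
    return path
end

# Return the saved path, or `nothing` if none has been saved yet.
function load_datapath(; config = CONFIG_PATH)
    isfile(config) || return nothing
    return get(TOML.parsefile(config), "datapath", nothing)
end
```

Because the file lives in the user's home directory rather than in the package's own folder, updating the package does not touch it.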

The environment variable solution seems like it’s the way to go. I could easily tell the user to type this into the REPL:

ENV["MyFilename"] = "path/to/dataset/data.txt"

and when I do it, it works beautifully at first:

julia> ENV["MyFilename"]
"path/to/dataset/data.txt"

but when I restart VS Code and try to access MyFilename, it has now vanished:

julia> ENV["MyFilename"]
ERROR: KeyError: key "MyFilename" not found
Stacktrace:
 [1] (::Base.var"#623#624")(k::String)
   @ Base ./env.jl:79
 [2] access_env
   @ ./env.jl:43 [inlined]
 [3] getindex(#unused#::Base.EnvDict, k::String)
   @ Base ./env.jl:79
 [4] top-level scope
   @ REPL[6]:1

Is there a simple way to make an environment variable permanent?

Put it in Julia’s startup file maybe? On Linux and macOS, it is ~/.julia/config/startup.jl.

You can also set the variable in the shell when invoking Julia:

$ MyFilename="path/to/dataset/data.txt" julia

Sounds like you want to set a global package preference? See Preferences.jl (JuliaPackaging/Preferences.jl on GitHub).

Basically, the question is whether you want this file path to be globally set for every time they use your package in any context, or globally set per project environment, or whether it is local to some script, or …


I’m excited because this seems like a very simple and straightforward solution, which won’t add the clutter or potential breakpoints of extra packages like Preferences.jl. The only hang-up now is how to actually open startup.jl. I feel like I’m missing something obvious.

julia> @edit("~/.julia/config/startup.jl")
ERROR: expression is not a function call or symbol
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] top-level scope
   @ REPL[13]:1

julia> @edit startup
ERROR: expression is not a function call or symbol
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] top-level scope
   @ REPL[14]:1

julia> @edit startup.jl
ERROR: UndefVarError: startup not defined
Stacktrace:
 [1] top-level scope
   @ /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/InteractiveUtils/src/macros.jl:72

Use edit instead of @edit. Also, it seems that edit does not map ~ to your home directory, so you may have to run edit("/home/<username>/.julia/config/startup.jl") (on macOS, the home directory is /Users/<username>).
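To avoid typing the home directory by hand, the path can be built with homedir(), which works on Linux, macOS and Windows. A small sketch, assuming the default depot location ~/.julia (it differs if JULIA_DEPOT_PATH is set); startup_path and open_startup_file are hypothetical helper names:

```julia
using InteractiveUtils  # provides `edit`; already loaded in the REPL

# homedir() returns the current user's home directory on every platform.
startup_path() = joinpath(homedir(), ".julia", "config", "startup.jl")

function open_startup_file()
    path = startup_path()
    mkpath(dirname(path))        # create ~/.julia/config if it does not exist yet
    isfile(path) || touch(path)  # create an empty startup.jl if needed
    edit(path)                   # open it in the editor Julia is configured to use
end
```

Running open_startup_file() in the REPL then opens the file, where a line such as ENV["MyFilename"] = "path/to/dataset/data.txt" can be added.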


Someone has suggested changing the Julia startup file, but in general (i.e., independent of Julia), persistent environment variables must be set somewhere like .profile on Unix-like systems, or in the system properties on Windows.

I’m not sure from your question whether data.txt is the same for everyone or different for each user. If it’s the same, maybe you could just distribute it with the package as an artifact?

https://pkgdocs.julialang.org/v1/artifacts/

@chadagreene if you don’t want to mess with environment variables, what about a function that reads a dependencies text file from the package location containing all of the user-specific paths that are needed? If a path is not found, or points to the wrong file, the function prompts the user for the correct path and saves it to the dependencies file. Maybe this already exists somewhere, or simply duplicates suggestions from above.
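A sketch of that prompt-and-remember idea. It assumes the saved path is stored in a plain text file in a user-writable location rather than inside the package directory (which should be treated as read-only); the file location and the name get_datapath! are hypothetical. Reading the answer from an input stream (stdin by default) keeps the function testable.

```julia
# Hypothetical location for the saved path, outside the package directory.
const PATH_FILE = joinpath(homedir(), ".myfunction_datapath.txt")

# Return a valid data path. If none is saved, or the saved one no longer
# exists, ask the user once and remember the answer for future sessions.
function get_datapath!(; pathfile = PATH_FILE, input::IO = stdin)
    if isfile(pathfile)
        saved = strip(read(pathfile, String))
        isfile(saved) && return String(saved)
    end
    print("Enter the full path to data.txt: ")
    answer = strip(readline(input))
    isfile(answer) || error("No file found at $answer")
    write(pathfile, answer)  # persist for future sessions
    return String(answer)
end
```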

That was my first line of thinking, but aren’t files in Julia packages read-only? Also, the location of files in a Julia package is pretty opaque (kept in hidden folders), which adds an extra layer of burden that I don’t want to put on users.

This is why you should use Preferences.jl, which is the official solution for package-wide preferences.

Having a dependency on an additional package is extremely easy in Julia and is mostly transparent to users. I’m not sure what you mean by “clutter” … it doesn’t even appear in the package status listing if you only list something as an internal dependency of your package. (Indeed, since Preferences.jl is currently required by 4376 packages it is probably already installed on most people’s machines.)

As a centrally supported solution (Preferences.jl is by a core Julia developer and is used by thousands of packages), it’s much less likely to lead to breakage than something you cobble together yourself in your first few weeks of using Julia.


If you expect users to call the function many times with the same file, but also potentially with several different files (and you really don’t want to add a dependency), you could write a function that returns a closure:

function myfunction_forfile(filename)
  return (args...) -> begin
    # open(...) do closes the file even if an error is thrown
    open(filename) do f
      # do some cool stuff...
      return nothing # return something useful here
    end
  end
end

myfunction_f1 = myfunction_forfile(filename)
# now call like normal:
myfunction_f1(args...)

Or you could create a struct that carries the information of the file and has the appropriate methods:

struct FileForThing
  # fields
end

function FileForThing(filename)
  # open that filename, parse it or whatever, put what you need into a struct
end

function (F::FileForThing)(args...)
  # do your cool stuff
end
 
# now like this:
struct_with_file1 = FileForThing(filename)
# call like normal:
struct_with_file1(args...)

In both of these cases, your user just needs to put the file name in once and then they don’t have to deal with it again.
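For concreteness, here is one possible filled-in version of the callable-struct sketch. The fields and the behavior of the call (returning line i) are hypothetical placeholders; the idea shown is that the file is read once when the object is built and then reused on every call.

```julia
# The file is read once, when the object is constructed.
struct FileForThing
    filename::String
    lines::Vector{String}
end

FileForThing(filename::AbstractString) = FileForThing(String(filename), readlines(filename))

# Calling the object does the "cool stuff"; here it just returns line `i`.
(F::FileForThing)(i::Integer) = F.lines[i]
```

With thing = FileForThing("path/to/dataset/data.txt"), a call like thing(2) uses the already-loaded contents without touching the file again.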

But depending on how often you expect that file to change or how often the function that depends on it gets called or whatever, this would probably make more or less sense. Unless you expect users to depend on several different files dynamically, it’s probably better to just use Preferences.jl as suggested above.

I appreciate you pressing this, as I’m trying to learn how to think in a more Julian way. I suspect that Preferences.jl might ultimately be the best way to solve this problem, but I’m trying to understand why such a complicated solution is necessary for such an incredibly simple task.

I’ll also note that part of my reluctance to using Preferences.jl is that I don’t totally understand how it works. It’s sort of a black box to me, so I’m just a little unsure about depending on it.

More stuff. More dependencies. More moving parts. More things to go wrong. More packages to load. More code to maintain. More questions about how different users of my package might experience different issues, depending on their setup.

It’s the KISS Principle or Occam’s razor: I have a very simple task to accomplish, and dependence on extra packages that I don’t understand feels like I’m violating the “keep it simple” rule.

No doubt! But even incorporating Preferences.jl feels like one extra thing I could screw up. That’s why I was hoping for a simple solution that wouldn’t require extra packages for such a simple task.

The solution to this exact problem in Matlab has always been straightforward and robust: I tell users to type open myfunction into the command window, and I say “edit Line XX of the function to indicate the correct file path to the data file.” And that’s it–it’s easy for users and doesn’t require installing extra packages or anything overcomplexificated. I guess I’m a little bit confused about why such a simple task would require any extra packages or “cobbling together” anything in Julia.

So again, I will probably end up using Preferences.jl thanks to your suggestion. It’s probably the best solution available, but getting comfortable with it involves a learning curve.

:astonished: In my 30+ years of commercial software development, robust and telling users to edit source code have never gone together. But then again, I’ve never worked with Matlab.
