Custom module to import, manipulate and export dataframes issue

I can’t reproduce this - are you sure you didn’t change your module and were still running an old version of the code when you got the error?

shell> cat "Documents/Julia/dfprep.jl"
module df_prep
using Pkg
Pkg.add("DataFrames")
using DataFrames
function df_preps(df1::DataFrame)
    exported_df = df1[findall(in(["b"]),df1.a),:]
end
export exported_df, df_preps
end

julia> include("Documents/Julia/dfprep.jl")
    Updating registry at `~/.julia/registries/General`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.6/Project.toml`
  No Changes to `~/.julia/environments/v1.6/Manifest.toml`
Main.df_prep

julia> using DataFrames, .df_prep

julia> df = DataFrame(a = ["b", "a"], b = ["a", "c"])
2×2 DataFrame
 Row │ a       b      
     │ String  String 
─────┼────────────────
   1 │ b       a
   2 │ a       c

julia> df_preps(df)
1×2 DataFrame
 Row │ a       b      
     │ String  String 
─────┼────────────────
   1 │ b       a

A few additional comments:

  • I’m assuming your actual df_preps function is more complicated, but in general it is uncommon to have a separate module for a single function. Are you coming from MATLAB, where each function lives in its own file? You could have just defined that function inline.
  • If you do stick to a module, there’s no need to export exported_df. It’s just a local variable inside the df_preps function, so exporting it won’t work (calling exported_df after including your file just throws an UndefVarError).
  • It’s also unusual to have modules do package management like you do above by adding DataFrames. If you think df_preps should have its own dependencies, you should probably turn it into its own package with a Project.toml file. Otherwise you can just run using DataFrames in Main and remove all package operations from your module.
  • On your function itself, df1[findall(in(["b"]),df1.a),:] seems an awfully complicated way to express df1[df1.a .== "b", :], or alternatively, using the DataFrames filter method, filter(:a => (==)("b"), df1). You also don’t have to assign the result to a variable exported_df, given that you never use that name anywhere else. Many Julia style guides recommend an explicit return at the end of a function. Finally, there’s no need to type annotate your function like df_preps(df1::DataFrame), unless you want to define other methods such as df_preps(df1::SomeOtherType). Julia specializes the compiled code on the concrete type of the df1 passed to the function, so the type annotation brings no performance benefit.
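
To make the last point concrete, here is a minimal sketch comparing the three ways of selecting rows mentioned above. It assumes DataFrames is already installed in the active environment and reuses the example data frame from the session above; the names r1, r2 and r3 are just illustrative.

```julia
using DataFrames

df = DataFrame(a = ["b", "a"], b = ["a", "c"])

# Original approach: collect the indices of rows whose :a value is in ["b"]
r1 = df[findall(in(["b"]), df.a), :]

# Broadcasted comparison, which produces a Boolean mask over the rows
r2 = df[df.a .== "b", :]

# filter with a column => predicate pair, applied row by row
r3 = filter(:a => (==)("b"), df)

# All three select the same single row
@assert r1 == r2 == r3
```

The Boolean mask version is probably the easiest to read for a single condition; filter tends to pay off once the predicate gets more involved.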

To summarize, my module would probably look like this:

module df_prep

df_preps(df1) = df1[df1.a .== "b",:]

export df_preps

end

although of course in this case I would have just used this one line directly in my main script rather than writing any functions or modules…
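
For completeness, a sketch of how that simplified module could be used from Main. Here the module is defined in the same script rather than in a separate file, purely so the example is self-contained; in your setup you would include your file instead. Note that the module itself no longer needs DataFrames, since the indexing and broadcasting methods are found through the type of the argument passed in.

```julia
using DataFrames

module df_prep

# No type annotation and no package operations inside the module
df_preps(df1) = df1[df1.a .== "b", :]

export df_preps

end

# The leading dot refers to the module defined in the current scope (Main)
using .df_prep

df = DataFrame(a = ["b", "a"], b = ["a", "c"])
result = df_preps(df)
```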
