A newbie here, from an OOP background. I am looking to put together a project which will:
Download data from the Internet
Do some manipulations on the downloaded data
Do some analysis on the downloaded data
Please note that the downloader and the analysis code would be reused across other projects as well.
I have been reading the Julia docs and looking at how other Julia projects have been implemented. In Java, I would have organised these into separate packages and class files.
Create Modules: The Julia docs suggest modules as a way to organise code into coherent units. They also seem to offer advantages such as precompilation. When should I use modules? What are the best practices?
Create multiple files and “include” them in another file, as the Indicators.jl project does, for instance.
Considerations while selecting an option:
Performance: this is key
Debugging: since I am new, I would like to be able to debug quickly
Readability and Reusability
Code defined in one file would be used in another file.
Please do consider that this project could become quite big and complex.
I am wondering what the best practice is for organising my code.
The most common pattern is a single module per package, with the implementation split up into as many files as you want, all included from the top-level MyPackage.jl file. (Each file is included once. It is no problem to call code from one file in another file of the same module.)
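A minimal sketch of that pattern, for illustration only. So that it runs as a single script, it writes three small files into a temporary directory; in a real package they would live under src/. The file names download.jl and analysis.jl and the functions fetch_data and summarize are hypothetical placeholders, not anything from the thread.

```julia
# Build a throwaway layout so the sketch runs as-is (a real package keeps these in src/).
dir = mktempdir()

# download.jl: hypothetical downloader; returns a placeholder string, no network access.
write(joinpath(dir, "download.jl"), """
fetch_data(url) = "raw data from \$url"
""")

# analysis.jl: calls fetch_data, which is defined in a different file of the same module.
write(joinpath(dir, "analysis.jl"), """
summarize(url) = length(fetch_data(url))
""")

# MyPackage.jl: the single top-level module that includes each file exactly once.
write(joinpath(dir, "MyPackage.jl"), """
module MyPackage
export fetch_data, summarize
include("download.jl")
include("analysis.jl")
end
""")

# Load the module and bring its exported names into scope.
include(joinpath(dir, "MyPackage.jl"))
using .MyPackage

println(summarize("example.org"))
```

Because both files end up in the same module namespace, analysis.jl can call fetch_data without any import statement.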
The reason to use submodules within a package is if you are running into namespace collisions from different parts of your implementation, but you can always introduce submodules later if this becomes an issue.
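As a rough illustration of the namespace-collision case, here is a sketch in which two parts of one package both want a function called load. The module names CollisionDemo, Downloader, and Analysis, and both load functions, are made up for this example.

```julia
module CollisionDemo

module Downloader
# Hypothetical: "load" here means fetching something remote.
load(url) = "downloaded $url"
end

module Analysis
# Hypothetical: "load" here means reading a saved result; it does not clash with Downloader.load.
load(path) = "loaded results from $path"
end

end # module CollisionDemo

# Callers qualify the name with the submodule they mean.
println(CollisionDemo.Downloader.load("example.org"))
println(CollisionDemo.Analysis.load("results.csv"))
```

If the two functions had different names, a single flat module would do just as well, which is why submodules can usually wait until a collision actually shows up.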
Generally, split your project into multiple packages once you have functionality that could reasonably be used on its own. You can always do this later.
(Use Revise.jl during development, so that you can edit the package and execute the new code without having to re-load the whole package.)
Presumably, they want to do a similar analysis more than once — if @roh_codeur is asking about large-scale code organization, they must not be talking about a one-time throwaway script. If they have a non-trivial amount of code, separating it in a package makes it easier to re-use in multiple different projects.
That is, create a MyDataAnalysis.jl package (hence a module) that has the re-usable parts of the analysis and data acquisition that you want to do. Then, for any particular project or task, write a script or Jupyter notebook etcetera that does:
using MyDataAnalysis # & other packages as needed ...
#... acquire some specific data ...
data = ...
# run some analysis
To selectively execute code (i.e. re-use pieces), the best approach in the long run is generally to separate it into functions that you call individually, not by commenting out or selectively executing sections in a big script — that way lies madness for large projects.
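For instance, a small sketch of a pipeline split into functions, each of which can be called on its own from the REPL or a notebook. The names load_data, clean, and analyze and the toy data are hypothetical stand-ins for real steps.

```julia
# Each step is its own function, so any of them can be re-run individually.
load_data() = [1.0, 2.0, 3.0, 4.0]            # stand-in for a real download
clean(data) = filter(x -> x > 1.5, data)      # drop values at or below 1.5
analyze(data) = sum(data) / length(data)      # mean of the remaining values

raw = load_data()
cleaned = clean(raw)        # re-run just this step after changing the filter
result = analyze(cleaned)   # or just this one after changing the analysis
println(result)
```

Re-running one step is then a single function call rather than a matter of commenting sections of a script in and out.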
For small throw-away scripts, of course, I agree — just make a Pluto/Jupyter notebook or similar and edit/execute it in chunks until it does what you want.
@stevengj: thanks for your suggestions and the links. I will work on a solution and post back. The only bit I am not sure about is how debugging would work with included files in VS Code. I will give it a whirl.
Thanks to everyone who pitched in, this is quite useful indeed. So far, I have come up with the below:
Base.@kwdef struct SomeType
println("using sometypess $someType")
df = DataFrame()
export SomeType, someFunc, runModel
baseDir = @__DIR__
@info "Starting in $baseDir"
See how I can remove the push! statements in the script
I haven’t had a chance to look at all the resources yet, but I have ordered the book. I am afraid I am still not clear on the benefits of making each file a module vs including the files. The only difference I can see so far is that if I wanted to use MyUtils in a different project, I wouldn’t be able to, since it depends on my-types.jl. So, if I wanted to reuse code across projects, I would have to make each file a module, or alternatively follow the same pattern as above. If I create each file as a module, though, it takes a while to load the file.
3. Does using the module approach vs the include approach have any performance impact? I expect my project to be at least a few thousand lines of code.
While looking through the documentation, I found the link below. I feel it answers most of my queries, except the one about performance, i.e. whether one module that includes multiple files performs any differently from multiple modules.