A nice collection of tutorials is https://github.com/PaulSoderlind/JuliaTutorial (Julia Tutorial for Finance and Econometrics Students).
Fair enough. I agree with @jlperla that there is a massive benefit to a centralized econ organization which provides these useful tools with good documentation and tutorials (though SciML benefits from lots of pharma money, and I don't think there is demand for niche performance tools in the private sector of economics).
It's still a nice package that hopefully can help out OP.
Not sure about that. But I will do one last list of issues for those who might care.
People primarily driven by research (which might not even use Julia if they change the research direction) probably aren't the key here.
I think what would make a huge difference would be someone with professional coding experience and a big open-source software grant who can go through and fill in the ecosystem. The SciML crew is doing their best, but it is an enormous undertaking and they have papers to ship. On that note, it seems to me that the backbone of many open-source projects out there is people who are not pure researchers (e.g. researchers from organizations who appreciate this sort of service work, teaching faculty, research software engineers) who apply for big grants to do the work.
If anyone out there loves this kind of stuff, they should apply for grants (I suggest working with the SciML organization in doing so; if you are relatively junior, they could even be a PI). Unless one has a big tech company that requires your platform for its research and operations (e.g. pytorch or jax), somebody needs to pay for professional software development. Look up the funding of jupyter+binder/jupyterbook, Stan, R, numpy/scipy, etc. sometime to see the resources they have poured in.
Any problems in the Julia ecosystem are about resources (and ensuring there is focus, given limited resources), not the language itself.
First, a shout out to all of the people (e.g. @tbeason and @mcreel and many others) who have created amazing tutorials. I think those are all necessary but not sufficient for success.
As I said, my perspective here is that the source of the issue is not training, discovery, or (directly) documentation. Instead, it is about inconsistent code quality and non-uniform code interfaces/testing/benchmarking/maintenance of key packages. Economists may have less training in general, but even the ones who have plenty of experience (and are even told which package to try!) are having trouble. That certainly includes me.
Of course, people need to use different algorithms in different circumstances, and there is no way to avoid teaching them those tradeoffs. But with Julia there is a signal extraction problem. Is it failing because you chose the wrong package, because there is a bug in that package, because there is a bug in an upstream package you didn't know existed, because you used the package wrong, or because it wasn't the ideal method for your problem? It is hard to triage that, even for people who know what they are doing.
In something like matlab, it just amounts to changing flags in the optimizer and/or supplying gradients to choose between different algorithms. It won't have bugs and you won't be calling the interface wrong, so you can focus on trying out different algorithms. No signal extraction problem or triage process required. Similarly, in matlab or python, if I want to try various types of interpolation I don't need to search around between 5 different packages, each with their own missing features, quirks, and possible performance regressions.
Not to say that these problems happen all the time and with every package, of course. But they are a lot more frequent than in other environments.
And just to be clear: key packages here could just call out to existing C code like NLopt or Dierckx if that were found to be the most robust. Julia long ago fixed the binary-dependency installation woes, so there is no reason to fear binary dependencies and plenty to love about them. It means someone is maintaining a separate implementation, which probably means more testing! Pure Julia implementations sometimes have benefits (e.g., better inlining, working with static arrays, handling generic types better), but the baseline Float64 vector versions can all call out to C or Fortran and a user wouldn't care. Even auto-differentiation often works fine with external packages, because you want standardized custom AD rules (e.g. the https://sensitivity.sciml.ai/dev/ calculations don't require the primal calculation to be in Julia).
What matters, then, is having a consolidated interface to swap algorithms and good coverage/maintenance of essential algorithms. And to deprecate packages from that common interface when they fall into disrepair.
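For concreteness, here is a minimal sketch of what that swap-the-algorithm workflow looks like through the Optimization.jl wrapper today (assuming the OptimizationOptimJL and OptimizationNLopt glue packages): the same problem object is handed first to a pure-Julia solver and then to the C-based NLopt.

```julia
using Optimization, OptimizationOptimJL, OptimizationNLopt

# Rosenbrock with parameters p; the AD backend is chosen declaratively
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
optf = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0])

sol_julia = solve(prob, LBFGS())           # pure-Julia backend (Optim.jl)
sol_c     = solve(prob, NLopt.LD_LBFGS())  # C backend (NLopt), same interface
```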
I think the first step is to focus. As far as I am concerned, what everyone is doing with DataFrames/FixedEffectModels/etc. is all awesome. I don't see any reason to push hard on that because it isn't Julia's bread and butter. If all someone is doing is data work where specialized estimation packages already exist, and they know R, they probably should just use R rather than switch to Julia.
If someone knows Julia really well, or the data stuff is a small part of a project, then I think those packages are all on the right track. Just make sure they don't break (e.g. the CSV woes earlier this year).
For bread-and-butter scientific computing there is no reason that Julia should be more frustrating than matlab/scipy. From my side, the high level list and coordination device for this is mostly there: https://docs.sciml.ai/dev/
The problem is that this is a lot more than just listing out packages and adding in documentation.
- Some of the things listed (e.g. `Interpolation`) are deceiving, because the interface doesn't handle the diversity of the tasks.
  - The `DataInterpolations` package does what it does well, but it doesn't cover multi-dimensional interpolation. And it isn't like you can just switch to `Interpolations.jl`, because that doesn't support irregular-grid cubic splines, etc.
  - So you need to go through and ensure the interface supports all the variations that are required without loss of structure (e.g. you need to distinguish between a regular and an irregular grid), then ensure all of the backend packages actually work, then benchmark and document the algorithm choice in different circumstances (see the interpolation sketch after this list). Hopefully documentation + benchmarking + downstream tests will keep the ecosystem in check without monorepos.
- Others might not implement the algorithms that economists really need, so additional backend algorithms or features are required.
  - For example, in https://integrals.sciml.ai/stable/basics/IntegralProblem/ there are great options for adaptive quadrature schemes, but what many economists really need is fast, low-dimensional Gaussian quadrature. The reason is that these things are often evaluated in an inner loop, and they are often implementing expectations of some functional form.
  - Something like https://github.com/QuantEcon/Expectations.jl (expectation operators for Distributions.jl objects) could be deprecated and rolled into the integrals setup, but it would take work (see the quadrature sketch after this list).
- The https://docs.sciml.ai/dev/modules/Optimization/ wrapper is essential, but it is just getting started. Finding a consistent call interface and solution return type is tricky in that domain.
  - As one small example, it doesn't have a good way to swap the `sense` (minimize vs. maximize) consistently across all of the optimizers.
  - It doesn't have the interfaces to deal with constrained optimization especially well, in particular if you want to use AD to get a sparse Jacobian of the constraints and of the objective separately. There is no support for complementarity conditions, which are essential in a lot of economics applications.
  - People solving "real" problems often find the commercial optimizers significantly better. We need to show those examples and test things like access to Knitro (and add in software if the Optimizers.jl-to-MOI interface doesn't work).
  - This needs tonnes of tests and examples to ensure that it works seamlessly, which is a lot of work because every optimizer has different arguments and return types.
  - It is necessary to reconcile it with other great ecosystems such as JuliaSmoothOptimizers, and it isn't obvious how to achieve that.
- Showing consistent patterns of new things that are available by using Julia is a lot of work.
  - For example, when and how can you use symbolic expressions, and how does it generate Jacobians/gradients/etc.? How do you use `LabelledArrays` in each of them (if appropriate)?
  - How can we get gradients of solutions with respect to parameters, and what does that open up?
- Also, let me point out the importance of gradients.
  - The biggest benefit that economists can get is to start providing gradients to their algorithms, which lets them use new classes of algorithms. In principle, with things like Symbolics and upcoming projects like Enzyme, it will be far easier to do that than anything in matlab or scipy (a minimal gradient sketch follows this list).
  - Furthermore, people may realize that if they can differentiate the components of the model with respect to the parameters themselves, through smart use of AD rules, then Julia can enable whole new classes of algorithms and methods.
  - On that note, cross your fingers that Enzyme.jl works out, and provide as many bug reports and use-cases to that project as you can when it is ready to be released into the wild. Zygote is coming towards its end-of-life and ForwardDiff is limited. Without good AD, the best Julia could hope for is to match the stability of python/matlab packages. With good AD, and the combination of symbolics+AD, it can vastly surpass them.
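To make the interpolation point above concrete, here is a hedged sketch (assuming the current DataInterpolations.jl and Interpolations.jl APIs) of how one task forces a package switch and a different interface:

```julia
using DataInterpolations  # 1-D, handles irregular grids
t = [0.0, 0.5, 2.0, 3.0]  # irregularly spaced knots are fine here
u = [1.0, 3.0, 2.0, 4.0]
itp1 = CubicSpline(u, t)
itp1(1.2)

using Interpolations      # N-D, but cubic B-splines want a regular grid
A = rand(10, 10)
itp2 = interpolate(A, BSpline(Cubic(Line(OnGrid()))))
itp2(2.5, 3.5)            # different constructor, different conventions
```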
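And for the quadrature point, a sketch of the fast, fixed-node Gaussian quadrature an inner loop wants, first via Expectations.jl and then by hand with FastGaussQuadrature.jl (assuming their current APIs):

```julia
using Distributions, Expectations, FastGaussQuadrature

# Expectations.jl: build the operator once, reuse it in the inner loop
d = Normal(0.0, 1.0)
E = expectation(d)  # Gauss–Hermite nodes/weights under the hood
E(x -> x^2)         # ≈ 1.0 for the standard normal

# By hand: E[f(X)] for X ~ N(μ, σ²) via the change of variables x = μ + √2·σ·z
f(x) = x^2
μ, σ = 0.0, 1.0
z, w = gausshermite(10)
sum(wi * f(μ + sqrt(2) * σ * zi) for (zi, wi) in zip(z, w)) / sqrt(π)
```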
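Finally, the gradient sketch mentioned above: the easy first step is just handing a ForwardDiff gradient to an optimizer (Optim.jl here), before anything Enzyme- or Symbolics-flavored:

```julia
using Optim, ForwardDiff

rosen(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
g!(G, x) = ForwardDiff.gradient!(G, rosen, x)

# Gradient-based LBFGS instead of derivative-free Nelder-Mead:
res = optimize(rosen, g!, zeros(2), LBFGS())
Optim.minimizer(res)  # ≈ [1.0, 1.0]
```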
So my list is basically that of Chris's (and I am purposefully leaving out things like linear regressions, which seem on track for now). Do things like pick up a copy of the CompEcon toolbox (Paul L. Fackler) and ensure that all of the key general-purpose algorithms are covered, etc. I think it starts with getting the downstream packages solid and maintained for all of the things people need, and then the next step is filling in the documentation/unit tests/benchmarking. In that process, people will find a lot of bugs.
For sure. But SciML is an amazing organization that is willing to shoulder the burden of consolidating the scientific computing ecosystem here, and there are many complementarities and spillovers from pharma applications. It just isn't a perfect fit, which is why it won't happen without our involvement.
Just to repeat my statement at the top: package quality/documentation happened with Matlab because academics paid $$$ to them to write/test/document/maintain packages, and it happened with scipy/jupyter/etc. because someone went and asked grant agencies for a lot of money. My understanding is that these didn't require huge numbers of people, but rather a few talented professional developers and enough time to focus rather than writing papers. If that is of interest to anyone who wants to take on components of the ecosystem and can dedicate the time, then ask agencies for money (within the SciML umbrella)!
Sorry for the long list here. It is my last, because I think it fully summarizes my feelings on where the sources of the real issues are and the best approaches to fix them. Listen to other economists' opinions as well (especially economists giving an "I tried Julia" report), since I may have a limited perspective and already know Julia too well.
What do you have in mind? At least both R and Julia are 1-based (and column-major), so they seem like a good match. I've just tried to use RCall.jl, but I don't use R regularly. However, I also use Python (through PyCall at the time; I would now use PythonCall), and with Python (and e.g. C) being 0-based, that could be a source of bugs (as with OffsetArrays.jl). Python and C/C++ being row-major, however, isn't a bug risk when using them with Julia or converting code to/from Julia, only a performance pitfall to know about.
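To make the row/column-major pitfall concrete, here is a minimal sketch (names are just illustrative): in Julia the row index should vary fastest in the inner loop, since columns are contiguous in memory; code ported naively from C/NumPy often has the loops the other way around, which is correct but slow.

```julia
A = rand(2_000, 2_000)

# Julia is column-major: iterate down each column in the inner loop,
# so consecutive iterations touch consecutive memory.
function colmajor_sum(A)
    s = 0.0
    for j in axes(A, 2), i in axes(A, 1)  # i (row index) varies fastest
        s += A[i, j]
    end
    return s
end
# Swapping the loop order (`for i ..., j ...`) gives the same answer
# but strides through memory, which is typically much slower.
```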
I would have thought RCall, with its R REPL mode, would be pretty sweet for those who know and sometimes prefer that language. [Is such a mode also needed in PythonCall?]
I see (non-default, unlike PythonCall, which downloads Python for you by default):

> When `R_HOME` is set to `"*"`, RCall.jl will automatically install R for you using Conda.
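Concretely, the documented path is something like this (a sketch based on the RCall docs; the build step is what reads `R_HOME`):

```julia
using Pkg
ENV["R_HOME"] = "*"  # "*" asks RCall to install R itself via Conda
Pkg.build("RCall")   # the build step picks up R_HOME
using RCall
R"sum(1:10)"         # evaluate R code inline from Julia
```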
@cjdoris PythonCall is pretty sweet (compared to PyCall) in the sense that it takes care of Python package dependencies for you (directly, or indirectly with Conda/Mamba). I suppose RCall could do something similar, and even reuse the same infrastructure, to make installation transparent. I don't know what most R users use: CRAN, Conda (or RStudio Package Manager, which also supports PyPI)?
I'm a big believer in using Julia together with other languages, e.g. Python or R (for their libraries, but also Julia from other languages, if people can be persuaded to do so). Possibly we need more wrappers for their code?
@y.lin can you clarify:
The reason is that while Stata offers much better algorithms than any of the four languages, users engage with algorithms written by Stata rather than writing their own.
Which algorithm or algorithms do you have in mind (just curious what could be improved in e.g. Julia)? I'm a bit surprised they would be slower or somehow worse elsewhere (in other, more established languages, e.g. R). Did you have something else in mind than speed, something workflow-related, with "algorithm" then not quite the right term, or misleading?
Thanks for all the advice!!! I'll definitely explore all the suggestions you made in the next few days, when I return to the computation of the model.
Coming back to the topic here, I think there are two matters:
- how to improve packages, which @jlperla talks about and which I can't say anything meaningful about;
- how a new user could learn these tools more smoothly. I'll refer exclusively to economists (rather than giving general advice for any type of user), since others could have different needs.
I'll refer only to the second point. I cannot give a solution, but I can provide an example of a path I think would be worth exploring.
First, let me emphasize some aspects. I imagine anyone who is reading this post is interested in the topic. By this, I mean that all of us will invest enough time until we learn the tools. However, it's not that type of user I'm concerned about. Indeed, I know I'll spend weeks and months until I learn all the tricks.
Rather, I'm thinking of a new user, who doesn't necessarily have time or won't use Julia/R every day. And even for people who have to learn the tool, it's about making their lives easier while not picking up bad habits. Reaching a broad audience is what you eventually need to support Julia, since it creates the necessary scale to attract financing. So, in this sense, it's also related to what @jlperla was saying.
Second, my perspective is not about defending Julia. As in this post's title, I'm thinking of whether people should use Julia/Python/R.
For the example, imagine that someone asks me whether he should learn LaTeX, and how to do so. Given my experience with LaTeX, this is how I'd tackle the matter.
First, I'd ask him: "what will be your use?" If you're aiming at industry, or will use it only sporadically, don't learn LaTeX. I don't think you'll need it, and quite likely everyone around you will be using Office/LibreOffice. Learning a new tool is always good, but time is finite, so I don't think it's optimal.
Second, assume you'll work in academia or any sector that uses a lot of math. Then LaTeX is a valuable tool to learn. So, the question here is: how do you spend your time learning without wasting too much of it finding answers? There's no unique answer, but it's really valuable to know different people's experiences.
What would I recommend? I use a hybrid method between LyX and pure LaTeX. I use pure LaTeX for the preamble and some specific functions, while LyX allows me to use keyboard shortcuts. Right now, I press `ctrl+o` for all the operators I commonly use, `ctrl+g` for all Greek letters, etc. You can even modify its menus and call them with keyboard shortcuts. For instance, pressing `alt+1` brings up the "Decor" menu I created for my version of LyX, and `ctrl+d` brings up another custom menu (screenshots omitted).
Why am I recommending this? Because you'll increase your productivity. For instance, I do all my algebra in LaTeX. Yesterday, I literally wrote 60 pages of pure algebra.
And if you need pure LaTeX, e.g. to create an environment, you can do it. Moreover, LyX then allows you to create a keyboard shortcut for that environment (the "Remark" menu in my case).
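For instance, a theorem-like environment of that sort can be defined with a couple of lines of plain LaTeX in the LyX preamble (a sketch; the `remark` name is just illustrative):

```latex
% In the LyX preamble (Document > Settings > LaTeX Preamble):
\usepackage{amsthm}
\newtheorem{remark}{Remark}
% Then bind a LyX keyboard shortcut or menu entry to insert it.
```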
Point of the example: this way of using LaTeX worked for me; it's not the truth, just an approach that suited me and could work for others. Having notes with different opinions is more valuable than discussions about whether Julia is faster than Python. And I think laying out the motives behind a recommendation is key: it lets you know whether the approach suits your needs. For example, people used to tell me that I should learn pure LaTeX. Why? Just because. And, until I found LyX, nothing was answering my concerns about productivity.
Right now, regarding the choice of Julia/R/Python, I feel like I'm spending more time finding those opinions. You tend to find only discussions oriented to package creators/experienced programmers, which are not key for new users from economics. I think I started using Julia for the wrong reasons, but kept using it for good reasons (e.g., I chose it because of its speed, but stayed because it has a more natural syntax for scientific matters).
Maybe I'm stating the obvious, but it's generally a good idea for new researchers (= PhD students) to ask their advisors about the choice of software, as the advisors will have a better understanding of the problems that the student will be facing (compared to, say, people on an internet forum, or the authors of a VoxEU column).
Although there's also a good chance you then end up programming in FORTRAN or GAUSS.
Yes, that thought crossed my mind as well, but I wouldn't feel it's right to recommend that a student disregard the instructions of their advisors even in that case. I've seen advisors (in structural labor, where Fortran is still used a lot) go over their advisees' Fortran code to an extent that probably would not have happened if they had chosen a more modern language.
Bottom line (and then I'm out): when you decide which language to learn to do economics research, I'd advise speaking to people who understand what you want to do (in the medium term and in the long term), what your background is, and what your situation is (including the support you can get from advisors, colleagues, friends, documentation, and the internet).
Actually, I always wondered why there weren't more structural people around in the Julia community; it seems their use cases would be well addressed by a language that allows coding high-performance models from scratch (not unlike quant macro)? I'm aware of the pyBLP project; something like that feels like it would have had a more natural home in Julia?
Macro, macrofinance, and corporate finance structural models are beginning to pop up in Julia. I think you are right that micro models like demand estimation that you mentioned have not come around as much. I would consider it only a matter of time.
I get a 2.85x speedup on top of your 2.6x, for 7.4x over @y.lin's, and yet 2x more should be possible. I was trying out different things (e.g. log/exp to get rid of the division; scroll right to see) that didn't pan out, and also just lower precision (which did work, but uncovered a minor bug in LoopVectorization):
```julia
julia> @time ( function likelihood4(o, a, b, h₁, y2)
           h = zeros(Float32, length(y2)) # h = similar(y2)
           lik = Float16(0.0)
           h[begin] = h₁; my_range = eachindex(y2)[begin+1:end]
           @inbounds for i in my_range
               h[i] = o + a * y2[i-1] + b * h[i-1]
           end
           @turbo for i in my_range
               lik += log(convert(Float16, h[i])) + convert(Float16, y2[i]) / convert(Float16, h[i]) # log_h = log(h[i]); lik += log_h + ℯ^(log(y2[i]) - log_h)
           end
           return lik
       end )
  0.000485 seconds (54 allocations: 9.308 KiB) # that timing is bogus!
```
```julia
julia> @btime lik2 = likelihood4($o, $a, $b, $v, $y2)
  22.606 μs (1 allocation: 11.88 KiB)
26360.732f0

julia> @btime lik2 = likelihood2($o, $a, $b, $v, $y2)
  64.436 μs (1 allocation: 6.00 KiB)
26360.52849330753
```
@elrod, for some reason I get a Float32 number back, but I should get Float16 back, and I do if `@tturbo` or `@turbo` aren't used, so that's the minor bug (it could lead to type instability elsewhere; not really a bug, just an inconvenience).
I see you updated the appendix:
> Post publication p.s.
> After our article came out, Jorge Vieyra suggested here some ways to speed up the Julia calculation. It makes use of the @turbo macro […] While we think this proposal is interesting, it is not a fair comparison with the calculations above because they depend on specialist knowledge […] Note how @turbo requires rewriting the likelihood function by summing the likelihood separately
First, I did:
```julia
y2 = rand(Float16, 3000); y = y2; # the rest is the same; I didn't have "data.dat" or sensible data; not sure it matters, or the size

@time using LoopVectorization
  3.918259 seconds (5.64 M allocations: 372.844 MiB, 36.23% compilation time: 80% of which was recompilation)
```
To be fair, that time alone swamps the runtime of all the actual likelihood functions, but you should be able to eliminate that small fixed startup cost with a sysimage. Actually compiling the code also took almost 10 seconds, for a combined 12.44 seconds, while compiling the original likelihood function was immediate. That compilation overhead should also be possible to reduce to zero. It's unclear to me if that is LoopVectorization's fault: note that for the original, `@time function likelihood ...` is immediate, and LoopVectorization seems to be missing some precompilation statements; at least if the code were in a package it would be precompiled (packages are not yet fully precompiled, a TODO for Julia, but I think they would be if the package were also in the sysimage).
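For reference, building such a sysimage is short with PackageCompiler.jl (a sketch; the output filename is just illustrative):

```julia
using PackageCompiler
# Bake LoopVectorization into a custom system image so loading it is ~free:
create_sysimage([:LoopVectorization]; sysimage_path = "sys_lv.so")
# Then start Julia with: julia --sysimage sys_lv.so
```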
Is all this "specialist knowledge"? I suppose so for the sysimage, how to reduce latency, and reducing precision. But you only need to reduce precision once in your code (for the `h` array). For the `y2` input array, it's best if it's already in the reduced precision, or you're just shifting work around. For some code that isn't O(n), it might still be worthwhile to make a lower-precision copy (e.g. in the function); for other cases, it would be best handled with something like:
```julia
readdlm(Float16, "data.dat") # Not (yet) possible; Float64 is a sensible default, and I'm not sure this is possible with some other syntax; likely in some other package, CSV.jl?
```
"Half" (aka Float16) isn't defined in the C standard (so is it unfair to use? I don't think so; Julia is just better here). It is available as an extension, at least in GCC.
Thank you all for the thoughtful discussion. If we can insert a link into the published VoxEU piece with a curated list of computational resources for economists, what would you recommend? If no such list exists, we would be happy to host it on our server if you send us recommendations, but it would be better if there was a GitHub page we all could contribute to.
We are here! I think a lot of this is inertia. More senior people still tend to use Fortran, but there's a solid contingent of younger researchers who mostly tend to use Julia. Almost all of the structural people in my PhD cohort (who just went on the market this past year) use Julia, for instance…
That is very fair. I will try to draft some language for that, although, after thinking about it, I think it would be better suited for the documentation in Pkg.jl as a separate PR. Will post back later.
Also, to refresh my memory: if you are using a Project.toml to manage the `test` environment of some package environment (I'll call it `mypkg`), do you not have to specify dependent packages in the `test` Project.toml even if they are used in the `mypkg` environment, per your point in 1? My understanding was that, yes, you can have conditional test dependencies that are only available in a `test` environment, but that the activated `test` environment could not "pull in" packages from the `mypkg` environment without explicitly adding them (except for `mypkg` itself). I may be misunderstanding your first point, but I do understand your second.
Thanks!
I am very confused about the `test` environment, especially after having read Pkg.jl's documentation, which says (if I remember correctly) that the `test` environment is experimental. Therefore, the tutorial you have drafted should help me a lot. I never actually have faith that I am using the `test` functionality correctly.

I tend to run `include("any_test_scripts.jl")` in the `mypkg` environment instead of using the `test` command.
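If it helps, the third-party TestEnv.jl package supports exactly that workflow: it activates the test environment interactively, so the test-only dependencies resolve alongside the package itself (a sketch, assuming TestEnv is installed in your global environment):

```julia
# From the Julia session where the mypkg environment is active:
using TestEnv
TestEnv.activate()           # activates test/Project.toml plus mypkg itself
include("test/runtests.jl")  # test-only deps now load without `Pkg.test()`
```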
Hopefully relevant enough to necropost: I just came across this
https://nittanylion.github.io/Grumps.jl/dev/
which implements a cutting-edge estimator for BLP-style models described in this paper:
so it seems that yes, structural IO people are using Julia after all!