Hello,
I mean this Python package
Thanks.
Which part are you interested in? Unlike Python, Julia supports reproducible environments out of the box, so if that’s your goal, you don’t need a package.
Thanks Tim, I am interested in the part in which you generate a paper PDF with a link to the specific script that generates each plot. For example, in Python the script is included like this:
\begin{figure}
\begin{centering}
\includegraphics{figures/mandelbrot.pdf}
\caption{This is a pretty visualization of the Mandelbrot set.}
\label{fig:mandelbrot}
\script{mandelbrot.py}
\end{centering}
\end{figure}
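For context, a script like mandelbrot.py could look roughly like this; only the file name comes from the thread, and the implementation below is a hypothetical sketch:

```python
# Hypothetical sketch of a figure script like mandelbrot.py
# (only the file name is from the thread; the code is illustrative).
import os

import numpy as np


def mandelbrot(width=400, height=300, max_iter=50):
    """Return escape-iteration counts over a grid in the complex plane."""
    x = np.linspace(-2.0, 0.6, width)
    y = np.linspace(-1.0, 1.0, height)
    c = x[None, :] + 1j * y[:, None]
    z = np.zeros_like(c)
    counts = np.zeros(c.shape, dtype=int)
    for i in range(max_iter):
        mask = np.abs(z) <= 2.0  # points that have not escaped yet
        z[mask] = z[mask] ** 2 + c[mask]
        counts[mask] = i
    return counts


if __name__ == "__main__":
    # Imported here so the computation above stays dependency-light.
    import matplotlib

    matplotlib.use("Agg")  # headless backend, e.g. for CI builds
    import matplotlib.pyplot as plt

    os.makedirs("figures", exist_ok=True)
    plt.imshow(mandelbrot(), cmap="magma", extent=(-2.0, 0.6, -1.0, 1.0))
    plt.savefig("figures/mandelbrot.pdf")  # the path \includegraphics expects
```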
I have used the jlcode package, with some tweaks. Here is an example:
And the repo with how to use the code: GitHub - m3g/jlcode_example: Example of the use of jlcode and the JuliaMono font to write Julia code in LaTeX
Thanks for this resource.
Here are a few more packages you may be interested in for showing output alongside code:
- Literate.jl: generates Markdown from a Julia script
- Weave.jl: generates PDF or HTML from a Julia script
- Pluto.jl: a web notebook that can be exported to HTML or PDF
Note that in Julia it is best to put most code in functions rather than scripts to obtain good performance. So instead of referencing a script mandelbrot.py, you would want to write and then call a function plot_mandelbrot(...). (That function definition may be in the same file or in a separate included file.)
To take your organization one step further, you may want to define all your functions in a package hosted on GitHub. (There are templates set up to make this easy.) Then you could either just reference the package URL in your output figures, or you could use weave on a small script containing only simple function calls to your package to give more explanation.
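As a sketch of what that function-based layout might look like (the function names and plotting package here are illustrative, not from the thread):

```julia
# Hypothetical sketch: a Mandelbrot plot reorganized as Julia functions.
# Names (escape_count, plot_mandelbrot) are illustrative.
using Plots

function escape_count(c::Complex; max_iter::Integer=50)
    z = zero(c)
    for i in 1:max_iter
        z = z^2 + c
        abs2(z) > 4 && return i
    end
    return max_iter
end

function plot_mandelbrot(; width=400, height=300, max_iter=50)
    xs = range(-2.0, 0.6; length=width)
    ys = range(-1.0, 1.0; length=height)
    counts = [escape_count(complex(x, y); max_iter) for y in ys, x in xs]
    heatmap(xs, ys, counts; colorbar=false)
end

# The "script" then reduces to a single call:
# savefig(plot_mandelbrot(), "figures/mandelbrot.pdf")
```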
I think there are some misconceptions in this thread about what ShowYourWork does. I’m one of the maintainers of the package - it does quite a bit more than make code reproducible or generate TeX. While it is written in Python (and JavaScript), it’s actually language independent and you can execute arbitrary scripts at each build step, including Julia code.
It’s basically a way of weaving code execution, data processing, etc., into the source of a LaTeX project in a reproducible way (e.g., \variable{mytable.tex} would declare mytable.tex as a node in the data processing graph, to be generated by some script).
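For instance, a generated table might be declared like this. The table wrapper is illustrative; only \variable{mytable.tex} comes from the description above, and I’m assuming it also inputs the file’s contents at that point:

```latex
\begin{table}
  \centering
  \caption{A table produced by a pipeline script.}
  \label{tab:mytable}
  \variable{mytable.tex}% declares mytable.tex as a node in the build graph
\end{table}
```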
It uses Snakemake (a modern version of make, but one which also allows Python syntax inside the make file) to declare dependencies in the build process, which lazily runs your entire research pipeline from raw data (which might be stored and versioned on Zenodo), to processed data (which might be uploaded to the cloud as well), to final plots, tables, or even single numbers.
The goal is for the command showyourwork to run your entire research analysis pipeline, all the way from raw data to final PDF compilation, in a reproducible way. Changing a single version number of any dependency results in all dependent tasks being re-run.
SYW also has really nice GitHub Actions integration, and will re-generate dependencies of your paper each time something changes. There’s even an action that will generate a latexdiff whenever there is a PR to your SYW-based repo.
It uses conda for version management, which can install specific versions of julia, and you can totally include a {Manifest,Project}.toml and have the Snakemake file re-compute every Julia step whenever those change. (Similarly for whatever other languages are used in your analysis.)
I thought it might also be helpful if I demonstrate how Julia integration would work. Here’s an example repository: GitHub - MilesCranmer/showyourwork_julia_example. Just fork it.
The following modifications were made to the default template:
- Defined src/scripts/paths.jl, replacing src/scripts/paths.py (just a convenience file which defines paths when you include() it).
- Created a Project.toml to define Julia dependencies.
- Created two example scripts in src/scripts/: data.jl, to create a dataset and save it to mydata.csv, and plot.jl, to plot the dataset and save it to myplot.png.
- Created three Snakemake rules:
  - julia_manifest creates Manifest.toml from the Project.toml.
  - data calls data.jl, and depends on Manifest.toml.
  - plot calls plot.jl, and depends on mydata.csv and Manifest.toml.
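As a sketch, the julia_manifest rule could look roughly like this; the exact rule in the example repo may differ, but Pkg.instantiate() is the standard way to materialize a Manifest.toml from a Project.toml:

```
rule julia_manifest:
    input: "Project.toml"
    output: "Manifest.toml"
    shell: "julia --project=. -e 'using Pkg; Pkg.instantiate()'"
```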
- Configured showyourwork.yml to map .jl to julia.
The Snakefile also defines the JULIA_PROJECT as ".". These three Julia jobs are dependencies of the final rule, which compiles the LaTeX document using tectonic. The generated PDF and arXiv tarball will contain myplot.png.
For example, the rule plot:
rule plot:
    input:
        "Manifest.toml",
        data="src/data/mydata.csv"
    output: "src/tex/figures/myplot.png"
    script: "src/scripts/plot.jl"
This Julia script is then able to reference the variable snakemake:
using Gadfly
using Cairo  # PNG backend for Gadfly
using CSV
using DataFrames

# Input/output paths injected by Snakemake's script integration:
input_fname = snakemake.input["data"]
output_fname = snakemake.output[1]

# Read the dataset:
data = open(input_fname, "r") do io
    CSV.read(io, DataFrame)
end

# Plot x vs y:
p = plot(data, x=:x, y=:y, Geom.line)

# Save:
draw(PNG(output_fname, 10cm, 7.5cm), p)
In ms.tex, we can define the corresponding figure as:
\begin{figure}[h!]
\centering
\includegraphics[width=0.5\textwidth]{figures/myplot.png}
\caption{A figure.}
\label{fig:fig1}
\script{../scripts/plot.jl}
\end{figure}
This will add a hyperlink in the compiled PDF to the script used to generate the figure:
(This hyperlink also refers to the exact git SHA at that point in time!)
Hi Miles, thank you very much for your explanation and Julia example. I like the idea of being able to re-run the whole work pipeline that goes from raw data to the paper PDF. I didn’t know that ShowYourWork is language independent.
If instead of conda I use juliaup for version management, could I still use this package?
Thanks.
If instead of conda I use juliaup for version management, could I still use this package?
Definitely. In that example repo I didn’t even use conda. Snakemake will just use the first julia on PATH.
Great, thanks!
Isn’t this workflow exactly what make is made for? I think that latexmk allows very good integration with makefiles, and can actually do all of this under the hood while still producing an auto-updating preview PDF.
More precisely, how is showyourwork an improvement over a latexmk + make workflow?
make is designed for compiling programs, but snakemake is designed for data analysis workflows. See more in the snakemake docs. (You could try to do complex data analysis with make, but it would not be a fun time, especially if you need to allocate cluster resources or cloud services for running expensive steps of your workflow.) Snakemake also has native support for Julia/Python/Rust/Bash scripts, which is a nice bonus.
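To make the contrast concrete, here is a hypothetical Snakefile fragment (file and rule names invented) using plain Python inside the rules, something make has no native equivalent for:

```
# Hypothetical workflow: Python lists and expand() drive the dependency graph.
SAMPLES = ["a", "b", "c"]

rule all:
    input: "results/summary.csv"

rule process:
    input: "data/{sample}.csv"
    output: "processed/{sample}.csv"
    shell: "python process.py {input} {output}"

rule summarize:
    input: expand("processed/{sample}.csv", sample=SAMPLES)
    output: "results/summary.csv"
    shell: "python summarize.py {input} > {output}"
```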
Also, rather than latexmk, showyourwork uses tectonic, which is a modern self-contained LaTeX engine. (Again, to emphasize reproducibility.)
So at its core SYW is an integration of snakemake and tectonic. But it does more than that. There’s a good diagram on the docs which has some of the features (this diagram should be updated to include julia/rust logos for the “scripts” as well, since I think it might confuse people)
The overleaf + zenodo + github integration is handled by SYW, and the final PDF’s figures are tagged with hyperlinks to the analysis script which produced each figure.
Of course if you are just writing a theory paper you don’t need any of this, it’s more for if you want a reproducible way to work with versioned datasets & versioned analysis pipelines & versioned papers.
The GitHub integration is emphasized a lot, with specific actions to build a version of your paper at each commit (both PDF and arXiv tarball), build a latexdiff version for pull requests, etc. For example, if you look at my demonstration pull request here: [demonstration] Change text and change sin to cos by MilesCranmer · Pull Request #1 · MilesCranmer/showyourwork_julia_example · GitHub, you will see that there is a PDF showing the highlighted changes in the paper:
Vanilla snakemake + tectonic is a good option too if you don’t want the other stuff. (But even when I don’t need it, I tend to prefer SYW because of all the automation and features)
I do the same GitHub actions + latexdiff on tagged versions and PR on my papers, this is a very helpful workflow indeed ! Especially, to send back to reviewers after the first round to show them exactly what changed.
But I do that “manually” by writing my actions, makefile and stuff myself (well, I reuse and upgrade them from one paper to the next). I guess that delegating all this management to a purpose-built tool is indeed a very good idea.
You’ve made some fairly good propaganda; I might try SYW on my next project!
Hyperlinks to the scripts that made the figures look like a very interesting idea.
PS: is there a way to make SYW use something other than tectonic (at least locally)? Having to download packages on the fly might cause issues when offline. Or maybe tectonic can be told to download packages from a local repo?
I should be clear that I didn’t create SYW; I just like it enough that I help maintain parts of it. I get nothing out of it if I convince you to use it!
Not that I know of. But tectonic allows you to set a custom package proxy, so you could download CTAN yourself beforehand and point to it. And if you’ve ever previously used a package, it will be cached; i.e., this works the same way as all other modern package managers, including Julia’s (it’s really LaTeX that is the weird one).
Could you please compare it with DrWatson.jl?
They have the same driving motivation of reproducibility, but they fit into the scientific workflow at different stages: DrWatson.jl while performing the research, and ShowYourWork when presenting it. I think you could even use them together.
For example, I think you could specify a certain DrWatson.jl-versioned simulation in the ShowYourWork Snakemake file, and it could query DrWatson.jl for the raw data when running the analysis and compiling the results into the paper.
(Would be cool to even have a simple plugin between them)