Teaching class at University | Automatic Homework Grading


We are planning to offer a university course on image processing in microscopy which includes some exercises.

I was wondering whether anyone has some tips & tricks for automatic homework correction, and what turned out to be feasible for them.

My current approach would have been:
Each homework consists of a new package like IP_Homework_01.jl which the students clone/download via git. They locally ]dev it and load it (with Revise). They fill in the missing gaps in the source code and can play around in the REPL/Jupyter/Pluto.
Once finished, they ]test it and check how much of the code is really working.

Finally, they submit it to the course portal, and I evaluate all the submissions (again with ]test) and see how many of the tests pass.
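A rough sketch of that workflow via the Pkg API (the package name IP_Homework_01 is from above; the local path is a placeholder):

```julia
import Pkg

# Student side: track the local clone so edits to src/ are picked up live
Pkg.develop(path = "path/to/IP_Homework_01")   # same as ]dev in the Pkg REPL

using Revise, IP_Homework_01                   # fill in the gaps, experiment in the REPL

# Run the provided test suite (same as ]test) to see how much already works;
# the grader runs the identical command on each submission.
Pkg.test("IP_Homework_01")
```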

Is that something that is fair and works? We don’t expect too many students (<20), so some issues could also be solved by hand.

I would be happy to also hear your experiences and workflows!



I think often there are tests which you give the students but then there are also “secret” tests which only you have and run.
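A minimal sketch of that split, using the Test stdlib (the function and the tests are made up for illustration):

```julia
using Test

# Reference solution of a task the students would implement themselves.
normalize_image(img) = (img .- minimum(img)) ./ (maximum(img) - minimum(img))

# Public tests: shipped to the students together with the homework.
@testset "public" begin
    @test normalize_image([0.0, 2.0]) == [0.0, 1.0]
end

# "Secret" tests: kept by the instructor and only run at grading time.
@testset "secret" begin
    @test normalize_image([5.0, 5.0, 10.0]) == [0.0, 0.0, 1.0]
end
```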

There are automated frameworks for this, for instance within Jupyter. No idea how easy it is to set up.


You mean something like nbgrader?

I think providing tests which check two different cases is in most cases enough; if someone overfits to them, OK, but then they won’t pass the exam anyway :smiley:

I feel that by doing it the Julian way with testing and packages, they already get quite familiar with the general Julia workflow. Doing it the Jupyter way is probably easier in the first steps from a student’s perspective, but then doesn’t give a nice intro to the package manager, Revise, etc.


I’m also interested to hear people’s opinion about this.

What you suggest seems quite reasonable to me if the scope of the course includes some serious programming. Otherwise I think I would try something lighter, e.g. giving a Jupyter notebook for each homework (together with a Project.toml if required). The notebook would include a few test cells at the bottom, to be used by the student. For grading I would then use NBInclude.jl to include the student’s code and run my own tests with it.
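A hedged sketch of that grading step, assuming NBInclude.jl is available and that the notebook defines some function to grade (the function name, file name, and test are illustrative):

```julia
using NBInclude, Test

# Bring the definitions from the student's notebook into the current scope.
@nbinclude("submissions/student_01.ipynb")

# Run the instructor's own tests against the student's code.
@testset "grading student_01" begin
    @test length(grayscale_histogram(rand(8, 8), 16)) == 16
end
```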


Yeah I agree. Jupyter might be easier since it’s essentially one file.

Hopefully we can find at least 1–2 students for the lab who, after the course, already understand the workflow for developing a Julia package.

I’m also interested in automatic grading. I think @dpsanders has done work on this during the MIT course they gave. We’ll probably see some information about this at their JuliaCon talk!

I personally wouldn’t go down this complicated route of each exercise set being an individual “package” with a test suite and everything. Seems to me like a single Pluto script could get the job done (besides, Pluto now integrates package versions in a notebook).


My experience is that motivated students who are not experienced in programming often get the solution 85–95% right, so it kind of runs, but may be missing some corner (or not so corner) cases or have a small problem or two.

These I grade on the basis of how much change to the code would have been necessary to make it right. This is subjective, and does not scale, but my concern with automated grading based on test cases is that it would not differentiate between completely misguided solutions and tiny bugs. Let’s face it, writing software is hard, and I would find it is unfair to penalize too harshly for the kind of bugs I also make on a daily basis.

So I usually just accept Julia source files and Jupyter notebooks (up to the students, 90% choose the latter), and would run it step by step and eyeball the code and the results.


Yeah, I agree completely on your points.

We also wouldn’t let them work without any guidance for 2 weeks; we’re thinking about virtual sessions where they can come and ask why something is not working, etc.

By providing tests, I feel that we can already solve a lot of problems during development, so the students can check whether something is completely wrong or not.

For example, during my bachelor’s we had lectures with 1.8k people attending, so an automatic test system was needed to handle all those solutions.
I remember that, having access to the tests, we kept working until we got the green signal, which instantly gave us a good feeling that we were not too far off. Without any pre-feedback, students will submit anything to get points and might waste a lot of time on wrong approaches.
To have automatic testing, one definitely has to spend more time providing a proper skeleton.

Of course, in an ideal world we would give code feedback on all solutions and judge them by hand, but since we are not teaching in the CS department, that might be too much…


Full automation is possible, but not advisable for the reasons given by Tamas above.

I recently saw a coding class from the Art of Problem Solving (AOPS). What they typically do is give about 10 questions per week. The first few are short and simple, e.g. predicting the output of a small piece of code without running it, or fixing a snippet with a small error like an “and” instead of an “or” or a missing return. Somewhere between questions 5 and 9 you have proper coding exercises. The AOPS platform allows you to run the code within a cell before submitting, very much like a Pluto/Jupyter notebook. Questions 1–9 are graded automatically. That is, for some of the questions, entering something (anything) is good enough to pass. For some of the questions, you have to enter an exact answer, say some integer, and the system just checks if the integer is correct. It keeps a history of your attempts and penalizes you marginally for incorrect attempts. The last question (usually numbered 10) is a much more advanced exercise and is graded by a person: no automation.
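The exact-answer check with a small per-attempt penalty could be sketched like this (the function name and scoring values are made up):

```julia
# Return the score for a question worth `full` points, deducting `penalty`
# points for each incorrect attempt before the first correct one.
function grade_attempts(correct::Int, attempts::Vector{Int};
                        full = 10.0, penalty = 1.0)
    for (i, a) in enumerate(attempts)
        a == correct && return max(full - penalty * (i - 1), 0.0)
    end
    return 0.0   # never answered correctly
end

grade_attempts(42, [40, 41, 42])   # correct on the third try: 8.0
grade_attempts(42, [7, 8])         # never correct: 0.0
```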

I found this system to have a good balance between properly assessing the student’s work and reasonably managing the grader’s time.

If a student notices that the system does not check the answers to some questions and automatically validates any submission, and abuses this, they won’t be learning and can’t be expected to do well on question 10!


Btw, as follow-up.

We are using a package-like structure where all deps are stored, and students start the Pluto notebook from that environment.
The GitHub repo is here.

Inside the Pluto notebook we activate the same environment again.

For each homework part we provide several tests with PlutoTest, which makes it easy for them to check whether they are roughly on track.
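Such an in-notebook check might look like this (a sketch, assuming a homework task asks for a hypothetical box_blur function):

```julia
using PlutoTest

# PlutoTest's @test renders red/green inside the Pluto notebook,
# so the students see immediately which parts pass.
PlutoTest.@test sum(box_blur(ones(5, 5), 1)) ≈ 25.0
```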

For grading, I (thanks to fonsp) open all submissions with

import Pluto
s = Pluto.ServerSession()
nbfiles = ["a.jl", "b.jl"]                  # the submitted notebook files
Pluto.SessionActions.open.([s], nbfiles)    # open every notebook in one session

After a few minutes I come back and scroll through the homework. Thanks to the red/green tests, you usually get a quick impression of how good the homework is.


Does the student who works out

sed -i 's/@test .*/@test true/' test/runtests.jl

get extra credit ?

The tests are within the student’s notebook, so yes, they could also change the tests there.

But since I spend a few minutes in each notebook, I would probably notice that.

So that it doesn’t make you look :slight_smile:

sed -i 's/@test .*/@test rand() > 0.8 /' test/runtests.jl
