Unit-test mapping/layout convention/style-guide

testing

#1

Hi,
As I’ve been working through some of the Base, Core and Julia packages, I’ve noted that the src-test mapping/layout conventions (or the lack of them), more often than not, increase the burden of exploring Julia and most Julia packages.

One way to discover Julia best practices and understand what ‘works’ is to look at test cases for a method/function/type. So this burden is particularly high for users new to Julia, but experienced in other languages or domains where TDD/BDD is common.

While seemingly trivial, this ambiguity/confusion can elevate the level of frustration a first-time user experiences.
Given that an old hand at Julia will often be a first-time user of Package X, this frustration will not diminish with experience. Or it will, but only if I restrict myself to a subset of packages.

The Style-Guide section is silent on testing conventions, apart from recommending use of isa or <: for testing types.

One convention that makes it trivial to discover the tests related to some code is:
For the source file location

<pkg-name>/src/models/linear.jl

the convention is: tests are in location

<pkg-name>/test/models/linear.jl

A happy side effect is that, apart from aiding humans, such a convention also aids tooling:

  • when src file ./src/dir1/a.jl changes
  • rerun the test file ./test/dir1/a.jl
  • etc.
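
To make the tooling point concrete, here is a minimal sketch of the mapping, assuming the package root contains sibling `src/` and `test/` directories. The helper name `test_path_for` is hypothetical and not part of any existing package.

```julia
# Hypothetical helper: map a source file to its mirrored test file.
# Assumes the path contains a `src` directory that has a sibling `test` directory.
function test_path_for(src_path::AbstractString)
    parts = splitpath(normpath(src_path))
    idx = findfirst(==("src"), parts)
    idx === nothing && throw(ArgumentError("no `src` directory in path: $src_path"))
    # Keep everything before `src`, swap in `test`, keep everything after.
    return joinpath(parts[1:idx-1]..., "test", parts[idx+1:end]...)
end

# Usage: a file watcher could call this to decide which test file to rerun.
println(test_path_for("./src/dir1/a.jl"))
```

A watcher would only need this one function plus an `isfile` check to decide whether a matching test file exists.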

My questions:

  1. Has using such a src-test mapping convention been rejected? Or is there some other src-test mapping convention I’m unaware of?
  2. If there is not a community wide consensus: Is there a convention used by Base and Core packages? Anywhere this is articulated for Base and Core packages?

Post-script:
This post relates to the mental load of a new user in the context of an IDE. The context here is: once in your IDE (browsing GitHub), how do you efficiently get to view the tests related to the definition(s) in a source file?


#2

I am not sure about this: using grep (in particular, rg) I can find all invocations in an instant, without leaving my editor (Emacs).

That said, I think that most standard libraries have a decent layout and were easy to contribute to.

But I wonder if multiple dispatch makes a very strict convention somewhat difficult: eg if I define most of the generics in interface.jl, and a particular composite type and its implementation in foo.jl, should all invocations of a generic function go in a single file, or should I separate them by type? Both have their advantages and disadvantages, especially when testing can be combined for types sharing the same interface, so I would leave this to the package authors.

Can you give an example of a convention used in a language with multiple dispatch (eg Common Lisp, or Dylan?)


#3

Agreed, there is nothing that cannot be worked around.
Bear in mind your premise of a new user doing:

  1. git clone
  2. start emacs, navigate to project

Just to browse the test of something observed in file xyz.jl
Github search is OK, but in several of my cases searching for the Type/Method of interest returned 7 web pages… and you guessed it: the test was in a different file and on the 7th page Github returned :frowning:

This is an interesting question/point.
Or I may have been ambiguous in my opening post…
If interface.jl is located here (for example):

./src/api/rest/interface.jl

Then I’d know that any tests of the contents of that file are located here (by convention):

./test/api/rest/interface.jl

Perhaps this is what you meant:
MyType{T} where T <: MyPlotType = .... is implemented in file ./src/plot/generics.jl
However, because it makes sense in the domain use case:
MyType{T} where T <: MyWireType = .... is naturally implemented in file ./src/wire/api/specials.jl

Under this (voluntary) convention, you’d know exactly where to look to find the tests relevant to each:
./test/plot/generics.jl to see all the behaviours of MyType{T} where T <: MyPlotType
and then
./test/wire/api/specials.jl to see all the behaviours of MyType{T} where T <: MyWireType

I have a very rudimentary understanding of Julia, but I don’t believe there is anything about a convention for where test files are saved that would be specific to, or impact, multiple dispatch languages.


#4

To elaborate, consider 3 files:

  • interface.jl which defines an interface on two objects, eg with the method stuff(a, b)
  • foo.jl which defines Foo,
  • bar.jl which defines Bar.

Where should one test stuff(::Foo, ::Bar)?
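
For concreteness, the scenario can be sketched in one self-contained snippet. The contents attributed to interface.jl, foo.jl and bar.jl below are assumptions made for illustration only; the question is which test file should own the final test set.

```julia
using Test

# interface.jl (hypothetical contents): the generic function and a fallback method
stuff(a, b) = "generic"

# foo.jl and bar.jl (hypothetical contents): the concrete types
struct Foo end
struct Bar end

# A method specialised on both types; it does not obviously "belong" to either file
stuff(::Foo, ::Bar) = "foo-bar"

# Where should this test set live: test/interface.jl, test/foo.jl or test/bar.jl?
@testset "stuff(::Foo, ::Bar)" begin
    @test stuff(Foo(), Bar()) == "foo-bar"  # dispatches to the specialised method
    @test stuff(1, 2) == "generic"          # falls back to the generic method
end
```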


#5

If ./src/api/wire/interface.jl is where stuff(a, b) is defined, then its tests go in ./test/api/wire/interface.jl.
Anything funky about Foo or Bar should be specified in the test files for the source files foo.jl and bar.jl, e.g. the different errors they can raise, different return types, etc. etc.
All interactions/side-effects of Foo with Bar, and vice versa (these would be a code smell in Julia?), are in the category of “funky”:

  1. Things about Bar that change how Foo behaves, change the Foo test results, hence belong in ./test/foo.jl, e.g. what exception does Foo return in the state that Bar raises KError
  2. Things about Foo that change how Bar behaves, change the Bar test results, hence belong in ./test/bar.jl
  3. Only those final interactions/side-effects of Foo and Bar that affect the end result produced by stuff(a,b) would go into ./test/api/wire/interface.jl, e.g. what exception does stuff(a,b) raise when Bar raises KError and Foo raises LError, etc.

The intuition is: if you’re writing a case that tests the behavior of type F or method f() then the test result relates to the source file where that behavior is implemented.

I think this case is a standard scenario of unit testing how methods interact with each other, and is not specific to multiple dispatch - unless I’ve misunderstood some Julia-fu?

I should note that given Julia does not (currently) have something like Cucumber (Given/When/Then), I find myself writing functional tests and unit tests in the one file, and this can be confusing unless test set comments are informative.
Integration tests are best left to test-kitchen (and the like) - in fact while exploring Julia I’ve found myself thinking more about how to push functional tests to the integration category just to take advantage of all that tooling, but there is a limit to this workaround.


#6

I would consider this possibility. I think what you call interactions/side-effects are a red herring.

To make things concrete, consider LinearAlgebra.

Following your recommendation, where would you put tests for *(a, b)? Base.* is defined in, well, Base, so strictly speaking there is not even a matching file. Should they nevertheless go in a single file, with all tests for all Base methods? That would be a very large file.

I think you are trying to carry over thinking from another, possibly OOP, language. This can occasionally be beneficial, but can also lead to problems. Also, for reasons given above, I think that multiple dispatch is crucial.


#7

Point taken.

Where is that function defined in that library?


#8

It isn’t. As I said above, it is defined in Base.


#9

Then the Base tests suffice.

Until they don’t (src/matmul.jl#L38):

In which case a new user would intuit to look in ../stdlib/LinearAlgebra/test/matmul.jl to find any/all tests related to that:

In this case the convention would require no changes.


#10

Let me give a slightly different example. I define stuff(::Any) in interface.jl, stuff(::Foo) in foo.jl and stuff(::Bar) in bar.jl. For my own brain, having different test sets for stuff() scattered across multiple files would be a pain in the butt, especially if there’s a bunch of setup required for my tests.


#11

Maybe the convention could simply be: add the information of the test file location to the docstring? If done right, would it also work as a link in github?


#12

That has the added problem of having to keep track of where the tests for a specific thing are at. It seems to me that this gets old FAST, especially with the number of functions from e.g. Base.


#13

To be clear, having different definitions of a function/method across multiple files is to my mind useful, and may even be best practice. I don’t want to be thought to be arguing for anything that constrains where you write your implementation code :wink:

So where you locate your tests:
I’m not sure I understand how this can scale beyond small, single-author packages - and this was part of my opening question: what logic or convention are you using?

Consider this:

  • by good/proper design you have multiple src files:
    "… stuff(::Any) in interface.jl , stuff(::Foo) in foo.jl and stuff(::Bar) in bar.jl"
  • Can you state the logic that leads anyone else to open the correct one of three possible test files?

It seems predictable to me that what will likely happen is either:

  • one ends up getting frustrated at opening the wrong file, so one ends up putting all the tests in runtests.jl or some such monster.
  • one throws reason, logic and discipline away and just searches for all occurrences of a string as suggested by @Tamas_Papp, and hopes never to run into a large, multiple-author, years-long project (we can always work around large packages by splitting them and delegating management to Pkg).

The problem with all tests in one file is that test run times become prohibitive, and eventually dev teams start resenting the test suite (it makes them appear less productive than they are), start taking shortcuts, and spend their skills on thinking of ways to minimize test usage or, worse, subverting the role of coverage metrics.

Test set up and tear down times are a perennial issue, and constantly revisited and refactored. You can’t escape it - learn to use it to your advantage (e.g. parallel or distributed test runs).
Again, multiple test files are unavoidable in anything other than small use cases.
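
As a rough sketch of how multiple test files can stay manageable, a runner can discover them rather than list them by hand. The names `discover_test_files` and `run_all` are hypothetical, and the sketch assumes every `.jl` file under the test directory other than the runner is a test file.

```julia
using Test

# Hypothetical helper: collect every `.jl` file under `test_dir`, skipping the runner itself.
function discover_test_files(test_dir::AbstractString; runner::AbstractString = "runtests.jl")
    files = String[]
    for (root, _, names) in walkdir(test_dir)
        for name in names
            if endswith(name, ".jl") && name != runner
                push!(files, joinpath(root, name))
            end
        end
    end
    return sort!(files)
end

# Hypothetical runner: one test set per discovered file, labelled by its relative path.
# In a real runtests.jl this would be invoked as `run_all(@__DIR__)`.
function run_all(test_dir::AbstractString)
    @testset "$(relpath(file, test_dir))" for file in discover_test_files(test_dir)
        include(file)
    end
end
```

Because each file gets its own test set, the same list of files could also be split across workers for parallel runs, though that is outside this sketch.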

Ack: Julia’s test infrastructure packages are still in their infancy, and I’m not critical of that.


#14

This could work in theory. In practice? Not so much - would that I could change human nature and developer incentives.

This would also be a change requiring massive effort - to see this just look at how many functions here have doc strings:

Unfortunately, the reality is that many critical functions never rise to the level of having docs generated. These are often the subject of extensive tests and are also often refactored (usually for performance reasons, but also as project or package functionality changes) - which leads to the documentation locations changing.

I think with this approach you will have swapped hunting for the test files for hunting for doc files. Or, in @Tamas_Papp’s workflow, grep’ing for doc strings.

If this approach were adopted as a recommended convention/practice, Julia would likely need to support dual documentation management tools, out of the box and painlessly.


#15

Apologies for being obtuse. What does “That” refer to?


#16

Sorry for not being clear about it, I was referring to @Tero_Frondelius’s post here.


#17

I think this is a very loaded way of phrasing the question.

For these problems, style guides and tooling (of which grep is the simplest and most readily available) are somewhat complementary, and one should be ready to pick the solution that’s the best match for a specific setup.

Depending on team practices and the context (think: open source collaborators, composed of a small core team and a bunch of others just making the occasional PR), adhering very rigidly to style conventions can also impose a significant cost. Substituting them with other tools is not “throwing reason away”, it is just a choice.


#18

I don’t believe I’ve advocated rigid adherence. If I did, apologies, that wasn’t my intention.
I would expect that in base or stdlib, if any convention existed and was deviated from, there would be some kind of comment or note indicating where the remaining test case(s) could be found.

I have asked how people ensure anyone else can look at a piece of source code and from that reliably reason where to find all the test(s) for that code.

I think it’s reasonable and accurate to describe grepping an arbitrarily large number of files, and then visually scanning the result set, as not employing reason, or even deduction - in the common understanding of those words.

When you’re paying for people’s time on large projects such issues are non-trivial, and can even be make-or-break - CPU cycles are dirt-cheap - and getting cheaper. Human cycles are extremely expensive - and getting more expensive.

When I or others have discussed Julia as not ready for prime time it is usually around such human-efficiency issues. Think of project(s) large enough that a dedicated, skilled, experienced and intelligent person looks after tracking issues related to one of many components.

I think I’ve responded to the questions posed such that it is fair to conclude from the responses:

  1. There is currently no suggested or recommended community style or best practice.
  2. There is nothing about a multiple dispatch language that constrains how you arrange code, nor map src code to test cases.
  3. There is currently no agreement of the need, nor appetite, to adopt any convention(s).
  4. The best advice (def: least likely to overlook important test cases) I could give an evaluation team, or anyone exploring the Julia world of code, is to adopt @Tamas_Papp’s suggested workflow:
  • Git clone the source
  • grep or the equivalent in your environment
  • file by file evaluation of the result set (planned/budgeted under the assumption/expectation this set could be large and idiosyncratically organized - depending on developer whim)
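
For anyone without grep or rg to hand, the search step can be approximated in Julia itself. This is only a sketch: `files_mentioning` is a hypothetical name, and it does a plain substring match, so the result set still needs the file-by-file evaluation described above.

```julia
# Hypothetical helper: list the `.jl` files under `dir` whose text contains `name`.
# Plain substring matching only; comments and unrelated identifiers will also match.
function files_mentioning(name::AbstractString, dir::AbstractString)
    hits = String[]
    for (root, _, files) in walkdir(dir)
        for file in files
            endswith(file, ".jl") || continue
            path = joinpath(root, file)
            if occursin(name, read(path, String))
                push!(hits, path)
            end
        end
    end
    return sort!(hits)
end

# Usage (assuming a cloned package at the hypothetical path "SomePkg"):
# files_mentioning("stuff(", joinpath("SomePkg", "test"))
```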

An open question is whether the cost of setting up parallel/distributed test harnesses could be defrayed by the benefit of testing third party packages - there are good incentives to write test files larger than they might otherwise be - again this will depend on developer whim.