Why does append!(df1, df2) return a DataFrame, as well as modify the argument df1?

world-peace · November 6, 2024, 11:03am

It seems like the append! function, as defined in the DataFrames package, both returns a value and modifies one of its arguments.

This seems strange and I am struggling to understand why this happens.

I picked up on this because I had a function which called append! just before returning.

function example(df1, df2)::Nothing
    # some stuff here ...
    append!(df1, df2)
end

This function produces an error because it does not return nothing. It returns the value returned by append!.
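A minimal sketch of the failure, using plain `Vector`s instead of DataFrames so that it runs without extra packages (the name `example_vec` is made up for illustration). The `::Nothing` annotation makes Julia try to convert the value of the last expression, here the vector returned by `append!`, to `nothing`, and that conversion fails:

```julia
# Hypothetical stand-in for the function above, with Vectors instead of DataFrames.
function example_vec(v1, v2)::Nothing
    append!(v1, v2)   # last expression, so its value becomes the return value
end

a = [1, 2]
b = [3, 4]

# The call throws because a Vector cannot be converted to `nothing`.
threw = try
    example_vec(a, b)
    false
catch
    true
end

println(threw)  # true: the call raised an error
println(a)      # [1, 2, 3, 4]: `a` was extended before the conversion failed
```

Note that the mutation has already happened by the time the error is raised, so the annotation only rejects the return value; it does not undo the `append!`.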

kevbonham · November 6, 2024, 11:55am

I’m guessing that DataFrames is just following the semantics that Base uses - when you append! a Vector or something, the modified object is returned as well.
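For reference, a short sketch of the Base behaviour on a plain `Vector` (variable names are arbitrary). The value returned by `append!` is the very same object that was passed in, not a copy:

```julia
v = [1, 2]
r = append!(v, [3, 4])

println(r === v)  # true: the returned value is the mutated vector itself
println(v)        # [1, 2, 3, 4]
```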

Can you say a bit more about why it seems strange to you?

One solution to your specific issue is to add an explicit return nothing at the end of your function if you want. But unless you often need that nothing, there’s no harm in letting the function return the DataFrame (or in simply not annotating the return type).

world-peace · November 6, 2024, 12:15pm

Functions usually either modify one of their arguments, or do not modify their arguments and return a new value.

Normally you would expect one of the two semantics:

isnothing(sort!(df1, df2))
# df1 modified, `nothing` returned

or

df3 = sort(df1, df2)
# neither df1, df2 modified, new value returned

This is not a Julia-specific thing - it’s just sort of how most languages / APIs work.

carstenbauer · November 6, 2024, 12:19pm

The behavior is useful for chaining in-place function calls.

mrufsvold · November 6, 2024, 12:34pm

By returning the modified object, in-place operations can be chained:

vec = map!(x->x^2, append!(get_vec(), (3,4)))

Edit: @carstenbauer beat me to it! Sorry!
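A runnable sketch of the chaining idea. `get_vec` is a hypothetical helper, defined here only so the example is self-contained. The sketch uses the three-argument `map!(f, dest, src)` form, since the two-argument form may not be available for arrays on every Julia version:

```julia
# Hypothetical helper standing in for `get_vec()` from the post above.
get_vec() = [2, 1]

# append! returns the vector it extended, so sort! can operate on it directly.
v = sort!(append!(get_vec(), (3, 4)); rev = true)
println(v)  # [4, 3, 2, 1]

# map! writes the squares back into `v` and returns `v` as well.
w = map!(x -> x^2, v, v)
println(w)        # [16, 9, 4, 1]
println(w === v)  # true: still the same vector throughout
```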

lupemba · November 6, 2024, 12:37pm

I am not so sure about this statement. In my experience, the function pop normally removes the last element from a list and returns it. That is a function that both modifies its input and returns a value.

For example, see JavaScript’s pop and Python’s pop.
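Julia's own `pop!` behaves the same way. A small sketch with arbitrary variable names:

```julia
v = [1, 2, 3]
last_item = pop!(v)

println(last_item)  # 3: the removed element is returned
println(v)          # [1, 2]: the input vector was modified
```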

kevbonham · November 6, 2024, 12:38pm

I mean, I wouldn’t, but maybe that’s because I’ve been using Julia so long. sort! should indeed return the modified value.

I know that one thing that’s different about Julia is that all functions return a value, even if what they return is nothing. For example, I think in Python you can’t actually do foo = print("foo") (though it’s been a long time, I confess I don’t remember).

I think the chaining above is a nice side effect, but I don’t know if that’s the motivation. For me, working in the REPL, it’s a nice feature that I see the outcome of my operations. Also, AFAICT, there’s no downside to returning the value, so why not do it?

kevbonham · November 6, 2024, 12:44pm

This is wrong, you can and it’s None.

DanielVandH · November 6, 2024, 12:47pm

This behaviour is also documented in the docstring for append!

help?> append!
search: append! prepend! swapfield!

  append!(collection, collections...) -> collection.

  For an ordered container collection, add the elements of each collections to the end of it.

  │ Julia 1.6
  │
  │  Specifying multiple collections to be appended requires at least Julia 1.6.

It’s expected to return the collection, as does e.g. push!. This extra return doesn’t give you any extra allocations or anything either, so it doesn’t cause any performance problems if you don’t want to make use of it.

world-peace · November 6, 2024, 1:09pm

A fair point. It’s a shame there wasn’t a second function introduced for this purpose however. (Probably with a different name?)

It could have been append and append!, possibly.

This isn’t the same situation. If pop! returned the input, you would say it was strange. It is normal for a function to take some inputs and return some different thing. What is weird is for a function to take some inputs and return those same things.

For example, the equivalent behaviour in this case would be for pop! to return not just the removed element but also the input container itself, presumably as a tuple or a pair. If this was how it worked, it would be strange.

DanielVandH · November 6, 2024, 1:18pm

But then you’re having to allocate into a new collection. The point is to reuse the same collection and then pass it into another function.

Wispy · November 6, 2024, 1:34pm

Relevant discussion from Martin Fowler: Command Query Separation

Vasily_Pisarev · November 6, 2024, 2:01pm

The original statement is a common pattern in OOP. It is stated as one of the design principles, e.g. in “A Touch of Class” by Bertrand Meyer, with pop mentioned as the only notable exception to the rule. The principle itself probably goes back to Algol, which was one of (if not the) first languages to divide subprograms into procedures, which modify data, and functions, which return values.

That said, Julia is an expression-based language. That means any syntactically valid code unit has a value (it can be used as the right-hand side of an assignment). E.g., the expression x = for i in 1:10; s+=i; end is valid, and x is nothing after evaluation, provided a variable s exists in the evaluation scope. This is why any subprogram, even one that would be considered a procedure in a statement-based language, returns a value. Returning the modified argument for sort!, append!, push!, reverse! etc. is actually convenient for chaining operations, for folds, and for immediate display of the result in the REPL, as others have noted.
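A small sketch of the point about expressions having values. The loop is wrapped in a function (the name `sum_to` is made up) so that `s` is an ordinary local variable and the example does not depend on global scoping rules:

```julia
# Hypothetical helper: sums 1..n and also captures the value of the loop expression.
function sum_to(n)
    s = 0
    x = for i in 1:n   # a `for` loop is an expression; its value is `nothing`
        s += i
    end
    return s, x
end

total, loop_value = sum_to(10)
println(total)                  # 55
println(loop_value === nothing) # true
```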

world-peace · November 6, 2024, 3:25pm

Ok fine - you have convinced me.

I would say there should be three functions for the different use cases, but I don’t genuinely believe that, on balance, that is a better solution. It would require creating a new name for a function just to do this - unless someone were to introduce a symbol like ! which could be used to disambiguate two functions with the same “name”.

kevbonham · November 6, 2024, 3:48pm

I suppose I’m confused about what the different use cases are. As in, what’s the advantage of having a version that returns nothing, since

?

If you don’t want the return value, just don’t (re)assign it. If you want your original function to return nothing, then just write return nothing at the end. It would seem odd to me to have multiple functions with the same purpose, except that one returns nothing.

As an aside, some Julia style guides (and my own preference) suggest explicit returns, so even if you wanted your function to return the dataframe, I would write it as

    ...
    append!(df1, df2)
    return df1
end

world-peace · November 6, 2024, 5:15pm

It fixes this bug

function some()::Nothing
    append!(...,...) # does not return `nothing`
end

carstenbauer · November 6, 2024, 5:27pm

That’s not a bug but a misconception of how Julia works (i.e. its conventions).

Btw, I quite generally recommend explicit return statements in which case this “issue” wouldn’t occur in the first place.

DanielVandH · November 6, 2024, 6:00pm

Functions return the last value if there is no explicit return. Just add a return nothing if you really think it’s important to return nothing

kevbonham · November 6, 2024, 6:39pm

It’s not a bug (as mentioned), but you can “fix” it anyway with

function some()
    append!(...,...) # does not return `nothing`
end

# or

function some()::Nothing
    append!(...,...) # does not return `nothing`
    return nothing
end

# or even 

function some()::Nothing
    append!(...,...) # does not return `nothing`
    nothing
end

# or finally

function nappend!(args...)
    append!(args...)
    return nothing
end

function some()::Nothing
    nappend!(...,...) 
end
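Since the snippets above elide the arguments, here is a runnable sketch of the last variant with plain `Vector`s. `nappend!` is the wrapper from the post; `extend_quietly!` is a hypothetical name for the calling function:

```julia
# Wrapper from the post above: performs the append but discards its return value.
function nappend!(args...)
    append!(args...)
    return nothing
end

# Hypothetical caller with a `::Nothing` annotation; it now passes the conversion check.
function extend_quietly!(v1, v2)::Nothing
    nappend!(v1, v2)
end

a = [1, 2]
result = extend_quietly!(a, [3, 4])

println(result === nothing)  # true
println(a)                   # [1, 2, 3, 4]
```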

bertschi · November 6, 2024, 8:04pm

I really don’t like functions which return nothing, especially if there is a meaningful value that could be returned instead. Here is an example in Java:

Map<String,Integer> h = new HashMap<String,Integer>();
h.put("a", 1);
h.put("b", 2);

This would be much nicer if I could chain method calls as h.put("a", 1).put("b", 2) instead, and fortunately fluent interfaces are becoming more popular in the OOP world as well. In the meantime, some macros, such as Clojure’s doto, go a long way toward providing nicer chaining syntax.
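For comparison, a rough Julia counterpart of the Java snippet, as a sketch only. Because `push!` on a `Dict` returns the dictionary, the two insertions can be nested in one expression:

```julia
h = Dict{String,Int}()

# push! returns the dictionary it modified, so the calls can be nested.
same = push!(push!(h, "a" => 1), "b" => 2)

println(same === h)  # true: the chain hands back the original dictionary
println(h["a"])      # 1
println(h["b"])      # 2
```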
