Checking if function input is DataFrame

I wrote a function which inspects a data frame and, if its content meets certain requirements, sends it to a server. To make sure (new) users can easily understand which aspect of the content needs to be changed in case the upload fails, I added an explicit error if the function input is not a data frame.

function my_fun(df_in)

    if typeof(df_in) != DataFrame error("Input must be a data frame") end

...
end

However, when running the Juno profiler on the function (I am new to measuring performance - please tell me if this is a terrible approach), I noticed this line is very slow - it accounts for 50% of elapsed time.

I assume using my_fun(df_in::DataFrame) would improve performance but then the function would throw a method error as opposed to my specific error message.

Is there a performance friendlier way to check if an object is of type DataFrame?

Many thanks for hints and help!

You can do

function my_fun(df_in::DataFrame)
   # do stuff
end

function my_fun(df_in::Any)
    throw(ArgumentError("Must be a DataFrame"))
end

@assert typeof(df_in) == DataFrame "Input must be a DataFrame"

I believe this is what you want.
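One caveat with the `@assert` version: the Julia documentation notes that assertions may be disabled at some optimization levels, so they are meant as a debugging aid rather than as input validation. A minimal sketch of an explicit check that keeps the custom message is below. `Table` is a hypothetical stand-in for `DataFrame` so the snippet runs without installing any packages, and the function body is only a placeholder.

```julia
# Hypothetical stand-in for DataFrames.DataFrame (assumption, not the real type).
struct Table; ncols::Int; end

function my_fun(df_in)
    # `isa` tests the type; throw a descriptive ArgumentError when it fails.
    df_in isa Table ||
        throw(ArgumentError("Input must be a data frame, got $(typeof(df_in))"))
    return df_in.ncols  # placeholder for the real inspection/upload logic
end

my_fun(Table(3))  # passes the check
```

Calling `my_fun(5)` raises the `ArgumentError` with the custom message instead of a `MethodError`.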


This seems like an issue with the profiler. I tried the following:

function my_fun(df_in)
    if typeof(df_in) != DataFrame a = 1 end
end

@btime my_fun(test_data)
4.583 ns (0 allocations: 0 bytes)

@btime my_fun(5)
0.036 ns (0 allocations: 0 bytes)

There are a few nanoseconds of overhead from the if statement, but that’s it. I couldn’t actually time the error function in the profiler, but from manual testing it was practically instant, so this check should be fast in all cases.

It may even get optimized away at compile time:

julia> function f(x)
           if typeof(x) == Int64 error("Int") end
           3
       end

julia> @code_native f(3.0)
	.text
; ┌ @ REPL[1]:1 within `f'
	movl	$3, %eax
	retq
	nopw	%cs:(%rax,%rax)
; └

julia> @code_native f("a")
	.text
; ┌ @ REPL[1]:1 within `f'
	movl	$3, %eax
	retq
	nopw	%cs:(%rax,%rax)
; └
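For readers who would rather not read assembly, the same thing can be inspected at the typed-IR level. This is only a sketch: it uses `@code_typed` from the `InteractiveUtils` standard library, and the exact printed IR depends on the Julia version, so no output is shown here.

```julia
using InteractiveUtils  # provides @code_typed outside the REPL

function f(x)
    # The type of `x` is known when the method is compiled for a given
    # argument type, so the compiler can decide this branch ahead of time.
    if typeof(x) == Int64
        error("Int")
    end
    return 3
end

# Print the optimized typed IR for a Float64 argument.
display(@code_typed f(3.0))
```

If the branch has been eliminated for `Float64`, the printed body should contain little more than the `return`.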

"This seems like an issue with the profiler"

You’re right. I just commented out the type check and now the profiler shows the next line to account for 50% of elapsed time. (I might be misinterpreting the flame graph.) I then used @btime to measure performance before and after commenting out all input checks - the difference is negligible.
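
A dependency-free sketch of that kind of before/after comparison is below. It uses `@elapsed` from Base instead of `@btime`, so the numbers are much rougher than what BenchmarkTools reports. `Table`, `checked`, `unchecked` and `total` are hypothetical names made up for this illustration.

```julia
# Hypothetical stand-in for DataFrames.DataFrame (assumption, not the real type).
struct Table; ncols::Int; end

checked(x) = x isa Table ? x.ncols : -1   # with an input check
unchecked(x::Table) = x.ncols             # without an input check

# Call `f` on every element so the cost of one call is averaged over many.
function total(f, xs)
    s = 0
    for x in xs
        s += f(x)
    end
    return s
end

xs = [Table(i) for i in 1:1_000_000]

# Warm up first so compilation time is not included in the measurement.
total(checked, xs); total(unchecked, xs)

t_checked = @elapsed total(checked, xs)
t_unchecked = @elapsed total(unchecked, xs)
println("checked: ", t_checked, " s, unchecked: ", t_unchecked, " s")
```

Single `@elapsed` runs are noisy, so repeating the measurement a few times gives a fairer picture.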