Checking if function input is DataFrame

jo-fleck · March 7, 2021, 11:54pm

I wrote a function that inspects a data frame and, if its content meets certain requirements, sends it to a server. To make sure (new) users can easily understand which aspect of the content needs to be changed when the upload fails, I added an explicit error if the function input is not a data frame.

function my_fun(df_in)

    if typeof(df_in) != DataFrame error("Input must be a data frame") end

...
end

However, when running the Juno profiler on the function (I am new to measuring performance - please tell me if this is a terrible approach), I noticed this line is very slow - it accounts for 50% of elapsed time.

I assume using my_fun(df_in::DataFrame) would improve performance, but then the function would throw a MethodError rather than my specific error message.

Is there a more performance-friendly way to check whether an object is of type DataFrame?

Many thanks for hints and help!

pdeffebach · March 8, 2021, 12:03am

You can do

function my_fun(df_in::DataFrame)
   # do stuff
end

function my_fun(df_in::Any)
    throw(ArgumentError("Must be a DataFrame"))
end
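A minimal usage sketch of this two-method pattern. To keep it runnable without any packages, a hypothetical stand-in struct `FakeFrame` takes the place of `DataFrame`; with DataFrames.jl loaded you would annotate the first method with `DataFrame` instead. The function name `upload_check` is also made up for illustration.

```julia
# Hypothetical stand-in for DataFrame so the sketch runs without DataFrames.jl.
struct FakeFrame
    nrows::Int
end

# Specific method: selected by dispatch when the argument is a FakeFrame.
upload_check(df::FakeFrame) = "ok: $(df.nrows) rows"

# Fallback method: any other input gets a descriptive ArgumentError
# instead of the generic MethodError.
upload_check(x) = throw(ArgumentError("Input must be a data frame, got $(typeof(x))"))

upload_check(FakeFrame(3))   # returns "ok: 3 rows"

# Catch the error to capture the message a caller would see.
msg = try
    upload_check(5)
catch err
    sprint(showerror, err)
end
```

When the argument type is known to the compiler, the method is chosen at compile time, so this pattern should not add a runtime check of its own.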

tbeason · March 8, 2021, 12:28am

@assert typeof(df_in) == DataFrame "Input must be a DataFrame"

I believe this is what you want.
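Two caveats may be worth keeping in mind here. The Julia documentation notes that `@assert` may be disabled at some optimization levels, so an explicit `throw` is the safer choice for validating user input. Also, `typeof(x) == T` only matches the exact type, whereas `x isa T` also accepts subtypes. The sketch below illustrates the difference with hypothetical stand-in types (`AbstractTable`, `Table`, `TableView`), which are assumptions chosen so that it runs without DataFrames.jl.

```julia
# Hypothetical stand-ins: an abstract table type with two concrete subtypes.
abstract type AbstractTable end
struct Table <: AbstractTable end
struct TableView <: AbstractTable end

function check_input(x)
    # `isa` accepts any subtype of AbstractTable; the explicit throw
    # stays in place regardless of how assertions are handled.
    x isa AbstractTable || throw(ArgumentError("Input must be a data frame"))
    return true
end

check_input(Table())             # returns true
check_input(TableView())         # returns true, subtype accepted
typeof(TableView()) == Table     # false, exact-type comparison rejects the view
```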

Satvik · March 8, 2021, 4:20am

This seems like an issue with the profiler. When I tried the following:

function my_fun(df_in)
    if typeof(df_in) != DataFrame a = 1 end
end

@btime my_fun(test_data)
4.583 ns (0 allocations: 0 bytes)

@btime my_fun(5)
0.036 ns (0 allocations: 0 bytes)

There are a few nanoseconds of overhead from the if statement, but that’s it. I couldn’t actually time the error function in the profiler, but from manual testing it was practically instant, so this check should be fast in all cases.

jling · March 8, 2021, 5:41am

It may even get optimized away at compile time:

julia> function f(x)
           if typeof(x) == Int64 throw(ErrorException("Int")) end
           3
       end

julia> @code_native f(3.0)
	.text
; ┌ @ REPL[1]:1 within `f'
	movl	$3, %eax
	retq
	nopw	%cs:(%rax,%rax)
; └

julia> @code_native f("a")
	.text
; ┌ @ REPL[1]:1 within `f'
	movl	$3, %eax
	retq
	nopw	%cs:(%rax,%rax)
; └
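The reason the branch disappears is that Julia compiles a separate specialization for each concrete argument type, so `typeof(x)` is already known when the code is generated. A small sketch to reproduce this outside the REPL, assuming only the `InteractiveUtils` standard library; the function name `g` is arbitrary:

```julia
using InteractiveUtils  # stdlib; provides @code_typed outside the REPL

function g(x)
    # For a given specialization, typeof(x) is a compile-time constant,
    # so the compiler can decide this branch statically.
    if typeof(x) == Int64
        throw(ErrorException("Int"))
    end
    return 3
end

g(3.0)   # returns 3, the check is not taken for Float64
g("a")   # returns 3, same for String

# Inspect the optimized typed code for the Float64 specialization;
# the type comparison should no longer appear in it.
typed = @code_typed g(3.0)
```

Calling `g(3)` still throws, since for the `Int64` specialization the branch is always taken.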

jo-fleck · March 8, 2021, 12:44pm

"This seems like an issue with the profiler"

You’re right. I just commented out the type check and now the profiler shows the next line accounting for 50% of elapsed time. (I might be misinterpreting the flame graph.) I then used @btime to measure performance before and after commenting out all input checks - the difference is negligible.
