Findall/Findfirst across multiple columns in dataframe

Hi all,

I’m trying to check whether a value exists in a dataframe or not and if so return “in dataframe”. In the past I have used:
findall(x → x==“value I want”, Dataframe)

Is there anyway to adapt this formula to search for the value across the dataframe?

When I try to search for a value I know is located somewhere in the dataframe I get the below error:
MethodError: no method matching keys(::DataFrames.DataFrame)
Closest candidates are:
keys(!Matched::Cmd) at process.jl:638
keys(!Matched::Core.SimpleVector) at essentials.jl:603
keys(!Matched::IndexLinear, !Matched::DataFrames.GroupedDataFrame) at

Any suggestions would be greatly appreciated!

Thanks in adavance.

No built-in function for this unfortunately.

Here is something

julia> function find_in(val, df::AbstractDataFrame)
       for col in eachcol(df)
           val in col && return true
       end
       return false
       end
3 Likes

Another one :slight_smile:

julia> df = DataFrame(a=4:6, b=0)
3×2 DataFrame
 Row │ a      b     
     │ Int64  Int64 
─────┼──────────────
   1 │     4      0
   2 │     5      0
   3 │     6      0

julia> findfirst(∋(5), eachcol(df)) !== nothing
true
4 Likes

Hi!

Thank you for replying!

When I try to run this I get the below error:
MethodError: no method matching ∋(::Int64)
Closest candidates are:
∋(::Any, ::Any) at operators.jl:1069
Stacktrace:
[1] top-level scope at none:1

Any idea why this might be happening?

Probably an older version of Julia: the one-argument method was added in 1.6.

3 Likes

Hi,

Thank you for your reply. I have been trying to adapt your function for what I am trying to do and apologise for not being specific enough in my initial post.

Some context to what I am trying to do:

I have 2 seperate dataframes Asset_Data and Project Data.
Asset_Data has 3 columns:
asset_id, country, cashflow

The Project Data dataframe has multiple columns (300 different columns)
Project name, type 1, type 2, …

I am trying to create a new column in my Asset_Data dataframe “Asset ID/Project” which identifies whether the asset_id is located in the Project Data dataframe or not.

Any idea how to do something like this?

What I have tried:
I have tried passing the values individually to check if it works - which it does. However, when I try to loop through the values from the Asset Data I’m finding issues with creating my new “Asset ID/Project” column.

transform!(Asset_Data, :asset_id => a->find_in(a, ProjectData) => :inproject)
1 Like

Thank you for the response. When I tried this it produces a new columns “asset_id_function” and all values are the same false => :inproject.

Any idea how to change this? thanks

Sorry, I omitted required parentheses:

transform!(Asset_Data, :asset_id => (a->find_in(a, ProjectData)) => :inproject)
2 Likes

Thank you this works. However, I noticed it is producing a column full of false. I think I am having issues find the value in the projects dataframe when the asset_id is not in the first column.

Try this one

transform!(Asset_Data, :asset_id => (a->find_in.(a, Ref(ProjectData))) => :inproject)
1 Like

Hey, thanks again for responding!

When I try this i get the following error:
LoadError: syntax: “$” expression outside quote around

Typo, no $.

1 Like

This produces the same column as previous (I.e. false).

I think the issue might be the functions itself. I don’t think the find_all function is actually finding the values when they not located in the first column of the Project Data frame.

julia> function find_in(val, df::AbstractDataFrame)
              for col in eachcol(df)
                  val in col && return true
              end
              return false
              end
find_in (generic function with 1 method)

julia> adf = DataFrame(:asset_id => string.(rand(Int,30)))
30×1 DataFrame
│ Row │ asset_id             │
│     │ String               │
├─────┼──────────────────────┤
│ 1   │ -2736397144903275711 │
│ 2   │ -5009355446030467515 │
│ 3   │ 3016860719257944439  │
│ 4   │ -6219914729618723688 │
│ 5   │ 7089418941294219076  │
│ 6   │ 7798049048074901135  │
│ 7   │ -1184777874901081190 │
│ 8   │ -8988294207548622192 │
│ 9   │ 2489031843846888792  │
│ 10  │ -5616582555900742121 │
│ 11  │ -8129534410659833719 │
│ 12  │ -2412467793492652638 │
│ 13  │ 1833479972888884497  │
│ 14  │ 7275841467256767530  │
│ 15  │ 1407559908209885941  │
│ 16  │ 6768678267236265035  │
│ 17  │ 8222936811673823306  │
│ 18  │ -4546108583101607217 │
│ 19  │ 8378684577093918191  │
│ 20  │ -4237121901420970    │
│ 21  │ 9060470584731928438  │
│ 22  │ -8677251945165029743 │
│ 23  │ 2885065925014277165  │
│ 24  │ -4974471416155912343 │
│ 25  │ -3966713344091129412 │
│ 26  │ -3759254107417327886 │
│ 27  │ -2176385816462454709 │
│ 28  │ 3378276156966501063  │
│ 29  │ 5572543475657793822  │
│ 30  │ -4331938757200166268 │

julia> ProjectData = DataFrame(["Col $i" => string.(zeros(Int,30)) for i in 1:10]);

julia> for a in adf.asset_id
       if a[1]!='-'
       row = rand(1:30); col = "Col $(rand(1:10))"; ProjectData[row , col] = a
       end
       end

julia> ProjectData
30×10 DataFrame. Omitted printing of 3 columns
│ Row │ Col 1               │ Col 2               │ Col 3  │ Col 4               │ Col 5               │ Col 6               │ Col 7  │
│     │ String              │ String              │ String │ String              │ String              │ String              │ String │
├─────┼─────────────────────┼─────────────────────┼────────┼─────────────────────┼─────────────────────┼─────────────────────┼────────┤
│ 1   │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 1407559908209885941 │ 0      │
│ 2   │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 3   │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 4   │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 5   │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 6   │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 7   │ 6768678267236265035 │ 0                   │ 0      │ 3378276156966501063 │ 0                   │ 0                   │ 0      │
│ 8   │ 0                   │ 9060470584731928438 │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 9   │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 10  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 11  │ 7275841467256767530 │ 1833479972888884497 │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 12  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 13  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 7798049048074901135 │ 0      │
│ 14  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 15  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 16  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 17  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 18  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 19  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 20  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 2885065925014277165 │ 0      │
│ 21  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 5572543475657793822 │ 0      │
│ 22  │ 0                   │ 0                   │ 0      │ 3016860719257944439 │ 0                   │ 0                   │ 0      │
│ 23  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 24  │ 0                   │ 0                   │ 0      │ 0                   │ 8378684577093918191 │ 0                   │ 0      │
│ 25  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 26  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 8222936811673823306 │ 0      │
│ 27  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 28  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 29  │ 0                   │ 0                   │ 0      │ 0                   │ 0                   │ 0                   │ 0      │
│ 30  │ 0                   │ 2489031843846888792 │ 0      │ 0                   │ 0                   │ 0                   │ 0      │


julia> transform!(adf, :asset_id => (a->find_in.(a, Ref(ProjectData))) => :inproject)
30×2 DataFrame
│ Row │ asset_id             │ inproject │
│     │ String               │ Bool      │
├─────┼──────────────────────┼───────────┤
│ 1   │ -2736397144903275711 │ 0         │
│ 2   │ -5009355446030467515 │ 0         │
│ 3   │ 3016860719257944439  │ 1         │
│ 4   │ -6219914729618723688 │ 0         │
│ 5   │ 7089418941294219076  │ 1         │
│ 6   │ 7798049048074901135  │ 1         │
│ 7   │ -1184777874901081190 │ 0         │
│ 8   │ -8988294207548622192 │ 0         │
│ 9   │ 2489031843846888792  │ 1         │
│ 10  │ -5616582555900742121 │ 0         │
│ 11  │ -8129534410659833719 │ 0         │
│ 12  │ -2412467793492652638 │ 0         │
│ 13  │ 1833479972888884497  │ 1         │
│ 14  │ 7275841467256767530  │ 1         │
│ 15  │ 1407559908209885941  │ 1         │
│ 16  │ 6768678267236265035  │ 1         │
│ 17  │ 8222936811673823306  │ 1         │
│ 18  │ -4546108583101607217 │ 0         │
│ 19  │ 8378684577093918191  │ 1         │
│ 20  │ -4237121901420970    │ 0         │
│ 21  │ 9060470584731928438  │ 1         │
│ 22  │ -8677251945165029743 │ 0         │
│ 23  │ 2885065925014277165  │ 1         │
│ 24  │ -4974471416155912343 │ 0         │
│ 25  │ -3966713344091129412 │ 0         │
│ 26  │ -3759254107417327886 │ 0         │
│ 27  │ -2176385816462454709 │ 0         │
│ 28  │ 3378276156966501063  │ 1         │
│ 29  │ 5572543475657793822  │ 1         │
│ 30  │ -4331938757200166268 │ 0         │

2 Likes

Thank you for the detailed worked example - much appreciated. I have been able to replicate this in Julia and can see that the function does work (my apologies).

I have also been able to replicate using my worked example - thank you all, your time is very much appreciated :smiley:

1 Like