Is it possible to join DataFrame with Arrow Table ensuring unique rows without bringing Arrow Table into RAM?

So, combining @JorizovdZ and @rocco_sprmnt21's solution here with @bkamins's solution here, I think the best way so far to combine a new DataFrame with an existing Arrow file, ensuring unique rows without bringing the Arrow file into memory, would be something like:

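using Arrow, DataFrames
database = "filepath"
# Write dataDF in the Arrow stream format: file = false is required for Arrow.append to work later
Arrow.write(database, dataDF; file = false)
# Arrow.Table memory-maps the file rather than reading it all into RAM
dataDF = DataFrame(Arrow.Table(database))
# Keep only the rows of NewDF that are not already in dataDF, matching on the shared columns
NewDF = antijoin(NewDF, dataDF, on = intersect(names(NewDF), names(dataDF)))
Arrow.append(database, NewDF)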