This thread is very clarifying, thanks so much!
Nevertheless, there is one case where I still cannot see the light: what happens when the source is itself an Arrow.Stream? In that case it is not obvious to me how to convert each table (i.e. each partition) to a DataFrame and then back into a new partition… in my naive understanding, something like this should work:
using Arrow
using DataFrames
using IterTools

MassivePartitionedTable_Input = Arrow.Stream("inputFile.arrow")

# Process the first partition separately so the output file gets created with Arrow.write;
# the remaining partitions are then appended to it.
firstPartition, restPartitions = firstrest(MassivePartitionedTable_Input)
df_firstPartition = DataFrame(firstPartition)
largerTableOutput = DoesSomeThingOnThisPartition(df_firstPartition)
Arrow.write("outputFile.arrow", largerTableOutput)

for eachPartition in restPartitions
    df_eachPartition = DataFrame(eachPartition)
    largerTableOutput = DoesSomeThingOnThisPartition(df_eachPartition)
    Arrow.append("outputFile.arrow", largerTableOutput)
end
However, I get the following error:
ERROR: MethodError: no method matching append(::WindowsPath, ::DataFrame)
I am not sure how to proceed… should I convert largerTableOutput back to a table with a single partition before appending?
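For example, something like this, continuing from the loop above? This is just my guess; I am assuming that Arrow.tobuffer followed by Arrow.Table gives me back the processed DataFrame as an Arrow table with one partition:

# Just my guess: serialize the processed DataFrame to an in-memory arrow buffer
# and read it back as an Arrow.Table (a single record batch) before appending.
arrowPartition = Arrow.Table(Arrow.tobuffer(largerTableOutput))
Arrow.append("outputFile.arrow", arrowPartition)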
Sorry… in this case I find the documentation a bit fuzzy. I would really appreciate your ideas.
Javier