I used Arrow.write() from Arrow.jl 1.2.4 and read_feather() from R arrow 3.0.0, but the read errored:
Error in ipc___feather___Reader__Open(file) :
Invalid: Not a Feather V1 or Arrow IPC file
How can I read this file from R?
I don't have difficulty with that combination on Ubuntu 20.10
bates$ julia-1.5.3 -t auto -O3
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.3 (2020-11-09)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
julia> using Arrow
(@v1.5) pkg> status Arrow
Status `~/.julia/environments/v1.5/Project.toml`
[69666777] Arrow v1.2.4
julia> Arrow.write("/tmp/arrowtest.arrow", (x = rand(6), f = repeat(["A","B"], inner=3)))
"/tmp/arrowtest.arrow"
julia>
bates$ R
R version 4.0.4 (2021-02-15) -- "Lost Library Book"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> require("arrow")
Loading required package: arrow
Attaching package: ‘arrow’
The following object is masked from ‘package:utils’:
timestamp
> read_feather("/tmp/arrowtest.arrow")
x f
1 0.9617653 A
2 0.4586787 A
3 0.3014748 A
4 0.4049147 B
5 0.6989243 B
6 0.3541891 B
Arrow.write("/tmp/my.arrow", RDatasets.dataset("ggplot2", "diamonds"))
works with R's read_feather("/tmp/my.arrow")
but
open("/tmp/my.arrow", "w") do f
Arrow.write(f, RDatasets.dataset("ggplot2", "diamonds"))
end
gives an error in R
> df = read_feather("/tmp/my.arrow")
Error in ipc___feather___Reader__Open(file) :
Invalid: Not a Feather V1 or Arrow IPC file
I think we may need @quinnj to weigh in here. Arrow.write can write either the file format or the memory format, and I suspect that is the distinction here. The file format starts with the 6-character magic number "ARROW1":
julia> String(read("/tmp/my.arrow")[1:6])
"ARROW1"
Hmmm, weird. So there are 3 distinct formats possible:
So it seems weird that the R package says it expects a Feather V1 or an Arrow IPC file, but seems unable to read Feather V2? Or maybe I'm getting this backwards: from your example, writing to a filename given as a String (which produces Feather V2) works, but writing the IPC stream to an IO handle doesn't?
I think it may be the error message from the R arrow package that is incorrect. I certainly have had no problem reading the Feather V2 format files in R.
Yeah, looking back over the examples, I think that's right. I think they don't support reading the raw arrow IPC messages.
I think another function, read_ipc_stream, is used for reading the raw arrow IPC messages.
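To put the three cases from this thread side by side on the Julia side, here is a minimal sketch. It assumes that Arrow.write accepts a file keyword when given an IO handle, as I recall from the Arrow.jl documentation; please check that against the version you have installed. The file names are made up for illustration.

```julia
using Arrow

tbl = (x = rand(6), f = repeat(["A", "B"], inner = 3))

file_path    = joinpath(tempdir(), "arrowtest_file.arrow")
stream_path  = joinpath(tempdir(), "arrowtest_stream.arrow")
file_io_path = joinpath(tempdir(), "arrowtest_file_io.arrow")

# 1. Writing to a file name: file format (Feather V2), as in the working example
Arrow.write(file_path, tbl)

# 2. Writing to an IO handle: IPC stream format by default, as in the failing example
open(stream_path, "w") do io
    Arrow.write(io, tbl)
end

# 3. Writing to an IO handle with file=true (assumed keyword): file format
open(file_io_path, "w") do io
    Arrow.write(io, tbl; file = true)
end

# Only the file-format outputs should start with the "ARROW1" magic
for p in (file_path, stream_path, file_io_path)
    println(p, " => ", String(read(p)[1:6]) == "ARROW1")
end
```

If that assumption holds, cases 1 and 3 should be readable with read_feather in R, while case 2 would need read_ipc_stream.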