It looks like we can get most of the way there with the new PythonCall.jl package, which also provides a Tables.jl-compatible interface:
(@v1.7) pkg> activate --temp
Activating new project at `/tmp/jl_cjhpZ1`
(jl_cjhpZ1) pkg> add CondaPkg
(jl_cjhpZ1) julia> using CondaPkg
(jl_cjhpZ1) pkg> conda add --pip pybaseball
(jl_cjhpZ1) pkg> add PythonCall
(jl_cjhpZ1) julia> using PythonCall # Should auto resolve and add pybaseball
(jl_cjhpZ1) julia> @py import pybaseball as pyb
(jl_cjhpZ1) julia> df_py = pyb.statcast(start_dt="2017-06-24", end_dt="2017-06-26")
This is a large query, it may take a moment to complete
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:03<00:00, 1.29s/it]
Python DataFrame:
pitch_type game_date release_speed ... spin_axis delta_home_win_exp delta_run_exp
1206 SL 2017-06-26 83.8 ... 142 0.001 -0.416
1227 FF 2017-06-26 92.7 ... 198 0.0 0.0
1278 SL 2017-06-26 83.1 ... 99 0.0 0.087
1308 SL 2017-06-26 84.4 ... 124 0.0 0.0
1324 SL 2017-06-26 83.6 ... 130 0.0 0.0
... ... ... ... ... ... ... ...
3785 FF 2017-06-24 91.8 ... 182 0.022 -0.216
3978 FS 2017-06-24 82.6 ... 256 0.0 0.043
4026 SL 2017-06-24 85.9 ... 119 0.0 -0.062
4173 FF 2017-06-24 91.9 ... 192 0.0 -0.046
4244 FF 2017-06-24 92.4 ... 193 0.0 0.036
[11434 rows x 92 columns]
(jl_cjhpZ1) julia> tbl = PyTable(df_py)
11434×92 PyPandasDataFrame
pitch_type game_date release_speed ... spin_axis delta_home_win_exp delta_run_exp
1206 SL 2017-06-26 83.8 ... 142 0.001 -0.416
1227 FF 2017-06-26 92.7 ... 198 0.0 0.0
1278 SL 2017-06-26 83.1 ... 99 0.0 0.087
1308 SL 2017-06-26 84.4 ... 124 0.0 0.0
1324 SL 2017-06-26 83.6 ... 130 0.0 0.0
... ... ... ... ... ... ... ...
3785 FF 2017-06-24 91.8 ... 182 0.022 -0.216
3978 FS 2017-06-24 82.6 ... 256 0.0 0.043
4026 SL 2017-06-24 85.9 ... 119 0.0 -0.062
4173 FF 2017-06-24 91.9 ... 192 0.0 -0.046
4244 FF 2017-06-24 92.4 ... 193 0.0 0.036
[11434 rows x 92 columns]
I think this can usually be converted to a DataFrame by just doing:
(jl_cjhpZ1) julia> using DataFrames
(jl_cjhpZ1) julia> df = DataFrame(tbl)
but here it appears to throw an out-of-bounds datetime error, which may be related to this Stack Overflow question: "pandas out of bounds nanosecond timestamp after offset rollforward plus adding a month offset".
Sorry if this is the wrong place to ping you @cjdoris, but would this be something that could (or should) be handled on PythonCall.jl’s end in its datetime conversions?
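For context on the out-of-bounds error: as far as I understand, pandas stores `datetime64[ns]` values as signed 64-bit counts of nanoseconds since the Unix epoch, so only dates from roughly 1677 to 2262 are representable. Below is a minimal sketch of that range using only Julia's Dates standard library. The names `ns_to_datetime` and `in_ns_range` are made up for illustration, and this is not how PythonCall.jl does its conversion; I also have not confirmed that this range is what triggers the error above.

```julia
using Dates

# Unix epoch, the zero point for nanosecond timestamps.
const EPOCH = DateTime(1970, 1, 1)

# Hypothetical helper: convert a nanosecond count since the epoch to a DateTime.
# Julia's DateTime has millisecond precision, so the count is truncated
# (toward zero) to whole milliseconds.
ns_to_datetime(ns::Integer) = EPOCH + Millisecond(ns ÷ 1_000_000)

# Largest and smallest nanosecond counts that fit in an Int64.
# typemin(Int64) itself is skipped because NumPy reserves it for NaT.
const LATEST = ns_to_datetime(typemax(Int64))
const EARLIEST = ns_to_datetime(typemin(Int64) + 1)

# Hypothetical helper: would this DateTime fit in a nanosecond-resolution Int64?
in_ns_range(dt::DateTime) = EARLIEST <= dt <= LATEST

println("earliest: ", EARLIEST)
println("latest:   ", LATEST)
println(in_ns_range(DateTime(2017, 6, 26)))
println(in_ns_range(DateTime(2300, 1, 1)))
```

The dates in the statcast query are well inside this range, so if the range is involved at all, the offending value is more likely a missing or sentinel entry than an actual game date.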