I am trying to solve the “Predict Future Sales” competition on Kaggle, using the `DataFrames`, `Plots` and `CSV` packages. Here is my code so far, in a file named `processdata.jl`:
```julia
@time using CSV, DataFrames, Plots
println("finished importing")
#
# Load datasets
@time begin
    itemcat = DataFrame(CSV.File("item_categories.csv"))
    items = DataFrame(CSV.File("items.csv"))
    salestrain = DataFrame(CSV.File("sales_train.csv"))
    shops = DataFrame(CSV.File("shops.csv"))
    test = DataFrame(CSV.File("test.csv"))
end
println("finished loading datasets")
# Dropping useless columns
@time select!(test, Not(:ID))
println("finished droping datasets' columns")
# Transforming data.
@time begin
    getmonth(x) = split(x, '.')[2]
    gsalestrain = groupby(salestrain, :item_id)
    salestrainmonth = transform(gsalestrain, :date => x -> getmonth.(x))
end
println("finished transforming loading datasets")
```
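As a side note on the transformation step: each row's month depends only on its own `date`, so the `groupby` does not appear to be needed for this particular column. A minimal sketch, assuming a DataFrames version that supports `ByRow` and the `source => function => destination` syntax, and assuming the dates are `dd.mm.yyyy` strings (as `split(x, '.')[2]` implies). The toy rows and the `DATEFMT` constant are hypothetical:

```julia
using DataFrames, Dates

# Toy stand-in for sales_train.csv: values are made up,
# only the dd.mm.yyyy date format is assumed to match the real file.
salestrain = DataFrame(date = ["02.01.2013", "03.01.2013", "05.02.2013"],
                       item_id = [22154, 2552, 2554])

# Parse the date string and return the month as an Int (1-12)
const DATEFMT = dateformat"dd.mm.yyyy"
getmonth(s::AbstractString) = month(Date(s, DATEFMT))

# No groupby needed: a row-wise transform adds a :month column
salestrainmonth = transform(salestrain, :date => ByRow(getmonth) => :month)
```

Parsing through `Date` also raises an error on malformed strings, whereas `split(x, '.')[2]` silently returns whatever sits in the second field.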
Now, some details:

- I am using `Revise` and had run `includet("processdata.jl")` at the beginning of the session; everything was working perfectly.
- The `salestrain` dataframe has around 3 million rows.
- Those macros weren't there originally: I put a `@time` macro on every piece of the code because when I tried to access the `salestrainmonth` variable (or any other variable, for that matter) in the REPL, it took literally forever. The longest I waited was ten minutes, the command never finished, and eventually I Ctrl+C'ed the process.
And why do I say the problem is with `Revise`? Because when I run this exact code from the command line with `julia processdata.jl`, it takes a while, but it does the job. Here is the output:
```
18.799342 seconds (28.28 M allocations: 1.436 GiB, 3.84% gc time)
finished importing
7.724473 seconds (11.05 M allocations: 992.623 MiB, 5.77% gc time)
finished loading datasets
0.000029 seconds (31 allocations: 2.359 KiB)
finished droping datasets' columns
8.471720 seconds (47.52 M allocations: 2.178 GiB, 10.96% gc time)
finished transforming loading datasets
```
Since adding the macros I have tried restarting the Julia REPL, but as soon as I enter the first command, `using Revise; includet("processdata.jl")`, it prints:
```
16.395183 seconds (27.99 M allocations: 1.425 GiB, 4.79% gc time)
finished importing
7.853719 seconds (11.02 M allocations: 991.699 MiB, 5.23% gc time)
finished loading datasets
0.000032 seconds (31 allocations: 2.359 KiB)
finished droping datasets' columns
```
and then it’s just silence and nothing else (except for my notebook’s fan screaming).
@Edit
To shed more light on this bug, I also tried running `julia --compile=min processdata.jl`, and in this case the only thing it managed to do was the imports. Here is the output, including what was printed after the interruption (I Ctrl+C'ed after waiting 5 minutes):
```
julia --compile=min processdata.jl
9.435052 seconds (10.61 M allocations: 633.672 MiB, 1.69% gc time)
finished importing
^C
signal (2): Interrupção
in expression starting at /home/ttv1/Documents/Codes/Julia/Kaggle/PredictFutureSales/processdata.jl:5
jl_f_arrayref at /usr/bin/../lib/libjulia.so.1 (unknown line)
unknown function (ip: 0x7f5990a90855)
unknown function (ip: 0x7f5990a904fa)
unknown function (ip: 0x7f5990a91e01)
jl_fptr_interpret_call at /usr/bin/../lib/libjulia.so.1 (unknown line)
unknown function (ip: 0x7f5990a90855)
unknown function (ip: 0x7f5990a904fa)
unknown function (ip: 0x7f5990a91e01)
jl_fptr_interpret_call at /usr/bin/../lib/libjulia.so.1 (unknown line)
unknown function (ip: 0x7f5990a90855)
unknown function (ip: 0x7f5990a904fa)
unknown function (ip: 0x7f5990a9209a)
jl_fptr_interpret_call at /usr/bin/../lib/libjulia.so.1 (unknown line)
unknown function (ip: 0x7f5990a90855)
unknown function (ip: 0x7f5990a904fa)
unknown function (ip: 0x7f5990a91e01)
jl_fptr_interpret_call at /usr/bin/../lib/libjulia.so.1 (unknown line)
unknown function (ip: 0x7f5990a90855)
unknown function (ip: 0x7f5990a904fa)
unknown function (ip: 0x7f5990a91e01)
jl_fptr_interpret_call at /usr/bin/../lib/libjulia.so.1 (unknown line)
unknown function (ip: 0x7f5990a90855)
unknown function (ip: 0x7f5990a904fa)
unknown function (ip: 0x7f5990a91d84)
unknown function (ip: 0x7f5990a92832)
unknown function (ip: 0x7f5990ab0371)
unknown function (ip: 0x7f5990a84d9e)
jl_load at /usr/bin/../lib/libjulia.so.1 (unknown line)
unknown function (ip: 0x7f5983a7e4bb)
unknown function (ip: 0x7f5983a8a11f)
unknown function (ip: 0x7f5983a8a7e2)
unknown function (ip: 0x7f5983a8a925)
unknown function (ip: 0x5611d8acc4fe)
unknown function (ip: 0x5611d8acc0a7)
__libc_start_main at /usr/bin/../lib/libc.so.6 (unknown line)
unknown function (ip: 0x5611d8acc15d)
unknown function (ip: (nil))
Allocations: 220195535 (Pool: 220192984; Big: 2551); GC: 34
```
@Edit2
(Sorry for so many edits. Maybe I should have done more tests before posting, but I didn't think it was more complex than it looked.)
Even if I comment out the line `#salestrainmonth = transform(gsalestrain, :date => x -> getmonth.(x))`, the problem persists. Only if I also comment out the line before it, `#gsalestrain = groupby(salestrain, :item_id)`, is `Revise` able to `includet` my file. An even stranger phenomenon: if I comment out both lines and then copy and paste them into the REPL, Julia creates both `gsalestrain` and `salestrainmonth` normally. That reinforces my thesis that `Revise` is the one struggling here.
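One possible reading of the Edit2 observation, offered as an assumption rather than a confirmed diagnosis: to learn which methods a tracked file defines, Revise may step through top-level code with its interpreter (JuliaInterpreter, which is listed in the `test Revise` manifest further down). Because `getmonth(x) = ...` sits in the same `@time begin ... end` block as the `groupby` call, the data processing in that block might be interpreted rather than compiled; the `--compile=min` run, which also interprets and also stalls, would be consistent with that. A minimal sketch of a layout that keeps the heavy work out of the tracked top level, where the function `addmonth` and the toy data are hypothetical:

```julia
# Hypothetical layout: a file holding only definitions, meant to be loaded
# with `includet` so Revise can track edits to them. The data work lives
# inside a function instead of running at top level.
using DataFrames

getmonth(x) = split(x, '.')[2]

# Same steps as the original script: group by item, then add the month.
function addmonth(salestrain::DataFrame)
    gsalestrain = groupby(salestrain, :item_id)
    return transform(gsalestrain, :date => (x -> getmonth.(x)) => :month)
end

# Usage, normally typed at the REPL rather than kept in the tracked file;
# toy data for illustration only.
df = DataFrame(date = ["02.01.2013", "03.01.2013", "05.02.2013"],
               item_id = [1, 1, 2])
salestrainmonth = addmonth(df)
```

At the REPL this would be `using Revise, CSV, DataFrames`, then `includet("defs.jl")` for a hypothetical `defs.jl` containing only the definitions, then `salestrain = DataFrame(CSV.File("sales_train.csv"))` and `salestrainmonth = addmonth(salestrain)`. The data would be loaded once by ordinary compiled code, and after editing `addmonth`, calling it again should pick up the change without reloading the CSV files.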
@Edit3
While tweaking things I discovered the `test` command for packages; here is my output for `test Revise`:
```
(@v1.4) pkg> test Revise
Testing Revise
Status `/tmp/jl_eB5WW1/Manifest.toml`
[aafaddc9] CatIndices v0.2.1
[da1fd8a2] CodeTracking v0.5.11
[dc8bdbbb] CustomUnitRanges v1.0.0
[340492b5] EndpointRanges v0.2.0
[97e2ac4a] EponymTuples v0.2.2
[7876af07] Example v0.5.3
[aa1ae85d] JuliaInterpreter v0.7.14
[6f1432cf] LoweredCodeUtils v0.4.4
[dbb5928d] MappedArrays v0.2.2
[6fe1bfb0] OffsetArrays v1.0.4
[bac558e1] OrderedCollections v1.2.0
[ae029012] Requires v1.0.1
[295af30f] Revise v2.6.5
[2a0f44e3] Base64
[ade2ca70] Dates
[8ba89e20] Distributed
[7b1f6079] FileWatching
[b77e0a4c] InteractiveUtils
[76f85450] LibGit2
[8f399da3] Libdl
[56ddb016] Logging
[d6f4376e] Markdown
[44cfe95a] Pkg
[de0858da] Printf
[3fa0cd96] REPL
[9a3f8284] Random
[ea8e919c] SHA
[9e88b42a] Serialization
[6462fe0b] Sockets
[8dfed614] Test
[cf7118a7] UUIDs
[4ec0a83e] Unicode
Skipping Base.active_repl
Skipping Base.active_repl_backend
Comparison and line numbering: Test Failed at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:332
Expression: bt.func == :cube && (bt.file == Symbol(tmpfile) && bt.line == 7)
Stacktrace:
[1] top-level scope at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:332
[2] top-level scope at /build/julia/src/julia-1.4.1/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[3] top-level scope at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:212
[4] top-level scope at /build/julia/src/julia-1.4.1/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[5] top-level scope at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:79
Comparison and line numbering: Test Failed at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:340
Expression: bt.func == :mult2 && (bt.file == Symbol(tmpfile) && bt.line == 13)
Stacktrace:
[1] top-level scope at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:340
[2] top-level scope at /build/julia/src/julia-1.4.1/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[3] top-level scope at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:212
[4] top-level scope at /build/julia/src/julia-1.4.1/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[5] top-level scope at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:79
Line numbers in backtraces and warnings: Test Failed at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:1334
Expression: bt.file == Symbol(filename) && bt.line == 2
Stacktrace:
[1] top-level scope at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:1334
[2] top-level scope at /build/julia/src/julia-1.4.1/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[3] top-level scope at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:1314
[4] top-level scope at /build/julia/src/julia-1.4.1/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[5] top-level scope at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:79
Line numbers in backtraces and warnings: Test Failed at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:1358
Expression: bt.file == Symbol(filename) && bt.line == 3
Stacktrace:
[1] top-level scope at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:1358
[2] top-level scope at /build/julia/src/julia-1.4.1/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[3] top-level scope at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:1314
[4] top-level scope at /build/julia/src/julia-1.4.1/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[5] top-level scope at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:79
Revise is currently tracking the following files in ReviseFileNow [top-level]: ["src/ReviseFileNow.jl"]
[ Info: tracking Base
┌ Warning: skipping git tests because Revise is not under development
└ @ Main ~/.julia/packages/Revise/MgvIv/test/runtests.jl:2378
extra
[ Info: tracking Pkg
┌ Warning: skipping Core.Compiler tests due to lack of git repo
└ @ Main ~/.julia/packages/Revise/MgvIv/test/runtests.jl:2486
┌ Warning: REPL tests skipped
└ @ Main ~/.julia/packages/Revise/MgvIv/test/runtests.jl:2529
Test Summary:                             | Pass  Fail  Broken  Total
Revise                                    |  638     4       1    643
  PkgData                                 |    1                    1
  Package contents                        |    1                    1
  LineSkipping                            |    8                    8
  Equality and hashing                    |    5                    5
  Parse errors                            |    4                    4
  Signature extraction                    |    4                    4
  Comparison and line numbering           |   73     2       1     76
  Display                                 |    7                    7
  File paths                              |  209                  209
  Base & stdlib file paths                |    5                    5
  Recursive types (issue #417)            |    1                    1
  Cross-module extension                  |    6                    6
  @__FILE__                               |    2                    2
  Module docstring                        |    6                    6
  Changing docstring                      |    3                    3
  Undef in docstrings                     |   72                   72
  Macro docstrings (issue #309)           |    4                    4
  Changing @inline annotations            |   16                   16
  Revising macros                         |    7                    7
  More arg-modifying macros               |    4                    4
  Line numbers                            |    5                    5
  Line numbers in backtraces and warnings |    4     2              6
  New submodules                          |    3                    3
  Timing (issue #341)                     |    3                    3
  Method deletion                         |   60                   60
  revise_file_now                         |    4                    4
  Evaled toplevel                         |    4                    4
  Revision errors                         |   37                   37
  Retry on InterruptException             |   17                   17
  get_def                                 |    7                    7
  Pkg exclusion                           |    3                    3
  Manual track                            |   18                   18
  Auto-track user scripts                 |    4                    4
  Distributed                             |   15                   15
  Git                                     |    2                    2
  Recipes                                 |   11                   11
  CodeTracking #48                        |    1                    1
  Methods at REPL                         | No tests
  baremodule                              |    2                    2
ERROR: LoadError: Some tests did not pass: 638 passed, 4 failed, 0 errored, 1 broken.
in expression starting at /home/ttv1/.julia/packages/Revise/MgvIv/test/runtests.jl:78
ERROR: Package Revise errored during testing
```
I tried reinstalling `Revise`; it didn't help, nor did it change the output above. Then I tried reinstalling Julia as a whole, which didn't help either.
I guess the post is already too long, so I will just leave the output of `lspci` and `uname -a` and move on.
```
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation Skylake GT2 [HD Graphics 520] (rev 07)
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:17.0 SATA controller: Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode] (rev 21)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #1 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1c.5 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #6 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-LP LPC Controller (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 930M] (rev a2)
02:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL810xE PCI Express Fast Ethernet controller (rev 07)
```

```
Linux restofunesto 5.4.15-arch1-1 #1 SMP PREEMPT Sun, 26 Jan 2020 09:48:50 +0000 x86_64 GNU/Linux
```
I discovered Julia recently, and I have to say it was love at first sight (God knows how I feel about the schizophrenic numpy/pandas syntax). I really want to make it work, so I won't give up on this problem. But I really value my workflow: the quick-and-dirty “write code → run script” loop isn't practical because of the long compilation times, and since `Revise` seems to be the only viable alternative, I am really lost.