I’m currently trying to parse a file whose data lines are composed of 8-character columns without any delimiter, such as:
<.. 1..><.. 2..><.. 3..><.. 4..><.. 5..><.. 6..><.. 7..><.. 8..><.. 9..><..10..>
11111111222222223333333344444444555555556666666677777777888888889999999911111111
So far, I’m doing that with indices and a comprehension:
split8_1(str::String) = [str[i+1:i+8] for i in 0:8:length(str)-8]
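As a point of comparison for the index-based approach, here is a minimal sketch that returns SubString views instead of copies. The name split_fixed and the width argument are hypothetical, not something from Base, and the code assumes ASCII-only lines so that byte indices and character indices coincide.

```julia
# Hypothetical helper: split a line into fixed-width fields.
# Assumes ASCII-only input, so every index i below is a valid character index.
function split_fixed(str::String, width::Integer = 8)
    n = ncodeunits(str)
    # SubString(str, i, j) is a view into str; no characters are copied.
    # min(...) keeps a shorter trailing field instead of throwing a BoundsError.
    return [SubString(str, i, min(i + width - 1, n)) for i in 1:width:n]
end

line = "1111111122222222333333334444"
fields = split_fixed(line)   # three full fields plus the shorter tail "4444"
```

Whether the views are actually cheaper than copies for such short fields would need to be benchmarked on the real data.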
However, I wonder if there is a more elegant, or maybe built-in, way to do that. Currently, split only works by specifying a delimiter, not an index or a set of indices.
I’ve found a potential solution with Iterators.partition, in the form of:
split8_2(str::String) = join.(Iterators.partition(str, 8))
split8_3(str::String) = Iterators.partition(str, 8)
The join is necessary to get strings rather than vectors of characters, but it causes a huge performance hit.
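To make the difference between the two partition variants concrete, here is a small sketch (the sample string is made up). Iterators.partition is lazy, so a function that only returns the iterator does essentially no work until something iterates over it. A timing for that version probably reflects only the construction of the iterator and is not directly comparable to the versions that build the full result.

```julia
str = "1111111122222222"             # made-up sample line

lazy = Iterators.partition(str, 8)   # lazy: no chunk has been produced yet
chunks = collect(lazy)               # materialize: each chunk is a Vector{Char}
strings = join.(chunks)              # join allocates one new String per chunk

println(chunks[1])
println(strings)
```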
Here are the benchmarks for the three implementations:
julia> @benchmark split8_2($str)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min … max): 3.311 μs … 1.391 ms ┊ GC (min … max): 0.00% … 99.59%
Time (median): 3.956 μs ┊ GC (median): 0.00%
Time (mean ± σ): 4.413 μs ± 17.370 μs ┊ GC (mean ± σ): 5.50% ± 1.41%
█▆▅
▁▁▁▁▁▁▂▄████▇▅▄▃▃▃▃▃▃▃▃▃▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
3.31 μs Histogram: frequency by time 6.59 μs <
Memory estimate: 3.16 KiB, allocs estimate: 53.
julia> @benchmark split8_3($str)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 1.000 ns … 3.100 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.200 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.187 ns ± 0.075 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█
▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▂
1 ns Histogram: frequency by time 1.3 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark split8_1($str)
BenchmarkTools.Trial: 10000 samples with 870 evaluations.
Range (min … max): 140.805 ns … 12.287 μs ┊ GC (min … max): 0.00% … 98.36%
Time (median): 193.333 ns ┊ GC (median): 0.00%
Time (mean ± σ): 214.173 ns ± 359.497 ns ┊ GC (mean ± σ): 6.23% ± 3.66%
▁▂█▆▂▁ ▄▃▂
▂▂▂▂▂▃▅▂▂▃▂▂▃▃▅▅██████▅▃▃▂▂▃▄▅▇▇████▇▅▄▃▃▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁ ▃
141 ns Histogram: frequency by time 282 ns <
Memory estimate: 176 bytes, allocs estimate: 2.