Hello,
I want to replace some occurrences in a file and save the result to a new file. The following does what I need, but it reads the entire file before making the substitution.
Is there a better way to do this?
some_variable = "Oranges"
open("some_file", "r") do file
global data = read(file, String)
end
data = replace(data, r"some text (?i)" => "$some_variable")
open("another_file", "w") do io
println(io, data)
end
I’m using Julia 1.6.5 under Windows 7.
Thanks!
I think you should be able to nest open calls.
open("in.txt", "r") do io_in
    open("out.txt", "w") do io_out
        # copy the first line of in.txt to out.txt
        println(io_out, readline(io_in))
    end
end
You can also use open without a do block.
io_in = open("in.txt","r")
io_out = open("out.txt","w")
# do stuff
close(io_in)
close(io_out)
The implementation when using a function as a first argument or a do block looks like this:
function open(f::Function, args...; kwargs...)
io = open(args...; kwargs...)
try
f(io)
finally
close(io)
end
end
However, I am not sure if there is an implementation of replace that acts on data streams.
Assuming that you can do your replacements on a line by line basis, something like this should work:
open("another_file", "w") do out
for line in eachline("some_file")
println(out, replace(line, r"some text (?i)" => "$some_variable"))
end
end
If you have very large files it’s probably better to read larger blocks than a line at a time, but you need to arrange it so the things you replace are not split between blocks.
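One way to arrange that is to cut each block at its last newline and carry the remainder over to the next block. The following is only a sketch under the assumption that a match never spans a line break; replace_in_blocks and its arguments are names made up for this example.

```julia
# Sketch: block-wise replacement, assuming a match never contains a newline.
# `replace_in_blocks`, `src`, `dst` and `pat_repl` are hypothetical names.
function replace_in_blocks(src, dst, pat_repl::Pair; block_size = 2^20)
    open(dst, "w") do out
        open(src, "r") do inp
            carry = UInt8[]                      # bytes after the last newline seen so far
            while !eof(inp)
                buf = vcat(carry, read(inp, block_size))
                cut = findlast(==(UInt8('\n')), buf)
                if cut === nothing               # no complete line yet, keep reading
                    carry = buf
                    continue
                end
                # everything up to the last newline consists of whole lines
                write(out, replace(String(buf[1:cut]), pat_repl))
                carry = buf[cut+1:end]
            end
            # the last line may not end with a newline
            isempty(carry) || write(out, replace(String(carry), pat_repl))
        end
    end
end
```

It would be called like replace_in_blocks("some_file", "another_file", "some text" => "Oranges"). Cutting at the newline byte is safe for UTF-8 text because that byte never occurs inside a multi-byte character.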
Possibly of help, this is some production code of mine which solves the easier task of determining whether two files are identical.
block_size = 2 ^ 20
open(file.reference, "r") do reference
open(file.filename, "r") do file
while true
reference_block = read(reference, block_size)
file_block = read(file, block_size)
reference_block == file_block || return false
length(reference_block) < block_size && return true
end
end
end
I also found BufferedStreams.jl. This might implement some useful functionality such as anchoring a stream.
Also, if the content you want to replace starts with a constant sequence, readuntil might be useful.
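For the simplest case, where the whole thing to replace is a constant, case-sensitive string, a readuntil loop could look roughly like this. It is a sketch, not a general solution: replace_stream is a made-up name, and it does not handle regular expressions.

```julia
# Sketch: streaming replacement of a constant string using readuntil.
# `replace_stream` is a hypothetical helper; `needle` must be a plain string.
function replace_stream(inp::IO, out::IO, needle::AbstractString, repl::AbstractString)
    while !eof(inp)
        # keep = true leaves the delimiter on the chunk, so a hit can be
        # told apart from simply reaching the end of the stream
        chunk = readuntil(inp, needle; keep = true)
        if endswith(chunk, needle)
            write(out, chop(chunk; tail = length(needle)), repl)
        else
            write(out, chunk)                    # trailing text without a match
        end
    end
end
```

With files, inp and out would be the handles from nested open calls. Only the text up to the next match is held in memory at any time, so memory use depends on how far apart the matches are.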
Thank you all for your answers!
So, if I need a line-by-line replacement, something like this would be good?
some_variable = "Oranges"
open("some_output_file", "w") do file_out
open("some_input_file", "r") do file_in
while !eof(file_in)
readline(file_in) |> data -> replace(data, r"some text (?i)" => "$some_variable") |> data -> do_other_stuff(data) |> data -> println(file_out, data)
end
end
end
And I can optimize by using either readuntil(file_in, "prefix") for text data, or by specifying a bigger block size for binary data and passing it to the read function.
@feanor12 what is “anchoring a stream”?
And a last question, when I try to put the pipe operator on a new line I get the following error:
readline(file_in)
|> data -> replace(data, r"some text (?i)" => "$some_variable")
|> data -> do_other_stuff(data)
|> data -> println(file_out, data)
ERROR: syntax: "|>" is not a unary operator
Is it possible to put the pipe operator on a new line, so that the code stays readable when I have several operations or a lot of parameters?
A description of anchors can be found here: https://biojulia.net/BufferedStreams.jl/stable/inputstreams.html#Anchors-1
Basically it marks a specific part of the stream that is kept in the buffer even when it is refilled. This makes it easier to deal with partial matches.
To improve the pipe problem there are different packages like Chain.jl, Pipe.jl or Lazy.jl, but if you put |> at the end of the lines it should work.
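A small sketch of the trailing-operator layout, using only Base. The input string is invented, and uppercase stands in for do_other_stuff:

```julia
some_variable = "Oranges"
do_other_stuff = uppercase   # hypothetical stand-in for the real processing step

# A trailing |> tells the parser that the expression continues on the next line.
result = "I like some text" |>
    (s -> replace(s, "some text" => some_variable)) |>
    do_other_stuff

println(result)
```

A leading |> fails because the previous line is already a complete expression, so the parser sees |> at the start of a new one. The parentheses around the anonymous function are optional here, but they make it clear where each step ends.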