Handling multiple files at the same time


I want to replace some occurrences of a pattern in a file and save the result to a new file. The following does what I need, but it reads the entire file into memory before making the substitution.
Is there a better way to do this?

some_variable = "Oranges"

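# open returns whatever the do block returns, so no global is needed
data = open("some_file", "r") do file
    read(file, String)
end

data = replace(data, r"some text (?i)" => "$some_variable")

open("another_file", "w") do io
    print(io, data)  # print, not println, so no extra trailing newline is added
end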
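For comparison, the same whole-file approach can be written without a global or do blocks, because Base's `read` and `write` also accept file names. This is only a minimal sketch: `replace_file` is a hypothetical name, and it still loads the entire file into memory.

```julia
# Hypothetical helper: read all of `src`, apply one `pattern => replacement`
# pair, and write the result to `dst`. Like the code above, the whole file
# is held in memory as a single String.
function replace_file(src::AbstractString, dst::AbstractString, pat_rep::Pair)
    text = read(src, String)            # entire file as one String
    write(dst, replace(text, pat_rep))  # creates or truncates `dst`
    return dst
end
```

Usage would look like `replace_file("some_file", "another_file", r"some text (?i)" => some_variable)`.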

I’m using Julia 1.6.5 under Windows 7.


I think you should be able to nest open calls.

open("in.txt", "r") do io_in
    open("out.txt", "w") do io_out
        # copy the first line of in.txt into out.txt
        println(io_out, readline(io_in))
    end
end

You can also use open without a do block, but then you have to close the streams yourself:

io_in = open("in.txt", "r")
io_out = open("out.txt", "w")  # "w" truncates the file, so never point it at the input
# do stuff
close(io_in)
close(io_out)

When open is called with a function as its first argument (which is what a do block does), the implementation looks roughly like this:

function open(f::Function, args...; kwargs...)
    io = open(args...; kwargs...)
    try f(io) finally close(io) end  # io is closed even if f throws
end

However, I am not sure whether there is an implementation of replace that works on streams.


Assuming that you can do your replacements on a line-by-line basis, something like this should work:

open("another_file", "w") do out
    for line in eachline("some_file")
        println(out, replace(line, r"some text (?i)" => "$some_variable"))
    end
end
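One detail that may matter on Windows: `eachline` strips each line's terminator and `println` always writes `\n`, so `\r\n` line endings become `\n` in the output. A minimal sketch that preserves the original endings by passing `keep = true` and writing with `print`; `replace_lines` is a hypothetical name:

```julia
# Hypothetical helper: line-by-line replacement that keeps each line's original
# terminator ("\n" or "\r\n"), because `keep = true` leaves it attached.
function replace_lines(io_in::IO, io_out::IO, pat_rep::Pair)
    for line in eachline(io_in; keep = true)
        print(io_out, replace(line, pat_rep))
    end
    return io_out
end
```

With files, this would be called inside two nested open do blocks, passing the input and output streams.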

If you have very large files, it’s probably better to read larger blocks than one line at a time, but you then need to make sure that the text you are replacing is never split between two blocks.
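A minimal sketch of the block-based idea, assuming the pattern is a literal string rather than a regex (a regex has no fixed maximum match length, which makes the splitting problem harder). `replace_blocks` is a hypothetical name. After replacing all complete matches in a block, it holds back the last `sizeof(pat) - 1` bytes, since only those could be the start of a match that continues in the next block:

```julia
# Hypothetical helper: stream `io_in` to `io_out`, replacing every occurrence of
# the literal string `pat` with `rep`, reading `block_size` bytes at a time.
function replace_blocks(io_in::IO, io_out::IO, pat::AbstractString, rep::AbstractString;
                        block_size::Integer = 2^16)
    p = collect(codeunits(pat))
    r = collect(codeunits(rep))
    m = length(p)
    m > 0 || throw(ArgumentError("pattern must be non-empty"))
    carry = UInt8[]                      # bytes held back from the previous block
    while true
        block = read(io_in, block_size)
        done = isempty(block)            # empty read means end of input
        buf = vcat(carry, block)
        i = 1                            # first byte not yet written
        j = 1                            # current search position
        while j <= length(buf) - m + 1
            if view(buf, j:j+m-1) == p
                write(io_out, buf[i:j-1], r)  # text before the match, then replacement
                j += m
                i = j
            else
                j += 1
            end
        end
        # A match starting at or after `keep` would not fit in `buf`, so it might
        # continue in the next block: hold those bytes back unless input is exhausted.
        keep = done ? length(buf) + 1 : max(i, length(buf) - m + 2)
        write(io_out, buf[i:keep-1])
        carry = buf[keep:end]
        done && break
    end
    return io_out
end
```

Searching on bytes is safe for UTF-8 text, since a valid UTF-8 pattern can only match at character boundaries. The naive byte-by-byte search is quadratic in the worst case, so a proper substring search would be preferable for real workloads.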

In case it helps, here is some production code of mine that solves the easier task of determining whether two files are identical.

block_size = 2 ^ 20
open(file.reference, "r") do reference
    open(file.filename, "r") do file
        while true
            reference_block = read(reference, block_size)
            file_block = read(file, block_size)
            reference_block == file_block || return false
            # a short block means both files reached EOF at the same point
            length(reference_block) < block_size && return true
        end
    end
end
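The snippet above depends on a `file` object defined elsewhere. Wrapped as a standalone function it might look like this; `files_identical` is a hypothetical name, and the `return` inside the inner do block works because its value becomes the result of both nested open calls:

```julia
# Hypothetical standalone version of the comparison above: true if the two
# files have identical contents, compared `block_size` bytes at a time.
function files_identical(path_a::AbstractString, path_b::AbstractString;
                         block_size::Integer = 2^20)
    open(path_a, "r") do a
        open(path_b, "r") do b
            while true
                block_a = read(a, block_size)
                block_b = read(b, block_size)
                block_a == block_b || return false
                # A short block means both files hit EOF at the same point.
                length(block_a) < block_size && return true
            end
        end
    end
end
```

Checking `filesize` of both paths first would give a cheap early exit when the sizes differ.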

I also found BufferedStreams.jl. This might implement some useful functionality such as anchoring a stream.

Also, if the content you want to replace starts with a constant sequence, readuntil might be useful.
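A sketch of the readuntil idea for the simplest case, where the whole pattern is a fixed string; `replace_with_readuntil` is a hypothetical name. With `keep = true`, a chunk that ends with the pattern means the delimiter was found; otherwise the chunk is the rest of the stream:

```julia
# Hypothetical helper: copy `io_in` to `io_out`, replacing each occurrence of the
# fixed string `pat` with `rep`. Memory use is bounded by the longest stretch of
# text between two matches (or the whole rest of the input if there is no match).
function replace_with_readuntil(io_in::IO, io_out::IO,
                                pat::AbstractString, rep::AbstractString)
    isempty(pat) && throw(ArgumentError("pattern must be non-empty"))
    while !eof(io_in)
        chunk = readuntil(io_in, pat; keep = true)  # text up to and including `pat`
        if endswith(chunk, pat)
            # Delimiter found: write the text before it, then the replacement.
            print(io_out, chop(chunk; tail = length(pat)), rep)
        else
            # No further occurrence: this is the remainder of the stream.
            print(io_out, chunk)
        end
    end
    return io_out
end
```

For a variable pattern that starts with a constant prefix, the same loop could read up to the prefix and then inspect the following text, for example with a regex, before writing it out.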

Thank you all for your answers!

So, if I need a line-by-line replacement, would something like this be good?

some_variable = "Oranges"

open("some_output_file", "w") do file_out
    open("some_input_file", "r") do file_in
        while !eof(file_in)
            readline(file_in) |> data -> replace(data, r"some text (?i)" => "$some_variable") |> data -> do_other_stuff(data) |> data -> println(file_out, data)
        end
    end
end

And I can optimize by using either readuntil(file_in, "prefix") for text data, or a bigger block size passed to the read function for binary data.

@feanor12, what is “anchoring a stream”?

One last question: when I put the pipe operator at the start of a new line, I get the following error:

     |> data -> replace(data, r"some text (?i)" => "$some_variable")
     |> data -> do_other_stuff(data) 
     |> data -> println(file_out, data)
ERROR: syntax: "|>" is not a unary operator

Is there a way to put the pipe operator on a new line, to improve readability when I have several operations or many parameters?

There is a description of anchors here: Input Streams · BufferedStreams

Basically it marks a specific part of the stream that is kept in the buffer even when it is refilled. This makes it easier to deal with partial matches.
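BufferedStreams' own anchor functions are not shown here. As a rough analogue from Base, `mark`, `reset` and `unmark` let you remember a position in a seekable stream (such as an IOBuffer or a file), read ahead to test for a possible match, and rewind if it fails. A minimal sketch; `consume_prefix!` is a hypothetical name:

```julia
# Hypothetical helper: if the stream continues with `prefix`, consume it and
# return true; otherwise leave the stream position unchanged and return false.
# Uses Base's mark/reset, not the BufferedStreams anchor API.
function consume_prefix!(io::IO, prefix::AbstractString)
    mark(io)                          # remember the current position
    bytes = read(io, sizeof(prefix))  # read ahead
    if bytes == codeunits(prefix)
        unmark(io)                    # match: keep what was consumed
        return true
    else
        reset(io)                     # no match: rewind to the marked position
        return false
    end
end
```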

For the pipe problem, there are packages such as Chain.jl, Pipe.jl or Lazy.jl, but putting |> at the end of each line instead of the start should also work.
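A trailing |> works because the parser then knows the expression continues on the next line. A small sketch, where `do_other_stuff` is a hypothetical stand-in (here just `uppercase`) and a plain string pattern replaces the regex only to keep the example simple. Wrapping each anonymous function in parentheses makes every stage a separate pipe step, instead of the rest of the chain being parsed into the previous function's body:

```julia
some_variable = "Oranges"
# Hypothetical stand-in for the user's do_other_stuff.
do_other_stuff(data) = uppercase(data)

result = "I like some text." |>
    (data -> replace(data, "some text" => some_variable)) |>
    (data -> do_other_stuff(data))
```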