Removing the first line in a text file

Jakob · March 4, 2020, 2:39pm

Hi everyone,

How would I go about removing the first line of a (potentially very large) text file, ideally without loading the file into a DataFrame or copying the full file?

Best
Jakob

pdeffebach · March 4, 2020, 2:56pm

Check out working with text files from the Julia wikibook.

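open("file_to_read") do f
    # the do-block closes "test.txt" even if an error occurs while copying
    open("test.txt", "w") do io
        for (i, l) in enumerate(eachline(f))
            i != 1 && println(io, l)  # skip the first line, copy the rest
        end
    end
end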

mbauman · March 4, 2020, 3:08pm

Do you just want to save it back on your disk in-place? It’d be nice to just tell the filesystem that the file starts at a new place, but I don’t think that’s a supported operation on any system, and even if it were, the data would likely have to be removed in exact multiples of 1 or 2k. I think the only way to do it is to work through the entire file and re-write it.

pixel27 · March 4, 2020, 3:13pm

Do you need to write a program? I mean, if you are on Linux you would do:

   tail -n +2 input.txt > output.txt

Not sure what the Windows equivalent would be… I’d probably install the Ubuntu “App” to get a bash prompt with tail, and run it from there.

yuyichao · March 4, 2020, 3:14pm

Note that this adds an unconditional new line at the end, which may or may not be OK.
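A minimal sketch of one way to avoid that while keeping the line-by-line approach, assuming a hypothetical helper name `drop_first_line_keep`: `eachline` accepts `keep=true`, which leaves each line's original terminator in place, so `print` can be used instead of `println`.

```julia
# Hypothetical helper (name not from the thread): copy `src` to `dst`
# without its first line, preserving the original line endings.
function drop_first_line_keep(src::AbstractString, dst::AbstractString)
    open(src) do input
        open(dst, "w") do output
            # keep=true retains "\n" or "\r\n" on each line, so no newline
            # is added to a last line that did not have one.
            for line in Iterators.drop(eachline(input; keep=true), 1)
                print(output, line)
            end
        end
    end
    return dst
end
```

With this variant the output ends with a newline only if the input did.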

stevengj · March 4, 2020, 6:05pm

Even simpler:

open("file_to_read") do input
    open("test.txt", "w") do output
        for line in Iterators.drop(eachline(input), 1)
            println(output, line)
        end
    end
end

If you want to maximize speed, and avoid the extraneous newline pointed out by @yuyichao, it would be faster to read everything after the newline in a block:

open("file_to_read") do input
    readuntil(input, '\n')
    write("file_to_write", read(input))
end

This implementation is even shorter than the code based on eachline. (And it will be vastly faster than spawning a Unix program like tail, not to mention being more portable.)

The only downside of this approach is that it might take a lot of memory if you have an enormous file. A more general implementation would probably read the data in large blocks, similar to this code.
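A minimal sketch of such a block-wise copy, assuming a hypothetical helper name `drop_first_line` and an arbitrary 1 MiB default block size (neither comes from the thread):

```julia
# Hypothetical helper: copy `src` to `dst` without its first line,
# holding at most `blocksize` bytes of data in memory at a time.
function drop_first_line(src::AbstractString, dst::AbstractString;
                         blocksize::Integer = 1 << 20)
    open(src) do input
        open(dst, "w") do output
            readuntil(input, '\n')                 # consume and discard the first line
            buf = Vector{UInt8}(undef, blocksize)  # reusable copy buffer
            while !eof(input)
                n = readbytes!(input, buf)         # number of bytes actually read
                write(output, view(buf, 1:n))      # write only the filled part
            end
        end
    end
    return dst
end
```

Unlike the in-memory version, `src` and `dst` must be different files here, because opening the destination with "w" truncates it before anything has been read.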

jling · March 4, 2020, 8:04pm

Why not use POSIX tools…
https://superuser.com/questions/284258/remove-first-line-in-bash

stevengj · March 4, 2020, 9:53pm

POSIX tools are great, but Julia code that relies on them will not be portable.

Also, spawning executables takes a lot of time, so for simple tasks it is often orders of magnitude faster to use Julia code than to spawn a POSIX command-line program.

For example, run(pipeline(`tail -n +2 $inputname`, stdout=outputname)) seems to be about 10²× slower on my computer than my native-Julia readuntil code above for most files.

jling · March 4, 2020, 10:40pm

I agree with everything you said; it’s just that the OP didn’t say they need to do this programmatically multiple times on different occasions, so I thought I’d mention sed in case it’s just a one-time thing.

pixel27 · March 4, 2020, 11:51pm

If you are generating the files in Julia, then the most efficient solution would be to create a structure for which you’ve implemented the IO methods. That way you can filter out the first line and write the rest of the data.
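One possible reading of this suggestion, as a rough sketch: a small wrapper type (the name `SkipFirstLine` is hypothetical) that sits in front of another IO and discards everything written up to and including the first newline. It relies on Base's generic fallbacks, which route `print` and `println` through the single-byte `write` method, so it is simple rather than fast.

```julia
# Hypothetical wrapper: forwards writes to `inner`, except for the
# bytes that make up the first line (including its '\n').
mutable struct SkipFirstLine{T<:IO} <: IO
    inner::T
    skipping::Bool   # true until the first newline has been seen
end
SkipFirstLine(inner::IO) = SkipFirstLine(inner, true)

function Base.write(io::SkipFirstLine, b::UInt8)
    if io.skipping
        # drop the byte; stop skipping once the first line has ended
        b == UInt8('\n') && (io.skipping = false)
        return 0     # nothing was forwarded to `inner`
    end
    return write(io.inner, b)
end

# Usage: everything after the first line reaches the underlying buffer.
buf = IOBuffer()
out = SkipFirstLine(buf)
println(out, "header")
println(out, "a,b")
print(out, "c,d")
result = String(take!(buf))
```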

Jakob · March 5, 2020, 9:10am

This is really neat! (There is an extra ) though after "file_to_write" )
One can even point "file_to_read" and "file_to_write" to the same file to do the in-place replacement. As I’ve changed the workflow to cutting large files into smaller chunks, memory is not really an issue anymore.

Thanks for all the replies, by the way. Julia really does have an awesome community!

stevengj · March 5, 2020, 3:24pm

If you want the option to do this in-place, I would change the code to:

write("file_to_write", 
    open("file_to_read") do input
        readuntil(input, '\n')
        read(input)
    end)

so that the file is closed after reading before opening it to write.
