Genetic Epidemiology tools

Hi,
I am working with genotyping data (PLINK format) and DNA methylation data (large matrices of numbers in range 0-1). Are there any recommended packages, workflows, or tutorials for working with these data in Julia?
I’m interested in performing GWAS as well as QTL-type analyses.

First of all, welcome!

Second - this was discussed on Slack, but it would be nice to have it here for posterity (Slack messages vanish after a couple of weeks).

@Mateusz_K suggested VariantCallFormat.jl for VCF files

You found https://openmendel.github.io/SnpArrays.jl/ for PLINK. There’s also dmbates/BEDFiles.jl (routines for reading and manipulating GWAS data in .bed files), though that’s much older, and from BioJulia there’s BioJulia/BED.jl. (I don’t understand the relationship between BED and PLINK, but it seems like they’re related somehow?)

Thank you! I’ll try one of these. I see that BEDFiles.jl and SnpArrays.jl both have much better documentation - BioJulia could improve their package in this respect.

About the methylation data - it’s basically a large matrix (1000 x 800,000) with numbers in the range 0-1. Do you know a fast way to read such data?

Are a lot of those numbers 0? If so, you might want to consider SparseArrays.jl. If not (or in addition), depending on the precision you need, you’ll probably want to use 32- or even 16-bit floats (Julia’s default is 64), to save on memory.
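To put rough numbers on that, here is a minimal sketch of the memory a dense matrix of the size mentioned in this thread would need at each float width. The helper name `dense_gib` is made up for illustration, and the figures cover only the raw array, not any temporary copies made while parsing.

```julia
# Memory needed for a dense rows × cols matrix with element type T, in GiB.
# `dense_gib` is a hypothetical helper, not part of any package.
dense_gib(T::Type, rows::Integer, cols::Integer) = sizeof(T) * rows * cols / 2^30

rows, cols = 1_000, 800_000   # dimensions quoted in this thread

for T in (Float64, Float32, Float16)
    println(rpad(string(T), 8), round(dense_gib(T, rows, cols); digits = 2), " GiB")
end
# Float64 ≈ 5.96 GiB, Float32 ≈ 2.98 GiB, Float16 ≈ 1.49 GiB
```

Float16 keeps only about three significant decimal digits, so whether it is acceptable depends on how much precision the downstream analysis needs.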

Depends on the file type. CSV.jl is very fast, but if it’s a super simple format, readdlm from the DelimitedFiles standard library may be enough. @jakobnissen is probably the most knowledgeable in BioJulia about how to go from file bytes to Julia data structures - I’d defer to him.
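For the simple-format case, a minimal sketch of the readdlm route might look like this. It assumes a tab-separated file with no header row or row names; the file name and the tiny matrix are stand-ins, not real data.

```julia
using DelimitedFiles  # standard library; provides readdlm and writedlm

# Tiny stand-in for a methylation table: tab-separated, no header, values in 0-1.
path = joinpath(mktempdir(), "beta_values.tsv")   # hypothetical file name
writedlm(path, [0.12 0.85 0.40; 0.97 0.03 0.51], '\t')

# Passing the element type parses straight into a Matrix{Float32},
# which takes half the memory of the default Float64.
beta = readdlm(path, '\t', Float32)

@show size(beta) eltype(beta)
```

If the real file has a header line or sample IDs in the first column, this call would need adjusting (readdlm has a `header` keyword for the former).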

No, these are not sparse matrices, so I can’t use this package. I tried readdlm with Float32, but it’s still quite big. Anyway - will keep trying, thanks for the help so far!

Just for the record, I’ve found a nice set of tools and tutorials for whole-genome analyses: JuliaHub :)

For storing & fast reading, I’d give HDF5 a shot: HDF5.jl (JuliaIO/HDF5.jl) saves and loads data in the HDF5 file format from Julia.
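A rough sketch of what that could look like, assuming HDF5.jl is installed: parse the text file once, write the matrix to an .h5 file, and read from that in later sessions. The file name, the dataset name "beta", and the small random matrix are all placeholders.

```julia
using HDF5  # third-party package: JuliaIO/HDF5.jl

path = joinpath(mktempdir(), "methylation.h5")   # hypothetical file name
beta = rand(Float32, 100, 500)                   # small random stand-in for the real matrix

# Write once, e.g. right after a slow text parse.
h5write(path, "beta", beta)

# Later sessions can load the whole dataset ...
full = h5read(path, "beta")

# ... or only a block of it, without reading the rest from disk.
block = h5open(path, "r") do file
    file["beta"][1:100, 1:10]
end
```

Reading a block at a time is what makes this attractive when the full matrix is uncomfortably large in memory.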

Oh, haven’t heard about this one! I’ll definitely give it a try!