I am a Ph.D. student at the University of Tokyo, Japan, and a core developer of the BioJulia project. I’d like to participate in the Summer of Code again this year.
The project idea I’d like to propose is introducing parallelism in BioJulia. Today’s computational biology faces the problem of growing data, and hence BioJulia developers have paid careful attention to design and algorithms to squeeze computational power out of a single CPU. However, we haven’t paid much attention to parallelism in the project because enriching the functionality has had higher priority. I’ve implemented lots of new tools I need in Bio.jl this year and last, and my lab mates are starting to use it in their research. I think it’s time to make it faster with the power of multiple cores.
Ease of use will be the highest priority. Biologists want to finish their jobs faster, not to write fast but complicated code. So I think the best way to go is an approach like dask and Dagger.jl, which parallelize the computation of delayed tasks (or thunks) using a task scheduler. I’m going to focus on single-node parallelism, since distributed computing on a computer cluster would be too much for a summer project.
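
To make the idea of delayed tasks concrete, here is a minimal sketch in Python using only the standard library. It is not the dask or Dagger.jl API, and it is not a proposed design for BioJulia. The names Delayed, delayed, compute, gc_count, and total are hypothetical, chosen only for illustration. Calling a wrapped function records a thunk instead of running it, and a simple scheduler then runs every task whose inputs are ready on a pool of workers.

```python
from concurrent.futures import ThreadPoolExecutor


class Delayed:
    """A thunk: a function call that is recorded now and evaluated later."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args


def delayed(func):
    """Wrap func so that calling it builds a Delayed node instead of running it."""
    def wrapper(*args):
        return Delayed(func, *args)
    return wrapper


def is_ready(node, results):
    """A node is ready when all of its Delayed inputs have been computed."""
    return all(id(a) in results for a in node.args if isinstance(a, Delayed))


def compute(root, max_workers=4):
    """Evaluate the task graph under root, running ready tasks concurrently."""
    # Collect every node reachable from the root.
    nodes, stack = {}, [root]
    while stack:
        node = stack.pop()
        if id(node) not in nodes:
            nodes[id(node)] = node
            stack.extend(a for a in node.args if isinstance(a, Delayed))

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while id(root) not in results:
            # Submit one wave of tasks whose inputs are all available.
            ready = [n for key, n in nodes.items()
                     if key not in results and is_ready(n, results)]
            futures = {}
            for n in ready:
                args = [results[id(a)] if isinstance(a, Delayed) else a
                        for a in n.args]
                futures[id(n)] = pool.submit(n.func, *args)
            # Only the main thread waits, so workers never block each other.
            for key, future in futures.items():
                results[key] = future.result()
    return results[id(root)]


@delayed
def gc_count(seq):
    """Count G and C bases in one sequence (hypothetical example task)."""
    return sum(1 for base in seq if base in "GC")


@delayed
def total(*counts):
    """Combine the per-sequence counts."""
    return sum(counts)


sequences = ["ACGT", "GGCC", "ATAT"]
graph = total(*[gc_count(s) for s in sequences])  # nothing has run yet
result = compute(graph)                           # the scheduler runs the graph
```

The user code reads like ordinary sequential code, which is the ease of use I am after; the parallelism lives entirely in compute. Python threads are used here only to show the scheduling pattern, and a real implementation would need a scheduler that keeps workers busy rather than waiting for each wave to finish.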