Northeastern University: Looking for Research Projects for Student Teams!

Hello everyone :wave:

I have some very exciting news to share with the Julia community! After discussions with Northeastern University, I am pleased to share that we have the opportunity to bring Julia-related projects into the graduate mathematics curriculum in Northeastern University’s Mathematics Department!

What this means is that folks within the Julia community can develop research projects that can be mentored by you and explored by student teams. Here are some of the classes offered at Northeastern University where these projects could take place:

Any potential projects will mostly be brought in through Northeastern University’s Experiential Network Program. In general, the projects we are looking for fit the following description:

These short-term projects help students apply their skills to solve problems with real parameters and constraints that deliver real value to project sponsors. Students complete projects remotely and typically work 30-40 hours on a project over approximately six weeks. Projects are scoped and evaluated by sponsors with assistance from the XN staff, who provide the administrative and operational support for the academic programs and the student participants.

Here are two example project proposals that show what is needed:

Project 1: Brain image segmentation

Title: Brain CT image hemorrhage segmentation by artificial neural network

Description: The Zeta Surgical company was started by a team of Harvard graduates and academics. Its mission is to democratize access to accurate, safe, and fast image guidance, to unlock the use of image guidance directly at the point of care, and to enable new treatments in cases such as emergencies and bedside procedures. In this project, the company provides students with a dataset consisting of various brain CT scan slices, each of which has a hemorrhage (bleeding) within it. The hemorrhages have been labeled in some of the images. These hemorrhages are divided into different types: intraparenchymal, intraventricular, subarachnoid, subdural, epidural, and a category for images with multiple sources of bleeding. Students will use techniques in machine learning, computer vision, user interface, and data analytics to do the classification, regression, and segmentation of these CT images.

Deliverable: Students will develop mathematical models, apply machine learning and artificial neural network techniques, and program using Python and TensorFlow to investigate the labeled dataset. The goal of the project is to complete the machine learning model with Python scripts for classification, regression, and, more importantly, the segmentation of the brain hemorrhage in CT images.

Project 2: Oncology Targets and Biology

Title: Machine Learning - Find Novel Intrinsic Oncology Targets and Biology

Description: Based in Kenilworth, New Jersey, a global healthcare leader is working to help the world be well. Through prescription medicines, vaccines, biologic therapies, and consumer care and animal health products, the company works with customers and operates in more than 140 countries to deliver innovative health solutions. The company demonstrates a commitment to increasing access to healthcare through far-reaching policies, programs, and partnerships. The company is interested in mining publicly available genomic data sets to identify cancer vulnerabilities and differential cancer gene dependencies, and to classify predictive models that share common genetic determinants, using CRISPR KO gene dependencies, mutational and copy number profiles, and RNA expression data obtained from over 1000 cancer cell lines. This task requires the application of methods that can integrate diverse data types and machine learning algorithms to classify the lineages that are dependent on essential genes for tumorigenesis. Broadly, the main questions are which algorithms are suitable for classification with mixed data distribution types and which are scalable for effective computing.

Deliverables: Students will 1) Identify a machine learning algorithm that is best suited to the data and classifies lineages into significant clusters. 2) Test whether the classification schemes derived by associating gene dependencies with other data sets such as RNA expression, pathway signatures, DNA mutation, and copy number data are unique. 3) Determine whether the machine learning algorithm can be extended to identification of predictors of general viability loss rather than specific priors lineage-specific genes only.

At this time, I am soliciting potential interest in projects for the Fall 2024 semester. We are particularly interested in potential projects related to health, biology, and climate research, with a priority on projects that use Python and/or Julia! Any of these projects could be used as the foundation for future research, publication, or demonstration purposes.

Feel free to reach out to me here in the comments, by email, or on Slack @TheCedarPrince.

Thanks everyone!

~ tcp :deciduous_tree:

P.S. We will be discussing additional details and questions about this at the upcoming monthly JuliaHealth meeting – I would highly advise coming through if you are interested, as I will be discussing additional potential project ideas and will be available to answer any questions!


Thank you @TheCedarPrince. What do folks from the Julia community gain in return? I'd appreciate it if you could elaborate on that.


Great question! Here is, in my opinion, what members from the Julia community stand to gain from this:

  • Imagine you are a busy Julia programmer or developer. You come across a couple research questions that might be interesting to explore but don’t have time to do so. A student team could do some minor exploration for you so you could know if the idea can be successful or lead to another exciting line of research.

  • You’re a research software engineer and want to show how your tools can compose with the rest of the Julia ecosystem but lack the time to come up with examples. The student team could use your tools, compose them with relevant packages, and write a final report that could become an example or two in your package’s documentation.

  • Suppose there are folks who have never worked with Julia before. You pitch a project, and over the course of it a student or two becomes a bona fide Julia programmer, then an active member of the Julia community who continues to enrich the Julia ecosystem with different backgrounds and perspectives.

It’s a two-way street where there are many things students stand to benefit from too, but the gist of this answer is that, ideally, you get out of the project what you scope it to be.

Does that help give a general sense of what folks could benefit from with this @juliohm?


Thank you @TheCedarPrince for the answer. I just wanted to double check that there are no financial resources involved, and that the initiative relies on the generosity of folks.

Some university programs pay folks, who are not affiliated with the university, to mentor students online (e.g., help with questions in forums, grade their projects).


Ah, I see what you are talking about. No, there are no financial resources involved, so it would depend on the generosity of folks and their interest in seeing a project come to fruition.

For folks who may have been unable to attend the call today, here is a link to a video where I discuss more about the opportunity and answer some questions and give some more example project ideas:

My propositions
A) Implement foundational models for medical image segmentation and regression, similar to the MONAI network architectures [1]. The implementation should be tested on the Medical Segmentation Decathlon datasets [2].
Expected outcome:
A set of well-established architectures, built in a modular way. Modularity should make it possible to exchange encoders and decoders between different architectures. Each level of an encoder or decoder should be implemented as a parameterized module. There should be a function that, given parameters, returns a decoder, an encoder, or a whole network. All parameters of the constructed network should be present in a struct to enable hyperparameter tuning. Models should be implemented in Julia using Lux.jl [3] and MedPipe3D.jl [4]. The final report should include the results of the architectures with default settings (those can be based on MONAI [1]) on chosen datasets from the Medical Segmentation Decathlon [2]; for segmentation, use the Dice score. Also report training and inference time, as well as the generalizability of the model, measured by the performance of the algorithm on CT-ORG [5].
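To make the segmentation metric in proposal A concrete, here is a minimal sketch of the Dice score on binary masks. It is written in plain Python purely for illustration (the proposal itself targets Julia with Lux.jl); the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions, not part of the proposal.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Flattened 2x3 masks: three voxels predicted, three labeled, two overlapping.
prediction = [1, 1, 0, 0, 1, 0]
label = [1, 0, 0, 0, 1, 1]
print(dice_score(prediction, label))  # 2 * 2 / (3 + 3) = 0.666...
```

For 3D volumes the same formula applies after flattening the arrays; a real implementation would presumably operate on array types directly instead of Python lists.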

B) Adapt and/or add explainability algorithms in Julia libraries [6] for 3D medical imaging, taking into account the usually high memory requirements caused by the large size of medical images. As an additional example, analyze the Python Captum library [7].
Expected outcome:
Algorithms should include at least guided backpropagation, including layer-wise variants [8], occlusion [9], and training data attribution [10]. Experiments should be done using the classic U-Net architecture in Julia using Lux.jl [3] and MedPipe3D.jl [4]. The final report should include experiments on the Medical Segmentation Decathlon datasets [2]. For guided backpropagation, perform image-wise classification of the prostate based on the PI-CAI dataset [11] and show that guided backpropagation approximately highlights the area where cancer is present according to the dataset labels. For training data attribution, corrupt the segmentation dataset by dilating or eroding labels using mathematical morphology and show that training data attribution can find those samples as outliers. For occlusion, use the PI-CAI dataset and show that classification of cancer vs. lack of it is harder when occluding the area where, according to the labels, the cancer is located. Put the results, explanations, interpretations (for medical interpretation, consult me, Jakub Mitura), and visualizations in the final report.
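The label corruption step for training data attribution can be sketched as follows. This is a small pure-Python illustration of binary dilation and erosion on a 2D mask, assuming a cross-shaped (4-connected) structuring element and treating pixels outside the image as background; the helper names are hypothetical, and an actual project would work on 3D volumes in Julia.

```python
from typing import List

Mask = List[List[int]]
# Cross-shaped (4-connected) structuring element, including the center pixel.
CROSS = ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1))


def _foreground(mask: Mask, r: int, c: int) -> bool:
    """True if (r, c) is inside the image and labeled; outside counts as background."""
    return 0 <= r < len(mask) and 0 <= c < len(mask[0]) and bool(mask[r][c])


def dilate(mask: Mask) -> Mask:
    """Grow the label: a pixel is set if any pixel under the element is set."""
    return [[int(any(_foreground(mask, r + dr, c + dc) for dr, dc in CROSS))
             for c in range(len(mask[0]))] for r in range(len(mask))]


def erode(mask: Mask) -> Mask:
    """Shrink the label: a pixel survives only if every pixel under the element is set."""
    return [[int(all(_foreground(mask, r + dr, c + dc) for dr, dc in CROSS))
             for c in range(len(mask[0]))] for r in range(len(mask))]


label = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
# Labeled pixel counts: original, dilated, eroded.
print(sum(map(sum, label)), sum(map(sum, dilate(label))), sum(map(sum, erode(label))))
```

Applying `dilate` or `erode` to a random subset of training labels gives a set of known-corrupted samples against which an attribution method's outlier ranking can be checked.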

C) Implement extraction of radiomic features from 2D and 3D medical images, similar to pyradiomics [12]. Perform experiments on PET/CT data from the autoPET dataset [13].
Expected outcome: Implement at least GLCM, GLSZM, GLRLM, NGTDM, and GLDM [12]. Extract all features for segmented lesions in the autoPET dataset from both the PET and CT modalities, and check whether you can differentiate between different cancer types using probabilistic methods from Turing.jl [14]. In the report, describe the methodology, give the results of the model, and show that the feature values match those from pyradiomics [12]. The implementation should use MedPipe3D [4] or MedImage [15]. Extraction methods should be based on ParallelStencil [17] or KernelAbstractions [18].
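As a rough idea of what a naive implementation of one of these feature classes involves, here is a sketch of a gray-level co-occurrence matrix (GLCM) for a single pixel offset, plus the contrast feature derived from it. It is plain Python for illustration only; the function names, the symmetric counting, and the single horizontal offset are assumptions, and the settings pyradiomics actually uses (offsets, discretization, normalization) would have to be checked against its documentation before comparing values.

```python
from typing import List


def glcm(image: List[List[int]], levels: int, dx: int = 1, dy: int = 0) -> List[List[int]]:
    """Count co-occurrences of gray levels (i, j) for pixel pairs at offset (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # assumption: count each pair in both directions
    return counts


def contrast(counts: List[List[int]]) -> float:
    """Contrast feature: sum over (i, j) of p(i, j) * (i - j)^2 with p normalized counts."""
    n = len(counts)
    total = sum(sum(row) for row in counts)
    return sum(counts[i][j] * (i - j) ** 2 for i in range(n) for j in range(n)) / total


# Small 4x4 image with gray levels already discretized to 0..3.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(image, levels=4)
for row in matrix:
    print(row)
print(contrast(matrix))
```

A naive version like this is also a convenient reference when validating a later KernelAbstractions or ParallelStencil implementation.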

D) Implement a capsule network architecture [16] for 3D medical imaging.

Expected outcome: Implement a capsule network using Lux.jl [3] and MedPipe3D.jl [4]. Provide an explanation of why this network architecture can improve generalizability, and adapt and implement chosen variants of the architecture from the literature. Perform experiments that try to demonstrate its generalizability by comparing it to a classical U-Net on the Medical Segmentation Decathlon dataset.

  1. MONAI: Network architectures (MONAI 1.3.1 Documentation)

  2. Medical Segmentation Decathlon

  3. GitHub - LuxDL/Lux.jl: Explicitly Parameterized Neural Networks in Julia

  4. GitHub - JuliaHealth/MedPipe3D.jl

  5. Nature: CT-ORG, a new dataset for multiple organ segmentation in computed tomography

  6. Julia Explainable AI · GitHub

  7. Algorithm Descriptions · Captum

  9. [1311.2901] Visualizing and Understanding Convolutional Networks

  10. arXiv: Exploring Practitioner Perspectives On Training Data Attribution

  12. PyRadiomics: pyradiomics documentation

  15. Redirect Notice

I am currently developing the JuliaSurv org, which aims to implement all kinds of survival analysis in Julia on solid, future-proof foundations, so I think I might be able to propose one or two projects that match.

Here are implementation targets in survival analysis that would be suitable for such a timeframe:

Project 1: Non-parametric survival analysis.

Title: Non-parametric survival analysis grounding in JuliaSurv

Description: There is no implementation (yet) of non-parametric survival analysis methods (Kaplan-Meier, Aalen, etc.) in JuliaSurv. Since these methods are very central in the literature, they need to be implemented with care, as they will serve as a base for other implementations later.
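For readers who have not met these estimators, here is a minimal sketch of the Kaplan-Meier estimator with right-censored data. It is written in plain Python for illustration and is not JuliaSurv's API; the function name and the (time, survival) output format are assumptions. Subjects censored at a given time are treated as still at risk for events at that same time, which is the usual convention.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, estimated survival probability) at each distinct event time."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # subjects leaving the risk set (event or censoring)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving past time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# events[i] is True for an observed event and False for a censored observation.
times = [1, 2, 2, 3, 4, 5]
events = [True, True, False, True, False, True]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 4))
```

With this toy data the estimate steps down at times 1, 2, 3, and 5; the censored observations at times 2 and 4 only shrink the risk set.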

Project 2: Hazard regression

Title: Hazard regression in Julia

Description: Hazard regression represents a group of modeling techniques that are targeted at survival analysis. A general interface could be developed, based on what is already there in JuliaSurv, to offer these models to practitioners. They are used a lot and thus would be really useful; for the moment, people have to go to R to get them.

Project 3: Survival outcomes for neural networks.

Title: Survival outcomes for neural networks.

Description: Implement bindings to be able to use survival outcomes for a neural network from, e.g., Flux.jl. Survival outcomes means censored outcomes, so this is not exactly the same as hazard modeling but has the same goal: being able to use censored outcomes for a neural net would be very nice and would open up a lot of applications!

Tell me if you feel like this would indeed be interesting for your students (e.g., they have already learned survival analysis, and these skills are in scope of what you want them to do). If so, I could expand the descriptions into Descriptions/Deliverables and provide more details, as for now this is a bit sparse.

In all cases, data sources relevant for the implementations are all around (in particular in R’s packages) and usually open source.


Hey @lrnv and @Jakub_Mitura – these are amazing ideas! I will follow up on the Discourse discussions here more shortly; I am a bit busy for a few days but will respond early next week. Thank you!!!

Additional idea (I do not have sufficient knowledge of the underlying algorithms to mentor it, but I know it would be very useful)
Expected outcome:
Create a package that will calculate the required sample size for the study group and for external validation for various tasks like binary prediction, multiclass prediction, and prediction of a continuous variable, including time to event.
The algorithms are described, for example, in

A big part of those articles has short R code available.

If a team of students could port it to Julia and organize it, that would be immensely helpful for anybody planning a study.

Hey @Jakub_Mitura,

First of all, a tremendous THANK YOU for your ideas and your thorough references! I am going to respond to each idea here.

Here is the personal rubric I used to assess the “fitness” of each project. I put it together to help frame my comments and to make sure your project is a good fit with the program at NEU. It is based on syllabi from courses in the NEU mathematics department:

  • Scope of Work (1 - 10): How much work or tasks I see here for the students based on the description as well as how clearly scoped each task is.
  • Project Clarity (1 - 10): How clearly described the project is.
  • Alignment (1 - 10): How closely related I see this project in alignment with eligible classes at Northeastern.
  • Feasibility (1 - 10): How feasible or “able to be accomplished” by a student team I see the proposed project being.
  • Composite Score (4 - 40): Overall composite score with final grades of:
    • “Great Fit with Minimal Revisions” for 31 - 40pts
    • “Good Fit with Revisions” for 21 - 30pts
    • “Possible Fit with Significant Revisions” for 11 - 20pts
    • “Out of Scope” for 4 - 10 pts.
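The rubric arithmetic can be written down as a short sketch. This assumes the composite score is the plain sum of the four sub-scores (which the 4 - 40 range implies); the function name `composite_grade` is hypothetical, and Python is used only for illustration.

```python
from typing import Tuple

# Grade bands from the rubric: (minimum composite score, label).
GRADE_BANDS = (
    (31, "Great Fit with Minimal Revisions"),
    (21, "Good Fit with Revisions"),
    (11, "Possible Fit with Significant Revisions"),
    (4, "Out of Scope"),
)


def composite_grade(scope: int, clarity: int, alignment: int, feasibility: int) -> Tuple[int, str]:
    """Sum the four 1-10 sub-scores and map the total to its grade band."""
    scores = (scope, clarity, alignment, feasibility)
    if any(not 1 <= s <= 10 for s in scores):
        raise ValueError("each sub-score must be between 1 and 10")
    total = sum(scores)
    for minimum, label in GRADE_BANDS:
        if total >= minimum:
            return total, label
    raise AssertionError("unreachable: the minimum possible total is 4")


print(composite_grade(3, 7, 5, 2))  # (17, 'Possible Fit with Significant Revisions')
```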

Project A (foundational models for medical image segmentation):

  • Scope of Work: 3
  • Project Clarity: 7
  • Alignment: 5
  • Feasibility: 2
  • Composite Score (4 - 40): 17pts, Possible Fit with Significant Revisions

Comments: I found the description of the project rather clear, but it could use some additional explanation of what foundational models are as well as the exact tasks you’d want the students to perform. My greatest concerns are the following: 1) it is unlikely that the courses offered within the mathematics department cover foundational models, so I found the feasibility of this project to be low, alongside the fact that many students wouldn’t have the required programming expertise to develop the desired implementations. 2) Implementing encoders and decoders from differing architectures is unlikely to be covered in the courses offered. I think the datasets look great, but in terms of alignment, I would want to see more explanation of what you mean by this being implemented in MedPipe3D.jl.

Project B (explainability algorithms for 3D medical imaging):

  • Scope of Work: 5
  • Project Clarity: 6
  • Alignment: 8
  • Feasibility: 5
  • Composite Score (4 - 40): 24pts, Good Fit with Revisions

Comments: I think this project has strong alignment with what is covered in courses at NEU. In particular, courses definitely cover the UNet architecture, and this would be a very fun project for student teams. I think what needs revision here is the scope and the methods you would want the student teams to use. I feel like dealing with high memory requirements goes somewhat beyond what the students would expect to deal with (although downsampling techniques are covered). I could see the students needing additional clarity and guidance on the medical images they are working with as well as on how to use the corruption methods you are talking about (if the corruption methods are very involved, that would be going out of scope for the student team). Additionally, they may not be familiar with guided backprop methods. I would suggest picking either corruption methods or occlusion methods, as doing both may be too much. Otherwise, with these revisions, I think this project would be a good fit!

Project C (radiomic feature extraction):

  • Scope of Work: 6
  • Project Clarity: 8
  • Alignment: 10
  • Feasibility: 4
  • Composite Score (4 - 40): 28pts, Good Fit with Revisions

Comments: I actually think this project is a really good fit! The only thing that is keeping me from wanting to say “great fit” is that the students would most likely not be working with any kind of probabilistic programming methods (so I expect Turing.jl would be too much for the students). Additionally, although I think kernels would be covered, stenciling most likely would not be, and certainly not parallel computing methods. That said, I think having students pursue naive implementations of the methods you enumerate would be great because the students could also validate results against pyradiomics; I should also note that you will probably only want to pick 2 - 3 methods as required and leave the remaining methods as “bonus” for the teams. My final thought is that I could see a bit of heavy involvement from your side on this project, as there would be a run-up for students to learn about the methods you talk about and how to use MedPipe3D and MedImage for this work. Overall, a good fit, but it would need some revisions.

Project D (capsule network architecture):

  • Scope of Work: 8
  • Project Clarity: 6
  • Alignment: 8
  • Feasibility: 6
  • Composite Score (4 - 40): 28pts, Good Fit with Revisions

Comments: I think this project could be a good fit but needs a bit more clarity about the implementation the students should pursue here. Great dataset found too. The only thing keeping this from a “great fit” is that it will be quite a task to implement an architecture from literature so I think the student team may only be able to manage one or two variant implementations at most.

As for the additional sample-size idea: I know it came out of a discussion we had once upon a time, Jakub, so I will review it separately and see about potentially building it out further. I’ll table this idea for now.

In my personal opinion, these are some good projects. I would say the projects that I marked as “good fit” would probably be the best to proceed with further for the sake of time/effort, as I do not have time to give further comments at the moment. I’ll be in touch on next steps here! Thank you!!!

Hey @lrnv,

First off, a HUGE thank you for these ideas! I’ll go ahead and say that I am uncertain where and if survival analyses/hazard regressions are covered in our curriculum anywhere but I could easily see these being really fun projects that perhaps some more ambitious student teams could go after – they may just need more mentorship from your side if they haven’t encountered this before. Here is the scoring rubric I am using:

  • Scope of Work (1 - 10): How much work or tasks I see here for the students based on the description as well as how clearly scoped each task is.
  • Project Clarity (1 - 10): How clearly described the project is.
  • Alignment (1 - 10): How closely related I see this project in alignment with eligible classes at Northeastern.
  • Feasibility (1 - 10): How feasible or “able to be accomplished” by a student team I see the proposed project being.
  • Composite Score (4 - 40): Overall composite score with final grades of:
    • “Great Fit with Minimal Revisions” for 31 - 40pts
    • “Good Fit with Revisions” for 21 - 30pts
    • “Possible Fit with Significant Revisions” for 11 - 20pts
    • “Out of Scope” for 4 - 10 pts.

Project 1 (non-parametric survival analysis):

  • Scope of Work: 6
  • Project Clarity: 6
  • Alignment: 9
  • Feasibility: 3
  • Composite Score: 24pts, Good Fit with Revisions

Comments: I think this is a pretty well-defined project and aligns very well with the sorts of projects that would fit into classes at NEU. What I would want to see more about is the exact methods you’d want to implement – I would say 1 - 3 methods max would probably be feasible for the teams. Additionally, what sort of data would they want to work with to validate their attempts? The reason why feasibility received a low score is that non-parametric methods may be unknown to students, so they will most likely need additional guidance here.

Project 2 (hazard regression):

  • Scope of Work: 7
  • Project Clarity: 8
  • Alignment: 9
  • Feasibility: 8
  • Composite Score: 32pts, Great Fit with Minimal Revisions

Comments: I think this project is a great fit. I would say that it is outside the scope for students to create a general interface, but creating a naive implementation would be a great research product. In this scenario, again, what exact datasets could be used, and how might you want the students to validate their findings? If you can provide those details, this looks great!

Project 3 (survival outcomes for neural networks):

  • Scope of Work: 6
  • Project Clarity: 6
  • Alignment: 7
  • Feasibility: 5
  • Composite Score: 24pts, Good Fit with Revisions

Comments: I think what is most unclear here is what is meant by “bindings” as having students fully understand how to compose survival outcomes with Flux will be a challenge. If you could provide additional clarity, that would be great. One of the reasons why feasibility was low is that this could be a hard problem where you might need to mentor the students more directly for this problem. I think breaking down the scope of this project more into incremental discrete steps would be very beneficial here.

Overall, I think these projects look pretty great and well-defined. There are revisions needed, but I think the major task across all projects is to also find relevant datasets, as I know the curriculum is keen on student outcomes being not only that they developed a model, but also that they applied it to X (e.g. “I developed and applied a hazard regression model from scratch to assess outcomes in diabetic patient populations”). This is in line with the experiential aspect of the program.

I’ll be in touch on next steps!

Hey @lrnv and @Jakub_Mitura ,

For next steps, I will plan to create a Google Doc that I will share here that contains one or two projects that are fully scoped in alignment to the rubric I was using here to provide feedback. Then, hopefully, that will help you with adding revisions into your project proposals.

Additionally, the next step for you all would be adding your revised proposals to that doc. From there, I will reach out to Professor He Wang with these proposals and we will take it from there. We will then do final revisions with Wang and, if all goes according to plan, get these projects into courses at NEU this fall!

Thanks tremendously for the patience, everyone – since this is our first time ever doing this, we are creating procedures and best practices as we go through this together.


~ tcp :deciduous_tree:

@TheCedarPrince Thanks a lot for the guidance on how to develop the projects and the well-thought-out feedback. I will try to expand the proposals in the directions you outlined ASAP. Would end-of-next-week be an OK deadline w.r.t. your timings?

One more question, w.r.t. real data application. I understood that applications are important for your faculty and the students themselves. Should I seek publication-level, unknown, new data, or would standard testing datasets, already known in the literature and well studied, be enough ? The second case is obviously easier to get and, to my eyes, still brings value to the table.

Definitely! My goal is to have a nice list of projects together by the end of the month so I can pass them along to Professor Wang. Then, throughout July, on our side we can do any additional revisions or make suggestions so that by the time fall comes, these projects are ready for students.

This is a great question. My first instinct is similar to yours: to recommend data sets that are either standard or well known in the literature. That would be the preference for these projects. However, if there are novel datasets, the guiding principle should be whether they are documented well enough for a student to pick up and apply the methods they are developing to this data for a novel approach; if so, that would be fine too. The onus would be on you, the project mentor, to make the case that this novel data would be suitable, however.

Hope that helps @lrnv !
