Federated Learning in Julia

Is anyone working with federated learning in Julia?
Asking for a friend.

It wouldn’t be too hard to set up a basic workflow for this. Unfortunately I don’t think it’s a popular topic in Julia, because Julia is (I believe) still restricted on mobile devices, and most people using this kind of tool are using it to deploy to mobile, although one could easily imagine other useful scenarios.
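For concreteness, here is a minimal sketch of what such a basic workflow could look like in plain Julia (no packages), just to show the shape of federated averaging: each site trains on its own data locally and only model weights travel. The names `local_update`, `fedavg`, and `sites` are made up for illustration, not an existing API.

```julia
# One site: a few steps of gradient descent on a local linear model.
# The (X, y) data never leaves this function.
function local_update(w, X, y; lr=0.01, steps=10)
    for _ in 1:steps
        grad = 2 .* X' * (X * w .- y) ./ length(y)   # mean-squared-error gradient
        w = w .- lr .* grad
    end
    return w
end

# "Server": broadcast the global weights, average what comes back.
function fedavg(w_global, sites; rounds=20)
    for _ in 1:rounds
        updates = [local_update(copy(w_global), X, y) for (X, y) in sites]
        w_global = reduce(+, updates) ./ length(updates)
    end
    return w_global
end

# Toy usage: three sites, each holding its own private data.
sites = [(randn(50, 3), randn(50)) for _ in 1:3]
w = fedavg(zeros(3), sites)
```

In a real deployment the comprehension over `sites` would be replaced by network calls to the participating institutions or devices, but the shape of the loop is the same.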

Putting mobile aside temporarily, this would definitely be useful in healthcare (my current line of work), where very strict data-sharing laws are in place (and people are paranoid about accidentally not following them). However, for this technique to be approved by any decently knowledgeable privacy team, there would need to be guarantees that a trained system cannot be used to expose PHI/PII to those not authorized to view it. Without any other fancy techniques, that would limit the applicability of this technique to systems where either the data is sufficiently de-identified (which in practice can mean losing significant parts of the dataset), or where the results coming out of the algorithm cannot leak sensitive data (i.e. results must only be aggregates of a sufficiently large sample).
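To make that last constraint concrete, a hypothetical guard like the one below (the name `safe_mean` and the threshold `k` are made up) is the kind of thing a privacy team tends to ask for: a site simply refuses to release any statistic computed on fewer than `k` records.

```julia
# Only release an aggregate if it was computed over at least k records;
# otherwise return `missing` so nothing about small cohorts leaks out.
function safe_mean(values; k=20)
    length(values) < k && return missing
    return sum(values) / length(values)
end

safe_mean(rand(5))    # missing: cohort too small to report
safe_mean(rand(100))  # an aggregate over a sufficiently large sample
```

This obviously isn't a full defense (repeated overlapping queries can still leak information), but it illustrates the "aggregates only" restriction.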

Anyway, I think this idea is pretty cool regardless of the difficulties, and I hope someone will find it interesting enough to implement it :slight_smile:


@jpsamaroo Thank you for a great reply. I am also working in the healthcare field. Perhaps we can take this discussion offline?

Agreed, I’m currently working in the healthcare field. Properly de-identifying data is extremely difficult, so the default is that all potentially useful data is locked up and we end up working on extremely small datasets gathered under strict governance.

Of course this is a big problem if you’re trying to build accurate models! I do think some kind of federated learning is a great idea for production systems at scale, but for research and early development it’s not clear it can help. Debugging such a system without being able to see the data seems like a nightmare, and deploying at sufficient scale would be a showstopper for early-stage projects… perhaps it would be a good fit for large manufacturers of medical devices.

Overall I feel like federated learning would be ideal if you have a very large existing userbase and you’d like to deploy some new machine learning system. The ideal way to retain trust in that case seems to be to respect privacy by design rather than as an implementation detail. Not coincidentally, this is the situation Google is in with Gboard et al.


Hello Chris, I think you, @jpsamaroo, and I could have a very good discussion on this, but offline.

I am not familiar with Google Gboard; could you say a little more? I saw that Google released some privacy standards last week, but I have not had time to look at them.

I’d prefer to have an open discussion here if possible; that way people with diverse experience can contribute, and if anything truly interesting is discussed, others can learn from it.


@c42f No problem. I work with a company which is applying deep learning to mammography.

I just feel that if we are not discussing anything directly related to Julia, the audience on this board might get annoyed. But for the moment, great!
What is Google Gboard, please?

About Gboard:

We’re currently testing Federated Learning in Gboard on Android, the Google Keyboard. When Gboard shows a suggested query, your phone locally stores information about the current context and whether you clicked the suggestion. Federated Learning processes that history on-device to suggest improvements to the next iteration of Gboard’s query suggestion model.

Beyond Gboard query suggestions, for example, we hope to improve the language models that power your keyboard based on what you actually type on your phone (which can have a style all its own) and photo rankings based on what kinds of photos people look at, share, or delete.

Source: Federated Learning: Collaborative Machine Learning without Centralized Training Data
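Mechanically, the pattern described in that quote is: the device keeps an interaction log locally, computes a model update from it on-device, and only that update (never the raw log) is sent back to be averaged with updates from other devices. A rough sketch in plain Julia, assuming a tiny logistic model over hand-made context features (the names `model_delta` and `events` are illustrative, not anything Gboard actually uses):

```julia
using LinearAlgebra  # for dot

σ(z) = 1 / (1 + exp(-z))

# The phone keeps (features, clicked) pairs locally; only the weight change
# computed from them would ever leave the device.
function model_delta(w, events; lr=0.1)
    Δ = zero(w)
    for (x, clicked) in events
        err = σ(dot(w, x)) - clicked      # prediction error on this event
        Δ .-= lr .* err .* x              # accumulate the local update
    end
    return Δ                              # the raw event log stays on the phone
end

# Toy usage: 100 locally stored interactions, 4 context features each.
w = zeros(4)
events = [(randn(4), rand(Bool)) for _ in 1:100]
Δ = model_delta(w, events)
```

The server-side step would then be the same kind of averaging as in the earlier sketch.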