Discussion and questions related to Machine Learning and Artificial Intelligence in Julia, including the JuliaML organization.
Do you think it should be separate from Stats? I feel that ML is not a subfield of Stats since many machine learning algorithms are not statistically sound, yet are useful. Still, it may be helpful to have a single discussion place for both in order to foster cooperation and common vocabularies.
Well, I think this is a tough question, with no right answer. I see machine learning as the combination of statistics and optimization. And I see AI as even more different/general. So if you’re going to have both a Statistics and an Optimization category, then there should be another category for some part of ML/AI. But maybe they should all be combined?
What about Data Science? That name pulls them together.
Works for me.
Side note… Just got this error back when trying to reply "Works for me" by email (Can this setting be changed?):
We’re sorry, but your email message to [“julialang+
email@example.com”] (titled Re:
[JuliaLang] [Domains] Machine Learning and Artificial Intelligence) didn’t
Body is too short (minimum is 20 characters)
If you can correct the problem, please try again.
FWIW, I think it’d be valuable to have a separate Data Science domain category, mainly due to use cases: I think some users are going to come with a use case in ML/Data Science that they want to discuss, and it’s not entirely certain it’s a “statistics” issue. Sure, there are going to be plenty of posts that would be fine in either, but I think there are plenty more that would go to only one or the other.
Please keep this category! I expect Julia is going to take center stage for ML over the next few years, and it looks as though ML is going to take center stage in IT for the next few decades. So it’s going to be a big community.
Also the category can twin with https://gitter.im/JuliaML/chat
Also, while it is technically a subcategory of Data Science, that is a big umbrella!
I don’t feel ML is a subcategory of Stats. It takes from stats. It also takes from linear algebra, multivariate calculus, probability, physics, neuroscience, algorithmics, etc. – and who knows what it will take from in the future?
The main problem is that a thread can only be in one category, so if we have “Stats”, “Data”, “AI”, “ML”, “Data Science” as different categories, we’re going to end up with a very fragmented ecosystem (not to mention pointless arguments about where a particular thread belongs).
Given that the current julia-stats list only gets a couple of posts a week, I think we should generally lean toward fewer categories until we reach the stage where the volume of posts becomes burdensome for the casual reader.
Don’t Data Science and Statistics apply also to Biology, which has its own category? It is on this level that AI deserves its own category. Likewise, Machine Learning resembles Molecular Biology, both being positioned on a deeper level. Of course nothing guarantees similar traffic.
If someone wants to share a thread’s knowledge with a different category, let them create a new thread in that category and provide a link to the initial thread. Moderators should move a thread into a different category only if the current category is obviously invalid, and never to decide the best between two valid categories. In other words, they are expected to correct, not to improve. Improvement has its place in wiki posts. A long thread is good at “producing” knowledge, while a wiki post is good at “spreading” it, so let’s use the best tool for each job.
Cooperation and common vocabularies are good things not only within a domain, but also across different domains. By bringing all discussion into Discourse, we hope that cooperation will bridge fragmentation, without sacrificing the benefits of specialization.
Yes. Broader categories until it’s necessary to split them up.
I think “Data Science” is the most inclusive of these categories, and we should merge them all into one. FWIW
An argument can be made for folding Statistics into Data then, which would include all of stats, data management, data science, machine learning, and such.
Do you really believe that a new user who has a question on Machine Learning or Artificial Intelligence will think of posting on Data? I predict that he/she will find that irrelevant and post on Usage instead. Or do you think that an AI programmer would like to subscribe to Data, just to get non-AI notifications more often? I’ve just posted the rest of my argument in the relevant thread.
Doing what you like with the top-level categorization, and then aligning the broad categories with words/terms that sound right following “is an expert in the field of”, would make it easier for people with varied backgrounds and distinct professional skills to find the best area for their post/search quickly.
Data is a broad term, and its interpretations are somewhat scope-dependent. If the goal is to use an ontological suborganization, then that makes sense; otherwise idk. “Applications of data under innovative mappings” may better belong subordinate to Transforms than to Data, or to their overlap.
I am pretty sure people who are working in Machine Learning know what Data Science is these days. It’s hard to miss.
Personally I have the suspicion that we are overthinking this a little. At the rate that domain proposals pop up I predict that we may soon have dozens of neatly organized but completely empty forums.
Is there maybe a way to start broad at first and then split into subdomains later if there is an actual need to reduce the noise?
Yes, absolutely. We can start using tags for that purpose.
I would propose that we keep it at Statistics, Data, and add Machine Learning. It is a big enough field with multiple packages in Julia.
For anything else I would like to try out creating tags first, and categories as needed.
What Tags do we need?
To prevent premature fragmentation, there could be an informal rule along the lines of “we consider splitting up into subdomains when volume consistently exceeds [threshold]”. Below 50 posts/week, I would not think about splitting, but given how user-friendly Discourse is, even a threshold of 100+ would make sense.
This is my final post on this thread. I will continue in the more relevant thread. As I explained there:
Broad categories work fine, just with plenty of noise. Splitting an already active category into subcategories later is more of a wish than a feasible goal, except for small communities, like Julia was last year. Tags work much better, as long as there is someone to apply them properly; still, subscription-by-tag isn’t currently an option (if it becomes one, we could really merge all categories, but we cannot convince people to use tags). Low-traffic forums work fine, as long as people get their minds around the “quality over quantity” principle. And wikis don’t care about traffic (yes, Discourse is also a wiki). Furthermore, the Julia community worked fine without Discourse, and we wouldn’t be having this discussion if it weren’t for the migration. But the migration has good reasoning, explained elsewhere. Following that reasoning, if we want to take full advantage of Discourse, neat organization is the way to go, turning this platform from parallel hosting to real integration.
Accordingly, I’d like to see categories for both ML and AI, but if noise prevails, it won’t be the end of the world.