I have always liked visual data science and machine learning tools that let you use a mixture of flowcharts and code to build data manipulation pipelines and machine learning models. There are some popular ones out there, including SAS Enterprise Miner, KNIME, RapidMiner, Orange, etc. But there is no such tool for Julia yet (this partly reflects the fact that Julia’s data ecosystem is not yet mature), and I think we can change that - by building one!
Architecturally we can build it like a desktop app by setting up a local server backed by Julia (WebIO.jl, Mux.jl, etc.). For the GUI we can just use HTML and CSS, and keep it flexible so that we are not tied to any particular JS framework (those come and go every 18 months); that way we avoid the issue with Shiny, where you are likely to be stuck with Bootstrap and jQuery.
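A minimal sketch of what that local-server setup could look like with Mux.jl; the route and the page content here are placeholders, not an actual design:

```julia
# Hypothetical entry point for the tool: Julia serves the HTML/CSS GUI
# locally, and the browser acts as the "desktop" window.
using Mux

@app editor = (
    Mux.defaults,                                        # logging, error pages, etc.
    page("/", respond("<h1>Flowchart editor goes here</h1>")),
    Mux.notfound(),
)

serve(editor, 8000)   # then browse to http://localhost:8000
```

The flowchart editor itself would then be plain HTML/CSS/JS talking to this server, so no specific front-end framework is baked in.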
I’m currently playing with a bit of a web/db-based would-be HTML generator, just getting the db UI and db structure to mesh. Progress is a bit slow, but I’m not too far off being able to build a basic/static page with mostly drop-box selections =)
I’m pretty skeptical of the actual usefulness of these types of tools, but certainly if someone created one in Julia I would be glad to try it out.
As far as GUIs in the Julia data ecosystem go, I personally would very much like to have an interface for viewing DataFrames and easily creating interactive plots from them. Unfortunately I really hate working on anything that even remotely involves a GUI, so the chances that I will start working on something like this myself are 0.
Most of these efforts are commercial, even when financed by a dual (partly open) model. E.g. Shiny Pro starts at $10k/year. Successful purely open source solutions are rare, since the benefit/effort ratio for the potential intersection of users and developers is very low.
These frameworks run out of expressive power rapidly, and you end up having to put lots of small snippets of code into lots of tiny graphical buckets. I ran into something like this in another project (video compositing using Nuke) and thankfully was able to write programs to examine and modify the GUI’s stored graph. That’s worse than just programming it directly.
If you’re concerned about “the hordes of users” coming, Python and R have done very well in this niche without that kind of tool.
Anyway, @randyzwitch said it best, maybe just Early Adopters are happiest with a terminal and vim, so someone else will have to pick up that flag in the Julia 2.0+ timeframe.
I think the efforts in the Julia data science community are best served in the areas that are already being worked on (by yourself included): Getting the data ecosystem at least on par with that of R/Python/Stata/SAS, hopefully better. Can we get things to be more flexible and more performant without sacrificing usability?
DataVoyager.jl looks really cool, definitely check it out sometime. Hadley’s meetup talk should be awesome too!
I mostly just write code, but here are my thoughts:
There are often sense checks of the code that aren’t part of the workflow. E.g. I would do a frequency count of a particular variable after creating it, to see if the results are as expected. In a visual workflow, all of this code can be siphoned off into a separate item in the flowchart, off the main trunk. This keeps the main code cleaner and allows me to rerun it very easily.
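For example, a side-branch sense check of that kind could look like this in DataFrames.jl (the column names and data here are made up for illustration):

```julia
using DataFrames

# main trunk: derive a variable
df = DataFrame(age = [23, 35, 35, 61, 44])
df.age_band = ifelse.(df.age .< 40, "under 40", "40+")

# side branch: frequency count of the new variable,
# just to check the derivation looks right
check = combine(groupby(df, :age_band), nrow => :count)
```

In a visual tool this `check` node would hang off the main trunk and could be rerun or ignored without touching the pipeline itself.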
Most of the tools out there don’t make it easy to modularise code. I have some ideas on making that better.
Visual workflows make parallelization opportunities more obvious. I often run code serially that works something like this: a) manipulate data A, b) manipulate data B, c) merge A and B. Now a) and b) can be run in parallel, and that is easiest to see in visual form rather than in code form.
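The a)/b)/c) pattern above can be sketched with Julia’s task system and DataFrames.jl; the `prepare_*` functions are hypothetical stand-ins for the two manipulation branches:

```julia
using DataFrames
using Base.Threads

# two independent branches (placeholders for real data manipulation)
prepare_a() = DataFrame(id = 1:3, x = [10, 20, 30])
prepare_b() = DataFrame(id = 1:3, y = [0.1, 0.2, 0.3])

ta = Threads.@spawn prepare_a()   # branch a) on its own task
tb = Threads.@spawn prepare_b()   # branch b) runs concurrently

merged = innerjoin(fetch(ta), fetch(tb), on = :id)   # step c): merge A and B
```

A visual tool could apply this transformation automatically whenever two nodes share no edges, which is exactly what the flowchart makes obvious.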
Code and graphs should be one and the same. I tried out R AnalyticFlow, and the thing that stopped me from using it further was not being able to convert my graphs into code. Sometimes I just want to collapse the graph into one piece of code so people don’t need to use a visual tool to run it. For things like dplyr and DataFramesMeta.jl this should be possible.
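As a sketch of what collapsing a graph into code could look like with DataFramesMeta.jl, each flowchart node becomes one step in a `@chain` (macro names as in recent DataFramesMeta versions; the data is made up):

```julia
using DataFrames, DataFramesMeta

df = DataFrame(region = ["S", "N", "S", "N"], sales = [20, 30, 40, 5])

# one linear branch of the flowchart, emitted as plain code
result = @chain df begin
    @subset(:sales .> 10)               # node 1: filter
    @transform(:sales2 = :sales .* 2)   # node 2: derive
    groupby(:region)                    # node 3: group
    @combine(:total = sum(:sales2))     # node 4: aggregate
end
```

Since each node maps to one macro call, a graph-to-code exporter only has to walk the DAG in topological order and emit one step per node.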
Easier access to summary stats: sometimes I run a group-by mean on a set of variables. That is a separate box in the visual flow, and if I want to retrieve the results I just click on that box. If everything were in code I would have to search for it in my code and rerun it, or retype the code every time.
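That group-by-mean box is a one-liner in DataFrames.jl (column names made up), so the visual tool would mainly be caching and surfacing its result:

```julia
using DataFrames, Statistics

df = DataFrame(group = ["a", "a", "b"], x = [1.0, 3.0, 5.0])

# the "box": a group-by mean whose result the tool could keep clickable
stats = combine(groupby(df, :group), :x => mean => :mean_x)
```

In a flowchart UI, clicking the box would just re-display the cached `stats` table instead of forcing a re-run or a retype.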
No offense to the developers of this (it looks pretty cool), but people have been trying to do things like this since the 80’s, and I have a feeling there is a reason they have never caught on. (See this excellent documentary involving one of the more famous and flamboyant examples.) The inherent problem is that abstract mathematical (and in this case programming) concepts are not, in general, bijective with position space. There is a reason that mathematical notation looks the way it does and has not evolved into some sort of more pictorial representation. In fact, one might argue that it has evolved dramatically in the opposite direction: in all of human history before the Enlightenment, mathematics was studied primarily using literal visual representations.

There are certainly many cases in which graphical techniques are extremely powerful, but these techniques rarely generalize even when the underlying mathematical concepts behind them do, and historically distinct graphical techniques have required distinct tools to implement computationally. The unfortunate side effect of all this is that when it comes time to look at some plots (mind you, this is more about viewing output, which is very different from formulating programs graphically), tools for performing simple manipulations are often unavailable. In the modern age, where simple things are often subject to explosive, hysterical hype, I can’t help but feel that some of these graphical programming aids are more advertising gimmick than anything else.
Anyway, sorry to repeatedly come out as bashing this idea, especially with such grandiose, hand-waving talk (also, my skepticism here is directed toward graphical programming, not data viewing tools such as DataVoyager.jl). Again, there are lots of people who like this sort of thing, and I myself would be happy to give any package that also has a really good conventional Julia API a shot.
Debate is good! I guess I am not wedded to the idea, but I can see how it would be better at certain things. Even as a side project, I would like to make one that I can see myself using. In ML, TensorFlow and many other libraries are graph-based, and Bayesian statistics talks about DAGs a lot, so graphs and flowcharts are a natural way to visualize some mathematical problems. But they can’t be used for everything. Not everything is best represented visually; everything can be represented in code, though, so code will always dominate.
That’s still half-true for assembly as well: it would dominate if given the chance, since everything still runs as generated assembly, but it isn’t very easy to write by hand.
Code visualization seems like fun, and you could get a fairly simple function call tree with a bit of text parsing. Other ideas?
I’m not sure these cases are comparable. My assertion was that visual programming hasn’t caught on because it’s not useful, not because of technical limitations, though perhaps I am wrong. It’s also worth remembering that the reason we are starting to see things like “AlphaGo” and its descendants only now has more to do with availability of computational resources than anything else. (Perhaps you could argue that was the primary limitation of visual programming in the 80’s, but probably not in the 90’s and certainly not in the 00’s.)
You are right that not everything is (at least simply) transformable into an image. But some kinds of structure and hierarchy are still present in mathematics; I mean the lemmas and theorems that together build up a theory.
Even if we are not able to show everything graphically, I still see some space for a really useful tool. But I agree that this is my personal view.
I’m also skeptical about general (“for everything”) GUI-based data tools. That said, tools for specific tasks/situations (like DataVoyager) can be very handy and big time-savers. Exploratory seems like a good example of such a tool based on R (although I haven’t personally used it). And I suspect that putting it together was relatively simple because most of the pieces were already there: R has a large number of data source connectors, dplyr is backend-agnostic, plotly provides nice visualizations, and rmarkdown handles the reports. They just needed to wrap them in a nice UI.
I can easily imagine people in various roles making heavy use of something like that and it might work well for luring people over from R/Python. So creating something similar might really be worth it once the data ecosystem in Julia is a bit further along.
It would be great to have an ETL tool to load/filter/reshape/clean/transform the data.
And another (or the same one) with the main functions and packages to analyze and plot.
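A hedged sketch of what such an ETL step list might look like today with CSV.jl and DataFrames.jl; the file name, column names, and the cents-to-units transform are all placeholders:

```julia
using CSV, DataFrames

df = CSV.read("sales.csv", DataFrame)            # load
df = filter(:amount => a -> a > 0, df)           # filter out bad rows
long = stack(df, Not([:id, :date]))              # reshape wide -> long
dropmissing!(long)                               # clean
long.value .= long.value ./ 100                  # transform (e.g. cents -> units)
```

Each line here is one node an ETL GUI would need, which suggests the graphical tool could be a fairly thin layer over these existing packages.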