Unfortunately, understanding in this area is highly empirical, so it is difficult to give strong answers!
Nevertheless, I would expect extremely poor performance if you just generate a very compact feature vector of raw ‘float’ values. Binning into histograms is directly analogous to the ‘one-hot’ encoding you’ve seen in more standard machine learning. Expecting a set of neurons to learn quite a complicated and highly non-linear function from a single float is asking a bit much, whereas once the value is binned you can have, e.g., some neurons that trigger on close contacts and respond accordingly.
There’s nothing stopping you from composing, e.g., the 10 binned values followed by the actual floating-point value, as your feature vector.
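As a rough illustration of both ideas, here is a minimal NumPy sketch: a single distance is expanded into a Gaussian-smeared histogram (a soft ‘one-hot’ encoding), and the raw value is then appended. The bin count, range and smearing width are arbitrary choices of mine, not values from the paper.

```python
import numpy as np

def bin_distance(d, r_min=0.0, r_max=5.0, n_bins=10, sigma=0.5):
    """Expand one distance into a Gaussian-smeared histogram over radial bins,
    i.e. a soft 'one-hot' encoding of 'how close is this contact?'."""
    centers = np.linspace(r_min, r_max, n_bins)
    return np.exp(-((d - centers) ** 2) / (2.0 * sigma ** 2))

binned = bin_distance(1.9)                 # 10 values, peaked around the ~2 Angstrom bins
feature = np.concatenate([binned, [1.9]])  # binned values plus the raw float
print(feature.shape)                       # (11,)
```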
Another perspective is that the overall size of your model is proportional to the size of these feature vectors. Neural-network models for materials seem to saturate with only a few hidden layers (for instance, Fig. S2b in the PRL you linked shows that performance saturates at 2 convolution layers - barely ‘deep’ learning! The same is observed empirically even with conceptually simpler ‘dense’ networks), so you need fairly expanded feature vectors to have much predictive power.
For the bonds / graph edges, it may be that this is necessary to work with their Eqn 5. They construct a gated architecture, in which a set of weights has to learn how strongly two neighbours are interacting, so a somewhat larger bond feature vector is probably needed there.
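To make the gating idea concrete, here is a sketch of one gated interaction between an atom and a neighbour, in the general spirit of their Eqn 5 but not a faithful reproduction of it: the nonlinearities, names and dimensions below are my own placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_edge_message(v_i, v_j, u_ij, W_f, b_f, W_s, b_s):
    """One gated interaction between atom i and neighbour j.
    The sigmoid 'filter' branch decides how strongly the pair interacts;
    the other branch decides what information gets passed."""
    z = np.concatenate([v_i, v_j, u_ij])  # atom_i, atom_j and bond features stacked
    gate = sigmoid(z @ W_f + b_f)         # how much do these neighbours interact?
    core = np.tanh(z @ W_s + b_s)         # what do they contribute?
    return gate * core                    # element-wise gated message

# Toy dimensions (placeholders): 8-dim atom features, 10-dim bond features.
n_atom, n_bond = 8, 10
rng = np.random.default_rng(0)
v_i, v_j = rng.normal(size=n_atom), rng.normal(size=n_atom)
u_ij = rng.normal(size=n_bond)
W_f = rng.normal(size=(2 * n_atom + n_bond, n_atom)); b_f = np.zeros(n_atom)
W_s = rng.normal(size=(2 * n_atom + n_bond, n_atom)); b_s = np.zeros(n_atom)

message = gated_edge_message(v_i, v_j, u_ij, W_f, b_f, W_s, b_s)
# The updated atom feature would be v_i plus the sum of such messages over neighbours j.
```

The point is that the gate has to be inferred from the bond features, so a one-number bond descriptor gives it very little to work with.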
In terms of implementation, it would be most useful not just to have these feature functions implemented, but also to structure your code so that you can choose and combine features at run time. This seems to be what the authors of the PRL had as well; they tried different sizes and combinations of features as a hyperparameter optimisation.
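One simple way to arrange this is a small registry of feature functions that a config can select from; a sketch below, with function names and settings that are my own rather than anything from the paper.

```python
import numpy as np

# Candidate bond-feature functions; each maps a distance to a small feature array.
FEATURE_FUNCTIONS = {
    "binned":  lambda d: np.exp(-((d - np.linspace(0.0, 5.0, 10)) ** 2) / 0.5),
    "raw":     lambda d: np.array([d]),
    "inverse": lambda d: np.array([1.0 / d]),
}

def featurize_bond(d, feature_names):
    """Concatenate whichever feature functions were requested at run time."""
    return np.concatenate([FEATURE_FUNCTIONS[name](d) for name in feature_names])

# The selection (and settings such as the number of bins) can then be swept as
# hyperparameters instead of being hard-coded into the model.
print(featurize_bond(1.9, ["binned", "raw"]).shape)  # (11,)
```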