Given that you have a very expensive function and gradient information, I would recommend a "gradient-enhanced" Bayesian method marginalized over hyperparameters, rather than a maximum a-posteriori estimate [1]. Unfortunately, I don't think any package provides this (in Julia or elsewhere), so you would have to roll your own. I do have an example of marginalization over hyperparameters, which I may get around to posting at some point, but I've never seen an example combining marginalization with gradient enhancement.
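To give a flavor of the marginalization part: instead of fitting a single MAP hyperparameter, you weight predictions from several hyperparameter settings by their posterior probability. Here is a minimal sketch in Python/numpy (the function names, the squared-exponential kernel, and the flat prior over a length-scale grid are all illustrative choices on my part, not a reference implementation):

```python
import numpy as np

def gp_predict(X, y, Xs, ell, sigma_n=1e-6):
    """Plain GP regression mean with a squared-exponential kernel (1D)."""
    K = np.exp(-0.5 * (X[:, None] - X[None, :])**2 / ell**2)
    K += sigma_n * np.eye(len(X))                # small jitter for stability
    Ks = np.exp(-0.5 * (Xs[:, None] - X[None, :])**2 / ell**2)
    return Ks @ np.linalg.solve(K, y)

def log_marginal_likelihood(X, y, ell, sigma_n=1e-6):
    """Log evidence of the data under one length-scale setting."""
    K = np.exp(-0.5 * (X[:, None] - X[None, :])**2 / ell**2)
    K += sigma_n * np.eye(len(X))
    sign, logdet = np.linalg.slogdet(K)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet + len(y) * np.log(2 * np.pi))

def marginalized_predict(X, y, Xs, ells):
    """Average GP predictions over hyperparameters, weighted by their
    posterior probability (flat prior on the grid `ells`), instead of
    committing to the single MAP length scale."""
    logw = np.array([log_marginal_likelihood(X, y, l) for l in ells])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    preds = np.stack([gp_predict(X, y, Xs, l) for l in ells])
    return w @ preds
```

A full Bayesian treatment would sample hyperparameters with MCMC rather than a grid, and would also propagate the predictive variance, but the weighting idea is the same.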

Surrogates.jl has an assortment of surrogates you may like to try first. You can get a feel for their performance by doing a train/test split on pre-generated data. Unfortunately, none of them accept gradient information at this time, though I have a hunch that appending the gradient to the training data of the SVM or RandomForest surrogates could help with prediction. You can also get fancy and try writing a neural network to model your function F with gradient information, though your mileage may vary.
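The train/test split recipe is the same regardless of which surrogate you pick: fit on one subset of your pre-generated data, then measure error on the held-out rest. A sketch in Python/numpy, using a hand-rolled Gaussian-RBF interpolant as a stand-in for whichever surrogate you actually use (the toy `sin` target and all names here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_rbf(X, y, ell=0.5, jitter=1e-6):
    """Fit a simple Gaussian-RBF interpolant (a stand-in for any surrogate)."""
    K = np.exp(-0.5 * (X[:, None] - X[None, :])**2 / ell**2)
    return np.linalg.solve(K + jitter * np.eye(len(X)), y)

def predict_rbf(X, w, Xs, ell=0.5):
    Ks = np.exp(-0.5 * (Xs[:, None] - X[None, :])**2 / ell**2)
    return Ks @ w

# Pre-generated data from the (stand-in) expensive function
X = rng.uniform(0.0, 4.0, 40)
y = np.sin(X)

# Hold out 25% of the samples to estimate generalization error
idx = rng.permutation(len(X))
tr, te = idx[10:], idx[:10]
w = fit_rbf(X[tr], y[tr])
rmse = np.sqrt(np.mean((predict_rbf(X[tr], w, X[te]) - y[te])**2))
print(f"held-out RMSE: {rmse:.4f}")
```

Comparing this held-out error across several surrogate types tells you which family fits your F best before you spend any real function evaluations.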

Another option is to call the Surrogate Modeling Toolbox (SMT, a Python package) from Julia, e.g. via PyCall.jl. It has an implementation of gradient-enhanced Kriging. (Kriging is another name for fitting a Gaussian process, which is the mathematical object at the center of Bayesian optimization.)
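To make "gradient-enhanced Kriging" concrete: a GP's derivative is itself a GP, so you can condition jointly on function values and gradients by extending the covariance matrix with the kernel's derivatives. A minimal 1D sketch in Python/numpy (fixed length scale, squared-exponential kernel; the function name and defaults are my own, not SMT's API):

```python
import numpy as np

def ge_kriging_fit_predict(X, y, dy, Xs, ell=1.0, jitter=1e-10):
    """Gradient-enhanced Kriging (GP regression) in 1D.

    Conditions jointly on function values y and derivatives dy at the
    training points X, then returns the posterior mean at the query
    points Xs. `ell` is a fixed squared-exponential length scale; a
    real implementation would estimate or marginalize it.
    """
    X, Xs = np.asarray(X, float), np.asarray(Xs, float)
    d = X[:, None] - X[None, :]                  # pairwise differences
    k = np.exp(-0.5 * d**2 / ell**2)             # cov(f(x_i), f(x_j))
    # Covariances involving derivatives of the squared-exponential kernel:
    Kfg = k * d / ell**2                         # cov(f(x_i), f'(x_j))
    Kgg = k * (1.0 / ell**2 - d**2 / ell**4)     # cov(f'(x_i), f'(x_j))
    K = np.block([[k, Kfg], [Kfg.T, Kgg]])       # joint covariance of [f; f']
    K += jitter * np.eye(K.shape[0])
    # Cross-covariance between query points and the augmented data:
    ds = Xs[:, None] - X[None, :]
    ks = np.exp(-0.5 * ds**2 / ell**2)
    Ks = np.hstack([ks, ks * ds / ell**2])       # [cov(f*, f), cov(f*, f')]
    alpha = np.linalg.solve(K, np.concatenate([y, dy]))
    return Ks @ alpha
```

The payoff is that each expensive evaluation contributes n+1 pieces of information (value plus an n-dimensional gradient) instead of one, which is exactly why gradient enhancement is attractive when F is costly.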

[1] Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. *Advances in Neural Information Processing Systems*, *4*, 2951–2959.