Given that your function is very expensive and you have gradient information, I would recommend a “gradient-enhanced” Bayesian method marginalized over hyperparameters, rather than one using a maximum a posteriori (MAP) estimate [1]. Unfortunately, I don’t think any package provides this (in Julia or elsewhere), so you would have to roll your own. I do have an example of marginalization over hyperparameters, which I may get around to posting at some point, but I’ve never seen an example combining marginalization with gradient enhancement.
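To show what I mean by marginalizing over hyperparameters, here is a minimal toy sketch (mine, not from any package): a 1D Gaussian process with an RBF kernel, where instead of keeping the single best length-scale we weight predictions from a grid of length-scales by their marginal likelihoods (a uniform prior is assumed). All names here are my own.

```python
# Toy sketch: average GP predictions over a grid of kernel length-scales,
# weighted by each length-scale's marginal likelihood, instead of keeping
# only the single best (MAP) value. Illustrative only.
import numpy as np

def rbf(a, b, ell):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def log_marginal_likelihood(x, y, ell, noise=1e-6):
    K = rbf(x, x, ell) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(x) * np.log(2 * np.pi))

def marginalized_mean(x, y, xstar, ells, noise=1e-6):
    # Posterior weights over the length-scale grid (uniform prior assumed).
    logp = np.array([log_marginal_likelihood(x, y, l, noise) for l in ells])
    w = np.exp(logp - logp.max())
    w /= w.sum()
    # Weighted average of the per-length-scale predictive means.
    means = [rbf(xstar, x, l) @ np.linalg.solve(
                 rbf(x, x, l) + noise * np.eye(len(x)), y) for l in ells]
    return w @ np.array(means)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(x)
print(marginalized_mean(x, y, np.array([1.5]), ells=[0.3, 1.0, 3.0]))
```

A full treatment would marginalize with MCMC or quadrature rather than a hand-picked grid, and would average the whole predictive distribution, not just the mean, but the weighting idea is the same.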
Surrogates.jl has an assortment of surrogates you may like to try first. You can get a feel for their performance by doing a train/test split on pre-generated data. Unfortunately, none of them accept gradient information at this time, though I have a hunch that you could append the gradient to the SVM or random-forest surrogates to help with prediction. You could also get fancy and try writing a neural network to model your F with gradient information, though your mileage may vary.
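The train/test evaluation I have in mind looks like this (a sketch with a stand-in 1-nearest-neighbour “surrogate”, not one of the Surrogates.jl types, and a cheap stub `f` in place of the expensive function):

```python
# Evaluate a surrogate on pre-generated data via a train/test split.
# The surrogate here is a stand-in 1-nearest-neighbour predictor.
import random

def f(x):  # cheap stub standing in for the expensive function
    return x * x

random.seed(0)
xs = [random.uniform(-2, 2) for _ in range(100)]
ys = [f(x) for x in xs]

split = int(0.8 * len(xs))             # 80/20 train/test split
xtr, ytr = xs[:split], ys[:split]
xte, yte = xs[split:], ys[split:]

def predict(x):
    # 1-NN surrogate: return the output of the closest training input.
    i = min(range(len(xtr)), key=lambda j: abs(xtr[j] - x))
    return ytr[i]

mse = sum((predict(x) - y) ** 2 for x, y in zip(xte, yte)) / len(xte)
print(mse)
```

Swapping in each candidate surrogate for `predict` and comparing the held-out error gives you a quick ranking before committing to one.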
Another option is to call the Surrogate Modeling Toolbox (SMT, a Python package) from Julia; it has an implementation of gradient-enhanced Kriging. (Kriging is another name for fitting a Gaussian process, the mathematical object at the center of Bayesian optimization.)
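The idea behind gradient-enhanced Kriging can be shown without SMT: derivative observations enter as extra data whose covariance with the function values comes from differentiating the kernel. A toy 1D numpy sketch of my own (not SMT’s API):

```python
# Gradient-enhanced GP/Kriging mean prediction in 1D: stack function values
# and derivative values into one observation vector, with cross-covariances
# given by derivatives of the RBF kernel. Illustrative only, not SMT's API.
import numpy as np

def gek_mean(x, f, df, xstar, ell=2.0, noise=1e-8):
    def pair(a, b):
        d = a[:, None] - b[None, :]
        return d, np.exp(-0.5 * d**2 / ell**2)

    d, kff = pair(x, x)
    kfd = (d / ell**2) * kff                    # cov(f(x_i), f'(x_j))
    kdd = (1.0 / ell**2 - d**2 / ell**4) * kff  # cov(f'(x_i), f'(x_j))
    n = len(x)
    K = np.block([[kff, kfd], [kfd.T, kdd]]) + noise * np.eye(2 * n)
    y = np.concatenate([f, df])

    ds, ks = pair(xstar, x)
    kstar = np.hstack([ks, (ds / ell**2) * ks])  # cov(f(x*), [f; f'])
    return kstar @ np.linalg.solve(K, y)

x = np.array([0.0, 2.0, 4.0])
print(gek_mean(x, np.sin(x), np.cos(x), np.array([1.0])))
```

With the gradients included, the posterior mean interpolates both the values and the slopes at the training points, which is why gradient enhancement pays off so much when each evaluation is expensive.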
[1] Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems, 4, 2951–2959.