How to minimise a mean squared loss function in Julia with gradient descent?

Hi everyone, I’m very new to Julia and I’m currently trying to understand the language by building multivariate linear regression with gradient descent from scratch. I’m coming from a Python/R background, so apologies if I don’t follow Julia conventions yet (I’m open to suggestions).

I have constructed my loss function as follows:

    """
        mean_squared_cost(X, y, θ)

    Compute the batch cost from the design matrix `X`, the target
    vector `y`, and the weights `θ`.

    Regularisation is ignored for brevity's sake.
    """
    function mean_squared_cost(X, y, θ)
        # Sample size
        m = size(X, 1)

        # Vectorised prediction loss
        preds = X * θ
        loss = preds - y

        # Half mean squared loss
        cost = (1 / (2m)) * (loss' * loss)

        return cost
    end
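As a quick sanity check (the numbers below are my own toy data, not from anything real), the cost is zero when θ reproduces y exactly and positive otherwise:

    # mean_squared_cost repeated here so the snippet runs standalone
    function mean_squared_cost(X, y, θ)
        m = size(X, 1)
        loss = X * θ - y
        return (1 / (2m)) * (loss' * loss)
    end

    X = [1.0 1.0; 1.0 2.0]   # intercept column plus one feature
    y = [2.0, 3.0]           # generated from y = 1 + 1x

    mean_squared_cost(X, y, [1.0, 1.0])   # → 0.0 (perfect fit)
    mean_squared_cost(X, y, zeros(2))     # → (1/4) * 13 = 3.25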

The next function runs gradient descent in a loop to find the best weights.

    """
        lin_reg_grad_descent(X, y, α, fit_intercept=true, n_iter=1000)

    Use the gradient descent algorithm to find the weights (θ) that
    minimise the mean squared loss between the predictions that the
    model generates and the target vector `y`.

    Return a tuple of 1D vectors: the weights (θ) and a history of
    the loss at each iteration (𝐉).
    """
    function lin_reg_grad_descent(X, y, α, fit_intercept=true, n_iter=1000)
        # Initialise some useful values
        m = length(y) # number of training examples

        if fit_intercept
            # Add a column of 1s if fit_intercept is specified
            constant = ones(m, 1)
            X = hcat(constant, X)
        end # Otherwise assume that the user added a constant column

        # Use the number of features to initialise the theta θ vector
        n = size(X, 2)
        θ = zeros(n)

        # Initialise the cost vector based on the number of iterations
        𝐉 = zeros(n_iter)

        for iter in 1:n_iter
            pred = X * θ

            # Calculate the cost at each iteration
            𝐉[iter] = mean_squared_cost(X, y, θ)

            # Update the theta θ at each iteration
            θ = θ - ((α / m) * X') * (pred - y)
        end

        return (θ, 𝐉)
    end
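For example, on a tiny noise-free dataset (my own made-up numbers), the loop recovers the generating weights, which is how I've been checking it:

    # Assumes mean_squared_cost and lin_reg_grad_descent from above
    # are already defined in the session
    X = reshape([1.0, 2.0, 3.0], 3, 1)   # one feature, three samples
    y = [3.0, 5.0, 7.0]                  # generated from y = 1 + 2x

    θ, 𝐉 = lin_reg_grad_descent(X, y, 0.1)

    # θ should end up close to [1.0, 2.0], and 𝐉 should be
    # non-increasing from iteration to iteration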

This requires manually tweaking the learning rate α and the number of iterations, as well as plotting the cost values to check convergence. I know I could use an optimisation function to find the best weights automatically.

Reading around, I have come across Optim as a library that can handle this, but I can’t seem to figure out how to use it to find the best theta values. How can I replace the loop with an optimisation algorithm?
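Here is the kind of call I've pieced together so far from skimming the Optim docs. The toy data and the choice of `LBFGS()` are my own guesses, and I've repeated the cost function so the snippet runs standalone; please correct me if I've misread the API:

    using Optim

    # Repeated here so the snippet runs standalone
    function mean_squared_cost(X, y, θ)
        m = size(X, 1)
        loss = X * θ - y
        return (1 / (2m)) * (loss' * loss)
    end

    # Toy data: y = 1 + 2x, with the intercept column already added
    X = hcat(ones(3), [1.0, 2.0, 3.0])
    y = [3.0, 5.0, 7.0]

    # Minimise the cost over θ; with no gradient supplied, I believe
    # Optim approximates it with finite differences
    result = optimize(θ -> mean_squared_cost(X, y, θ), zeros(2), LBFGS())
    θ_best = Optim.minimizer(result)   # should be close to [1.0, 2.0]

Is this roughly the right shape, or should I be supplying an analytic gradient as well?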

Thanks in advance. Julia rocks!