Batch Training with TensorFlow.jl


I have an existing DNN project that works in Python, but I want to transition to Julia. I’m having problems when it comes to using batches for training. Since my project is quite large, I will simply summarize the important parts here and provide a “bare version” of the code at the end of this post (if I’ve missed something important, just let me know).

  • The input layer contains 100 nodes, there are three hidden layers and the output layer contains 10 nodes.
  • I want to train on roughly 25,000 training instances (25,000 arrays of 100 inputs and 25,000 arrays of 10 labels).
  • I want to train using batches of about 200 instances (I haven’t yet optimized the model), but I want to be able to make predictions on a different number of instances (for validation/testing purposes).
  • In Python, the code works fine: I provide batches of 200 instances when training, then 5,000 instances when validating and another 5,000 when testing the model.
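For concreteness, here is a pure-Julia sketch (no TensorFlow.jl) of the shapes involved. A dense layer computed as `X * W .+ b'` accepts any number of rows, so mathematically nothing forces a fixed batch size; the restriction can only come from how the graph is set up:

```julia
# Pure-Julia sketch of the shapes involved (no TensorFlow.jl).
W = randn(100, 10)            # weights: 100 inputs -> 10 outputs
b = randn(10)                 # biases
layer(X) = X * W .+ b'        # one instance per row

x_batch = randn(200, 100)     # one training batch
x_val   = randn(5_000, 100)   # validation set
size(layer(x_batch))          # (200, 10)
size(layer(x_val))            # (5000, 10)
```

The same `layer` works for a 200-row batch, a 5,000-row validation set, or a single 1x100 row.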

So far, I tried a few different things:

  • Using ‘MLDataUtils’ and ‘BatchView’, which essentially creates an array of arrays containing my data in batches. However, I get shape-mismatch errors because the DNN expects 100 inputs, not 200 (the batch size). It seems it directly feeds the arrays that contain the instances, and not the instances themselves.
  • I preprocessed my data so that every training batch is now a 200x100 array for inputs and a 200x10 array for labels, and ran the function again. In other words, I vertically concatenated the instances within a single batch. This seems to work, but now I’m bound to providing inputs in that exact shape, e.g. I can’t make a prediction for a single instance.
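For reference, a minimal sketch of the concatenation step I described, assuming `X` is a `Vector` of 100-element input vectors:

```julia
# Stack one batch of 200 instances into a 200x100 matrix,
# one row per instance (same vcat-based assembly as in my code below).
X   = [ randn(100) for _ in 1:25_000 ]
idx = 1:200                               # indices of the first batch
x_batch = reduce( vcat, ( X[i]' for i in idx ) )
size(x_batch)                             # (200, 100)
```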

What is the best way to make this work? Is there something I’m completely missing?

“Bare version” of the code:

# Parameters
nbr_input  = 100;             # Number of nodes in input layer
nbr_output = 10;              # Number of nodes in output layer 
batch_size = 200;             # Batch Size
nbr_inst   = 25_000;          # Number of training instances
nbr_nodes  = [500, 500, 500]; # Number of nodes per hidden layer
nbr_epochs = 100;             # Number of epochs

# Creating Training Set
X = [ randn(nbr_input)  for i = 1:nbr_inst ];
Y = [ randn(nbr_output) for i = 1:nbr_inst ];

# Training Function
function train( X, Y )
    sess = Session( Graph( ) ); 

    # Simply returns a DataFrame with random weights and biases for every node.
    nodes_info = ini_weights( nbr_input, nbr_nodes, nbr_output );

    # Creating placeholder. The shape will likely change depending on how I implement things
    x_p = placeholder( Float64, shape = [batch_size, nbr_input], name = "x" );
    y_p = placeholder( Float64, shape = [batch_size, nbr_output], name = "y" );

    # Defines the structure of the neural network
    pred = multilayer_perceptron( x_p, nodes_info );

    # Setting the cost function
    cost = reduce_mean( tf.square( pred .- y_p ) );            
    # Setting the optimizer used in the training phase
    optimizer   = tf.train.AdamOptimizer( 0.01 );   
    minimize_op = tf.train.minimize( optimizer, cost );

    # Initializes all variables in the design
    run( sess, global_variables_initializer( ) );

    for epoch = 1:nbr_epochs
        for batch = 1:div( nbr_inst, batch_size )
            # Creating current batch by stacking instances row-wise
            idx = ( batch - 1 ) * batch_size + 1 : batch * batch_size
            x   = reduce( vcat, ( X[i]' for i in idx ) )
            y   = reduce( vcat, ( Y[i]' for i in idx ) )

            # Training step
            run( sess, [cost, minimize_op], Dict( x_p => x, y_p => y ) );
        end

        # This line fails in validation. x_val and y_val were created the same
        # way x and y were created in the loop above, but they have 5,000 rows
        # instead of 200.
        run( sess, cost, Dict( x_p => x_val, y_p => y_val ) );
    end
end

function multilayer_perceptron( x::Tensor{Float64}, nodes_info )
    # First layer
    layer_1 = tf.add( tf.matmul( x, nodes_info.Weights[1] ), nodes_info.Biases[1] );
    layer_1 = nn.relu( layer_1 );

    # Second layer
    layer_2 = tf.add( tf.matmul( layer_1, nodes_info.Weights[2] ), nodes_info.Biases[2] );
    layer_2 = nn.relu( layer_2 );

    # Same thing for layer 3 and output layer. Output layer has 10 nodes.
    return output_layer;
end

In case this may help:

julia> versioninfo()
Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)