Choosing training parameters

Using well-chosen training parameters improves your bot’s prediction performance while reducing computation time. Whatever your neural network architecture, the following parameters are required:

Batch size:
This text field specifies the number of samples that the Hailp platform will use in each training step. In most cases, you don’t need to change the default value. For large data items, such as images, you may want to lower this value to make the best use of the resources provided by the server.
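
As a rough illustration of how the batch size affects training, the sketch below counts how many weight updates are needed to go through a dataset once. It assumes the usual mini-batch definition (one update per batch); the function name and the sample counts are hypothetical and are not Hailp settings.

```python
import math

def updates_per_pass(num_samples: int, batch_size: int) -> int:
    """Number of mini-batches (weight updates) needed to see every sample once."""
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    # The last batch may be smaller than batch_size, hence the ceiling.
    return math.ceil(num_samples / batch_size)

# Hypothetical dataset of 1,000 samples
for batch_size in (16, 64, 256):
    print(batch_size, updates_per_pass(1000, batch_size))
```

A smaller batch needs less memory per step but more steps to cover the same data.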

The iteration value is the total number of iterations you want to use when training your neural network. Although you are free to use any iteration value, we recommend starting with a low number of iterations until you spot the most promising learning rate and the best neural network architecture. To do so, you can use the loss graph to check how fast the model learns over the iterations. We especially recommend using a low number of iterations when trying a new LSTM or deep convolutional network.

Shuffle property:
Select YES in all cases except for time series datasets, where the order of the samples must be preserved.
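
The sketch below shows what the shuffle setting amounts to in general terms: the order in which samples are visited is either randomised or kept as it is in the dataset. The function name and the `seed` argument are hypothetical; how Hailp implements shuffling internally is not described here.

```python
import random

def sample_order(num_samples, shuffle, seed=None):
    """Return the order in which samples are visited during one pass."""
    indices = list(range(num_samples))
    if shuffle:
        # Randomise the order; every sample still appears exactly once.
        random.Random(seed).shuffle(indices)
    return indices

print(sample_order(8, shuffle=True, seed=1))   # randomised order
print(sample_order(8, shuffle=False))          # original (e.g. chronological) order
```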

Learning rate:
The learning rate is one of the most important parameters to take into account when training a neural network. It defines the pace at which your bot learns from the data at each training step. On the one hand, a learning rate that is too low slows training down and can therefore leave you with an unoptimised AI bot. On the other hand, a learning rate that is too high changes the neural network’s internal state too quickly and makes it miss the optimal values.

In practice, you can assess the effectiveness of your learning rate by looking at the loss graph. If the loss stays flat throughout the training process, you may need to increase the learning rate. Conversely, if the loss oscillates from one iteration to the next, you will need to slow things down by decreasing the learning rate.
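
To make these symptoms concrete, here is a minimal sketch that runs plain gradient descent on the toy loss f(w) = w², which has its minimum at w = 0. It is not the Hailp training loop; the function name, the starting point and the three learning rates are assumptions chosen only for illustration.

```python
def gradient_descent(learning_rate, steps=20, start=5.0):
    """Minimise f(w) = w**2 and return the loss recorded after each step."""
    w = start
    losses = []
    for _ in range(steps):
        grad = 2.0 * w               # derivative of w**2
        w -= learning_rate * grad    # gradient descent update
        losses.append(w * w)
    return losses

for lr in (0.001, 0.1, 1.1):
    losses = gradient_descent(lr)
    print(f"lr={lr}: first loss={losses[0]:.4f}, last loss={losses[-1]:.4f}")
```

With the smallest rate the loss barely moves, with the middle one it drops close to zero, and with the largest one each update overshoots the minimum so the loss grows instead of shrinking.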

Decay parameter and decay step:
The decay parameter is the decay that will be applied to the learning rate after every STEP epochs of training, where STEP is the value of the decay step. If you are new to machine learning, you can keep the default value.
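
One common way to combine a decay value and a decay step is a multiplicative "staircase" schedule, sketched below. This is an assumption made for illustration: the exact formula used by the Hailp platform is not stated here, and the function and argument names are hypothetical.

```python
def decayed_learning_rate(initial_rate, decay, decay_step, epoch):
    """Staircase schedule: multiply the rate by `decay` once every `decay_step` epochs.

    Assumption: multiplicative decay; other platforms may use a different formula.
    """
    if decay_step <= 0:
        raise ValueError("decay_step must be positive")
    return initial_rate * decay ** (epoch // decay_step)

# Hypothetical settings: start at 0.1 and halve the rate every 10 epochs
for epoch in (0, 9, 10, 25):
    print(epoch, decayed_learning_rate(0.1, 0.5, 10, epoch))
```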

Layer activation:
The Hailp platform supports the RELU, TANH, SIGMOID and LINEAR activations. The first three activation types are best for training classifier neural networks, while the LINEAR one is recommended for regression problems. If you are new to machine learning, you can stick with the default values. The same applies to the Optimiser fields, whose available options are stochastic gradient descent (SGD), Adam, Momentum, AdaDelta and RMSProp.
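
For reference, the four activation types correspond to the following standard mathematical functions. This is a plain Python sketch of the textbook definitions, not code taken from the Hailp platform.

```python
import math

def relu(x):
    """RELU: passes positive values through, clips negatives to zero."""
    return max(0.0, x)

def tanh(x):
    """TANH: squashes the input into the range (-1, 1)."""
    return math.tanh(x)

def sigmoid(x):
    """SIGMOID: squashes the input into the range (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow for large negative inputs
    return z / (1.0 + z)

def linear(x):
    """LINEAR: leaves the input unchanged, so outputs are unbounded."""
    return x

for name, fn in (("relu", relu), ("tanh", tanh), ("sigmoid", sigmoid), ("linear", linear)):
    print(name, [round(fn(v), 3) for v in (-2.0, 0.0, 2.0)])
```

The unbounded output of the linear function is what makes it suitable for regression targets, whereas the bounded or clipped functions suit classification.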

The regularisation type, regularisation rate and dropout rate are needed when dealing with neural network overfitting. More details are available here. However, we recommend disabling these fields while you are still trying out new training parameters.

Cross-validation set:
The cross-validation field specifies the share of the dataset that will be set aside to check the bot’s performance on new data. For classification projects, we recommend using the default value if you have enough data for each label; otherwise, disable this field.
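
The idea of setting part of the data aside can be sketched as a simple hold-out split. The function name, the 20% fraction and the shuffling before the split are assumptions for illustration; they are not the Hailp defaults, and shuffling would not be appropriate for time series data.

```python
import random

def holdout_split(samples, validation_fraction, seed=0):
    """Split samples into (training, validation) lists after shuffling."""
    if not 0.0 <= validation_fraction < 1.0:
        raise ValueError("validation_fraction must be in [0, 1)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    # The first n_validation items are held out; the rest are used for training.
    return shuffled[n_validation:], shuffled[:n_validation]

train, validation = holdout_split(range(100), validation_fraction=0.2)
print(len(train), len(validation))
```

With very few samples per label, the held-out part may miss some labels entirely, which is why disabling the field can be the better choice in that case.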

Depending on your network architecture, some additional parameters are required:

Fully connected neural network

This field is the number of data features, that is, the number of inputs for your AI model. You don’t normally need to change the suggested value. If you do want to modify the number of inputs, you will need to rebuild the entire dataset by going to the FEATURE tab.

For classification problems, the output number field relates to the number of labels that can occur when making predictions. You don’t need to change the suggested value. You will need to go back to the LABEL tab if you want to add or remove a label.

Number of units
The number of units specifies the number of neurons that will be used in each hidden layer. You can check out my post about “Choosing NN architecture”. For the Ecoli protein identification project, the optimal training parameters were as follows: 10, HIDDEN LAYERS: 2, OUTPUT: 2

The source data has the following format:


  • Info: current protein localization sites is 10
  • Sequence Name,0
  • gvh,0
  • lip,0
  • chg,1
  • aac,0
  • alm1,0
  • alm2,0

Convolutional neural network

Filter numbers:
This field relates to the number of filters you want to use in a given layer of your convolutional neural network.
For image processing, only a few filters are needed in the first layers, as the network uses them to learn basic shapes like edges and lines. A higher number of filters may be required in the abstraction layers, which are meant to map the input pixels to the output label by combining simple shapes from the earlier layers.

Filter size
This field specifies the size of the filter that you want to apply to a given layer when processing the layer’s inputs.

Pooling kernel
This field relates to the max pooling parameters you want to use in the given convolutional layer.

Fully conn.
The fully conn. parameter stands for “Fully connected”; it relates to the size of the fully connected layer located after the network’s last convolutional layer.

If you are new to deep learning, you can use the default values of filter numbers, filter sizes and pooling sizes as a starting point.
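
To see how filter size, pooling kernel and filter numbers together determine what reaches the fully connected layer, the sketch below applies the standard output-size formulas for a convolution followed by max pooling. It assumes square inputs, a stride of 1 and no padding for the convolution, and a pooling stride equal to the kernel size; the platform's actual stride and padding settings are not stated here, and the 28×28 input with 8 filters is a hypothetical example.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Width (or height) of a feature map after a convolution."""
    return (input_size + 2 * padding - filter_size) // stride + 1

def pool_output_size(input_size, pool_size, stride=None):
    """Width (or height) after max pooling; stride defaults to the kernel size."""
    if stride is None:
        stride = pool_size
    return (input_size - pool_size) // stride + 1

# Hypothetical layer: 28x28 input, 8 filters of size 5x5, 2x2 pooling kernel
after_conv = conv_output_size(28, filter_size=5)      # 28 - 5 + 1
after_pool = pool_output_size(after_conv, pool_size=2)
flattened_size = after_pool * after_pool * 8          # values fed to the fully connected layer
print(after_conv, after_pool, flattened_size)
```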


The following parameters are required for long short-term memory (LSTM) neural networks:

This is the size of your data feature.

For classification problems, this is the number of labels.

Number units:
This field relates to the number of neurons in a given hidden layer. For text processing problems, you can use the default value as a starting point.
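
To get a feel for how the number of units affects model size, the sketch below counts the trainable parameters of one standard LSTM layer: four gates, each with an input weight matrix, a recurrent weight matrix and one bias vector. This is the textbook formulation; some libraries use two bias vectors per gate, and the Hailp implementation may differ. The feature size of 10 and the unit counts are hypothetical.

```python
def lstm_parameter_count(input_size, num_units):
    """Trainable parameters of a standard LSTM layer with one bias vector per gate."""
    per_gate = num_units * input_size + num_units * num_units + num_units
    return 4 * per_gate  # input, forget and output gates plus the cell candidate

# Hypothetical data with 10 features per time step
for units in (16, 32, 64):
    print(units, lstm_parameter_count(10, units))
```

Because of the recurrent weights, the parameter count grows roughly with the square of the number of units, so doubling the units more than doubles the training cost.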