Choosing the neural network architecture
Along with input data quality, the neural network architecture is among the most important success factors when creating an artificial intelligence bot. To handle the variety of real-world problems, the OVERFIT AI platform offers a wide range of architectures, including fully connected neural networks, convolutional neural networks and long short-term memory (LSTM) neural networks.
The right architecture depends on several factors, such as the type and complexity of the data, the number of features, the number of samples and the available computation time.
If you are new to machine learning, the fully connected neural network is a good starting point. The convolutional neural network is recommended if you deal with image recognition or any spatially distributed features: signal processing, chromatographic data, sound recognition, IR or UV spectroscopy, etc.
Recurrent networks and the LSTM architecture are a good option for text processing, sentiment analysis and time series data: stock market forecasting, energy bill prediction, etc.
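The recommendations above can be summarised as a simple lookup. The sketch below is only an illustration: the data-kind labels and the function name are hypothetical and are not part of the OVERFIT AI platform.

```python
# Hypothetical labels for the kinds of data mentioned above.
SUGGESTED_ARCHITECTURE = {
    "image": "convolutional",
    "signal": "convolutional",
    "sound": "convolutional",
    "spectrum": "convolutional",
    "text": "recurrent / LSTM",
    "time_series": "recurrent / LSTM",
}


def suggest_architecture(data_kind):
    """Return a starting architecture; fall back to fully connected."""
    return SUGGESTED_ARCHITECTURE.get(data_kind, "fully connected")


print(suggest_architecture("time_series"))
print(suggest_architecture("tabular"))
```

Treat the result as a starting point only: the complexity of the data, the number of samples and the available computation time still have to be weighed.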
How many layers
The number of layers required depends mainly on the complexity of the learning task. While a single hidden layer is enough for most models, several layers of abstraction are usually needed for problems such as image classification: welcome to the land of deep learning!
Keep in mind that adding more layers is not a free lunch. Because of the exploding and vanishing gradient problems, the number of hidden layers is usually limited to 2 for a fully connected neural network (multi-layer perceptron). The number can be much higher for convolutional neural networks such as the one used in this object recognition neural network model. In any case, we recommend that the bot trainer start with a minimal number of layers and then increase it gradually.
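To make the vanishing gradient problem concrete, here is a minimal sketch in plain Python. It assumes a fully connected network with sigmoid activations and small random weights (none of these choices come from the platform; the function names, width and weight scale are illustrative) and measures how large the gradient still is by the time it has been propagated back to the input.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient_norm(depth, width=16, seed=0):
    """Norm of d(sum of outputs)/d(input) for a random sigmoid network."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    scale = 1.0 / math.sqrt(width)  # assumed weight scale, for illustration
    weights = [[[rng.gauss(0.0, scale) for _ in range(width)]
                for _ in range(width)] for _ in range(depth)]

    # Forward pass: keep each layer's activations for the backward pass.
    activations = []
    a = x
    for W in weights:
        a = [sigmoid(sum(w * v for w, v in zip(row, a))) for row in W]
        activations.append(a)

    # Backward pass: apply the chain rule from the output back to the input.
    grad = [1.0] * width  # gradient of sum(outputs) w.r.t. the outputs
    for W, a in zip(reversed(weights), reversed(activations)):
        # The derivative of the sigmoid is s * (1 - s), never more than 0.25.
        delta = [g * s * (1.0 - s) for g, s in zip(grad, a)]
        # Multiply by the transpose of W.
        grad = [sum(W[i][j] * delta[i] for i in range(width))
                for j in range(width)]
    return math.sqrt(sum(g * g for g in grad))


for depth in (1, 2, 4, 8):
    print(depth, input_gradient_norm(depth))
```

Under these assumptions the printed norm shrinks quickly as the depth grows, because every layer multiplies the gradient by a sigmoid derivative of at most 0.25. The earliest layers therefore receive very little learning signal, which is one reason to add layers gradually.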
How many neurons
The number of neurons required in the hidden layers normally correlates with the complexity of the data. In practice, rules of thumb give a good first guess when choosing the size.
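The text does not say which rules of thumb are meant. As an assumption, the sketch below computes three heuristics that are commonly quoted for a single hidden layer: the mean of the input and output sizes, two thirds of the input size plus the output size, and an upper bound of less than twice the input size. The function name is hypothetical, and these values are starting guesses to be refined by experiment, not platform recommendations.

```python
def hidden_size_candidates(n_inputs, n_outputs):
    """Commonly quoted starting guesses for the size of one hidden layer."""
    return {
        # Somewhere between the input size and the output size.
        "mean": (n_inputs + n_outputs) // 2,
        # Two thirds of the input size, plus the output size.
        "two_thirds": round(2 * n_inputs / 3) + n_outputs,
        # Stay below twice the input size.
        "upper_bound": 2 * n_inputs - 1,
    }


print(hidden_size_candidates(n_inputs=30, n_outputs=2))
# {'mean': 16, 'two_thirds': 22, 'upper_bound': 59}
```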
If the number of neurons is too low, the bot will not be able to capture some of the relationships between inputs and outputs. Not only will the bot fail to reproduce the training dataset (underfitting), but it will also be unable to predict the right output for a new, unseen test set. You can detect underfitting by looking at the training loss and the training accuracy: a training loss that stays high and a training accuracy that stays low indicate underfitting.
On the other hand, the higher the number of neurons, the higher the accuracy on the training set. Note, however, that too large a layer size can drastically reduce the bot’s performance on unseen data. This is called overfitting: the bot fails to generalise and is likely to memorise the training samples (including noise) rather than learn the patterns that govern them. You can detect overfitting by comparing the training loss with the cross-validation loss (or the training accuracy with the cross-validation accuracy): a cross-validation loss well above the training loss is the warning sign.
In the best case scenario, the cross-validation loss is similar to the training loss, and both values are low.
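The checks described in this section can be sketched as a small helper. The function name and both thresholds are hypothetical: what counts as a "high" loss or a "large" gap depends on the loss function and on the task, so adjust them to your own data.

```python
def diagnose_fit(train_loss, val_loss, high_loss=0.5, max_gap=0.1):
    """Classify a training run from its final losses.

    high_loss and max_gap are illustrative thresholds, not platform defaults.
    """
    if train_loss > high_loss:
        # The bot cannot even reproduce the training set.
        return "underfitting"
    if val_loss - train_loss > max_gap:
        # Good on the training set, much worse on unseen data.
        return "overfitting"
    # Both losses are low and close to each other.
    return "good fit"


print(diagnose_fit(train_loss=0.90, val_loss=0.95))  # underfitting
print(diagnose_fit(train_loss=0.05, val_loss=0.60))  # overfitting
print(diagnose_fit(train_loss=0.08, val_loss=0.10))  # good fit
```

If the result is "underfitting", try more neurons or layers; if it is "overfitting", try a smaller layer size.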