Choosing the model type and data format

Selecting the model type

The Hailp artificial-intelligence-as-a-service platform supports two types of prediction task: classification and regression.

Choose the Regression model if you want to train your AI bot to predict a value from some input data. The inputs can be continuous or discrete, and the output is a continuous value. One example of a regression job is an AI bot that predicts residential energy consumption from the building size and the neighbourhood’s energy usage history. Another is an artificial neural network that predicts house prices from nationwide house pricing data and the house’s features.

On the other hand, select the Classification model if you want to train your bot to categorise user-submitted samples. In other words, a classification bot yields a discrete output value from continuous or discrete inputs. One example is an artificial intelligence model that analyses the results of previous similar experiments to recognise whether a molecule is likely to be active against a pathology. Other examples are an image recognition neural network and a gender recognition model.

Choosing the data format

Hailp AI bots accept three data formats: CSV (comma-separated values) for table and array-like data; plain text for text data; and JPG or PNG for images.

Depending on how your data is structured, you can select one of the following options when uploading your files:

Option 1 – Single CSV for all samples:

Choose this option if all of your training samples are merged into a single CSV file. The first row of your CSV file must contain the name of each feature (Input names). The first column or the last column of the CSV table must be the label values (Model output), while the remaining columns contain the values of each feature for each training sample.

Feature names can be any alphanumeric string and can include special characters such as underscores, spaces and dashes. The feature values must be alphanumeric only.

For a classification problem, the label values can be numeric or alphanumeric. Below is an example of a CSV file for a classification problem:

cucumber_qualities.csv

  • Color, Width, Length, Weight, QUALITY
  • 1, 50, 100, 20.4, Good
  • 2, 40, 80, 10.4, Bad
  • 1, 10, 120, 26, Good
  • 2, 50, 100, 11.9, Good
  • 1, 60, 150, 20.0, Intermediate
  • 2, 11, 200, 05.5, Bad

For a regression problem, the label values must be numeric. Here is an example:

cucumber_prices.csv

  • Color, Width, Length, Weight, PRICE
  • 1, 50, 100, 20.4, 60.90
  • 2, 40, 80, 10.4, 5.10
  • 1, 10, 120, 26, 70.00
  • 2, 50, 100, 11.9, 100
  • 1, 60, 150, 20.0, 50
  • 2, 11, 200, 05.5, 2
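
The rules for a single CSV file (a header row, the label in the first or last column, numeric labels for regression) can be checked locally before uploading. The following is only a minimal sketch under those stated rules; the function name `check_single_csv` and its parameters are hypothetical and are not part of the Hailp platform.

```python
import csv
import io


def check_single_csv(text, task, label_position="last"):
    """Return a list of problems found in a single-file training CSV.

    task: "classification" or "regression".
    label_position: "first" or "last" (which column holds the label).
    """
    # Parse the text, skipping blank lines and trimming spaces around cells.
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(text)) if row]
    if len(rows) < 2:
        return ["file needs a header row and at least one sample"]

    header, samples = rows[0], rows[1:]
    label_index = 0 if label_position == "first" else len(header) - 1
    problems = []

    # Line 1 is the header, so the first sample is on line 2.
    for line_no, row in enumerate(samples, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} columns, got {len(row)}")
            continue
        if task == "regression":
            # Regression labels must be numeric.
            try:
                float(row[label_index])
            except ValueError:
                problems.append(
                    f"line {line_no}: label {row[label_index]!r} is not numeric")
    return problems


sample = """Color, Width, Length, Weight, PRICE
1, 50, 100, 20.4, 60.90
2, 40, 80, 10.4, Bad
"""
print(check_single_csv(sample, task="regression"))
# ["line 3: label 'Bad' is not numeric"]
```

The same file passes when checked as a classification problem, because classification labels may be alphanumeric.
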

Option 2 – One CSV per sample:

Select this option if each training sample has its own CSV file. For instance, let’s say you want to train an AI bot to predict cucumber quality. For that, you will need to provide 100 CSV files in the following format that describe the specifications of 100 GOOD cucumbers, and another 110 CSV files that describe 110 BAD cucumbers.

goodcumber1.csv

  • Feature names, Feature values
  • Color,1
  • Width,50
  • Length,100
  • Weight, 20.4

other_goodcumber.csv

  • Feature names, Feature values
  • Color,1
  • Width,10
  • Length,120
  • Weight, 26

The first column of the sample file contains the names of the features you want to take into account when training the bot. Feature names can be numeric or alphanumeric. The second column contains the value of each feature for the current sample. The first line is optional; it generally contains column names or other information that helps a human user to identify the file (tags, title, author, etc.). This line is ignored during the training process.

Note that during the steps above you only need to upload one sample file. It is used to teach the Hailp platform how to deal with your data. Once the features have been created, you will be asked to upload the remaining files.
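
To make the per-sample layout concrete, here is a minimal sketch that reads one such file into a dictionary and merges several labelled samples into the single-CSV layout of Option 1. The helper names `read_sample` and `merge_samples`, the `has_header` flag and the `QUALITY` label name are assumptions made for illustration; they do not describe how the Hailp platform processes files internally.

```python
import csv
import io


def read_sample(text, has_header=True):
    """Parse one per-sample CSV into a {feature name: value} dict."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if has_header:
        rows = rows[1:]  # the optional first line is ignored
    # Column 1 holds the feature name, column 2 the feature value.
    return {row[0].strip(): row[1].strip() for row in rows}


def merge_samples(labelled_samples, label_name="QUALITY"):
    """Build single-CSV rows (header first) from (features, label) pairs."""
    feature_names = list(labelled_samples[0][0])
    rows = [feature_names + [label_name]]
    for features, label in labelled_samples:
        rows.append([features[name] for name in feature_names] + [label])
    return rows


goodcumber1 = """Feature names, Feature values
Color,1
Width,50
Length,100
Weight, 20.4
"""
features = read_sample(goodcumber1)
print(features)
# {'Color': '1', 'Width': '50', 'Length': '100', 'Weight': '20.4'}
print(merge_samples([(features, "Good")]))
# [['Color', 'Width', 'Length', 'Weight', 'QUALITY'], ['1', '50', '100', '20.4', 'Good']]
```
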

Option 3 – Log/Txt, one file per sample:

Select this option if you want to train your bot for a text processing task. For example, let’s say you want to train your bot to recognise server status by analysing the contents of the server’s logs. For that, you will need to upload 500 log files from optimised servers and 450 log files generated by badly optimised servers. Text content can be alphanumeric and can also contain some special characters such as underscores, commas, dashes, parentheses and dots. Other special characters are ignored during the training process. We accept any text file that’s less than 50 KB in size.
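
To preview roughly what remains of a log file once unsupported characters are ignored, and to check the size limit before uploading, a sketch along these lines may help. It rests on several assumptions that the text above does not confirm: that "alphanumeric" means ASCII letters and digits, that whitespace is kept, and that 50 KB means 50 * 1024 bytes. The function names are hypothetical, and the platform's actual filtering may differ.

```python
import re

MAX_BYTES = 50 * 1024  # assumption: "50 KB" read as 50 * 1024 bytes

# Everything except ASCII letters, digits, whitespace, and the listed
# special characters: underscore, comma, dash, parentheses, dot.
_IGNORED = re.compile(r"[^A-Za-z0-9\s_,\-().]")


def preview_text_sample(text):
    """Return the text with characters outside the listed set removed."""
    return _IGNORED.sub("", text)


def fits_size_limit(text, encoding="utf-8"):
    """Return True if the encoded text is smaller than the size limit."""
    return len(text.encode(encoding)) < MAX_BYTES


line = "GET /index.html 200 (12ms) [cache=hit]"
print(preview_text_sample(line))
# GET index.html 200 (12ms) cachehit
print(fits_size_limit(line))
# True
```
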

Option 4 – Images, one file per sample:

Use this option if you want to train your bot for an image recognition job. Let’s say you are a microbiologist and you want to train your bot to automatically recognise whether a food is safe by examining a photo of a bacterial culture. To do this, log in to your Hailp account, choose “Images- One file per sample” as the data type, and then upload some snapshots of safe bacterial cultures along with some photos of cultures that are deemed unsafe.