Adding features

What is a feature?

Features (or attributes) are the parameters used as input variables for the AI bot. During training, the value of each feature for each sample, together with the corresponding experimental output, is fed to the bot to teach it how to make predictions. The bot then uses the same features to predict the output for new data submitted by the end user.

For the Iris species classifier bot, for example, the selected features are sepal length, sepal width, petal length and petal width, all measured in cm. Using these features, the bot can predict the species of an Iris flower with 80% confidence.
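To make the idea concrete, here is a minimal Python sketch of how one Iris training sample could be represented: a mapping from each feature name to its value, plus the label (the experimental output). The names IRIS_FEATURES, sample and to_feature_vector are illustrative only and are not part of the Hailp platform.

```python
# Hypothetical names, for illustration only (not a Hailp API).
IRIS_FEATURES = ["sepal length", "sepal width", "petal length", "petal width"]

# One training sample: feature values in cm, plus the known output (label).
sample = {
    "features": {"sepal length": 5.1, "sepal width": 3.5,
                 "petal length": 1.4, "petal width": 0.2},
    "label": "Setosa",
}

def to_feature_vector(features: dict[str, float], feature_names: list[str]) -> list[float]:
    """Order the feature values consistently so every sample maps to the same inputs."""
    return [features[name] for name in feature_names]

print(to_feature_vector(sample["features"], IRIS_FEATURES))  # [5.1, 3.5, 1.4, 0.2]
```

Keeping a fixed feature order matters because the bot sees the same inputs, in the same positions, for every sample it is trained on or asked to predict.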

Loading features from the sample file

You don’t need to add features one by one on the Hailp AI as a service platform. Instead, you can specify all of the features you want to use at once by uploading one of your sample files. Not every sample file needs to contain every feature. However, to get the best performance from your dataset, we recommend uploading your most complete sample file when building the feature list: any feature that is not included in the feature list will be ignored during training, even if it is present in a training set.
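The rule described above (a sample may lack some features, and features absent from the feature list are ignored) can be pictured with the following Python sketch. It illustrates the rule under our own assumptions and is not Hailp’s actual implementation; select_features is a hypothetical helper, and representing a missing feature as None is our own convention.

```python
from typing import Optional

def select_features(sample: dict[str, float], feature_list: list[str]) -> dict[str, Optional[float]]:
    """Keep only the features in the feature list; missing ones become None (hypothetical convention)."""
    # Features present in the sample but absent from the feature list are simply dropped.
    return {name: sample.get(name) for name in feature_list}

feature_list = ["sepal length", "sepal width", "petal length", "petal width"]
# A less complete sample: "petal length" is missing, and "stem height"
# (a made-up feature, for illustration) is not in the feature list.
partial_sample = {"sepal length": 7.0, "sepal width": 3.0, "petal width": 2.0, "stem height": 30.0}

print(select_features(partial_sample, feature_list))
# {'sepal length': 7.0, 'sepal width': 3.0, 'petal length': None, 'petal width': 2.0}
```

This is why uploading the most complete sample file first matters: whatever is left out of the feature list at this stage never reaches training.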

To load features from a sample, proceed as follows: go to Hailp’s FEATURE TAB, choose your model and data type, and then validate. Next, upload one of your sample files (or your whole dataset file if you selected Single CSV as the data type) and click the Validate button. If you selected CSV as the data type, the detected features appear at the bottom of the page. If you selected image or text data instead, you are prompted to move directly to the next step.

The following options are required when uploading a CSV sample:

Skip first row:

Check this option if the first row of your sample CSV contains titles rather than data. For example, you need to check the box for sample files like the following:

my_first_iris.csv:

  • FeaturesName,Values
  • sepal length,7
  • sepal width,3
  • petal length,6
  • petal width,2

my_first_iris.csv:

  • FeaturesNumber,Values
  • 1,7
  • 2,3
  • 4,2

On the other hand, the following files do not need the “Skip first row” option, because they start directly with data:

my_first_iris.csv:

  • sepal length,7
  • sepal width,3
  • petal length,6
  • petal width,2

my_first_iris.csv:

  • 1,7
  • 2,3
  • 4,2

Note that all sample files you upload later for training must use the same first-row setting. The same applies to test sets uploaded by you or by your buyer.
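If you prepare sample files with scripts, the following Python sketch shows what the “Skip first row” option does to a two-column sample file like the ones above. It uses only the standard csv module; read_sample is a hypothetical helper and does not reproduce Hailp’s own parser. It also shows why the setting must stay consistent: applying it to a file without a title row silently drops a real feature.

```python
import csv
import io

def read_sample(text: str, skip_first_row: bool) -> dict[str, float]:
    """Parse a 'feature,value' sample file, optionally ignoring the title row."""
    rows = list(csv.reader(io.StringIO(text)))
    if skip_first_row:
        rows = rows[1:]  # drop the title row, e.g. "FeaturesName,Values"
    return {name: float(value) for name, value in rows}

with_header = "FeaturesName,Values\nsepal length,7\nsepal width,3\npetal length,6\npetal width,2\n"
without_header = "sepal length,7\nsepal width,3\npetal length,6\npetal width,2\n"

print(read_sample(with_header, skip_first_row=True))
# Wrong setting for this file: the first real feature ("sepal length") is lost.
print(read_sample(without_header, skip_first_row=True))
```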

Finally, for an image recognition bot, we recommend uploading the photo whose size is most representative of your dataset when building the features, because all images uploaded to the server afterwards will be resized accordingly.

Separators:

You can choose between comma-separated and semicolon-separated values for CSV data. This choice applies to all data that you (or your users) upload during training and prediction.
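The separator choice only changes how each line is split into fields. This Python sketch, again using the standard csv module and a hypothetical helper parse_rows (not Hailp’s parser), reads the same data written with commas and with semicolons, and shows what happens when the wrong separator is chosen.

```python
import csv
import io

def parse_rows(text: str, separator: str) -> list[list[str]]:
    """Split CSV text into rows using the chosen separator (',' or ';')."""
    if separator not in (",", ";"):
        raise ValueError("separator must be ',' or ';'")
    return list(csv.reader(io.StringIO(text), delimiter=separator))

comma_data = "sepal length,7\nsepal width,3\n"
semicolon_data = "sepal length;7\nsepal width;3\n"

# Same content, different separators: identical rows once parsed correctly.
assert parse_rows(comma_data, ",") == parse_rows(semicolon_data, ";")
# With the wrong separator, each line stays a single, unsplit field.
print(parse_rows(semicolon_data, ","))  # [['sepal length;7'], ['sepal width;3']]
```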

Label names location:

If your whole dataset is provided as a single CSV file, this option specifies the column that contains the label value (i.e., the output). For example, for the following dataset, the label names location should be set to “Last Column”.

my_whole_iris_dataset.csv:

  • sepal length,sepal width,petal length,petal width,IRIS SPECIES
  • 5.1,3.5,1.4,0.2,Setosa
  • 4.9,3,1.4,0.2,Setosa
  • 4.7,3.2,1.3,0.2,Versicolour
  • 4.6,3.1,1.5,0.2,Versicolour
  • 5,3.6,1.4,0.2,Virginica
  • 5.4,3.9,1.7,0.4,Virginica
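The following Python sketch shows how a single-CSV dataset such as my_whole_iris_dataset.csv splits into feature names, feature rows and labels depending on where the label column sits. The helper split_dataset and its option values "last" and "first" are assumptions for illustration (only “Last Column” is named above), and trimming stray spaces around values is our own choice so that " Setosa" and "Setosa" count as the same label.

```python
import csv
import io

def split_dataset(text: str, label_column: str = "last") -> tuple[list[str], list[list[float]], list[str]]:
    """Split a whole-dataset CSV into feature names, feature rows and labels.

    label_column is "last" or "first" (hypothetical option values).
    """
    rows = [[cell.strip() for cell in row] for row in csv.reader(io.StringIO(text)) if row]
    if label_column == "last":
        rows = [row[-1:] + row[:-1] for row in rows]  # move the label to the front
    elif label_column != "first":
        raise ValueError("label_column must be 'first' or 'last'")
    header, data = rows[0], rows[1:]
    feature_names = header[1:]
    features = [[float(value) for value in row[1:]] for row in data]
    labels = [row[0] for row in data]
    return feature_names, features, labels

dataset = (
    "sepal length,sepal width,petal length,petal width,IRIS SPECIES\n"
    "5.1,3.5,1.4,0.2,Setosa\n"
    "4.7,3.2,1.3,0.2,Versicolour\n"
    "5,3.6,1.4,0.2,Virginica\n"
)
names, X, y = split_dataset(dataset, label_column="last")
print(names)  # ['sepal length', 'sepal width', 'petal length', 'petal width']
print(y)      # ['Setosa', 'Versicolour', 'Virginica']
```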

Altering a feature

You may sometimes want to ignore some features during training, for example to prevent overfitting. You have two options: delete the relevant line from your sample file and re-upload it, or go to the feature list area and manually remove the feature you want to ignore.
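If you generate sample files with a script, the first option (deleting the feature’s line before re-uploading) can be automated along these lines. This is a minimal Python sketch for the two-column sample format shown earlier; drop_feature_lines is a hypothetical helper, and it assumes the feature name is the first field of each line.

```python
import csv
import io

def drop_feature_lines(text: str, ignored: set[str]) -> str:
    """Return the sample file text without the lines for the ignored features."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        # Keep non-empty lines whose feature name is not being ignored.
        if row and row[0] not in ignored:
            writer.writerow(row)
    return out.getvalue()

sample = "sepal length,7\nsepal width,3\npetal length,6\npetal width,2\n"
print(drop_feature_lines(sample, {"sepal width"}), end="")
```

A title row such as "FeaturesName,Values" is kept untouched, since its first field is not a feature name in the ignored set.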

You may occasionally need to rename a feature to match your end users’ naming preferences. To do this, build your dataset with the original feature names (in the PREPARE TAB), then rename the feature as you prefer.
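As a rough illustration of what a rename does to the feature list, here is a Python sketch that applies a rename mapping while keeping the original order and rejecting duplicate names. The helper rename_features and the duplicate check are our own assumptions, not a description of how Hailp handles renaming.

```python
def rename_features(feature_list: list[str], new_names: dict[str, str]) -> list[str]:
    """Rename features for display, keeping their original order."""
    renamed = [new_names.get(name, name) for name in feature_list]
    if len(set(renamed)) != len(renamed):
        raise ValueError("renaming would create duplicate feature names")
    return renamed

features = ["sepal length", "sepal width", "petal length", "petal width"]
print(rename_features(features, {"sepal length": "Sepal length (cm)"}))
# ['Sepal length (cm)', 'sepal width', 'petal length', 'petal width']
```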