Data Labels

Understanding how to set up the right labels

Written by Ted Tigerschiöld
Updated over a week ago

Data labels are the categories that the AI model uses to classify your text data. Setting up the right labels helps ensure that the model does exactly what you want it to do.

How many labels should I use?

This depends on what you want your model to do. If you are building a sentiment analysis model, the two labels "Positive" and "Negative" may be enough. However, if you want to classify user reviews for NPS, you might need three labels: "Promoter", "Passive" and "Detractor". There is no real limit to the number of labels you can add, but make sure that each label makes sense for your model and the output you need.

Are my labels represented in the data?

One problem you can run into is that some labels have few or no data points in your dataset. This is not ideal, because the model needs to train on many different texts for each label to learn how to classify them correctly. If you find that a label is missing or has only a few occurrences in the dataset, there are two ways to address it.

1. Try to enrich the dataset by adding more texts that belong to the underrepresented label. You can add these manually as examples in the label setup stage.

2. If you cannot find more data for the label, consider merging the underrepresented label with another one, as long as this makes sense for the model output.
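If you have access to your labelled data outside the tool, for example as an exported list of text and label pairs, counting the examples per label makes underrepresented labels easy to spot. The Python sketch below assumes such a list; the example texts, the label names, the helper functions and the MIN_EXAMPLES threshold are all hypothetical and not part of the product. It also illustrates the second option above, merging a rare label into a related one.

```python
from collections import Counter

# Hypothetical exported dataset: (text, label) pairs.
dataset = [
    ("Great product, would buy again", "Positive"),
    ("Works as expected", "Positive"),
    ("Broke after two days", "Negative"),
    ("Support never replied", "Negative"),
    ("Worst purchase I have ever made", "Very negative"),
]

# Hypothetical label setup, including a label that has no data at all.
labels = ["Positive", "Negative", "Very negative", "Neutral"]

# Hypothetical threshold for illustration only; real models usually need far more examples per label.
MIN_EXAMPLES = 2


def label_counts(data, labels):
    """Count examples per label, reporting 0 for labels with no data."""
    counts = Counter(label for _, label in data)
    return {label: counts.get(label, 0) for label in labels}


def underrepresented(counts, min_examples):
    """Return the labels whose example count is below min_examples."""
    return [label for label, n in counts.items() if n < min_examples]


def merge_labels(data, mapping):
    """Relabel examples, e.g. fold a rare label into a related one."""
    return [(text, mapping.get(label, label)) for text, label in data]


counts = label_counts(dataset, labels)
print(counts)
# {'Positive': 2, 'Negative': 2, 'Very negative': 1, 'Neutral': 0}
print(underrepresented(counts, MIN_EXAMPLES))
# ['Very negative', 'Neutral']

# Option 2: merge "Very negative" into "Negative".
merged = merge_labels(dataset, {"Very negative": "Negative"})
print(label_counts(merged, ["Positive", "Negative"]))
# {'Positive': 2, 'Negative': 3}
```

"Neutral" still has no examples after the merge, so it would need either more data (option 1) or to be dropped from the label setup.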
