What is Botgen Platform?
Botgen Platform is dedicated to scaling up NLP models. As a chatbot developer, you will find all the tools you need to manage massive data sets and to configure, train, analyse and deploy your NLP models.
With Botgen Platform, you keep full control over what you do and over your methodology. You choose how to create your training and test sets, configure your training pipeline and analyse your results.
In this quick start, we walk through an example project setup. We are going to cover the following topics:
- Creating intents and entities
- Adding and tagging data
- Creating value dictionaries
- Filtering data sets
- Creating train sets and test sets
- Configuring a model
- Analysing first results
- Using a first model to speed up tagging
Create a project
Once you have logged in, you arrive on the project management screen. This is where you will find all the projects you created and the ones shared with you. If this is the first time you connect to the platform, you will probably find an empty list, unless somebody has shared projects with you. To create a new project, click on the “New project” button.
A pop-up asks you to name your new project. Enter your project’s name and click on “save”. Your new project is now available in the projects list. Click on it to set up your first NLP model.
Create your intents
The platform is divided into two main parts: “NLP” and “Training”. At the beginning, you will spend most of your time in the “NLP” part, since this is where you manage and tag your data. The “Training” part is dedicated to model configuration and result analysis.
In the next sections, we assume that you know how to structure an NLP model project and that you are comfortable with the key concepts of NLP, such as intents, entities and training.
If you are not comfortable with these notions, you can read this article: “How to structure a NLP project”
For this tutorial, we are going to build the first iteration of the NLP model used by a retail chatbot that helps customers choose their sports shoes. The three main intents of this chatbot are “GET_ADVICE”, “TOP_PRODUCT” and “GET_PRODUCT_FEEDBACK”.
“GET_ADVICE” is the intent of customers seeking advice on the shoes they should choose based on their activity, level and motivation.
“TOP_PRODUCT” is the intent of customers looking for the best products in their category.
“GET_PRODUCT_FEEDBACK” is the intent of customers seeking feedback on specific products.
Let’s create our three intents. To create an intent, click on the “Intents” tab and then on the “New intent” button.
Choose a name and a color for your intent. Click on “save” to add this intent to the list.
Now that our intents are defined, we are going to define the entities that will be useful for our chatbot logic. Here are the entities to extract from sentences:
- PRODUCT: the kind of shoes the customer is looking for
- ACTIVITY: the activity practiced by the customer
- MOTIVATION: the reason why the customer is seeking new sports shoes
- LEVEL: the level of the customer in their activity
- BRAND: the brand of the product
- CHARAC: the characteristics of the product
- PRICE: the price of the product
To create a new entity, click on the “Entities” tab and then on the “new entity” button. Choose a name and a color, and click on “save” to add it to the list.
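To see the whole schema at a glance, the intents and entities listed above can be written down as plain data. The following Python sketch is only an illustration: the `Intent` and `Entity` classes and the `unknown_names` helper are hypothetical and are not part of Botgen Platform, where intents and entities are created through the interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    name: str
    description: str


@dataclass(frozen=True)
class Entity:
    name: str
    description: str


# The intents of the tutorial chatbot.
INTENTS = [
    Intent("GET_ADVICE", "advice on which shoes to choose"),
    Intent("TOP_PRODUCT", "best products in a category"),
    Intent("GET_PRODUCT_FEEDBACK", "feedback on a specific product"),
]

# The entities to extract from sentences.
ENTITIES = [
    Entity("PRODUCT", "kind of shoes the customer is looking for"),
    Entity("ACTIVITY", "activity practiced by the customer"),
    Entity("MOTIVATION", "reason for seeking new shoes"),
    Entity("LEVEL", "level of the customer in the activity"),
    Entity("BRAND", "brand of the product"),
    Entity("CHARAC", "characteristics of the product"),
    Entity("PRICE", "price of the product"),
]

INTENT_NAMES = {intent.name for intent in INTENTS}
ENTITY_NAMES = {entity.name for entity in ENTITIES}


def unknown_names(intent, entities):
    """Return the names used in an annotation that the schema does not declare."""
    unknown = [] if intent in INTENT_NAMES else [intent]
    unknown += [name for name in entities if name not in ENTITY_NAMES]
    return unknown
```

A check such as `unknown_names("GET_ADVICE", ["ACTIVITY", "LEVEL"])` returns an empty list when every name is declared, which can help catch spelling mistakes before data is imported in bulk.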
Add your first sentences
In this section, we are going to add sentences to train the model. All the sentences used for this tutorial are available here.
To add sentences, click on “sentences” and then on the “add” button.
In the pop-up that appears, you can enter your sentences. To add several sentences at the same time, separate them with line breaks. You can also copy and paste sentences from a spreadsheet.
Once done, you can select an intent for the whole batch, or leave it unset if the sentences have different intents.
You can also add a label to your sentences. Labels are a simple but very powerful tool for structuring your data sets. You can use them to track the origin of sentences, to define their use (training, test…) or their status (tagged, pending, trashed). You are free to use labels for any purpose that makes it easier to create subsets.
In this case, let’s say we want to label these sentences to record that they come from the development team. This will be useful if we later want to remove them from the train set, once we have sentences from our chatbot users. We add an “internal” label to the batch of sentences. Click on “save” to add your sentences.
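The batch logic described in this section (one sentence per line, an optional shared intent, and labels applied to the whole batch) can be pictured with a few lines of Python. This is a minimal sketch and not the platform's implementation: the `Sentence` class, the `make_batch` function and the two example sentences are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sentence:
    text: str
    intent: Optional[str] = None            # None when the batch mixes intents
    labels: set = field(default_factory=set)


def make_batch(raw_text, intent=None, labels=()):
    """Split pasted text on line breaks and build one Sentence per non-empty line."""
    return [
        Sentence(text=line.strip(), intent=intent, labels=set(labels))
        for line in raw_text.splitlines()
        if line.strip()
    ]


# Hypothetical sentences written by the development team.
pasted = """I need shoes for trail running

Which running shoes are the best?
"""

batch = make_batch(pasted, labels=["internal"])

# Later on, sentences written by the team can be left out by their label.
from_users_only = [s for s in batch if "internal" not in s.labels]
```

Because every sentence of the batch carries the “internal” label, filtering on that label is enough to separate team-written sentences from those collected from real users.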
Sentences tagging and entity values dictionaries
Once you have added new sentences to your data set, you can tag them. The operation is simple. Click on the word or group of words you want to tag. A widget appears in which you can select the entity you want to extract for this expression. Once you have selected the proper entity, you can add a value. If no suitable value exists, you can add one by clicking on the “add value” button. If a value similar to the one you are tagging already exists, you can add an alias by clicking on “add an alias”. Validate your tag and repeat this tagging operation on all the sentences of your set. To make your sentences available to your models, you need to validate them.
To access the value dictionaries you created while tagging, go to the entities section and select an entity. This opens the entity dictionary, with values on the left and synonyms on the right.
You can export and import your dictionaries and manually add new values and synonyms.
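One way to picture a value dictionary is as a mapping from each value to its synonyms, which lets different expressions resolve to the same value. The Python sketch below assumes this structure. The dictionary content and the `build_lookup` and `normalize` helpers are hypothetical, and the way the platform actually resolves synonyms may differ.

```python
# Hypothetical dictionary for the ACTIVITY entity: value -> synonyms.
ACTIVITY_DICTIONARY = {
    "running": ["run", "jogging"],
    "trail": ["trail running", "trail run"],
}


def build_lookup(dictionary):
    """Map every value and synonym (lower-cased) to its dictionary value."""
    lookup = {}
    for value, synonyms in dictionary.items():
        lookup[value.lower()] = value
        for synonym in synonyms:
            lookup[synonym.lower()] = value
    return lookup


def normalize(expression, lookup):
    """Return the dictionary value for a tagged expression, or None if unknown."""
    return lookup.get(expression.strip().lower())


lookup = build_lookup(ACTIVITY_DICTIONARY)
```

With this lookup, `normalize("Jogging", lookup)` resolves to the value `"running"`, while an expression that is not in the dictionary resolves to `None`, which signals that a new value or synonym may be needed.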
Create your first training and test sets
Now that we have tagged all our sentences, we want to create train sets and test sets. To do so, we are going to use labels, filters and views. You are already familiar with labels (if not, please read the “Add your first sentences” section), so we only detail filters and views in this part.
First, we have to create a “test” label to distinguish sentences that will be used for testing from those that will be used for training. To create a label for a sentence, click on the label column to make a drop-down appear. You should see the labels you already created. To create a new label, type your label’s name and press enter. Create “test”.
Now we want to apply this label to sentences in bulk. Select several sentences by ticking them, then click on the “X sentences selected” button. A drop-down appears. It lets you apply bulk actions to your selection, such as “Delete sentences”, “Change intents” or “Change labels”.
Click on “Change labels”, select “test” and save. The “test” label is applied to the selected sentences.
Now that our test sentences are labelled, we can use filters to create a data subset. Click on the “Filters” button.
Botgen Platform filters are powerful tools for navigating your data. You can filter on any object, such as intents, entities, entity counts or labels. Several operators are available, such as “has any”, which selects sentences matching any of the values you provide.
You can also combine filters by adding “AND” or “OR” operators between them.
To create our test set, we are going to filter sentences that have the “test” label. Click on “add filter”. Then select “label”, then the “has” operator, and select “test” as the label value. Apply the filter. You should now have a subset of your data with all the sentences that have the “test” label.
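The filter operators described above can be thought of as small predicates that are combined with AND and OR. The following Python sketch illustrates that idea on a toy data set. The function names (`has`, `has_any`, `all_of`, `any_of`) and the sentence records are hypothetical and only mirror the operators named in the interface.

```python
def values_of(sentence, field):
    """Return the values of a field as a set (a single string becomes a one-item set)."""
    value = sentence[field]
    return {value} if isinstance(value, str) else set(value)


def has(field, value):
    """Keep sentences whose field contains the given value."""
    return lambda sentence: value in values_of(sentence, field)


def has_any(field, values):
    """Keep sentences whose field contains at least one of the given values."""
    wanted = set(values)
    return lambda sentence: bool(wanted & values_of(sentence, field))


def all_of(*filters):
    """Combine filters with AND."""
    return lambda sentence: all(f(sentence) for f in filters)


def any_of(*filters):
    """Combine filters with OR."""
    return lambda sentence: any(f(sentence) for f in filters)


def apply_filter(sentences, keep):
    return [s for s in sentences if keep(s)]


# Toy data set: ids stand in for the sentences themselves.
SENTENCES = [
    {"id": 1, "intent": "GET_ADVICE", "labels": {"internal", "test"}},
    {"id": 2, "intent": "TOP_PRODUCT", "labels": {"internal"}},
    {"id": 3, "intent": "GET_PRODUCT_FEEDBACK", "labels": {"test"}},
]

# "label" "has" "test"
test_subset = apply_filter(SENTENCES, has("labels", "test"))

# "label" "has" "test" AND "intent" "has any" of two intents
test_advice_or_top = apply_filter(
    SENTENCES,
    all_of(has("labels", "test"), has_any("intent", ["GET_ADVICE", "TOP_PRODUCT"])),
)
```

Here `test_subset` contains sentences 1 and 3, and the combined filter keeps only sentence 1, since it is the only test sentence whose intent is one of the two requested.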
Once you have the right filter, you can save it as a View. A View is a subset of your data that updates whenever you add sentences that match its filter. Views are very useful for creating test sets and train sets, for instance.
Let’s create a view for our test set. Click on “Create view”, give it a name (“test set”, for example) and save.
You can now go back to that subset at any time by selecting it in the drop-down menu on the left.
To create the train set, we will use all the sentences that are not in our test set. We simply need a filter that excludes those sentences. To do so, click on “add filter” and select “label”, “has not” and “test”. Apply the filter and save the view as “train set”.
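A view behaves like a saved filter rather than a frozen copy of the data: its content is recomputed from the data set each time it is read. The Python sketch below illustrates this behaviour and the complementary “has” / “has not” filters used for the test and train sets. The `View` class and the sample records are hypothetical and do not reflect how the platform stores views.

```python
class View:
    """A named filter. The subset is recomputed each time it is read."""

    def __init__(self, name, keep):
        self.name = name
        self.keep = keep

    def sentences(self, dataset):
        return [s for s in dataset if self.keep(s)]


# Toy data set: ids stand in for the sentences themselves.
dataset = [
    {"id": 1, "labels": {"internal", "test"}},
    {"id": 2, "labels": {"internal"}},
    {"id": 3, "labels": {"internal"}},
]

test_view = View("test set", lambda s: "test" in s["labels"])        # label has "test"
train_view = View("train set", lambda s: "test" not in s["labels"])  # label has not "test"

test_ids_before = [s["id"] for s in test_view.sentences(dataset)]

# A new sentence labelled "test" shows up in the view without editing the view.
dataset.append({"id": 4, "labels": {"test"}})
test_ids_after = [s["id"] for s in test_view.sentences(dataset)]

train_ids = {s["id"] for s in train_view.sentences(dataset)}
test_ids = set(test_ids_after)
```

Because the two filters are exact opposites, every sentence lands in exactly one of the two views, so the train set and the test set never overlap.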
Create a first model
Now that you have a train set, you can create your first model. First, you need to define a configuration for your training pipeline. Botgen Platform relies on RASA NLU to manage the training pipeline, so you need to provide a .yml file that describes the RASA pipeline you want to use. For this tutorial, we are going to use the following configuration file: config-standard.yml
To add this config file to the configurations list, go to “Configurations” and click on “Add a new configuration”. Define the name of the configuration (“standard”, for instance), upload your file and add a note if necessary (to describe your training pipeline, for instance). Save.
Then, go to “Models” and click on “New Model”. First, give your model a name, such as “init model”, and select a configuration. Select the “standard” configuration we have just added.
Then, define the train set you want to use. Fortunately, we created a view of our train set that we can use here. When you select your view, you see the details of your filter. You can modify your filter if needed, and you can also see the sentences it contains. If necessary, you can manually remove some sentences. Validate.
Then, the interface asks you whether you want to define a test set. It is not compulsory, but for the purpose of this tutorial we are going to define one. Click on the “add a test set” button and select the “test set” view. You can define several test sets. Nonetheless, to keep it simple, let’s stay with one. Validate.
You can also select or deselect the entities you want to include in your training. This is useful if some closely related entities are hard to tell apart and you want to test your model’s performance without one of them. For this tutorial, keep them all. Validate.
Finally, a summary screen appears with the details of the training you are about to run. Validate. The model configuration appears in your models list. Click on the “Train” button and wait for the training to finish. When the training is over, a button appears. Click on it to access the results of your training.
Analyse your model’s performance
In this dashboard, you can assess your model’s training using classic KPIs. First, select the test set you want to use for the analysis in the drop-down menu in the upper-left corner of the page. As you have only one test set at this stage, there is only one to select, but if you had several, you could pick the one you want to use to assess your model’s performance.
At the top of the page, you find the main KPIs used in NLP: Recall, Precision and F1 Score. They give you a global idea of your model’s performance. You will find more information on KPIs in the documentation: (doc link)
To go deeper in your analysis, use the confusion matrix below. It gives you a quick overview of the predictions made by your model on the sentences in your test set. You can analyse those predictions using recall, precision and F1 score. The better your model performs, the more its predictions fall on the diagonal. Any prediction off the diagonal is a confusion between two intents. The darker a square in the matrix, the more often the two intents are confused (or correctly predicted, if the square is on the diagonal).
To dig into the sentences that led to a confusion, just click on the square at the intersection of the two intents you are interested in. A list of the sentences appears at the bottom of the page. You can compare how the sentences were expected (as you tagged them) with how they were predicted by your model. This analysis gives you insights into the actions to take to improve your models, such as adding an entity to extract to better distinguish two intents, or adjusting the mix of your training sentences if your model is over-fitted.
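To make the link between the confusion matrix and the KPIs concrete, here is a minimal Python sketch that counts (expected, predicted) pairs and derives precision, recall and F1 score for one intent. The expected and predicted lists are made-up data, and the sketch computes per-intent scores only. How the dashboard aggregates scores across intents is not specified here.

```python
from collections import Counter


def confusion_counts(expected, predicted):
    """Count each (expected intent, predicted intent) pair: the cells of the matrix."""
    return Counter(zip(expected, predicted))


def scores(expected, predicted, intent):
    """Return (precision, recall, f1) for one intent."""
    cells = confusion_counts(expected, predicted)
    tp = cells[(intent, intent)]  # diagonal cell
    fp = sum(n for (e, p), n in cells.items() if p == intent and e != intent)
    fn = sum(n for (e, p), n in cells.items() if e == intent and p != intent)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def confused(expected, predicted, intent_a, intent_b):
    """Indexes of sentences expected as intent_a but predicted as intent_b."""
    return [
        i for i, (e, p) in enumerate(zip(expected, predicted))
        if e == intent_a and p == intent_b
    ]


# Made-up test set results, one entry per sentence.
expected = ["GET_ADVICE", "GET_ADVICE", "GET_ADVICE",
            "TOP_PRODUCT", "TOP_PRODUCT", "GET_PRODUCT_FEEDBACK"]
predicted = ["GET_ADVICE", "GET_ADVICE", "TOP_PRODUCT",
             "TOP_PRODUCT", "GET_ADVICE", "GET_PRODUCT_FEEDBACK"]

cells = confusion_counts(expected, predicted)
top_product_scores = scores(expected, predicted, "TOP_PRODUCT")
advice_as_top = confused(expected, predicted, "GET_ADVICE", "TOP_PRODUCT")
```

In this toy example, “TOP_PRODUCT” has one correct prediction, one false positive and one false negative, so its precision, recall and F1 score are all 0.5. The `confused` helper plays the role of clicking on a square of the matrix: it returns the sentences behind one specific confusion.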
Using a first model to speed up tagging
Now that you have your first model, you can use it to speed up your tagging.
Go back to the sentences section and add some sentences such as:
- I’m looking for shoes to run under the rain
- I want shoes to perform on a marathon
- What could you have that is durable and resistant for trail running ?
You can now tick the “parse sentences with NLP” option. This means that your NLP model is going to predict the intent of your sentences and extract their entities. This is not a real test, but it gives you a good intuition of your model’s performance. Above all, it saves you a lot of manual tagging time. To use this option, do not specify any intent. You can nonetheless attach labels to your sentences. Save and see the results.
Use your model
After a few iterations, you will end up with an NLP model that performs well enough to be used in a chatbot. You can access your model in two ways: a request to the Botgen API or a request to your downloaded model (packaged in a Docker container or raw).
The quickest way is the API request. To set it up, go to models, select the model you want to use and click on the button. Select “API”.
A window opens with all the information necessary to make a request. You need to generate a token to secure your calls. Once done, the interface provides you with a curl example of a request to send through the API and an example of the response you will get.
This response contains information useful for your conversational logic: the extracted entities, the component used to extract them, the predicted intent with the intent classifier’s confidence in that prediction, and an intent ranking that is useful for setting thresholds in your conversational logic.
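As an illustration of how the intent ranking can drive thresholds, the Python sketch below reads a sample response and decides between acting on the intent, asking the user to clarify, or falling back. The response layout is an assumption modelled on the usual RASA NLU parse output (`intent`, `entities`, `intent_ranking`). The field names, the sample values and the threshold defaults are hypothetical, so check them against the example response shown in the interface.

```python
import json

# Hypothetical response, shaped like a RASA NLU parse result.
SAMPLE_RESPONSE = json.loads("""
{
  "text": "I want shoes to perform on a marathon",
  "intent": {"name": "GET_ADVICE", "confidence": 0.82},
  "entities": [
    {"entity": "ACTIVITY", "value": "marathon", "extractor": "CRFEntityExtractor"}
  ],
  "intent_ranking": [
    {"name": "GET_ADVICE", "confidence": 0.82},
    {"name": "TOP_PRODUCT", "confidence": 0.11},
    {"name": "GET_PRODUCT_FEEDBACK", "confidence": 0.07}
  ]
}
""")


def decide(response, min_confidence=0.6, min_margin=0.2):
    """Return an intent name, "clarify" or "fallback" based on the ranking."""
    ranking = sorted(
        response.get("intent_ranking") or [response["intent"]],
        key=lambda item: item["confidence"],
        reverse=True,
    )
    top = ranking[0]
    runner_up = ranking[1]["confidence"] if len(ranking) > 1 else 0.0
    if top["confidence"] < min_confidence:
        return "fallback"   # the model is not confident enough
    if top["confidence"] - runner_up < min_margin:
        return "clarify"    # two intents are too close to call
    return top["name"]


def entity_values(response):
    """Map each extracted entity name to its value."""
    return {item["entity"]: item["value"] for item in response["entities"]}


decision = decide(SAMPLE_RESPONSE)
entities = entity_values(SAMPLE_RESPONSE)
```

Using both a minimum confidence and a minimum margin over the second-ranked intent is one possible design. A single confidence threshold is simpler, but it cannot detect cases where two intents score almost the same.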
That concludes our quick start tutorial. You are now able to build a first NLP model with Botgen Platform, iterate on it and use it in your applications. More advanced tutorials are in the works and will be released soon.
Finally, do not hesitate to contact us if you run into a problem, and also to give us feedback. Your feedback is very important in helping us improve our product and build a roadmap that fits your expectations. It can cover any subject: the product, but also the documentation, support and so on.