Corpus (definition)



A corpus is a collection of text and voice data that can be used to train AI and machine learning systems.

What is a corpus in the area of NLP?

In computational linguistics and NLP, a corpus is a text, or a collection of texts, that provides the context a model works from. In a broad sense, a corpus is the training data set of a language model. Depending on the model, however, the term can also imply the vocabulary and the connections between words.
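As a rough illustration of how a vocabulary and simple word connections can be read off a corpus, here is a minimal Python sketch. The three sentences are invented for this example, whitespace splitting stands in for a real tokenizer, and counting adjacent word pairs (bigrams) is just one simple way to represent connections between words; actual language models may use quite different representations.

```python
from collections import Counter

# Hypothetical three-document corpus, invented for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased the dog",
]

# Assumption: split on whitespace; real tokenizers handle punctuation, casing, etc.
tokenized = [doc.split() for doc in corpus]

# Vocabulary: every distinct word that appears anywhere in the corpus.
vocabulary = sorted({word for doc in tokenized for word in doc})

# Word frequencies across the whole corpus.
frequencies = Counter(word for doc in tokenized for word in doc)

# Bigram counts: how often one word directly follows another,
# used here as a simple stand-in for "connections between words".
bigrams = Counter(
    (left, right)
    for doc in tokenized
    for left, right in zip(doc, doc[1:])
)

print(vocabulary)
print(frequencies.most_common(1))
print(bigrams[("sat", "on")])
```

Here the corpus is the raw list of sentences, while the vocabulary and the bigram counts are derived from it. This is why the word "corpus" is sometimes used loosely to cover those derived structures as well.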
