What Is a Corpus in the Area of NLP?
In computational linguistics and natural language processing (NLP), a corpus is a text, or a collection of texts, that provides the data a model works from. In a broad sense, a corpus is the training data set of a language model. Depending on the model, however, it also defines the vocabulary and the connections between words that the model can learn.
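To make the idea of a vocabulary and of connections between words concrete, here is a minimal sketch in Python. The three-sentence corpus, the whitespace tokenizer, and the function names `tokenize`, `build_vocabulary`, and `build_bigrams` are all assumptions made for illustration. Real corpora are far larger and are usually tokenized with more careful rules. Counting adjacent word pairs is used here as just one simple way to represent connections between words.

```python
from collections import Counter

# Hypothetical toy corpus: three short documents (illustrative only).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased a dog",
]


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


def build_vocabulary(documents):
    """Count how often each word occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


def build_bigrams(documents):
    """Count adjacent word pairs within each document."""
    pairs = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        pairs.update(zip(tokens, tokens[1:]))
    return pairs


vocabulary = build_vocabulary(corpus)
bigrams = build_bigrams(corpus)

print(sorted(vocabulary))        # the distinct words found in the corpus
print(bigrams[("sat", "on")])    # how often "sat" is directly followed by "on"
```

The vocabulary here is simply the set of distinct words with their frequencies, and the pair counts are a rough stand-in for the word-to-word relationships that a language model would derive from the same texts.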