What is a corpus in the area of NLP?
In computational linguistics and NLP, a corpus is a text, or a collection of texts, that provides the context a model works from. In a broad sense, a corpus is the training data set of a language model. Depending on the model, however, the term also implies what is derived from that text: the vocabulary and the connections between words.
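To make the three ideas concrete (the texts, the vocabulary, and the connections between words), here is a minimal Python sketch. The three sentences are an invented toy corpus, whitespace splitting stands in for a real tokenizer, and adjacent-word pair counts are only one simple assumption about what "connection between words" could mean; actual models may capture it very differently.

```python
from collections import Counter

# A toy corpus: a hypothetical collection of three short texts.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat saw the dog",
]

# Tokenize by whitespace (an assumption; real pipelines usually use
# a dedicated tokenizer).
tokenized = [text.split() for text in corpus]

# Vocabulary: the distinct words that occur anywhere in the corpus.
vocabulary = sorted({word for tokens in tokenized for word in tokens})

# One simple notion of "connection between words": how often each
# pair of adjacent words (a bigram) appears in the corpus.
bigram_counts = Counter(
    (left, right)
    for tokens in tokenized
    for left, right in zip(tokens, tokens[1:])
)

print(vocabulary)
print(bigram_counts.most_common(3))
```

Everything the sketch knows about language comes from the corpus itself: a word that never appears in the texts is absent from the vocabulary, and a word pair that never appears has a count of zero.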