A collection of data used to train and evaluate machine learning models.
Related Articles: Glossary: Pretraining; Glossary: Token(s)
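When one dataset serves both purposes, the examples used for evaluation are usually held out from training. The sketch below is a minimal, hypothetical illustration of that idea. The helper name `train_eval_split`, the 80/20 ratio, and the fixed seed are illustrative assumptions, not a prescribed method.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_eval_split(
    examples: Sequence[T], eval_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: shuffle a copy of the examples and split it
    into disjoint training and evaluation lists."""
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be between 0 and 1")
    shuffled = list(examples)               # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split reproducible
    n_eval = int(len(shuffled) * eval_fraction)
    # The first n_eval shuffled examples are held out for evaluation.
    return shuffled[n_eval:], shuffled[:n_eval]


# Toy dataset of ten placeholder examples (assumed for illustration).
dataset = [f"example {i}" for i in range(10)]
train, evaluation = train_eval_split(dataset, eval_fraction=0.2)
print(len(train), len(evaluation))  # 8 2
```

Keeping the two portions disjoint means the evaluation results reflect how the model handles examples it did not see during training.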