thomas-zastrow.de
The wp-2022 Textcorpus
Downloads
Single file
(AA/wiki_00, 700 KB, unzipped ca. 3.4 MB)
One subfolder
(AA, 67 MB, unzipped ca. 327 MB)
The whole wp-2022 corpus
(4.4 GB, unzipped ca. 22 GB)
Named Entities Annotations
(1.1 GB, unzipped ca. 2.7 GB)
Lemma list
(199 MB zipped, including token and POS frequencies)
All trigrams of the corpus
(4.4 GB zipped)