The Daily Corpora is a web-based platform for evaluating and exploring linguistically annotated text corpora.
For performance reasons, only the wp-2020 corpus is integrated in the publicly available instance (link above). The raw corpus data and some derived data files can be downloaded here:
Single file (AA/wiki_00, 700 KB, unzipped ca. 3.4 MB)
One subfolder (AA, 67 MB, unzipped ca. 327 MB)
The whole wp-2022 corpus (4.4 GB, unzipped ca. 22 GB)
Named Entities Annotations (1.1 GB, unzipped ca. 2.7 GB)
Lemma list (199 MB zipped, including token and POS frequencies)
All trigrams of the corpus (4.4 GB zipped)
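
For orientation, a trigram is taken here to mean a sequence of three consecutive tokens. The sketch below shows one way such counts could be derived from tokenized text. The whitespace tokenization, the sample sentence, and the function name `trigram_counts` are assumptions for illustration only; they do not describe how the downloadable trigram file was actually produced or how it is formatted.

```python
from collections import Counter


def trigram_counts(tokens):
    """Count every run of three consecutive tokens in a token list."""
    # zip over three offset views of the list yields each consecutive triple.
    return Counter(zip(tokens, tokens[1:], tokens[2:]))


# Hypothetical sample text; whitespace splitting stands in for real tokenization.
tokens = "the cat sat on the mat and the cat sat down".split()
counts = trigram_counts(tokens)

# Show the most frequent trigrams first.
for trigram, freq in counts.most_common(3):
    print(" ".join(trigram), freq)
```

A `Counter` keeps the example short; for a corpus of the size listed above, counts would more likely have to be accumulated file by file rather than from a single in-memory token list.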