This page displays articles from a corpus. In the wp-2022 corpus, each article corresponds to an article of the German Wikipedia.
The search function looks only through the titles of the articles, not the full text [1]. For the wp-2022 corpus, enter a title or part of a title from the German Wikipedia.
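The title-only search can be pictured as a simple substring match over the article titles. The following is a minimal sketch, not the application's actual implementation: the function name `search_titles`, the sample titles, and the case-insensitive matching are all assumptions made for illustration.

```python
def search_titles(titles, query):
    """Return all titles that contain the query string.

    Assumption: matching ignores case and surrounding whitespace.
    An empty query returns no results.
    """
    needle = query.strip().casefold()
    if not needle:
        return []
    return [title for title in titles if needle in title.casefold()]


# Hypothetical sample titles, for illustration only.
titles = ["Berlin", "Berliner Mauer", "Hamburg", "Ost-Berlin"]
results = search_titles(titles, "berlin")
print(results)
```

Because only titles are compared, a word that occurs solely in the body of an article would not produce a hit here.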
After the search has run, choose an article from the result list. The plain text of the article is then displayed, without any formatting, images, etc.
The text can be colorized in several ways (sentiment analysis, named entities and POS tagging). Statistics about POS tagging and named entities in the article can be displayed. See details below.
Colorizing the Text
For sentiment analysis, every word of the article that appears in the freely available SentWS dataset is assigned a color: green for positive words, red for negative words. In addition, some aggregated values for the whole article are shown at the top.
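The lexicon lookup described here can be sketched in a few lines. This is only an illustration under stated assumptions: the dataset is assumed to be loadable as a mapping from word to polarity weight, the two entries in `LEXICON` and their weights are made up, and the aggregates computed in `summarize` (counts and a summed weight) are a guess at what "aggregated values" might mean, not a description of the page's actual output.

```python
# Hypothetical mini-lexicon: word -> polarity weight (made-up values).
LEXICON = {"gut": 0.5, "schlecht": -0.75}


def colorize_sentiment(tokens, lexicon=LEXICON):
    """Pair each token with 'green', 'red', or None (not in the lexicon)."""
    colored = []
    for token in tokens:
        weight = lexicon.get(token.lower())
        if weight is None or weight == 0:
            color = None
        elif weight > 0:
            color = "green"
        else:
            color = "red"
        colored.append((token, color))
    return colored


def summarize(tokens, lexicon=LEXICON):
    """Aggregate values for the whole text (assumed: counts and weight sum)."""
    weights = [lexicon[t.lower()] for t in tokens if t.lower() in lexicon]
    return {
        "positive": sum(1 for w in weights if w > 0),
        "negative": sum(1 for w in weights if w < 0),
        "total_weight": sum(weights),
    }


tokens = ["Das", "Wetter", "ist", "gut", "aber", "das", "Essen", "war", "schlecht"]
colored = colorize_sentiment(tokens)
summary = summarize(tokens)
print(colored)
print(summary)
```

Words that are missing from the lexicon keep their default color, which matches the behavior described above: only words found in the dataset are colored.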
With POS tagging, every word is colored according to its part-of-speech tag. In the field "POS Tags to colorize", enter the POS tags that should be colorized; words with any other tag stay black. Entering "all" in the field colors all POS tags (which results in a very nice colorful mess ...).
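The filter logic behind the "POS Tags to colorize" field might look roughly like the sketch below. It assumes that the tokens are already tagged, that the field accepts a comma-separated list of tags, and that colors are taken from a fixed palette; the palette, the function names, and the example tags are hypothetical and not taken from the application.

```python
from itertools import cycle

# Hypothetical palette; the colors used on the page are not documented here.
PALETTE = ["crimson", "royalblue", "darkorange", "seagreen", "purple"]


def parse_tag_field(field, known_tags):
    """Turn the field text into the set of tags to colorize.

    Assumption: tags are separated by commas; "all" selects every tag.
    """
    text = field.strip()
    if text.lower() == "all":
        return set(known_tags)
    return {part.strip().upper() for part in text.split(",") if part.strip()}


def colorize_pos(tagged_tokens, field):
    """Pair each token with a color; unselected tags stay black."""
    known = sorted({tag for _, tag in tagged_tokens})
    selected = parse_tag_field(field, known)
    colors = dict(zip(known, cycle(PALETTE)))
    return [
        (token, colors[tag] if tag in selected else "black")
        for token, tag in tagged_tokens
    ]


# Example tags chosen for illustration.
tagged = [("Der", "ART"), ("Hund", "NN"), ("bellt", "VVFIN")]
only_nouns = colorize_pos(tagged, "nn")
everything = colorize_pos(tagged, "all")
print(only_nouns)
print(everything)
```

Assigning colors per tag rather than per selection keeps a tag's color stable when the selection in the field changes.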
For named entities, persons are colored green, locations blue, and organisations red. The MISC category is not colored.
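The color scheme for named entities amounts to a small lookup table. The sketch below assumes the entity labels are named `PER`, `LOC`, `ORG`, and `MISC`, and that tokens outside any entity carry no label; both the label names and the input format are assumptions made for illustration.

```python
# Colors as described above; the label names are assumed.
ENTITY_COLORS = {"PER": "green", "LOC": "blue", "ORG": "red"}


def colorize_entities(labeled_tokens):
    """Pair each token with its entity color, or None if it stays uncolored."""
    return [(token, ENTITY_COLORS.get(label)) for token, label in labeled_tokens]


# Hypothetical input: (token, entity label or None).
labeled = [
    ("Goethe", "PER"),
    ("lebte", None),
    ("in", None),
    ("Weimar", "LOC"),
    ("Bundestag", "ORG"),
    ("Oktoberfest", "MISC"),
]
entity_colors = colorize_entities(labeled)
print(entity_colors)
```

Since `MISC` has no entry in the table, it falls through to `None` in the same way as tokens without any entity label.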
You can also display statistics about the POS tags or the named entities that the article contains.
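Such statistics are essentially frequency counts over the tags or entity labels. The following sketch shows one plausible way to compute them; which figures the page actually shows is not specified, so the absolute count and percentage share used here are assumptions, as is the function name `tag_statistics`.

```python
from collections import Counter


def tag_statistics(labels):
    """Return (label, count, percentage) rows, most frequent label first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [
        (label, n, round(100 * n / total, 1))
        for label, n in counts.most_common()
    ]


# Hypothetical POS tags of a short text.
pos_tags = ["NN", "ART", "NN", "VVFIN", "NN", "ART"]
stats = tag_statistics(pos_tags)
for label, n, share in stats:
    print(f"{label:6} {n:3} {share:5.1f} %")
```

The same function works for named entities if it is given the entity labels instead of the POS tags.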
[1] If you want a full-text search, go to search.