Get Started
What is Grimm?
Grimm is a small Python package that cleans the syntax of Wikipedia markup content, based on some ideas from WikiExtractor. A secondary goal is to extract the links from the content: alongside the cleaned text (with link markup removed), Grimm returns a separate list of the links, each bound to its associated index.
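To make the idea of "cleaned text plus indexed links" concrete, here is a minimal sketch. It is not Grimm's implementation: the function name strip_internal_links, the simplified regular expression, and the (start, end, target) tuple layout are all assumptions chosen for illustration, and the sketch handles only plain internal links of the form [[Target]] or [[Target|label]].

```python
import re

# Matches internal wiki links such as [[Target]] or [[Target|label]].
# Simplified on purpose: nested links, files, and templates are not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def strip_internal_links(markup):
    """Return (clean_text, links); each link is (start, end, target).

    Hypothetical helper for illustration only, not part of Grimm.
    start and end are offsets into the cleaned text, not the markup.
    """
    parts = []
    links = []
    cursor = 0   # current position in the original markup
    out_len = 0  # length of the cleaned text built so far
    for match in WIKILINK.finditer(markup):
        # Copy the plain text that precedes this link.
        before = markup[cursor:match.start()]
        parts.append(before)
        out_len += len(before)
        # Keep only the visible label; fall back to the target if absent.
        target = match.group(1).strip()
        label = match.group(2) if match.group(2) else target
        links.append((out_len, out_len + len(label), target))
        parts.append(label)
        out_len += len(label)
        cursor = match.end()
    parts.append(markup[cursor:])
    return "".join(parts), links


sample = "The [[Brothers Grimm|Grimm brothers]] collected [[fairy tale]]s."
clean, found = strip_internal_links(sample)
print(clean)  # The Grimm brothers collected fairy tales.
print(found)  # [(4, 18, 'Brothers Grimm'), (29, 39, 'fairy tale')]
```

Recording offsets against the cleaned text rather than the raw markup means each link can be located again with a simple slice, for example clean[4:18].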
Installation
Grimm is released on PyPI. You can install it with pip:
pip install grimm
Manual Installation
To install Grimm manually (without using PyPI), you need the distribution package file locally. Download it from the releases page, then install it with the following command, substituting the correct version number:
pip install grimm-0.1.0-py3-none-any.whl
Quick Start
To use Grimm, import the functions you need from the grimm module:
from grimm import clean_syntax
Note that Grimm does not define any classes yet (and has no need for them), so the module exposes only functions.
Then pass your content to the clean_syntax function:
text, external_links, internal_links, images = clean_syntax(content)
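As a rough sketch of working with that return value, the helper below unpacks the four-element result and reports its sizes. The name describe_result is hypothetical, and the code assumes only that text is a string and that the three remaining values are sized collections such as lists; the exact shape of each link entry is not specified here, so the helper does not inspect them.

```python
def describe_result(result):
    """Summarize a (text, external_links, internal_links, images) tuple.

    Hypothetical helper, not part of Grimm. It assumes the last three
    values support len(), for example lists.
    """
    text, external_links, internal_links, images = result
    return {
        "characters": len(text),
        "external_links": len(external_links),
        "internal_links": len(internal_links),
        "images": len(images),
    }


# With Grimm installed, this could be used as:
#   from grimm import clean_syntax
#   summary = describe_result(clean_syntax(content))
```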