
Get Started

What is Grimm?

Grimm is a small package that cleans the syntax of Wikipedia markup content, based on some ideas from WikiExtractor. As a side task, Grimm also extracts the links from the content. So, alongside the cleaned content (with link markup removed), it returns separate lists of links, bound to their associated indexes.

Installation

Grimm is released on PyPI. You can install it with pip:

pip install grimm

Manual Installation

To install Grimm manually (without PyPI), you need the distribution package file locally. You can download it from the releases page, then install it with the following command (substituting the correct version number):

pip install grimm-0.1.0-py3-none-any.whl

Quick Start

To use Grimm, import the functions you need from the grimm module:

from grimm import clean_syntax

Note: Grimm does not define any classes yet (and does not need any), so the module exposes only functions.

Then, just pass your content into the clean_syntax function:

text, external_links, internal_links, images = clean_syntax(content)
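
The four return values can be inspected with a small helper like the sketch below. Note that `summarize` and `fake_clean_syntax` are hypothetical names introduced only for illustration: the stand-in is not Grimm's implementation, and the sketch assumes that each of the three link values supports `len()`, which this page does not confirm.

```python
from typing import Any, Callable, Dict, Tuple

# Shape of the result as shown on this page: cleaned text plus three link collections.
CleanResult = Tuple[str, Any, Any, Any]


def summarize(content: str, cleaner: Callable[[str], CleanResult]) -> Dict[str, int]:
    """Run a cleaner on the content and report the size of each returned part."""
    text, external_links, internal_links, images = cleaner(content)
    return {
        "text_length": len(text),
        # Assumption: each link value is a sized collection.
        "external_links": len(external_links),
        "internal_links": len(internal_links),
        "images": len(images),
    }


def fake_clean_syntax(content: str) -> CleanResult:
    """Hypothetical stand-in so the sketch runs without Grimm installed."""
    cleaned = content.replace("[[", "").replace("]]", "")
    return cleaned, [], ["Python"], []


if __name__ == "__main__":
    print(summarize("See [[Python]].", fake_clean_syntax))
```

With Grimm installed, the same helper could be called as `summarize(content, clean_syntax)`, passing the real function in place of the stand-in.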
Last updated on December 8, 2022