WP2TXT extracts text and category data from Wikipedia dump files (XML, compressed with bzip2), removing MediaWiki markup and other metadata.
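To make "removing MediaWiki markup" and "category data" concrete, here is a minimal Ruby sketch of the general idea. It is not WP2TXT's implementation or API: `strip_wiki_markup` and `extract_categories` are hypothetical helpers written for illustration, they handle only a tiny subset of wikitext (internal links, category links, bold/italic quotes), and they assume the page text has already been decompressed and pulled out of the XML.

```ruby
# Illustrative sketch only; these helpers are hypothetical and not part of WP2TXT.

# Collect category names from links such as [[Category:Name]] or [[Category:Name|sort key]].
def extract_categories(wikitext)
  wikitext.scan(/\[\[Category:([^\]|]+)/).flatten
end

# Remove a small subset of MediaWiki markup and return plain text.
def strip_wiki_markup(wikitext)
  wikitext
    .gsub(/\[\[Category:[^\]]*\]\]\n?/, "")        # drop category links entirely
    .gsub(/\[\[(?:[^\]|]*\|)?([^\]]*)\]\]/, '\1')  # [[target|label]] -> label, [[target]] -> target
    .gsub(/'{2,}/, "")                             # drop bold/italic quote runs
end

sample = "'''Ruby''' is a [[programming language|language]] from [[Japan]].\n" \
         "[[Category:Programming languages]]\n"

puts strip_wiki_markup(sample)
puts extract_categories(sample).inspect
```

Real dumps contain templates, tables, references, and nested markup that simple regular expressions like these do not cover.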
Required Ruby Version
>= 3.0
Authors
Yoichiro Hasebe
Versions
- 2.1.1 February 21, 2026 (300 KB)
- 2.1.0 February 19, 2026 (299 KB)
- 1.1.3 May 13, 2023 (7.78 MB)
- 1.1.2 April 15, 2023 (7.78 MB)
- 1.1.1 January 25, 2023 (7.78 MB)