Ruby port of TinySegmenter.js for tokenizing Japanese text. Uses a Naive Bayes model that has been trained using the RWCP corpus and optimized using L1-norm regularization. The resultant model is quite compact, yet has a 95% accuracy rate.

Required Ruby Version

>= 0

Authors

Peter Graham

Versions

  1. 0.0.6 October 26, 2015 (16 KB)
  2. 0.0.4 March 31, 2013 (16 KB)
  3. 0.0.2 August 27, 2012 (14 KB)
  4. 0.0.1 August 20, 2012 (11.5 KB)
Show all versions (6 total)

SHA 256 checksum