GTokenizer recreates the closed-source tokenization library that Google used for its Google NGrams app (http://ngrams.googlelabs.com/). It is based on the description in the supporting online material for the associated Science paper (http://www.sciencemag.org/content/suppl/2010/12/16/science.1199644.DC1/Michel.SOM.revision.2.pdf).

Required Ruby Version

None

Authors

Alex Peattie

Versions

  1. 1.0.0 (July 2, 2011, 6.5 KB)
