sentencepiece: Unsupervised text tokenizer for Neural Network-based text generation (1)

SentencePiece is an unsupervised text tokenizer and detokenizer, mainly for Neural Network-based text generation systems where the vocabulary size is fixed before the neural model is trained.
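As a rough illustration of what "tokenizer and detokenizer" with a fixed vocabulary size means in practice, here is a minimal sketch. It assumes the Python bindings (the `sentencepiece` module) are installed along with this package. The helper name `train_and_roundtrip`, the `toy` model prefix and the sample corpus are made up for this example and do not come from the package itself.

```python
import os
import tempfile


def train_and_roundtrip(sentences, text, vocab_size=64):
    """Train a tiny model on `sentences`, then encode and decode `text`.

    Hypothetical helper for illustration only.
    """
    import sentencepiece as spm  # assumption: Python bindings are available

    with tempfile.TemporaryDirectory() as tmp:
        # The trainer reads raw text, one sentence per line.
        corpus = os.path.join(tmp, "corpus.txt")
        with open(corpus, "w", encoding="utf-8") as f:
            f.write("\n".join(sentences) + "\n")

        prefix = os.path.join(tmp, "toy")
        spm.SentencePieceTrainer.train(
            input=corpus,
            model_prefix=prefix,      # writes toy.model and toy.vocab
            vocab_size=vocab_size,    # chosen up front, before any neural training
            hard_vocab_limit=False,   # let a tiny corpus yield fewer pieces
        )

        sp = spm.SentencePieceProcessor(model_file=prefix + ".model")
        pieces = sp.encode(text, out_type=str)  # subword pieces
        ids = sp.encode(text, out_type=int)     # the same pieces as vocabulary ids
        return pieces, sp.decode(ids), sp.get_piece_size()


if __name__ == "__main__":
    corpus = ["the quick brown fox jumps over the lazy dog"] * 20
    pieces, restored, size = train_and_roundtrip(corpus, "the lazy dog")
    print(pieces)
    print(restored)
    print(size)
```

The soft vocabulary limit is used here only because the sample corpus is far too small to fill the requested vocabulary; on a realistic corpus the default hard limit is likely the better choice.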


URL: https://github.com/google/sentencepiece

Author: Google
Maintainer: The T2 Project <t2 [at] t2-project [dot] org>

License: APL (Apache License 2.0)
Status: Stable
Version: 0.2.0

Download: https://github.com/google/sentencepiece/ sentencepiece-v0.2.0.tar.gz

T2 source: sentencepiece.cache
T2 source: sentencepiece.desc

Build time (on reference hardware): 50% (relative to binutils) (2)

Installed size (on reference hardware): 7.68 MB, 37 files

Dependencies (build time detected): bash binutils cmake coreutils cython diffutils gawk grep gzip linux-header make openssl pkgconfig protobuf python sed setuptools tar tbb

Installed files (on reference hardware): n.a.

1) This page was automatically generated from the T2 package source. Corrections, such as dead links, URL changes or typos need to be performed directly on that source.

2) Compatible with Linux From Scratch's "Standard Build Unit" (SBU).