tokenizers: Fast State-of-the-Art Tokenizers optimized for Research and Production 1)

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

Part of T2.

URL: https://github.com/huggingface/tokenizers

Author: Anthony MOI <m [dot] anthony [dot] moi [at] gmail [dot] com>
Maintainer: The T2 Project <t2 [at] t2-project [dot] org>

License: APL
Status: Stable
Version: 0.21.0

Download: https://github.com/huggingface/tokenizers/ tokenizers-0.21.0.tar.gz

T2 source: tokenizers.cache
T2 source: tokenizers.desc

Build time (on reference hardware): 120% (relative to binutils) 2)

Installed size (on reference hardware): 9.66 MB, 43 files

Dependencies (build time detected): bash coreutils curl diffutils gawk grep gzip linux-header openssl pyrequests python python-gpep517 python-maturin rustc sed tar

Installed files (on reference hardware): n.a.

1) This page was automatically generated from the T2 package source. Corrections, such as fixes for dead links, URL changes, or typos, must be made directly in that source.

2) Compatible with Linux From Scratch's "Standard Build Unit" (SBU).