dwoods/chardet

chardet

chardet guesses the encoding of text files.

Detects...

  • ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
  • Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
  • EUC-JP, SHIFT_JIS, ISO-2022-JP (Japanese)
  • EUC-KR, ISO-2022-KR (Korean)
  • KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
  • ISO-8859-2, windows-1250 (Hungarian)
  • ISO-8859-5, windows-1251 (Bulgarian)
  • windows-1252 (English)
  • ISO-8859-7, windows-1253 (Greek)
  • ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
  • TIS-620 (Thai)
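
As a rough illustration of the simplest case in the list above, the Unicode encodings can often be recognised from a byte order mark (BOM) at the start of the data. The sketch below uses only the standard library and is not chardet's own code; the function name `sniff_bom` and the returned labels are hypothetical, and it covers only the common little-endian and big-endian byte orders.

```python
import codecs

# Candidate BOMs, checked in this order on purpose: the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM, so the UTF-32
# entries must be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(data):
    """Return an illustrative encoding label if data starts with a BOM.

    Returns None when no known BOM is present, in which case a real
    detector has to fall back to statistical analysis of the bytes.
    """
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    return None
```

Text without a BOM, which includes most of the legacy encodings listed above, cannot be identified this way and needs the kind of guessing chardet performs.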

Requires Python 2.1 or later.

Command-line Tool

chardet comes with a command-line script which reports on the encodings of one or more files:

% chardetect.py somefile someotherfile
somefile: windows-1252 with confidence 0.5
someotherfile: ascii with confidence 1.0
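
A report like the one above could be produced from Python along the following lines. This is a minimal sketch, not the script shipped with chardet: it assumes this fork exposes `chardet.detect()` as upstream chardet does, returning a dictionary with `encoding` and `confidence` keys. The helper names `describe` and `report` are hypothetical, and the code is written for a modern Python rather than Python 2.1.

```python
def describe(name, result):
    """Format one detection result in the style of the command-line tool."""
    return "%s: %s with confidence %s" % (
        name, result["encoding"], result["confidence"])


def report(paths, detect=None):
    """Return one report line per file in paths.

    detect is any callable that takes bytes and returns a dict with
    'encoding' and 'confidence' keys. When omitted, chardet.detect is
    used (assumption: chardet is installed and provides this function).
    """
    if detect is None:
        import chardet
        detect = chardet.detect
    lines = []
    for path in paths:
        # Read raw bytes: decoding first would defeat the purpose.
        with open(path, "rb") as handle:
            lines.append(describe(path, detect(handle.read())))
    return lines
```

Calling `print("\n".join(report(["somefile", "someotherfile"])))` would then print one line per file. Passing the detector in as a parameter keeps the formatting logic usable without chardet installed.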

About

Forked version of chardet
