Tesseract-OCR


Version: 3.02.02
Date: 2013-08-05
Size: 12.90MB
Requirements: No special requirements
Seller: Ray Smith
Price: Free
System: Windows 7/Vista/XP
Rating: 4.9
License: Others

Description - Tesseract-OCR



Tesseract is probably the most accurate open-source OCR engine available. Combined with the Leptonica Image Processing Library, it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. It saw little development between 1995 and 2006, but since then it has been improved extensively by Google.
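
As a rough illustration of the image-to-text conversion described above, the following Python sketch runs the tesseract command-line program on a single image. It assumes the executable is installed and on the PATH, and that the basic form "tesseract imagename outputbase -l lang" applies, with the result written to outputbase.txt. The helper names build_tesseract_command and run_ocr, and any file names, are hypothetical and not part of Tesseract.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for a basic tesseract run (hypothetical helper)."""
    # Assumed CLI form: tesseract <image> <outputbase> -l <lang>
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang="eng"):
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # Assumption: tesseract appends ".txt" to the output base name.
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()
```

With an image named page.png in the current directory, run_ocr("page.png", "page") would be expected to create page.txt and return its contents.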

Tesseract release notes Oct 23 2012 - V3.02.02

- Moved ResultIterator/PageIterator to ccmain.
- Added Right-to-left/Bidi capability in the output iterators for Hebrew/Arabic.
- Added paragraph detection in layout Analysis/post OCR.
- Fixed inconsistent xheight during training and over-chopping.
- Added simultaneous multi-language capability.
- Refactored top-level word recognition module.
- Added experimental Equation detector.
- Improved handling of resolution from input images.
- Blamer module added for error analysis.
- Cleaned up externally used namespace by removing includes from baseapi.h.
- Removed dead memory management code.
- Tidied up constraints on control parameters.
- Added support for ShapeTable in classifier and training.
- Refactored class pruner.
- Fixed training leaks and randomness.
- Major improvements to layout analysis for better image detection, diacritic detection, better textline finding, better tabstop finding.
- Improved line detection and removal.
- Added fixed pitch Chopper for CJK.
- Added UNICHARSET to WERD_CHOICE to make multi-language handling easier.
- Fixed problems with internally scaled images.
- Added page and bbox to string in tr files to identify source of training data better.
- Fixes to Hindi Shirorekha splitter.
- Added word bigram correction.
- Reduced stack memory consumption and eliminated some ugly typedefs.
- Added new uniform classifier API.
- Added new training error counter.
- Fixed endian bug in dawg reader.
- C API (thanks to Tobias Müller)
- New solution for VS 2008 (thanks to Tom Powers)
- And more...
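
The simultaneous multi-language capability listed above can be sketched as follows. This is a minimal example under the assumption that several language codes are passed to the -l option joined with a plus sign (for example eng+heb), and that the matching language data files are installed. The function name language_spec is hypothetical.

```python
def language_spec(codes):
    """Join language codes into one value for the -l option (hypothetical helper)."""
    # Drop empty entries and surrounding whitespace before joining.
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    # Assumption: tesseract accepts codes joined with '+', e.g. "eng+heb".
    return "+".join(cleaned)
```

The resulting string would then be placed after -l on the command line, as in: tesseract page.png page -l eng+heb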


