(7 intermediate revisions by 4 users not shown) |
Line 1: |
Line 1: |
− | [[Category:Applications]] | + | #REDIRECT [[List of applications/Documents#OCR software]] |
− | {{Stub}}
| |
− | | |
− | The complete OCR process has several steps; the OCR engine itself handles only one of them:
| |
− | # scanning
| |
− | # document layout analysis
| |
− | # optical character recognition
| |
− | # post-processing (formatting, PDF creation) | |
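
The steps above can be sketched as a small script. This is a minimal illustration, not part of the original page: it assumes the pages were already scanned to image files, that a reasonably recent Tesseract is installed (its {{ic|pdf}} output configuration covers layout analysis, recognition and PDF creation in one call), and the helper names {{ic|build_ocr_command}} and {{ic|plan_pipeline}} are hypothetical. It only builds the command lines, so it can be inspected without running the engine.

```python
import shlex
from pathlib import Path


def build_ocr_command(image, out_base, lang="eng", make_pdf=False):
    """Build a Tesseract command line for one scanned page image.

    Hypothetical helper: 'tesseract IMAGE OUTPUTBASE -l LANG [pdf]'.
    """
    cmd = ["tesseract", str(image), str(out_base), "-l", lang]
    if make_pdf:
        # The 'pdf' config asks Tesseract to write a searchable PDF
        # (assumes a Tesseract version that ships this config).
        cmd.append("pdf")
    return cmd


def plan_pipeline(pages, out_dir, lang="eng"):
    """Return one OCR command per scanned page, writing into out_dir."""
    out_dir = Path(out_dir)
    return [
        build_ocr_command(page, out_dir / Path(page).stem, lang, make_pdf=True)
        for page in pages
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(cmd, check=True) to actually execute the step.
    for cmd in plan_pipeline(["scan-001.png", "scan-002.png"], "out"):
        print(shlex.join(cmd))
```

The {{ic|lang}} value must match an installed language data package, such as {{Pkg|tesseract-data-eng}} for {{ic|eng}}.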
− | | |
− | == OCR software ==
| |
− | === OCR (Optical Character Recognition) Engines ===
| |
− | * {{App|[[CuneiForm]]|A command-line OCR system originally developed and open sourced by Cognitive Technologies. Supported languages: eng, ger, fra, rus, swe, spa, ita, ruseng, ukr, srp, hrv, pol, dan, por, dut, cze, rum, hun, bul, slo, lav, lit, est, tur.|https://launchpad.net/cuneiform-linux|{{Pkg|cuneiform}}}}
| |
− | * {{App|[[GOCR]]/JOCR|An OCR engine which also supports barcode recognition.|http://jocr.sourceforge.net/|{{Pkg|gocr}}}}
| |
− | * {{App|[[Ocrad]]|An OCR program based on a feature extraction method.|http://www.gnu.org/software/ocrad/|{{Pkg|ocrad}}}}
| |
− | * {{App|[[Tesseract]]|"Probably one of the most accurate open source OCR engines available". The package is split: you need to install a data package for each language you want to recognize ({{Pkg|tesseract-data-eng}}, for example).|http://code.google.com/p/tesseract-ocr/|{{Pkg|tesseract}}}}
| |
− | | |
− | === Layout analysers and user interfaces ===
| |
− | * {{App|[[OCRFeeder]]|Python GUI for GNOME which performs document analysis and rendition, and can use [[CuneiForm]], [[GOCR]], [[Ocrad]] or [[Tesseract]] as its OCR engine. It can import from PDF or image files, and export to HTML or OpenDocument.|http://live.gnome.org/OCRFeeder|{{Pkg|ocrfeeder}}}}
| |
− | * {{App|[[YAGF]]|Graphical interface for the [[CuneiForm]] text recognition program on Linux. Available from the community repository.|http://symmetrica.net/cuneiform-linux/yagf-en.html|{{Pkg|yagf}}}}
| |
− | * {{App|[[gscan2pdf]]|Scans pages, runs Tesseract and creates a PDF, all in one go.|http://gscan2pdf.sourceforge.net/|{{AUR|gscan2pdf}}}}
| |
− | * {{App|[[OCRopus]]|An OCR ''platform'' with modules for document layout analysis, OCR engines (it can use Tesseract or its own engine), natural language modelling, etc. Available from the [[AUR]].|http://code.google.com/p/ocropus/|{{AUR|ocropus}}}}
| |