Difference between revisions of "Optical Character Recognition"

From ArchWiki
m (add to Software category)
(merged to List of applications/Documents#OCR software, redirect there)
(16 intermediate revisions by 9 users not shown)
[[Category:Software (English)]]
#REDIRECT [[List of applications/Documents#OCR software]]
The complete OCR process consists of several steps; the OCR engine itself handles only one of them:
# scanning
# document layout analysis
# optical character recognition
# post-processing (formatting, PDF creation)
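As a rough illustration of how these steps chain together, the following minimal sketch builds the command lines for a single page. It assumes SANE's {{ic|scanimage}} for scanning and Tesseract for the remaining steps; the function names, the file names and the 300 dpi resolution are hypothetical. It also assumes that the scanner backend accepts {{ic|--resolution}} and that the installed Tesseract ships the {{ic|pdf}} output config. The commands are only printed, not executed.

```python
import shlex


def scan_command(resolution_dpi=300):
    """Step 1 (scanning): scanimage writes the scanned image to stdout."""
    # --resolution is a backend-specific option; assumed to be supported here.
    return ["scanimage", "--format=tiff", f"--resolution={resolution_dpi}"]


def ocr_command(image_path, output_base, language="eng", make_pdf=False):
    """Steps 2-4: Tesseract segments the page, recognises text, writes output."""
    cmd = ["tesseract", image_path, output_base, "-l", language]
    if make_pdf:
        # "pdf" selects Tesseract's PDF output config (post-processing step);
        # without it, plain text is written to <output_base>.txt.
        cmd.append("pdf")
    return cmd


def plan_pipeline(page_name, language="eng"):
    """Return the commands for one page, in the order they would run."""
    image_path = f"{page_name}.tiff"
    return [
        scan_command(),  # the caller redirects stdout to image_path
        ocr_command(image_path, page_name, language, make_pdf=True),
    ]


if __name__ == "__main__":
    # Print the planned commands instead of running them.
    for command in plan_pipeline("page1"):
        print(shlex.join(command))
```

Keeping each step as a separate command makes it possible to swap the engine (for example, one of the engines listed below) without touching the scanning step.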
== OCR software ==
=== OCR engines ===
* {{App|[[CuneiForm]]|A command-line OCR system originally developed and open-sourced by Cognitive Technologies. Supported languages: eng, ger, fra, rus, swe, spa, ita, ruseng, ukr, srp, hrv, pol, dan, por, dut, cze, rum, hun, bul, slo, lav, lit, est, tur.|https://launchpad.net/cuneiform-linux|{{Pkg|cuneiform}}}}
* {{App|[[GOCR]]/JOCR|An OCR engine which also supports barcode recognition.|http://jocr.sourceforge.net/|{{Pkg|gocr}}}}
* {{App|[[Ocrad]]|An OCR program based on a feature extraction method.|http://www.gnu.org/software/ocrad/|{{Pkg|ocrad}}}}
* {{App|[[Tesseract]]|"Probably one of the most accurate open source OCR engines available".|http://code.google.com/p/tesseract-ocr/|{{Pkg|tesseract}}}}
=== Layout analysers and user interfaces ===
* {{App|[[YAGF]]|Graphical interface for the [[CuneiForm]] text recognition program on Linux. Available from the community repository.|http://symmetrica.net/cuneiform-linux/yagf-en.html|{{Pkg|yagf}}}}
* {{App|[[gscan2pdf]]|Scans pages, runs Tesseract on them and creates a PDF, all in one go.|http://gscan2pdf.sourceforge.net/|{{AUR|gscan2pdf}}}}
* {{App|[[Kooka]]|Scanner GUI for KDE that supports the OCR engines [[GOCR]], [[Ocrad]] and [[KADMOS]]. It used to be part of kdegraphics4, but was dropped due to lack of development.|http://kooka.kde.org/{{Linkrot|2011|09|03}}|{{AUR|kooka}}}}
* {{App|[[OCRFeeder]]|Python GUI for GNOME that performs document analysis and rendition, and can use [[CuneiForm]], [[GOCR]], [[Ocrad]] or [[Tesseract]] as its OCR engine. It can import PDF or image files and export to HTML or OpenDocument. Available from the [[AUR]].|http://live.gnome.org/OCRFeeder|{{AUR|ocrfeeder}}}}
* {{App|[[OCRopus]]|OCR ''platform'' with modules for document layout analysis, OCR engines (it can use Tesseract or its own engine), natural language modelling, etc. Available from the [[AUR]].|http://code.google.com/p/ocropus/|{{AUR|ocropus}}}}

Latest revision as of 12:46, 15 April 2014