How to scan and OCR like a pro with open source tools

Posted by Scott_Ruecker on Jun 25, 2008 3:28 AM EDT
Linux.com; By Mathis Dirksen-Thedens

With optical character recognition (OCR), you can scan the contents of a document into a single file of editable text. This article, which focuses on scanning books, describes the steps you need to take to prepare pages for optimal OCR results, and compares various free OCR tools to determine which is best at extracting the text.

First, fire up your distribution's package manager to fetch a few packages and their dependencies. In Debian, the required packages are sane, sane-utils, imagemagick, unpaper, tesseract-ocr, and tesseract-ocr-eng. You can also install other language packs for Tesseract; for example, I installed tesseract-ocr-deu for German text.
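
As a rough illustration of the installation step, the following Python sketch assembles an install command from the package list above. It assumes apt-get run through sudo, and it assumes that additional Tesseract language packs follow the tesseract-ocr-<code> naming seen in the two examples; the function name build_install_command is made up for this sketch. It only builds and prints the command, it does not run it.

```python
import shlex

# Package names as listed in the article for Debian.
BASE_PACKAGES = [
    "sane",
    "sane-utils",
    "imagemagick",
    "unpaper",
    "tesseract-ocr",
    "tesseract-ocr-eng",
]


def build_install_command(extra_languages=()):
    """Return an apt-get argument list for the base packages plus language packs.

    extra_languages holds Tesseract language codes such as "deu".
    Assumption: each pack is named tesseract-ocr-<code>.
    """
    packages = list(BASE_PACKAGES)
    for code in extra_languages:
        name = f"tesseract-ocr-{code}"
        if name not in packages:  # skip packs that are already listed
            packages.append(name)
    # Assumption: a Debian-style system where apt-get is run through sudo.
    return ["sudo", "apt-get", "install", "--yes"] + packages


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(shlex.join(build_install_command(["deu"])))
```

Printing the command first lets you check the package names against what your distribution actually ships before installing anything.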

