Linux News
The world is talking about GNU/Linux and Free/Open Source Software

How to scan and OCR like a pro with open source tools

Posted by Scott_Ruecker on Jun 25, 2008 3:28 AM EDT
Linux.com; By Mathis Dirksen-Thedens

With optical character recognition (OCR), you can convert a scanned document into a single file of editable text. This article, which focuses on scanning books, describes the steps you need to take to prepare pages for optimal OCR results, and compares various free OCR tools to determine which is best at extracting the text. First, fire up your distribution's package manager to fetch a few packages and their dependencies. In Debian, the required packages are sane, sane-utils, imagemagick, unpaper, tesseract-ocr, and tesseract-ocr-eng. You may also install other language packs for Tesseract; for example, I installed tesseract-ocr-deu for German text.
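As a rough illustration of the install step, the sketch below assembles an apt-get command line from the package names listed above. The helper name build_install_command and its extra_languages parameter are hypothetical, introduced only for this example, and the package names are the ones given in the article, so they may differ on current Debian releases. The script only prints the command; it does not run it.

```python
import shlex

# Packages named in the article for a Debian system.
BASE_PACKAGES = [
    "sane",
    "sane-utils",
    "imagemagick",
    "unpaper",
    "tesseract-ocr",
    "tesseract-ocr-eng",
]


def build_install_command(extra_languages=()):
    """Return an apt-get argv list for the article's packages.

    extra_languages holds Tesseract language codes such as "deu";
    each one is mapped to a tesseract-ocr-<code> package name
    (hypothetical helper, not part of any tool mentioned above).
    """
    packages = list(BASE_PACKAGES)
    for code in extra_languages:
        name = f"tesseract-ocr-{code}"
        if name not in packages:  # skip packs that are already listed
            packages.append(name)
    return ["apt-get", "install", "--yes", *packages]


if __name__ == "__main__":
    # Print the command instead of running it; installing needs root.
    print(shlex.join(build_install_command(["deu"])))
```

Returning the command as a list of arguments rather than a single string keeps it safe to hand to subprocess.run later without shell quoting problems, should you decide to execute it as root.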

Groups: Debian; Story Type: News Story
