Tesseract is widely regarded as one of the most accurate open-source OCR engines, and Google has supported its development since 2006. Its capabilities can be more limited than those of commercial software such as Adobe Acrobat Pro and ABBYY FineReader. This article explains how to install and use Tesseract OCR on Debian 11. If you wish to purchase a Linux VPS server, you can browse the packages available at Eldernode.
How to Install and Use Tesseract OCR on Debian Linux
Introduction to Tesseract OCR
Tesseract is a free, open-source optical character recognition (OCR) engine that runs from the command line. Google has sponsored it since 2006.
How Tesseract analyzes documents
- The user gives Tesseract the name of the input image, a name for the output document, and the desired output format.
- Tesseract analyzes the image and creates a new, searchable document in that format.
- You cannot scan directly into Tesseract; it works on image files that already exist.
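The three inputs described above map directly onto the command line. The sketch below is illustrative only: scan.png and result are placeholder names, and the output_name and ocr_to helpers are hypothetical wrappers, not part of Tesseract.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Tesseract).
# format is one of Tesseract's output configs, e.g. txt (default), pdf, hocr, tsv.

# Name of the file Tesseract creates for a given output base and format.
output_name() {
    printf '%s.%s\n' "$1" "${2:-txt}"
}

# ocr_to <image> <output_base> [format]
ocr_to() {
    image=$1
    base=$2
    format=${3:-txt}
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract is not installed" >&2
        return 1
    fi
    # Tesseract appends the extension itself, so pass only the base name.
    tesseract "$image" "$base" "$format" && output_name "$base" "$format"
}

# Example (assumes scan.png exists): creates result.pdf, a searchable PDF.
# ocr_to scan.png result pdf
```

Passing pdf as the last argument asks Tesseract for a searchable PDF; leaving it out gives a plain text file.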
Install Tesseract OCR on Debian 11 | Debian 10
First, refresh the package index with the following command:
sudo apt update
Then install Tesseract on Debian 11 by executing the following command:
sudo apt install tesseract-ocr
The Tesseract language data files are installed under /usr/share/tesseract-ocr/4.00/tessdata.
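To confirm that the package installed correctly, you can ask Tesseract for its version and the languages it found. This is a minimal sketch; check_tesseract is a hypothetical helper name.

```shell
#!/bin/sh
# Check that the tesseract binary is on PATH and show what was installed.
check_tesseract() {
    if command -v tesseract >/dev/null 2>&1; then
        tesseract --version        # engine and library versions
        tesseract --list-langs     # languages found in the tessdata directory
    else
        echo "tesseract not found; install it with: sudo apt install tesseract-ocr"
        return 1
    fi
}

check_tesseract || true
```

If a language you need is missing from the list, Debian ships additional languages as separate packages named tesseract-ocr-<code>, for example tesseract-ocr-deu for German.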
The convert command is useful for converting between image formats and for resizing, blurring, cropping, despeckling, dithering, drawing on, flipping, joining, and re-sampling images, among other operations. It is provided by ImageMagick, which you can install with the following command:
sudo apt install imagemagick
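As an example of how convert can prepare an image before OCR, the sketch below converts a picture to grayscale and doubles its size. The preprocess helper and the file names are hypothetical, and whether these particular settings improve recognition depends on your images.

```shell
#!/bin/sh
# Hypothetical helper: preprocess <input_image> <output_image>
# Converts to grayscale and scales to 200%, which may help with small text.
preprocess() {
    if ! command -v convert >/dev/null 2>&1; then
        echo "convert (ImageMagick) is not installed" >&2
        return 1
    fi
    convert "$1" -colorspace Gray -resize 200% "$2"
}

# Example (assumes photo.jpg exists):
# preprocess photo.jpg photo-clean.png
```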
Now test Tesseract. Find an image that contains text and run the following command:
tesseract <image_name> <output_file_name>
Tesseract extracts the text from the image and saves it to <output_file_name>.txt. All you need in order to work with Tesseract is an image of the document you want to convert. It works best on printed or typed text; to recognize handwriting, you have to train it first.
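When you have more than one image, the same command can be run in a loop. The following is a sketch under stated assumptions: ocr_dir is a hypothetical helper, the images are PNG files, and the text is English (the eng language data).

```shell
#!/bin/sh
# Hypothetical helper: ocr_dir <directory>
# Runs Tesseract on every PNG in the directory and writes <name>.txt next to each image.
ocr_dir() {
    dir=${1:-.}
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue      # the pattern matched no files
        base=${img%.png}               # strip the extension; Tesseract adds .txt
        tesseract "$img" "$base" -l eng && echo "wrote $base.txt"
    done
}

# Example: process all PNG files in the scans directory.
# ocr_dir scans

# Print the recognized text of one image to the terminal instead of a file:
# tesseract sample.png stdout
```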
Installing Tesseract from Source
On other Linux distributions, you can build Tesseract from source. First, clone the repository with the following command:
git clone https://github.com/tesseract-ocr/tesseract.git
Now you can go into the tesseract directory by running cd:
cd tesseract
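Before building, it can help to check that the usual build tools are present. The tool list in this sketch is an assumption about what an autotools-based C++ build typically needs, and missing_build_tools is a hypothetical helper.

```shell
#!/bin/sh
# List build tools that are not yet on PATH (the tool list is an assumption).
missing_build_tools() {
    for tool in git make g++ autoconf automake libtoolize pkg-config; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_build_tools)
if [ -n "$missing" ]; then
    echo "Missing build tools:"
    echo "$missing"
else
    echo "All build tools found."
fi
```

On Debian, these tools normally come from the automake, g++, git, libtool, make, and pkg-config packages. Tesseract also depends on the Leptonica image library, which Debian packages as libleptonica-dev.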
At this point, run the autogen.sh script, which generates the configure script. Root privileges are not needed for this step or for compiling:
./autogen.sh
Next, run configure to check your system and create the Makefiles:
./configure
Then compile Tesseract:
make
Next, install the compiled files:
sudo make install
Then run the ldconfig command to refresh the shared library cache:
sudo ldconfig
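After the install step, you may want to confirm that both the binary and the shared library can be found. This is a minimal sketch; verify_build is a hypothetical helper, and it assumes the default install prefix, which for this kind of build is normally /usr/local.

```shell
#!/bin/sh
# Report whether the tesseract binary and its shared library can be found.
verify_build() {
    if command -v tesseract >/dev/null 2>&1; then
        echo "binary: $(command -v tesseract)"
        tesseract --version
    else
        echo "binary: not found on PATH"
    fi
    # ldconfig -p lists the libraries in the linker cache.
    # /sbin and /usr/sbin are added because they are often not on a regular user's PATH.
    if (PATH="$PATH:/sbin:/usr/sbin"; ldconfig -p) 2>/dev/null | grep -q libtesseract; then
        echo "library: libtesseract is in the linker cache"
    else
        echo "library: libtesseract is not in the linker cache (try: sudo ldconfig)"
    fi
}

verify_build
```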
Now compile the training tools. To do this, run the following command:
make training
Finally, install them with the following command:
sudo make training-install
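To see whether the training tools were installed, you can look for a few of them on your PATH. The sketch below checks four of Tesseract's training programs; check_training_tools is a hypothetical helper, and the list is only a sample, not the complete set.

```shell
#!/bin/sh
# Check which of a few Tesseract training tools are available on PATH.
check_training_tools() {
    for tool in text2image lstmtraining combine_tessdata unicharset_extractor; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

check_training_tools
```

If some tools are missing after make training, a likely cause (an assumption, not something the steps above verify) is that the ICU, Pango, or Cairo development libraries were not installed when configure ran.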
Conclusion
This article taught you how to install and use Tesseract on Debian 11. We hope it was useful for you.