Turn scanned documents into high-quality PDF files
Scantools is a high-quality library and a matching set of command-line programs for handling and manipulating scanned documents. At present, the tools can convert image files to PDF/A. Files in JBIG2, JPEG, and JPEG2000 format are included in the PDF directly; all other files are compressed losslessly. HOCR files, which are produced by optical character recognition programs such as ‘tesseract’, can be used to make the PDF file searchable. The resulting files comply with the ISO PDF/A standard for long-term archiving of digital documents and offer compression rates comparable to those of the DJVU file format. There are currently three command-line utilities:
scantools.image2pdf converts images to a PDF/A-compliant PDF file.
scantools.hocr2any converts HOCR files to text or renders them as raster graphics or PDF files.
scantools.ocrPDF adds a text layer to a graphics-only PDF file, without re-encoding graphics data or otherwise modifying file content.
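To make the HOCR-to-text conversion more concrete, the following is a minimal sketch of the idea in Python. It is not the scantools implementation and does not use the scantools library. It only assumes that the input follows the usual hOCR layout, in which each text line is a span of class ocr_line and each word is a span of class ocrx_word. The names HocrTextExtractor, hocr_to_text, and SAMPLE are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class HocrTextExtractor(HTMLParser):
    """Collect the words of an hOCR document, one list of words per text line.

    Illustrative only: assumes the common hOCR classes 'ocr_line' and
    'ocrx_word'; this is not how scantools itself is implemented.
    """

    def __init__(self):
        super().__init__()
        self.lines = []    # finished lines, each a list of words
        self._stack = []   # kind of every <span> that is currently open

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "ocr_line" in classes:
            kind = "line"
            self.lines.append([])  # start a new output line
        elif "ocrx_word" in classes:
            kind = "word"
        else:
            kind = "other"
        self._stack.append(kind)

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Keep text only while the innermost open span is a word.
        if self._stack and self._stack[-1] == "word" and data.strip():
            if not self.lines:
                self.lines.append([])  # word that appears outside any line span
            self.lines[-1].append(data.strip())


def hocr_to_text(hocr: str) -> str:
    """Return the plain text of an hOCR string, one text line per row."""
    parser = HocrTextExtractor()
    parser.feed(hocr)
    parser.close()
    return "\n".join(" ".join(words) for words in parser.lines if words)


# Hand-written hOCR fragment used as sample input (not real OCR output).
SAMPLE = """<div class='ocr_page' title='bbox 0 0 600 800'>
<span class='ocr_line' title='bbox 10 10 300 40'>
<span class='ocrx_word' title='bbox 10 10 100 40'>Hello</span>
<span class='ocrx_word' title='bbox 110 10 300 40'>world</span>
</span>
<span class='ocr_line' title='bbox 10 50 300 80'>
<span class='ocrx_word' title='bbox 10 50 120 80'>Second</span>
<span class='ocrx_word' title='bbox 130 50 300 80'>line</span>
</span>
</div>"""

if __name__ == "__main__":
    print(hocr_to_text(SAMPLE))
```

The sketch ignores the bounding boxes stored in each title attribute. Those coordinates are what a tool would need to place the recognized words behind the page image when building a searchable PDF, so a fuller example would have to parse them as well.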
The packages for RHEL 8 and RHEL 7 are available in the Extra Packages for Enterprise Linux (EPEL) repository of the respective distribution. The instructions for adding this repository differ slightly between RHEL 8 and RHEL 7, which is why they’re listed separately below.
The EPEL repository can be added to RHEL 8 with the following command: