Simple Text Extractor is a fast, fully offline desktop OCR application designed for secure and effortless text extraction.
The application comes bundled with its own Tesseract OCR engine, so it works out of the box on Linux without any additional dependencies to install. Version 1.2 introduces a reworked core designed to process very large documents (2000+ pages) reliably.
What's New in v1.2: "Secure & High-Performance Edition"
Memory-Optimized Engine: New ThreadPoolExecutor logic specifically tuned for Linux Snap confinement. It avoids out-of-memory (OOM) conditions by using smart pointer references, so files of more than 2300 pages can be processed without slowing down your OS.
Multi-Core Acceleration: Intelligent distribution of OCR tasks across your CPU cores (up to 8 simultaneous pages) for industrial-level productivity.
Enhanced Security (CWE-377 & CWE-209)
Atomic File Handling: Secure temporary file creation with restricted permissions (0600).
Log Anonymization: Automatic masking of personal paths and sensitive data in system logs.
Hardened Robustness: Native protection against "Decompression Bombs" (malicious high-pixel images) and command injection.
Expanded Language Pack: Now includes 17+ native languages (French, English, German, Dutch, Italian, Spanish, Portuguese, Chinese, Arabic, Japanese, Russian, Turkish, Vietnamese, Norwegian, Swedish, Danish, and Greek).
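The security items above (restricted temporary files, log anonymization, and decompression-bomb protection) can be pictured with a minimal Python sketch. This is not the application's actual code: the function names, the pixel limit, and the masking pattern are all assumptions chosen for illustration. It relies only on the standard library, where tempfile.mkstemp creates the file atomically and, on POSIX systems, with owner-only permissions (0600).

```python
import os
import re
import stat
import tempfile

# Hypothetical pixel budget; the application's real limit is not documented.
MAX_PIXELS = 100_000_000

# Matches a Linux home directory prefix such as /home/alice.
HOME_DIR = re.compile(r"/home/[^/\s]+")


def create_private_temp(suffix=".tmp"):
    """Create a temp file atomically; on POSIX, mkstemp uses mode 0600."""
    fd, path = tempfile.mkstemp(prefix="ocr_", suffix=suffix)
    return fd, path


def mask_paths(message):
    """Replace the user name in home-directory paths before logging."""
    return HOME_DIR.sub("/home/<user>", message)


def check_pixel_budget(width, height, limit=MAX_PIXELS):
    """Reject images whose pixel count exceeds the budget."""
    if width * height > limit:
        raise ValueError("image exceeds the configured pixel budget")


if __name__ == "__main__":
    fd, path = create_private_temp(".png")
    try:
        # Show the permission bits of the freshly created file.
        print(oct(stat.S_IMODE(os.stat(path).st_mode)))
    finally:
        os.close(fd)
        os.remove(path)
    print(mask_paths("Cannot open /home/alice/scans/tax.pdf"))
```

Checking the pixel count before decoding an image is the usual way to stop a small file from expanding into gigabytes of memory; the limit used here is only a placeholder.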
Key Features:
100% Offline & Private: Your documents never leave your computer. Ideal for legal, medical, or fiscal archives.
Professional Archiving: Generate PDF/A files for long-term preservation, keeping the original appearance of each page intact while adding a searchable text layer.
Batch Processing: Process multiple PDFs or images in one click.
Zero-Freeze UI: Completely asynchronous architecture ensuring the interface remains responsive even during heavy processing.
Metadata Analysis: Instant view of DPI, page count, and PDF/A status before processing.
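As a rough illustration of how batch processing can stay within a fixed memory and CPU budget, here is a minimal Python sketch built on ThreadPoolExecutor. It is an assumption about the general technique, not the application's real implementation: ocr_page is a hypothetical placeholder for the actual OCR call, and the cap of 8 workers simply mirrors the "up to 8 simultaneous pages" figure. Only a bounded number of pages is in flight at any time, so a 2000-page document never has all of its pages queued in memory at once.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

MAX_WORKERS = 8  # mirrors the "up to 8 simultaneous pages" figure


def ocr_page(page_number):
    """Hypothetical stand-in for the real OCR call on a single page."""
    return f"text of page {page_number}"


def process_pages(page_numbers, max_workers=MAX_WORKERS):
    """Run OCR on pages with at most max_workers pages in flight."""
    results = {}
    future_to_page = {}
    pending = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for page in page_numbers:
            # Wait for a free slot before submitting more work.
            if len(pending) >= max_workers:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for future in done:
                    results[future_to_page.pop(future)] = future.result()
            future = pool.submit(ocr_page, page)
            future_to_page[future] = page
            pending.add(future)
        # Collect whatever is still running once everything is submitted.
        for future in pending:
            results[future_to_page.pop(future)] = future.result()
    # Return the text in page order, whatever order the pages finished in.
    return [results[page] for page in sorted(results)]


if __name__ == "__main__":
    texts = process_pages(range(1, 21))
    print(len(texts), "pages processed")
```

In a desktop application, a function like process_pages would typically be started from a background thread so that the interface keeps handling events while pages are processed.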
How to Use:
Add Files: Drag and drop your PDFs or images (PNG, JPG, TIFF, BMP) into the window.
Select the OCR Language: Choose the language corresponding to your document's text.
Configure: Select your destination folder or enable PDF/A archiving if needed.
Start OCR: Click "Start OCR" and follow the progress via the real-time loading bar.
Access: Click the direct links to open your processed files (automatically suffixed with _ocr).
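The _ocr naming from the last step can be sketched as follows. This is only an illustration of the convention: the helper name ocr_output_path is hypothetical, and keeping the source file's extension is an assumption, since the actual output format depends on the options chosen.

```python
from pathlib import Path


def ocr_output_path(source, dest_dir=None):
    """Build an output path by appending _ocr to the source file name.

    Assumes the output keeps the source extension; the real application
    may use a different one depending on the selected options.
    """
    source = Path(source)
    target_dir = Path(dest_dir) if dest_dir is not None else source.parent
    return target_dir / f"{source.stem}_ocr{source.suffix}"


if __name__ == "__main__":
    print(ocr_output_path("/data/scans/invoice.pdf"))
    print(ocr_output_path("/data/scans/invoice.pdf", "/data/archive"))
```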
💡 Tip: Using Wayland? If Drag & Drop doesn't work, please use the "Add Files" button or switch to an X11 session.
⚠️ IMPORTANT: Accessing External Drives (USB / Secondary Drives)
By default, the strict security confinement prevents the app from reading your USB sticks or external hard drives. If your files are stored on removable media, you must grant access by running this command once in your terminal:
COMMAND: sudo snap connect simple-text-extractor:removable-media
Snaps are applications packaged with all their dependencies to run on all popular Linux distributions from a single build. They update automatically and roll back gracefully.
Snaps are discoverable and installable from the Snap Store, an app store with an audience of millions.
Snap is available for Red Hat Enterprise Linux (RHEL) 9, RHEL 8, and RHEL 7, from the 7.6 release onward.
The packages for RHEL 7, RHEL 8, and RHEL 9 are in each distribution’s respective Extra Packages for Enterprise Linux (EPEL) repository. The instructions for adding this repository diverge slightly between RHEL 7, RHEL 8 and RHEL 9, which is why they’re listed separately below.
The EPEL repository can be added to RHEL 9 with the following commands:
sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
sudo dnf upgrade
The EPEL repository can be added to RHEL 8 with the following commands:
sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
sudo dnf upgrade
The EPEL repository can be added to RHEL 7 with the following command:
sudo rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
Adding the optional and extras repositories is also recommended:
sudo subscription-manager repos --enable "rhel-*-optional-rpms" --enable "rhel-*-extras-rpms"
sudo yum update
Snap can now be installed as follows:
sudo yum install snapd
Once installed, the systemd unit that manages the main snap communication socket needs to be enabled:
sudo systemctl enable --now snapd.socket
To enable classic snap support, enter the following to create a symbolic link between /var/lib/snapd/snap and /snap:
sudo ln -s /var/lib/snapd/snap /snap
Either log out and back in again or restart your system to ensure snap’s paths are updated correctly.
To install Simple Text Extractor, use the following command:
sudo snap install simple-text-extractor