Open source optical character recognition (OCR) engine
Tesseract has Unicode (UTF-8) support and can recognize more than 100
languages "out of the box". It can be trained to recognize other languages.
Tesseract supports various output formats: plain text, hOCR (HTML), and PDF.
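As a minimal sketch of how the output formats might be requested from the command line: the file names scan.png and scan are hypothetical placeholders, and the example assumes the English language data (eng) is available and that the snap exposes the command as tesseract.

```shell
# Hypothetical file names; replace with your own image and output base name.
input="scan.png"
outbase="scan"

# Ask for plain text, hOCR and PDF in one run by listing the
# corresponding config names after the output base.
if command -v tesseract >/dev/null 2>&1 && [ -f "$input" ]; then
    tesseract "$input" "$outbase" -l eng txt hocr pdf
else
    echo "tesseract or $input not found; skipping"
fi
```

If the run succeeds, the results should appear next to the output base as scan.txt, scan.hocr and scan.pdf. Running tesseract --list-langs shows which languages are installed.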
If you want to access the files under /media/* or /run/media/*, you'll have
to connect the snap to the core snap's removable-media interface:
$ sudo snap connect tesseract:removable-media