Tika Server for metadata discovery and extraction
Apache Tika is a content detection and analysis framework written in Java and stewarded by the Apache Software Foundation. It detects and extracts metadata and text from over a thousand different file types. As well as a Java library, it provides server and command-line editions suitable for use from other programming languages.
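To illustrate what "use from other programming languages" can look like, here is a minimal Python sketch that talks to a Tika Server over HTTP. It assumes a server is already running at http://localhost:9998 (the usual default, but your deployment may differ) and uses the server's /meta and /tika endpoints with PUT requests. The helper names build_request, extract_metadata and extract_text are hypothetical and exist only for this example.

```python
import json
import urllib.request

# Assumption: a Tika Server instance is listening on its usual default port.
TIKA_URL = "http://localhost:9998"


def build_request(endpoint, data, accept):
    """Build (but do not send) a PUT request carrying the raw file bytes."""
    return urllib.request.Request(
        f"{TIKA_URL}/{endpoint}",
        data=data,
        method="PUT",
        headers={"Accept": accept},
    )


def extract_metadata(data):
    """Send bytes to the /meta endpoint and return the metadata as a dict."""
    request = build_request("meta", data, "application/json")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_text(data):
    """Send bytes to the /tika endpoint and return the extracted plain text."""
    request = build_request("tika", data, "text/plain")
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a server running, reading a file in binary mode and passing its bytes to extract_metadata should return a dictionary of metadata fields; which fields appear depends on the file type and the Tika version.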