🪶 Apache Tika
Apache Tika™ is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
🏷️ Category: Productivity
🐳 Image: docker.io/apache/tika:latest
🔗 Links
| 📦 Project | github.com/apache/tika |
| 🐛 Support | Apache Jira |
| 💛 Donate | donate.apache.org |
🌐 Ports
| Port | Protocol | Description |
|---|---|---|
| 9998 | TCP | Tika Web Interface & API |
💾 Volumes
No volumes required for this container.
⚙️ Environment Variables
No environment variables required for this container.
🚀 Quick Start
- Open the MOS Hub
- Search for Apache Tika
- Click Install
- Access the Tika API at http://your-server-ip:9998
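To confirm that the server is reachable after installation, a quick request to Tika Server's `/version` endpoint can help. The following is a minimal sketch, assuming the default port 9998 from the Ports table; the variable `TIKA_URL` and the function `tika_version` are hypothetical names introduced here for illustration.

```shell
# Base URL of the Tika server (assumption: default port 9998; replace your-server-ip).
TIKA_URL="${TIKA_URL:-http://your-server-ip:9998}"

# Print the version string reported by the server.
# --fail makes curl return a non-zero exit code on HTTP errors,
# --max-time stops the request after 10 seconds if the host does not answer.
tika_version() {
  curl --silent --show-error --fail --max-time 10 "${TIKA_URL%/}/version"
}
```

Calling `tika_version` after sourcing the snippet should print the server's version string. A connection error usually means the container is not running or the port is not reachable from your machine.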
📡 API Usage
Once the container is running, you can extract text from documents and detect file types via the REST API:
# Extract plain text from a file
curl -X PUT --data-binary @document.pdf \
-H "Content-Type: application/pdf" \
-H "Accept: text/plain" \
http://your-server-ip:9998/tika
# Detect file type
curl -X PUT --data-binary @document.pdf \
http://your-server-ip:9998/detect/stream
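For repeated use, the two calls above can be wrapped in small shell functions. This is a sketch under stated assumptions rather than an official client: `TIKA_URL`, `tika_endpoint`, `tika_extract_text`, and `tika_detect_type` are hypothetical names, and the helpers use curl's `--upload-file` option (which also issues a PUT) so that no fixed `Content-Type` has to be supplied for each file.

```shell
# Base URL of the Tika server (assumption: default port 9998; replace your-server-ip).
TIKA_URL="${TIKA_URL:-http://your-server-ip:9998}"

# Build the full URL for an endpoint path such as "tika" or "detect/stream".
# A trailing slash on TIKA_URL and a leading slash on the path are tolerated.
tika_endpoint() {
  printf '%s/%s\n' "${TIKA_URL%/}" "${1#/}"
}

# Extract plain text from the file given as $1.
# The Accept header asks the server for text/plain output.
tika_extract_text() {
  curl --silent --show-error --fail --upload-file "$1" \
    -H "Accept: text/plain" \
    "$(tika_endpoint tika)"
}

# Detect the media type of the file given as $1 from its content.
tika_detect_type() {
  curl --silent --show-error --fail --upload-file "$1" \
    "$(tika_endpoint detect/stream)"
}
```

After sourcing the snippet, `tika_extract_text document.pdf > document.txt` would save the extracted text, and `tika_detect_type document.pdf` would print the detected media type. Setting `TIKA_URL` before sourcing points the helpers at a different host.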
💡 Tip: Apache Tika is often used together with Paperless-ngx or Nextcloud for automatic document processing.