🪶 Apache Tika

Apache Tika™ is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

🏷️ Category: Productivity

🐳 Image: docker.io/apache/tika:latest


   
📦 Project github.com/apache/tika
🐛 Support Apache Jira
💛 Donate donate.apache.org

🌐 Ports

Port Protocol Description
9998 TCP Tika Web Interface & API

💾 Volumes

No volumes required for this container.


⚙️ Environment Variables

No environment variables required for this container.


🚀 Quick Start

  1. Open the MOS Hub
  2. Search for Apache Tika
  3. Click Install
  4. Access the Tika API at http://your-server-ip:9998

📡 API Usage

Once running, you can extract text from documents via the REST API:

# Extract text from a file
curl -X PUT --data-binary @document.pdf \
  -H "Content-Type: application/pdf" \
  http://your-server-ip:9998/tika

# Detect file type
curl -X PUT --data-binary @document.pdf \
  http://your-server-ip:9998/detect/stream

💡 Tip: Apache Tika is often used together with Paperless-NGX or Nextcloud for automatic document processing.


MOS Templates – maintained by J000K3R

This site uses Just the Docs, a documentation theme for Jekyll.