
REST API Reference

Base URL: http://localhost:8000/api/v1

Interactive docs: http://localhost:8000/api/docs


Endpoints

Health Check

GET /api/v1/health

Response:

{
  "status": "healthy",
  "version": "0.1.0",
  "env": "development",
  "extractors": ["tika", "pymupdf"],
  "ocr_available": true
}
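A client will often want to confirm the service is up, and that it offers the extractor it plans to use, before uploading anything. The sketch below uses only the Python standard library and assumes the base URL above. `fetch_health` and `is_ready` are hypothetical helper names, not part of the service. It also assumes that "healthy" is the only status value meaning the service is usable, since that is the only value shown here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # from the Base URL above


def fetch_health(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET /health and decode its JSON body (needs a running server)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def is_ready(health: dict, engine: str, need_ocr: bool = False) -> bool:
    """True if the service is healthy, lists `engine`, and has OCR when required."""
    # Assumption: any status other than "healthy" means "do not send work".
    if health.get("status") != "healthy":
        return False
    if engine not in health.get("extractors", []):
        return False
    return bool(health.get("ocr_available", False)) or not need_ocr
```

For example, `is_ready(fetch_health(), "pymupdf", need_ocr=True)` returns `True` for the sample response above.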

Extract Text

POST /api/v1/extract
Content-Type: multipart/form-data

Parameters:

Name               Type    Default    Description
file               File    required   Document to extract
output_format      string  markdown   One of: markdown, json, plaintext
extraction_engine  string  auto       One of: auto, tika, pymupdf
ocr_engine         string  tesseract  One of: tesseract, none
language           string  (none)     OCR language hint
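The sketch below builds the multipart/form-data request by hand with the Python standard library, so no third-party HTTP client is needed. It assumes the base URL above. `build_extract_request` is a hypothetical helper, and sending the file part as `application/octet-stream` is an assumption (the service reports its own `mime_type` in the response, which suggests it detects the type itself). Fields passed as `None` or left out are omitted, so the server-side defaults in the table apply.

```python
import urllib.request
import uuid

BASE_URL = "http://localhost:8000/api/v1"  # from the Base URL above


def build_extract_request(filename: str, data: bytes, *, base_url: str = BASE_URL,
                          **params) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /extract.

    `params` may hold output_format, extraction_engine, ocr_engine and language.
    Fields that are omitted or None are left out, so server defaults apply.
    """
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to occur in the file
    parts = []
    for name, value in params.items():
        if value is None:
            continue
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode() + str(value).encode() + b"\r\n")
    # The document itself goes in the required "file" field.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return urllib.request.Request(
        f"{base_url}/extract",
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Against a running server, `urllib.request.urlopen(build_extract_request("document.pdf", pdf_bytes, output_format="json"))` would then send it.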

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "filename": "document.pdf",
  "mime_type": "application/pdf",
  "size_bytes": 102400,
  "output_format": "markdown",
  "content": "# Document Title\n\nExtracted text...",
  "processing_time_ms": 1234,
  "created_at": "2026-02-11T10:00:00Z",
  "extraction_engine": "pymupdf"
}
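To work with this response in typed form, a client can map it onto a dataclass. This is a minimal sketch; `ExtractionResult` is a hypothetical name, and the field types are inferred from the sample above rather than from a published schema. Unknown keys are dropped so that a newer server adding fields does not break the client, and the trailing `Z` in `created_at` is normalised so `datetime.fromisoformat` also accepts it on Python versions before 3.11.

```python
import json
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class ExtractionResult:
    id: str
    status: str
    filename: str
    mime_type: str
    size_bytes: int
    output_format: str
    content: str
    processing_time_ms: int
    created_at: datetime
    extraction_engine: str

    @classmethod
    def from_json(cls, text: str) -> "ExtractionResult":
        """Parse a /extract JSON body, ignoring keys this class does not know."""
        data = json.loads(text)
        # "2026-02-11T10:00:00Z" -> timezone-aware UTC datetime.
        data["created_at"] = datetime.fromisoformat(data["created_at"].replace("Z", "+00:00"))
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})
```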

Extract Raw

POST /api/v1/extract/raw
Content-Type: multipart/form-data

Accepts the same parameters as /extract, but returns the extracted content directly as the response body instead of wrapping it in a JSON object.

Response: text/markdown or text/plain body.
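Because this endpoint returns the content itself, a client mainly needs to decode the body and, if saving to disk, choose a file extension from the Content-Type header. A minimal sketch: `decode_raw` and the extension mapping are hypothetical, and falling back to UTF-8 when no `charset` parameter is sent is an assumption.

```python
# Hypothetical mapping from the two documented raw content types to file extensions.
EXTENSIONS = {"text/markdown": ".md", "text/plain": ".txt"}


def decode_raw(content_type: str, body: bytes) -> tuple[str, str]:
    """Return (extension, text) for an /extract/raw response body."""
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower()
    charset = "utf-8"  # assumption: used when the header carries no charset
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key.lower() == "charset" and value:
            charset = value.strip('"')
    if media_type not in EXTENSIONS:
        raise ValueError(f"unexpected content type: {content_type!r}")
    return EXTENSIONS[media_type], body.decode(charset)
```

With `urllib`, the inputs would be `resp.headers.get("Content-Type")` and `resp.read()` from the object returned by `urllib.request.urlopen`.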


Get Formats

GET /api/v1/formats

Response:

{
  "extraction_engines": ["tika", "pymupdf", "auto"],
  "ocr_engines": ["tesseract", "google_vision", "none"],
  "output_formats": ["markdown", "json", "plaintext"],
  "max_file_size_mb": 100
}
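A client can use this response to reject bad options or oversized files before uploading, instead of waiting for a 400 or 413. The sketch below is one way to do that; `validate_request` is a hypothetical helper whose defaults mirror the /extract parameter defaults. Since the unit behind `max_file_size_mb` is not specified here, it assumes 1 MB = 10^6 bytes. If the server actually means MiB, the check is slightly stricter than necessary, which is the safe direction.

```python
# Assumption: "MB" means 10**6 bytes; if the server uses MiB this is merely conservative.
BYTES_PER_MB = 1_000_000


def validate_request(formats: dict, size_bytes: int, output_format: str = "markdown",
                     extraction_engine: str = "auto", ocr_engine: str = "tesseract") -> list[str]:
    """Check planned /extract options against a /formats response; return problems found."""
    problems = []
    if output_format not in formats["output_formats"]:
        problems.append(f"unsupported output_format: {output_format!r}")
    if extraction_engine not in formats["extraction_engines"]:
        problems.append(f"unsupported extraction_engine: {extraction_engine!r}")
    if ocr_engine not in formats["ocr_engines"]:
        problems.append(f"unsupported ocr_engine: {ocr_engine!r}")
    if size_bytes == 0:
        problems.append("file is empty")  # the server answers 400 for empty files
    limit = formats["max_file_size_mb"] * BYTES_PER_MB
    if size_bytes > limit:
        problems.append(f"file is {size_bytes} bytes; limit is {limit} bytes")
    return problems
```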

Error Responses

All errors follow this format:

{
  "detail": "File exceeds maximum size of 100MB"
}

Status  Meaning
400     Bad request (missing file, empty file)
413     File too large
422     Processing failed
500     Internal server error
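Because every error shares the `{"detail": ...}` shape, a client can turn any failed response into a single exception type. A minimal sketch: `ExtractionAPIError` and `error_from_response` are hypothetical names, and the fallback for non-JSON bodies (for instance an error page from a reverse proxy) is an assumption about what a client might encounter, not documented server behaviour.

```python
import json

# Meanings copied from the status table above.
STATUS_MEANINGS = {
    400: "Bad request",
    413: "File too large",
    422: "Processing failed",
    500: "Internal server error",
}


class ExtractionAPIError(Exception):
    """Client-side exception carrying the HTTP status and the `detail` text."""

    def __init__(self, status: int, detail: str):
        self.status = status
        self.detail = detail
        meaning = STATUS_MEANINGS.get(status, "Unexpected error")
        super().__init__(f"{status} {meaning}: {detail}")


def error_from_response(status: int, body: bytes) -> ExtractionAPIError:
    """Build an ExtractionAPIError from an error body of the form {"detail": "..."}."""
    try:
        detail = json.loads(body).get("detail", "")
    except (ValueError, AttributeError):
        # Not the documented JSON object: keep the raw text for debugging.
        detail = body.decode("utf-8", errors="replace")
    return ExtractionAPIError(status, str(detail))
```

With `urllib`, catching `urllib.error.HTTPError as exc` and raising `error_from_response(exc.code, exc.read())` surfaces the server's message directly.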