Quick Start
Extract text from your first document in under 5 minutes.
1. Start the Stack
Wait for all services to be healthy:
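One way to script the wait is to poll the API until it answers. The sketch below is a stand-in, not the project's own health check: it assumes that an HTTP 200 from the documentation URL used in the next step (`http://localhost:8000/api/docs`) is a reasonable readiness signal, and the names `http_ok` and `wait_until_ready` are made up for illustration. It only tells you the API is reachable, not that every service in the stack is healthy.

```python
import time
import urllib.request
from typing import Callable

# Assumption: the Swagger UI URL doubles as a readiness probe.
DOCS_URL = "http://localhost:8000/api/docs"


def http_ok(url: str) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused and reset connections all derive from OSError.
        return False


def wait_until_ready(
    url: str = DOCS_URL,
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until `probe` succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_ready()` with the defaults blocks for up to two minutes and returns `False` if the API never responds. The `probe` and `sleep` parameters exist so the loop can be exercised without a running stack.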
2. Open the API Docs
Visit http://localhost:8000/api/docs for the interactive Swagger UI.
3. Extract Text from a PDF
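A minimal sketch of a PDF upload using only the Python standard library. The endpoint path `/api/extract` and the form field name `file` are assumptions made for illustration and are not taken from this page; check the Swagger UI for the real route and parameters before using it. The function only builds the request, so it can be inspected without a running server.

```python
import uuid
from urllib.request import Request

BASE_URL = "http://localhost:8000"
EXTRACT_PATH = "/api/extract"  # hypothetical path; confirm in the Swagger UI
FILE_FIELD = "file"            # hypothetical form field name


def build_upload_request(
    filename: str,
    content: bytes,
    url: str = BASE_URL + EXTRACT_PATH,
) -> Request:
    """Build a multipart/form-data POST carrying one PDF file."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{FILE_FIELD}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return Request(
        url,
        data=head + content + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To send it, pass the result to `urllib.request.urlopen` and read the response body. If the service returns JSON as the next heading suggests, `json.load` on the response object will parse it.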
4. Get Raw Text (No JSON Wrapper)
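The sketch below shows one plausible shape for a raw-text request and for decoding the reply. It assumes the format is chosen with a query parameter named `output_format` on a hypothetical `/api/extract` route; neither name is confirmed by this page, so treat both as placeholders and check the Swagger UI. The value `plaintext` is one of the output formats listed in step 5.

```python
from email.message import Message
from urllib.parse import urlencode


def raw_text_url(base_url: str, path: str, output_format: str = "plaintext") -> str:
    """Build the request URL; the `output_format` parameter name is an assumption."""
    return f"{base_url}{path}?{urlencode({'output_format': output_format})}"


def decode_raw_text(body: bytes, content_type: str = "text/plain; charset=utf-8") -> str:
    """Decode a raw text body using the charset from its Content-Type header."""
    header = Message()
    header["Content-Type"] = content_type
    # Fall back to UTF-8 when the header does not name a charset.
    charset = header.get_content_charset() or "utf-8"
    return body.decode(charset, errors="replace")
```

With no JSON wrapper there is nothing to parse: the response body is the document text, so the only care needed is to respect the charset the server declares.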
5. Explore Available Formats
Response:
{
  "extraction_engines": ["tika", "pymupdf", "auto"],
  "ocr_engines": ["tesseract", "google_vision", "none"],
  "output_formats": ["markdown", "json", "plaintext"],
  "max_file_size_mb": 100
}
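A response like this can be used to validate options on the client before uploading anything. The following is a sketch rather than part of the service: `check_options` is a hypothetical helper, and it assumes `max_file_size_mb` counts mebibytes (1024 × 1024 bytes), which the response does not specify.

```python
import json

# Response body copied from the listing above.
CAPABILITIES = json.loads("""
{
  "extraction_engines": ["tika", "pymupdf", "auto"],
  "ocr_engines": ["tesseract", "google_vision", "none"],
  "output_formats": ["markdown", "json", "plaintext"],
  "max_file_size_mb": 100
}
""")


def check_options(caps, extraction_engine, ocr_engine, output_format, file_size_bytes=0):
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    choices = (
        ("extraction_engines", extraction_engine),
        ("ocr_engines", ocr_engine),
        ("output_formats", output_format),
    )
    for key, value in choices:
        if value not in caps[key]:
            problems.append(f"{value!r} is not one of {caps[key]}")
    # Assumption: the limit is expressed in MiB.
    limit_bytes = caps["max_file_size_mb"] * 1024 * 1024
    if file_size_bytes > limit_bytes:
        problems.append(f"file exceeds {caps['max_file_size_mb']} MB limit")
    return problems
```

For example, `check_options(CAPABILITIES, "auto", "none", "plaintext")` returns an empty list, while an unsupported engine or format produces one message per offending option.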
Next Steps
- Configuration — customize extraction behavior
- Architecture — understand the pipeline
- API Reference — full endpoint documentation