---
license: mit
tags:
- gradio
- omni-api
- multimodal
- chat-interface
- pdf-processing
- image-processing
- audio-processing
- llm
- api-client
- chatbot
- text-generation
- document-analysis
- ocr
- transcription
widget:
- src: https://api.modelharbor.com
---

# Omni API Gradio UI

This is a Gradio-based user interface for the Omni API that supports multimodal interactions with various file types including text, PDF documents, images, and audio files.

## Model Description

The Omni API Gradio UI is a web interface for interacting with the Omni API. Users can send a text prompt together with PDF, image, or audio files and view the model's response in the browser.

### Supported Models

The interface supports the following models:
- typhoon-ocr-preview
- openai/gpt-5
- meta-llama/llama-4-maverick
- qwen/qwen3-vl-235b-a22b-instruct
- gemini/gemini-2.5-pro
- gemini/gemini-2.5-flash

## Features

- **Multimodal Support**: Process text, PDFs, images, and audio files in a single interface
- **File Ordering**: Upload multiple files in a specific order for precise control
- **Configurable Models**: Switch between different AI models for different tasks
- **Real-time Responses**: Get immediate feedback from the API
- **Customizable Parameters**: Adjust max tokens and other settings

## Intended Uses & Limitations

### Intended Uses
- Document analysis and summarization
- Image OCR and analysis
- Audio transcription and analysis
- Multimodal chat applications
- Content extraction from various file formats

### Limitations
- Requires access to the Omni API
- Dependent on network connectivity
- File size limitations based on API constraints
- Some models may require API keys

## How to Use

1. Configure the API base URL (defaults to https://api.modelharbor.com)
2. Select your preferred model from the dropdown
3. Enter your text message in the input box
4. Upload files (PDF, images, or audio) as needed
5. Click "Send Request" to interact with the API
6. View the response in the output panel
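
For readers who want to script the same flow, the steps above can be sketched in Python. This is only an illustration: the Omni API's request format is not documented here, so the function name `build_request`, every field name in the body (`message`, `files`, `order`, `content_base64`), and the `max_tokens` default of 1024 are assumptions, not the application's actual implementation. Only the default base URL comes from step 1.

```python
import base64
from pathlib import Path

# Default base URL from step 1 above.
DEFAULT_BASE_URL = "https://api.modelharbor.com"


def build_request(message, file_paths, model, max_tokens=1024,
                  base_url=DEFAULT_BASE_URL):
    """Assemble a hypothetical request for the steps above.

    The body layout and field names are illustrative assumptions;
    check them against the real Omni API before sending anything.
    """
    files = []
    # Keep the files in the order the caller supplied them.
    for position, path in enumerate(file_paths):
        data = Path(path).read_bytes()
        files.append({
            "order": position,
            "name": Path(path).name,
            "content_base64": base64.b64encode(data).decode("ascii"),
        })
    return {
        "url": base_url.rstrip("/"),
        "body": {
            "model": model,
            "message": message,
            "max_tokens": max_tokens,
            "files": files,
        },
    }
```

Once the real endpoint path and field names are confirmed, the resulting body could be sent with the Requests library, for example via `requests.post(..., json=request["body"])`.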

### Supported File Types

- **PDFs**: Document processing and analysis
- **Images**: JPG, PNG, GIF, BMP, WEBP for OCR and visual analysis
- **Audio**: MP3, WAV, M4A, FLAC, OGG for transcription
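
As a rough illustration of the list above, a client could check a file's extension before uploading it. The helper name `classify_file` and the category labels are hypothetical, and accepting `.jpeg` alongside `.jpg` is an assumption; the application may validate files differently.

```python
from pathlib import Path

# Extension groups taken from the list above.
# ".jpeg" is an assumption: the list only names JPG.
SUPPORTED_EXTENSIONS = {
    "pdf": {".pdf"},
    "image": {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"},
    "audio": {".mp3", ".wav", ".m4a", ".flac", ".ogg"},
}


def classify_file(path):
    """Return 'pdf', 'image', or 'audio' for a supported file, else None."""
    suffix = Path(path).suffix.lower()  # case-insensitive match
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None
```

For example, `classify_file("scan.PNG")` returns `"image"`, while an unsupported file such as `notes.txt` returns `None`.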

## Technical Details

### Frameworks and Libraries
- Gradio 4.0+
- Python 3.8+
- Requests library for API communication

### Installation
```bash
# Install dependencies
uv sync

# Run the application
uv run python app.py
```

### Development Mode
```bash
# Run with auto-reload for development
uv run python dev.py
```

## Citation

If you use this interface in your work, please cite:

```
@misc{omni_api_gradio_ui,
  title={Omni API Gradio UI},
  author={ModelHarbor Team},
  year={2025},
  howpublished={\url{https://github.com/your-username/omni-api-gradio-ui}}
}