Spaces:

utarn
/

ai_ocr

Running

App Files Files Community

ai_ocr / model_card.md

utarn

Update model

dfdb180 2 months ago

preview code

raw

history blame contribute delete

2.84 kB

	---
	license: mit
	tags:
	- gradio
	- omni-api
	- multimodal
	- chat-interface
	- pdf-processing
	- image-processing
	- audio-processing
	- llm
	- api-client
	- chatbot
	- text-generation
	- document-analysis
	- ocr
	- transcription
	widget:
	- src: https://api.modelharbor.com
	---

	# Omni API Gradio UI

	This is a Gradio-based user interface for the Omni API that supports multimodal interactions with various file types including text, PDF documents, images, and audio files.

	## Model Description

	The Omni API Gradio UI provides an easy-to-use web interface for interacting with the Omni API, which supports advanced multimodal AI capabilities. Users can send text prompts along with various file types and receive intelligent responses.

	### Supported Models

	The interface supports several state-of-the-art models:
	- typhoon-ocr-preview
	- openai/gpt-5
	- meta-llama/llama-4-maverick
	- qwen/qwen3-vl-235b-a22b-instruct
	- gemini/gemini-2.5-pro
	- gemini/gemini-2.5-flash

	## Features

	- Multimodal Support: Process text, PDFs, images, and audio files in a single interface
	- File Ordering: Upload multiple files in a specific order for precise control
	- Configurable Models: Switch between different AI models for different tasks
	- Real-time Responses: Get immediate feedback from the API
	- Customizable Parameters: Adjust max tokens and other settings

	## Intended Uses & Limitations

	### Intended Uses
	- Document analysis and summarization
	- Image OCR and analysis
	- Audio transcription and analysis
	- Multimodal chat applications
	- Content extraction from various file formats

	### Limitations
	- Requires access to the Omni API
	- Dependent on network connectivity
	- File size limitations based on API constraints
	- Some models may require API keys

	## How to Use

	1. Configure the API base URL (defaults to https://api.modelharbor.com)
	2. Select your preferred model from the dropdown
	3. Enter your text message in the input box
	4. Upload files (PDF, images, or audio) as needed
	5. Click "Send Request" to interact with the API
	6. View the response in the output panel

	### Supported File Types

	- PDFs: Document processing and analysis
	- Images: JPG, PNG, GIF, BMP, WEBP for OCR and visual analysis
	- Audio: MP3, WAV, M4A, FLAC, OGG for transcription

	## Technical Details

	### Frameworks and Libraries
	- Gradio 4.0+
	- Python 3.8+
	- Requests library for API communication

	### Installation
	```bash
	# Install dependencies
	uv sync

	# Run the application
	uv run python app.py
	```

	### Development Mode
	```bash
	# Run with auto-reload for development
	uv run python dev.py
	```

	## Citation

	If you use this interface in your work, please cite:

	```
	@misc{omni_api_gradio_ui,
	title={Omni API Gradio UI},
	author={ModelHarbor Team},
	year={2025},
	howpublished={\url{https://github.com/your-username/omni-api-gradio-ui}}
	}