Supertonic MNN CLI
A command-line interface for running Supertonic TTS models using MNN.
Features
- MNN Inference: Fast, on-device inference using MNN, with a real-time factor (RTF) of roughly 0.07 (a rough way to check this is sketched after this list)
- Int8 Support: no loss of precision compared with fp32 and fp16
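To sanity-check the RTF figure once the CLI is installed (see Usage below), time a synthesis run and divide the wall-clock time by the duration of the generated audio. The snippet below is only a sketch: the test sentence and file name are arbitrary, and it assumes sox is available for soxi -D (any tool that prints a wav's duration in seconds works just as well).
# Rough RTF check: wall-clock synthesis time divided by audio duration
time sh -c 'echo "The quick brown fox jumps over the lazy dog." | supertonic-mnn --output rtf_test.wav'
soxi -D rtf_test.wav   # duration of rtf_test.wav in seconds; RTF = synthesis time / this value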
Usage
Install with pip and run:
pip install supertonic-mnn
# Provide text through stdin
echo "Hello world" | supertonic-mnn --output out.wav
# Or read from a text file
supertonic-mnn --input-file sentences.txt --voice F1 --precision int8 --output out.wav
Available Options
- --input-file, -i: Input text file to synthesize (each line will be synthesized separately)
- --voice: Voice style (default: M1, choices: M1, M2, F1, F2)
- --precision: Model precision - fp32, fp16, or int8 (default: fp16)
- --output, -o: Output audio file path (default: output.wav)
- --speed: Speech speed multiplier (default: 1.0)
- --steps: Number of denoising steps (default: 5)
- --model-dir: Directory containing models
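These options can be combined in a single call. The example below only uses the flags documented above, with illustrative values, to pick a female voice, switch to the int8 models, slow the speech slightly, and increase the number of denoising steps:
# Combine voice, precision, speed, and step options (values are illustrative)
echo "Hello world" | supertonic-mnn --voice F2 --precision int8 --speed 0.9 --steps 10 --output hello_f2.wav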
Installation from Source Code
git clone https://github.com/vra/supertonic-mnn
cd supertonic-mnn
uv sync
Usage
# Reading text from stdin
echo "Hello world" | uv run supertonic-mnn --output hello.wav
# Using local models with default precision (fp16)
echo "Hello world" | uv run supertonic-mnn --output hello.wav --model-dir /path/to/models
# Specify precision
echo "Hello world" | uv run supertonic-mnn --output hello.wav --precision fp32
# Download models from HuggingFace (automatic)
echo "Hello world" | uv run supertonic-mnn --output hello.wav --precision int8
# Batch processing from text file
uv run supertonic-mnn --input-file sentences.txt --voice F1 --output result.wav
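If you want one audio file per line of the input text rather than a single output file, a small shell loop over the file achieves the same effect. This is only a sketch: sentences.txt and the line_N.wav naming are illustrative, and it uses only the flags documented above.
# Synthesize each line of sentences.txt into its own wav file (sketch)
n=0
while IFS= read -r line; do
  n=$((n + 1))
  printf '%s\n' "$line" | uv run supertonic-mnn --voice F1 --output "line_${n}.wav"
done < sentences.txt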