Looking forward to contributing to & working more closely with the open-source ecosystem - huge thanks to everyone who's supported me on this journey!
Fine-tuned MiMo Audio to accept text/emotion captions (e.g. "intense fury, rage, hate") as input, trained a LoRA for 1k steps on LAION's voice acting dataset.
Thanks to HF for the GPUs to train 🤗
It's still very early and the model does have an issue with hallucinating, but the results seem pretty good for this point in the training run.
Will probably kick off a new run later with some settings tweaked.
Put up a demo here: https://huggingface.co/spaces/mrfakename/EmoAct-MiMo
(Turn sound on to hear audio samples)
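For context on what a LoRA run like this actually learns, here is a minimal numerical sketch of the standard LoRA update rule (W' = W + (alpha/r) * B @ A). This is not the MiMo Audio training code; the shapes and hyperparameters are illustrative only.

```python
import numpy as np

# Sketch of the LoRA math, assuming the standard formulation.
rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 16           # hidden size, rank, scaling (illustrative)

W = rng.normal(size=(d, d))        # frozen base weight
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))               # trainable up-projection, zero-initialized

delta = (alpha / r) * (B @ A)      # low-rank update added to the base weight
W_adapted = W + delta
```

Because B starts at zero, the adapter is a no-op at initialization; training only ever updates the small A and B matrices while W stays frozen.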
Try out Papla's new ultra-realistic TTS model + compare it with other leading models on the TTS Arena: TTS-AGI/TTS-Arena
GGUF: mrfakename/mistral-small-3.1-24b-instruct-2503-gguf
Instruct: mrfakename/mistral-small-3.1-24b-instruct-2503-hf
Base: mrfakename/mistral-small-3.1-24b-base-2503-hf
GGUF quants coming soon!
The refreshed UI for the leaderboard is smoother and (hopefully) more intuitive. You can now view models based on a simpler win-rate percentage and exclude closed models.
In addition, the TTS Arena now supports keyboard shortcuts. This should make voting much more efficient as you can now vote without clicking anything!
In both the normal Arena and Battle Mode, press "r" to select a random text, Cmd/Ctrl + Enter to synthesize, and "a"/"b" to vote! View more details about keyboard shortcuts by pressing "?" (Shift + /) on the Arena.
Check out all the new updates on the TTS Arena:
TTS-AGI/TTS-Arena
Hi, is there a limit on the number of voices? I have 416 and it fails to load all of them. (A scroll-menu limit?)
I'm not sure if there's a set limit for the dropdown, but with that many voices it might make sense to replace the dropdown with a textbox where you specify the path to the reference speaker.
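If you go the textbox route, a small validation helper keeps arbitrary user input from pointing outside the voices folder. This is a hypothetical sketch, not code from the demo; the function and folder names are assumptions.

```python
from pathlib import Path

def resolve_voice(name, voices_dir="voices"):
    # Hypothetical helper: map a user-typed name to a .wav file under the
    # voices folder, rejecting absolute paths and directory traversal.
    p = Path(name)
    if p.suffix.lower() != ".wav":
        raise ValueError("expected a .wav file")
    if p.is_absolute() or ".." in p.parts:
        raise ValueError("path must stay inside the voices folder")
    return Path(voices_dir) / p

print(resolve_voice("alice.wav"))  # voices/alice.wav
```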
I don't think that's supported by the model, but you could fine-tune it or clone a voice with emotions. (I am not the author of the model itself, just of the web demo)
Hi,
You can upload a WAV file to the voices folder. Then, in the app.py file, add the filename of the voice (without .wav) to the voicelist list. It should show up in the Gradio demo.
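Concretely, the edit looks something like this. The variable name voicelist comes from the post above; the existing entry names shown here are made up for illustration.

```python
# Illustrative excerpt mirroring app.py: a list of voice names, each
# matching a file voices/<name>.wav. These entries are placeholders.
voicelist = ["f-us-1", "f-us-2", "m-us-1"]

# After uploading voices/myvoice.wav, register it (without ".wav"):
voicelist.append("myvoice")
```

Once the name is in the list, the Gradio demo should pick it up on the next restart of the Space.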
Hi,
I added:
import nltk
nltk.download('punkt_tab')
and it seems to resolve the issue for me. Have you changed any code from the original Space?
Thanks!
Hi,
Sorry about the issues! Please try adding:
nltk.download('punkt_tab')
below the nltk.download() line. Let me know if it works!
Moonshine is a fast, efficient, & accurate ASR model released by Useful Sensors. It's designed for on-device inference and licensed under the MIT license!
HF Space (unofficial demo): mrfakename/Moonshine
GitHub repo for Moonshine: https://github.com/usefulsensors/moonshine
Training itself would be pretty easy, but the main issue would be data. AFAIK there's not much data out there for other TTS models. I synthetically generated the StyleTTS 2 dataset since that model is quite efficient to run, but other models would require much more compute.
It's an LLM-controlled roguelike in which the LLM receives a markdown representation of the map and generates JSON describing the objective to fulfill on the map, along with the necessary objects and their placements.
Come test it on the Space:
Jofthomas/Everchanging-Quest
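To make the setup concrete, here is a hypothetical example of the kind of JSON such an LLM turn might produce. The actual schema used by Everchanging-Quest isn't shown in the post, so every field name below is an assumption.

```python
import json

# Hypothetical LLM output: an objective plus objects with grid placements.
llm_output = """
{
  "objective": "Retrieve the ancient amulet from the crypt",
  "objects": [
    {"name": "amulet", "position": [4, 7]},
    {"name": "skeleton", "position": [3, 6]}
  ]
}
"""

quest = json.loads(llm_output)
for obj in quest["objects"]:
    print(obj["name"], "at", obj["position"])
```

Parsing the reply as strict JSON (rather than free text) is what lets the game engine place objects on the map deterministically.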
I was inspired by the TTS-AGI/TTS-Arena (definitely check it out if you haven't), which compares recent TTS systems using crowdsourced A/B testing.
I wanted to see if we could do a similar evaluation with objective metrics, and it's now available here:
ttsds/benchmark
Anyone can submit a new TTS model, and I hope this provides some insight into which areas models perform well or poorly in.
The paper with all the details is available here: https://arxiv.org/abs/2407.12707
Congratulations!