fine-tuning a 14B model with TRL + SFT on a free Colab (T4 GPU)? thanks to the latest TRL optimizations, you actually can! sharing a new notebook showing how to do it
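For context, here is a minimal sketch of what such a setup can look like. This is not the author's notebook: it assumes a QLoRA-style recipe (4-bit base model + LoRA adapters), and the model id and dataset are placeholders.

```python
# Minimal sketch (not the notebook from the post): QLoRA-style SFT with TRL,
# one common recipe for fitting a large model onto a 16 GB T4.
# The 14B checkpoint and the dataset below are placeholders.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset

args = SFTConfig(
    output_dir="sft-14b-t4",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,                    # trade compute for memory
    model_init_kwargs={
        "quantization_config": BitsAndBytesConfig(  # load the base model in 4-bit
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.float16,   # T4 has no bf16 support
        ),
        "device_map": "auto",
    },
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-14B-Instruct",              # placeholder 14B checkpoint
    args=args,
    train_dataset=train_dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```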
If your Space stopped working after a restart, mainly during the last 5 days (https://discuss.huggingface.co/t/my-space-suddenly-went-offline-the-cpu-cannot-restart/151121/22), try some of the following:
1. Add pydantic==2.10.6 to requirements.txt, or upgrade Gradio to the latest version.
2. Upgrade PyTorch to 2.2.0 or later (torch>=2.2.0 for Zero GPU Spaces).
3. Pin Transformers to 4.49.0 or earlier (transformers<=4.49.0 for Spaces using Transformers or Diffusers).
4. Pin huggingface_hub to an older version (huggingface_hub==0.25.2 if an error like "cached_download is not available" occurs or inference does not work properly).
5. Specifying WORKDIR in a Dockerfile may cause the application to fail to start with error 137 (Docker Spaces, https://discuss.huggingface.co/t/error-code-137-cache-error/152177).
Edit: Zero GPU Spaces have been upgraded from A100 to H200. This is likely why older versions of PyTorch are no longer supported; in fact, an error message to that effect was displayed. zero-gpu-explorers/README#163
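As an illustration only, a requirements.txt combining the pins listed above might look like this (add only the lines relevant to your particular Space):

```
pydantic==2.10.6
torch>=2.2.0
transformers<=4.49.0
huggingface_hub==0.25.2
```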
Multilingual Tokenization Showdown: Analyzing 12 LLM Tokenizers Across 204 Languages.
First, I've created a dataset with Wikipedia's "Cat" article text in 272 languages: Norod78/WikiCat-Multilingual
For each language entry with at least 100 words, I tokenized the text using 12 tokenizers and calculated the "characters per token" and "words per token" ratios. The higher these ratios are, the more information each token represents on average for that language (perhaps allowing the LLM to learn more per parameter if trained on a dataset in that language).
I hope I interpreted the results correctly. I've made the code available on GitHub, so you can re-create the raw results JSONL with this repo: https://github.com/Norod/wikicat-tokenizer-eval
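The core metric is simple; here is a minimal sketch of how the two ratios could be computed (not the repo's code, and the tokenizer id is a placeholder):

```python
# Minimal sketch of the per-language ratios described above (not the repo's code).
# The tokenizer id is a placeholder; any Hugging Face tokenizer works the same way.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer

def tokenization_ratios(text: str) -> tuple[float, float]:
    """Return (characters per token, words per token) for the given text."""
    tokens = tokenizer.encode(text, add_special_tokens=False)
    n_tokens = max(len(tokens), 1)
    chars_per_token = len(text) / n_tokens
    words_per_token = len(text.split()) / n_tokens
    return chars_per_token, words_per_token

# A higher ratio means each token carries more of the text on average.
print(tokenization_ratios("The cat (Felis catus) is a small domesticated carnivore."))
```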
Introducing AWQ and GPTQ quantized versions of SmolVLM from Hugging Face!
Only the text backbone of these models was quantized, yielding a roughly 50% size reduction (about 4 GB down to 2 GB) while keeping degradation on the DocVQA benchmark under 1%.
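Loading a pre-quantized checkpoint from the Hub is the usual one-liner; a minimal sketch below, where the repo id is hypothetical (check the actual AWQ/GPTQ SmolVLM repos for the real names):

```python
# Minimal sketch: loading a GPTQ-quantized vision-language checkpoint from the Hub.
# The repo id below is hypothetical; use the actual quantized SmolVLM repo name.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-Instruct-GPTQ"  # hypothetical repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # the quantization config is read from the checkpoint itself
)
```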
Finally, I uploaded the model I developed for my master's thesis! Given a financial event, it provides explained predictions based on a dataset of past news and central bank speeches. Try it out here: SelmaNajih001/StockPredictionExplanation (just restart the Space and wait a minute)
Technical Implementation: (Runnable with copy & paste at the MLange link!)
Device Compatibility Matrix: Tested on 50+ devices including the Samsung Galaxy series, Google Pixel lineup, Xiaomi devices, iPhones, and iPads. Consistent sub-5 ms performance across the board!
Applications Unlocked:
- Real-time AR/VR face tracking
- Privacy-preserving edge authentication
- Live video processing pipelines
- Mobile security applications
- Interactive camera filters
The democratization of high-performance computer vision on mobile devices is happening NOW! This study proves that complex CV models can run efficiently on consumer hardware without compromising accuracy. Want to reproduce these results? Check out the benchmark methodology and implementation guide!
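As a rough illustration of the kind of measurement behind numbers like "sub-5 ms", here is a minimal latency-benchmark sketch. It is not the post's actual methodology; the model file and input shape are placeholders.

```python
# Minimal sketch of an inference-latency benchmark (not the post's methodology).
# The model path and input shape are placeholders; warm-up runs are excluded from timing.
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("face_tracker.onnx")        # placeholder model file
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input shape

for _ in range(10):                                         # warm-up
    session.run(None, {input_name: dummy})

latencies = []
for _ in range(100):
    start = time.perf_counter()
    session.run(None, {input_name: dummy})
    latencies.append((time.perf_counter() - start) * 1000)

print(f"median latency: {np.median(latencies):.2f} ms")
```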
PawMatchAI: Now with SBERT-Powered Recommendations!
NEW: Description-based recommendations are here! Just type in your lifestyle or preferences (e.g. "I live in an apartment and want a quiet dog"), and PawMatchAI uses SBERT semantic embeddings to understand your needs and suggest compatible breeds.
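Under the hood, this kind of matching can be as simple as embedding the user text and each breed profile and ranking by cosine similarity. A minimal sketch with sentence-transformers follows; the model and breed descriptions are placeholders, not PawMatchAI's actual pipeline.

```python
# Minimal sketch of description-based breed matching with SBERT embeddings
# (not PawMatchAI's actual pipeline; model and breed texts are placeholders).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder SBERT model

breed_profiles = {
    "Cavalier King Charles Spaniel": "Small, quiet, affectionate, happy in apartments.",
    "Border Collie": "Very energetic working dog that needs space and daily exercise.",
    "Greyhound": "Calm, low-barking couch potato that enjoys short sprints.",
}

query = "I live in an apartment and want a quiet dog"
query_emb = model.encode(query, convert_to_tensor=True)
breed_embs = model.encode(list(breed_profiles.values()), convert_to_tensor=True)

# Rank breeds by cosine similarity to the user's description
scores = util.cos_sim(query_emb, breed_embs)[0]
for name, score in sorted(zip(breed_profiles, scores), key=lambda x: -x[1]):
    print(f"{name}: {score:.3f}")
```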
What can PawMatchAI do today?
- Upload a photo to identify your dog from 124 breeds, with detailed info.
- Compare two breeds side-by-side, from grooming needs to health insights.
- Visualize breed traits with radar and comparison charts.
- Try Style Transfer to turn your dog's photo into anime, watercolor, cyberpunk, and more.
What's next?
- More fine-tuned recommendations.
- Mobile-friendly deployment.
- Expansion to additional species.
My goal: to make breed discovery not only accurate but also interactive and fun, combining computer vision, semantic understanding, and creativity to help people find their perfect companion.
Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! Demo (+ source code): webml-community/DINOv3-video-tracking
This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)!
How does it work?
1. Generate and cache image features for each frame.
2. Create a list of embeddings for the selected patch(es).
3. Compute cosine similarity between each patch and the selected patch(es).
4. Highlight those whose score is above some threshold.
... et voilà!
You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.
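The similarity step itself is tiny. Here is a minimal NumPy sketch of steps 3-4; the actual demo runs DINOv3 features in the browser via Transformers.js, and the array shapes here are only illustrative.

```python
# Minimal sketch of the patch-similarity step (steps 3-4 above).
# The real demo uses DINOv3 features in Transformers.js; shapes here are illustrative.
import numpy as np

def highlight_mask(frame_patches: np.ndarray,
                   selected_patches: np.ndarray,
                   threshold: float = 0.6) -> np.ndarray:
    """frame_patches: (N, D) patch embeddings for one frame.
    selected_patches: (K, D) embeddings of the user-selected patch(es).
    Returns a boolean mask over the N patches to highlight."""
    # L2-normalize so the dot product becomes cosine similarity
    f = frame_patches / np.linalg.norm(frame_patches, axis=1, keepdims=True)
    s = selected_patches / np.linalg.norm(selected_patches, axis=1, keepdims=True)
    # For each frame patch, keep the best similarity against any selected patch
    sims = (f @ s.T).max(axis=1)
    return sims > threshold

# Selections made across several frames (as mentioned above) simply add more rows
# to selected_patches, which improves temporal consistency.
```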
This time, I have mapped and contributed more than 100 swimming pools around my wife's hometown to https://www.openstreetmap.org. It only took about 20 minutes to find them all (plus ~3 minutes of verification) in a free Colab GPU.
Started fine-tuning Gemma 3 using an evolutionary approach. It is not the worst model according to the AHA leaderboard, and it is one of the smartest according to lmarena.ai. My objective is to make it based, anti-woke, wise, beneficial, and then some.
Several GPUs are fine-tuning it at the same time, each on a different dataset and using QLoRA, and the successful runs are merged later. Compared to plain LoRA, this allows faster training and also reduces overfitting, because the merge operation heals overfitting. The problem with this could be that the 4-bit quantization may make the models dumber. But I am not looking for sheer IQ. Too much mind is a problem anyway :)
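One way to do the merge step is with PEFT's weighted adapter combination. A minimal sketch, assuming each parallel run saved a LoRA adapter; the base model id, adapter paths, and weights are placeholders, not the author's exact setup.

```python
# Minimal sketch of merging several independently trained LoRA/QLoRA adapters
# into one model with PEFT (base model, adapter paths, and weights are placeholders).
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")  # placeholder base

# Load the adapters produced by the parallel QLoRA runs
model = PeftModel.from_pretrained(base, "run1/adapter", adapter_name="run1")
model.load_adapter("run2/adapter", adapter_name="run2")
model.load_adapter("run3/adapter", adapter_name="run3")

# Combine the survivors into a single adapter, e.g. weighted by fitness
model.add_weighted_adapter(
    adapters=["run1", "run2", "run3"],
    weights=[0.5, 0.3, 0.2],
    adapter_name="merged",
    combination_type="linear",
)
model.set_adapter("merged")
merged = model.merge_and_unload()  # bake the merged adapter into the base weights
```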
Has anyone tried parallel QLoRA and merging before?
I also automated the dataset selection, benchmarking, and convergence toward the objectives (the fitness function, the reward). It is basically trying to get a higher score on the AHA Leaderboard as fast as possible, with a diverse set of organisms that "evolve by training".
I want to release some cool stuff when I have the time:
- how an answer to a single question changes over time, with each training round or day
- a chart showing AHA alignment over training rounds
I've made yet another merge of reasoning models with incremental gains on the current Open LLM leaderboard. open-llm-leaderboard/open_llm_leaderboard
Merging a DeepSeek R1 distillation into Llama 3.1 8B (at 10% task arithmetic weight, using the Llama 3.1 8B base model as the base rather than the instruct model) with a prior best merge resulted in a slightly lower IFEval, but a higher result in every other benchmark except MMLU-PRO, which went down only marginally. MATH Lvl 5 and GPQA went up palpably. grimjim/DeepSauerHuatuoSkywork-R1-o1-Llama-3.1-8B
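Task arithmetic at a given weight boils down to adding a scaled delta between each fine-tune and the base. A minimal sketch of the idea follows; this is not the actual merge recipe or tooling used for the model above.

```python
# Minimal sketch of task-arithmetic merging (not the actual recipe used above).
# merged = base + sum_i(weight_i * (finetune_i - base)), applied per parameter tensor.
import torch

def task_arithmetic_merge(base: dict, finetunes: list[dict], weights: list[float]) -> dict:
    """base/finetunes are state_dicts with identical keys; weights e.g. [0.9, 0.1]."""
    merged = {}
    for name, base_param in base.items():
        delta = torch.zeros_like(base_param)
        for ft, w in zip(finetunes, weights):
            delta += w * (ft[name] - base_param)  # scaled task vector
        merged[name] = base_param + delta
    return merged
```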
This result is currently my best Llama 3.1 8B merge result to date. The actual R1 distillation itself scored quite badly, so this would seem to be another case of unexpected formatting (reflected in IFEval) hurting the evaluation results, obscuring the strength of a model.
It is also possible to use the text generation feature of this model to generate roleplay completions. Based on informal testing, this model's bias toward problem-solving will subtly impact narration.
A collection of metadata for 39,280 video clips from the GoodGame.ru streaming platform, featuring:
- Complete clip information, including direct video URLs and thumbnails
- Streamer details such as usernames and avatars
- Engagement metrics such as view counts
- Game categories and content classifications
- Released under the Creative Commons Zero (CC0) license
This extensive clip collection provides a valuable resource for developing and evaluating video-based AI applications, especially in Russian gaming and streaming contexts.
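Loading it for analysis is the standard datasets one-liner; a minimal sketch with a hypothetical repo id, since the post does not name the exact dataset id:

```python
# Minimal sketch: exploring the clip metadata with the datasets library.
# The dataset repo id below is hypothetical; substitute the actual id.
from datasets import load_dataset

clips = load_dataset("username/goodgame-clips", split="train")  # hypothetical repo id
print(clips)     # number of rows and available metadata columns
print(clips[0])  # one clip's metadata (video URL, streamer, views, game category, ...)
```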