Massimo Roberto Scamarcia
mrs83
AI & ML interests
Natural Language Processing, Text Generation, Question Answering, Data Augmentation, Knowledge Transfer, Chain-of-Thought, ResearchOps, MLOps
Recent Activity
Updated a model about 4 hours ago: ethicalabs/Kurtis-EON1
Replied to their post about 4 hours ago:
In 2017, my RNNs were babbling. Today, they are hallucinating beautifully.
10 years ago, getting an LSTM to output coherent English was a struggle.
10 years later, after a "cure" based on FineWeb-EDU and a custom synthetic mix for causal conversation, the results are fascinating.
We trained this on ~10B tokens on a single AMD GPU (ROCm). It is not a Transformer: Echo-DSRN (400M) is a novel recurrent architecture inspired by Hymba, RWKV, and xLSTM, designed to challenge the "Attention is All You Need" monopoly on the Edge.
The ambitious goal is to build a small instruct model with RAG and tool-use capabilities (https://huggingface.co/ethicalabs/Kurtis-EON1).
📊 The Benchmarks (Size: 400M)
For a model this size (trained on <10B tokens), the specialized performance is surprising:
*SciQ*: 73.8% 🦄 (This rivals billion-parameter models in pure fact retrieval).
*PIQA*: 62.3% (Solid physical intuition for a sub-1B model).
The Reality Check:
HellaSwag (29.3%) and Winogrande (50.2%) show the limits of 400M parameters and a ~10B-token training budget.
We are hitting the "Reasoning Wall", which confirms we need to scale up to (hopefully) unlock deeper common sense. As you can see in the visualization (to be released soon on HF), the FineWeb-EDU bias is strong: the model is convinced it is in a classroom ("In this course, we explore...").
The Instruct Model is not ready yet, and we are currently using curriculum learning to test model plasticity.
Source code and weights will not be released yet. This is not a fork or a fine-tune: the base model is built in-house at https://www.ethicalabs.ai/, with novel components that do not exist in current open libraries.
🤝 Call for Collaboration: I am looking for Peer Reviewers interested in recurrent/hybrid architectures. If you want to explore what lies beyond Transformers, let’s connect!
Training diary: https://huggingface.co/ethicalabs/Kurtis-EON1
Replied to their post about 4 hours ago:
Hello HF community, I'm happy to share a project I've been working on that combines mlx-lm with Flower to enable federated fine-tuning of SLMs (Small Language Models) on macOS devices.
GitHub Repo: https://github.com/ethicalabs-ai/BlossomTuneLLM-MLX
By combining mlx-lm with a federated learning framework like Flower (https://flower.ai/), we can leverage the hardware people already own and reduce the reliance on expensive GPUs, enabling collaborative model training.
This project is the MLX-native evolution of an earlier codebase for FlowerTune LLM:
https://arxiv.org/abs/2506.02961
https://flower.ai/blog/2024-10-16-flowertune-llm-leaderboard
https://github.com/ethicalabs-ai/BlossomTuneLLM
How it works:
Flower handles all the federated learning logic.
A central server (superlink) coordinates the training rounds, client selection, and parameter aggregation.
Each participant in the network runs a Flower client (supernode) on their Mac. In each round, the client:
- Receives the global LoRA/DoRA adapter weights from the server.
- Loads its local data partition.
- Uses the mlx-lm programmatic API (mlx_lm.tuner.train) to perform LoRA/DoRA fine-tuning.
- Sends only the updated adapter weights back to the server.
The server only ever sees the aggregated model updates; private data never leaves the device (see the client sketch below).
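To make the round structure concrete, here is a minimal sketch of what such a client could look like with Flower's ClientApp API. This is an illustration under assumptions, not the project's actual code: `load_model_with_adapters`, `get_adapter_arrays`, `set_adapter_arrays`, and `run_local_training` are hypothetical helpers (the real implementation lives in the BlossomTuneLLM-MLX repo), and the exact mlx_lm.tuner.train signature can differ between mlx-lm versions.

```python
# Sketch of a Flower client wrapping mlx-lm LoRA/DoRA fine-tuning.
# `load_model_with_adapters`, `get_adapter_arrays`, `set_adapter_arrays`,
# and `run_local_training` are hypothetical helpers, not real mlx-lm or
# BlossomTuneLLM-MLX functions.
import numpy as np
from flwr.client import ClientApp, NumPyClient
from flwr.common import Context


class MLXLoRAClient(NumPyClient):
    def __init__(self, model, tokenizer, train_set, adapter_keys):
        self.model = model
        self.tokenizer = tokenizer
        self.train_set = train_set
        self.adapter_keys = adapter_keys  # ordered names of the LoRA/DoRA parameters

    def get_parameters(self, config):
        # Export only the adapter weights as NumPy arrays (hypothetical helper).
        return [np.asarray(w) for w in get_adapter_arrays(self.model, self.adapter_keys)]

    def fit(self, parameters, config):
        # 1. Load the global adapter weights received from the server.
        set_adapter_arrays(self.model, self.adapter_keys, parameters)
        # 2. Run local fine-tuning; in the real project this goes through the
        #    mlx-lm programmatic API (mlx_lm.tuner.train), whose signature
        #    varies between versions, so the call is kept abstract here.
        run_local_training(self.model, self.tokenizer, self.train_set, config)
        # 3. Send only the updated adapter weights back to the server.
        return self.get_parameters(config), len(self.train_set), {}


def client_fn(context: Context):
    # Flower injects a partition id when running supernodes or simulations.
    partition_id = context.node_config["partition-id"]
    model, tokenizer, train_set, keys = load_model_with_adapters(partition_id)
    return MLXLoRAClient(model, tokenizer, train_set, keys).to_client()


client_app = ClientApp(client_fn=client_fn)
```

The key design point is that only the adapter tensors ever cross the network: the frozen base model weights stay on each Mac.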
Flower made it easy to run a full simulation (with a centralized HF dataset, partitioned using flower-datasets) on a single machine or across multiple machines, to test the whole process in action and experiment further (see the simulation sketch below).
All you need is one or more Macs with Apple Silicon.
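For reference, a single-machine simulation along these lines could be wired up roughly as follows. This is a sketch under assumptions: the dataset name is only an example, `client_app` is the ClientApp from the sketch above, and the actual strategy and configuration live in the repository.

```python
# Sketch of a single-machine Flower simulation with a partitioned HF dataset.
# The dataset name is an example; `client_app` is the ClientApp from the
# previous sketch.
from flwr.common import Context
from flwr.server import ServerApp, ServerAppComponents, ServerConfig
from flwr.server.strategy import FedAvg
from flwr.simulation import run_simulation
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import IidPartitioner

NUM_SUPERNODES = 4

# Partition a centralized Hugging Face dataset across the simulated clients.
fds = FederatedDataset(
    dataset="databricks/databricks-dolly-15k",  # example only
    partitioners={"train": IidPartitioner(num_partitions=NUM_SUPERNODES)},
)
# Each simulated client loads its shard via fds.load_partition(partition_id, "train").


def server_fn(context: Context):
    # FedAvg aggregates the LoRA/DoRA adapter weights returned by the clients.
    strategy = FedAvg(fraction_fit=1.0, min_available_clients=NUM_SUPERNODES)
    return ServerAppComponents(strategy=strategy, config=ServerConfig(num_rounds=3))


server_app = ServerApp(server_fn=server_fn)

run_simulation(
    server_app=server_app,
    client_app=client_app,  # from the client sketch above
    num_supernodes=NUM_SUPERNODES,
)
```

The same ServerApp/ClientApp pair can then be deployed for real: the server runs behind a Flower superlink and each Mac starts a supernode that connects to it, as described above.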
