Eden Yavin
EdenYav
AI & ML interests
Reinforcement learning, online learning, cybersecurity, large language models
Organizations
None yet
Reasoning
- R-Zero: Self-Evolving Reasoning LLM from Zero Data
  Paper • 2508.05004 • Published • 130
- Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability
  Paper • 2508.04017 • Published • 11
- Reinforcement Learning for Self-Improving Agent with Skill Library
  Paper • 2512.17102 • Published • 36
LLM Evaluation
- Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?
  Paper • 2508.03644 • Published • 25
- WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
  Paper • 2508.05748 • Published • 141
- MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
  Paper • 2508.20453 • Published • 63
LLM in Cybersecurity
VisionLM