SWVR2

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

yilunzhao

updated a dataset 11 minutes ago

SWVR2/video_0061

Updated 1 minute ago

yilunzhao

published a dataset about 1 hour ago

SWVR2/video_0061

Updated 1 minute ago

yilunzhao

updated a dataset about 6 hours ago

SWVR2/video_0057

Updated about 1 hour ago

yilunzhao

published a dataset about 7 hours ago

SWVR2/video_0057

Updated about 1 hour ago

yilunzhao

updated 2 datasets about 12 hours ago

SWVR2/video_0062

Updated about 2 hours ago • 110

SWVR2/video_0058

Updated about 8 hours ago • 378

yilunzhao

updated a dataset about 13 hours ago

SWVR2/video_0039

Updated about 13 hours ago • 63

yilunzhao

published a dataset about 14 hours ago

SWVR2/video_0039

Updated about 13 hours ago • 63

yilunzhao

updated a dataset about 16 hours ago

SWVR2/video_0049

Updated about 15 hours ago • 48

yilunzhao

published a dataset about 16 hours ago

SWVR2/video_0049

Updated about 15 hours ago • 48

yilunzhao

authored 10 papers about 17 hours ago

AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research

Paper • 2507.13300 • Published Jul 17, 2025 • 20

PuzzlePlex: Benchmarking Foundation Models on Reasoning and Planning with Puzzles

Paper • 2510.06475 • Published Oct 7, 2025 • 2

MSRS: Evaluating Multi-Source Retrieval-Augmented Generation

Paper • 2508.20867 • Published Aug 28, 2025

FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering

Paper • 2510.06426 • Published Oct 7, 2025 • 3

SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing

Paper • 2506.04583 • Published Jun 5, 2025

FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents

Paper • 2411.05764 • Published Nov 8, 2024

MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval

Paper • 2510.09510 • Published Oct 10, 2025 • 8

FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain

Paper • 2510.15232 • Published Oct 17, 2025 • 6

LimRank: Less is More for Reasoning-Intensive Information Reranking

Paper • 2510.23544 • Published Oct 27, 2025 • 9

Measuring what Matters: Construct Validity in Large Language Model Benchmarks

Paper • 2511.04703 • Published Nov 3, 2025 • 8

Team members 3

SWVR2's activity