Ai-general
Guided Self-Evolving LLMs with Minimal Human Supervision
Paper
•
2512.02472
•
Published
•
51
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper
•
2509.25454
•
Published
•
141
Video Reasoning without Training
Paper
•
2510.17045
•
Published
•
7
Agent Learning via Early Experience
Paper
•
2510.08558
•
Published
•
270
RLP: Reinforcement as a Pretraining Objective
Paper
•
2510.01265
•
Published
•
40
Large Reasoning Models Learn Better Alignment from Flawed Thinking
Paper
•
2510.00938
•
Published
•
58
LiveTradeBench: Seeking Real-World Alpha with Large Language Models
Paper
•
2511.03628
•
Published
•
12
PromptBridge: Cross-Model Prompt Transfer for Large Language Models
Paper
•
2512.01420
•
Published
•
9
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
Paper
•
2510.09577
•
Published
•
7
Diversity Has Always Been There in Your Visual Autoregressive Models
Paper
•
2511.17074
•
Published
•
7
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance
Paper
•
2511.13254
•
Published
•
136
Search Self-play: Pushing the Frontier of Agent Capability without Supervision
Paper
•
2510.18821
•
Published
•
17
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Paper
•
2510.03259
•
Published
•
57
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
Paper
•
2510.19338
•
Published
•
114
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
Paper
•
2511.16043
•
Published
•
108
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
Paper
•
2510.03561
•
Published
•
24
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
Paper
•
2509.08755
•
Published
•
56
gpt-oss-120b & gpt-oss-20b Model Card
Paper
•
2508.10925
•
Published
•
12
Paper
•
2412.16720
•
Published
•
36
Self-Improving VLM Judges Without Human Annotations
Paper
•
2512.05145
•
Published
•
18
MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique
Paper
•
2511.09067
•
Published
•
2
Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning
Paper
•
2510.23038
•
Published
•
1
MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning
Paper
•
2511.06805
•
Published
•
12
JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation
Paper
•
2511.15958
•
Published
•
1
VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering
Paper
•
2511.19899
•
Published
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Paper
•
2512.05150
•
Published
•
74
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling
Paper
•
2512.03000
•
Published
•
36
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
Paper
•
2512.04926
•
Published
•
41
Voxify3D: Pixel Art Meets Volumetric Rendering
Paper
•
2512.07834
•
Published
•
43
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning
Paper
•
2512.07461
•
Published
•
74
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
Paper
•
2512.13586
•
Published
•
87
RePo: Language Models with Context Re-Positioning
Paper
•
2512.14391
•
Published
•
8
Universal Reasoning Model
Paper
•
2512.14693
•
Published
•
40
MMGR: Multi-Modal Generative Reasoning
Paper
•
2512.14691
•
Published
•
114
Next-Embedding Prediction Makes Strong Vision Learners
Paper
•
2512.16922
•
Published
•
82
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
Paper
•
2512.17351
•
Published
•
24
HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices
Paper
•
2512.14052
•
Published
•
39
CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion
Paper
•
2512.19535
•
Published
•
10
SemanticGen: Video Generation in Semantic Space
Paper
•
2512.20619
•
Published
•
88
LongVideoAgent: Multi-Agent Reasoning with Long Videos
Paper
•
2512.20618
•
Published
•
52
Multi-hop Reasoning via Early Knowledge Alignment
Paper
•
2512.20144
•
Published
•
6
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
Paper
•
2512.21004
•
Published
•
12
TimeBill: Time-Budgeted Inference for Large Language Models
Paper
•
2512.21859
•
Published
•
22
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents
Paper
•
2512.22322
•
Published
•
36
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
Paper
•
2512.24618
•
Published
•
103