About me
I am Kaixin LI, a PhD student at the National University of Singapore, under the supervision of Prof. HUANG Zhiyong. I am deeply interested in autonomous agents, code generation, and multimodality.
I am currently a research intern at Qwen, supervised by Binyuan Hui, building Qwen-Coder and Qwen-VL.
Prior to this, I was fortunate to work with Prof. Tat-Seng Chua, Prof. Michael Qizhe Shieh, and Prof. Junxian He.
Computer Use Agents
GUI Grounding
(Multimodal) Code Generation
Front-end Agents
News
- Sep 2025: Released Qwen3-VL, for which I was responsible for the coding capabilities.
- Sep 2025: One paper, SE-GUI, accepted to NeurIPS 2025.
- Aug 2025: Two papers accepted to EMNLP 2025.
- Jul 2025: One paper, ScreenSpot-Pro, accepted to ACMMM 2025 as an oral presentation.
- May 2025: Three papers accepted to ACL 2025.
SELECTED PUBLICATIONS
AGENTS & GUI INTERACTION
ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use
ICLR 2025 Workshop on Reasoning and Planning for Large Language Models
GitHub
Paper
🤗 Hugging Face
Used by Qwen2.5-VL, Qwen3-VL, Microsoft Omniparser, Seed-VL
30,000+ downloads
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
NeurIPS 2025
Robi Butler: Remote Multimodal Interactions with Household Robot Assistant
ICRA 2025
CODING & CODE INTELLIGENCE
MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems
EMNLP 2024 Findings; ICLR 2025 Workshop on Reasoning and Planning for Large Language Models
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
arXiv
Tree-of-Evolution: Tree-Structured Instruction Evolution for Code Generation in Large Language Models
ACL 2025
MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation
arXiv preprint arXiv:2502.12468
InstructCoder: Empowering Language Models for Code Editing
ACL 2024 SRW; rejected from EMNLP 2023 with scores of 4, 4, 4 out of 5
SERVICES
Peer Review: Code intelligence, multimodal understanding/generation, GUI agents.
ICLR 2024, 2025
CVPR 2025
ACL 2025
EMNLP 2025
AAAI 2025
ACMMM 2025
OPEN SOURCE
TACO-verified
Verified code contest problems and solutions. 5,000+ downloads; widely used by the community and by DeepCoder.
IconStack-48M
The largest icon dataset, with 48 million images, SVGs, captions, and rich metadata.