kaixinli@nus:~$ ./init_portfolio
Kaixin LI
kaixinli@nus:~$ Building powerful models and datasets...

About me

I am Kaixin LI, a PhD student at the National University of Singapore, under the supervision of Prof. HUANG Zhiyong. I am deeply interested in autonomous agents, code generation, and multimodality.

I am now working as a research intern at Qwen, supervised by Binyuan Hui, building Qwen-Coder and Qwen-VL.

Prior to this, I was lucky to have worked with Prof. Tat-Seng Chua, Prof. Michael Qizhe Shieh and Prof. Junxian He.

Computer Use Agents | GUI Grounding | (Multimodal) Code Generation | Front-end Agents

News

  • Sep 2025: Released Qwen3-VL, for which I was responsible for the coding capabilities.
  • Sep 2025: One paper, SE-GUI, accepted to NeurIPS 2025.
  • Aug 2025: Two papers accepted to EMNLP 2025.
  • Jul 2025: One paper, ScreenSpot-Pro, accepted to ACMMM 2025 as an oral presentation.
  • May 2025: Three papers accepted to ACL 2025.

SELECTED PUBLICATIONS

AGENTS & GUI INTERACTION

ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use
Kaixin Li, Ziyang Meng, Hongzhan Lin, Ziyang Luo, Yuchen Tian, Jing Ma, Zhiyong Huang, Tat-Seng Chua
ICLR 2025 Workshop on Reasoning and Planning for Large Language Models
GitHub | Paper | 🤗 Hugging Face | Used by Qwen2.5-VL, Qwen3-VL, Microsoft Omniparser, Seed-VL | 30,000+ downloads
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
Xinbin Yuan, Jian Zhang, Kaixin Li, Zhuoxuan Cai, Jie Chen, Lujian Yao, Enguang Wang, Qibin Hou, Jinwei Chen, Peng-Tao Jiang, Bo Li
NeurIPS 2025
Grounding Computer Use Agents on Human Demonstrations
Aarash Feizi, Shravan Nayak, Xiangru Jian, Kevin Qinghong Lin, Kaixin Li, Rabiul Awal, Xing Han Lù, Johan Obando-Ceron, Juan A. Rodriguez, Nicolas Chapados, David Vazquez, Adriana Romero-Soriano, Reihaneh Rabbany, Perouz Taslakian, Christopher Pal, Spandana Gella, Sai Rajeswara
arXiv
HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities
Xiaoxue Ren*, Penghao Jiang*, Kaixin Li*, Zhiyong Huang, Xiaoning Du, Jiaojiao Jiang, Zhenchang Xing, Jiamou Sun, Terry Yue Zhuo
arXiv
Robi Butler: Remote Multimodal Interactions with Household Robot Assistant
Anxing Xiao, Nuwan Janaka, Tiancheng Hu, Anshul Gupta, Kaixin Li, Chen Yu, David Hsu
ICRA 2025

CODING & CODE INTELLIGENCE

MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems
Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Zhiyong Huang, Jing Ma
EMNLP 2024 Findings; ICLR 2025 Workshop on Reasoning and Planning for Large Language Models
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
Terry Yue Zhuo, Xiaolong Jin, Hange Liu, Juyong Jiang, Tianyang Liu, Chen GONG, Bhupesh Bishnoi, Vaisakhi Mishra, Marek Suppa, Noah Ziems, Saiteja Utpala, Ming Xu, Guangyu Song, Kaixin Li, Yuhan Cao, Bo Liu, Zheng Liu, Sabina Abdurakhmanova, Wenhao Yu, Mengzhao Jia, Jihan Yao, Kenneth Hamilton, Kumar Shridhar, Vu Minh Chien, Dingmin Wang, Jiawei Liu, Zijian Wang, Qian Liu, Binyuan Hui, Meg Risdal, Ahsen Khaliq, Atin Sood, Zhenchang Xing, Wasi Uddin Ahmad, John C. Grundy, David Lo, Banghua Zhu, Xiaoning Du, Torsten Scholak, Leandro Von Werra
arXiv
Tree-of-Evolution: Tree-Structured Instruction Evolution for Code Generation in Large Language Models
Ziyang Luo, Kaixin Li, Hongzhan Lin, Yuchen Tian, Mohan Kankanhalli, Jing Ma
ACL 2025
MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation
Yutong Wang, Pengliang Ji, Chaoqun Yang, Kaixin Li, Ming Hu, Jiaoyang Li, Guillaume Sartoretti
arXiv preprint arXiv:2502.12468
InstructCoder: Empowering Language Models for Code Editing
Kaixin Li*, Qisheng Hu*, Xu Zhao, Yuxi Xie, Tiedong Liu, Hui Chen, Qizhe Xie*, Junxian He*
ACL 2024 SRW; rejected from EMNLP 2023 with scores of 4, 4, 4 (out of 5)

SERVICES

Peer Review: Code intelligence, multimodal understanding/generation, GUI agents.

ICLR 2024, 2025 | CVPR 2025 | ACL 2025 | EMNLP 2025 | AAAI 2025 | ACMMM 2025

OPEN SOURCE

TACO-verified
Verified code contest problems and solutions. 5,000+ downloads; widely used by the community and by DeepCoder.
IconStack-48M
The largest icon dataset with 48 million images, SVG, captions and rich metadata.