Ge Li

About

I am a tenured full professor in the School of Computer Science at Peking University. I obtained my Ph.D. from Peking University in 2006. I was a visiting associate professor at the Artificial Intelligence Laboratory of Stanford University in 2013-2014. My current research mainly concerns applications of probabilistic methods for machine learning, including Programming Language Processing, Natural Language Processing, and Software Engineering.

Preprints

[arXiv 2025] Hao Zhu, Jia Li, Cuiyun Gao, Jiaru Qian, Yihong Dong, Huanyu Liu, Lecheng Wang, Ziliang Wang, Xiaolong Hu, Ge Li; Specification-Guided Vulnerability Detection with Large Language Models; arXiv preprint, arXiv:2511.04014, 2025. (Cited by 2)
[arXiv 2025] Xue Jiang, Yihong Dong, Mengyang Liu, Hongyi Deng, Tian Wang, Yongding Tao, Rongyu Cao, Binhua Li, Zhi Jin, Wenpin Jiao, Fei Huang, Yongbin Li, Ge Li; CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment; arXiv preprint, arXiv:2510.18471, 2025. (Cited by 1)
[arXiv 2025] Yihong Dong, Zhaoyu Ma, Xue Jiang, Zhiyuan Fan, Jiaru Qian, Yongmin Li, Jianha Xiao, Zhi Jin, Rongyu Cao, Binhua Li, Fei Huang, Yongbin Li, Ge Li; Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model; arXiv preprint, arXiv:2510.18165, 2025. (Cited by 3)
[arXiv 2025] Yongding Tao, Tian Wang, Yihong Dong, Huanyu Liu, Kechi Zhang, Xiaolong Hu, Ge Li; Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models; arXiv preprint, arXiv:2510.09259, 2025. (Cited by 3)
[arXiv 2025] Ziliang Wang, Ge Li, Jia Li, Hao Zhu, Zhi Jin; VulAgent: Hypothesis-Validation based Multi-Agent Vulnerability Detection; arXiv preprint, arXiv:2509.11523, 2025. (Cited by 1)
[arXiv 2025] Yihong Dong, Xue Jiang, Jiaru Qian, Tian Wang, Kechi Zhang, Zhi Jin, Ge Li; A Survey on Code Generation with LLM-based Agents; arXiv preprint, arXiv:2508.00083, 2025. (Cited by 33)
[arXiv 2025] Huanyu Liu, Jia Li, Chang Yu, Taozhi Chen, Yihong Dong, Lecheng Wang, Xiaolong Hu, Ge Li; EvoCoT: Overcoming the Exploration Bottleneck in Reinforcement Learning; arXiv preprint, arXiv:2508.07809, 2025. (Cited by 2)
[arXiv 2025] Yihong Dong, Xue Jiang, Yongding Tao, Huanyu Liu, Kechi Zhang, Lili Mou, Rongyu Cao, Yingwei Ma, Jue Chen, Binhua Li, Zhi Jin, Fei Huang, Yongbin Li, Ge Li; RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization; arXiv preprint, arXiv:2508.00222, 2025. (Cited by 4)
[arXiv 2025] Jia Li, Xuyuan Guo, Lei Li, Kechi Zhang, Ge Li, Zhengwei Tao, Fang Liu, Chongyang Tao, Yuqi Zhu, Zhi Jin; LONGCODEU: Benchmarking Long-Context Language Models on Long Code Understanding; arXiv preprint, arXiv:2503.04359, 2025. (Cited by 7)
[arXiv 2025] Xue Jiang, Yihong Dong, Zheng Fang, Yingwei Ma, Tangxinyu Wang, Rongyu Cao, Binhua Li, Zhi Jin, Wenpin Jiao, Yongbin Li, Ge Li; Large Language Model Unlearning for Source Code; arXiv preprint, arXiv:2506.17125, 2025.
[arXiv 2025] Kechi Zhang, Ge Li, Jia Li, Huangzhao Zhang, Jingjing Xu, Hao Zhu, Lecheng Wang, Yihong Dong, Jing Mai, Bin Gu, Zhi Jin; Computational Thinking Reasoning in Large Language Models; arXiv preprint, arXiv: 2506.02658, 2025. (Cited by 2)
[arXiv 2025] Yuqi Zhu, Ge Li, Xue Jiang, Jia Li, Hong Mei, Zhi Jin, Yihong Dong; Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs; arXiv preprint, arXiv: 2503.15341, 2025. (Cited by 21)
[arXiv 2025] Jia Li, Xianjie Shi, Kechi Zhang, Lei Li, Ge Li, Zhengwei Tao, Fang Liu, Chongyang Tao, Zhi Jin; CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation; arXiv preprint, arXiv: 2504.10046, 2025. (Cited by 9)
[arXiv 2024] Lecheng Wang, Xianjie Shi, Ge Li, Jia Li, Yihong Dong, Xuanming Zhang, Wenpin Jiao, Hong Mei; Why Language Models Collapse When Trained on Recursively Generated Text; arXiv preprint, arXiv: 2412.14872, 2024.
[arXiv 2024] Jia Li, Ge Li, Lecheng Wang, Hao Zhu, Zhi Jin; Generating Equivalent Representations of Code By A Self-Reflection Approach; arXiv preprint, arXiv: 2410.03351, 2024. (Cited by 4)
[arXiv 2024] Xue Jiang, Yihong Dong, Zhi Jin, Ge Li; SEED: Customize Large Language Models with Sample-Efficient Adaptation for Code Generation; arXiv preprint, arXiv: 2403.00046, 2024. (Cited by 6)
[arXiv 2023] Kechi Zhang, Ge Li, Jia Li, Zhuo Li, Zhi Jin; ToolCoder: Teach Code Generation Models to Use API Search Tool; arXiv preprint, arXiv: 2305.04032, 2023. (Cited by 75)
[arXiv 2023] Zejun Wang, Jia Li, Ge Li, Zhi Jin; ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation; arXiv preprint, arXiv: 2311.00272, 2023.
[arXiv 2022] Kechi Zhang, Ge Li, Zhi Jin; What Does Transformer Learn About Source Code?; arXiv preprint, arXiv: 2207.08466, 2022. (Cited by 11)

Selected Publications

[ICSE 2026] Yongmin Li, Jia Li, Ge Li, Zhi Jin; AdapTrack: Constrained Decoding without Distorting LLM's Output Intent; Proceedings of the 48th IEEE/ACM International Conference on Software Engineering (ICSE 2026), Rio de Janeiro, Brazil, Apr. 12 - 18, 2026. (Accepted)
[SCIS, 2025] Fang Liu, Ge Li, Qianhui Zhao, Li Zhang; Learning to Represent Code Semantics; Science China Information Sciences (SCIS), Vol. 68, Iss. 7, July 2025, pp. 172101:1–172101:21. [PDF]
[ICSE 2026] Kechi Zhang, Huangzhao Zhang, Ge Li, Jinliang You, Jia Li, Yunfei Zhao, Zhi Jin; SEAlign: Alignment Training for Software Engineering Agent; Proceedings of the 48th IEEE/ACM International Conference on Software Engineering (ICSE 2026), Rio de Janeiro, Brazil, Apr. 12 - 18, 2026. (Accepted) (Cited by 5)
[TOSEM, 2025] Xue Jiang, Yihong Dong, Zhiyuan Fan, Zhi Jin, Wenpin Jiao, Ge Li; Exploring Data-Efficient Adaptation of Large Language Models for Code Generation; ACM Transactions on Software Engineering and Methodology (TOSEM), 2025. (Accepted) [PDF]
[TOSEM, 2025] Ziliang Wang, Ge Li, Jia Li, Jia Li, Meng Yan, Yingfei Xiong, Zhi Jin; M2CVD: Enhancing Vulnerability Understanding through Multi-Model Collaboration for Code Vulnerability Detection; ACM Transactions on Software Engineering and Methodology (TOSEM), 2025. (Accepted) [PDF] (Cited by 2)
[NeurIPS 2025] Kechi Zhang, Ge Li, Jia Li, Huangzhao Zhang, Yihong Dong, Jingjing Xu, Zhi Jin; StackTrans: From Large Language Model to Large Pushdown Automata Model; Proceedings of the 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025), San Diego, USA, Dec. 2 - 7, 2025. arXiv:2507.15343
[NeurIPS 2025] Yihong Dong, Ge Li, Xue Jiang, Yongding Tao, Kechi Zhang, Hao Zhu, Huanyu Liu, Jiazheng Ding, Jia Li, Jinliang Deng, Hong Mei; FANformer: Improving Large Language Models Through Effective Periodicity Modeling; Proceedings of the 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025), San Diego, USA, Dec. 2 - 7, 2025. arXiv: 2502.21309
[NeurIPS 2025] Yihong Dong, Ge Li, Yongding Tao, Xue Jiang, Kechi Zhang, Jia Li, Jing Su, Jun Zhang, Jingjing Xu; FAN: Fourier Analysis Networks; Proceedings of the 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025), San Diego, USA, Dec. 2 - 7, 2025. arXiv: 2410.02675 (Cited by 40)
[NeurIPS 2025] Huanyu Liu, Jia Li, Hao Zhu, Kechi Zhang, Yihong Dong, Ge Li; SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning; Proceedings of the 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025), San Diego, USA, Dec. 2 - 7, 2025. arXiv: 2505.16368 (Cited by 4)
[ASE 2025] Jia Li, Hao Zhu, Huanyu Liu, Xianjie Shi, He Zong, Yihong Dong, Kechi Zhang, Siyuan Jiang, Zhi Jin, Ge Li; aiXcoder-7B-v2: Training LLMs to Fully Utilize the Long Context in Repository-level Code Completion; Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE 2025), Seoul, South Korea, Nov. 16 - 20, 2025. arXiv: 2503.15301 (Cited by 12)
[ACL 2025] Kechi Zhang, Ge Li, Yihong Dong, Jingjing Xu, Jun Zhang, Jing Su, Yongfei Liu, Zhi Jin; CodeDPO: Aligning Code Models with Self Generated and Verified Source Code; Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, Jul. 27 - Aug. 1, 2025. (Cited by 35)
[ACL 2025] Yihong Dong, Yuchen Liu, Xue Jiang, Zhi Jin, Ge Li; Rethinking Repetition Problems in Code Generation; Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, Jul. 27 - Aug. 1, 2025. (Cited by 6)
[ACL 2025] Kechi Zhang, Ge Li, Jia Li, Yihong Dong, Jia Li, Zhi Jin; Focused-DPO: Enhancing Code Generation Through Focused Preference Optimization on Error-Prone Points; Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, Jul. 27 - Aug. 1, 2025. (Cited by 13)
[ACL 2025] Kaibo Liu, Zhenpeng Chen, Yiyang Liu, Jie Zhang, Mark Harman, Yudong Han, Yun Ma, Yihong Dong, Ge Li, Gang Huang; LLM-Powered Test Case Generation for Detecting Bugs in Plausible Programs; Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, Jul. 27 - Aug. 1, 2025. (Cited by 8)
[ACL 2025] Jia Li, Xuyuan Guo, Lei Li, Kechi Zhang, Ge Li, Jia Li, Zhengwei Tao, Fang Liu, Chongyang Tao, Yuqi Zhu, Zhi Jin; Benchmarking Long-Context Language Models on Long Code Understanding; Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, Jul. 27 - Aug. 1, 2025. (Cited by 1)
[TOSEM, 2025] Jia Li, Chongyang Tao, Jia Li♂, Ge Li, Zhi Jin, Huangzhao Zhang, Zheng Fang, Fang Liu; Large Language Model-Aware In-Context Learning for Code Generation; ACM Transactions on Software Engineering and Methodology (TOSEM), Vol. 34, Iss. 7, Aug. 14, 2025, pp. 1-33. [PDF] (Cited by 66)
[ICSE 2025] Siyuan Jiang, Jia Li, He Zong, Huanyu Liu, Hao Zhu, Shukai Hu, Erlu Li, Jiazheng Ding, Yu Han, Wei Ning, Ge Li; aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Completion; Proceedings of the 47th IEEE/ACM International Conference on Software Engineering (ICSE), Ottawa, Ontario, Canada, Apr. 27 - May 3, 2025. (Cited by 11)
[ICSE 2025] Xue Jiang, Yihong Dong, Yongding Tao, Huanyu Liu, Zhi Jin, Ge Li; ROCODE: Integrating Backtracking Mechanism and Program Analysis in Large Language Models for Code Generation; Proceedings of the 47th IEEE/ACM International Conference on Software Engineering (ICSE), Ottawa, Ontario, Canada, Apr. 27 - May 3, 2025. (Cited by 17)
[EMSE, 2025] Kechi Zhang, Jia Li, Zhuo Li, Zhi Jin, Ge Li; Transformer-based Code Model with Compressed Hierarchy Representation; Empirical Software Engineering (EMSE), Vol. 30, Iss. 2, Mar. 2025, pp.1-43. [PDF] (Cited by 3)
[EMSE, 2025] Jia Li, Zheng Fang, Xianjie Shi, Zhi Jin, Fang Liu, Yunfei Zhao, Ge Li; SCodeSearcher: Soft Contrastive Learning for Code Search; Empirical Software Engineering, Vol. 30, Iss. 3, May 2025, pp. 1-23. [PDF] (Cited by 2)
[TOSEM, 2025] Yuwei Zhang, Zhi Jin, Ying Xing, Ge Li, Fang Liu, Jiaxin Zhu, Wensheng Dou, Jun Wei; PATCH: Empowering Large Language Model with Programmer-Intent Guidance and Collaborative-Behavior Simulation for Automatic Bug Fixing; ACM Transactions on Software Engineering and Methodology (TOSEM), 2025. [PDF] (Cited by 11)
[TOSEM, 2025] Jia Li, Ge Li, Yongmin Li, Zhi Jin; Structured Chain-of-Thought Prompting for Code Generation; ACM Transactions on Software Engineering and Methodology (TOSEM), Vol. 34, Iss. 2, Jan. 2025, pp.1-23. [PDF] (Cited by 299)
[TOSEM, 2024] Yihong Dong, Jiazheng Ding, Xue Jiang, Ge Li, Zhuo Li, Zhi Jin; Evaluating Code Generation by Learning Code Execution; ACM Transactions on Software Engineering and Methodology (TOSEM), Vol. 33, No. 7, Article 182, Sep. 2024. [PDF] (Cited by 118)
[ASE 2024] Zejun Wang, Kaibo Liu, Ge Li, Zhi Jin; SlicePromptTest4J: High-coverage Test Generation using LLM via Method Slicing; Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024), Sacramento, California, United States, Oct. 27 - Nov. 1, 2024, pp.1258 - 1268. [PDF]
[NeurIPS 2024] Jia Li, Ge Li, Xuanming Zhang, Yunfei Zhao, Yihong Dong, Zhi Jin, Binhua Li, Fei Huang, Yongbin Li; EvoCodeBench: An Evolving Code Generation Benchmark with Domain-Specific Evaluations; Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024); Vancouver, Canada, Dec. 10-15, 2024. [PDF] (Cited by 39)
[TOSEM, 2024] Jia Li, Yunfei Zhao, Yongmin Li, Ge Li, Zhi Jin; AceCoder: An Effective Prompting Technique Specialized in Code Generation; ACM Transactions on Software Engineering and Methodology (TOSEM), Vol. 33, No. 8, Article 204, Nov. 2024. [PDF] (Cited by 58)
[TOSEM, 2024] Yihong Dong, Xue Jiang, Zhi Jin, Ge Li; Self-collaboration Code Generation via ChatGPT; ACM Transactions on Software Engineering and Methodology (TOSEM), Vol. 33, No. 7, Article 189, Sep. 2024. [PDF] (Cited by 463)
[TOSEM, 2024] Xue Jiang, Yihong Dong, Lecheng Wang, Zheng Fang, Qiwei Shang, Ge Li, Zhi Jin, Wenpin Jiao; Self-Planning Code Generation with Large Language Model; ACM Transactions on Software Engineering and Methodology (TOSEM), Vol. 33, No. 7, Article 182, Sep. 2024. [PDF] (Cited by 326)
[JSEP, 2024] Huangzhao Zhang, Zhuo Li, Zhi Jin, Ge Li; WELL: Applying Bug Detectors to Bug Localization via Weakly Supervised Learning; Journal of Software: Evolution and Process, Vol. 36, No. 9, Sep 01, 2024. doi: 10.1002/smr.2669. pp 1-23. [PDF] (Cited by 1)
[ACL 2024] Kechi Zhang, Ge Li, Huangzhao Zhang, Zhi Jin; HiRoPE: Length Extrapolation for Code Models Using Hierarchical Position; Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), Bangkok, Thailand, Aug. 11-16, 2024. [PDF] (Cited by 15)
[ACL 2024] Kechi Zhang, Jia Li, Ge Li, Xianjie Shi, Zhi Jin; CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges; Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), Bangkok, Thailand, Aug. 11-16, 2024. [PDF] (Cited by 228)
[ACL 2024] Yihong Dong, Xue Jiang, Huanyu Liu, Zhi Jin, Ge Li; Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models; Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), Bangkok, Thailand, Aug. 11-16, 2024. [PDF] (Cited by 186)
[ACL 2024] Jia Li, Ge Li, Yunfei Zhao, Yongmin Li, Zhi Jin, Hao Zhu, Huanyu Liu, Kaibo Liu, Lecheng Wang, Zheng Fang, Lanshen Wang, Jiazheng Ding, Xuanming Zhang, Yihong Dong, Yuqi Zhu; DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories; Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), Bangkok, Thailand, Aug. 11-16, 2024. [PDF] (Cited by 54)
[ACL 2024] Yihong Dong, Kangcheng Luo, Xue Jiang, Zhi Jin, Ge Li; PACE: Improving Prompt with Actor-Critic Editing for Large Language Model; Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), Bangkok, Thailand, Aug. 11-16, 2024. [PDF] (Cited by 31)
[FSE 2024] Bolun Li, Zhihong Sun, Tao Huang, Hongyu Zhang, Yao Wan, Ge Li, Zhi Jin, Chen Lyu; IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code Completion; Proceedings of the 2024 ACM International Conference on the Foundations of Software Engineering (FSE), Porto de Galinhas, Brazil, July 15-19, 2024. [PDF] (Cited by 22)
[FSE 2024] Zhen Yang, Fang Liu, Zhongxing Yu, Jacy Wai Keung, Jia Li, Shuo Liu, Yifan Hong, Xiaoxue Ma, Zhi Jin, Ge Li; Exploring and Unleashing the Power of Large Language Models in Automated Code Translation; Proceedings of the 2024 ACM International Conference on the Foundations of Software Engineering (FSE), Porto de Galinhas, Brazil, July 15-19, 2024. [PDF] (Cited by 142)
[TSE, 2024] Xin-Cheng Wen, Cuiyun Gao, Feng Luo, Haoyu Wang, Ge Li, and Qing Liao; LIVABLE: Exploring Long-Tailed Classification of Software Vulnerability Types; IEEE Transactions on Software Engineering (TSE), Vol. 50, Iss. 6, Jun. 2024, pp. 1325-1339. [PDF] (Cited by 27)
[LREC-COLING 2024] Zhihong Sun, Chen Lyu, Yao Wan, Hongyu Zhang, Ge Li, Zhi Jin; Enhancing Code Generation Performance of Smaller Models by Distilling the Reasoning Ability of LLMs; Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING), Torino, Italia, May 20-25, 2024. [PDF] (Cited by 27)
[ICPC 2024] Tao Huang, Zhihong Sun, Zhi Jin, Ge Li, Chen Lyu; Knowledge-Aware Code Generation with Large Language Models; Proceedings of the 32nd ACM/IEEE International Conference on Program Comprehension (ICPC), Lisbon, Portugal, April 15-16, 2024. [PDF] (Cited by 29)
[ICSE 2024] Tao Huang, Zhihong Sun, Zhi Jin, Ge Li, Chen Lyu; KareCoder: A New Knowledge-Enriched Code Generation System; Proceedings of the 46th ACM/IEEE International Conference on Software Engineering (ICSE), Lisbon, Portugal, April 14-20, 2024. (Short) [PDF] (Cited by 4)
[JSEP, 2024] Huangzhao Zhang, Shuai Lu, Zhuo Li, Zhi Jin, Lei Ma, Yang Liu, Ge Li; CodeBERT-Attack: Adversarial Attack against Source Code Deep Learning Models via Pre-trained Model; Journal of Software: Evolution and Process, Vol. 36, Iss. 3, Mar. 2024. [PDF] (Cited by 21)
[SCIS, 2024] Huangzhao Zhang, Kechi Zhang, Zhuo Li, Jia Li, Jia Li, Yongmin Li, Yunfei Zhao, Yuqi Zhu, Fang Liu, Ge Li, Zhi Jin; Deep Learning for Code Generation: A Survey; Science China Information Sciences (SCIS), doi: 10.1007/s11432-023-3956-3, Feb 6, 2024. [PDF] (Cited by 23)
[TOSEM, 2024] Jia Li, Zhuo Li, Huangzhao Zhang, Ge Li, Zhi Jin, Xing Hu, Xin Xia; Poison Attack and Poison Detection on Deep Source Code Processing Models; ACM Transactions on Software Engineering and Methodology (TOSEM), Vol. 33, No. 62, Mar. 14, 2024, pp 1-31. [PDF] (Cited by 41)
[AAAI 2024] Yuqi Zhu, Jia Allen Li, Ge Li, Yunfei Zhao, Jia Li, Zhi Jin, Hong Mei; Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models; Proceedings of the 38th Annual AAAI Conference on Artificial Intelligence (AAAI), Vancouver, Canada, Feb 20-27, 2024. [PDF] (Cited by 89)
[ASEJ, 2023] Zejun Wang, Fang Liu, Yiyang Hao, Zhi Jin; AdaComplete: improve DL-based code completion method’s domain adaptability; Automated Software Engineering (ASEJ), Vol. 30, No. 1, Mar 06, 2023, pp 28-39. [PDF]
[Internetware 2023] Jia Li, Fang Liu, Jia Allen Li, Yunfei Zhao, Ge Li, and Zhi Jin; Mcodesearcher: Multi-view Contrastive Learning for Code Search; Proceedings of the 14th Asia-Pacific Symposium on Internetware (Internetware), Hangzhou, China, August 4-6, 2023. [PDF] (Cited by 6)
[Internetware 2023] Yunfei Zhao, Yihong Dong, Ge Li; Seq2Seq or Seq2Tree: Generating Code Using Both Paradigms via Mutual Learning; Proceedings of the 14th Asia-Pacific Symposium on Internetware (Internetware), Hangzhou, China, August 4-6, 2023, pp 238 - 248. [PDF] (Cited by 4)
[ECAI 2023] Yihong Dong, Ge Li, Xue Jiang, Zhi Jin; Antecedent Predictions Are More Important Than You Think: An Effective Method for Tree-Based Code Generation; Proceedings of the 26th European Conference on Artificial Intelligence (ECAI), Kraków, Poland, Sept. 30 - Oct. 4, 2023. pp 565-574. [PDF] (Cited by 3)
[ASE 2023] Jia Li, Chongyang Tao, Zhi Jin, Fang Liu, Jia Allen Li, Ge Li; ZC3: Zero-Shot Cross-Language Code Clone Detection; Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), Kirchberg, Luxembourg, September 11-15, 2023. [PDF] (Cited by 15)
[ACL 2023] Kechi Zhang, Zhuo Li, Jia Allen Li, Ge Li, Zhi Jin; Self-Edit: Fault-Aware Code Editor for Code Generation; Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), Toronto, Canada, July 9-14, 2023. [PDF] (Cited by 171)
[TOSEM, 2023] Jia Allen Li, Ge Li, Zhuo Li, Zhi Jin, Xing Hu, Kechi Zhang, Zhiyi Fu; CodeEditor: Learning to Edit Source Code with Pre-trained Models; ACM Transactions on Software Engineering and Methodology (TOSEM), Vol. 32, No. 6, May 22, 2023, pp 143-165. [PDF] (Cited by 52)
[ISSTA 2023] Yihong Dong, Ge Li, Jiazheng Ding, Zhi Jin; CODEP: Grammatical Seq2Seq Model for General-Purpose Code Generation; Proceedings of the ACM Sigsoft International Symposium on Software Testing and Analysis (ISSTA'23), Seattle, Washington, United States, July 17-21, 2023. [PDF] (Cited by 35)
[ICSE 2023] Jia Allen Li, Yongmin Li, Ge Li, Zhi Jin, Xing Hu; SkCoder: A Sketch-based Approach for Automatic Code Generation; Proceedings of the 45th International Conference on Software Engineering (ICSE), Melbourne, Australia, May 14-20, 2023. [PDF] (Cited by 97)
[JSEP, 2023] Huangzhao Zhang, Shuai Lu, Zhi Jin, Lei Ma, Zhuo Li, Yang Liu, Ge Li; CodeBERT-Attack: Adversarial Attack against Source Code Deep Learning Models via Pre-Trained Model; Journal of Software: Evolution and Process, Vol. 36, No. 3, Apr. 11, 2023. pp 1-29. [PDF] (Cited by 21)
[ICPC 2023] Kechi Zhang, Zhuo Li, Zhi Jin, Ge Li; Implant Global and Local Hierarchy Information to Sequence based Code Representation Models; Proceedings of the 31st IEEE/ACM International Conference on Program Comprehension (ICPC), Melbourne, Australia, May 15-16, 2023. (ACM SIGSOFT Distinguished Paper Award) [PDF] (Cited by 10)
[SANER 2023] Wenhan Wang, Kechi Zhang, Ge Li, Shangqing Liu, Anran Li, Zhi Jin, Yang Liu; Learning Program Representations with a Tree-Structured Transformer; Proceedings of the 30th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Macao SAR, China, March 21st-24th, 2023. [PDF] (Cited by 21)
[EMNLP 2022] Han Peng, Ge Li, Yunfei Zhao and Zhi Jin; Rethinking Positional Encoding in Tree Transformer for Code Representation; Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), Abu Dhabi, December 7–11, 2022, pp 3204 - 3214. [PDF] (Cited by 26)
[NeurIPS 2022] Haojie Zhang, Ge Li, Jia Allen Li, Zhongjin Zhang, Yuqi Zhu, Zhi Jin; Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively; Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS), Online, Nov. 29 - Dec. 1, 2022. [PDF] (Cited by 47)
[FSE 2022] Sijie Shen, Xiang Zhu, Yihong Dong, Qizhi Guo, Yankun Zhen, Ge Li; Incorporating Domain Knowledge through Task Augmentation for Front-End JavaScript Code Generation; Proceedings of The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), Singapore, 14th - 16th November 2022. [PDF] (Cited by 39)
[FSE 2022] Lin Shi, Fangwen Mu, Xiao Chen, Song Wang, Junjie Wang, Ye Yang, Ge Li, Xin Xia, Qing Wang; Are We Building on the Rock? On the Importance of Data Preprocessing for Code Summarization; Proceedings of The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), Singapore, 14th - 16th November 2022. [PDF] (Cited by 59)
[CIKM 2022] Jia Li, Yuyuan Zhao, Zhi Jin, Ge Li, Tao Shen, Zhengwei Tao, Chongyang Tao; SK2: Integrating Implicit Sentiment Knowledge and Explicit Syntax Knowledge for Aspect-Based Sentiment Analysis; Proceedings of 31st ACM International Conference on Information and Knowledge Management, Atlanta, Georgia, USA, Oct. 17-21, 2022. [PDF] (Cited by 20)
[TOSEM, 2022] Hao Yu, Xing Hu, Ge Li, Ying Li, Qianxiang Wang, Tao Xie; Assessing and Improving an Evaluation Dataset for Detecting Semantic Code Clones via Deep Learning; ACM Transactions on Software Engineering and Methodology (TOSEM), Vol. 31, No. 4, Article 62, July, 2022, pp 1–25. [PDF] (Cited by 12)
[ICSE 2022] Fang Liu, Ge Li, Zhiyi Fu, Shuai Lu, Yiyang Hao, Zhi Jin; Learning to Recommend Method Names with Global Context; Proceedings of the 44th International Conference on Software Engineering (ICSE 2022), Pittsburgh, PA, USA, May 21-29, 2022. [PDF] (Cited by 39)
[ICSE 2022] Hao Yu, Yiling Lou, Ke Sun, Dezhi Ran, Tao Xie, Dan Hao, Ying Li, Ge Li, Qianxiang Wang; Automated Assertion Generation via Information Retrieval and Its Integration with Deep Learning; Proceedings of the 44th International Conference on Software Engineering (ICSE 2022), Pittsburgh, PA, USA, May 21-29, 2022. [PDF] (Cited by 65)
[ICPC 2022] Kechi Zhang, Wenhan Wang, Huangzhao Zhang, Ge Li, Zhi Jin; Learning to Represent Programs with Heterogeneous Graphs; Proceedings of the 30th ACM/IEEE International Conference on Program Comprehension (ICPC), Pittsburgh, PA, USA, May 16-17, 2022. [PDF] (Cited by 83)
[EMSE, 2022] Fang Liu, Ge Li, Bolin Wei, Xin Xia, Zhiyi Fu, Zhi Jin; A Unified Multi-task Learning Model for AST-level and Token-level Code Completion; Empirical Software Engineering (EMSE), Vol. 27, Iss. 4, Apr. 18, 2022, pp. 1-38. [PDF] (Cited by 38)
[TOSEM, 2022] Huangzhao Zhang, Zhiyi Fu, Ge Li, Lei Ma, Zhehao Zhao, Hua’an Yang, Yizhe Sun, Yang Liu, Zhi Jin; Towards Robustness of Deep Program Processing Models—Detection, Estimation, and Enhancement; ACM Transactions on Software Engineering and Methodology (TOSEM), Vol. 31, Iss. 3, Apr. 9, 2022, pp. 1-40. [PDF] (Cited by 78)
[TSE, 2022] Hui Liu, Mingzhu Shen, Jiaqi Zhu, Nan Niu, Ge Li, Lu Zhang; Deep Learning Based Program Generation From Requirements Text: Are We There Yet?; IEEE Transactions on Software Engineering (TSE), Vol. 48, Iss. 4, Apr. 1, 2022. [PDF] (Cited by 69)
[JSS, 2022] Zhehao Zhao, Bo Yang, Ge Li, Huai Liu, Zhi Jin; Precise Learning of Source Code Contextual Semantics via Hierarchical Dependence Structure and Graph Attention Networks; Journal of Systems and Software, Volume 184, February 2022. [PDF] (Cited by 33)
[NeurIPS 2021] Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shujie Liu; CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation; Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS), Online, December 6-14, 2021. [PDF] (Cited by 1226)
[NeurIPS 2021] Han Peng, Ge Li, Wenhan Wang, Yunfei Zhao, Zhi Jin; Integrating Tree Path in Transformer for Code Representation; Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS), Online, December 6-14, 2021. [PDF] (Cited by 68)
[ASE 2021] Jia Allen Li, Yongmin Li, Ge Li, Xing Hu, Xin Xia, Zhi Jin; EDITSUM: A Retrieve-and-Edit Framework for Source Code Summarization; Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), Melbourne, Australia, Sun 14 - Sat 20 November, 2021. [PDF] (Cited by 94)
[IJCAI 2020] Wenjie Zhang, Zeyu Sun, Qihao Zhu, Ge Li, Shaowei Cai, Yingfei Xiong, Lu Zhang; NLocalSAT: Boosting Local Search with Solution Prediction; Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI), Yokohama, Japan, January 7-15, 2021, pp. 1177-1183. [PDF]
[ASE 2020] Fang Liu, Ge Li, Yunfei Zhao, Zhi Jin; Multi-task Learning based Pre-trained Language Model for Code Completion; Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), Melbourne, Australia, Sep. 21-25, 2020. [PDF] (Cited by 277)
[ASE 2020] Bolin Wei, Yongmin Li, Ge Li, Xin Xia, Zhi Jin; Retrieve and Refine: Exemplar-based Neural Comment Generation; Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), Melbourne, Australia, Sep. 21-25, 2020. [PDF] (Cited by 154)
[TOSEM, 2020] Wenhan Wang, Ge Li, Sijie Shen, Xin Xia, Zhi Jin; Modular Tree Network for Source Code Representation Learning; ACM Transactions on Software Engineering and Methodology (TOSEM), Vol. 29, No. 4, Article 31, September 2020. [PDF] (Cited by 66)
[ICPC 2020] Fang Liu, Ge Li, Xin Xia, Bolin Wei, Zhi Jin; A Self-Attentional Neural Architecture for Code Completion with Multi-Task Learning; Proceedings of the 28th IEEE/ACM International Conference on Program Comprehension (ICPC), Seoul, South Korea, May 23-24, 2020, Pages 37–47. (ACM SIGSOFT Distinguished Paper Award) [PDF] (Cited by 101)
[SANER 2020] Wenhan Wang, Ge Li, Bo Ma, Xin Xia, Zhi Jin; Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree; Proceedings of the 27th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), London, Ontario, Canada, February 18-21, 2020. [PDF] (Cited by 394)
[AAAI 2020] Huangzhao Zhang, Zhuo Li, Ge Li, Lei Ma, Yang Liu, Zhi Jin; Generating Adversarial Examples for Holding Robustness of Source Code Processing Models; Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI), New York, USA, Feb 7-12, 2020. [PDF] (Cited by 158)
[NeurIPS 2019] Bolin Wei, Ge Li, Xin Xia, Zhiyi Fu, Zhi Jin; Code Generation as a Dual Task of Code Summarization; Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), Vancouver, Canada, Dec 8-14, 2019, pp.6563-6573. [PDF] (Cited by 279)
[ICASSP 2019] Bolin Wei, Shuai Lu, Lili Mou, Hao Zhou, Pascal Poupart, Ge Li, Zhi Jin; Why Do Neural Dialog Systems Generate Short and Meaningless Replies? A Comparison between Dialog and Translation; Proceedings of 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, May 12-17, 2019, pp.7290-7294. [PDF] (Cited by 38)
[COMPSAC 2019] Xing Hu, Rui Men, Ge Li, Zhi Jin; Deep-AutoCoder: Learning to Complete Code Precisely with Induced Code Tokens; Proceedings of 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), Milwaukee, Wisconsin, USA, Jul. 15-19, 2019. [PDF] (Cited by 11)
[EMSE, 2019] Xing Hu, Ge Li, Xin Xia, David Lo, Zhi Jin; Deep Code Comment Generation with Hybrid Lexical and Syntactical Information; Empirical Software Engineering (EMSE), Vol. 25, Iss. 3, Jun. 18, 2019. pp 2179–2217. [PDF] (Cited by 338)
[ICPC 2019] Hao Yu, Wing Lam, Long Chen, Ge Li, Tao Xie, Qianxiang Wang; Neural Detection of Semantic Code Clones via Tree-based Convolution; Proceedings of the 27th International Conference on Program Comprehension (ICPC), Montreal, QC, Canada, May 25-31, 2019, pp. 70-80. [PDF] (Cited by 184)
[AAAI 2019] Zeyu Sun, Qihao Zhu, Lili Mou, Yingfei Xiong, Ge Li, Lu Zhang; A Grammar-Based Structural CNN Decoder for Code Generation; Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI), Honolulu, Hawaii, USA, Jan. 27 – Feb. 1, 2019. [PDF] (Cited by 166)
[ICPC 2018] Xiaochen Li, He Jiang, Dong Liu, Zhilei Ren, Ge Li; Unsupervised Deep Bug Report Summarization; Proceedings of the 26th Conference on Program Comprehension (ICPC), 2018. pp. 144-155. [PDF] (Cited by 82)
[IJCAI 2018] Xing Hu, Ge Li, Xin Xia, David Lo, Shuai Lu, Zhi Jin; Summarizing Source Code with Transferred API Knowledge; Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), July 13-19, 2018, Stockholm, Sweden. pp. 2269-2275. [PDF] (Cited by 377)
[ICPC 2018] Xing Hu, Ge Li, Xin Xia, David Lo, Zhi Jin; Deep Code Comment Generation; Proceedings of IEEE/ACM 26th International Conference on Program Comprehension (ICPC), Gothenburg, Sweden, 27-28 May 2018, pp.200-210. (ACM SIGSOFT Distinguished Paper Award) [PDF] (Cited by 948)
[KSEM 2017] Yunchuan Chen, Ge Li and Zhi Jin; Learning Sparse Overcomplete Word Vectors without Intermediate Dense Representations; Proceedings of the 10th International Conference on Knowledge Science, Engineering and Management (KSEM), Melbourne, Australia, August 19-20, 2017. [PDF] (Cited by 5)
[KSEM 2017] Yangyang Lu, Ge Li, Zelong hao, Lingfeng Wen and Zhi Jin; Learning To Infer API Mappings From API Documents; Proceedings of the 10th International Conference on Knowledge Science, Engineering and Management (KSEM), Melbourne, Australia, August 19-20, 2017. [PDF] (Cited by 17)
[KSEM 2017] Wenhao Huang, Ge Li and Zhi Jin; Improved Knowledge Base Completion by the Path-Augmented TransR Model; Proceedings of the 10th International Conference on Knowledge Science, Engineering and Management (KSEM), Melbourne, Australia, August 19-20, 2017. [PDF] (Cited by 18)
[COLING 2016] Lili Mou, Yiping Song, Rui Yan, Ge Li, Lu Zhang, Zhi Jin; Sequence to Backward and Forward Sequences: A Content-Introducing Approach to Generative Short-Text Conversation; Proceedings of the 26th International Conference on Computational Linguistics (COLING), Osaka, Japan, December 11-17, 2016, pp. 3349–3358. [PDF] (Cited by 303)
[COLING 2016] Yan Xu, Ran Jia, Lili Mou, Ge Li, Yunchuan Chen, Yangyang Lu and Zhi Jin; Improved Relation Classification by Deep Recurrent Neural Networks with Data Augmentation; Proceedings of the 26th International Conference on Computational Linguistics (COLING), Osaka, Japan, December 11-17, 2016, pp. 1461–1470. [PDF] (Cited by 304)
[EMNLP 2016] Lili Mou, Zhao Meng, Rui Yan, Ge Li, Yan Xu, Lu Zhang, Zhi Jin; How Transferable are Neural Networks in NLP Applications?; Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), Austin, Texas, November 1-5, 2016, pp. 479–489. [PDF] (Cited by 3)
[ACL 2016] Yunchuan Chen, Lili Mou, Yan Xu, Ge Li, Zhi Jin; Compressing Neural Language Models by Sparse Word Representations; Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), Berlin, Germany, August 7-12, 2016, pp. 226–235. [PDF] (Cited by 35)
[ACL 2016] Lili Mou, Rui Men, Ge Li, Yan Xu, Lu Zhang, Rui Yan, Zhi Jin; Natural Language Inference by Tree-Based Convolution and Heuristic Matching; Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), Berlin, Germany, August 7-12, 2016, pp. 130–136. [PDF] (Cited by 431)
[AAAI 2016] Lili Mou, Ge Li, Lu Zhang, Tao Wang, Zhi Jin; Convolutional Neural Networks over Tree Structures for Programming Language Processing; Proceedings of 2016 AAAI Conference on Artificial Intelligence, pages 1287-1293, Phoenix, USA, January 12-18, 2016. [PDF] (Cited by 1170)
[CIKM 2016] Lili Mou, Ran Jia, Yan Xu, Ge Li, Lu Zhang, Zhi Jin; Distilling Word Embeddings: An Encoding Approach; Proceedings of the 25th ACM International Conference on Information and Knowledge Management, Indianapolis, USA, October 24-28, 2016. [PDF] (Cited by 31)
[KSEM 2016] Zhao Meng, Lili Mou, Ge Li and Zhi Jin; Context-Aware Tree-Based Convolutional Neural Networks for Natural Language Inference; Proceedings of 9th International Conference on Knowledge Science, Engineering and Management, Passau, Germany, October 4-8, 2016, LNAI 9983, pp. 515–526. [PDF] (Cited by 1)
[KSEM 2016] Yangyang Lu, Ge Li, Rui Miao, Zhi Jin; Learning Embeddings Of API Tokens To Facilitate Deep Learning Based Program Processing; Proceedings of 9th International Conference on Knowledge Science, Engineering and Management, Passau, Germany, October 4-8, 2016, LNAI 9983, pp. 527–539. [PDF] (Cited by 4)
[EMNLP 2015] Hao Peng, Lili Mou, Ge Li, Yan Xu, Lu Zhang, Zhi Jin; A Comparative Study on Regularization Strategies for Embedding-based Neural Networks; Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, September 17–21, 2015. [PDF] (Cited by 42)
[KSEM 2015] Hao Peng, Lili Mou, Ge Li, Yuxuan Liu, Lu Zhang and Zhi Jin; Building Program Vector Representations for Deep Learning; Proceedings of the 8th International Conference on Knowledge Science, Engineering and Management, Chongqing, China, October 28-30, 2015, pp. 547-553. [PDF] (Cited by 227)
[EMNLP 2015] Lili Mou, Hao Peng, Ge Li, Yan Xu, Lu Zhang, Zhi Jin; Discriminative Neural Sentence Modeling by Tree-Based Convolution; Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17-21 September, 2015. pp. 2315–2325. [PDF] (Cited by 160)
[EMNLP 2015] Yan Xu, Lili Mou, Ge Li, Lu Zhang, Zhi Jin; Classifying Relations via Long Short Term Memory Networks along Shortest Dependency Paths; Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, September 17–21, 2015. [PDF] (Cited by 901)
[KSEM 2014] Lili Mou, Ge Li, Zhi Jin, Lu Zhang; Verification based on Hyponymy Hierarchical Characteristics for Web-based Hyponymy Discovery; Proceedings of the International Conference on Knowledge Science, Engineering and Management 2014, Lecture Notes in Computer Science, Volume 8793, 2014, pp. 81-92. [PDF]
[IJSEKE 2014] Yan Xu, Ge Li, Lili Mou, Yangyang Lu; Learning Non-taxonomy Relations on Demand for Ontology Extension; International Journal of Software Engineering and Knowledge Engineering (IJSEKE), October 2014, Vol. 24, No. 08, pp. 1159-1175. [PDF] (Cited by 10)

Basic Papers on Deep Learning based Code Processing

[arXiv 2015] Lili Mou, Rui Men, Ge Li, Lu Zhang, Zhi Jin; On End-to-End Program Generation from User Intention by Deep Neural Networks; arXiv preprint, arXiv:1510.07211, 2015. (Cited by 84)
[arXiv 2014] Lili Mou, Ge Li, Zhi Jin, Lu Zhang, Tao Wang; TBCNN: A Tree-Based Convolutional Neural Network for Programming Language Processing; arXiv preprint, arXiv:1409.5718, 2014.
[arXiv 2014] Lili Mou, Ge Li, Yuxuan Liu, Hao Peng, Zhi Jin, Yan Xu, Lu Zhang; Building Program Vector Representations for Deep Learning; arXiv preprint, arXiv:1409.3358, 2014. (Cited by 227)