Huimin Cui
Research Direction: Compiler Construction; Compiler Optimization; Heterogeneous Computing
Department: State Key Lab of Processors
Tutor Category: PhD advisor in Computer Architecture
Contact: cuihm@ict.ac.cn
Personal Page: https://cuihuimin.github.io/
About Me
Huimin Cui is a Professor in the State Key Lab of Processors, ICT, CAS, where she leads the programming languages and compilers group. She received her bachelor's and master's degrees from Tsinghua University in 2001 and 2004, respectively, and her PhD from ICT, CAS in 2012.
Huimin Cui's research interests include programming languages, compiler technology and program optimization. Her current work focuses on two areas: (1) compiler optimizations for heterogeneous architectures (including GPUs, NPUs, DPUs and other ASICs), especially for AI and big data applications; and (2) software and hardware co-design for new architectures, leveraging compiler analysis. A number of her works have been adopted in industry, including by Huawei and Sunway.
Huimin Cui serves on the Young Scientists Editorial Board of JCST. She has also served on the program committees of a number of major conferences in her field, including as Track Chair of Programming Models and System Software at ISC'20 and as a PC member of PPoPP, CC, CGO, ISCA (ERC), NPC, HIPS, ICPE, PACT, ICPP, Cluster and SC.
Our Team
Jiacheng Zhao, Compiler optimization for GPUs and NPUs
Ying Liu, Heterogeneous compilers
Fang Lv, Performance analysis and optimizations
Chenxi Wang, Programming languages and systems
Publications
-
Qiwu: Exploiting Ciphertext-Level SIMD Parallelism in Homomorphic Encryption Programs
By Zhongcheng Zhang, Ying Liu, Yuyang Zhang, Zhenchuan Chen, Jiacheng Zhao, Xiaobing Feng, Huimin Cui, Jingling Xue
CGO, 2025
-
Optimizing Dynamic-Shape Neural Networks on Accelerators via On-the-Fly Micro-Kernel Polymerization
By Feng Yu, Guangli Li, Jiacheng Zhao, Huimin Cui, Xiaobing Feng, Jingling Xue
ASPLOS, 2024
-
Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions
By Chunwei Xia, Jiacheng Zhao, Qianqi Sun, Zheng Wang, Yuan Wen, Teng Yu, Huimin Cui, Xiaobing Feng
ASPLOS, 2024
-
Reinvent Cloud Software Stacks for Resource Disaggregation (Cover Article)
By Chenxi Wang, Yizhou Shan, Pengfei Zuo, Huimin Cui
JCST, 2023
-
VTensor: Using Virtual Tensors to Build a Layout-Oblivious AI Programming Framework
By Feng Yu, Jiacheng Zhao, Huimin Cui, Xiaobing Feng, Jingling Xue
JCST, 2023
-
Portable and Scalable All-Electron Quantum Perturbation Simulations on Exascale Supercomputers
By Zhikun Wu, Yangjun Wu, Ying Liu, Honghui Shang, Yingxiang Gao, Zhongcheng Zhang, Yuyang Zhang, Yingchi Long, Xiaobing Feng, Huimin Cui
SC, 2023
-
Honeycomb: A Secure, Efficient GPU Execution Environment with Minimal TCB
By Haohui Mai, Jiacheng Zhao, Christos Kozyrakis, Mingyu Gao, Hongren Zheng, Quanxi Li, Zibin Liu, Cong Wang, Huimin Cui, Xiaobing Feng
OSDI, 2023
-
OpenCL-Accelerated First-Principles All-Electron Quantum Perturbation Calculations on HPC Resources
By Zhikun Wu, Honghui Shang, Yangjun Wu, Zhongcheng Zhang, Ying Liu, Yuyang Zhang, Yucheng Ouyang, Huimin Cui, Xiaobing Feng
Front. Chem. 11 (Sec. Theoretical and Computational Chemistry), 26 May 2023
-
Sirius: Harvesting Whole-Program Optimization Opportunities for DNNs.
By Yijin Li, Jiacheng Zhao, Qianqi Sun, Haohui Mai, Lei Chen, Wanlu Cao, Yanfan Chen, Zhicheng Li, Ying Liu, Xinyuan Zhang, Xiyu Shi, Jie Zhao, Jingling Xue, Huimin Cui, Xiaobing Feng
Sixth Conference on Machine Learning and Systems (MLSys), 2023
-
OCCAMY: Elastically Sharing a SIMD Co-processor Across Multiple CPU Cores.
By Zhongcheng Zhang, Yan Ou, Ying Liu, Chenxi Wang, Yongbin Zhou, Xiaoyu Wang, Yuyang Zhang, Yucheng Ouyang, Jiahao Shan, Ying Wang, Jingling Xue, Huimin Cui, Xiaobing Feng.
The ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2023
-
Scaling Poisson Solvers on Many Cores via MMEwald.
By Mingchuan Wu, Yangjun Wu, Honghui Shang, Ying Liu, Huimin Cui, Fang Li, Xiaohui Duan, Yunquan Zhang, Xiaobing Feng.
IEEE Trans. Parallel Distributed Syst. 33(8): 1888-1901, 2022
-
Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories.
By Chen Lei, Jiacheng Zhao, Chenxi Wang, Ting Cao, John Zigman, Haris Volos, Onur Mutlu, Fang Lv, Xiaobing Feng, Guoqing Harry Xu, Huimin Cui.
ACM Transactions on Computer Systems (TOCS), 2022
-
NRHI: A Concurrent Non-Rehashing Hash Index for Persistent Memory.
By Xinyu Li, Huimin Cui, Lei Liu.
ICCD 2021: 146-153
-
Accelerating all-electron ab initio simulation of Raman spectra for biological systems.
By Honghui Shang, Fang Li, Yunquan Zhang, Ying Liu, Libo Zhang, Mingchuan Wu, Yangjun Wu, Di Wei, Huimin Cui, Xin Liu, Fei Wang, Yuxi Ye, Yingxiang Gao, Shuang Ni, Xin Chen, Dexun Chen.
SC 2021: 41
-
DNNTune: Automatic Benchmarking DNN Models for Mobile-cloud Computing.
By Chunwei Xia, Jiacheng Zhao, Huimin Cui, Xiaobing Feng, Jingling Xue.
ACM Trans. Archit. Code Optim. 16(4): 49:1-49:26 (2020)
-
Bandwidth-Aware Loop Tiling for DMA-Supported Scratchpad Memory.
By Mingchuan Wu, Ying Liu, Huimin Cui, Qingfu Wei, Quanfeng Li, Limin Li, Fang Lv, Jingling Xue, Xiaobing Feng.
PACT 2020: 97-109
-
VTensor: Using Virtual Tensors to Build a Layout-oblivious AI Programming Framework.
By Feng Yu, Jiacheng Zhao, Huimin Cui, Xiaobing Feng, Jingling Xue.
PACT 2020: 345-346
-
Referee: A Pattern-Guided Approach for Auto Design in Compiler-Based Analyzers.
By Fang Lv, Hao Li, Lei Wang, Ying Liu, Huimin Cui, Jingling Xue, Xiaobing Feng.
SANER 2020: 1-12
-
PPOpenCL: a performance-portable OpenCL compiler with host and kernel thread code fusion.
By Ying Liu, Lei Huang, Mingchuan Wu, Huimin Cui, Fang Lv, Xiaobing Feng, Jingling Xue.
CC 2019: 2-16
-
Panthera: holistic memory management for big data processing over hybrid memories.
By Chenxi Wang, Huimin Cui, Ting Cao, John N. Zigman, Haris Volos, Onur Mutlu, Fang Lv, Xiaobing Feng, Guoqing Harry Xu.
PLDI 2019: 347-362
-
NVM Streaker: a fast and reconfigurable performance simulator for non-volatile memory-based memory architecture.
By Danqi Hu, Fang Lv, Chenxi Wang, Huimin Cui, Lei Wang, Ying Liu, Xiaobing Feng.
J. Supercomput. 74(8): 3875-3903 (2018)
-
Revisiting Loop Tiling for Datacenters: Live and Let Live.
By Jiacheng Zhao, Huimin Cui, Yalin Zhang, Jingling Xue, Xiaobing Feng.
ICS 2018: 328-340
-
Characterizing DNN Models for Edge-Cloud Computing.
By Chunwei Xia, Jiacheng Zhao, Huimin Cui, Xiaobing Feng.
IISWC 2018: 82-83
-
Automating the Exchangeability of Shared Data Abstractions.
By Jiange Zhang, Qian Wang, Qing Yi, Huimin Cui.
LCPC 2018: 185-192
-
On Retargeting the AI Programming Framework to New Hardwares.
By Jiacheng Zhao, Yisong Chang, Denghui Li, Chunwei Xia, Huimin Cui, Ke Zhang, Xiaobing Feng.
NPC 2018: 39-51
-
Lazygraph: lazy data coherency for replicas in distributed graph-parallel computation.
By Lei Wang, Liangji Zhuang, Junhang Chen, Huimin Cui, Fang Lv, Ying Liu, Xiaobing Feng.
PPoPP 2018: 276-289
-
Predicting Cross-Core Performance Interference on Multicore Processors with Regression Analysis.
By Jiacheng Zhao, Huimin Cui, Jingling Xue, Xiaobing Feng.
IEEE Trans. Parallel
-
Articulation points guided redundancy elimination for betweenness centrality.
By Lei Wang, Fan Yang, Liangji Zhuang, Huimin Cui, Fang Lv, Xiaobing Feng.
PPoPP 2016: 7:1-7:13
-
WiseThrottling: a new asynchronous task scheduler for mitigating I/O bottleneck in large-scale datacenter servers.
By Fang Lv, Lei Liu, Huimin Cui, Lei Wang, Ying Liu, Xiaobing Feng, Pen-Chung Yew.
J. Supercomput. 71(8): 3054-3093 (2015)
-
Hadoop+: Modeling and Evaluating the Heterogeneity for MapReduce Applications in Heterogeneous Clusters.
By Wenting He, Huimin Cui, Binbin Lu, Jiacheng Zhao, Shengmei Li, Gong Ruan, Jingling Xue, Xiaobing Feng, Wensen Yang, Youliang Yan.
ICS 2015: 143-153
-
Dynamic I/O-Aware Scheduling for Batch-Mode Applications on Chip Multiprocessor Systems of Cluster Platforms.
By Fang Lu, Huimin Cui, Lei Wang, Lei Liu, Chenggang Wu, Xiaobing Feng, Pen-Chung Yew.
J. Comput. Sci. Technol. 29(1): 21-37 (2014)
-
A collaborative divide-and-conquer K-means clustering algorithm for processing large data.
By Huimin Cui, Gong Ruan, Jingling Xue, Rui Xie, Lei Wang, Xiaobing Feng.
Conf. Computing Frontiers 2014: 20:1-20:10
-
Specializing Compiler Optimizations through Programmable Composition for Dense Matrix Computations.
By Qing Yi, Qian Wang, Huimin Cui.
MICRO 2014: 596-608
-
Layout-oblivious compiler optimization for matrix computations.
By Huimin Cui, Qing Yi, Jingling Xue, Xiaobing Feng.
-
An empirical model for predicting cross-core performance interference on multicore processors.
By Jiacheng Zhao, Xiaobing Feng, Huimin Cui, Youliang Yan, Jingling Xue.
PACT 2013: 201-212
-
A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs.
By Yang Yang, Huimin Cui, Xiaobing Feng, Jingling Xue.
J. Comput. Sci. Technol. 27(1): 57-74 (2012)
-
Extendable pattern-oriented optimization directives.
By Huimin Cui, Jingling Xue, Lei Wang, Yang Yang, Xiaobing Feng, Dongrui Fan.
ACM Trans. Archit. Code Optim. 9(3): 14:1-14:37 (2012)
-
Layout-oblivious optimization for matrix computations.
By Huimin Cui, Qing Yi, Jingling Xue, Xiaobing Feng.
PACT 2012: 429-430
-
A Highly Parallel Reuse Distance Analysis Algorithm on GPUs.
By Huimin Cui, Qing Yi, Jingling Xue, Lei Wang, Yang Yang, Xiaobing Feng.
IPDPS 2012: 1080-1092
-
Extendable pattern-oriented optimization directives.
By Huimin Cui, Jingling Xue, Lei Wang, Yang Yang, Xiaobing Feng, Dongrui Fan.
CGO 2011: 107-118
-
Automatic Library Generation for BLAS3 on GPUs.
By Huimin Cui, Lei Wang, Jingling Xue, Yang Yang, Xiaobing Feng.
IPDPS 2011: 255-265
-
Landing Stencil Code on Godson-T.
By Huimin Cui, Lei Wang, Dong-Rui Fan, Xiaobing Feng.
J. Comput. Sci. Technol. 25(4): 886-894 (2010)
-
An adaptive task creation strategy for work-stealing scheduling.
By Lei Wang, Huimin Cui, Yuelu Duan, Fang Lu, Xiaobing Feng, Pen-Chung Yew.
CGO 2010: 266-277
-
Optimized Register Renaming Scheme for Stack-Based x86 Operations.
By Xuehai Qian, He Huang, Zhenzhong Duan, Junchao Zhang, Nan Yuan, Yongbin Zhou, Hao Zhang, Huimin Cui, Dongrui Fan.
ARCS 2007: 43-56