Chenhe Gu

AI/ML Research Scientist | Multimodal LLMs & AI Safety

LinkedIn | GitHub

About

MS candidate specializing in AI safety, robustness, and trustworthy machine learning, with a focus on Multimodal Large Language Models (MLLMs). Experienced in developing adversarial attacks, improving model generalization, and building data valuation tools. Seeking a research role where this background can drive safer, more reliable AI systems.

Work Experience

Grader

University of California, Irvine

Mar 2024 - May 2024

Provided grading and evaluation support for Databases and Data Management courses at the School of Information and Computer Sciences.

  • Graded assignments and provided feedback for CS220P: Databases and Data Management (Fall 2024).
  • Supported CS122A: Introduction to Data Management (Spring 2024).
  • Helped faculty maintain consistent grading standards across both courses.

Research Intern

University of California, Santa Barbara

Feb 2023 - May 2024

Conducted research on the safety and robustness of Multimodal Large Language Models (MLLMs), developing novel adversarial attacks and techniques for improving model generalization.

  • Demonstrated that fine-tuning can break the safety alignment of MLLMs, exposing a fine-tuning risk with direct implications for AI safety.
  • Proposed the Dynamic Vision-Language Alignment (DynVLA) Attack, a transfer-based adversarial method that uses Gaussian kernel perturbations to generate adversarial examples.
  • Showed that these adversarial examples transfer to closed-source models such as Google Gemini, inducing them to generate attacker-specified text and exposing real-world security vulnerabilities in MLLMs.
  • Identified the roles of vision-language connector architecture, LLM size, and LLM type in selecting surrogate models that improve attack transferability.
  • Merged Low-Rank Adaptation (LoRA) modules to improve generalization, outperforming model soup in both in-distribution (ID) and out-of-distribution (OOD) accuracy in few-shot settings (see the sketch after this list).
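
A minimal sketch of the LoRA-merging idea from the last bullet, assuming a uniform average over adapters; the actual merging method and its weighting are those of the associated research, and all names here (`merge_loras`, `adapters`) are illustrative:

```python
import torch

def merge_loras(base_weight: torch.Tensor,
                adapters: list[tuple[torch.Tensor, torch.Tensor]],
                alpha: float, rank: int) -> torch.Tensor:
    """Merge several LoRA adapters into one base weight matrix.

    Each adapter contributes a low-rank update dW = (alpha / rank) * B @ A;
    here the updates are combined with a uniform average (assumption for
    illustration) before being added back to the shared base weight.
    """
    scale = alpha / rank
    delta = torch.zeros_like(base_weight)
    for A, B in adapters:            # A: (rank, in_dim), B: (out_dim, rank)
        delta += scale * (B @ A)
    return base_weight + delta / len(adapters)

# Toy usage: three adapters, e.g. fine-tuned on different tasks or seeds.
torch.manual_seed(0)
out_dim, in_dim, rank, alpha = 8, 16, 4, 8.0
W = torch.randn(out_dim, in_dim)
adapters = [(torch.randn(rank, in_dim), torch.randn(out_dim, rank))
            for _ in range(3)]
print(merge_loras(W, adapters, alpha, rank).shape)  # torch.Size([8, 16])
```

Under these assumptions, the merge differs from a plain model soup in that only the low-rank updates around a shared base model are averaged, not the full fine-tuned weights.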

Part-time Open Source Contributor

Cleanlab Inc.

Nov 2023 - Apr 2024

Contributed to Cleanlab's open-source library by developing Out-of-Distribution (OOD) detection methods and integrating a data valuation module.

  • Implemented the GEN OOD detection method, improving Cleanlab's OOD detection performance on high-resolution image datasets, including ImageNet.
  • Integrated a data valuation module that uses the KNN-Shapley value to score each training point's contribution (see the sketch after this list), now a core feature of the Cleanlab library.
  • Improved the utility and robustness of Cleanlab's data-centric AI tooling.
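
The KNN-Shapley valuation referenced above admits an exact closed form (Jia et al., 2019). Below is a minimal sketch for a single test point, assuming Euclidean distances; names are illustrative and do not reflect Cleanlab's API:

```python
import numpy as np

def knn_shapley(X_train, y_train, x_test, y_test, K=5):
    """Exact Shapley value of each training point for a K-NN classifier,
    for one test point (closed-form recursion of Jia et al., 2019)."""
    N = len(X_train)
    # Rank training points by distance to the test point (nearest first).
    order = np.argsort(np.linalg.norm(X_train - x_test, axis=1))
    match = (y_train[order] == y_test).astype(float)

    s = np.zeros(N)
    s[N - 1] = match[N - 1] / N          # farthest point
    for i in range(N - 2, -1, -1):       # walk toward the nearest point
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / K \
               * min(K, i + 1) / (i + 1)

    values = np.zeros(N)
    values[order] = s                    # map back to original indices
    return values

# Toy usage: nearby points whose label matches the test label score higher.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = (X[:, 0] > 0).astype(int)
print(knn_shapley(X, y, np.array([1.0, 0.0]), 1, K=3).round(3))
```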

Research Intern

Virginia Tech

May 2022 - Oct 2022

Investigated the linearity of representations in backdoored models, focusing on data poisoning and stealthy trigger generation.

  • Measured the Pearson correlation between representations of clean and poisoned inputs to analyze backdoor attack mechanisms (see the sketch after this list).
  • Proposed a modified training procedure and a stealthier trigger-generation method, improving the covertness of backdoor attacks.
  • Contributed to research on data poisoning and backdoor attack mitigation strategies.
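
A minimal sketch of the representation analysis from the first bullet: computing the Pearson correlation between a model's features for a clean input and its triggered counterpart. The feature vectors below are synthetic stand-ins; in the actual study they would come from a (possibly backdoored) network's penultimate layer.

```python
import numpy as np

def pearson_corr(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation coefficient between two flattened feature vectors."""
    a, b = a.ravel(), b.ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Stand-in "representations" for a clean input and the same input with a
# trigger applied (here a small additive shift, for illustration only).
rng = np.random.default_rng(0)
clean_feat = rng.normal(size=512)
poisoned_feat = clean_feat + 0.1 * rng.normal(size=512)

# A coefficient near 1 suggests the poisoned representation is close to a
# linear function of the clean one, i.e. the linearity property under study.
print(f"Pearson r = {pearson_corr(clean_feat, poisoned_feat):.3f}")
```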

Education

M.S. in Computer Science (Networked Systems)

University of California, Irvine

GPA: 3.83/4.00

Sep 2023 - Jun 2025

B.S. in Computer Science

Southeast University

GPA: 3.78/4.00 (88.18/100)

Sep 2019 - Jun 2023

Awards

Student Scholarship

Southeast University

Jun 2023

Received a student scholarship from Southeast University, recognizing academic excellence and research potential.

Student Travel Support Award

IEEE Conference on Secure and Trustworthy ML (SaTML)

Jan 2023

Awarded travel support to attend the inaugural IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) in 2023.

Publications

Prompt-Insensitive Evaluation: Generalizing LLM Evaluation Across Prompts Through Fine-Tuning

Forthcoming

Jan 2025

Co-authored a forthcoming paper on prompt-insensitive evaluation of LLMs, generalizing evaluation across prompts via fine-tuning.

Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack

Under Review

Jan 2024

Co-authored a paper proposing the Dynamic Vision-Language Alignment (DynVLA) Attack to improve adversarial transferability in Multimodal Large Language Models; currently under review.

Skills

Frameworks & Libraries

  • PyTorch
  • Hugging Face Transformers
  • PEFT
  • TRL
  • Diffusers

Machine Learning & AI

  • CLIP Fine-tuning
  • Diffusion Models
  • Multimodal Large Language Models (MLLMs)
  • LLaVA
  • Llama-Vision
  • Large Language Models (LLMs)
  • SFT (Supervised Fine-tuning)
  • RLHF (Reinforcement Learning from Human Feedback)
  • AI Safety
  • Robustness
  • Trustworthy ML
  • Alignment
  • Post-training
  • Data-Centric AI
  • Adversarial Attacks
  • Out-of-Distribution (OOD) Detection
  • Model Generalization
  • Low-Rank Adaptation (LoRA)
  • Data Valuation
  • KNN-Shapley Value
  • Backdoor Attacks
  • Data Poisoning
  • Representation Analysis

Programming Languages

  • Python
  • C/C++
  • Go
  • Java
  • Shell
  • JavaScript

Interests

Research Interests

  • AI Safety
  • Robustness
  • Trustworthy ML
  • Alignment/Post-training
  • Data-Centric AI
  • Multimodal Large Language Models
  • Adversarial Examples
  • Backdoor Attacks

References

Prof. Yao Qin

Assistant Professor, University of California, Santa Barbara; Senior Research Scientist, Google DeepMind

Dr. Jindong Gu

Senior Research Fellow, University of Oxford; Faculty Researcher, Google DeepMind