MedAI #131: Analyzing and Exposing Vulnerabilities in Language Models | Yibo Wang

Title: Analyzing and Exposing Vulnerabilities in Language Models
Speaker: Yibo Wang

Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities across many applications, yet they remain vulnerable to biases and adversarial attacks that compromise their trustworthiness. This presentation introduces two papers that address these critical issues: robustness and fairness in LLMs.

The first paper introduces a new adversarial attack method with lower detectability and better transferability to LLMs. Although recent attacks achieve high success rates, their adversarial examples often deviate from the original data distribution, which makes them detectable. The paper proposes a Distribution-Aware Adversarial Attack method that accounts for this distribution shift to make attacks more effective. Experiments across multiple datasets and models validate the method's efficacy and its transferability to LLMs.

The second paper explores gender affiliations in text generation: LLMs often infer a gender from inputs that contain no explicit gender information, which reinforces stereotypes. The paper systematically investigates, quantifies, and mitigates these gender affiliations in LLMs.

Speaker Bio: Yibo Wang is a Ph.D. student in the Computer Science Department at the University of Illinois Chicago, supervised by Professor Philip S. Yu. Her primary research areas are natural language processing and large language models, with a focus on trustworthy LLMs and on code generation using LLMs.

------

The MedAI Group Exchange Sessions are a platform where we can critically examine key topics in AI and medicine, generate fresh ideas and discussion around their intersection, and, most importantly, learn from each other. We hold weekly sessions in which invited speakers present their work, followed by an interactive discussion and Q&A. Our sessions are held every Monday from 1pm to 2pm PST.

To get notifications about upcoming sessions, please join our mailing list: https://mailman.stanford.edu/mailman/listinfo/medai_announce
For more details about MedAI, check out our website: https://medai.stanford.edu
You can follow us on Twitter @MedaiStanford

Organized by members of the Rubin Lab (http://rubinlab.stanford.edu) and the Machine Intelligence in Medicine and Imaging (MI-2) Lab:
- Nandita Bhaskhar (https://www.stanford.edu/~nanbhas)
- Amara Tariq (https://www.linkedin.com/in/amara-tariq-475815158/)
- Avisha Das (https://dasavisha.github.io/)