#239 Stealing part of a production language model
This paper introduces the first model-stealing attack that extracts precise, nontrivial information from black-box production language models such as OpenAI’s ChatGPT or Google’s PaLM-2. Specifically, the attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under $20, the attack extracts the entire projection matrix of OpenAI’s ada and babbage language models. The attack also recovers the exact hidden dimension size of the gpt-3.5-turbo model, and the authors estimate that it would cost under $2,000 in queries to recover its entire projection matrix.
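At a high level, the attack exploits the fact that every output logit vector is a linear image of a final hidden state, logits = W h, where h has the (secret) hidden dimension. Logit vectors collected across many prompts therefore span only a hidden-dimension-sized subspace of the vocabulary-sized output space: the number of significant singular values of a stacked logit matrix reveals the hidden dimension, and the top singular vectors recover W up to an invertible transform. Below is a minimal numpy sketch of this idea against a simulated model; `query_logits`, the toy sizes, and the rank threshold are illustrative assumptions, not the paper's production code.

```python
import numpy as np

# Sketch assumption: we can observe full logit vectors. A random linear
# model stands in for the real API: logits = W @ h, with rank(W) = hidden_dim.
rng = np.random.default_rng(0)
vocab_size, hidden_dim, n_queries = 1000, 64, 128  # toy sizes

W = rng.normal(size=(vocab_size, hidden_dim))  # the secret projection matrix

def query_logits(prompt_id: int) -> np.ndarray:
    """Stand-in for the API: return the full logit vector for one prompt."""
    h = rng.normal(size=hidden_dim)  # hidden state produced by the prompt
    return W @ h

# Collect logit vectors for more prompts than the (unknown) hidden dimension.
Q = np.stack([query_logits(i) for i in range(n_queries)], axis=1)  # (l, n)

# The number of significant singular values of Q is the hidden dimension.
s = np.linalg.svd(Q, compute_uv=False)
recovered_dim = int(np.sum(s > 1e-6 * s[0]))
print(recovered_dim)  # -> 64

# The top singular vectors span col(W): they recover W up to an unknown
# invertible h x h matrix G, i.e. W_tilde = W @ G.
U, _, _ = np.linalg.svd(Q, full_matrices=False)
W_tilde = U[:, :recovered_dim]
```

Since the hidden dimension is not known in advance, in practice one keeps issuing queries until the numerical rank of Q stops growing.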
In this video, I talk about the following: which parts of an LM you can steal and how; how to extract the hidden dimensionality and the full projection matrix from logit-vector APIs; the extraction attack for top-5 logit-bias APIs; the extraction attack for top-1 (binary) logit-bias APIs; and how to defend against such attacks.
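Production APIs do not return full logit vectors, which is where the logit-bias attacks come in. The sketch below illustrates the key identity behind the top-5 variant on a simulated softmax API: adding the same large bias to a reference token and a few target tokens pushes all of them into the top-5, and because log-softmax differences are invariant to the normalizer, the returned logprob gaps equal the underlying logit gaps. The `query_top5` helper, the bias value, and the toy vocabulary are assumptions for illustration, not the paper's exact query schedule.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size = 50
true_logits = rng.normal(size=vocab_size)  # unknown to the attacker

def query_top5(bias: dict[int, float]) -> list[tuple[int, float]]:
    """Stand-in for the API: apply a logit bias, return top-5 (token, logprob)."""
    z = true_logits.copy()
    for tok, b in bias.items():
        z[tok] += b
    logprobs = z - np.log(np.sum(np.exp(z)))  # log-softmax
    top = np.argsort(logprobs)[::-1][:5]
    return [(int(t), float(logprobs[t])) for t in top]

B = 100.0   # large bias: forces the chosen tokens into the top-5
ref = 0     # reference token; logits are recovered relative to it
rel_logits = np.zeros(vocab_size)

# Bias the reference plus 4 target tokens per query. Both get the same
# bias B, so logprob(t) - logprob(ref) = logit(t) - logit(ref).
for start in range(1, vocab_size, 4):
    targets = list(range(start, min(start + 4, vocab_size)))
    out = dict(query_top5({t: B for t in [ref] + targets}))
    for t in targets:
        rel_logits[t] = out[t] - out[ref]

# Check: recovered relative logits match the true ones.
assert np.allclose(rel_logits, true_logits - true_logits[ref], atol=1e-6)
```

Sweeping the whole vocabulary this way reconstructs the full logit vector up to an additive constant, which is all the SVD step above needs.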
For more details, please see https://arxiv.org/pdf/2403.06634
Carlini, Nicholas, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, et al. "Stealing part of a production language model." In Forty-first International Conference on Machine Learning, 2024.