From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning

arXiv:2310.00492

Published 4/5/2024 by Xuansheng Wu, Wenlin Yao, Jianshu Chen, Xiaoman Pan, Xiaoyang Wang, Ninghao Liu, Dong Yu

Abstract

Large Language Models (LLMs) have achieved remarkable success, where instruction tuning is the critical step in aligning LLMs with user intentions. In this work, we investigate how the instruction tuning adjusts pre-trained models with a focus on intrinsic changes. Specifically, we first develop several local and global explanation methods, including a gradient-based method for input-output attribution, and techniques for interpreting patterns and concepts in self-attention and feed-forward layers. The impact of instruction tuning is then studied by comparing the explanations derived from the pre-trained and instruction-tuned models. This approach provides an internal perspective of the model shifts on a human-comprehensible level. Our findings reveal three significant impacts of instruction tuning: 1) It empowers LLMs to recognize the instruction parts of user prompts, and promotes the response generation constantly conditioned on the instructions. 2) It encourages the self-attention heads to capture more word-word relationships about instruction verbs. 3) It encourages the feed-forward networks to rotate their pre-trained knowledge toward user-oriented tasks. These insights contribute to a more comprehensive understanding of instruction tuning and lay the groundwork for future work that aims at explaining and optimizing LLMs for various applications. Our code and data are publicly available at https://github.com/JacksonWuxs/Interpret_Instruction_Tuning_LLMs.

Overview

  • Large language models (LLMs) have achieved remarkable success, with instruction tuning being a critical step in aligning them with user intentions.
  • This work investigates how instruction tuning adjusts pre-trained models, focusing on intrinsic changes.
  • The researchers developed explanation methods to understand the impact of instruction tuning on LLMs.

Plain English Explanation

Large language models (LLMs) are powerful AI systems that can understand and generate human-like text. A key step in making these models useful is "instruction tuning," which trains them to follow specific instructions from users.

In this study, the researchers looked at how instruction tuning changes the inner workings of LLMs. They developed new ways to "look under the hood" and explain what's happening inside the models, both before and after instruction tuning.

The main findings are:

  1. Instruction tuning helps LLMs recognize which parts of a prompt are instructions, and keeps the generated response consistently conditioned on those instructions.

  2. Instruction tuning changes the model's "attention" mechanism, so that self-attention heads capture more word-to-word relationships involving instruction verbs.

  3. Instruction tuning also adjusts the feed-forward layers, rotating the knowledge they learned in pre-training toward user-oriented tasks.

These insights help us better understand how instruction tuning shapes LLMs to be more aligned with human intentions. This lays the groundwork for future work on explaining and improving LLMs for real-world applications.

Technical Explanation

The researchers first developed a set of local and global explanation methods to analyze the inner workings of LLMs. These include a gradient-based technique that attributes each output to the parts of the input that influenced it, as well as techniques for interpreting the patterns and concepts captured by the model's self-attention and feed-forward layers.
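As a rough illustration of what gradient-based input-output attribution means, the sketch below scores each input token by the magnitude of gradient times input for one output token. It is not the paper's formulation: the toy model (summed embeddings followed by one linear layer), the function names, and the example values are all assumptions chosen so the gradient can be written by hand in plain Python.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(embeddings, weights):
    """Toy model (assumption): sum the token embeddings, apply one linear layer."""
    dim = len(embeddings[0])
    hidden = [sum(e[k] for e in embeddings) for k in range(dim)]
    logits = [sum(row[k] * hidden[k] for k in range(dim)) for row in weights]
    return softmax(logits)


def token_attributions(embeddings, weights, target):
    """Gradient-times-input score per input token, normalized to sum to 1."""
    probs = forward(embeddings, weights)
    dim = len(embeddings[0])
    # d log p[target] / d hidden = W^T (onehot(target) - p). Because hidden is
    # a plain sum, every token embedding receives this same gradient.
    grad = [
        sum(((1.0 if c == target else 0.0) - probs[c]) * weights[c][k]
            for c in range(len(weights)))
        for k in range(dim)
    ]
    raw = [abs(sum(grad[k] * e[k] for k in range(dim))) for e in embeddings]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else [0.0] * len(raw)


# Hypothetical 3-token prompt with 2-dimensional embeddings and 2 output tokens.
embeddings = [[1.0, 0.0], [0.0, 0.0], [0.0, 2.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]
scores = token_attributions(embeddings, weights, target=0)
print(scores)
```

In this toy setup the all-zero token receives no attribution, and the remaining mass splits one third to two thirds between the other two tokens. With a real LLM the same idea would rely on automatic differentiation through the full network rather than a hand-derived gradient.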

They then used these explanation methods to compare pre-trained LLMs and LLMs that had undergone instruction tuning. This allowed them to see how the instruction tuning process changes the model's behavior and inner representations.

The key findings were:

  1. Instruction tuning helps the model recognize the instruction parts of a user prompt, and keeps response generation consistently conditioned on those instructions.

  2. Instruction tuning encourages self-attention heads to capture more word-word relationships involving instruction verbs.

  3. Instruction tuning encourages the feed-forward networks to rotate their pre-trained knowledge toward user-oriented tasks.
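One simple way to picture the third finding is to measure the angle between a feed-forward weight vector before and after tuning. The sketch below is only an illustration under assumed inputs: the helper names and the tiny example matrices are hypothetical, and the paper's own analysis of feed-forward layers may differ from this plain cosine comparison.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rotation_degrees(pre, tuned):
    """Angle in degrees between a pre-trained vector and its tuned version."""
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    clamped = max(-1.0, min(1.0, cosine(pre, tuned)))
    return math.degrees(math.acos(clamped))


def mean_rotation(pre_rows, tuned_rows):
    """Average rotation over corresponding rows of two weight matrices."""
    angles = [rotation_degrees(p, t) for p, t in zip(pre_rows, tuned_rows)]
    return sum(angles) / len(angles)


# Hypothetical weights: the first row turns by 90 degrees, the second is only rescaled.
pre_trained = [[1.0, 0.0], [1.0, 0.0]]
instruction_tuned = [[0.0, 1.0], [2.0, 0.0]]
print(mean_rotation(pre_trained, instruction_tuned))
```

Here the average rotation comes out at about 45 degrees. Rescaling a vector leaves its angle unchanged, which is why a direction-based measure suits the idea of knowledge being rotated rather than replaced.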

These insights contribute to a deeper understanding of how instruction tuning works to align LLMs with human intentions. The researchers make their code and data publicly available to support further work in this area.

Critical Analysis

The paper provides a thoughtful and rigorous analysis of the impact of instruction tuning on LLMs. The researchers' development of new explanation methods is a valuable contribution, as it allows for a more nuanced and interpretable view of these complex models.

However, the study is limited to the LLaMA family of models (pre-trained LLaMA compared with instruction-tuned variants such as Alpaca and Vicuna) and a specific set of tasks. It remains to be seen whether the findings generalize to other LLMs and a broader range of applications. Additionally, the paper does not address potential risks or unintended consequences of instruction tuning, such as backdoor vulnerabilities or models that remain misaligned with human preferences.

Further research could examine how different instruction tuning regimes affect the psychometric predictive power of LLMs, or investigate instruction tuning techniques that better balance capability and alignment.

Conclusion

This study provides valuable insights into how instruction tuning shapes the inner workings of large language models. By developing new explanation methods, the researchers were able to uncover several key impacts of the instruction tuning process, including improved recognition of instructions, better attention to instruction-related words, and a shift in the model's core knowledge towards user-oriented tasks.

These findings contribute to a deeper understanding of how to align LLMs with human intentions, which is crucial as these models become increasingly ubiquitous in real-world applications. The publicly available code and data from this work will also support further research in this important area of AI development.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Instruction Tuning With Loss Over Instructions

Zhengyan Shi, Adam X. Yang, Bin Wu, Laurence Aitchison, Emine Yilmaz, Aldo Lipani

Instruction tuning plays a crucial role in shaping the outputs of language models (LMs) to desired styles. In this work, we propose a simple yet effective method, Instruction Modelling (IM), which trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part. Through experiments across 21 diverse benchmarks, we show that, in many scenarios, IM can effectively improve the LM performance on both NLP tasks (e.g., MMLU, TruthfulQA, and HumanEval) and open-ended generation benchmarks (e.g., MT-Bench and AlpacaEval). Remarkably, in the most advantageous case, IM boosts model performance on AlpacaEval 1.0 by over 100%. We identify two key factors influencing the effectiveness of IM: (1) The ratio between instruction length and output length in the training data; and (2) The number of training examples. We observe that IM is especially beneficial when trained on datasets with lengthy instructions paired with brief outputs, or under the Superficial Alignment Hypothesis (SAH) where a small amount of training examples are used for instruction tuning. Further analysis substantiates our hypothesis that the improvement can be attributed to reduced overfitting to instruction tuning datasets. Our work provides practical guidance for instruction tuning LMs, especially in low-resource scenarios.

5/24/2024

Instruction-tuned Language Models are Better Knowledge Learners

Zhengbao Jiang, Zhiqing Sun, Weijia Shi, Pedro Rodriguez, Chunting Zhou, Graham Neubig, Xi Victoria Lin, Wen-tau Yih, Srinivasan Iyer

In order for large language model (LLM)-based assistants to effectively adapt to evolving information needs, it must be possible to update their factual knowledge through continued training on new data. The standard recipe for doing so involves continued pre-training on new documents followed by instruction-tuning on question-answer (QA) pairs. However, we find that LLMs trained with this recipe struggle to answer questions, even though the perplexity of documents is minimized. We found that QA pairs are generally straightforward, while documents are more complex, weaving many factual statements together in an intricate manner. Therefore, we hypothesize that it is beneficial to expose LLMs to QA pairs before continued pre-training on documents so that the process of encoding knowledge from complex documents takes into account how this knowledge is accessed through questions. Based on this, we propose pre-instruction-tuning (PIT), a method that instruction-tunes on questions prior to training on documents. This contrasts with standard instruction-tuning, which learns how to extract knowledge after training on documents. Extensive experiments and ablation studies demonstrate that pre-instruction-tuning significantly enhances the ability of LLMs to absorb knowledge from new documents, outperforming standard instruction-tuning by 17.8%.

5/28/2024

Multilingual Instruction Tuning With Just a Pinch of Multilinguality

Uri Shaham, Jonathan Herzig, Roee Aharoni, Idan Szpektor, Reut Tsarfaty, Matan Eyal

As instruction-tuned large language models (LLMs) gain global adoption, their ability to follow instructions in multiple languages becomes increasingly crucial. In this work, we investigate how multilinguality during instruction tuning of a multilingual LLM affects instruction-following across languages from the pre-training corpus. We first show that many languages transfer some instruction-following capabilities to other languages from even monolingual tuning. Furthermore, we find that only 40 multilingual examples integrated in an English tuning set substantially improve multilingual instruction-following, both in seen and unseen languages during tuning. In general, we observe that models tuned on multilingual mixtures exhibit comparable or superior performance in multiple languages compared to monolingually tuned models, despite training on 10x fewer examples in those languages. Finally, we find that diversifying the instruction tuning set with even just 2-4 languages significantly improves cross-lingual generalization. Our results suggest that building massively multilingual instruction-tuned models can be done with only a very small set of multilingual instruction-responses.

5/22/2024

Psychometric Predictive Power of Large Language Models

Tatsuki Kuribayashi, Yohei Oseki, Timothy Baldwin

Instruction tuning aligns the response of large language models (LLMs) with human preferences. Despite such efforts in human-LLM alignment, we find that instruction tuning does not always make LLMs human-like from a cognitive modeling perspective. More specifically, next-word probabilities estimated by instruction-tuned LLMs are often worse at simulating human reading behavior than those estimated by base LLMs. In addition, we explore prompting methodologies for simulating human reading behavior with LLMs. Our results show that prompts reflecting a particular linguistic hypothesis improve psychometric predictive power, but are still inferior to small base models. These findings highlight that recent advancements in LLMs, i.e., instruction tuning and prompting, do not offer better estimates than direct probability measurements from base LLMs in cognitive modeling. In other words, pure next-word probability remains a strong predictor for human reading behavior, even in the age of LLMs.

4/16/2024