Using LLMs to Model the Beliefs and Preferences of Targeted Populations
0
Sign in to get full access
Introduction
The paper discusses using large language models (LLMs) as statistical proxies to study human beliefs, preferences, and behaviors. Existing work shows conflicting results on whether LLMs can accurately model human behaviors. The paper aims to align LLM beliefs and preferences with real humans expressed in surveys.
The key contributions are:
-
Using battery-electric vehicle survey data, the authors demonstrate parameter-efficient fine-tuning techniques to improve LLM agreement with human preferences.
-
They investigate the effects of model size, finding larger pre-trained models perform best out-of-the-box, but this advantage diminishes after fine-tuning.
-
They explore quantization and sampling temperature, finding quantized fine-tuning provides computational savings with minimal degradation, and temperature allows trading off population-wide vs. per-individual response matching.
-
They propose a novel penalty term in the loss function to improve performance on numerical survey questions.
-
They benchmark against baseline algorithms trained on survey data, demonstrating fine-tuned LLMs can outperform them in specific settings.
The paper aims to arrive at interactive models for studying target populations, useful for applications like marketing, community simulations, and testing interventions when impractical or unethical in the real world.
Related Work
The provided text discusses the potential of large language models (LLMs) to simulate human behaviors, personalities, and judgments. Recent studies have demonstrated LLMs' ability to reliably simulate personalities, align with human moral judgments, and generate human-like behaviors in interactive environments. Initial attempts have been made to use LLMs as synthetic human participants in behavioral studies, reproducing findings from previous experiments with real participants. However, previous work has primarily focused on measuring concordance between LLMs and participant data, with some studies finding a lack of agreement in certain domains. The text introduces the authors' work, which provides a framework to align LLMs with human preferences and explores techniques to improve the level of agreement between LLMs and humans.
Background
The passage discusses auto-regressive large language models and techniques for fine-tuning them. Auto-regressive language models learn to predict the next token in a sequence of text, trained in an unsupervised manner on large natural language corpora. A common architecture is the transformer decoder with a causal mask.
To adapt these large pre-trained models to specific tasks like sentiment analysis or summarization, a common approach is fine-tuning, which updates all model parameters and can be computationally expensive for large models.
The paper introduces Low-Rank Adaptation (LoRA), a technique that freezes the pre-trained model weights and only trains low-rank matrices added throughout the model, dramatically reducing computational cost. Quantized LoRA (QLoRA) further quantizes model weights to 4-bit representations while preserving information, and employs memory optimization techniques.
Problem setting
This problem involves using a small amount of survey data from a representative sample of a target human population, along with available demographic information relevant to characterizing the population. The answer a_ij of a participant i with demographics x_i^d is generated from a probability distribution p(a|x_i^d, x_j^q), where x_j^q denotes the questionnaire j.
The specific example used is the EV-shift dataset, which examines the impact of interventions on people's preferences for electric vehicles (EVs) compared to internal combustion vehicles. The study aimed to identify how effectively different text-based interventions changed people's preferences for EVs.
In the study, subjects provided an initial preference rating for EVs on a 0-100 scale. They were then shown one of 35 text interventions aimed at increasing their EV preference. After the intervention, subjects provided a post-intervention preference rating, also on a 0-100 scale. Each subject also provided demographic information.
The dataset contains demographic information, initial and post-intervention preference ratings for 4,045 subjects, and the interventions seen by each subject (5 for most subjects).
Proposed method
The proposed approach generates virtual survey participants by prompting a fine-tuned language model (LLM) to produce responses that statistically match those of a target population.
Each virtual participant is implemented by prompting the LLM with demographic information to generate a token sequence representing the survey response. Formally, the virtual participants aim to model the conditional distribution p(y|x), where y is the generated token sequence (answer) and x is the token sequence representing the demographics and survey question.
The output sequence y is preprocessed by a function F to extract the intended answer a, which is useful when the expected answer has a specific structure (e.g. numerical rating, multiple choice). In the experiments, F simply extracts the first number appearing in the sequence as a basic implementation.
This section describes generating virtual participants with demographics drawn from any distribution to target specific populations. Prompts are used to set the characteristics of the virtual participant and survey preferences. The pre-trained language model is fine-tuned on survey data to emulate human preferences. A novel numerical penalty term is introduced to the loss function, providing feedback if the generated numerical response is close to the correct answer. Agreement between the language model and humans is measured using KL-divergence (for population statistics) and root mean square error (for individual responses).
Experiments
The paper explores the performance of fine-tuned language models on predicting individual preferences from demographic data. Several experiments were conducted using different model sizes (7B, 13B, 70B) and techniques (LoRA, QLoRA) of the Llama 2 model.
Key findings:
- Larger models (70B) generally had better performance in terms of lower KL divergence (population-level metric) but not necessarily lower RMSE (individual-level metric).
- Fine-tuning improved both KL divergence and RMSE compared to pre-trained models.
- QLoRA (quantized LoRA) performed similarly to LoRA, suggesting quantization has little impact on response quality.
- Varying the sampling temperature allowed trading off between KL divergence and RMSE. Higher temperatures improved KL divergence but worsened RMSE.
- Using a numerical penalty term during fine-tuning helped decrease both KL divergence and RMSE.
The paper benchmarked the language models against supervised learning methods (SVR, CatBoost) and found that while the language models did not outperform these baselines, they offer more versatility for natural language interactions.
Conclusions and Future Work
The paper investigates using large language models (LLMs) to model the beliefs and preferences of a human population. This can be useful for simulated focus groups, virtual surveys, or piloting interventions without involving real humans. Out-of-the-box pre-trained models perform poorly at predicting human survey responses, but LLMs can be fine-tuned to better model the target population. Larger models provide better performance, but this advantage shrinks after task-specific fine-tuning. Quantization has minimal impact on fine-tuning, confirming its viability for reducing computation costs. Sampling temperature allows trading off population-wide KL-divergence against per-individual RMSE. A penalty loss term improves performance on numerical output questions by providing additional training information about response correctness. While the approach demonstrates matching survey data, future work will study aligning the model to unseen behavioral scenarios.
Appendix A EV-Shift Survey Details
Table 6 presents the questionnaires used to collect demographic information from participants, along with the possible response options. Table 7 lists the intervention texts employed in the study, which likely aimed to influence participants' perceptions or behaviors related to electric vehicles.
Appendix B Effects of model size
The QLoRA experiments showed that after 1 epoch, the validation loss increased, indicating overfitting. Larger model sizes led to faster decreases in train loss. The success rate for QLoRA on the test data was higher than the pre-trained model. An example sentence showed the pre-trained model tended to generate non-numerical tokens at the start of answers, lacking the ability to follow instructions to reply with only a number rating. While prompt engineering and in-context learning could improve the success rate, they were not the focus of this work.
Appendix C Quantization effects
The paper discusses the answer distribution for QLoRA and LoRA models, as shown in Figure 7. While there is a high correlation between the two models, individual responses may differ. Figure 7 illustrates this correlation and variation in responses.
Appendix D Temperature effects
The paper compares the preference distributions resulting from different decoding strategies employed by the model. For greedy sampling, the model exhibited a tendency to respond with a specific value from the range of 0 to 100, indicating a concentrated preference distribution. In contrast, calibrated sampling led to increased variation in the responses, suggesting a more diverse preference distribution.
Appendix E Penalty term effects
The paper discusses the learning curve and model performance with a penalty term. For the case with a penalty term, the validation loss begins increasing after 1 epoch, similar to the case with only the cross-entropy term. As the value of d is increased, the change in loss becomes smaller. This indicates that increasing d leads to higher weights for non-answer preference tokens, making it harder to decrease the loss compared to cross-entropy which evaluates the entire prompt.
Appendix F Amount of data effects
The paper examines the impact of varying the amount of training data on the model's performance on the test data. Figure 10 illustrates this effect. To maintain consistency with the 100% data volume case, the training epochs were set to 30 for 10% of the data and 6 for 50% of the data. The model checkpoints before the validation loss increased were used for evaluation. The results indicate that for some questionnaires, the KL divergence and RMSE improved as the volume of training data increased.
Appendix G Baselines
The text discusses the hyperparameter combinations utilized for the Support Vector Regression (SVR) and CatBoost models, presented in Table 10. Additionally, Figure 11 illustrates the KL-RMSE plots for each model. Despite the existence of numerous non-dominated models, the Pareto fronts for the two models exhibit general consistency. The provided information focuses on the hyperparameter settings and performance visualization through KL-RMSE plots, without delving into further details or analysis.
This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!
Related Papers
0
Using LLMs to Model the Beliefs and Preferences of Targeted Populations
Keiichi Namikoshi, Alex Filipowicz, David A. Shamma, Rumen Iliev, Candice L. Hogan, Nikos Arechiga
We consider the problem of aligning a large language model (LLM) to model the preferences of a human population. Modeling the beliefs, preferences, and behaviors of a specific population can be useful for a variety of different applications, such as conducting simulated focus groups for new products, conducting virtual surveys, and testing behavioral interventions, especially for interventions that are expensive, impractical, or unethical. Existing work has had mixed success using LLMs to accurately model human behavior in different contexts. We benchmark and evaluate two well-known fine-tuning approaches and evaluate the resulting populations on their ability to match the preferences of real human respondents on a survey of preferences for battery electric vehicles (BEVs). We evaluate our models against their ability to match population-wide statistics as well as their ability to match individual responses, and we investigate the role of temperature in controlling the trade-offs between these two. Additionally, we propose and evaluate a novel loss term to improve model performance on responses that require a numeric response.
Read more4/1/2024
💬
0
Aligning language models with human preferences
Tomasz Korbak
Language models (LMs) trained on vast quantities of text data can acquire sophisticated skills such as generating summaries, answering questions or generating code. However, they also manifest behaviors that violate human preferences, e.g., they can generate offensive content, falsehoods or perpetuate social biases. In this thesis, I explore several approaches to aligning LMs with human preferences. First, I argue that aligning LMs can be seen as Bayesian inference: conditioning a prior (base, pretrained LM) on evidence about human preferences (Chapter 2). Conditioning on human preferences can be implemented in numerous ways. In Chapter 3, I investigate the relation between two approaches to finetuning pretrained LMs using feedback given by a scoring function: reinforcement learning from human feedback (RLHF) and distribution matching. I show that RLHF can be seen as a special case of distribution matching but distributional matching is strictly more general. In chapter 4, I show how to extend the distribution matching to conditional language models. Finally, in chapter 5 I explore a different root: conditioning an LM on human preferences already during pretraining. I show that involving human feedback from the very start tends to be more effective than using it only during supervised finetuning. Overall, these results highlight the room for alignment techniques different from and complementary to RLHF.
Read more4/19/2024
0
A Survey on Human Preference Learning for Large Language Models
Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang
The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning, enhancing LLMs with excellent applicability and effectiveness in a wide range of contexts. Despite the numerous related studies conducted, a perspective on how human preferences are introduced into LLMs remains limited, which may prevent a deeper comprehension of the relationships between human preferences and LLMs as well as the realization of their limitations. In this survey, we review the progress in exploring human preference learning for LLMs from a preference-centered perspective, covering the sources and formats of preference feedback, the modeling and usage of preference signals, as well as the evaluation of the aligned LLMs. We first categorize the human feedback according to data sources and formats. We then summarize techniques for human preferences modeling and compare the advantages and disadvantages of different schools of models. Moreover, we present various preference usage methods sorted by the objectives to utilize human preference signals. Finally, we summarize some prevailing approaches to evaluate LLMs in terms of alignment with human intentions and discuss our outlooks on the human intention alignment for LLMs.
Read more6/19/2024
0
Aligning Large Language Models with Self-generated Preference Data
Dongyoung Kim, Kimin Lee, Jinwoo Shin, Jaehyung Kim
Aligning large language models (LLMs) with human preferences becomes a key component to obtaining state-of-the-art performance, but it yields a huge cost to construct a large human-annotated preference dataset. To tackle this problem, we propose a new framework that boosts the alignment of LLMs through Self-generated Preference data (Selfie) using only a very small amount of human-annotated preference data. Our key idea is leveraging the human prior knowledge within the small (seed) data and progressively improving the alignment of LLM, by iteratively generating the responses and learning from them with the self-annotated preference data. To be specific, we propose to derive the preference label from the logits of LLM to explicitly extract the model's inherent preference. Compared to the previous approaches using external reward models or implicit in-context learning, we observe that the proposed approach is significantly more effective. In addition, we introduce a noise-aware preference learning algorithm to mitigate the risk of low quality within generated preference data. Our experimental results demonstrate that the proposed framework significantly boosts the alignment of LLMs. For example, we achieve superior alignment performance on AlpacaEval 2.0 with only 3.3% of the ground-truth preference labels in the Ultrafeedback data compared to the cases using the entire data or state-of-the-art baselines.
Read more6/10/2024