Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models

    Published 11/20/2024 by Laura Ruis, Maximilian Mozes, Juhan Bae, Siddhartha Rao Kamalakara, Dwarak Talupuru, Acyr Locatelli, Robert Kirk, Tim Rocktäschel, Edward Grefenstette, Max Bartolo

    Overview

    • Research examines how large language models (LLMs) leverage procedural knowledge from pretraining data

    • Study finds that reasoning draws on procedural knowledge from pretraining rather than on retrieval of specific answers

    • Influence functions are used to rank 5 million pretraining documents by their effect on model outputs

    • Findings show models rely heavily on procedural patterns learned during initial training

    LLMs rely more on procedural than factual knowledge for reasoning.

    Original caption: Figure 1: A summary of our most important findings towards answering the question “how do LLMs learn to reason from pretraining data?” We rank 5 million pretraining documents according to their influence on the likelihood of completions of two models, Cohere’s Command R 7B and 35B, for 40 factual and 40 reasoning queries. We find that procedural knowledge drives influence on reasoning traces: a document’s influence on the reasoning traces of one query is strongly predictive of that document’s influence on another query with the same mathematical task, in 3 of the 4 cases. We show this on the left through arrows indicating influence, and on the right through correlations of all 5M document influences between a random sample of 10 queries per task (a plot with all queries can be found in Figure 12 in Appendix A.9.1). Further, we find that the answers to factual queries often show up in the top 0.01% of pretraining documents (see text in bottom row of documents), but not for the reasoning questions. Finally, individual documents influence reasoning traces much less strongly than factual answer generations, indicating models rely on documents less when reasoning. All documents and queries shown are redacted versions of real data, and the relations are based on documents found in the top 50 for the queries.

    Example of a two-step arithmetic reasoning problem.

    Reasoning Query Set (Arithmetic)

    Prompt:
    Calculate the answer: (7 - 4) * 7
    Think step-by-step.

    Completion (by 7B):
    First, let’s evaluate the expression inside the parentheses:
    7 - 4 = 3
    Now, let’s multiply the result by 7:
    3 * 7 = 21
    Therefore, the answer to the expression is 21.

    Original caption: Table 1: Example from the reasoning set that involves simple two-step arithmetic.
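    Figure 1 groups queries by "the same mathematical task". To make that idea concrete, the sketch below writes the procedure behind Table 1's trace as a template over its operands, so that (7 - 4) * 7 and, say, (9 - 2) * 5 share one procedure with different numbers. This is a hypothetical illustration of the task structure, not how the model or the authors generate traces; the function name `two_step_trace` is invented here.

```python
# Hypothetical template: the two-step procedure from Table 1, parameterized
# over the operands of (a - b) * c. Not the model's or the authors' code.
def two_step_trace(a: int, b: int, c: int) -> str:
    inner = a - b        # step 1: evaluate the parentheses
    result = inner * c   # step 2: multiply by the outer factor
    lines = [
        f"Calculate the answer: ({a} - {b}) * {c}",
        "First, let's evaluate the expression inside the parentheses:",
        f"{a} - {b} = {inner}",
        f"Now, let's multiply the result by {c}:",
        f"{inner} * {c} = {result}",
        f"Therefore, the answer to the expression is {result}.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(two_step_trace(7, 4, 7))  # reproduces the Table 1 trace
    print(two_step_trace(9, 2, 5))  # same procedure, different numbers
```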

    Plain English Explanation

    Large language models learn fundamental reasoning skills during their initial training, similar to how humans learn basic problem-solving patterns early in life. Rather than memorizing specific answers, these models pick up general approaches for tackling problems.

    The researchers used influence functions to estimate how much each training document affects a model's answers. Like tracing footprints in sand, this method reveals which training experiences shaped the model's problem-solving strategies.

    The results suggest that reasoning emerges from exposure to many examples of the same problem-solving procedure. Just as a child learns to solve puzzles by seeing many examples, language models appear to develop reasoning capabilities by processing large numbers of documents containing procedural knowledge.

    Key Findings

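    The research finds that procedural knowledge gained during pretraining plays a crucial role in model reasoning. Models do not simply retrieve memorized answers: for factual queries, the answer often appears in the top 0.01% of the most influential pretraining documents, but for reasoning queries it does not.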
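    Figure 1 reports that answers to factual queries often appear in the top 0.01% of ranked pretraining documents. A minimal version of that check could look like the sketch below, assuming influence scores are already computed, documents are plain strings, and a simple substring match counts as "containing the answer". The helper `answer_in_top_fraction` is hypothetical and only illustrates the idea.

```python
import math


def answer_in_top_fraction(docs, scores, answer, fraction=0.0001):
    """Return True if `answer` occurs in any document ranked within the top
    `fraction` by influence score (0.0001 corresponds to the top 0.01%).

    Assumption: substring matching stands in for whatever answer-matching
    procedure a real analysis would use.
    """
    if len(docs) != len(scores):
        raise ValueError("docs and scores must have the same length")
    k = max(1, math.ceil(len(docs) * fraction))  # keep at least one document
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return any(answer in docs[i] for i in ranked[:k])


if __name__ == "__main__":
    docs = ["The capital of France is Paris.", "Unrelated text.", "More text."]
    print(answer_in_top_fraction(docs, [0.9, 0.1, 0.2], "Paris"))
```

    With 5 million documents, the top 0.01% is 500 documents.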

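    Documents that influence the reasoning trace of one query tend to influence other queries requiring the same mathematical task: across all 5 million documents, influence scores for such query pairs are strongly correlated in 3 of the 4 cases studied. Individual documents also influence reasoning traces much less strongly than factual answers, indicating that models rely less on any single document when reasoning.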
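    The cross-query comparison in Figure 1 correlates every document's influence score between pairs of queries. A sketch of that computation, assuming Pearson correlation and that each query's scores are listed in the same document order; `pearson` and `pairwise_correlations` are hypothetical names and the scores in the demo are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(vx * vy)


def pairwise_correlations(influences):
    """influences maps query id -> per-document influence scores, all in the
    same document order. Returns {(query_a, query_b): correlation}."""
    ids = list(influences)
    return {(a, b): pearson(influences[a], influences[b]) for a in ids for b in ids}


if __name__ == "__main__":
    influences = {  # hypothetical scores for four documents
        "task_a_q1": [0.9, 0.1, -0.2, 0.4],
        "task_a_q2": [0.8, 0.2, -0.1, 0.5],
        "task_b_q1": [-0.3, 0.7, 0.6, -0.1],
    }
    for pair, r in pairwise_correlations(influences).items():
        print(pair, round(r, 3))
```

    In this toy data the two task A queries correlate positively with each other, mirroring the pattern the paper reports for queries that share a task.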

    Because reasoning appears to draw on procedural patterns shared across many pretraining documents, the findings suggest that careful curation of procedural content in training data could enhance reasoning capabilities.

    Technical Explanation

    The researchers used influence functions to analyze how pretraining documents affect model outputs. For each query, every document receives an influence score that approximates how upweighting that document during training would change the likelihood of the model's completion, and documents are then ranked by this score.
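    The scoring step can be illustrated on a toy problem. The sketch below is not the authors' implementation: it uses a two-parameter linear model with squared loss instead of an LLM, and it forms the damped Hessian exactly, which is infeasible at LLM scale where the Hessian must be approximated. It uses the classic upweighting formula, score = ∇L_query^T H^{-1} ∇L_example, signed so that a positive score means upweighting the example lowers the query loss (raises its likelihood). All names and data are hypothetical.

```python
# Toy influence scores for a 2-parameter linear model y ~ w . x with loss
# 0.5 * (w . x - y)^2. Assumptions: exact damped Hessian, toy data.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def grad(w, x, y):
    """Gradient of 0.5 * (w . x - y)^2 with respect to w."""
    residual = dot(w, x) - y
    return [residual * xi for xi in x]


def damped_hessian(xs, damping):
    """Hessian of the mean training loss over inputs xs, plus damping * I."""
    n = len(xs)
    h = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        for i in range(2):
            for j in range(2):
                h[i][j] += x[i] * x[j] / n
    h[0][0] += damping
    h[1][1] += damping
    return h


def solve_2x2(h, g):
    """Solve h @ v = g with the closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = h
    det = a * d - b * c
    return [(d * g[0] - b * g[1]) / det, (-c * g[0] + a * g[1]) / det]


def influence_scores(w, train, query, damping=1e-3):
    """Score each (x, y) in train by grad_query^T H^{-1} grad_example.
    Positive: upweighting that example lowers the query loss."""
    h = damped_hessian([x for x, _ in train], damping)
    v = solve_2x2(h, grad(w, *query))  # H^{-1} grad_query (H is symmetric)
    return [dot(v, grad(w, x, y)) for x, y in train]


if __name__ == "__main__":
    w = [1.0, 0.0]  # current parameters (hypothetical)
    train = [([1.0, 0.0], 2.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 0.0)]
    query = ([1.0, 0.0], 2.0)
    scores = influence_scores(w, train, query)
    ranking = sorted(range(len(train)), key=lambda i: scores[i], reverse=True)
    print(scores, ranking)
```

    Example 0 pushes the first weight toward 2, as the query does, so it scores positive; example 2 pushes the other way and scores negative; example 1 has zero gradient at these parameters and scores 0. Ranking the scores gives the per-query document ordering that the paper's analysis is built on.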

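    The analysis suggests that procedural knowledge is picked up from documents that demonstrate how to carry out a task, such as step-by-step explanations and worked solutions, and that it transfers across queries sharing the same underlying procedure. The study measured this through each document's influence on model completions rather than through downstream accuracy.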

    The methodology revealed that models rely heavily on procedural patterns learned during pretraining when tackling reasoning queries. Because the answers to reasoning queries do not typically appear among the most influential documents, this suggests that reasoning abilities emerge from exposure to structured procedures rather than from retrieving memorized answers.

    Critical Analysis

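    Several limitations exist in the current study. Influence is estimated over 5 million pretraining documents rather than the full corpus, for 80 queries on two models, so the method may not capture all relevant training influences. The analysis also focuses primarily on English-language content.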

    The research doesn't fully address how different types of procedural knowledge interact or how to optimize training data selection for enhanced reasoning capabilities. More work is needed to understand the precise mechanisms of knowledge transfer.

    Beyond accuracy, questions remain about the reliability and consistency of model reasoning across different domains and task types.

    Conclusion

    This research demonstrates that procedural knowledge acquired during pretraining fundamentally shapes how language models reason. The findings suggest that careful curation of training data could lead to more capable and reliable AI systems.

    The study opens new paths for improving model reasoning abilities through better understanding of knowledge transfer mechanisms. Future work should focus on optimizing training data selection and developing more robust evaluation methods for reasoning capabilities.

    Read original: arXiv:2411.12580



    This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!
