Stream of Search (SoS): Learning to Search in Language

2404.03683

Published 4/8/2024 by Kanishk Gandhi, Denise Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, Noah D. Goodman
Abstract

Language models are rarely shown fruitful mistakes while training. They then struggle to look beyond the next token, suffering from a snowballing of errors and struggling to predict the consequence of their actions several steps ahead. In this paper, we show how language models can be taught to search by representing the process of search in language, as a flattened string -- a stream of search (SoS). We propose a unified language for search that captures an array of different symbolic search strategies. We demonstrate our approach using the simple yet difficult game of Countdown, where the goal is to combine input numbers with arithmetic operations to reach a target number. We pretrain a transformer-based language model from scratch on a dataset of streams of search generated by heuristic solvers. We find that SoS pretraining increases search accuracy by 25% over models trained to predict only the optimal search trajectory. We further finetune this model with two policy improvement methods: Advantage-Induced Policy Alignment (APA) and Self-Taught Reasoner (STaR). The finetuned SoS models solve 36% of previously unsolved problems, including problems that cannot be solved by any of the heuristic solvers. Our results indicate that language models can learn to solve problems via search, self-improve to flexibly use different search strategies, and potentially discover new ones.

Overview

  • This paper introduces "Stream of Search" (SoS), an approach that teaches language models to search by representing the search process itself in language, as a flattened string.
  • The key idea is to train the model on full search trajectories, including exploration, mistakes, and backtracking, rather than only on the optimal solution path.
  • The approach is demonstrated on the game of Countdown, where the goal is to combine input numbers with arithmetic operations to reach a target number.

Plain English Explanation

The paper presents a new way for language models, like the ones that power chatbots and virtual assistants, to learn how to search for the solution to a problem. "Integrating Hyperparameter Search into GRAM" and "Can Small Language Models Help Large Language Models?" have explored related ideas.

Traditionally, these models are trained mostly on correct, finished solutions, so they rarely see the mistakes and dead ends that come up while working a problem out. The "Stream of Search" (SoS) approach instead trains the model on the whole search process written out as text: trying an option, noticing that it fails, backing up, and trying another. This lets the model explore the space of possible solutions, much like a person working through a puzzle on scratch paper.

By learning to search in language, the model can look further ahead than the next token and recover from an early error instead of letting it snowball. "Dwell: Beginning How Language Models Embed Long-Term Memory" explored a related question of how language models can build up an understanding over multiple interactions.

The key insight is that training the model on the search process, rather than only on the final answer, leads to more accurate problem solving: the abstract reports a 25% increase in search accuracy over models trained only on the optimal trajectory. This could have interesting applications for virtual assistants, chatbots, and other language-based AI systems.

Technical Explanation

The core of the SoS approach is to represent search as a "stream": a single flattened string that records the steps a search procedure takes. The authors propose a unified language for search that captures an array of different symbolic search strategies.

The test bed is Countdown, a simple yet difficult game in which input numbers must be combined with arithmetic operations to reach a target number. Heuristic solvers generate a dataset of streams of search for this game, and a transformer-based language model is pretrained from scratch on that dataset.
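To make the idea of a stream of search concrete, here is a minimal sketch for the Countdown game described in the abstract. It runs a plain depth-first search and writes every attempt, dead end, and success into one flat string. The trace wording, the function names (`apply_op`, `dfs_stream`, `stream_of_search`), the rule that every input number must be used, and the choice of unguided depth-first search are all assumptions made for illustration; the paper's actual search language and heuristic solvers are not specified in this summary.

```python
from itertools import combinations

OPS = ("+", "-", "*", "/")


def apply_op(a, b, op):
    """Apply op to (a, b), assuming a >= b. Return None for an invalid division."""
    if op == "+":
        return a + b
    if op == "-":
        return a - b  # never negative because a >= b
    if op == "*":
        return a * b
    if op == "/":
        # Only exact integer division is allowed in this sketch.
        return a // b if b != 0 and a % b == 0 else None
    return None


def dfs_stream(numbers, target, log):
    """Depth-first search over Countdown states, logging every step taken."""
    if len(numbers) == 1:
        if numbers[0] == target:
            log.append(f"Goal reached: {target}")
            return True
        log.append(f"Dead end: {numbers[0]} != {target}, backtracking")
        return False
    for i, j in combinations(range(len(numbers)), 2):
        a = max(numbers[i], numbers[j])
        b = min(numbers[i], numbers[j])
        rest = [n for k, n in enumerate(numbers) if k not in (i, j)]
        for op in OPS:
            result = apply_op(a, b, op)
            if result is None:
                continue
            new_state = rest + [result]
            log.append(f"Try {a} {op} {b} = {result}, remaining {new_state}")
            if dfs_stream(new_state, target, log):
                return True
    return False


def stream_of_search(numbers, target):
    """Return (solved, trace), where trace is the search flattened to a string."""
    log = [f"Start: numbers {list(numbers)}, target {target}"]
    solved = dfs_stream(list(numbers), target, log)
    if not solved:
        log.append("No solution found")
    return solved, "\n".join(log)


if __name__ == "__main__":
    solved, trace = stream_of_search([2, 3], 6)
    print(trace)
```

For the numbers [2, 3] and target 6, the trace records two dead ends (3 + 2 and 3 - 2) before reaching the goal with 3 * 2. A model trained only on the optimal trajectory would see just the last step; a model trained on the stream sees the failed attempts as well.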

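Training on these streams exposes the model to the search process rather than only its end result. The authors report that SoS pretraining increases search accuracy by 25% over models trained to predict only the optimal search trajectory. They then finetune the model with two policy improvement methods, Advantage-Induced Policy Alignment (APA) and Self-Taught Reasoner (STaR). The finetuned SoS models solve 36% of previously unsolved problems, including problems that none of the heuristic solvers can solve.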
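The abstract names STaR as one of the two policy improvement methods. As a rough illustration of the general pattern behind that family of methods (sample attempts, keep the ones that verify as correct, then finetune on the kept set), the sketch below shows only the sample-and-filter step for Countdown. It makes several assumptions: `random_policy` is a hypothetical stand-in for a trained language model, the step format `(a, op, b)` is invented here, and the finetuning step and APA are not shown. It should not be read as the paper's training procedure.

```python
import random


def compute(a, op, b):
    """Return a op b, or None when the operation is not allowed."""
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    if op == "/":
        return a // b if b != 0 and a % b == 0 else None
    return None


def verify(numbers, target, steps):
    """Check that steps use only available numbers and end with exactly the target."""
    pool = list(numbers)
    for a, op, b in steps:
        for x in (a, b):
            if x not in pool:
                return False
            pool.remove(x)
        result = compute(a, op, b)
        if result is None:
            return False
        pool.append(result)
    return pool == [target]


def random_policy(numbers, rng):
    """Hypothetical stand-in for a model: combine numbers at random until one is left."""
    pool = list(numbers)
    steps = []
    while len(pool) > 1:
        x, y = rng.sample(pool, 2)
        pool.remove(x)
        pool.remove(y)
        a, b = max(x, y), min(x, y)
        op = rng.choice("+-*/")
        if compute(a, op, b) is None:
            op = "+"  # fall back to an operation that is always valid
        pool.append(compute(a, op, b))
        steps.append((a, op, b))
    return steps


def filter_round(problems, policy, attempts=200, seed=0):
    """Sample attempts per problem and keep the first one that verifies."""
    rng = random.Random(seed)
    kept = []
    for numbers, target in problems:
        for _ in range(attempts):
            steps = policy(numbers, rng)
            if verify(numbers, target, steps):
                kept.append(((numbers, target), steps))
                break
    return kept


if __name__ == "__main__":
    problems = [([2, 3], 6), ([1, 2, 3], 7)]
    for problem, steps in filter_round(problems, random_policy):
        print(problem, steps)
```

In a full improvement loop, the kept attempts would become the finetuning data for the next round, and the improved model would replace `random_policy` as the sampler.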

The Solving Ability Amplification Strategy (SAAS) paper explored a related idea of sequencing training stages to improve mathematical problem solving, moving from Chain-of-Thought learning to Program-of-Thought learning. The SoS approach instead targets search itself, training the model on the search process expressed in language.

Critical Analysis

One key limitation of the SoS approach is that it relies on heuristic solvers to generate the training streams and on being able to check whether a search has reached a correct solution. In Countdown that check is simple arithmetic. In real-world applications, obtaining such a reliable signal may be challenging.

It also remains open how the SoS approach would scale to more complex, open-ended tasks. The experiments described in the abstract focus on Countdown, a constrained arithmetic game. Applying the technique to more ambiguous, exploratory problems may require further innovations.

Additionally, a full stream of search is much longer than the solution alone, so the computational overhead of generating it may limit the practical deployment of SoS in some settings. The tradeoffs between search quality and efficiency would need to be carefully considered.

Overall, the SoS approach represents an interesting step towards improving the search capabilities of language models. However, further research is needed to fully understand its strengths, weaknesses, and potential real-world applications. Readers are encouraged to read the original paper, along with linked work such as "Do Sentence Transformers Learn Quasi-Geospatial Concepts?", and form their own conclusions about the merits of this work.

Conclusion

The "Stream of Search" (SoS) approach introduced in this paper offers a novel way for language models to learn how to search by expressing the search process in language. By training the model on full streams of search, rather than only on optimal solutions, the authors report a 25% increase in search accuracy on Countdown, and finetuning with APA and STaR lets the model solve 36% of previously unsolved problems.

This work represents an interesting step towards more sophisticated language-based AI systems that can explore alternatives, recover from mistakes, and plan several steps ahead. While the current results are limited to a single game, the core ideas behind SoS could have important implications for the development of virtual assistants, chatbots, and other language-based applications.

As the field of language AI continues to evolve, techniques like SoS that enhance the search and exploration capabilities of models will likely become increasingly important. The authors' results suggest that language models can self-improve to use different search strategies flexibly, and potentially discover new ones.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models

Hyeonwoo Kim, Gyoungjin Gim, Yungi Kim, Jihoo Kim, Byungju Kim, Wonseok Lee, Chanjun Park

This study presents a novel learning approach designed to enhance both mathematical reasoning and problem-solving abilities of Large Language Models (LLMs). We focus on integrating the Chain-of-Thought (CoT) and the Program-of-Thought (PoT) learning, hypothesizing that prioritizing the learning of mathematical reasoning ability is helpful for the amplification of problem-solving ability. Thus, the initial learning with CoT is essential for solving challenging mathematical problems. To this end, we propose a sequential learning approach, named SAAS (Solving Ability Amplification Strategy), which strategically transitions from CoT learning to PoT learning. Our empirical study, involving an extensive performance comparison using several benchmarks, demonstrates that our SAAS achieves state-of-the-art (SOTA) performance. The results underscore the effectiveness of our sequential learning approach, marking a significant advancement in the field of mathematical reasoning in LLMs.

4/9/2024

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul Mcvay, Michael Rabbat, Yuandong Tian

While Transformers have enabled tremendous progress in various application settings, such architectures still trail behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks. This is accomplished by training an encoder-decoder Transformer model to predict the search dynamics of the $A^*$ search algorithm. We fine tune this model to obtain a Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than the $A^*$ implementation that was used for training initially. In our training method, $A^*$'s search dynamics are expressed as a token sequence outlining when task states are added and removed into the search tree during symbolic planning. Searchformer significantly outperforms baselines that predict the optimal plan directly with a 5-10$\times$ smaller model size and a 10$\times$ smaller training dataset. Lastly, we demonstrate how Searchformer scales to larger and more complex decision making tasks with improved percentage of solved tasks and shortened search dynamics.

4/30/2024

Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving

Chenyang An, Zhibo Chen, Qihao Ye, Emily First, Letian Peng, Jiayun Zhang, Zihan Wang, Sorin Lerner, Jingbo Shang

Recent advances in Automated Theorem Proving have shown the effectiveness of leveraging a (large) language model that generates tactics (i.e. proof steps) to search through proof states. The current model, while trained solely on successful proof paths, faces a discrepancy at the inference stage, as it must sample and try various tactics at each proof state until finding success, unlike its training which does not incorporate learning from failed attempts. Intuitively, a tactic that leads to a failed search path would indicate that similar tactics should receive less attention during the following trials. In this paper, we demonstrate the benefit of training models that additionally learn from failed search paths. Facing the lack of such trial-and-error data in existing open-source theorem-proving datasets, we curate a dataset on intuitionistic propositional logic theorems and formalize it in Lean, such that we can reliably check the correctness of proofs. We compare our model trained on relatively short trial-and-error information (TrialMaster) with models trained only on the correct paths and discover that the former solves more unseen theorems with lower trial searches.

4/12/2024

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

Yijia Shao, Yucheng Jiang, Theodore A. Kanell, Peter Xu, Omar Khattab, Monica S. Lam

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages. This underexplored problem poses new challenges at the pre-writing stage, including how to research the topic and prepare an outline prior to writing. We propose STORM, a writing system for the Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking. STORM models the pre-writing stage by (1) discovering diverse perspectives in researching the given topic, (2) simulating conversations where writers carrying different perspectives pose questions to a topic expert grounded on trusted Internet sources, (3) curating the collected information to create an outline. For evaluation, we curate FreshWiki, a dataset of recent high-quality Wikipedia articles, and formulate outline assessments to evaluate the pre-writing stage. We further gather feedback from experienced Wikipedia editors. Compared to articles generated by an outline-driven retrieval-augmented baseline, more of STORM's articles are deemed to be organized (by a 25% absolute increase) and broad in coverage (by 10%). The expert feedback also helps identify new challenges for generating grounded long articles, such as source bias transfer and over-association of unrelated facts.

4/9/2024