WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

    Published 12/4/2024 by Zehan Qi, Xiao Liu, Iat Long Iong, Hanyu Lai, Xueqiao Sun, Wenyi Zhao, Yu Yang, Xinyue Yang, Jiadai Sun, Shuntian Yao and 4 others

    Overview

    • This paper introduces WebRL, a self-evolving online curriculum reinforcement learning (RL) approach for training large language model (LLM) agents to complete web-based tasks.
    • WebRL aims to enable LLMs to learn effective web navigation and interaction skills through a process of gradually increasing task complexity.
    • The key innovation is a self-evolving curriculum where the agent's performance on current tasks determines the difficulty of future tasks, allowing it to continually challenge itself.

    WebRL task success rates compared to other methods on a human-verified dataset.

    Models                                    #Params   Reddit  Gitlab     CMS     Map     OSS Avg. SR

    Proprietary LLMs
    GPT-4-Turbo                               N/A         10.5    16.7    14.3    36.7    13.3    17.6
    GPT-4o                                    N/A         10.5    10.0    20.0    20.0    11.1    13.9
    AWM + GPT-4-0613 (Wang et al., 2024)      N/A         50.9    31.8    29.1    43.3    30.8    35.5
    WebPilot + GPT-4o (Zhang et al., 2024f)   N/A         65.1    39.4    24.7    33.9    36.9    37.2

    Open-sourced LLMs
    AutoWebGLM (Lai et al., 2024)             6B           9.4    15.0    28.6    24.8    17.1    18.2
    GLM-4-Chat (GLM et al., 2024)             9B           5.3    10.0     6.7     3.3     6.7     6.1
    GLM-4 + SFT (BC)                          9B          47.4    13.3    31.4    23.3    13.3    22.4
    GLM-4 + Filtered BC                       9B          52.6    10.0    31.4    26.7    20.0    24.8
    GLM-4 + AWR (Peng et al., 2019)           9B          52.6    16.7    34.3    30.0    22.2    27.9
    GLM-4 + DigiRL (Bai et al., 2024)         9B          63.2    30.0    34.3    26.7    26.7    31.5
    GLM-4 + WebRL (ours)                      9B          57.9    50.0    48.6    36.7    37.8    43.0
    Llama3.1-Instruct (Dubey et al., 2024)    8B           0.0     3.3     2.9     3.3    11.1     4.8
    Llama3.1 + SFT (BC)                       8B          36.8     6.7    20.0    33.3    17.8    20.6
    Llama3.1 + Filtered BC                    8B          52.6    20.0    31.4    23.3     8.9    23.0
    Llama3.1-Instruct (Dubey et al., 2024)    70B         10.5    16.7    17.1    20.0     4.4    12.7
    Llama3.1 + SFT (BC)                       70B         52.6    20.0    20.0    26.7    13.3    23.0
    Llama3.1 + WebRL (ours)                   70B         78.9    50.0    54.3    40.0    44.4    49.1

    Original caption: Table 1: Task success rate (SR) of WebRL and other comparison methods, evaluated on WebArena-Lite (Zhou et al., 2023a; Liu et al., 2024), a human-verified subset of WebArena (* denotes results on full WebArena taken from literature reporting). The best and second-best models are highlighted.

    Plain English Explanation

    The paper presents a new training method called WebRL that teaches large language models how to complete tasks on the web. The idea is to start the model on simple web-based activities and then gradually increase the difficulty as the model improves, much as a human learner progresses from easy exercises to harder ones.

    The key aspect of WebRL is that it can automatically adjust the difficulty of the tasks based on how well the model is performing. If the model is doing well, it will move on to harder tasks to keep challenging itself. This "self-evolving curriculum" allows the model to continuously improve its web navigation and interaction skills over time.

    The researchers believe this approach can help train agents to use the vast information and capabilities of the web more effectively, which could have important applications in areas like educational AI and general web automation.

    Key Findings

    • WebRL successfully trains LLM agents to complete increasingly complex web-based tasks, demonstrating the potential of self-evolving online curriculum reinforcement learning.
    • The self-evolving curriculum allows the agents to continually challenge themselves and improve their web navigation and interaction skills over time.
    • WebRL outperforms the RL baselines on WebArena-Lite: with GLM-4-9B it reaches a 43.0% average success rate, versus 31.5% for DigiRL and 27.9% for AWR, and Llama3.1-70B with WebRL reaches 49.1% (Table 1).
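
    The margin over the baselines can be read directly off the Avg. SR column of Table 1. The short Python sketch below copies the GLM-4 rows from that table and computes each method's gap to WebRL in percentage points. The dictionary layout and the helper name gains_over are illustrative choices, not something taken from the paper.

```python
# Average success rates (%) copied from the Avg. SR column of Table 1
# (GLM-4 9B rows, evaluated on WebArena-Lite).
avg_sr = {
    "GLM-4-Chat": 6.1,
    "GLM-4 + SFT (BC)": 22.4,
    "GLM-4 + Filtered BC": 24.8,
    "GLM-4 + AWR": 27.9,
    "GLM-4 + DigiRL": 31.5,
    "GLM-4 + WebRL": 43.0,
}


def gains_over(scores, target):
    """Absolute gain (percentage points) of `target` over every other entry.

    `gains_over` is a hypothetical helper written for this example.
    """
    return {
        name: round(scores[target] - score, 1)
        for name, score in scores.items()
        if name != target
    }


gains = gains_over(avg_sr, "GLM-4 + WebRL")

# Print the smallest gap first, i.e. the strongest baseline first.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1]):
    print(f"WebRL vs {name}: +{gain} points")
```

    The closest baseline in this group is DigiRL, and the largest gap is to the untuned GLM-4-Chat model.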

    Technical Explanation

    The core idea behind WebRL is to use a self-evolving curriculum to train LLM agents in web environments. The agents start on simple web tasks and are then progressively given more difficult challenges based on their current performance.

    The key components of WebRL are:

    1. Web Environment: The agent interacts with a simulated web environment, where it can navigate pages, interact with UI elements, and complete various tasks.

    2. Online Curriculum: The difficulty of the tasks automatically adjusts based on the agent's performance. If the agent is succeeding, the tasks get harder; if it's struggling, the tasks get easier.

    3. Reinforcement Learning: The agents are trained using RL, where they receive rewards for completing tasks successfully. This incentivizes them to learn effective web interaction skills.

    4. LLM Integration: The agents use a large language model as their core policy, allowing them to leverage the model's powerful language understanding and generation capabilities.
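
    How components 2 and 3 fit together can be illustrated with a toy loop. The Python sketch below is not the paper's algorithm: the LLM policy is replaced by a single scalar skill, the web environment by a random draw whose success probability is skill / level, and the RL update by a fixed increment on success. The class names, the thresholds (0.7 and 0.3), and the window size are all assumptions chosen for illustration.

```python
import random
from collections import deque


class Curriculum:
    """Adjusts task difficulty from a rolling window of recent outcomes."""

    def __init__(self, window=10, raise_at=0.7, lower_at=0.3,
                 min_level=1, max_level=10):
        self.level = min_level
        self.min_level = min_level
        self.max_level = max_level
        self.raise_at = raise_at
        self.lower_at = lower_at
        self.outcomes = deque(maxlen=window)

    def record(self, success):
        """Store one outcome; change the level once the window is full."""
        self.outcomes.append(1 if success else 0)
        if len(self.outcomes) < self.outcomes.maxlen:
            return
        rate = sum(self.outcomes) / len(self.outcomes)
        if rate >= self.raise_at and self.level < self.max_level:
            self.level += 1          # agent is succeeding: harder tasks
            self.outcomes.clear()
        elif rate <= self.lower_at and self.level > self.min_level:
            self.level -= 1          # agent is struggling: easier tasks
            self.outcomes.clear()


class ToyAgent:
    """Stand-in for an LLM policy: one scalar 'skill', not model weights."""

    def __init__(self, skill=1.0, lr=0.05):
        self.skill = skill
        self.lr = lr

    def attempt(self, level, rng):
        """Return a binary reward: 1 if the simulated task succeeds."""
        p_success = min(1.0, self.skill / level)
        return 1 if rng.random() < p_success else 0

    def update(self, reward):
        """Placeholder for an RL update: success nudges skill upward."""
        self.skill += self.lr * reward


def train(episodes=500, seed=0):
    """Run the toy loop and return the agent, curriculum, and level history."""
    rng = random.Random(seed)
    agent, curriculum = ToyAgent(), Curriculum()
    history = []
    for _ in range(episodes):
        reward = agent.attempt(curriculum.level, rng)
        agent.update(reward)
        curriculum.record(bool(reward))
        history.append(curriculum.level)
    return agent, curriculum, history
```

    Calling train() returns the final agent, the curriculum, and the difficulty level after each episode, so the history list can be inspected or plotted to see how the level tracks the toy agent's skill.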

    The researchers evaluate WebRL on WebArena-Lite, a human-verified subset of WebArena, and show that it outperforms behavior cloning and the RL baselines AWR and DigiRL (Table 1). This suggests the self-evolving curriculum is an effective way to train capable web agents using LLMs.

    Implications for the Field

    The WebRL approach represents an important step towards training general web agents that can leverage the vast information and functionality of the internet. By using self-evolving curriculum RL, the agents can continuously challenge themselves and acquire increasingly sophisticated web skills.

    This has potential applications in areas like educational AI, where agents could help students navigate online educational resources more effectively. It could also enable more powerful web automation and assistance, allowing AI systems to independently complete a wide range of web-based tasks.

    Overall, the WebRL work demonstrates the value of combining large language models, reinforcement learning, and adaptive curriculum design to train capable agents for complex, open-ended environments like the web.

    Critical Analysis

    One limitation of the WebRL approach is that it was only evaluated in simulated web environments, not on real websites. While the simulated tasks were designed to be representative of real-world web interactions, there may be additional challenges that arise when deploying these agents on the live web.

    Additionally, the paper does not provide much detail on the specific web tasks or the reward structure used in the RL training. More information on the task design and evaluation metrics would help readers better understand the capabilities and limitations of the WebRL agents.

    Table 1 compares WebRL against behavior cloning, Filtered BC, AWR, DigiRL, and prompted proprietary models, but it would also be valuable to see how it compares to other approaches for training web agents, such as those that use large language models in different ways or incorporate additional inductive biases. A broader set of baselines could further contextualize its strengths and weaknesses.

    Overall, the WebRL work is a promising step forward, but additional research is needed to fully understand the potential and limitations of this approach for training web-savvy AI agents.

    Conclusion

    The WebRL paper introduces a novel self-evolving curriculum RL method for training large language model agents to navigate and interact with web environments. By gradually increasing task difficulty based on the agent's performance, WebRL enables continual skill development and the acquisition of sophisticated web capabilities.

    This work represents an important advance towards more capable and adaptable web agents, with potential applications in educational AI, web automation, and other areas that require fluid interaction with online information and functionality. While further research is needed, the WebRL approach demonstrates the value of integrating advanced training techniques with powerful language models to tackle complex, open-ended environments like the web.

    Read original: arXiv:2411.02337



    This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

    Related Papers

    Large Language Models Can Self-Improve At Web Agent Tasks

    Ajay Patel, Markus Hofmarcher, Claudiu Leoveanu-Condrei, Marius-Constantin Dinu, Chris Callison-Burch, Sepp Hochreiter

    Training models to act as agents that can effectively navigate and perform actions in a complex environment, such as a web browser, has typically been challenging due to lack of training data. Large language models (LLMs) have recently demonstrated some capability to navigate novel environments as agents in a zero-shot or few-shot fashion, purely guided by natural language instructions as prompts. Recent research has also demonstrated LLMs have the capability to exceed their base performance through self-improvement, i.e. fine-tuning on data generated by the model itself. In this work, we explore the extent to which LLMs can self-improve their performance as agents in long-horizon tasks in a complex environment using the WebArena benchmark. In WebArena, an agent must autonomously navigate and perform actions on web pages to achieve a specified objective. We explore fine-tuning on three distinct synthetic training data mixtures and achieve a 31% improvement in task completion rate over the base model on the WebArena benchmark through a self-improvement procedure. We additionally contribute novel evaluation metrics for assessing the performance, robustness, capabilities, and quality of trajectories of our fine-tuned agent models to a greater degree than simple, aggregate-level benchmark scores currently used to measure self-improvement.

    10/3/2024

    Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents

    Zihao Zhou, Bin Hu, Chenyang Zhao, Pu Zhang, Bin Liu

    Recent studies have uncovered the potential of Large Language Models (LLMs) in addressing complex sequential decision-making tasks through the provision of high-level instructions. However, LLM-based agents lack specialization in tackling specific target problems, particularly in real-time dynamic environments. Additionally, deploying an LLM-based agent in practical scenarios can be both costly and time-consuming. On the other hand, reinforcement learning (RL) approaches train agents that specialize in the target task but often suffer from low sampling efficiency and high exploration costs. In this paper, we introduce a novel framework that addresses these challenges by training a smaller, specialized student RL agent using instructions from an LLM-based teacher agent. By incorporating the guidance from the teacher agent, the student agent can distill the prior knowledge of the LLM into its own model. Consequently, the student agent can be trained with significantly less data. Moreover, through further training with environment feedback, the student agent surpasses the capabilities of its teacher for completing the target task. We conducted experiments on challenging MiniGrid and Habitat environments, specifically designed for embodied AI research, to evaluate the effectiveness of our framework. The results clearly demonstrate that our approach achieves superior performance compared to strong baseline methods. Our code is available at https://github.com/ZJLAB-AMMI/LLM4Teach.

    4/23/2024

    Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs

    Bahar Radmehr, Adish Singla, Tanja Kaser

    There has been a growing interest in developing learner models to enhance learning and teaching experiences in educational environments. However, existing works have primarily focused on structured environments relying on meticulously crafted representations of tasks, thereby limiting the agent's ability to generalize skills across tasks. In this paper, we aim to enhance the generalization capabilities of agents in open-ended text-based learning environments by integrating Reinforcement Learning (RL) with Large Language Models (LLMs). We investigate three types of agents: (i) RL-based agents that utilize natural language for state and action representations to find the best interaction strategy, (ii) LLM-based agents that leverage the model's general knowledge and reasoning through prompting, and (iii) hybrid LLM-assisted RL agents that combine these two strategies to improve agents' performance and generalization. To support the development and evaluation of these agents, we introduce PharmaSimText, a novel benchmark derived from the PharmaSim virtual pharmacy environment designed for practicing diagnostic conversations. Our results show that RL-based agents excel in task completion but lack in asking quality diagnostic questions. In contrast, LLM-based agents perform better in asking diagnostic questions but fall short of completing the task. Finally, hybrid LLM-assisted RL agents enable us to overcome these limitations, highlighting the potential of combining RL and LLMs to develop high-performing agents for open-ended learning environments.

    5/1/2024

    Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning

    Lucas-Andrei Thil, Mirela Popa, Gerasimos Spanakis

    Recent advancements in language models have demonstrated remarkable improvements in various natural language processing (NLP) tasks such as web navigation. Supervised learning (SL) approaches have achieved impressive performance while utilizing significantly less training data compared to previous methods. However, these SL-based models fall short when compared to reinforcement learning (RL) approaches, which have shown superior results. In this paper, we propose a novel approach that combines SL and RL techniques over the MiniWoB benchmark to leverage the strengths of both methods. We also address a critical limitation in previous models' understanding of HTML content, revealing a tendency to memorize target elements rather than comprehend the underlying structure. To rectify this, we propose methods to enhance true understanding and present a new baseline of results. Our experiments demonstrate that our approach outperforms previous SL methods on certain tasks using less data and narrows the performance gap with RL models, achieving 43.58% average accuracy in SL and 36.69% when combined with a multimodal RL approach. This study sets a new direction for future web navigation and offers insights into the limitations and potential of language modeling for computer tasks.

    5/2/2024