Octopus V2: On-device language model for super agent
============================================================================================================

Octopus V3 Release
-----------------------------------------

We are excited to announce that Octopus v3 is now available! Check out our [technical report](https://arxiv.org/abs/2404.11459) and the [Octopus V3 tweet](https://twitter.com/nexa4ai/status/1780783383737676236)!

Key Features of Octopus v3:

*   **Efficiency**: **Sub-billion** parameters, making it less than half the size of its predecessor, Octopus v2.
*   **Multi-Modal Capabilities**: Processes both text and image inputs.
*   **Speed and Accuracy**: Incorporates our **patented** functional token technology, achieving function-calling accuracy on par with GPT-4V and GPT-4.
*   **Multilingual Support**: Simultaneous support for English and Mandarin.

Check out the Octopus V3 demo video for [Android and iOS](https://octopus3.nexa4ai.com/).

![nexa-octopus-v3](/NexaAIDev/Octopus-v2/resolve/main/octopus-v3.jpeg)

Octopus V2
-------------------------

We are a very small team with a heavy workload. Please give us more time to prepare the code, and we will **open source** it. We hope the Octopus v2 model will be helpful to you. Let's democratize AI agents for everyone. We've received many requests from the car industry, health care, financial systems, and more. The Octopus model can be applied to **any function**, so you can start thinking about your use case now.

*   [Nexa AI Product](https://www.nexa4ai.com/)
*   [ArXiv](https://arxiv.org/abs/2404.01744)
*   [Video Demo](https://www.youtube.com/watch?v=jhM0D0OObOw&ab_channel=NexaAI)

![nexa-octopus](/NexaAIDev/Octopus-v2/resolve/main/Octopus-logo.jpeg)

Introduction
-----------------------------

Octopus-V2-2B, an advanced open-source language model with 2 billion parameters, represents Nexa AI's research breakthrough in applying large language models (LLMs) to function calling, specifically tailored for Android APIs. Unlike Retrieval-Augmented Generation (RAG) methods, which require detailed descriptions of potential function arguments (sometimes up to tens of thousands of input tokens), Octopus-V2-2B introduces a unique **functional token** strategy for both training and inference. This approach not only lets it achieve performance comparable to GPT-4 but also makes its inference significantly faster than RAG-based methods, which is especially beneficial for edge computing devices.
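To illustrate why the functional token strategy keeps prompts short, the sketch below contrasts two ways of building a prompt. A RAG-style pipeline pastes the retrieved function descriptions into the context, so the prompt grows with every function it covers; with functional tokens, the mapping to functions is learned during fine-tuning and only the instruction and query are sent. The descriptions are hypothetical placeholders, `build_rag_prompt` and `build_functional_token_prompt` are illustrative names, and counting whitespace-separated words is only a rough stand-in for a real tokenizer.

```python
# Sketch only: compares prompt length for RAG-style function calling versus a
# functional-token model. Descriptions are hypothetical placeholders, and word
# counts are a crude proxy for tokenizer output.

INSTRUCTION = (
    "Below is the query from the users, please call the correct function and "
    "generate the parameters to call the function."
)

# Hypothetical function descriptions a RAG retriever might return.
DESCRIPTIONS = [
    "def take_a_photo(camera): Captures a photo using the 'front' or 'back' camera.",
    "def get_trending_news(category=None, region='US', language='en', max_results=5): "
    "Fetches trending news articles based on category, region, and language.",
    "def send_text_message(contact_name, message): Sends a text message to a contact.",
]


def build_rag_prompt(query: str, descriptions: list[str]) -> str:
    """RAG style: every retrieved description is pasted into the context."""
    context = "\n".join(descriptions)
    return f"{INSTRUCTION}\n\nAvailable functions:\n{context}\n\nQuery: {query} \n\nResponse:"


def build_functional_token_prompt(query: str) -> str:
    """Functional-token style: only the instruction and the query are sent."""
    return f"{INSTRUCTION}\n\nQuery: {query} \n\nResponse:"


def approx_tokens(text: str) -> int:
    """Rough token estimate: number of whitespace-separated words."""
    return len(text.split())


if __name__ == "__main__":
    query = "Take a selfie for me with front camera"
    print("functional token:", approx_tokens(build_functional_token_prompt(query)), "words")
    for n in range(1, len(DESCRIPTIONS) + 1):
        rag = approx_tokens(build_rag_prompt(query, DESCRIPTIONS[:n]))
        print(f"RAG with {n} description(s): {rag} words")
```

The functional-token prompt stays the same size no matter how many functions the model supports, while the RAG prompt grows with each description it includes.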

**On-device Applications**: Octopus-V2-2B is engineered to operate seamlessly on Android devices, extending its utility across a wide range of applications, from Android system management to the orchestration of multiple devices.

**Inference Speed**: In benchmarks, Octopus-V2-2B is 36x faster than the "Llama7B + RAG solution" on a single A100 GPU. It is also 168% faster than GPT-4-turbo (gpt-4-0125-preview), which relies on clusters of A100/H100 GPUs. This efficiency is attributed to our **functional token** design.
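The speed figures above come from our own benchmark. If you want to compare two inference pipelines on your own hardware, one simple approach is to take the median of several timed runs after a warm-up call. The sketch below is not the benchmark harness used for these results; `measure_latency`, `speedup`, and the two sleep-based pipelines are hypothetical stand-ins for real model calls.

```python
import statistics
import time
from typing import Any, Callable


def measure_latency(fn: Callable[[], Any], runs: int = 5, warmup: int = 1) -> float:
    """Median wall-clock latency of fn() in seconds, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off costs such as kernel compilation or cache warm-up
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Ratio baseline / candidate; e.g. 36.0 means the candidate is 36x faster."""
    return baseline_s / candidate_s


def slow_pipeline() -> None:
    """Hypothetical stand-in for a retrieval step plus a larger model."""
    time.sleep(0.02)


def fast_pipeline() -> None:
    """Hypothetical stand-in for a single on-device model call."""
    time.sleep(0.005)


if __name__ == "__main__":
    ratio = speedup(measure_latency(slow_pipeline), measure_latency(fast_pipeline))
    print(f"candidate is about {ratio:.1f}x faster than baseline")
```

The median is used rather than the mean so that a single slow outlier run does not dominate the comparison.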

**Accuracy**: Octopus-V2-2B excels not only in speed but also in accuracy, surpassing the "Llama7B + RAG solution" in function-call accuracy by 31%. Its function-call accuracy is comparable to that of GPT-4 and RAG + GPT-3.5, with scores between 98% and 100% across the benchmark datasets.
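As an illustration of how a function-call accuracy figure can be computed, the sketch below counts a prediction as correct only when both the function name and every argument exactly match the reference call. This exact-match rule is an assumption for illustration and may differ from the scoring used in our benchmark; `FunctionCall`, `call_accuracy`, and the example function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FunctionCall:
    """A function call reduced to its name and keyword arguments."""
    name: str
    kwargs: dict[str, Any] = field(default_factory=dict)


def call_accuracy(predictions: list[FunctionCall], references: list[FunctionCall]) -> float:
    """Fraction of predictions whose name and arguments exactly match the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Dataclass equality compares name and kwargs field by field.
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(references)


if __name__ == "__main__":
    references = [
        FunctionCall("take_a_photo", {"camera": "front"}),
        FunctionCall("get_trending_news", {"category": "sports"}),
    ]
    predictions = [
        FunctionCall("take_a_photo", {"camera": "front"}),
        FunctionCall("get_trending_news", {"category": "technology"}),  # wrong argument
    ]
    print(call_accuracy(predictions, references))  # 0.5
```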

**Function Calling Capabilities**: Octopus-V2-2B can generate individual, nested, and parallel function calls across a variety of complex scenarios.
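To make the three call shapes concrete, here is a sketch of how an application might execute them once they have been parsed: a single call runs directly, a nested call runs its inner call first and passes the result as an argument, and parallel calls are independent calls in one response. The `Call` representation, the registry, and the device functions are all hypothetical stand-ins for real Android APIs; the model's own output format is not shown here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Call:
    """A function call; an argument value may itself be a Call (nesting)."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)


# Hypothetical device functions standing in for real Android APIs.
def get_contact_number(name: str) -> str:
    return {"Alice": "+1-555-0100"}.get(name, "unknown")


def send_text_message(number: str, message: str) -> str:
    return f"sent {message!r} to {number}"


def set_volume(level: int) -> str:
    return f"volume set to {level}"


REGISTRY: dict[str, Callable[..., Any]] = {
    "get_contact_number": get_contact_number,
    "send_text_message": send_text_message,
    "set_volume": set_volume,
}


def execute(call: Call) -> Any:
    """Run one call; nested Call arguments are executed first and replaced by their results."""
    resolved = {key: execute(value) if isinstance(value, Call) else value
                for key, value in call.args.items()}
    return REGISTRY[call.name](**resolved)


def execute_parallel(calls: list[Call]) -> list[Any]:
    """Run independent calls in order; a real app might dispatch them concurrently."""
    return [execute(call) for call in calls]


if __name__ == "__main__":
    single = Call("set_volume", {"level": 5})
    nested = Call("send_text_message", {
        "number": Call("get_contact_number", {"name": "Alice"}),
        "message": "On my way",
    })
    print(execute(single))  # volume set to 5
    print(execute_parallel([single, nested]))
```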

Example Use Cases
---------------------------------------

![ondevice](/NexaAIDev/Octopus-v2/resolve/main/tool-usage-compressed.png)

You can run the model on a GPU using the following code.

    from transformers import AutoTokenizer, GemmaForCausalLM
    import torch
    import time
    
    def inference(input_text):
        """Generate a function call for input_text and report the latency in seconds."""
        start_time = time.time()
        # Tokenize and move input_ids and attention_mask to the model's device
        inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
        input_length = inputs["input_ids"].shape[1]
        outputs = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=1024,  # limit on prompt + generated tokens
            do_sample=False,  # greedy decoding for deterministic function calls
        )
        # Keep only the newly generated tokens, dropping the echoed prompt
        generated_sequence = outputs[:, input_length:].tolist()
        res = tokenizer.decode(generated_sequence[0])
        end_time = time.time()
        return {"output": res, "latency": end_time - start_time}
    
    model_id = "NexaAIDev/Octopus-v2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" requires the accelerate package
    model = GemmaForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    
    input_text = "Take a selfie for me with front camera"
    nexa_query = f"Below is the query from the users, please call the correct function and generate the parameters to call the function.\n\nQuery: {input_text} \n\nResponse:"
    result = inference(nexa_query)
    print("nexa model result:\n", result["output"])
    print("latency:", result["latency"], "s")
    

Evaluation
-------------------------

The benchmark results can be viewed in [this Excel file](/NexaAIDev/Octopus-v2/blob/main/android_benchmark.xlsx) and have been manually verified. All queries in the benchmark test were sampled by Gemini.

![latency plot](/NexaAIDev/Octopus-v2/resolve/main/latency_plot.jpg) ![accuracy plot](/NexaAIDev/Octopus-v2/resolve/main/accuracy_plot.jpg)

**Note**: Each benchmark query includes all the parameters required by the function it calls. Queries are expected to include all parameters during inference as well.
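Since queries are expected to carry every parameter a function needs, an application may want to reject a generated call that omits a required argument before executing it. One way to do that in Python is `inspect.signature(fn).bind(...)`, which raises `TypeError` when a required parameter is missing or an unknown keyword is supplied, without running the function. In this sketch `take_a_photo` is a hypothetical stub and `validate_call` is an illustrative helper name.

```python
import inspect
from typing import Any, Callable, Optional


def take_a_photo(camera: str) -> str:
    """Hypothetical stub: take a photo with the 'front' or 'back' camera."""
    return f"photo taken with {camera} camera"


def validate_call(fn: Callable[..., Any], kwargs: dict[str, Any]) -> Optional[str]:
    """Return None if kwargs fit fn's signature, otherwise the binding error message.

    Signature.bind raises TypeError for a missing required parameter or an
    unexpected keyword, so the call is checked without executing fn.
    """
    try:
        inspect.signature(fn).bind(**kwargs)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(validate_call(take_a_photo, {"camera": "front"}))  # None
    print(validate_call(take_a_photo, {}))  # an error message naming 'camera'
```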

Training Data
-------------------------------

We wrote 20 Android API descriptions to train the models; see [this file](/NexaAIDev/Octopus-v2/blob/main/android_functions.txt) for details. The Android API implementations for our demos and our training data will be published later. Below is an example of one Android API description:

    def get_trending_news(category=None, region='US', language='en', max_results=5):
        """
        Fetches trending news articles based on category, region, and language.
    
        Parameters:
        - category (str, optional): News category to filter by, by default use None for all categories. Optional to provide.
        - region (str, optional): ISO 3166-1 alpha-2 country code for region-specific news, by default, uses 'US'. Optional to provide.
        - language (str, optional): ISO 639-1 language code for article language, by default uses 'en'. Optional to provide.
        - max_results (int, optional): Maximum number of articles to return, by default, uses 5. Optional to provide.
    
        Returns:
        - list[str]: A list of strings, each representing an article. Each string contains the article's heading and URL.
        """
    
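Because each description is itself a valid Python function definition, its parameter names, defaults, and summary line can be read programmatically. The sketch below uses the standard `inspect` module on a shortened copy of the description above; the `describe` helper and the shape of its output dictionary are hypothetical and do not reflect the format of our (unpublished) training data.

```python
import inspect
from typing import Any, Callable


def get_trending_news(category=None, region='US', language='en', max_results=5):
    """
    Fetches trending news articles based on category, region, and language.
    """
    # Description-only stub with a shortened docstring; no implementation is provided.


def describe(fn: Callable[..., Any]) -> dict[str, Any]:
    """Collect the name, one-line summary, required parameters, and defaults of fn."""
    params = inspect.signature(fn).parameters.values()
    doc = inspect.getdoc(fn) or ""  # getdoc strips indentation and leading blank lines
    return {
        "name": fn.__name__,
        "summary": doc.splitlines()[0] if doc else "",
        "required": [p.name for p in params if p.default is inspect.Parameter.empty],
        "defaults": {p.name: p.default for p in params
                     if p.default is not inspect.Parameter.empty},
    }


if __name__ == "__main__":
    from pprint import pprint
    pprint(describe(get_trending_news))
```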

License
-------------------

This model was trained on commercially viable data. For use of our model, refer to the [license information](https://www.nexa4ai.com/licenses).

References
-------------------------

We thank the Google Gemma team for their amazing models!

    @misc{gemma-2023-open-models,
      author = {{Gemma Team, Google DeepMind}},
      title = {Gemma: Open Models Based on Gemini Research and Technology},
      url = {https://goo.gle/GemmaReport},  
      year = {2023},
    }
    

Citation
---------------------

    @misc{chen2024octopus,
          title={Octopus v2: On-device language model for super agent}, 
          author={Wei Chen and Zhiyuan Li},
          year={2024},
          eprint={2404.01744},
          archivePrefix={arXiv},
          primaryClass={cs.CL}
    }
    

Contact
-------------------

Please [contact us](mailto:alexchen@nexa4ai.com) with any issues or comments!

## Model overview

`Octopus-V2-2B` is an advanced open-source language model with 2 billion parameters, representing a research breakthrough from Nexa AI in applying large language models (LLMs) to function calling, specifically tailored for Android APIs. Unlike Retrieval-Augmented Generation (RAG) methods, which require detailed descriptions of potential function arguments (sometimes up to tens of thousands of input tokens), `Octopus-V2-2B` introduces a unique **functional token** strategy for both training and inference. This approach not only lets it achieve performance comparable to GPT-4 but also makes its inference significantly faster than RAG-based methods, which is especially beneficial for edge computing devices.

Related models include the [2B instruct version of Google's Gemma model](https://aimodels.fyi/models/huggingFace/gemma-2b-it-google-deepmind), the [7B instruct version of Google's Gemma model](https://aimodels.fyi/models/huggingFace/gemma-7b-it-google-deepmind), and [GPT-2B-001](https://aimodels.fyi/models/huggingFace/gpt-2b-001-nvidia) from NVIDIA. These are general-purpose large language models rather than models specialized for function calling.

## Model inputs and outputs

### Inputs
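- **Text**: The model can process a variety of text-based inputs, such as questions, prompts, or documents. For function calling, the user's request should be wrapped in the prompt template shown in the Example Use Cases section.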

### Outputs
- **Generated text**: The model outputs generated English-language text in response to the input, such as answers to questions, summaries of documents, or function call code.
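With the functional token strategy, the model expresses its choice of function through a dedicated token instead of spelling out a function name, so the application has to map that token back to a function. The sketch below assumes a hypothetical output format such as `<nexa_0>('front')<nexa_end>`, where `<nexa_i>` stands for the i-th function in a table the application maintains and the parenthesized part contains positional Python literals. The token names, the `TOKEN_TO_FUNCTION` table, and `parse_functional_output` are all assumptions for illustration; check real generations before relying on this format.

```python
import ast
import re
from typing import Any

# Hypothetical table: the application decides which function each token stands for.
TOKEN_TO_FUNCTION = {
    "<nexa_0>": "take_a_photo",
    "<nexa_1>": "get_trending_news",
}

# Assumed format: <nexa_i>(positional literal arguments)<nexa_end>
CALL_PATTERN = re.compile(r"(<nexa_\d+>)\((.*)\)<nexa_end>", re.DOTALL)


def parse_functional_output(text: str) -> tuple[str, tuple[Any, ...]]:
    """Map the functional token to a function name and parse its literal arguments."""
    match = CALL_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no functional-token call found in {text!r}")
    token, raw_args = match.groups()
    if token not in TOKEN_TO_FUNCTION:
        raise ValueError(f"unknown functional token {token}")
    # literal_eval accepts only Python literals, so no generated code is executed.
    args = ast.literal_eval(f"({raw_args},)") if raw_args.strip() else ()
    return TOKEN_TO_FUNCTION[token], args


if __name__ == "__main__":
    print(parse_functional_output("<nexa_0>('front')<nexa_end>"))  # ('take_a_photo', ('front',))
```

Using `ast.literal_eval` rather than `eval` keeps the parser safe against unexpected generations, at the cost of rejecting keyword arguments or expressions.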

## Capabilities

`Octopus-V2-2B` is engineered to operate seamlessly on Android devices, extending its utility across a wide range of applications, from Android system management to the orchestration of multiple devices. Its key capabilities include high performance on function calling tasks, comparable to GPT-4, and significantly faster inference speed than RAG-based methods, making it well-suited for edge computing use cases.

## What can I use it for?

The `Octopus-V2-2B` model can be used for a variety of text-based applications, such as:

- **Content Creation and Communication**: Generating creative text formats like poems, scripts, marketing copy, or chatbot responses.
- **Research and Education**: Powering NLP research, developing language learning tools, or assisting with knowledge exploration.

The model's fast inference speed and Android-focused design make it particularly well-suited for mobile and edge computing applications, such as on-device system management or device coordination.

## Things to try

One key capability of `Octopus-V2-2B` is its high performance on function calling tasks, which is achieved through its unique **functional token** strategy. This approach allows the model to generate accurate function call code without requiring long, detailed input descriptions, making it more efficient and practical for certain use cases.

Developers and researchers may want to experiment with using `Octopus-V2-2B` for tasks that involve generating or manipulating code, such as automating Android API calls or creating custom device coordination scripts. The model's speed and accuracy on these types of tasks could make it a valuable tool for a range of edge computing and mobile development projects.