An open-source model for program synthesis. Competitive with OpenAI Codex.

## Model overview

`CodeGen` is an open-source AI model for program synthesis, developed by the Salesforce AI Research team. It is competitive with OpenAI's Codex model and has been released in several versions: CodeGen1, CodeGen2, and the most recent, CodeGen2.5. The model generates and completes code snippets, with a focus on multi-turn program synthesis tasks.

Related models include [`stable-diffusion`](https://aimodels.fyi/models/replicate/stable-diffusion-stability-ai), a latent text-to-image diffusion model; [`clip-features`](https://aimodels.fyi/models/replicate/clip-features-andreasjansson), a model for extracting CLIP features; and [`codellama-7b-instruct-gguf`](https://aimodels.fyi/models/replicate/codellama-7b-instruct-gguf-andreasjansson), a code-focused language model. Of these, only `codellama-7b-instruct-gguf` also targets code; `CodeGen` is focused specifically on program synthesis and code generation tasks.

## Model inputs and outputs

`CodeGen` takes a prompt or starting code snippet and generates code completions for it. The model can handle a range of programming languages and tasks, from simple function implementations to more complex multi-step programs.

### Inputs
- **Context**: The starting code snippet or prompt for the model to complete.
- **Num Return Sequences**: The number of code completions to generate.
- **Max Length**: The maximum length of the generated code snippets.
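- **Temperature**: Controls the diversity of the generated outputs. Lower values make sampling more deterministic; higher values make it more varied.
- **Top P**: The nucleus sampling threshold. Tokens are sampled only from the smallest set of candidates whose cumulative probability reaches this value.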
- **Seed**: A seed value for reproducibility.
- **Prepend Imports**: Whether to automatically prepend a `numpy` import to the context.
- **Prepend Context to Output**: Whether to prepend the input context to the generated output.
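
To make the sampling inputs above more concrete, here is a minimal sketch of how temperature, top-p, and a seed typically interact when one token is drawn. It is a generic illustration on a toy four-token vocabulary, not `CodeGen`'s actual decoding code; the function name `sample_token` and the example logits are made up for this sketch.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Sample one token index using temperature scaling and top-p filtering."""
    rng = rng or random.Random()
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")

    # Temperature: divide the logits before the softmax.
    # Values below 1 sharpen the distribution, values above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p: keep the most probable tokens until their combined mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw from the kept tokens; random.choices normalises the weights itself.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]   # hypothetical scores for a 4-token vocabulary
seeded_rng = random.Random(0)        # a fixed seed makes the draws repeatable
draws = [
    sample_token(toy_logits, temperature=0.8, top_p=0.9, rng=seeded_rng)
    for _ in range(5)
]
print(draws)
```

With these toy logits, `top_p=0.9` removes the least likely token from consideration entirely, while a very small `top_p` reduces sampling to always picking the most likely token.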

### Outputs
- **Code Completions**: The generated code snippets, returned either as a single string or as a Markdown URL for download.
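
The inputs and outputs listed above could be modelled roughly as follows. This is a sketch under stated assumptions: the snake_case field names, the default values, and the exact import line (`import numpy as np`) are all guesses for illustration, and `shape_output` only mimics how the two prepend options are described, not the deployed model's real behaviour.

```python
from dataclasses import asdict, dataclass


@dataclass
class CodeGenInput:
    # Hypothetical field names and defaults; check the model's schema for the real ones.
    context: str
    num_return_sequences: int = 1
    max_length: int = 128
    temperature: float = 0.2
    top_p: float = 0.95
    seed: int = 0
    prepend_imports: bool = False
    prepend_context_to_output: bool = True

    def to_payload(self):
        """Validate the fields and return them as a plain dict."""
        if not self.context:
            raise ValueError("context must not be empty")
        if self.num_return_sequences < 1:
            raise ValueError("num_return_sequences must be at least 1")
        if self.max_length < 1:
            raise ValueError("max_length must be at least 1")
        if self.temperature <= 0:
            raise ValueError("temperature must be positive")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in (0, 1]")
        return asdict(self)


def shape_output(request, completion):
    """Apply the two prepend options to a raw completion (assumed behaviour)."""
    context = request.context
    if request.prepend_imports:
        # Assumption: the prepended import is the conventional numpy alias.
        context = "import numpy as np\n" + context
    if request.prepend_context_to_output:
        return context + completion
    return completion


request = CodeGenInput(context="def add(a, b):\n", prepend_imports=True)
payload = request.to_payload()
# A made-up completion stands in for the model's response here.
print(shape_output(request, "    return a + b\n"))
```

Validating the payload before sending it catches out-of-range sampling values early, rather than relying on an error from the remote model.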

## Capabilities

`CodeGen` generates and completes code snippets across a wide range of programming tasks and languages. It can handle multi-step program synthesis, in which the model is given a high-level description or partial implementation and asked to complete the full program. The model also performs well on code infilling, where it inserts the missing code into a partially completed program.
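
As an illustration of infilling, the sketch below builds a prompt using the sentinel tokens described in the CodeGen2 model card (`<mask_1>`, `<sep>`, `<eom>`). Whether this particular deployment runs a CodeGen2 checkpoint and accepts these tokens is an assumption, and the helper names and the sample model output are invented for the example.

```python
MASK = "<mask_1>"       # marks the span to be filled in
SEP = "<sep>"           # separates the masked program from the infill sample
EOM = "<eom>"           # "end of mask": emitted by the model when the infill is done
EOT = "<|endoftext|>"   # end-of-text token placed after the suffix


def build_infill_prompt(prefix, suffix):
    """Wrap the code before and after the gap in the infill prompt format."""
    return prefix + MASK + suffix + EOT + SEP + MASK


def extract_infill(raw_completion):
    """Keep only the text generated before the end-of-mask token."""
    return raw_completion.split(EOM, 1)[0]


def splice(prefix, infill, suffix):
    """Rebuild the full program from its three parts."""
    return prefix + infill + suffix


prefix = "def greet(name):\n    "
suffix = "\n    return message\n"
prompt = build_infill_prompt(prefix, suffix)

# Made-up model output, used here instead of a real generation call.
simulated_output = 'message = "Hello, " + name<eom>ignored trailing text'
print(splice(prefix, extract_infill(simulated_output), suffix))
```

Truncating at `<eom>` matters because the model may keep generating after it has finished the gap.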

## What can I use it for?

`CodeGen` is well-suited for a variety of software development and automation tasks. It can be used to assist programmers by generating boilerplate code, providing code completions, and aiding in program synthesis. The model could also be integrated into tools for automated code generation, low-code/no-code development, and programming education.

## Things to try

One interesting aspect of `CodeGen` is its ability to handle multi-turn program synthesis tasks. You could try providing the model with a high-level description of a program, and then iteratively refine and expand the code through multiple generations. Another interesting experiment would be to explore the model's performance on code infilling, where you provide a partially completed program and ask `CodeGen` to insert the relevant code.
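
The multi-turn workflow described above can be sketched as a loop that appends each instruction as a comment, asks the model to continue, and feeds the growing program back in as the next context. The `stub_generate` function and its canned completions are placeholders standing in for a real model call; only the loop structure is the point here.

```python
# Canned completions keyed by instruction; a real setup would call the model instead.
CANNED = {
    "read a list of numbers from a string":
        "def parse(text):\n    return [float(x) for x in text.split(',')]",
    "return the mean of the parsed numbers":
        "def mean(text):\n    values = parse(text)\n    return sum(values) / len(values)",
}


def stub_generate(context):
    """Hypothetical stand-in for the model: answer based on the last comment line."""
    last_comment = context.rstrip("\n").splitlines()[-1]
    return CANNED[last_comment[2:]]  # drop the leading "# "


def run_turns(instructions, generate):
    """Build a program turn by turn, carrying all earlier code in the context."""
    context = ""
    for step in instructions:
        context += f"# {step}\n"          # the instruction for this turn
        completion = generate(context)    # the model sees everything so far
        context += completion.rstrip("\n") + "\n\n"
    return context


program = run_turns(list(CANNED), stub_generate)
print(program)
```

Because the second turn's context already contains `parse`, the model can build on it rather than redefining it, which is the main benefit of keeping earlier turns in the prompt.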