CogVideo is a model that generates videos from text descriptions. It is a large pretrained transformer, built on the CogView2 text-to-image model, that encodes the input prompt and generates the video frames directly rather than sequencing or blending existing clips. CogVideo can be used in applications such as video production, multimedia content creation, and virtual reality experiences.

Use cases

CogVideo's text-to-video generation has potential use cases across several industries. In video production, it could streamline the creation of promotional videos, explainer videos, and advertisements. Instead of manually selecting and editing footage to match a script, a producer can generate a draft video directly from a textual description, which saves time and effort.

CogVideo can also be applied to multimedia content creation. For example, it could be used to generate video summaries of articles, blog posts, or research papers. This may be particularly useful for news organizations or educational platforms that want to present textual information in a more engaging, visual format.

The model could also support virtual reality experiences. Developers can use CogVideo to turn textual descriptions of virtual environments into preview videos that show users what to expect before they enter the virtual world, which may be valuable in fields such as architecture, tourism, and gaming.

Taken together, these applications in video production, multimedia content creation, and virtual reality suggest that CogVideo could change how some videos are created and consumed.




Creator Models

Latent Viz | $0.001165,010
Arf Svox2 | $? | 14,546
Majesty Diffusion | $? | 8,032
K Diffusion | $? | 6,806
Disco Diffusion | $? | 63,295




Summary of this model and related resources.

Model Name: CogVideo
Text-to-video generation
Model Link: View on Replicate
API Spec: View on Replicate
Github Link: View on Github
Paper Link: View on Arxiv


How popular is this model, by number of runs? How popular is the creator, by the sum of all their runs?

Model Rank
Creator Rank


How much does it cost to run this model? How long, on average, does it take to complete a run?

Cost per Run: $-
Prediction Hardware: Nvidia A100 (40GB) GPU
Average Completion Time: -
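
Since no cost or completion time is listed here, a rough estimate can be worked out by hand once both inputs are known. The sketch below assumes the hosting platform bills by the second of GPU time. The function name `estimate_cost_per_run` and both constants are hypothetical placeholders chosen for illustration; they are not measured values for CogVideo or published prices for the A100 (40GB).

```python
def estimate_cost_per_run(avg_run_seconds: float, price_per_second: float) -> float:
    """Estimate the dollar cost of one run, assuming per-second billing."""
    if avg_run_seconds < 0 or price_per_second < 0:
        raise ValueError("run time and price must be non-negative")
    return avg_run_seconds * price_per_second


# Hypothetical placeholder inputs, NOT real figures for CogVideo or the A100 (40GB).
HYPOTHETICAL_AVG_RUN_SECONDS = 300.0
HYPOTHETICAL_PRICE_PER_SECOND = 0.002

if __name__ == "__main__":
    cost = estimate_cost_per_run(
        HYPOTHETICAL_AVG_RUN_SECONDS, HYPOTHETICAL_PRICE_PER_SECOND
    )
    print(f"Estimated cost per run: ${cost:.2f}")
```

Replace the two placeholders with the average completion time you observe and the current per-second rate for the hardware to get a usable figure.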