Zsxkib

Models by this creator

diffbir

zsxkib

Total Score

3.2K

diffbir is a versatile AI model developed by researcher Xinqi Lin and team that can tackle a range of blind image restoration tasks, including blind image super-resolution, blind face restoration, and blind image denoising. Unlike traditional restoration models that rely on fixed degradation assumptions, diffbir leverages generative diffusion models to handle a wide variety of real-world degradations in a blind manner, producing high-quality restored images without requiring prior knowledge of the specific degradation process. The model is similar to other powerful image restoration models like GFPGAN, which specializes in restoring old photos and AI-generated faces, and SuperIR, which scales up model capacity for photo-realistic image restoration. However, diffbir distinguishes itself by its broad applicability and its ability to handle many kinds of real-world degradation in a unified manner.

Model inputs and outputs

Inputs

- **input**: Path to the input image you want to enhance.
- **upscaling_model_type**: Choose the type of model best suited for the primary content of the image: 'faces' for portraits and 'general_scenes' for everything else.
- **restoration_model_type**: Select the restoration model that aligns with the content of your image. This model is responsible for removing degradations.
- **super_resolution_factor**: Factor by which the input image resolution should be increased. For instance, a factor of 4 makes the resolution 4 times greater in both height and width.
- **steps**: The number of enhancement iterations to perform. More steps might result in a clearer image but can also introduce artifacts.
- **repeat_times**: Number of times the enhancement process is repeated by feeding the output back as input. This can refine the result but might also introduce over-enhancement issues.
- **tiled**: Whether to use patch-based sampling. This can be useful for very large images, enhancing them in smaller chunks rather than all at once.
- **tile_size**: Size of each tile (or patch) when the 'tiled' option is enabled. Determines how the image is divided during patch-based enhancement.
- **tile_stride**: Distance between the start of each tile when the image is divided for patch-based enhancement. A smaller stride means more overlap between tiles.
- **use_guidance**: Use latent image guidance for enhancement. This can help achieve more accurate and contextually relevant enhancements.
- **guidance_scale**: For 'general_scenes': scale factor for the guidance mechanism. Adjusts the influence of guidance on the enhancement process.
- **guidance_space**: For 'general_scenes': determines in which space (RGB or latent) the guidance operates. 'latent' can often provide more subtle and context-aware enhancements.
- **guidance_repeat**: For 'general_scenes': number of times the guidance process is repeated during enhancement.
- **guidance_time_start**: For 'general_scenes': specifies at which step the guidance mechanism starts influencing the enhancement.
- **guidance_time_stop**: For 'general_scenes': specifies at which step the guidance mechanism stops influencing the enhancement.
- **has_aligned**: For 'faces' mode: indicates whether the input images are already cropped and aligned to faces. If not, the model will attempt to do this.
- **only_center_face**: For 'faces' mode: if multiple faces are detected, only enhance the center-most face in the image.
- **background_upsampler**: For 'faces' mode: model used to upscale the background in images where the primary subject is a face.
- **face_detection_model**: For 'faces' mode: model used for detecting faces in the image. Choose based on accuracy and speed preferences.
- **background_upsampler_tile**: For 'faces' mode: size of each tile used by the background upsampler when dividing the image into patches.
- **background_upsampler_tile_stride**: For 'faces' mode: distance between the start of each tile when the background is divided for upscaling. A smaller stride means more overlap between tiles.

Outputs

- **Output**: The enhanced image(s) produced by the diffbir model.

Capabilities

diffbir can handle a wide range of real-world image degradations, including low resolution, noise, and blur, without requiring prior knowledge about the specific degradation process. It performs blind image super-resolution, blind face restoration, and blind image denoising, producing high-quality results that outperform traditional restoration methods.

What can I use it for?

You can use diffbir to enhance many types of images, from portraits and landscapes to old photos and AI-generated images. The model's versatility makes it a powerful tool for tasks such as:

- Upscaling low-resolution images while preserving details and avoiding artifacts
- Restoring degraded or low-quality facial images, such as those from old photos or AI-generated faces
- Removing noise and artifacts from images, improving their overall quality and clarity

This broad applicability makes diffbir a valuable resource for photographers, digital artists, and anyone working with visual content that needs restoration or enhancement.

Things to try

One interesting aspect of diffbir is its ability to leverage latent image guidance for more accurate and context-aware enhancements. By adjusting the guidance settings, you can explore how this feature affects the restoration results and find the right balance between quality and fidelity. Another feature worth experimenting with is patch-based sampling, which is useful for very large images: dividing the image into smaller tiles and processing them individually reduces memory requirements and can improve results, especially at high upscaling factors.
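To make the input list above more concrete, here is a minimal sketch of how such a model might be called through the Replicate Python client. The `zsxkib/diffbir` slug, the need for a version suffix, and the exact parameter values are assumptions based on the inputs listed above, not the model's official schema.

```python
import replicate

# Hypothetical diffbir call; slug and parameter names are inferred from the
# input list above. Append ":<version>" from the model page if required.
output = replicate.run(
    "zsxkib/diffbir",
    input={
        "input": "https://example.com/degraded_photo.png",  # image to restore
        "upscaling_model_type": "general_scenes",           # or "faces" for portraits
        "super_resolution_factor": 4,                       # 4x larger in each dimension
        "steps": 50,                                        # more steps: possibly sharper, but slower
        "tiled": True,                                      # patch-based sampling for large images
        "tile_size": 512,
        "tile_stride": 256,
    },
)
print(output)  # URL(s) of the restored image(s)
```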

Updated 5/13/2024

instant-id

zsxkib

Total Score

401

instant-id is a state-of-the-art AI model developed by the InstantX team that can generate realistic images of real people instantly. It uses a tuning-free approach to achieve identity-preserving generation from only a single input image, and it supports downstream tasks such as stylized synthesis, where it blends the facial features and style of the input image. Compared to similar models like AbsoluteReality V1.8.1, Reliberate v3, Stable Diffusion, Photomaker, and Photomaker Style, instant-id achieves better fidelity and retains good text editability, allowing the generated faces and styles to blend more seamlessly.

Model inputs and outputs

instant-id takes a single input image of a face and a text prompt, and generates one or more realistic images that preserve the identity of the input face while incorporating the desired style and content from the text prompt. The model uses a novel identity-preserving generation technique that produces high-quality images in a matter of seconds.

Inputs

- **Image**: The input face image used as a reference for the generated images.
- **Prompt**: The text prompt describing the desired style and content of the generated images.
- **Seed** (optional): A random seed value to control the randomness of the generated images.
- **Pose Image** (optional): A reference image used to guide the pose of the generated images.

Outputs

- **Images**: One or more realistic images that preserve the identity of the input face while incorporating the desired style and content from the text prompt.

Capabilities

instant-id can generate highly realistic images of people in a variety of styles and settings while preserving the identity of the input face. It can seamlessly blend the facial features and style of the input image, producing unique and captivating results. This makes the model a powerful tool for a wide range of applications, from creative content generation to virtual avatars and character design.

What can I use it for?

instant-id can be used for a variety of applications, such as:

- **Creative content generation**: Quickly generate unique, realistic images for art, design, and multimedia projects.
- **Virtual avatars**: Create personalized virtual avatars for games, social media, or other digital environments.
- **Character design**: Develop realistic and expressive character designs for animation, film, or video games.
- **Augmented reality**: Integrate generated images into augmented reality experiences, blending real and virtual elements seamlessly.

Things to try

With instant-id, you can experiment with a wide range of text prompts and input images to generate unique and captivating results. Try prompts that explore different styles, genres, or themes, and see how the model blends facial features and aesthetics in unexpected ways. You can also experiment with different input images, from close-up portraits to more expressive or stylized faces, to see how the model adapts. Pushing the boundaries of identity-preserving generation opens up a wide space of creative possibilities.
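As a rough illustration of how the inputs above map onto an API call, here is a hedged sketch using the Replicate Python client. The `zsxkib/instant-id` slug, the input keys, and the example prompt are assumptions for illustration, not the model's confirmed schema.

```python
import replicate

# Hypothetical instant-id call; keys are guesses based on the inputs listed
# above. Append ":<version>" from the model page if the client requires it.
output = replicate.run(
    "zsxkib/instant-id",
    input={
        "image": open("reference_face.jpg", "rb"),  # single reference photo of the person
        "prompt": "watercolor portrait, soft light, highly detailed",
        "seed": 1234,                               # optional: fix for reproducible results
        # "pose_image": open("pose_reference.jpg", "rb"),  # optional pose guidance
    },
)
print(output)  # URL(s) of the identity-preserving generations
```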

Updated 5/13/2024

realistic-voice-cloning

zsxkib

Total Score

169

The realistic-voice-cloning model, created by zsxkib, can create song covers by cloning a specific voice from audio files. It builds on Realistic Voice Cloning (RVC v2) technology, allowing users to generate vocals in the style of any RVC v2 trained voice. This model offers an alternative to similar voice cloning models like create-rvc-dataset, openvoice, free-vc, train-rvc-model, and voicecraft, each with its own features and capabilities.

Model inputs and outputs

The realistic-voice-cloning model takes a variety of inputs that let users fine-tune the generated vocals, including the RVC model to use, pitch changes, reverb settings, and more. The output is an audio file in MP3 or WAV format containing the original song with its vocals replaced by the cloned voice.

Inputs

- **Song Input**: The audio file to use as the source for the song.
- **RVC Model**: The specific RVC v2 model to use for the voice cloning.
- **Pitch Change**: Adjust the pitch of the AI-generated vocals.
- **Index Rate**: Control the balance between the AI's accent and the original vocals.
- **RMS Mix Rate**: Adjust the balance between the original vocal's loudness and a fixed loudness.
- **Filter Radius**: Apply median filtering to the harvested pitch results.
- **Pitch Detection Algorithm**: Choose between different pitch detection algorithms.
- **Protect**: Control how much of the original vocals' breath and voiceless consonants to leave in the AI vocals.
- **Reverb Size, Damping, Dryness, and Wetness**: Adjust the reverb settings.
- **Pitch Change All**: Change the pitch/key of the background music, backup vocals, and AI vocals.
- **Volume Changes**: Adjust the volume of the main AI vocals, backup vocals, and background music.

Outputs

- The generated audio file in MP3 or WAV format, with the original vocals replaced by the cloned voice.

Capabilities

The realistic-voice-cloning model can create high-quality song covers by replacing the original vocals with a cloned voice. Users can fine-tune the generated vocals to achieve their desired sound, adjusting parameters like pitch, reverb, and volume. This makes it particularly useful for musicians, content creators, and audio engineers who want to create unique vocal covers or experiment with different voice styles.

What can I use it for?

The realistic-voice-cloning model can be used to create song covers, remixes, and other audio projects where you want to replace the original vocals with a different voice. This can be useful for musicians who want to experiment with different vocal styles, content creators who want to create unique covers, or audio engineers who need to modify existing vocal tracks. The model's fine-grained controls also make it suitable for professional audio production work.

Things to try

Try creating unique song covers by cloning the voice of your favorite singers, or even your own voice. Experiment with different RVC models, pitch changes, and reverb settings to achieve the desired sound. You could also explore using the model to create custom vocal samples or background vocals for your music productions. The versatility of the model allows for a wide range of creative possibilities.
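For orientation, here is a hedged sketch of how a cover might be rendered via the Replicate Python client. The `zsxkib/realistic-voice-cloning` slug, the snake_case parameter names, the voice-model name, and all values are assumptions derived from the inputs listed above.

```python
import replicate

# Hypothetical realistic-voice-cloning call; every key and value here is an
# illustrative guess mapped from the human-readable inputs listed above.
output = replicate.run(
    "zsxkib/realistic-voice-cloning",
    input={
        "song_input": open("cover_source.mp3", "rb"),  # song whose vocals will be replaced
        "rvc_model": "MyTrainedVoice",                 # hypothetical RVC v2 voice name
        "pitch_change": 0,                             # keep the original key (illustrative value)
        "index_rate": 0.5,                             # balance of cloned accent vs. original vocal
        "reverb_wetness": 0.2,                         # illustrative reverb setting
        "output_format": "mp3",                        # or "wav"
    },
)
print(output)  # URL of the rendered cover
```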

Updated 5/13/2024

clip-age-predictor

zsxkib

Total Score

62

The clip-age-predictor model is a tool that uses the CLIP (Contrastive Language-Image Pretraining) algorithm to predict the age of a person in an input image. It is a patched version of the original clip-age-predictor model by andreasjansson, updated to work with the new version of Cog. Similar models include clip-features, which returns CLIP features for the clip-vit-large-patch14 model, and stable-diffusion, a latent text-to-image diffusion model.

Model inputs and outputs

The clip-age-predictor model takes a single input, an image of a person whose age we want to predict, and outputs a string representing the predicted age of that person.

Inputs

- **Image**: The input image of the person whose age we'd like to predict.

Outputs

- **Predicted Age**: A string representing the predicted age of the person in the input image.

Capabilities

The clip-age-predictor model uses the CLIP algorithm to compare the input image against prompts of the form "this person is {age} years old" and outputs the age with the highest similarity to the image.

What can I use it for?

The clip-age-predictor model could be useful for applications that require estimating the age of people in images, such as demographic analysis, age-restricted content filtering, or features in photo editing software. For example, a marketing team could use this model to analyze the age distribution of their customer base from product photos.

Things to try

One interesting thing to try is experimenting with different types of input images, such as portraits, group photos, or images of people in different poses or environments. You could also try combining this model with other AI tools, like the gfpgan model for face restoration, to see if restoring a degraded photo improves the accuracy of the age predictions.
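Since the interface is just one image in and one string out, a call might look like the following sketch with the Replicate Python client. The `zsxkib/clip-age-predictor` slug and the `image` key are assumptions.

```python
import replicate

# Hypothetical clip-age-predictor call; slug and input key are assumed.
predicted_age = replicate.run(
    "zsxkib/clip-age-predictor",
    input={"image": open("portrait.jpg", "rb")},  # photo of the person to analyze
)
print(predicted_age)  # e.g. a string such as "34"
```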

Updated 5/13/2024

animate-diff

zsxkib

Total Score

36

animate-diff is a plug-and-play module developed by Yuwei Guo, Ceyuan Yang, and others that can turn most community text-to-image diffusion models into animation generators, without the need for additional training. It was presented as a spotlight paper at ICLR 2024. The model builds on previous work like Tune-a-Video and provides several versions that are compatible with Stable Diffusion V1.5 and Stable Diffusion XL. It can be used to animate personalized text-to-image models from the community, such as RealisticVision V5.1 and ToonYou Beta6.

Model inputs and outputs

animate-diff takes in a text prompt, a base text-to-image model, and various optional parameters to control the animation, such as the number of frames, resolution, and camera motions. It outputs an animated video that brings the prompt to life.

Inputs

- **Prompt**: The text description of the desired scene or object to animate.
- **Base model**: A pre-trained text-to-image diffusion model, such as Stable Diffusion V1.5 or Stable Diffusion XL, potentially with a personalized LoRA model.
- **Animation parameters**: Number of frames, resolution, guidance scale, and camera movements (pan, zoom, tilt, roll).

Outputs

- Animated video in MP4 or GIF format, with the desired scene or object moving and evolving over time.

Capabilities

animate-diff can take any text-to-image model and turn it into an animation generator, without the need for additional training. This allows users to animate their own personalized models, like those trained with DreamBooth, and explore a wide range of creative possibilities. The model supports various camera movements, such as panning, zooming, tilting, and rolling, which can be controlled through MotionLoRA modules. This gives users fine-grained control over the animation and allows for more dynamic and engaging outputs.

What can I use it for?

animate-diff can be used for a variety of creative applications, such as:

- Animating personalized text-to-image models to bring your ideas to life
- Experimenting with different camera movements and visual styles
- Generating animated content for social media, videos, or illustrations
- Exploring the combination of text-to-image and text-to-video capabilities

The model's flexibility and ease of use make it a powerful tool for artists, designers, and content creators who want to add dynamic animation to their work.

Things to try

One interesting aspect of animate-diff is its ability to animate personalized text-to-image models without additional training. Try experimenting with your own DreamBooth models or models from the community, and see how the animation process can enhance and transform your creations. Additionally, explore the different camera movement controls, such as panning, zooming, and rolling, to create more dynamic and cinematic animations. Combine these camera motions with different text prompts and base models to discover unique visual styles and storytelling possibilities.
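To ground the inputs above, here is a hedged sketch of a possible call through the Replicate Python client. The `zsxkib/animate-diff` slug, the parameter names (including the base-model identifier and the camera-motion key), and the values are illustrative assumptions rather than the model's confirmed schema.

```python
import replicate

# Hypothetical animate-diff call; all keys and values below are illustrative
# guesses mapped from the inputs described above.
output = replicate.run(
    "zsxkib/animate-diff",
    input={
        "prompt": "a lighthouse on a cliff at sunset, waves crashing, cinematic",
        "base_model": "realisticVisionV51",  # hypothetical name of a community checkpoint
        "num_frames": 16,                    # length of the clip
        "guidance_scale": 7.5,               # how strongly the prompt steers generation
        "zoom_in": 0.5,                      # hypothetical MotionLoRA-style camera control
    },
)
print(output)  # URL of the generated MP4/GIF
```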

Updated 5/13/2024

st-mfnet

zsxkib

Total Score

34

The st-mfnet is a Spatio-Temporal Multi-Flow Network for frame interpolation developed by researchers at the University of Bristol. It is designed to increase the framerate of videos by generating additional intermediate frames, which can be useful for applications such as video editing, gaming, and virtual reality. The model is similar to other video frame interpolation models like tokenflow and xmem-propainter-inpainting, which also aim to enhance video quality by creating new frames.

Model inputs and outputs

The st-mfnet model takes a video as input and generates a new video with an increased framerate. The model can maintain the original video duration or adjust the framerate to a custom value, depending on the user's preference.

Inputs

- **mp4**: An MP4 video file to be processed.
- **framerate_multiplier**: Determines how many intermediate frames to generate between original frames. For example, a value of 2 will double the frame rate, and 4 will quadruple it.
- **keep_original_duration**: If set to True, the enhanced video retains the original duration, with the frame rate adjusted accordingly. If set to False, the frame rate is set based on the custom_fps parameter.
- **custom_fps**: The desired frame rate (frames per second) for the enhanced video, used only when keep_original_duration is set to False.

Outputs

- **Video**: The enhanced video with increased framerate.

Capabilities

The st-mfnet model generates high-quality intermediate frames that can significantly improve the smoothness and visual quality of videos, especially those with fast-moving objects or camera panning. The model uses a Spatio-Temporal Multi-Flow Network architecture to capture both spatial and temporal information, resulting in more accurate frame interpolation than simpler approaches.

What can I use it for?

The st-mfnet model can be used in a variety of video-related applications, such as:

- **Video editing**: Increasing the framerate of existing footage to create smoother slow-motion effects or improve the visual quality of fast-paced action sequences.
- **Gaming and virtual reality**: Enhancing the fluidity and responsiveness of video games and VR experiences by generating additional frames.
- **Video compression**: Reducing file sizes by storing videos at a lower framerate and using st-mfnet to interpolate the missing frames during playback.

Things to try

One interesting way to use the st-mfnet model is to experiment with different framerate_multiplier values to find the optimal balance between visual quality and file size: a higher multiplier results in a smoother video but may also lead to larger files. Additionally, you can try the model on a variety of video content, such as sports footage, animation, or documentary films, to see how it performs in different scenarios.
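Here is a minimal sketch of a possible call via the Replicate Python client, using the parameter names from the input list above. The `zsxkib/st-mfnet` slug and the exact values are assumptions.

```python
import replicate

# Hypothetical st-mfnet call; the slug is assumed, parameter names follow the
# input list above.
output = replicate.run(
    "zsxkib/st-mfnet",
    input={
        "mp4": open("clip_30fps.mp4", "rb"),  # source video
        "framerate_multiplier": 2,            # 2x: double the number of frames
        "keep_original_duration": True,       # keep length, raise the frame rate
        # "custom_fps": 48,                   # only used when keep_original_duration is False
    },
)
print(output)  # URL of the interpolated video
```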

Updated 5/13/2024

pulid

zsxkib

Total Score

34

PuLID is a powerful text-to-image model developed by researchers at ByteDance Inc. Like other advanced models such as Stable Diffusion, SDXL-Lightning, and BLIP, PuLID uses contrastive learning techniques to generate high-quality, customized images from textual prompts. Unlike traditional text-to-image models, PuLID focuses on identity customization, allowing fine-grained control over the appearance of generated faces and portraits.

Model inputs and outputs

PuLID takes in a textual prompt, as well as one or more reference images of a person's face. The model then generates a set of new images that match the provided prompt while retaining the identity and appearance of the reference face(s).

Inputs

- **Prompt**: A text description of the desired image, such as "portrait, color, cinematic, in garden, soft light, detailed face".
- **Seed**: An optional integer value to control the randomness of the generated images.
- **CFG Scale**: A scaling factor that controls the influence of the textual prompt on the generated image.
- **Num Steps**: The number of iterative refinement steps to perform during image generation.
- **Image Size**: The desired width and height of the output images.
- **Num Samples**: The number of unique images to generate.
- **Identity Scale**: A scaling factor that controls the influence of the reference face(s) on the generated images.
- **Mix Identities**: A boolean flag to enable mixing of multiple reference face images.
- **Main Face Image**: The primary reference face image.
- **Auxiliary Face Image(s)**: Additional reference face images (up to 3) to be used for identity mixing.

Outputs

- **Images**: A set of generated images that match the provided prompt and retain the identity and appearance of the reference face(s).

Capabilities

PuLID excels at generating high-quality, customized portraits and face images. By leveraging contrastive alignment techniques, the model faithfully preserves the identity and appearance of the reference face(s) while seamlessly blending them with the desired textual prompt. This makes PuLID a powerful tool for applications such as photo editing, character design, and virtual avatar creation.

What can I use it for?

PuLID can be used in a variety of creative and commercial applications. For example, artists and designers could use it to quickly generate concept art for characters or illustrations, while businesses could leverage it to create custom virtual avatars or product visualizations. The model's ability to mix and match different facial features also opens up possibilities for personalized image generation, such as creating unique profile pictures or avatars.

Things to try

One interesting aspect of PuLID is its ability to mix and match facial features from multiple reference images. By experimenting with the "Mix Identities" setting, you can create unique hybrid faces that combine the characteristics of several individuals, which can be a powerful tool for creative expression or character design. Additionally, exploring the various input parameters, such as the prompt, CFG scale, and number of steps, can help you fine-tune the generated images to your specific needs and preferences.
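The following is a hedged sketch of how these inputs might be passed through the Replicate Python client. The `zsxkib/pulid` slug, the snake_case key names, and the numeric values are assumptions derived from the input list above.

```python
import replicate

# Hypothetical PuLID call; keys and values are illustrative guesses mapped
# from the inputs listed above.
output = replicate.run(
    "zsxkib/pulid",
    input={
        "prompt": "portrait, color, cinematic, in garden, soft light, detailed face",
        "main_face_image": open("face.jpg", "rb"),  # primary identity reference
        "cfg_scale": 1.2,                           # prompt influence (illustrative value)
        "num_steps": 4,                             # refinement steps (illustrative value)
        "num_samples": 2,                           # number of images to generate
        "identity_scale": 0.8,                      # how strongly to preserve the reference face
    },
)
print(output)  # URLs of the generated portraits
```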

Updated 5/13/2024

film-frame-interpolation-for-large-motion

zsxkib

Total Score

29

film-frame-interpolation-for-large-motion is a state-of-the-art AI model for high-quality frame interpolation, particularly for videos with large motion. It was developed by researchers at Google and presented at the European Conference on Computer Vision (ECCV) in 2022. Unlike other approaches, this model does not rely on additional pre-trained networks such as optical flow or depth estimation, yet it achieves superior results by using a multi-scale feature extractor with shared convolution weights to handle large motions effectively.

The film-frame-interpolation-for-large-motion model is similar to other frame interpolation models like st-mfnet, which also aims to increase video framerates, and lcm-video2video, which performs fast video-to-video translation. However, this model specifically focuses on handling large motions, making it well-suited for applications like slow-motion video creation.

Model inputs and outputs

The film-frame-interpolation-for-large-motion model takes in a pair of images (or frames from a video) and generates intermediate frames between them. This allows transforming near-duplicate photos into slow-motion footage that looks like it was captured with a video camera.

Inputs

- **mp4**: An MP4 video file for frame interpolation.
- **num_interpolation_steps**: The number of steps to interpolate between animation frames (default is 3, max is 50).
- **playback_frames_per_second**: The desired playback speed in frames per second (default is 24, max is 60).

Outputs

- **Output**: A URI pointing to the generated slow-motion video.

Capabilities

The film-frame-interpolation-for-large-motion model generates high-quality intermediate frames, even for videos with large motions. This makes it possible to smooth out jerky or low-framerate footage and create slow-motion effects. The model's single-network approach, without relying on additional pre-trained networks, makes it efficient and easy to use.

What can I use it for?

The film-frame-interpolation-for-large-motion model is particularly useful for creating slow-motion videos from near-duplicate photos or low-framerate footage. This could be helpful for applications such as:

- Enhancing video captured on smartphones or action cameras
- Creating cinematic slow-motion effects for short films or commercials
- Smoothing out animation sequences with large movements

Things to try

One interesting aspect of the film-frame-interpolation-for-large-motion model is its ability to handle large motions in videos. Try experimenting with high-speed footage, such as sports or action scenes, and see how the model transforms it into smooth, slow-motion sequences. Additionally, you can adjust the number of interpolation steps and the desired playback frames per second to find the optimal settings for your use case.
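A possible call, sketched with the Replicate Python client and using the parameter names from the input list above, might look like this. The slug and values are assumptions.

```python
import replicate

# Hypothetical FILM call; slug is assumed, parameter names follow the list above.
output = replicate.run(
    "zsxkib/film-frame-interpolation-for-large-motion",
    input={
        "mp4": open("fast_action_clip.mp4", "rb"),  # footage with large motion
        "num_interpolation_steps": 3,               # frames synthesized between each pair
        "playback_frames_per_second": 24,           # playback speed of the slow-motion output
    },
)
print(output)  # URI of the generated slow-motion video
```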

Updated 5/13/2024

animatediff-illusions

zsxkib

Total Score

8

animatediff-illusions is an AI model created by Replicate user zsxkib that combines AnimateDiff, ControlNet, and IP-Adapter to generate animated images. It allows prompts to be changed in the middle of an animation sequence, resulting in surprising and visually engaging effects. This sets it apart from similar models like instant-id-multicontrolnet, animatediff-lightning-4-step, and magic-animate, which focus more on general image animation and video synthesis.

Model inputs and outputs

animatediff-illusions takes a variety of inputs to generate animated images, including prompts, control networks, and configuration options. The model outputs animated GIFs, MP4s, or WebM videos based on the provided inputs.

Inputs

- **Prompt**: The text prompt that describes the desired content of the animation. This can include fixed prompts as well as prompts that change over the course of the animation.
- **ControlNet**: Additional inputs that provide control over specific aspects of the generated animation, such as region, openpose, and tile.
- **Configuration options**: Settings that affect the animation generation process, such as the number of frames, resolution, and diffusion scheduler.

Outputs

- **Animated images**: The model outputs animated images in GIF, MP4, or WebM format, based on the provided inputs.

Capabilities

animatediff-illusions can generate a wide variety of animated images, from surreal and fantastical scenes to more realistic animations. The ability to change prompts mid-animation allows for unique and unexpected results, creating animations that are both visually striking and conceptually intriguing. The model's use of ControlNet and IP-Adapter also enables fine-grained control over different aspects of the animation, such as the background, foreground, and character poses.

What can I use it for?

animatediff-illusions could be used for a variety of creative and experimental applications, such as:

- Generating animated art and short films
- Creating dynamic backgrounds or animated graphics for websites and presentations
- Experimenting with visual storytelling and surreal narratives
- Producing animated content for social media, gaming, or other interactive media

The model's versatility and ability to produce high-quality animations make it a powerful tool for artists, designers, and creatives looking to push the boundaries of AI-generated visuals.

Things to try

One interesting aspect of animatediff-illusions is the ability to change prompts mid-animation, which can lead to unexpected and visually striking results. Try crafting a sequence of prompts that creates a sense of narrative or visual transformation over the course of the animation. Another intriguing possibility is to leverage the model's ControlNet and IP-Adapter capabilities to create animations that seamlessly blend different visual elements, such as realistic backgrounds, stylized characters, and abstract motifs. By carefully adjusting the control parameters and prompt combinations, you can explore the rich creative potential of this model.
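To illustrate the idea of mid-animation prompt changes, here is a heavily hedged sketch using the Replicate Python client. The `zsxkib/animatediff-illusions` slug, every parameter name, the frame-indexed prompt format, and the scheduler name are assumptions made for illustration only; consult the model page for the real schema.

```python
import replicate

# Hypothetical animatediff-illusions call; slug, keys, values, and the
# prompt-scheduling format are all illustrative assumptions.
output = replicate.run(
    "zsxkib/animatediff-illusions",
    input={
        # Hypothetical frame-indexed prompt map: the prompt changes mid-animation.
        "prompt_map": "0: a marble statue in a garden | 32: the statue dissolving into butterflies",
        "frames": 64,
        "scheduler": "k_dpmpp_sde",  # illustrative diffusion scheduler name
        "output_format": "mp4",      # or "gif" / "webm"
    },
)
print(output)  # URL of the animation
```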

Updated 5/13/2024

uform-gen

zsxkib

Total Score

5

uform-gen is a versatile multimodal AI model developed by zsxkib that can perform a range of tasks including image captioning, visual question answering (VQA), and multimodal chat. Compared to larger models such as instant-id, sdxl-lightning-4step, and gfpgan, uform-gen is designed to be more efficient and compact, with a model size of only 1.5B parameters while still delivering strong performance.

Model inputs and outputs

The uform-gen model takes two primary inputs: an image and a prompt. The image can be provided as a URL or a file, and the prompt is a natural language description that guides the model's content generation.

Inputs

- **Image**: An image to be captioned or used for visual question answering.
- **Prompt**: A natural language description that provides guidance for the model's output.

Outputs

- **Captioned image**: The model can generate a detailed caption describing the contents of the input image.
- **Answered question**: For visual question answering tasks, the model can provide a natural language response to a question about the input image.
- **Multimodal chat**: The model can engage in open-ended conversation, incorporating both text and image inputs from the user.

Capabilities

The uform-gen model generates high-quality, coherent text based on visual inputs. It can produce detailed captions that summarize the key elements of an image, as well as provide relevant and informative responses to questions about the image's contents. Its multimodal chat capabilities also allow it to engage in more open-ended, conversational interactions that incorporate both text and image inputs.

What can I use it for?

The uform-gen model's versatility makes it a useful tool for a variety of applications, such as:

- **Image captioning**: Automatically generating captions for images to aid in search, organization, or accessibility.
- **Visual question answering**: Answering questions about the contents of an image, which could be useful for tasks like product search or visual analytics.
- **Multimodal chatbots**: Building chat-based assistants that can understand and respond to both text and visual inputs, enabling more natural and engaging interactions.

Things to try

One interesting aspect of the uform-gen model is its relatively small size compared to other multimodal models, while still maintaining strong performance across a range of tasks. This makes it well-suited for deployment on edge devices or in resource-constrained environments, where efficiency and low latency are important. You could experiment with using uform-gen for tasks like:

- Enhancing product search and recommendation systems by incorporating visual and textual information
- Building chatbots for customer service or education that can understand and respond to visual inputs
- Automating image captioning and visual question answering for applications in journalism, social media, or scientific research

The model's compact size and multilingual capabilities also make it a promising candidate for further development and deployment in a wide range of real-world scenarios.
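Since the interface is an image plus a prompt, a captioning or VQA call might look like the sketch below, written against the Replicate Python client. The `zsxkib/uform-gen` slug and the input keys are assumptions based on the description above.

```python
import replicate

# Hypothetical uform-gen call; slug and input keys are assumptions based on
# the image + prompt interface described above.
caption = replicate.run(
    "zsxkib/uform-gen",
    input={
        "image": open("product_photo.jpg", "rb"),
        "prompt": "Describe this image in one detailed sentence.",
    },
)
print(caption)  # generated caption or answer text
```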

Updated 5/13/2024