metadata

license: other
license_name: nvidia-open-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
language:
  - en
base_model:
  - nvidia/Cosmos-1.0-Diffusion-7B-Video2World
  - nvidia/Cosmos-1.0-Diffusion-7B-Text2World
pipeline_tag: image-to-video
tags:
  - text-to-video
  - video-to-video
  - nvidia
  - gguf-node
widget:
  - text: >-
      A crystalline waterfall stands partially frozen, its edges draped with
      translucent ice that catches the sunlight in prisms of blue and silver.
      Below, a half-frozen pool spreads out, bordered by delicate ice
      formations. Through the fresh snow, a red fox moves gracefully, its russet
      coat vibrant against the white landscape, leaving perfect star-shaped
      prints behind as steam rises from its breath in the crisp winter air. The
      scene is wrapped in snow-muffled silence, broken only by the gentle murmur
      of water still flowing beneath the ice. 
    parameters:
      negative_prompt: >-
        The video captures a series of frames showing ugly scenes, static with
        no motion, motion blur, over-saturation, shaky footage, low resolution,
        grainy texture, pixelated images, poorly lit areas, underexposed and
        overexposed scenes, poor color balance, washed out colors, choppy
        sequences, jerky movements, low frame rate, artifacting, color banding,
        unnatural transitions, outdated special effects, fake elements,
        unconvincing visuals, poorly edited content, jump cuts, visual noise,
        and flickering. Overall, the video is of poor quality.
    output:
      url: samples\ComfyUI_00002_.webp
  - text: >-
      anime style anime girl with massive fennec ears and one big fluffy tail,
      she has blonde long hair blue eyes wearing a maid outfit with a long black
      gold leaf pattern dress, walking slowly to the front with sweetie smile,
      holding a fancy black forest cake with candles on top in the kitchen of an
      old dark Victorian mansion lit by candlelight with a bright window to the
      foggy forest
    parameters:
      negative_prompt: >-
        The video captures a series of frames showing ugly scenes, static with
        no motion, motion blur, over-saturation, shaky footage, low resolution,
        grainy texture, pixelated images, poorly lit areas, underexposed and
        overexposed scenes, poor color balance, washed out colors, choppy
        sequences, jerky movements, low frame rate, artifacting, color banding,
        unnatural transitions, outdated special effects, fake elements,
        unconvincing visuals, poorly edited content, jump cuts, visual noise,
        and flickering. Overall, the video is of poor quality.
    output:
      url: samples\ComfyUI_00001_.webp
  - text: drag it to the browser <metadata>; same descriptor as the 1st one
    output:
      url: samples\ComfyUI_00003_.webp

gguf/fp8 quantized versions of video2world and text2world (test in progress)

setup (once)

drag Cosmos-1_0-Diffusion-7B-Video2World_fp8_e4m3fn.safetensors [7.24GB] and/or Cosmos-1_0-Diffusion-7B-Text2World_fp8_e4m3fn.safetensors [7.24GB] to > ./ComfyUI/models/diffusion_models
drag oldt5_xxl_fp8_e4m3fn.safetensors [4.9GB] to > ./ComfyUI/models/text_encoders
drag cosmos_cv8x8x8_1.0_vae_bf16.safetensors [211MB] to > ./ComfyUI/models/vae
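
a quick way to confirm the files landed in the right folders is a small check script. This is only a sketch: the file and folder names are copied from the setup list above, while the function name `missing_files` and the idea of passing the ComfyUI root as an argument are assumptions made for illustration, not part of ComfyUI or gguf-node.

```python
from pathlib import Path

# File names copied from the setup list; either diffusion model is enough.
DIFFUSION_MODELS = [
    "Cosmos-1_0-Diffusion-7B-Video2World_fp8_e4m3fn.safetensors",
    "Cosmos-1_0-Diffusion-7B-Text2World_fp8_e4m3fn.safetensors",
]
TEXT_ENCODER = "oldt5_xxl_fp8_e4m3fn.safetensors"
VAE = "cosmos_cv8x8x8_1.0_vae_bf16.safetensors"


def missing_files(comfy_root):
    """Return a list of expected model files not found under comfy_root/models.

    Hypothetical helper: it only checks that the files exist, not that they
    are complete or loadable.
    """
    models = Path(comfy_root) / "models"
    missing = []

    # The setup step says "and/or", so one of the two diffusion models suffices.
    diffusion_dir = models / "diffusion_models"
    if not any((diffusion_dir / name).is_file() for name in DIFFUSION_MODELS):
        missing.append("diffusion_models/<Video2World or Text2World fp8 file>")

    if not (models / "text_encoders" / TEXT_ENCODER).is_file():
        missing.append(f"text_encoders/{TEXT_ENCODER}")

    if not (models / "vae" / VAE).is_file():
        missing.append(f"vae/{VAE}")

    return missing
```

calling `missing_files("./ComfyUI")` returns an empty list when all three folders hold an expected file, and otherwise lists what is still missing.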

run it straight (no installation needed)

run the .bat file in the main directory (assuming you are using the gguf-node pack below)
drag the workflow json file (below), or the sample webp file, to > your browser

workflow

example workflow for text2world
example workflow for video2world

review

works roughly, but results are not yet stable or consistent
the gguf file might not work; if so, please wait for the code update
btw, you can replicate the conversion step and get exactly the same file with the tools in gguf-node (this point is not directly related to the model)
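
one way to check the "exactly the same file" claim after repeating the conversion is to compare checksums. The sketch below uses only Python's standard `hashlib`; the helper names `sha256_of` and `same_file` are made up for illustration, and the paths you pass in (your own converted file and the downloaded one) are up to you.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks
    so multi-gigabyte model files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(path_a, path_b):
    """True when both files have the same SHA-256 digest (hypothetical helper)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

if `same_file(...)` returns True for your converted file and the published one, the two are byte-for-byte identical for all practical purposes.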

reference

base model from nvidia (text2world:7b|14b & video2world:7b|14b)
comfyui from comfyanonymous
gguf-node (pypi|repo|pack)
