Spaces: Running on Zero
fancyfeast committed · 27c9477
Parent(s): 89e9fac

Update version and tweak the UI

Files changed:
- README.md +1 -1
- app.py +5 -3
- requirements.txt +1 -2
README.md CHANGED

@@ -4,7 +4,7 @@ emoji: 🖼️💬
 colorFrom: yellow
 colorTo: blue
 sdk: gradio
-sdk_version: 5.
+sdk_version: 5.29.0
 app_file: app.py
 pinned: false
 ---
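The only substantive change here is pinning the Space to a concrete Gradio release. If you run a local clone of the Space, it is worth confirming your environment matches the pin; a tiny check of my own, not part of the repo:

```python
# Local sanity check: the README above pins sdk_version 5.29.0.
import gradio as gr

assert gr.__version__ == "5.29.0", f"Space pins Gradio 5.29.0, found {gr.__version__}"
```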
app.py CHANGED

@@ -8,11 +8,10 @@ from typing import Generator
 
 
 MODEL_PATH = "fancyfeast/llama-joycaption-beta-one-hf-llava"
-TITLE = "<h1><center>JoyCaption Beta One - (2025-05-10a)</center></h1>"
+TITLE = "<h1><center>JoyCaption Beta One - (2025-05-10a)</center></h1>JoyCaption is an image captioning model"
 DESCRIPTION = """
 <div>
 <p></p>
-<p>**This model cannot see any chat history.**</p>
 <p>🚨🚨🚨 If the "Help improve JoyCaption" box is checked, the _text_ query you write will be logged and I _might_ use it to help improve JoyCaption.
 It does not log images, user data, etc; only the text query. I cannot see what images you send, and frankly, I don't want to. But knowing what kinds of instructions
 and queries users want JoyCaption to handle will help guide me in building JoyCaption's dataset. This dataset will be made public. As always, the model itself is completely
@@ -33,7 +32,7 @@ CAPTION_TYPE_MAP = {
         "Write a descriptive caption for this image in a casual tone within {word_count} words.",
         "Write a {length} descriptive caption for this image in a casual tone.",
     ],
-    "
+    "Stable Diffusion Prompt": [
         "Write a stable diffusion prompt for this image.",
         "Write a stable diffusion prompt for this image within {word_count} words.",
         "Write a {length} stable diffusion prompt for this image.",
@@ -238,6 +237,9 @@ with gr.Blocks() as demo:
         outputs=output_caption,
     )
 
+    # Initial prompt
+    prompt_box.value = build_prompt(caption_type.value, caption_length.value, extra_options.value, name_input.value)
+
     gr.Markdown(DESCRIPTION)
 
 
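The renamed "Stable Diffusion Prompt" entry follows the same three-template pattern as the other caption types: one template with no length constraint, one taking a numeric {word_count}, and one taking a named {length}. A minimal sketch of how such a map can drive prompt construction; this `build_prompt` is my reconstruction from the templates visible in the diff, not the Space's exact implementation:

```python
# Sketch only: template selection reconstructed from the diff, not app.py's code.
CAPTION_TYPE_MAP = {
    "Stable Diffusion Prompt": [
        "Write a stable diffusion prompt for this image.",
        "Write a stable diffusion prompt for this image within {word_count} words.",
        "Write a {length} stable diffusion prompt for this image.",
    ],
}

def build_prompt(caption_type: str, caption_length: str, extra_options: list[str], name: str) -> str:
    templates = CAPTION_TYPE_MAP[caption_type]
    if caption_length == "any":
        prompt = templates[0]        # no length constraint
    elif caption_length.isdigit():
        prompt = templates[1]        # caller gave a numeric word budget
    else:
        prompt = templates[2]        # named length such as "short"
    if extra_options:
        prompt += " " + " ".join(extra_options)  # assumed: options appended verbatim
    return prompt.format(word_count=caption_length, length=caption_length, name=name)

print(build_prompt("Stable Diffusion Prompt", "25", [], ""))
# -> Write a stable diffusion prompt for this image within 25 words.
```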
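The `# Initial prompt` addition seeds the prompt box before any user interaction by assigning the component's `.value` (its default) at build time, rather than waiting for a change event to fire. A stripped-down sketch of the pattern with placeholder components; the real app wires `caption_type`, `caption_length`, `extra_options`, and `name_input` into `build_prompt`:

```python
# Minimal sketch: compute a Gradio component's initial value from another
# component's default, outside any event handler.
import gradio as gr

def build_prompt(caption_type: str) -> str:
    # Placeholder logic standing in for the app's real build_prompt.
    return f"Write a {caption_type.lower()} for this image."

with gr.Blocks() as demo:
    caption_type = gr.Dropdown(
        choices=["Descriptive Caption", "Stable Diffusion Prompt"],
        value="Stable Diffusion Prompt",
        label="Caption Type",
    )
    prompt_box = gr.Textbox(label="Prompt")
    # At build time, .value is just the component's default; assigning it here
    # makes the prompt visible on first page load instead of starting empty.
    prompt_box.value = build_prompt(caption_type.value)

if __name__ == "__main__":
    demo.launch()
```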
requirements.txt CHANGED

@@ -3,5 +3,4 @@ accelerate
 torch
 transformers==4.51.0
 sentencepiece
-torchvision
-pydantic==2.10.6
+torchvision
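For context on the pins: app.py loads the checkpoint through transformers, so transformers==4.51.0 is the version that has to understand the llava checkpoint format, and accelerate (visible in the hunk header) handles device placement. A minimal loading sketch of my own, not the Space's actual startup code; the dtype and device settings are assumptions:

```python
# Illustration only: loading the checkpoint referenced in app.py with the
# libraries pinned above. torch_dtype and device_map are assumed settings.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_PATH = "fancyfeast/llama-joycaption-beta-one-hf-llava"

processor = AutoProcessor.from_pretrained(MODEL_PATH)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,  # assumed; halves memory vs float32
    device_map="auto",           # uses accelerate for placement
)
```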