jbilcke committed
Commit 566d763 · 1 Parent(s): 2156c54

update clap spec
src/clap/clap-specification-draft.md ADDED
@@ -0,0 +1,147 @@
# CLAP Format Specification

Status: DRAFT
Document revision: 0.0.1
Last updated: Feb 6th, 2024

## BEFORE YOU READ

The CLAP format spec is experimental and not finished yet!
There might be inconsistencies, unnecessary redundancies or blatant omissions.

## What are CLAP files?

The CLAP format (.clap) is a video project format used to store information about a generative AI video.

It preserves prompts and assets in the same container, making it easier to share a project between different people or applications.

## Structure

A CLAP file is an array of objects serialized into a YAML text string, then compressed with gzip into a binary file.

The file extension is `.clap`.
The mime type is `application/x-yaml`.

There can be 5 different types of objects:

- one HEADER
- one METADATA
- zero, one or more MODEL(s)
- zero, one or more SCENE(s)
- zero, one or more SEGMENT(s)

This can be represented in JavaScript like this:

```javascript
const entries = [
  clapHeader,
  clapMeta,
  ...clapModels,
  ...clapScenes,
  ...clapSegments
]
```
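
For illustration, here is a minimal sketch of this round trip, assuming the `js-yaml` package and Node's built-in `zlib` module (the function names are illustrative, not part of the spec; the actual implementation lives in `parseClap.ts` and `serializeClap.ts` and may differ):

```typescript
// Minimal sketch of the .clap container encoding (not the actual implementation).
// Assumes the "js-yaml" package and Node's built-in zlib module.
import { dump, load } from "js-yaml"
import { gzipSync, gunzipSync } from "zlib"

// serialize the array of objects to a YAML string, then gzip it into a binary buffer
function encodeClap(entries: unknown[]): Buffer {
  return gzipSync(Buffer.from(dump(entries), "utf8"))
}

// reverse the two steps: gunzip, then parse the YAML back into objects
function decodeClap(data: Buffer): unknown[] {
  return load(gunzipSync(data).toString("utf8")) as unknown[]
}
```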

## Header

The HEADER provides information about how to decode a CLAP.

Knowing the number of models, scenes and segments in advance helps the decoder parse the information,
and in some implementations it helps with debugging, logging, and provisioning memory usage.

However, in the future a different scheme may be used in order to support streaming:
either by recognizing the shape of each object (its fields), or by using a dedicated field, e.g. a `_type`.

```typescript
{
  // used to know which format version is used.
  // CLAP is still in development and the format is not fully specified yet,
  // so during this period most .clap files will have the "clap-0" format
  format: "clap-0"

  numberOfModels: number // integer
  numberOfScenes: number // integer
  numberOfSegments: number // integer
}
```
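
For example, a decoder could use these counts to split the flat array of entries back into its parts. This is a hypothetical sketch (reusing the illustrative `decodeClap` from the Structure section, given some gzipped input `clapBlob`), not the actual `parseClap.ts` code:

```typescript
// Hypothetical decoding step: the header counts tell us where each
// group of objects ends inside the flat entries array.
const [clapHeader, clapMeta, ...rest] = decodeClap(clapBlob) as any[]

const clapModels = rest.slice(0, clapHeader.numberOfModels)
const clapScenes = rest.slice(
  clapHeader.numberOfModels,
  clapHeader.numberOfModels + clapHeader.numberOfScenes
)
const clapSegments = rest.slice(
  clapHeader.numberOfModels + clapHeader.numberOfScenes,
  clapHeader.numberOfModels + clapHeader.numberOfScenes + clapHeader.numberOfSegments
)
```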

## Metadata

```typescript
{
  id: string // "<a valid UUID V4>"
  title: string // "project title"
  description: string // "project description"
  licence: string // "information about licensing"

  // this provides information about the image ratio
  // this might be removed in the final spec, as this
  // can be re-computed from width and height
  orientation: "landscape" | "vertical" | "square"

  // the suggested width and height of the video
  // note that this is just an indicator,
  // and might be superseded by the application reading the .clap file
  width: number // integer between 256 and 8192 (value in pixels)
  height: number // integer between 256 and 8192 (value in pixels)

  // name of the suggested video model to use
  // note that this is just an indicator,
  // and might be superseded by the application reading the .clap file
  defaultVideoModel: string

  // additional prompts to use in the video generation
  // this helps adding some magic touch and flair to the videos,
  // but perhaps the field should be renamed
  extraPositivePrompt: string[]

  // the screenplay (script) of the video
  screenplay: string
}
```
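
Since `orientation` can be recomputed from the dimensions, an application might derive it like this (an illustrative helper, not part of the spec):

```typescript
// Illustrative helper: recompute the orientation field from width and height
function computeOrientation(width: number, height: number): "landscape" | "vertical" | "square" {
  if (width === height) { return "square" }
  return width > height ? "landscape" : "vertical"
}
```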

## Models

Before talking about models, we should first describe the concept of an entity:

in a story, an entity is something (a person, place, vehicle, animal, robot, alien, or object) with a name, a description of its appearance, an age, mileage or quality, an origin, and so on.

An example could be "a giant magical school bus, with the appearance of a cat with wheels, and which talks".

A CLAP model is an instance (an interpretation) of this entity, to which we assign an identity:
- a name and age
- a visual style (a photo of the magic school bus cat)
- a voice style
- and maybe other things, e.g. an origin or background story

As you can see, it can be difficult to create clearly separated categories like "vehicle", "character", or "location"
(the magical cat bus could turn into a location in one scene, a speaking character in another, etc.)

This is why there is a common schema for all models (see the example instance after this schema):

```typescript
{
  id: string
  category: ClapSegmentCategory
  triggerName: string
  label: string
  description: string
  author: string
  thumbnailUrl: string
  seed: number

  assetSourceType: ClapAssetSource
  assetUrl: string

  age: number
  gender: ClapModelGender
  region: ClapModelRegion
  appearance: ClapModelAppearance
  voiceVendor: ClapVoiceVendor
  voiceId: string
}
```
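
For instance, the magical school bus entity from earlier might be instantiated as a model along these lines. All field values below are illustrative placeholders; the actual enum values for `category`, `assetSourceType`, `gender`, `region`, `appearance` and `voiceVendor` are defined in `./types.ts` and may differ:

```typescript
// Illustrative placeholder values only: the real enum values
// are defined in ./types.ts and may differ.
const magicBusModel = {
  id: "9a7b3c1e-...", // a valid UUID V4
  category: "character", // placeholder ClapSegmentCategory value
  triggerName: "magic_bus_cat",
  label: "Magic School Bus Cat",
  description: "a giant magical school bus, with the appearance of a cat with wheels, and which talks",
  author: "...",
  thumbnailUrl: "...",
  seed: 42,

  assetSourceType: "DATA", // placeholder ClapAssetSource value
  assetUrl: "data:image/png;base64,...",

  age: 7,
  gender: "object", // placeholder ClapModelGender value
  region: "american", // placeholder ClapModelRegion value
  appearance: "serious", // placeholder ClapModelAppearance value
  voiceVendor: "ElevenLabs", // placeholder ClapVoiceVendor value
  voiceId: "...",
}
```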

TO BE CONTINUED

(you can read "./types.ts" for more information)
src/clap/parseClap.ts CHANGED
@@ -50,7 +50,8 @@ export async function parseClap(inputStringOrBlob: string | Blob): Promise<ClapP
   width: getValidNumber(maybeClapMeta.width, 256, 8192, 1024),
   height: getValidNumber(maybeClapMeta.height, 256, 8192, 576),
   defaultVideoModel: typeof maybeClapMeta.defaultVideoModel === "string" ? maybeClapMeta.defaultVideoModel : "SVD",
-  extraPositivePrompt: Array.isArray(maybeClapMeta.extraPositivePrompt) ? maybeClapMeta.extraPositivePrompt : []
+  extraPositivePrompt: Array.isArray(maybeClapMeta.extraPositivePrompt) ? maybeClapMeta.extraPositivePrompt : [],
+  screenplay: typeof maybeClapMeta.screenplay === "string" ? maybeClapMeta.screenplay : "",
 }

 /*
src/clap/serializeClap.ts CHANGED
@@ -130,6 +130,7 @@ export async function serializeClap({
   height: getValidNumber(meta.height, 256, 8192, 576),
   defaultVideoModel: typeof meta.defaultVideoModel === "string" ? meta.defaultVideoModel : "SVD",
   extraPositivePrompt: Array.isArray(meta.extraPositivePrompt) ? meta.extraPositivePrompt : [],
+  screenplay: typeof meta.screenplay === "string" ? meta.screenplay : "",
 }

 const entries = [
src/clap/types.ts CHANGED
@@ -84,6 +84,7 @@ export type ClapMeta = {
   height: number
   defaultVideoModel: string
   extraPositivePrompt: string[]
+  screenplay: string
 }

 export type ClapSceneEvent = {