desuAnon committed on
Commit 096a33f · verified · 1 Parent(s): 1687744
Files changed (1)
  1. README.md +14 -19
README.md CHANGED
@@ -5,45 +5,40 @@ license: cc0-1.0
 
  Temporary access to OpenAI's video generation model Sora (turbo) was provided by the HF repo [PR-Puppet-Sora](https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora), on November 26th. After a few hours, OpenAI revoked the API key used by the repo and removed access to the generated videos. In anticipation of that event, the publicly displayed videos and their prompts were archived.

- This release contains 87 archived videos (~702 MB) and 83 of their prompts, and dedicated to the public domain (CC0 1.0 Universal).
- The generation parameters may be found in the app.py of the original repo [here](https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora/blob/main/app.py). An archive of this script is available [here](https://archive.is/r70Ao).
- User prompts are often "augmented" (changed by some LLM) before generating videos, and this may be true for these videos as well.
- The Sora backend that was used for generation was `https://sora.openai.com/backend/video_gen`

- Contrary to claims online, the generations were *not* uncensored. User prompts, as well as the generated videos, passed through OpenAI's content moderation normally.
- This is partly the reason why none of the videos in this archive are NSFW, or similar, despite a few *brave attempts* in the prompts.
- It is also incorrect that "Sora leaked", since the model itself (its model parameters) had not been acquired by outsiders.
- The only thing that "leaked" was previewer/beta tester access to Sora video generation, via a single HF repo - while keeping its API keys secret.
 
- ---
- ### Archive versions

  All videos are `.mp4`, of varying resolutions, and a framerate of 30 FPS.
  Not all of the videos that were generated were able to be archived, due to HF server load issues.
  The prompts used for four videos are not known, and these are denoted as [unknown_n].
- Hugging Face performs *File Security Scans* of uploaded files, and you can click on the icon next to each file to see the result of this.

  **sora-turbo-vids.zip**
- This is the original archive containing both videos and their prompts, and some users experienced encoding/compatibility issues with it.
- Consider using the more recent "separated" uploads if you encounter similar issues.
- The filenames in the `short_prompts` directory are the full prompts used for each video generation request.
- The filenames in the `long_prompts` directory are shortened versions of the long prompts (above 256 chars), and their full versions are found in `full_long_prompts.txt`.

  **videos_only.zip** & **videos_only.7z**
  These identical archives (in different compression formats) contain only the original videos, with names such as `video_24.mp4`.
  The `video_24` part is the video ID, and the prompt used for a specific video ID is listed in the separate CSV and JSONL files (video_id, prompt).
  You may easily view both those files in a text editor, and they are easy to import and process in various programming languages.

- ---
- ### YouTube Versions

- You can watch the videos from this dataset on YouTube:
 
  - [All "Leaked" Sora Videos & Prompts! (No Commentary, just Videos)](https://www.youtube.com/watch?v=FI0wWpmraW0)

  - [Sora Leak - all new videos](https://www.youtube.com/watch?v=Gz33LlwsPVM)

- ---

  Even though this is a *dataset* upload, I went with a *model* repo because a) the URL is shorter, and b) the original upload wasn't compatible with the HF dataset viewer.
 
 
 
  Temporary access to OpenAI's video generation model Sora (turbo) was provided by the HF repo [PR-Puppet-Sora](https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora), on November 26th. After a few hours, OpenAI revoked the API key used by the repo and removed access to the generated videos. In anticipation of that event, the publicly displayed videos and their prompts were archived.

+ This release contains 87 archived videos (~702 MB) and 83 of their prompts, and is dedicated to the public domain (CC0 1.0 Universal).
+ The generation parameters may be found in the app.py of the original repo [here](https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora/blob/main/app.py).
+ An archive of this script is available [here](https://archive.is/r70Ao). User prompts are often "augmented" (changed by some LLM) before generating videos, and this may be true for these videos as well.
+ The Sora backend that was used for generation was:
+ `https://sora.openai.com/backend/video_gen`

+ Contrary to some rumors online, the generations were *not* uncensored. User prompts, as well as the generated videos, passed through OpenAI's content moderation normally. This is partly the reason why none of the videos in this archive are NSFW, or similar, despite a few *brave attempts* in the prompts.
+ It is also incorrect that "Sora leaked", since the model itself (its model parameters) had not been acquired by outsiders. The only thing that "leaked" was previewer/beta tester access to Sora video generation, via a single HF repo - while keeping its API keys secret.

+ ### Archive Versions

  All videos are `.mp4`, of varying resolutions, and a framerate of 30 FPS.
  Not all of the videos that were generated were able to be archived, due to HF server load issues.
  The prompts used for four videos are not known, and these are denoted as [unknown_n].
+ Hugging Face performs *File Security Scans* of uploaded files, and you can click on the icon next to each file to see the result of this scan.

  **sora-turbo-vids.zip**
+ This is the original archive containing both videos and their prompts, and some users experienced encoding/compatibility issues with it. Consider using the more recent "separated" uploads if you encounter similar issues.
+ The filenames in the `short_prompts` directory are the full prompts used for each video generation request. The filenames in the `long_prompts` directory are shortened versions of the long prompts (above 256 chars), and their full versions are found in `full_long_prompts.txt`.

  **videos_only.zip** & **videos_only.7z**
  These identical archives (in different compression formats) contain only the original videos, with names such as `video_24.mp4`.
  The `video_24` part is the video ID, and the prompt used for a specific video ID is listed in the separate CSV and JSONL files (video_id, prompt).
  You may easily view both those files in a text editor, and they are easy to import and process in various programming languages.


+ ### YouTube Compilations
+
+ This collection of videos has been uploaded to YouTube for easy viewing (by someone other than me). You can watch them here:
 
  - [All "Leaked" Sora Videos & Prompts! (No Commentary, just Videos)](https://www.youtube.com/watch?v=FI0wWpmraW0)

  - [Sora Leak - all new videos](https://www.youtube.com/watch?v=Gz33LlwsPVM)
 
 
 
  Even though this is a *dataset* upload, I went with a *model* repo because a) the URL is shorter, and b) the original upload wasn't compatible with the HF dataset viewer.
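The README says the prompt for each `video_N.mp4` file is listed in the separate CSV and JSONL files as (video_id, prompt) pairs. Below is a minimal Python sketch of that lookup using only the standard library. It assumes the CSV has a `video_id,prompt` header row, that each JSONL line is an object with `video_id` and `prompt` keys, that both files are UTF-8, and that the `video_id` values match the filename stem (e.g. `video_24`). The function names are hypothetical, and the actual CSV/JSONL file names are not given in this commit.

```python
import csv
import json
from pathlib import Path


def parse_prompts_csv(lines):
    """Map video_id -> prompt from CSV lines.

    Assumes a header row with 'video_id' and 'prompt' columns
    (column names taken from the README; the exact layout is an assumption).
    """
    return {row["video_id"]: row["prompt"] for row in csv.DictReader(lines)}


def parse_prompts_jsonl(lines):
    """Map video_id -> prompt from JSON Lines (assumed: one object per line)."""
    prompts = {}
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines, e.g. a trailing newline
            continue
        record = json.loads(line)
        prompts[record["video_id"]] = record["prompt"]
    return prompts


def load_prompts(path):
    """Load either prompt file, choosing the parser from the file extension."""
    path = Path(path)
    if path.suffix == ".csv":
        # newline="" lets the csv module handle quoted fields containing line breaks
        with path.open(newline="", encoding="utf-8") as f:
            return parse_prompts_csv(f)
    if path.suffix == ".jsonl":
        with path.open(encoding="utf-8") as f:
            return parse_prompts_jsonl(f)
    raise ValueError(f"unsupported prompt file: {path}")


def prompt_for_video(video_path, prompts):
    """Return the prompt for e.g. 'video_24.mp4' (ID 'video_24'), or None if absent.

    Assumes the video ID is the filename without its extension, as the README describes.
    """
    return prompts.get(Path(video_path).stem)
```

For example, `prompt_for_video("video_24.mp4", load_prompts("prompts.jsonl"))` returns the prompt for `video_24`, or `None` if that ID is missing, where `prompts.jsonl` is a placeholder for the real file name. Assuming the two files hold the same data, either one yields the same dictionary, so use whichever is more convenient.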