Upload 11 files
- NOTICE.md +76 -0
- README.md +167 -8
- app.py +569 -383
- flux_kontext_helpers.py +98 -0
- ltx_helpers.py +190 -0
- ltx_upscaler_manager_helpers.py +62 -0
- ltx_worker_base.py +133 -0
- ltx_worker_upscaler.py +99 -0
- requirements.txt +6 -4
NOTICE.md
ADDED
@@ -0,0 +1,76 @@
# NOTICE

Copyright (C) 2025 Carlos Rodrigues dos Santos. All rights reserved.

---

## Aviso de Propriedade Intelectual e Licenciamento

### **Processo de Patenteamento em Andamento (EM PORTUGUÊS):**

O método e o sistema de orquestração de prompts denominados **ADUC (Automated Discovery and Orchestration of Complex tasks)**, conforme descritos neste documento e implementados neste software, estão atualmente em processo de patenteamento.

O titular dos direitos, Carlos Rodrigues dos Santos, está buscando proteção legal para as inovações chave da arquitetura ADUC, incluindo, mas não se limitando a:

* Fragmentação e escalonamento de solicitações que excedem limites de contexto de modelos de IA.
* Distribuição inteligente de sub-tarefas para especialistas heterogêneos.
* Gerenciamento de estado persistido com avaliação iterativa e realimentação para o planejamento de próximas etapas.
* Planejamento e roteamento sensível a custo, latência e requisitos de qualidade.
* O uso de "tokens universais" para comunicação agnóstica a modelos.

### **Reconhecimento e Implicações (EM PORTUGUÊS):**

Ao acessar ou utilizar este software e a arquitetura ADUC aqui implementada, você reconhece:

1. A natureza inovadora e a importância da arquitetura ADUC no campo da orquestração de prompts para IA.
2. Que a essência desta arquitetura, ou suas implementações derivadas, podem estar sujeitas a direitos de propriedade intelectual, incluindo patentes.
3. Que o uso comercial, a reprodução da lógica central da ADUC em sistemas independentes, ou a exploração direta da invenção sem o devido licenciamento podem infringir os direitos de patente pendente.

---

### **Patent Pending (IN ENGLISH):**

The method and system for prompt orchestration named **ADUC (Automated Discovery and Orchestration of Complex tasks)**, as described herein and implemented in this software, are currently in the process of being patented.

The rights holder, Carlos Rodrigues dos Santos, is seeking legal protection for the key innovations of the ADUC architecture, including, but not limited to:

* Fragmentation and scaling of requests exceeding AI model context limits.
* Intelligent distribution of sub-tasks to heterogeneous specialists.
* Persistent state management with iterative evaluation and feedback for planning subsequent steps.
* Cost, latency, and quality-aware planning and routing.
* The use of "universal tokens" for model-agnostic communication.

### **Acknowledgement and Implications (IN ENGLISH):**

By accessing or using this software and the ADUC architecture implemented herein, you acknowledge:

1. The innovative nature and significance of the ADUC architecture in the field of AI prompt orchestration.
2. That the essence of this architecture, or its derivative implementations, may be subject to intellectual property rights, including patents.
3. That commercial use, reproduction of ADUC's core logic in independent systems, or direct exploitation of the invention without proper licensing may infringe upon pending patent rights.

---

## Licença AGPLv3

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.

---

**Contato para Consultas:**

Para mais informações sobre a arquitetura ADUC, o status do patenteamento, ou para discutir licenciamento para usos comerciais ou não conformes com a AGPLv3, por favor, entre em contato:

Carlos Rodrigues dos Santos

Rua Eduardo Carlos Pereira, 4125, B1 Ap32, Curitiba, PR, Brazil, CEP 8102025
README.md
CHANGED
@@ -1,12 +1,14 @@
---
title: Euia-AducSdr
-emoji:
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
-pinned:
---

### 🇧🇷 Português
@@ -31,14 +33,169 @@ An open and functional implementation of the ADUC-SDR (Architecture for Composit
---

---
@@ -49,4 +206,6 @@ Una implementación abierta y funcional de la arquitectura ADUC-SDR (Arquitectur
- **GitHub:** [https://github.com/carlex22/Aduc-sdr](https://github.com/carlex22/Aduc-sdr)
- **Hugging Face Spaces:**
  - [Ltx-SuperTime-60Secondos](https://huggingface.co/spaces/Carlexx/Ltx-SuperTime-60Secondos/)
-  - [Novinho](https://huggingface.co/spaces/Carlexxx/Novinho/)
---
title: Euia-AducSdr
+emoji: 🔥
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
+pinned: true
+license: agpl-3.0
+short_description: Uma implementação aberta e funcional da arquitetura ADUC-SDR
---

### 🇧🇷 Português

---

+## **Aviso de Propriedade Intelectual e Patenteamento**
+
+### **Processo de Patenteamento em Andamento (EM PORTUGUÊS):**
+
+A arquitetura e o método **ADUC (Automated Discovery and Orchestration of Complex tasks)**, conforme descritos neste projeto e nas reivindicações associadas, estão **atualmente em processo de patenteamento**.
+
+O titular dos direitos, Carlos Rodrigues dos Santos, está buscando proteção legal para as inovações chave da arquitetura ADUC, que incluem, mas não se limitam a:
+
+* Fragmentação e escalonamento de solicitações que excedem limites de contexto de modelos de IA.
+* Distribuição inteligente de sub-tarefas para especialistas heterogêneos.
+* Gerenciamento de estado persistido com avaliação iterativa e realimentação para o planejamento de próximas etapas.
+* Planejamento e roteamento sensível a custo, latência e requisitos de qualidade.
+* O uso de "tokens universais" para comunicação agnóstica a modelos.
+
+Ao utilizar este software e a arquitetura ADUC aqui implementada, você reconhece a natureza inovadora desta arquitetura e que a **reprodução ou exploração da lógica central da ADUC em sistemas independentes pode infringir direitos de patente pendente.**
+
+---
+
+### **Patent Pending (IN ENGLISH):**
+
+The **ADUC (Automated Discovery and Orchestration of Complex tasks)** architecture and method, as described in this project and its associated claims, are **currently in the process of being patented.**
+
+The rights holder, Carlos Rodrigues dos Santos, is seeking legal protection for the key innovations of the ADUC architecture, including, but not limited to:
+
+* Fragmentation and scaling of requests exceeding AI model context limits.
+* Intelligent distribution of sub-tasks to heterogeneous specialists.
+* Persistent state management with iterative evaluation and feedback for planning subsequent steps.
+* Cost, latency, and quality-aware planning and routing.
+* The use of "universal tokens" for model-agnostic communication.
+
+By using this software and the ADUC architecture implemented herein, you acknowledge the innovative nature of this architecture and that **the reproduction or exploitation of ADUC's core logic in independent systems may infringe upon pending patent rights.**
+
+---
+
+### Detalhes Técnicos e Reivindicações da ADUC
+
+#### 🇧🇷 Definição Curta (para Tese e Patente)
+
+**ADUC** é um *framework pré-input* e *intermediário* de **gerenciamento de prompts** que:
+
+1. **fragmenta** solicitações acima do limite de contexto de qualquer modelo,
+2. **escala linearmente** (processo sequencial com memória persistida),
+3. **distribui** sub-tarefas a **especialistas** (modelos/ferramentas heterogêneos), e
+4. **realimenta** a próxima etapa com avaliação do que foi feito/esperado (LLM diretor).
+
+Não é um modelo; é uma **camada orquestradora** plugável antes do input de modelos existentes (texto, imagem, áudio, vídeo), usando *tokens universais* e a tecnologia atual.
+
+#### 🇬🇧 Short Definition (for Thesis and Patent)
+
+**ADUC** is a *pre-input* and *intermediate* **prompt management framework** that:
+
+1. **fragments** requests exceeding any model's context limit,
+2. **scales linearly** (sequential process with persisted memory),
+3. **distributes** sub-tasks to **specialists** (heterogeneous models/tools), and
+4. **feeds back** to the next step with an evaluation of what was done/expected (director LLM).
+
+It is not a model; it is a pluggable **orchestration layer** before the input of existing models (text, image, audio, video), using *universal tokens* and current technology.
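Editor's note: item 1 (fragmentation) can be made concrete with a minimal, illustrative sketch. The snippet below is not part of this repository, and the whitespace word count is only a stand-in for a real tokenizer.

```python
# Illustrative sketch of "fragments requests exceeding any model's context limit".
# Assumption: tokens are approximated by whitespace-separated words.
def fragment_request(request: str, token_limit: int) -> list:
    """Split a request into ordered blocks, each at most token_limit 'tokens'."""
    words = request.split()
    return [" ".join(words[i:i + token_limit]) for i in range(0, len(words), token_limit)]

# A 1,000-word request against a 256-token limit becomes 4 scheduled blocks;
# nothing is truncated or discarded.
print(len(fragment_request("palavra " * 1000, token_limit=256)))  # -> 4
```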
+
+---
+
+#### 🇧🇷 Elementos Essenciais (Telegráfico)
+
+* **Agnóstico a modelos:** opera com qualquer LLM/difusor/API.
+* **Pré-input manager:** recebe pedido do usuário, **divide** em blocos ≤ limite de tokens, **prioriza**, **agenda** e **roteia**.
+* **Memória persistida:** resultados/latentes/"eco" viram **estado compartilhado** para o próximo bloco (nada é ignorado).
+* **Especialistas:** *routers* decidem quem faz o quê (ex.: "descrição → LLM-A", "keyframe → Img-B", "vídeo → Vid-C").
+* **Controle de qualidade:** LLM diretor compara *o que fez* × *o que deveria* × *o que falta* e **regenera objetivos** do próximo fragmento.
+* **Custo/latência-aware:** planeja pela **VRAM/tempo/custo**, não tenta "abraçar tudo de uma vez".
+
+#### 🇬🇧 Essential Elements (Telegraphic)
+
+* **Model-agnostic:** operates with any LLM/diffuser/API.
+* **Pre-input manager:** receives user request, **divides** into blocks ≤ token limit, **prioritizes**, **schedules**, and **routes**.
+* **Persisted memory:** results/latents/"echo" become **shared state** for the next block (nothing is ignored).
+* **Specialists:** *routers* decide who does what (e.g., "description → LLM-A", "keyframe → Img-B", "video → Vid-C").
+* **Quality control:** director LLM compares *what was done* × *what should be done* × *what is missing* and **regenerates objectives** for the next fragment.
+* **Cost/latency-aware:** plans by **VRAM/time/cost**, does not try to "embrace everything at once".
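Editor's note: a minimal sketch of the "Specialists" and "Cost/latency-aware" items above. The registry, capability names, and cost figures are invented for illustration and are not part of this project.

```python
# Hypothetical capability- and cost-aware router; names and costs are assumptions.
from dataclasses import dataclass

@dataclass
class Specialist:
    name: str
    capabilities: set     # e.g. {"description"}, {"keyframe"}, {"video"}
    relative_cost: float  # stand-in for VRAM/time/cost planning

REGISTRY = [
    Specialist("LLM-A", {"description", "storyboard"}, relative_cost=1.0),
    Specialist("Img-B", {"keyframe"}, relative_cost=3.0),
    Specialist("Vid-C", {"video"}, relative_cost=10.0),
]

def route(task_kind: str) -> Specialist:
    """Pick the cheapest specialist that declares the required capability."""
    candidates = [s for s in REGISTRY if task_kind in s.capabilities]
    if not candidates:
        raise ValueError(f"no specialist declares capability '{task_kind}'")
    return min(candidates, key=lambda s: s.relative_cost)

print(route("keyframe").name)  # -> Img-B
```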
+
+---
+
+#### 🇧🇷 Reivindicações Independentes (Método e Sistema)
+
+**Reivindicação Independente (Método) – Versão Enxuta:**
+
+1. **Método** de **orquestração de prompts** para execução de tarefas acima do limite de contexto de modelos de IA, compreendendo:
+   (a) **receber** uma solicitação que excede um limite de tokens;
+   (b) **analisar** a solicitação por um **LLM diretor** e **fragmentá-la** em sub-tarefas ≤ limite;
+   (c) **selecionar** especialistas de execução para cada sub-tarefa com base em capacidades declaradas;
+   (d) **gerar** prompts específicos por sub-tarefa em **tokens universais**, incluindo referências ao **estado persistido** de execuções anteriores;
+   (e) **executar sequencialmente** as sub-tarefas e **persistir** suas saídas como memória (incluindo latentes/eco/artefatos);
+   (f) **avaliar** automaticamente a saída versus metas declaradas e **regenerar objetivos** do próximo fragmento;
+   (g) **iterar** (b)–(f) até que os critérios de completude sejam atendidos, produzindo o resultado agregado;
+   em que o framework **escala linearmente** no tempo e armazenamento físico, **independente** da janela de contexto dos modelos subjacentes.
+
+**Reivindicação Independente (Sistema):**
+
+2. **Sistema** de orquestração de prompts, compreendendo: um **planejador LLM diretor**; um **roteador de especialistas**; um **banco de estado persistido** (incl. memória cinética para vídeo); um **gerador de prompts universais**; e um **módulo de avaliação/realimentação**, acoplados por uma **API pré-input** a modelos heterogêneos.
+
+#### 🇬🇧 Independent Claims (Method and System)
+
+**Independent Claim (Method) – Concise Version:**
+
+1. A **method** for **prompt orchestration** for executing tasks exceeding AI model context limits, comprising:
+   (a) **receiving** a request that exceeds a token limit;
+   (b) **analyzing** the request by a **director LLM** and **fragmenting it** into sub-tasks ≤ the limit;
+   (c) **selecting** execution specialists for each sub-task based on declared capabilities;
+   (d) **generating** specific prompts per sub-task in **universal tokens**, including references to the **persisted state** of previous executions;
+   (e) **sequentially executing** the sub-tasks and **persisting** their outputs as memory (including latents/echo/artifacts);
+   (f) **automatically evaluating** the output against declared goals and **regenerating objectives** for the next fragment;
+   (g) **iterating** (b)–(f) until completion criteria are met, producing the aggregated result;
+   wherein the framework **scales linearly** in time and physical storage, **independent** of the context window of the underlying models.
+
+**Independent Claim (System):**
+
+2. A prompt orchestration **system**, comprising: a **director LLM planner**; a **specialist router**; a **persisted state bank** (incl. kinetic memory for video); a **universal prompt generator**; and an **evaluation/feedback module**, coupled via a **pre-input API** to heterogeneous models.
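Editor's note: for readers who think in code, the sketch below is a loose, hypothetical rendering of steps (a)–(g). It is not the claimed implementation; the stub director and specialist only stand in for real, pluggable components.

```python
# Hedged sketch of the claimed method (a)-(g); every component here is a stub.
class StubDirector:
    def plan(self, request, token_limit):
        # (a)+(b): fragment the request into sub-tasks no larger than the limit
        words = request.split()
        return [" ".join(words[i:i + token_limit]) for i in range(0, len(words), token_limit)]

    def to_universal_prompt(self, sub_task, state):
        # (d): model-agnostic prompt that references the persisted state
        return f"context={len(state)} prior results | task={sub_task}"

class StubSpecialist:
    def run(self, prompt):
        # (c)/(e): in a real system a router would pick this specialist by capability
        return f"done: {prompt[:40]}"

def orchestrate(request, token_limit=8):
    director, specialist = StubDirector(), StubSpecialist()
    state, results = [], []                       # (e) persisted memory / artifacts
    goals = director.plan(request, token_limit)
    while goals:                                  # (g) iterate until completion criteria are met
        sub_task = goals.pop(0)
        prompt = director.to_universal_prompt(sub_task, state)
        output = specialist.run(prompt)           # (e) sequential execution
        state.append(output); results.append(output)
        # (f) a real evaluator would compare output vs. goals here and regenerate them
    return results                                # aggregated result

print(len(orchestrate("uma solicitacao longa " * 20)))  # -> 8 sub-tasks, all executed
```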
+
+---
+
+#### 🇧🇷 Dependentes Úteis
+
+* (3) Onde o roteamento considera **custo/latência/VRAM** e metas de qualidade.
+* (4) Onde o banco de estado inclui **eco cinético** para vídeo (últimos *n* frames/latentes/fluxo).
+* (5) Onde a avaliação usa métricas específicas por domínio (Lflow, consistência semântica, etc.).
+* (6) Onde *tokens universais* padronizam instruções entre especialistas.
+* (7) Onde a orquestração decide **cut vs continuous** e **corte regenerativo** (Déjà-Vu) ao editar vídeo.
+* (8) Onde o sistema **nunca descarta** conteúdo excedente: **reagenda** em novos fragmentos.
+
+#### 🇬🇧 Useful Dependents
+
+* (3) Wherein routing considers **cost/latency/VRAM** and quality goals.
+* (4) Wherein the state bank includes **kinetic echo** for video (last *n* frames/latents/flow).
+* (5) Wherein evaluation uses domain-specific metrics (Lflow, semantic consistency, etc.).
+* (6) Wherein *universal tokens* standardize instructions between specialists.
+* (7) Wherein orchestration decides **cut vs continuous** and **regenerative cut** (Déjà-Vu) when editing video.
+* (8) Wherein the system **never discards** excess content: it **reschedules** it in new fragments.
+
+---
+
+#### 🇧🇷 Como isso conversa com SDR (Vídeo)
+
+* **Eco Cinético**: é um **tipo de estado persistido** consumido pelo próximo passo.
+* **Déjà-Vu (Corte Regenerativo)**: é **uma política de orquestração** aplicada quando há edição; ADUC decide, monta os prompts certos e chama o especialista de vídeo.
+* **Cut vs Continuous**: decisão do **diretor** com base em estado + metas; ADUC roteia e garante a sobreposição/remoção final.
+
+#### 🇬🇧 How this Converses with SDR (Video)
+
+* **Kinetic Echo**: is a **type of persisted state** consumed by the next step.
+* **Déjà-Vu (Regenerative Cut)**: is an **orchestration policy** applied during editing; ADUC decides, crafts the right prompts, and calls the video specialist.
+* **Cut vs Continuous**: decision made by the **director** based on state + goals; ADUC routes and ensures the final overlap/removal.
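Editor's note: as a concrete illustration of "Kinetic Echo as persisted state", the snippet below extracts the last N frames of the previous fragment with FFmpeg so they can condition the next step. It mirrors the idea of `extract_last_n_frames_as_video` in `app.py`, with simplified error handling; paths and frame counts are examples.

```python
# Illustrative only: persist the last n frames of a fragment as the "eco" video.
import subprocess

def kinetic_echo(prev_fragment: str, echo_path: str, n_frames: int = 8) -> str:
    """Extract the last n_frames of prev_fragment into echo_path via FFmpeg."""
    probe = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0", "-count_frames",
         "-show_entries", "stream=nb_read_frames",
         "-of", "default=nokey=1:noprint_wrappers=1", prev_fragment],
        check=True, capture_output=True, text=True)
    total = int(probe.stdout.strip())
    start = max(0, total - n_frames)
    subprocess.run(
        ["ffmpeg", "-y", "-v", "error", "-i", prev_fragment,
         "-vf", f"select='gte(n,{start})'", "-vframes", str(n_frames),
         "-an", echo_path],
        check=True)
    return echo_path  # consumed by the next fragment as persisted kinetic state
```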
+
+---
+
+#### 🇧🇷 Mensagem Clara ao Usuário (Experiência)
+
+> "Seu pedido excede o limite X do modelo Y. Em vez de truncar silenciosamente, o **ADUC** dividirá e **entregará 100%** do conteúdo por etapas coordenadas."
+
+Isso é diferencial prático e jurídico: **não-obviedade** por transformar limite de contexto em **pipeline controlado**, com **persistência de estado** e **avaliação iterativa**.
+
+#### 🇬🇧 Clear User Message (Experience)
+
+> "Your request exceeds model Y's limit X. Instead of silently truncating, **ADUC** will divide and **deliver 100%** of the content through coordinated steps."
+
+This is a practical and legal differentiator: **non-obviousness** by transforming context limits into a **controlled pipeline**, with **state persistence** and **iterative evaluation**.
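Editor's note: a hypothetical sketch of that experience, in which the orchestrator reports a coordinated plan instead of truncating; the model name, limit, and word-based token estimate are placeholders.

```python
# Illustrative only; not part of this repository.
def announce_plan(request: str, model_name: str, token_limit: int) -> str:
    n_tokens = len(request.split())         # crude stand-in for a real tokenizer
    if n_tokens <= token_limit:
        return f"Request fits within {model_name}'s limit; executing in one step."
    n_blocks = -(-n_tokens // token_limit)  # ceiling division
    return (f"Your request exceeds {model_name}'s limit of {token_limit} tokens. "
            f"Instead of silently truncating, ADUC will divide it into {n_blocks} "
            f"coordinated steps and deliver 100% of the content.")

print(announce_plan("palavra " * 500, "model-Y", 256))
```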

---

- **GitHub:** [https://github.com/carlex22/Aduc-sdr](https://github.com/carlex22/Aduc-sdr)
- **Hugging Face Spaces:**
  - [Ltx-SuperTime-60Secondos](https://huggingface.co/spaces/Carlexx/Ltx-SuperTime-60Secondos/)
+  - [Novinho](https://huggingface.co/spaces/Carlexxx/Novinho/)
+
+---
app.py
CHANGED
@@ -4,12 +4,11 @@
# Contato:
# Carlos Rodrigues dos Santos
#
-# Rua Eduardo Carlos Pereira, 4125, B1 Ap32, Curitiba, PR, Brazil, CEP 8102025
#
# Repositórios e Projetos Relacionados:
# GitHub: https://github.com/carlex22/Aduc-sdr
-#
-# Hugging Face: https://huggingface.co/spaces/
#
# Este programa é software livre: você pode redistribuí-lo e/ou modificá-lo
# sob os termos da Licença Pública Geral Affero da GNU como publicada pela
@@ -24,82 +23,238 @@
# Você deve ter recebido uma cópia da Licença Pública Geral Affero da GNU
# junto com este programa. Se não, veja <https://www.gnu.org/licenses/>.

-# ---

-# --- Ato 1: A Convocação da Orquestra (Importações) ---
import gradio as gr
import torch
import os
import yaml
from PIL import Image, ImageOps, ExifTags
import shutil
-import gc
import subprocess
import google.generativeai as genai
import numpy as np
import imageio
from pathlib import Path
-import huggingface_hub
import json
import time
-import
-# Importações condicionais que dependem dos modelos
-if ENABLE_MODELS:
-    from inference import create_ltx_video_pipeline, load_image_to_tensor_with_resize_and_crop, ConditioningItem, calculate_padding
-    from dreamo_helpers import dreamo_generator_singleton
-else:
-    # Definimos placeholders para que o resto do código não falhe na importação
-    ConditioningItem = dict
-
-# --- Ato 2: A Preparação do Palco (Configurações Condicionais) ---
-if ENABLE_MODELS:
-    config_file_path = "configs/ltxv-13b-0.9.8-distilled.yaml"
-    with open(config_file_path, "r") as file: PIPELINE_CONFIG_YAML = yaml.safe_load(file)
-
-    LTX_REPO = "Lightricks/LTX-Video"
-    models_dir = "downloaded_models_gradio"
-    Path(models_dir).mkdir(parents=True, exist_ok=True)
-
-    print("MODO COMPLETO ATIVADO: Carregando pipelines LTX na CPU (estado de repouso)...")
-    distilled_model_actual_path = huggingface_hub.hf_hub_download(repo_id=LTX_REPO, filename=PIPELINE_CONFIG_YAML["checkpoint_path"], local_dir=models_dir, local_dir_use_symlinks=False)
-    pipeline_instance = create_ltx_video_pipeline(
-        ckpt_path=distilled_model_actual_path,
-        precision=PIPELINE_CONFIG_YAML["precision"],
-        text_encoder_model_name_or_path=PIPELINE_CONFIG_YAML["text_encoder_model_name_or_path"],
-        sampler=PIPELINE_CONFIG_YAML["sampler"],
-        device='cpu'
-    )
-    print("Modelos LTX prontos (na CPU).")
-else:
-    # Em modo demo, definimos as variáveis dos modelos como None para evitar erros.
-    pipeline_instance = None
-    dreamo_generator_singleton = None
-    PIPELINE_CONFIG_YAML = {}
-    print("MODO DEMO ATIVADO: Carregamento de modelos pesados ignorado.")

WORKSPACE_DIR = "aduc_workspace"
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
-VIDEO_FPS = 24
-TARGET_RESOLUTION = 420
-#

def robust_json_parser(raw_text: str) -> dict:
    try:
-        start_index =
        if start_index != -1 and end_index != -1 and end_index > start_index:
-            json_str =
        else: raise ValueError("Nenhum objeto JSON válido encontrado na resposta da IA.")
    except json.JSONDecodeError as e: raise ValueError(f"Falha ao decodificar JSON: {e}")

def extract_image_exif(image_path: str) -> str:
    try:
        img = Image.open(image_path); exif_data = img._getexif()
        if not exif_data: return "No EXIF metadata found."
@@ -109,17 +264,39 @@ def extract_image_exif(image_path: str) -> str:
        return metadata_str if metadata_str else "No relevant EXIF metadata found."
    except Exception: return "Could not read EXIF data."

    prompt_file = "prompts/unified_storyboard_prompt.txt"
    with open(os.path.join(os.path.dirname(__file__), prompt_file), "r", encoding="utf-8") as f: template = f.read()
    director_prompt = template.format(user_prompt=prompt, num_fragments=int(num_fragments), image_metadata=exif_metadata)
    genai.configure(api_key=GEMINI_API_KEY)
-    model = genai.GenerativeModel('gemini-
    try:
        storyboard_data = robust_json_parser(response.text)
        storyboard = storyboard_data.get("scene_storyboard", [])
@@ -127,62 +304,134 @@ def run_storyboard_generation(num_fragments: int, prompt: str, initial_image_pat
        return storyboard
    except Exception as e: raise gr.Error(f"O Roteirista (Gemini) falhou ao criar o roteiro: {e}. Resposta recebida: {response.text}")

-def
    if not storyboard: raise gr.Error("Nenhum roteiro para gerar keyframes.")
    try:
        for i, scene_description in enumerate(storyboard):
-            progress(i / len(storyboard), desc=f"
-            log_history += f"\n---
            output_path = os.path.join(WORKSPACE_DIR, f"keyframe_{i+1}.png")
            image.save(output_path)
-    yield {keyframe_log_output: gr.update(value=log_history), keyframe_gallery_output: gr.update(value=generated_images_for_gallery), keyframe_images_state: keyframe_paths}

def get_initial_motion_prompt(user_prompt: str, start_image_path: str, destination_image_path: str, dest_scene_desc: str):
    if not GEMINI_API_KEY: raise gr.Error("Chave da API Gemini não configurada!")
    try:
-        genai.configure(api_key=GEMINI_API_KEY); model = genai.GenerativeModel('gemini-
        with open(os.path.join(os.path.dirname(__file__), prompt_file), "r", encoding="utf-8") as f: template = f.read()
        cinematographer_prompt = template.format(user_prompt=user_prompt, destination_scene_description=dest_scene_desc)
        start_img, dest_img = Image.open(start_image_path), Image.open(destination_image_path)
@@ -191,363 +440,300 @@ def get_initial_motion_prompt(user_prompt: str, start_image_path: str, destinati
        return response.text.strip()
    except Exception as e: raise gr.Error(f"O Cineasta de IA (Inicial) falhou: {e}. Resposta: {getattr(e, 'text', 'No text available.')}")

-def
    if not GEMINI_API_KEY: raise gr.Error("Chave da API Gemini não configurada!")
    try:
-        genai.configure(api_key=GEMINI_API_KEY); model = genai.GenerativeModel('gemini-
        with open(os.path.join(os.path.dirname(__file__), prompt_file), "r", encoding="utf-8") as f: template = f.read()
-        with imageio.get_reader(memory_media_path) as reader:
-            mem_img = Image.fromarray(reader.get_data(0))
        path_img, dest_img = Image.open(path_image_path), Image.open(destination_image_path)
-        model_contents = ["START Image (from Kinetic Echo):", mem_img, "MIDPOINT Image (Path):", path_img, "DESTINATION Image (Destination):", dest_img,
        response = model.generate_content(model_contents)

-@spaces.GPU(duration=360)
def run_video_production(
    video_duration_seconds, video_fps, eco_video_frames, use_attention_slicing,
-    fragment_duration_frames, mid_cond_strength, num_inference_steps,
    progress=gr.Progress()
):
-        production_log_output: log_history, video_gallery_glitch: [],
-        prod_media_start_output: gr.update(value=None),
-        prod_media_mid_output: gr.update(value=None, visible=False),
-        prod_media_end_output: gr.update(value=None),
-    }
-    seed = int(time.time())
-    target_device = 'cuda' if torch.cuda.is_available() else 'cpu'
    try:
-        num_transitions = len(keyframe_images_state) - 2
        for i in range(num_transitions):
            fragment_num = i + 1
-            progress(i / num_transitions, desc=f"
            log_history += f"\n--- FRAGMENTO {fragment_num}/{num_transitions} ---\n"

-            if i == 0:
-                start_path, destination_path =
-                dest_scene_desc = scene_storyboard[
-                log_history += f" - Início (
                current_motion_prompt = get_initial_motion_prompt(prompt_geral, start_path, destination_path, dest_scene_desc)
-                conditioning_items_data = [(start_path, 0, 1.0), (destination_path,
-                    prod_media_start_output: gr.update(value=start_path),
-                    prod_media_mid_output: gr.update(value=None, visible=False),
-                    prod_media_end_output: gr.update(value=destination_path),
-                }
            else:
-                memory_path, path_path, destination_path = kinetic_memory_path,
-                path_scene_desc, dest_scene_desc = scene_storyboard[i], scene_storyboard[i
-                log_history += f" -
-                    prod_media_mid_output: gr.update(value=path_path, visible=True),
-                    prod_media_end_output: gr.update(value=destination_path),
-                }

            story_history += f"\n- Ato {fragment_num + 1}: {current_motion_prompt}"
            log_history += f" - Instrução do Cineasta: '{current_motion_prompt}'\n"; yield {production_log_output: log_history}

-            full_fragment_path,
            )
-            log_history += f" - LOG: Gerei
-            trimmed_fragment_path = os.path.join(WORKSPACE_DIR, f"fragment_{fragment_num}_trimmed.mp4")
-            trim_video_to_frames(full_fragment_path, trimmed_fragment_path, int(fragment_duration_frames))
-            log_history += f" - LOG: Reduzi o fragmento_{fragment_num} para {int(fragment_duration_frames)} frames.\n"
-            yield {production_log_output: log_history}
            is_last_fragment = (i == num_transitions - 1)
            eco_output_path = os.path.join(WORKSPACE_DIR, f"eco_from_frag_{fragment_num}.mp4")
            kinetic_memory_path = extract_last_n_frames_as_video(trimmed_fragment_path, eco_output_path, int(eco_video_frames))
-            log_history += f" -
-            else:
-                log_history += f" - Este é o último fragmento, não é necessário gerar um eco.\n"
-            video_fragments.append(trimmed_fragment_path)
-            yield {production_log_output: log_history, video_gallery_glitch: video_fragments}
-        progress(1.0, desc="Produção Concluída.")
-        log_history += "\nProdução de todos os fragmentos concluída.\n"
-        yield {production_log_output: log_history, video_gallery_glitch: video_fragments, fragment_list_state: video_fragments}
-    finally:
-        if ENABLE_MODELS:
-            pipeline_instance.to('cpu')
-            gc.collect()
-            torch.cuda.empty_cache()
-
-def process_image_to_square(image_path: str, size: int = TARGET_RESOLUTION) -> str:
-    if not image_path: return None
-    try:
-        img = Image.open(image_path).convert("RGB"); img_square = ImageOps.fit(img, (size, size), Image.Resampling.LANCZOS)
-        output_path = os.path.join(WORKSPACE_DIR, f"initial_ref_{size}x{size}.png"); img_square.save(output_path)
-        return output_path
-    except Exception as e: raise gr.Error(f"Falha ao processar a imagem de referência: {e}")

-    if not ENABLE_MODELS: return None
-    if media_path.lower().endswith(('.mp4', '.mov', '.avi')):
-        with imageio.get_reader(media_path) as reader:
-            first_frame_np = reader.get_data(0)
-        temp_img_path = os.path.join(WORKSPACE_DIR, f"temp_frame_from_{os.path.basename(media_path)}.png")
-        Image.fromarray(first_frame_np).save(temp_img_path)
-        return load_image_to_tensor_with_resize_and_crop(temp_img_path, height, width)
-    else:
-        return load_image_to_tensor_with_resize_and_crop(media_path, height, width)
-
-def run_ltx_animation(
-    current_fragment_index, motion_prompt, conditioning_items_data,
-    width, height, seed, cfg, progress,
-    video_total_frames, video_fps, use_attention_slicing, num_inference_steps
-):
-    if not ENABLE_MODELS: return None, 0
-    progress(0, desc=f"[Câmera LTX] Filmando Cena {current_fragment_index}...");
-    output_path = os.path.join(WORKSPACE_DIR, f"fragment_{current_fragment_index}_full.mp4")
-    target_device = pipeline_instance.device
-    try:
-        if use_attention_slicing: pipeline_instance.enable_attention_slicing()
-        conditioning_items = [ConditioningItem(load_conditioning_tensor(p, height, width).to(target_device), s, t) for p, s, t in conditioning_items_data]
-        actual_num_frames = int(round((float(video_total_frames) - 1.0) / 8.0) * 8 + 1)
-        padded_h, padded_w = ((height - 1) // 32 + 1) * 32, ((width - 1) // 32 + 1) * 32
-        padding_vals = calculate_padding(height, width, padded_h, padded_w)
-        for item in conditioning_items: item.media_item = torch.nn.functional.pad(item.media_item, padding_vals)
-        first_pass_config = PIPELINE_CONFIG_YAML.get("first_pass", {}).copy()
-        first_pass_config['num_inference_steps'] = int(num_inference_steps)
-        kwargs = {"prompt": motion_prompt, "negative_prompt": "blurry, distorted, bad quality, artifacts", "height": padded_h, "width": padded_w, "num_frames": actual_num_frames, "frame_rate": video_fps, "generator": torch.Generator(device=target_device).manual_seed(int(seed) + current_fragment_index), "output_type": "pt", "guidance_scale": float(cfg), "timesteps": first_pass_config.get("timesteps"), "conditioning_items": conditioning_items, "decode_timestep": PIPELINE_CONFIG_YAML.get("decode_timestep"), "decode_noise_scale": PIPELINE_CONFIG_YAML.get("decode_noise_scale"), "stochastic_sampling": PIPELINE_CONFIG_YAML.get("stochastic_sampling"), "image_cond_noise_scale": 0.15, "is_video": True, "vae_per_channel_normalize": True, "mixed_precision": (PIPELINE_CONFIG_YAML.get("precision") == "mixed_precision"), "enhance_prompt": False, "decode_every": 4, "num_inference_steps": int(num_inference_steps)}
-        result_tensor = pipeline_instance(**kwargs).images
-        pad_l, pad_r, pad_t, pad_b = map(int, padding_vals); slice_h = -pad_b if pad_b > 0 else None; slice_w = -pad_r if pad_r > 0 else None
-        cropped_tensor = result_tensor[:, :, :actual_num_frames, pad_t:slice_h, pad_l:slice_w]
-        video_np = (cropped_tensor[0].permute(1, 2, 3, 0).cpu().float().numpy() * 255).astype(np.uint8)
-        with imageio.get_writer(output_path, fps=video_fps, codec='libx264', quality=8) as writer:
-            for i, frame in enumerate(video_np): writer.append_data(frame)
-        return output_path, actual_num_frames
-    finally:
-        if ENABLE_MODELS and use_attention_slicing:
-            pipeline_instance.disable_attention_slicing()

-        result = subprocess.run(cmd_probe, shell=True, check=True, text=True, capture_output=True)
-        total_frames = int(result.stdout.strip())
-        if n_frames >= total_frames:
-            shutil.copyfile(input_path, output_path)
-            return output_path
-        start_frame = total_frames - n_frames
-        cmd_ffmpeg = f"ffmpeg -y -v error -i \"{input_path}\" -vf \"select='gte(n,{start_frame})'\" -vframes {n_frames} -an \"{output_path}\""
-        subprocess.run(cmd_ffmpeg, shell=True, check=True, text=True)
-        return output_path
-    except (subprocess.CalledProcessError, ValueError) as e:
-        raise gr.Error(f"FFmpeg falhou ao extrair os últimos {n_frames} frames: {getattr(e, 'stderr', str(e))}")

-def concatenate_and_trim_masterpiece(fragment_paths: list, fragment_duration_frames: int, eco_video_frames: int, progress=gr.Progress()):
-    if not fragment_paths: raise gr.Error("Nenhum fragmento de vídeo para concatenar.")
-    progress(0.1, desc="Preparando fragmentos para montagem final...");
-    try:
-        list_file_path = os.path.join(WORKSPACE_DIR, "concat_list.txt")
-        final_output_path = os.path.join(WORKSPACE_DIR, "masterpiece_final.mp4")
-        temp_files_for_concat = []
-        final_clip_len = int(fragment_duration_frames - eco_video_frames)
-        for i, p in enumerate(fragment_paths):
-            if i == len(fragment_paths) - 1:
-                temp_files_for_concat.append(os.path.abspath(p))
-                progress(0.1 + (i / len(fragment_paths)) * 0.8, desc=f"Mantendo último fragmento: {os.path.basename(p)}")
-            else:
-                temp_path = os.path.join(WORKSPACE_DIR, f"temp_concat_{i}.mp4")
-                progress(0.1 + (i / len(fragment_paths)) * 0.8, desc=f"Cortando {os.path.basename(p)} para {final_clip_len} frames")
-                trim_video_to_frames(p, temp_path, final_clip_len)
-                temp_files_for_concat.append(temp_path)
-        progress(0.9, desc="Concatenando clipes...")
-        with open(list_file_path, "w") as f:
-            for p_temp in temp_files_for_concat: f.write(f"file '{p_temp}'\n")
-        subprocess.run(f"ffmpeg -y -v error -f concat -safe 0 -i \"{list_file_path}\" -c copy \"{final_output_path}\"", shell=True, check=True, text=True)
-        progress(1.0, desc="Montagem concluída!")
-        return final_output_path
-    except subprocess.CalledProcessError as e:
-        raise gr.Error(f"FFmpeg falhou na concatenação final: {e.stderr}")
-
-# --- Ato 5: A Interface com o Mundo (UI) ---
with gr.Blocks(theme=gr.themes.Soft()) as demo:
-    gr.Markdown("# NOVIM-
-    if not ENABLE_MODELS:
-        gr.Warning(
-            """
-            **MODO DE DEMONSTRAÇÃO ATIVADO**
-            Você pode explorar a interface e usar a "Etapa 1: Gerar Roteiro" se tiver uma chave da API Gemini configurada.
-            Para habilitar a geração de imagens e vídeos (Etapas 2 e 3), você precisa:
-            1. **Fork este Space:** Clique no menu de três pontos ao lado do título e selecione "Duplicate this Space".
-            2. **Escolha um Hardware de GPU:** Na tela de duplicação, selecione um hardware de GPU (ex: T4 Small).
-            3. **Edite o `app.py`:** Na aba "Files" do seu novo Space, edite o arquivo `app.py`.
-            4. **Ative os Modelos:** Mude a linha `ENABLE_MODELS = False` para `ENABLE_MODELS = True`.
-            5. Salve o arquivo. O Space será reiniciado com a funcionalidade completa.
-            """
-        )

    if os.path.exists(WORKSPACE_DIR): shutil.rmtree(WORKSPACE_DIR)
    os.makedirs(WORKSPACE_DIR); Path("prompts").mkdir(exist_ok=True)

    gr.Markdown("--- \n ## ETAPA 1: O ROTEIRO (IA Roteirista)")
    with gr.Row():
        with gr.Column(scale=1):
            prompt_input = gr.Textbox(label="Ideia Geral (Prompt)")
-            num_fragments_input = gr.Slider(2,
            director_button = gr.Button("▶️ 1. Gerar Roteiro", variant="primary")
        with gr.Column(scale=2): storyboard_to_show = gr.JSON(label="Roteiro de Cenas Gerado (em Inglês)")

-    gr.Markdown("--- \n ## ETAPA 2: OS KEYFRAMES (IA
    with gr.Row():
        with gr.Column(scale=2):
-            gr.Markdown("
-            with gr.Column():
-                ref2_image = gr.Image(label="Referência Secundária (Opcional)", type="filepath")
-                ref2_task = gr.Dropdown(choices=["ip", "id", "style"], value="style", label="Tarefa da Ref. Secundária")
-            photographer_button = gr.Button("▶️ 2. Pintar Imagens-Chave em Cadeia", variant="primary")
-        with gr.Column(scale=1):
-            keyframe_log_output = gr.Textbox(label="Diário de Bordo do Pintor", lines=15, interactive=False)
-            keyframe_gallery_output = gr.Gallery(label="Imagens-Chave Pintadas", object_fit="contain", height="auto", type="filepath")

    gr.Markdown("--- \n ## ETAPA 3: A PRODUÇÃO (IA Cineasta & Câmera)")
    with gr.Row():
        with gr.Column(scale=1):
-            cfg_slider = gr.Slider(
            with gr.Accordion("Controles Avançados de Timing e Performance", open=False):
-                video_duration_slider = gr.Slider(label="Duração da Geração Bruta (
-                num_inference_steps_slider = gr.Slider(label="Etapas de Inferência", minimum=
                slicing_checkbox = gr.Checkbox(label="Usar Attention Slicing (Economiza VRAM)", value=True)
                gr.Markdown("---"); gr.Markdown("#### Controles de Duração (Arquitetura Eco + Déjà Vu)")
-                fragment_duration_slider = gr.Slider(label="Duração de Cada Fragmento (
                eco_frames_slider = gr.Slider(label="Tamanho do Eco Cinético (Frames)", minimum=4, maximum=48, value=8, step=1)
                mid_cond_strength_slider = gr.Slider(label="Força do 'Caminho'", minimum=0.1, maximum=1.0, value=0.5, step=0.05)
-                """
-                - **Força do Caminho:** Define o quão forte a imagem-chave intermediária ('Caminho') influencia a transição.
-                """
-            )
-            animator_button = gr.Button("▶️ 3. Produzir Cenas (Handoff Cinético)", variant="primary")
            with gr.Accordion("Visualização das Mídias de Condicionamento (Ao Vivo)", open=True):
                with gr.Row():
                    prod_media_start_output = gr.Video(label="Mídia Inicial (Eco/K1)", interactive=False)
                    prod_media_mid_output = gr.Image(label="Mídia do Caminho (K_i-1)", interactive=False, visible=False)
                    prod_media_end_output = gr.Image(label="Mídia de Destino (K_i)", interactive=False)
            production_log_output = gr.Textbox(label="Diário de Bordo da Produção", lines=10, interactive=False)
-        with gr.Column(scale=1):
-            final_video_output = gr.Video(label="A Obra-Prima Final", width=TARGET_RESOLUTION)

    gr.Markdown(
        """
        ---
-        ### A Arquitetura:
        """
    )

    director_button.click(
-        fn=
-        inputs=[num_fragments_input, prompt_input,
-        outputs=[scene_storyboard_state, prompt_geral_state,
-    ).success(
-    @photographer_button.click(
-        inputs=[scene_storyboard_state, ref1_image, ref1_task, ref2_image, ref2_task],
        outputs=[keyframe_log_output, keyframe_gallery_output, keyframe_images_state]
    )

    animator_button.click(
-        fn=
-        inputs=[fragment_duration_slider, eco_frames_slider],
-        outputs=[fragment_duration_state, eco_frames_state]
-    ).then(
-        fn=run_video_production,
        inputs=[
-            num_inference_steps_slider,
            prompt_geral_state, keyframe_images_state, scene_storyboard_state, cfg_slider
        ],
        outputs=[
-            production_log_output,
-            prod_media_start_output, prod_media_mid_output, prod_media_end_output
        ]
    )

    editor_button.click(
-        fn=
-        inputs=[fragment_list_state, fragment_duration_state, eco_frames_state],
        outputs=[final_video_output]
    )

if __name__ == "__main__":
    demo.queue().launch(server_name="0.0.0.0", share=True)
# Contato:
# Carlos Rodrigues dos Santos
#
#
# Repositórios e Projetos Relacionados:
# GitHub: https://github.com/carlex22/Aduc-sdr
+# YouTube (Resultados): https://m.youtube.com/channel/UC3EgoJi_Fv7yuDpvfYNtoIQ
+# Hugging Face: https://huggingface.co/spaces/Carlexx/ADUC-Sdr_Gemini_Drem0_Ltx_Video60seconds/
#
# Este programa é software livre: você pode redistribuí-lo e/ou modificá-lo
# sob os termos da Licença Pública Geral Affero da GNU como publicada pela
# Você deve ter recebido uma cópia da Licença Pública Geral Affero da GNU
# junto com este programa. Se não, veja <https://www.gnu.org/licenses/>.

+# --- app.py (ADUC-SDR-2.9: Diretor de Cena com Prompt Único e Extração) ---

import gradio as gr
import torch
import os
+import re
import yaml
from PIL import Image, ImageOps, ExifTags
import shutil
import subprocess
import google.generativeai as genai
import numpy as np
import imageio
from pathlib import Path
import json
import time
+import math
+
+os.environ["TOKENIZERS_PARALLELISM"] = "false"
+
+from flux_kontext_helpers import flux_kontext_singleton
+from ltx_manager_helpers import ltx_manager_singleton

WORKSPACE_DIR = "aduc_workspace"
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")

+# ======================================================================================
+# SEÇÃO 1: FUNÇÕES UTILITÁRIAS E DE PROCESSAMENTO DE MÍDIA
+# ======================================================================================

def robust_json_parser(raw_text: str) -> dict:
|
57 |
+
"""
|
58 |
+
Analisa uma string de texto bruto para encontrar e decodificar o primeiro objeto JSON vΓ‘lido.
|
59 |
+
Γ essencial para extrair respostas estruturadas de modelos de linguagem.
|
60 |
+
|
61 |
+
Args:
|
62 |
+
raw_text (str): A string completa retornada pela IA.
|
63 |
+
|
64 |
+
Returns:
|
65 |
+
dict: Um dicionΓ‘rio Python representando o objeto JSON.
|
66 |
+
|
67 |
+
Raises:
|
68 |
+
ValueError: Se nenhum objeto JSON vΓ‘lido for encontrado ou a decodificaΓ§Γ£o falhar.
|
69 |
+
"""
|
70 |
+
clean_text = raw_text.strip()
|
71 |
try:
|
72 |
+
start_index = clean_text.find('{'); end_index = clean_text.rfind('}')
|
73 |
if start_index != -1 and end_index != -1 and end_index > start_index:
|
74 |
+
json_str = clean_text[start_index : end_index + 1]
|
75 |
+
return json.loads(json_str)
|
76 |
else: raise ValueError("Nenhum objeto JSON vΓ‘lido encontrado na resposta da IA.")
|
77 |
except json.JSONDecodeError as e: raise ValueError(f"Falha ao decodificar JSON: {e}")
|
78 |
|
79 |
+
def process_image_to_square(image_path: str, size: int, output_filename: str = None) -> str:
|
80 |
+
"""
|
81 |
+
Processa uma imagem para um formato quadrado, redimensionando e cortando centralmente.
|
82 |
+
|
83 |
+
Args:
|
84 |
+
image_path (str): Caminho para a imagem de entrada.
|
85 |
+
size (int): A dimensΓ£o (altura e largura) da imagem de saΓda.
|
86 |
+
output_filename (str, optional): Nome do arquivo de saΓda.
|
87 |
+
|
88 |
+
Returns:
|
89 |
+
str: O caminho para a imagem processada.
|
90 |
+
"""
|
91 |
+
if not image_path: return None
|
92 |
+
try:
|
93 |
+
img = Image.open(image_path).convert("RGB")
|
94 |
+
img_square = ImageOps.fit(img, (size, size), Image.Resampling.LANCZOS)
|
95 |
+
if output_filename: output_path = os.path.join(WORKSPACE_DIR, output_filename)
|
96 |
+
else: output_path = os.path.join(WORKSPACE_DIR, f"edited_ref_{time.time()}.png")
|
97 |
+
img_square.save(output_path)
|
98 |
+
return output_path
|
99 |
+
except Exception as e: raise gr.Error(f"Falha ao processar a imagem de referΓͺncia: {e}")
|
100 |
+
|
101 |
+
def trim_video_to_frames(input_path: str, output_path: str, frames_to_keep: int) -> str:
|
102 |
+
"""
|
103 |
+
Usa o FFmpeg para cortar um vΓdeo, mantendo um nΓΊmero especΓfico de frames do inΓcio.
|
104 |
+
|
105 |
+
Args:
|
106 |
+
input_path (str): Caminho para o vΓdeo de entrada.
|
107 |
+
output_path (str): Caminho para salvar o vΓdeo cortado.
|
108 |
+
frames_to_keep (int): NΓΊmero de frames a serem mantidos.
|
109 |
+
|
110 |
+
Returns:
|
111 |
+
str: O caminho para o vΓdeo cortado.
|
112 |
+
"""
|
113 |
+
try:
|
114 |
+
subprocess.run(f"ffmpeg -y -v error -i \"{input_path}\" -vf \"select='lt(n,{frames_to_keep})'\" -an \"{output_path}\"", shell=True, check=True, text=True)
|
115 |
+
return output_path
|
116 |
+
except subprocess.CalledProcessError as e: raise gr.Error(f"FFmpeg falhou ao cortar vΓdeo: {e.stderr}")
|
117 |
+
|
118 |
+
def extract_last_n_frames_as_video(input_path: str, output_path: str, n_frames: int) -> str:
|
119 |
+
"""
|
120 |
+
Usa o FFmpeg para extrair os ΓΊltimos N frames de um vΓdeo para criar o "Eco CinΓ©tico".
|
121 |
+
|
122 |
+
Args:
|
123 |
+
input_path (str): Caminho para o vΓdeo de entrada.
|
124 |
+
output_path (str): Caminho para salvar o vΓdeo de saΓda (o eco).
|
125 |
+
n_frames (int): NΓΊmero de frames a serem extraΓdos do final.
|
126 |
+
|
127 |
+
Returns:
|
128 |
+
str: O caminho para o vΓdeo de eco gerado.
|
129 |
+
"""
|
130 |
+
try:
|
131 |
+
cmd_probe = f"ffprobe -v error -select_streams v:0 -count_frames -show_entries stream=nb_read_frames -of default=nokey=1:noprint_wrappers=1 \"{input_path}\""
|
132 |
+
result = subprocess.run(cmd_probe, shell=True, check=True, text=True, capture_output=True)
|
133 |
+
total_frames = int(result.stdout.strip())
|
134 |
+
if n_frames >= total_frames: shutil.copyfile(input_path, output_path); return output_path
|
135 |
+
start_frame = total_frames - n_frames
|
136 |
+
cmd_ffmpeg = f"ffmpeg -y -v error -i \"{input_path}\" -vf \"select='gte(n,{start_frame})'\" -vframes {n_frames} -an \"{output_path}\""
|
137 |
+
subprocess.run(cmd_ffmpeg, shell=True, check=True, text=True)
|
138 |
+
return output_path
|
139 |
+
except (subprocess.CalledProcessError, ValueError) as e: raise gr.Error(f"FFmpeg falhou ao extrair os ΓΊltimos {n_frames} frames: {getattr(e, 'stderr', str(e))}")
|
140 |
+
|
141 |
+
def concatenate_final_video(fragment_paths: list, fragment_duration_frames: int, eco_video_frames: int, progress=gr.Progress()):
|
142 |
+
"""
|
143 |
+
Concatena os fragmentos de vΓdeo gerados em uma ΓΊnica "Obra-Prima" final.
|
144 |
+
Fragmentos marcados como 'cut' (identificados pelo nome do arquivo)
|
145 |
+
nΓ£o terΓ£o sua duraΓ§Γ£o cortada para preservar a intenΓ§Γ£o do corte.
|
146 |
+
|
147 |
+
Args:
|
148 |
+
fragment_paths (list): Lista de caminhos para os fragmentos de vΓdeo.
|
149 |
+
Cada caminho pode conter '_cut.mp4' no nome se for um corte.
|
150 |
+
fragment_duration_frames (int): A duraΓ§Γ£o esperada de cada clipe (usado apenas para
|
151 |
+
fragmentos que NΓO sΓ£o cortes).
|
152 |
+
eco_video_frames (int): O tamanho da sobreposiΓ§Γ£o que deve ser cortada para fragmentos
|
153 |
+
que NΓO sΓ£o cortes (usado para o 'eco').
|
154 |
+
progress (gr.Progress): Objeto do Gradio para atualizar a barra de progresso.
|
155 |
+
|
156 |
+
Returns:
|
157 |
+
str: O caminho para o vΓdeo final montado.
|
158 |
+
"""
|
159 |
+
if not fragment_paths:
|
160 |
+
raise gr.Error("Nenhum fragmento de vΓdeo para concatenar.")
|
161 |
+
|
162 |
+
progress(0.1, desc="Preparando fragmentos para a montagem final...");
|
163 |
+
|
164 |
+
try:
|
165 |
+
list_file_path = os.path.abspath(os.path.join(WORKSPACE_DIR, f"concat_list_final_{time.time()}.txt"))
|
166 |
+
final_output_path = os.path.abspath(os.path.join(WORKSPACE_DIR, "masterpiece_final.mp4"))
|
167 |
+
temp_files_for_concat = []
|
168 |
+
|
169 |
+
# Calculamos a duraΓ§Γ£o a ser mantida APENAS para fragmentos que NΓO sΓ£o cortes
|
170 |
+
# Se for um corte, consideramos a duraΓ§Γ£o total do fragmento original
|
171 |
+
duration_for_non_cut_fragments = int(fragment_duration_frames - eco_video_frames)
|
172 |
+
duration_for_non_cut_fragments = max(1, duration_for_non_cut_fragments) # Garantir que seja pelo menos 1 frame
|
173 |
+
|
174 |
+
for i, p in enumerate(fragment_paths):
|
175 |
+
is_last_fragment = (i == len(fragment_paths) - 1)
|
176 |
+
|
177 |
+
# Verificamos se o nome do arquivo contΓ©m "_cut.mp4" para identificar um corte
|
178 |
+
if "_cut.mp4" in os.path.basename(p) or is_last_fragment:
|
179 |
+
# Se for um corte ou o ΓΊltimo fragmento, usamos o arquivo original sem cortar o fim
|
180 |
+
temp_files_for_concat.append(os.path.abspath(p))
|
181 |
+
# Apenas para o ΓΊltimo fragmento, garantimos que ele tambΓ©m seja considerado
|
182 |
+
if is_last_fragment and "_cut.mp4" not in os.path.basename(p):
|
183 |
+
pass # O ΓΊltimo fragmento original jΓ‘ foi adicionado
|
184 |
+
else:
|
185 |
+
# Para fragmentos que nΓ£o sΓ£o cortes e nΓ£o sΓ£o o ΓΊltimo, cortamos o fim
|
186 |
+
temp_path = os.path.join(WORKSPACE_DIR, f"final_temp_concat_{i}.mp4")
|
187 |
+
# Aqui usamos a duraΓ§Γ£o calculada para nΓ£o-cortes (fragment_duration - eco)
|
188 |
+
trim_video_to_frames(p, temp_path, duration_for_non_cut_fragments)
|
189 |
+
temp_files_for_concat.append(os.path.abspath(temp_path))
|
190 |
+
|
191 |
+
progress(0.8, desc="Concatenando clipe final...");
|
192 |
+
|
193 |
+
with open(list_file_path, "w") as f:
|
194 |
+
for p_temp in temp_files_for_concat:
|
195 |
+
f.write(f"file '{p_temp}'\n")
|
196 |
+
|
197 |
+
ffmpeg_command = f"ffmpeg -y -v error -f concat -safe 0 -i \"{list_file_path}\" -c copy \"{final_output_path}\""
|
198 |
+
subprocess.run(ffmpeg_command, shell=True, check=True, text=True)
|
199 |
+
|
200 |
+
progress(1.0, desc="Montagem final concluΓda!");
|
201 |
+
return final_output_path
|
202 |
+
except subprocess.CalledProcessError as e:
|
203 |
+
error_output = e.stderr if e.stderr else "Nenhuma saΓda de erro do FFmpeg."
|
204 |
+
raise gr.Error(f"FFmpeg falhou na concatenaΓ§Γ£o final: {error_output}")
|
205 |
+
except Exception as e:
|
206 |
+
raise gr.Error(f"Um erro ocorreu durante a concatenaΓ§Γ£o final: {e}")
|
207 |
+
|
208 |
+
def concatenate_final_video1(fragment_paths: list, fragment_duration_frames: int, eco_video_frames: int, progress=gr.Progress()):
|
209 |
+
"""
|
210 |
+
Concatena os fragmentos de vΓdeo gerados em uma ΓΊnica "Obra-Prima" final.
|
211 |
+
|
212 |
+
Args:
|
213 |
+
fragment_paths (list): Lista de caminhos para os fragmentos de vΓdeo.
|
214 |
+
fragment_duration_frames (int): A duraΓ§Γ£o de cada clipe na montagem final.
|
215 |
+
eco_video_frames (int): O tamanho da sobreposiΓ§Γ£o que deve ser cortada.
|
216 |
+
progress (gr.Progress): Objeto do Gradio para atualizar a barra de progresso.
|
217 |
+
|
218 |
+
Returns:
|
219 |
+
str: O caminho para o vΓdeo final montado.
|
220 |
+
"""
|
221 |
+
if not fragment_paths: raise gr.Error("Nenhum fragmento de vΓdeo para concatenar.")
|
222 |
+
progress(0.1, desc="Preparando e cortando fragmentos para a montagem final...");
|
223 |
+
try:
|
224 |
+
list_file_path = os.path.abspath(os.path.join(WORKSPACE_DIR, f"concat_list_final_{time.time()}.txt"))
|
225 |
+
final_output_path = os.path.abspath(os.path.join(WORKSPACE_DIR, "masterpiece_final.mp4"))
|
226 |
+
temp_files_for_concat = []
|
227 |
+
final_clip_len = int(fragment_duration_frames - eco_video_frames)
|
228 |
+
for i, p in enumerate(fragment_paths):
|
229 |
+
is_last_fragment = (i == len(fragment_paths) - 1)
|
230 |
+
if is_last_fragment or "_cut.mp4" in os.path.basename(p):
|
231 |
+
temp_files_for_concat.append(os.path.abspath(p))
|
232 |
+
else:
|
233 |
+
temp_path = os.path.join(WORKSPACE_DIR, f"final_temp_concat_{i}.mp4")
|
234 |
+
trim_video_to_frames(p, temp_path, final_clip_len)
|
235 |
+
temp_files_for_concat.append(os.path.abspath(temp_path))
|
236 |
+
progress(0.8, desc="Concatenando clipe final...")
|
237 |
+
with open(list_file_path, "w") as f:
|
238 |
+
for p_temp in temp_files_for_concat:
|
239 |
+
f.write(f"file '{p_temp}'\n")
|
240 |
+
ffmpeg_command = f"ffmpeg -y -v error -f concat -safe 0 -i \"{list_file_path}\" -c copy \"{final_output_path}\""
|
241 |
+
subprocess.run(ffmpeg_command, shell=True, check=True, text=True)
|
242 |
+
progress(1.0, desc="Montagem final concluΓda!")
|
243 |
+
return final_output_path
|
244 |
+
except subprocess.CalledProcessError as e:
|
245 |
+
error_output = e.stderr if e.stderr else "Nenhuma saΓda de erro do FFmpeg."
|
246 |
+
raise gr.Error(f"FFmpeg falhou na concatenaΓ§Γ£o final: {error_output}")
|
247 |
+
|
248 |
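# Illustrative note (not executed anywhere): the list file written by the functions above uses
# the FFmpeg concat-demuxer format, one "file '<absolute path>'" entry per clip, and the clips
# are stream-copied (-c copy) rather than re-encoded, so every fragment must share codec,
# resolution and FPS -- which holds here because they all come from the same LTX pipeline.
# The paths below are hypothetical examples of what a concat_list_final_*.txt could contain:
#
#   file '/data/aduc_workspace/final_temp_concat_0.mp4'
#   file '/data/aduc_workspace/fragment_2_cut.mp4'
#   file '/data/aduc_workspace/fragment_3_continuous.mp4'
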
def extract_image_exif(image_path: str) -> str:
    """
    Extracts the relevant EXIF metadata from an image.

    Args:
        image_path (str): Path to the image file.

    Returns:
        str: A formatted string with the EXIF metadata.
    """
    try:
        img = Image.open(image_path); exif_data = img._getexif()
        if not exif_data: return "No EXIF metadata found."
        # ... (unchanged lines that build metadata_str from the EXIF tags are collapsed in this diff) ...
        return metadata_str if metadata_str else "No relevant EXIF metadata found."
    except Exception: return "Could not read EXIF data."

# ======================================================================================
# SECTION 2: AI ORCHESTRATORS (the "Stages" of the generation)
# ======================================================================================

def run_storyboard_generation(num_fragments: int, prompt: str, reference_paths: list):
    """
    Orchestrates Stage 1: The Script.
    Calls the AI (Gemini) to act as the "Screenwriter", analyzing the user prompt and
    every reference image to build a cohesive narrative split into acts.

    Args:
        num_fragments (int): Number of keyframes (acts) the script should contain.
        prompt (str): The user's overall idea.
        reference_paths (list): Paths to all the reference images provided.

    Returns:
        list: A list of strings, each one describing a scene.
    """
    if not reference_paths: raise gr.Error("Por favor, forneça pelo menos uma imagem de referência.")
    if not GEMINI_API_KEY: raise gr.Error("Chave da API Gemini não configurada!")
    main_ref_path = reference_paths[0]
    exif_metadata = extract_image_exif(main_ref_path)
    prompt_file = "prompts/unified_storyboard_prompt.txt"
    with open(os.path.join(os.path.dirname(__file__), prompt_file), "r", encoding="utf-8") as f: template = f.read()
    director_prompt = template.format(user_prompt=prompt, num_fragments=int(num_fragments), image_metadata=exif_metadata)
    genai.configure(api_key=GEMINI_API_KEY)
    model = genai.GenerativeModel('gemini-2.5-flash')
    model_contents = [director_prompt]
    for i, img_path in enumerate(reference_paths):
        model_contents.append(f"Reference Image {i+1}:")
        model_contents.append(Image.open(img_path))
    print(f"Gerando roteiro com {len(reference_paths)} imagens de referência...")
    response = model.generate_content(model_contents)
    try:
        storyboard_data = robust_json_parser(response.text)
        storyboard = storyboard_data.get("scene_storyboard", [])
        return storyboard
    except Exception as e: raise gr.Error(f"O Roteirista (Gemini) falhou ao criar o roteiro: {e}. Resposta recebida: {response.text}")
def run_keyframe_generation(storyboard, fixed_reference_paths, keyframe_resolution, global_prompt, progress=gr.Progress()):
    """
    Orchestrates Stage 2: The Keyframes.
    On every iteration, the AI (Gemini) is called to act as the "Scene Director". It analyzes
    the script, the fixed references and the last three generated images to write a composition
    prompt. That prompt uses [IMG-X] tags to reference its sources, which are then mapped back
    to the actual files and sent to `FluxKontext` to generate the image.

    Args:
        storyboard (list): The list of acts in the script.
        fixed_reference_paths (list): Paths to the fixed reference images.
        keyframe_resolution (int): Resolution of the keyframes to be generated.
        global_prompt (str): The user's overall idea, used as context for the AI.
        progress (gr.Progress): Gradio progress-bar object.

    Yields:
        dict: Updates for the Gradio UI components during generation.
    """
    if not storyboard: raise gr.Error("Nenhum roteiro para gerar keyframes.")
    if not fixed_reference_paths: raise gr.Error("A imagem de referência inicial é obrigatória.")

    initial_ref_image_path = fixed_reference_paths[0]
    log_history = ""; generated_images_for_gallery = []
    width, height = keyframe_resolution, keyframe_resolution

    keyframe_paths_for_video = []
    scene_history = "N/A"

    wrapper_prompt_path = os.path.join(os.path.dirname(__file__), "prompts/flux_composition_wrapper_prompt.txt")
    with open(wrapper_prompt_path, "r", encoding="utf-8") as f:
        kontext_template = f.read()

    director_prompt_path = os.path.join(os.path.dirname(__file__), "prompts/director_composition_prompt.txt")
    with open(director_prompt_path, "r", encoding="utf-8") as f:
        director_template = f.read()

    try:
        genai.configure(api_key=GEMINI_API_KEY)
        model = genai.GenerativeModel('gemini-2.5-flash')

        for i, scene_description in enumerate(storyboard):
            progress(i / len(storyboard), desc=f"Compondo Keyframe {i+1}/{len(storyboard)} ({width}x{height})")
            log_history += f"\n--- COMPONDO KEYFRAME {i+1}/{len(storyboard)} ---\n"

            last_three_paths = ([initial_ref_image_path] + keyframe_paths_for_video)[-3:]

            log_history += f" - Diretor de Cena está analisando o contexto...\n"
            yield {keyframe_log_output: gr.update(value=log_history), keyframe_gallery_output: gr.update(value=generated_images_for_gallery), keyframe_images_state: gr.update(value=generated_images_for_gallery)}

            director_prompt = director_template.format(
                global_prompt=global_prompt,
                scene_history=scene_history,
                current_scene_desc=scene_description,
            )

            model_contents = []
            image_map = {}
            current_image_index = 1

            for path in last_three_paths:
                if path not in image_map.values():
                    image_map[current_image_index] = path
                    model_contents.extend([f"IMG-{current_image_index}:", Image.open(path)])
                    current_image_index += 1

            for path in fixed_reference_paths:
                if path not in image_map.values():
                    image_map[current_image_index] = path
                    model_contents.extend([f"IMG-{current_image_index}:", Image.open(path)])
                    current_image_index += 1

            model_contents.append(director_prompt)

            response_text = model.generate_content(model_contents).text
            composition_prompt_with_tags = response_text.strip()

            referenced_indices = [int(idx) for idx in re.findall(r'\[IMG-(\d+)\]', composition_prompt_with_tags)]

            current_reference_paths = [image_map[idx] for idx in sorted(list(set(referenced_indices))) if idx in image_map]
            if not current_reference_paths:
                current_reference_paths = [last_three_paths[-1]]

            reference_images_pil = [Image.open(p) for p in current_reference_paths]
            final_kontext_prompt = re.sub(r'\[IMG-\d+\]', '', composition_prompt_with_tags).strip()

            log_history += f" - Diretor de Cena decidiu usar as imagens: {[os.path.basename(p) for p in current_reference_paths]}\n"
            log_history += f" - Prompt Final do Diretor: \"{final_kontext_prompt}\"\n"
            scene_history += f"Scene {i+1}: {final_kontext_prompt}\n"

            yield {keyframe_log_output: gr.update(value=log_history), keyframe_gallery_output: gr.update(value=generated_images_for_gallery), keyframe_images_state: gr.update(value=generated_images_for_gallery)}

            final_kontext_prompt_wrapped = kontext_template.format(target_prompt=final_kontext_prompt)
            output_path = os.path.join(WORKSPACE_DIR, f"keyframe_{i+1}.png")

            image = flux_kontext_singleton.generate_image(
                reference_images=reference_images_pil,
                prompt=final_kontext_prompt_wrapped,
                width=width, height=height, seed=int(time.time())
            )

            image.save(output_path)
            keyframe_paths_for_video.append(output_path)
            generated_images_for_gallery.append(output_path)

    except Exception as e:
        raise gr.Error(f"O Compositor (FluxKontext) ou o Diretor de Cena (Gemini) falhou: {e}")

    log_history += "\nComposição de todos os keyframes concluída.\n"
    final_keyframes = keyframe_paths_for_video
    yield {keyframe_log_output: gr.update(value=log_history), keyframe_gallery_output: final_keyframes, keyframe_images_state: final_keyframes}

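# --- Illustrative sketch of the [IMG-X] tag protocol used above (example text is hypothetical). ---
# The Scene Director's reply references its sources with tags such as [IMG-2]; the code above
# extracts the indices with re.findall, maps them back to file paths via image_map, and strips
# the tags with re.sub before sending the prompt to FluxKontext. A minimal standalone
# reproduction of that parsing step:
if False:  # illustration only, never executed
    _example_reply = "Place the character from [IMG-1] inside the throne room of [IMG-3], keeping the lighting of [IMG-1]."
    _example_map = {1: "keyframe_1.png", 2: "processed_ref_0.png", 3: "processed_ref_1.png"}
    _indices = sorted({int(i) for i in re.findall(r'\[IMG-(\d+)\]', _example_reply)})  # -> [1, 3]
    _paths = [_example_map[i] for i in _indices if i in _example_map]                  # -> ['keyframe_1.png', 'processed_ref_1.png']
    _clean_prompt = re.sub(r'\[IMG-\d+\]', '', _example_reply).strip()
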
def get_initial_motion_prompt(user_prompt: str, start_image_path: str, destination_image_path: str, dest_scene_desc: str):
    """
    Calls the AI (Gemini) to act as the "Initial Cinematographer".
    Generates the motion prompt for the first video fragment, which has no previous echo.

    Args:
        user_prompt (str): The overall idea of the story.
        start_image_path (str): Path to the first keyframe.
        destination_image_path (str): Path to the second keyframe.
        dest_scene_desc (str): The script description for the destination scene.

    Returns:
        str: The generated motion prompt.
    """
    if not GEMINI_API_KEY: raise gr.Error("Chave da API Gemini não configurada!")
    try:
        genai.configure(api_key=GEMINI_API_KEY); model = genai.GenerativeModel('gemini-2.5-flash'); prompt_file = "prompts/initial_motion_prompt.txt"
        with open(os.path.join(os.path.dirname(__file__), prompt_file), "r", encoding="utf-8") as f: template = f.read()
        cinematographer_prompt = template.format(user_prompt=user_prompt, destination_scene_description=dest_scene_desc)
        start_img, dest_img = Image.open(start_image_path), Image.open(destination_image_path)
        # ... (unchanged lines collapsed in this diff: the prompt and both images are sent to the model as `response`) ...
        return response.text.strip()
    except Exception as e: raise gr.Error(f"O Cineasta de IA (Inicial) falhou: {e}. Resposta: {getattr(e, 'text', 'No text available.')}")

def get_transition_decision(user_prompt, story_history, memory_media_path, path_image_path, destination_image_path, midpoint_scene_description, dest_scene_desc):
    """
    Calls the AI (Gemini) to act as the "Continuity Director".
    Analyzes the echo, the current keyframe and the next one to decide between a continuous
    transition and a scene cut, and generates the corresponding motion prompt.

    Args:
        (several context arguments describing the story and the images)

    Returns:
        dict: A dictionary containing 'transition_type' and 'motion_prompt'.
    """
    if not GEMINI_API_KEY: raise gr.Error("Chave da API Gemini não configurada!")
    try:
        genai.configure(api_key=GEMINI_API_KEY); model = genai.GenerativeModel('gemini-2.5-flash'); prompt_file = "prompts/transition_decision_prompt.txt"
        with open(os.path.join(os.path.dirname(__file__), prompt_file), "r", encoding="utf-8") as f: template = f.read()
        continuity_prompt = template.format(user_prompt=user_prompt, story_history=story_history, midpoint_scene_description=midpoint_scene_description, destination_scene_description=dest_scene_desc)
        with imageio.get_reader(memory_media_path) as reader: mem_img = Image.fromarray(reader.get_data(0))
        path_img, dest_img = Image.open(path_image_path), Image.open(destination_image_path)
        model_contents = ["START Image (from Kinetic Echo):", mem_img, "MIDPOINT Image (Path):", path_img, "DESTINATION Image (Destination):", dest_img, continuity_prompt]
        response = model.generate_content(model_contents)
        decision_data = robust_json_parser(response.text)
        if "transition_type" not in decision_data or "motion_prompt" not in decision_data: raise ValueError("A resposta da IA não contém as chaves 'transition_type' ou 'motion_prompt'.")
        return decision_data
    except Exception as e: raise gr.Error(f"O Diretor de Continuidade (IA) falhou: {e}. Resposta: {getattr(e, 'text', str(e))}")

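# --- Illustrative sketch (not referenced anywhere else): the Continuity Director's reply. ---
# The exact wording comes from prompts/transition_decision_prompt.txt, which is not in this
# diff; the dictionary below only assumes the two keys that the code above validates
# ('transition_type' and 'motion_prompt'). 'transition_type' is expected to be either
# "continuous" or "cut", mirroring how run_video_production branches on it.
_EXAMPLE_TRANSITION_DECISION = {
    "transition_type": "cut",
    "motion_prompt": "Hard cut to a wide shot of the city gates; the camera slowly pushes in as the rain intensifies."
}
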
def run_video_production(
    video_resolution,
    video_duration_seconds, video_fps, eco_video_frames, use_attention_slicing,
    fragment_duration_frames, mid_cond_strength, dest_cond_strength, num_inference_steps,
    decode_timestep, image_cond_noise_scale,
    prompt_geral, keyframe_images_state, scene_storyboard, cfg,
    progress=gr.Progress()
):
    """
    Orchestrates Stage 3: The Production.
    Iterates over the keyframes and calls the AI cinematographers to generate the video fragments.

    Args:
        (several UI parameters that control video generation)

    Yields:
        dict: Updates for the Gradio UI components.
    """
    try:
        valid_keyframes = [p for p in keyframe_images_state if p is not None and os.path.exists(p)]
        width, height = video_resolution, video_resolution
        video_total_frames_user = int(video_duration_seconds * video_fps)
        video_total_frames_ltx = int(round((float(video_total_frames_user) - 1.0) / 8.0) * 8 + 1)
        if not valid_keyframes or len(valid_keyframes) < 2: raise gr.Error("São necessários pelo menos 2 keyframes válidos para produzir uma transição.")
        if int(fragment_duration_frames) > video_total_frames_user: raise gr.Error(f"Duração do fragmento ({fragment_duration_frames}) não pode ser maior que a Duração Bruta ({video_total_frames_user}).")
        log_history = f"\n--- FASE 3/4: Iniciando Produção ({width}x{height})...\n"
        yield {
            production_log_output: log_history, video_gallery_output: [],
            prod_media_start_output: None, prod_media_mid_output: gr.update(visible=False), prod_media_end_output: None
        }
        seed = int(time.time()); video_fragments, story_history = [], ""; kinetic_memory_path = None
        num_transitions = len(valid_keyframes) - 1

        for i in range(num_transitions):
            fragment_num = i + 1
            progress(i / num_transitions, desc=f"Gerando Fragmento {fragment_num}...")
            log_history += f"\n--- FRAGMENTO {fragment_num}/{num_transitions} ---\n"
            destination_frame = int(video_total_frames_ltx - 1)

            if i == 0 or kinetic_memory_path is None:
                start_path, destination_path = valid_keyframes[i], valid_keyframes[i+1]
                dest_scene_desc = scene_storyboard[i]
                log_history += f" - Início (Cena Nova): {os.path.basename(start_path)}\n - Destino: {os.path.basename(destination_path)}\n"
                current_motion_prompt = get_initial_motion_prompt(prompt_geral, start_path, destination_path, dest_scene_desc)
                conditioning_items_data = [(start_path, 0, 1.0), (destination_path, destination_frame, dest_cond_strength)]
                transition_type = "continuous"
                yield { production_log_output: log_history, prod_media_start_output: start_path, prod_media_mid_output: gr.update(visible=False), prod_media_end_output: destination_path }
            else:
                memory_path, path_path, destination_path = kinetic_memory_path, valid_keyframes[i], valid_keyframes[i+1]
                path_scene_desc, dest_scene_desc = scene_storyboard[i-1], scene_storyboard[i]
                log_history += f" - Diretor de Continuidade analisando...\n - Memória: {os.path.basename(memory_path)}\n - Caminho: {os.path.basename(path_path)}\n - Destino: {os.path.basename(destination_path)}\n"
                yield { production_log_output: log_history, prod_media_start_output: gr.update(value=memory_path, visible=True), prod_media_mid_output: gr.update(value=path_path, visible=True), prod_media_end_output: destination_path }
                decision_data = get_transition_decision(prompt_geral, story_history, memory_path, path_path, destination_path, midpoint_scene_description=path_scene_desc, dest_scene_desc=dest_scene_desc)
                transition_type = decision_data["transition_type"]
                current_motion_prompt = decision_data["motion_prompt"]
                log_history += f" - Decisão: {transition_type.upper()}\n"
                mid_cond_frame_calculated = int(video_total_frames_ltx - fragment_duration_frames + eco_video_frames)
                conditioning_items_data = [(memory_path, 0, 1.0), (path_path, mid_cond_frame_calculated, mid_cond_strength), (destination_path, destination_frame, dest_cond_strength)]

            story_history += f"\n- Ato {fragment_num + 1}: {current_motion_prompt}"
            log_history += f" - Instrução do Cineasta: '{current_motion_prompt}'\n"; yield {production_log_output: log_history}

            output_filename = f"fragment_{fragment_num}_{transition_type}.mp4"
            full_fragment_path, _ = ltx_manager_singleton.generate_video_fragment(
                motion_prompt=current_motion_prompt, conditioning_items_data=conditioning_items_data,
                width=width, height=height, seed=seed, cfg=cfg, progress=progress,
                video_total_frames=video_total_frames_ltx, video_fps=video_fps,
                use_attention_slicing=use_attention_slicing, num_inference_steps=num_inference_steps,
                decode_timestep=decode_timestep, image_cond_noise_scale=image_cond_noise_scale,
                current_fragment_index=fragment_num, output_path=os.path.join(WORKSPACE_DIR, output_filename)
            )
            log_history += f" - LOG: Gerei {output_filename}.\n"

            is_last_fragment = (i == num_transitions - 1)

            if is_last_fragment:
                log_history += " - Último fragmento. Mantendo duração total.\n"
                video_fragments.append(full_fragment_path)
                kinetic_memory_path = None
            elif transition_type == "cut":
                log_history += " - CORTE DE CENA: Fragmento mantido, memória reiniciada.\n"
                video_fragments.append(full_fragment_path)
                kinetic_memory_path = None
            else:
                trimmed_fragment_path = os.path.join(WORKSPACE_DIR, f"fragment_{fragment_num}_trimmed.mp4")
                trim_video_to_frames(full_fragment_path, trimmed_fragment_path, int(fragment_duration_frames))
                eco_output_path = os.path.join(WORKSPACE_DIR, f"eco_from_frag_{fragment_num}.mp4")
                kinetic_memory_path = extract_last_n_frames_as_video(trimmed_fragment_path, eco_output_path, int(eco_video_frames))
                video_fragments.append(full_fragment_path)
                log_history += f" - CONTINUIDADE: Eco criado: {os.path.basename(kinetic_memory_path)}\n"

            yield {production_log_output: log_history, video_gallery_output: video_fragments}

        progress(1.0, desc="Produção dos fragmentos concluída.")
        log_history += "\nProdução de todos os fragmentos concluída. Pronto para montar o vídeo final.\n"
        yield {
            production_log_output: log_history,
            video_gallery_output: video_fragments,
            fragment_list_state: video_fragments
        }
    except Exception as e: raise gr.Error(f"A Produção de Vídeo (LTX) falhou: {e}")

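# --- Worked example of the frame bookkeeping used by run_video_production (illustration only). ---
# With the default UI values (6.0 s at 24 FPS, fragment duration 75%, echo of 8 frames):
#   video_total_frames_user = int(6.0 * 24)                 = 144
#   video_total_frames_ltx  = round((144 - 1) / 8) * 8 + 1  = 145   (LTX expects 8k+1 frames)
#   fragment_duration       = floor(0.75 * 144)             = 108
#   mid_cond_frame          = 145 - 108 + 8                 = 45    (frame where the "path" keyframe is pinned)
#   destination_frame       = 145 - 1                       = 144   (frame where the "destination" keyframe is pinned)
# The echo then keeps the last 8 frames of the trimmed 108-frame clip as the start of the next fragment.
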
# ======================================================================================
# SECTION 3: GRAPHICAL INTERFACE DEFINITION (Gradio UI)
# ======================================================================================

with gr.Blocks(theme=gr.themes.Soft()) as demo:
    gr.Markdown(f"# NOVIM-13.1 (Painel de Controle do Diretor)\n*Arquitetura ADUC-SDR com Documentação Completa*")

    if os.path.exists(WORKSPACE_DIR): shutil.rmtree(WORKSPACE_DIR)
    os.makedirs(WORKSPACE_DIR); Path("prompts").mkdir(exist_ok=True)

    # --- UI state definitions ---
    scene_storyboard_state = gr.State([])
    keyframe_images_state = gr.State([])
    fragment_list_state = gr.State([])
    prompt_geral_state = gr.State("")
    processed_ref_paths_state = gr.State([])
    fragment_duration_state = gr.State()
    eco_frames_state = gr.State()

    # --- UI layout ---
    gr.Markdown("## CONFIGURAÇÕES GLOBAIS DE RESOLUÇÃO")
    with gr.Row():
        video_resolution_selector = gr.Radio([512, 720, 1024], value=512, label="Resolução de Geração do Vídeo (px)")
        keyframe_resolution_selector = gr.Radio([512, 720, 1024], value=512, label="Resolução dos Keyframes (px)")

    gr.Markdown("--- \n ## ETAPA 1: O ROTEIRO (IA Roteirista)")
    with gr.Row():
        with gr.Column(scale=1):
            prompt_input = gr.Textbox(label="Ideia Geral (Prompt)")
            num_fragments_input = gr.Slider(2, 50, 4, step=1, label="Nº de Keyframes a Gerar")
            reference_gallery = gr.Gallery(
                label="Imagens de Referência (A primeira é a principal)",
                type="filepath",
                columns=4, rows=1, object_fit="contain", height="auto"
            )
            director_button = gr.Button("▶️ 1. Gerar Roteiro", variant="primary")
        with gr.Column(scale=2): storyboard_to_show = gr.JSON(label="Roteiro de Cenas Gerado (em Inglês)")

    gr.Markdown("--- \n ## ETAPA 2: OS KEYFRAMES (IA Compositor & Diretor de Cena)")
    with gr.Row():
        with gr.Column(scale=2):
            gr.Markdown("O Diretor de Cena IA irá analisar as referências e o roteiro para compor cada keyframe de forma autônoma.")
            photographer_button = gr.Button("▶️ 2. Compor Imagens-Chave em Cadeia", variant="primary")
            keyframe_gallery_output = gr.Gallery(label="Galeria de Keyframes Gerados", object_fit="contain", height="auto", type="filepath", interactive=False)
        with gr.Column(scale=1):
            keyframe_log_output = gr.Textbox(label="Diário de Bordo do Compositor", lines=25, interactive=False)

    gr.Markdown("--- \n ## ETAPA 3: A PRODUÇÃO (IA Cineasta & Câmera)")
    with gr.Row():
        with gr.Column(scale=1):
            cfg_slider = gr.Slider(0.5, 10.0, 1.0, step=0.1, label="CFG (Guidance Scale)")
            with gr.Accordion("Controles Avançados de Timing e Performance", open=False):
                video_duration_slider = gr.Slider(label="Duração da Geração Bruta (s)", minimum=2.0, maximum=10.0, value=6.0, step=0.5)
                video_fps_radio = gr.Radio(choices=[8, 16, 24, 32], value=24, label="FPS do Vídeo")
                num_inference_steps_slider = gr.Slider(label="Etapas de Inferência", minimum=4, maximum=20, value=10, step=1)
                slicing_checkbox = gr.Checkbox(label="Usar Attention Slicing (Economiza VRAM)", value=True)
                gr.Markdown("---"); gr.Markdown("#### Controles de Duração (Arquitetura Eco + Déjà Vu)")
                fragment_duration_slider = gr.Slider(label="Duração de Cada Fragmento (% da Geração Bruta)", minimum=1, maximum=100, value=75, step=1)
                eco_frames_slider = gr.Slider(label="Tamanho do Eco Cinético (Frames)", minimum=4, maximum=48, value=8, step=1)
                mid_cond_strength_slider = gr.Slider(label="Força do 'Caminho'", minimum=0.1, maximum=1.0, value=0.5, step=0.05)
                dest_cond_strength_slider = gr.Slider(label="Força do 'Destino'", minimum=0.1, maximum=1.0, value=1.0, step=0.05)
                gr.Markdown("---"); gr.Markdown("#### Controles do VAE (Avançado)")
                decode_timestep_slider = gr.Slider(label="VAE Decode Timestep", minimum=0.0, maximum=0.2, value=0.05, step=0.005)
                image_cond_noise_scale_slider = gr.Slider(label="VAE Image Cond Noise Scale", minimum=0.0, maximum=0.1, value=0.025, step=0.005)

            animator_button = gr.Button("▶️ 3. Produzir Cenas", variant="primary")
            with gr.Accordion("Visualização das Mídias de Condicionamento (Ao Vivo)", open=True):
                with gr.Row():
                    prod_media_start_output = gr.Video(label="Mídia Inicial (Eco/K1)", interactive=False)
                    prod_media_mid_output = gr.Image(label="Mídia do Caminho (K_i-1)", interactive=False, visible=False)
                    prod_media_end_output = gr.Image(label="Mídia de Destino (K_i)", interactive=False)
            production_log_output = gr.Textbox(label="Diário de Bordo da Produção", lines=10, interactive=False)
        with gr.Column(scale=1): video_gallery_output = gr.Gallery(label="Fragmentos Gerados", object_fit="contain", height="auto", type="video")

    gr.Markdown(f"--- \n ## ETAPA 4: PÓS-PRODUÇÃO (Montagem Final)")
    with gr.Row():
        with gr.Column():
            editor_button = gr.Button("▶️ 4. Montar Vídeo Final", variant="primary")
            final_video_output = gr.Video(label="A Obra-Prima Final")

    gr.Markdown(
        """
        ---
        ### A Arquitetura: ADUC-SDR
        **ADUC (Arquitetura de Unificação Compositiva):** O sistema não usa um único modelo, mas uma equipe de IAs especializadas. Um **Roteirista** cria a história. Um **Diretor de Cena** decide a composição de cada keyframe, selecionando elementos de um "álbum" de referências visuais. Um **Compositor** (`FluxKontext`) cria as imagens.

        **SDR (Escala Dinâmica e Resiliente):** A geração de vídeo é dividida em fragmentos, permitindo criar vídeos de longa duração. A continuidade é garantida pela arquitetura **Eco + Déjà Vu**:
        - **O Eco:** Os últimos frames de um clipe são passados para o próximo, transferindo o *momentum* físico e a iluminação.
        - **O Déjà Vu:** Uma IA **Cineasta** analisa o Eco e os keyframes futuros para criar uma instrução de movimento que seja ao mesmo tempo contínua e narrativamente coerente, sabendo até quando realizar um corte de cena.
        """
    )

    # --- Component wiring logic ---
    def process_and_run_storyboard(num_fragments, prompt, gallery_files, keyframe_resolution):
        if not gallery_files:
            raise gr.Error("Por favor, suba pelo menos uma imagem de referência na galeria.")

        raw_paths = [item[0] for item in gallery_files]
        processed_paths = []
        for i, path in enumerate(raw_paths):
            filename = f"processed_ref_{i}_{keyframe_resolution}x{keyframe_resolution}.png"
            processed_path = process_image_to_square(path, keyframe_resolution, filename)
            processed_paths.append(processed_path)

        storyboard = run_storyboard_generation(num_fragments, prompt, processed_paths)
        return storyboard, prompt, processed_paths

    director_button.click(
        fn=process_and_run_storyboard,
        inputs=[num_fragments_input, prompt_input, reference_gallery, keyframe_resolution_selector],
        outputs=[scene_storyboard_state, prompt_geral_state, processed_ref_paths_state]
    ).success(fn=lambda s: s, inputs=[scene_storyboard_state], outputs=[storyboard_to_show])

    photographer_button.click(
        fn=run_keyframe_generation,
        inputs=[scene_storyboard_state, processed_ref_paths_state, keyframe_resolution_selector, prompt_geral_state],
        outputs=[keyframe_log_output, keyframe_gallery_output, keyframe_images_state]
    )

    def updated_animator_click(
            video_resolution,
            video_duration_seconds, video_fps, eco_video_frames, use_attention_slicing,
            fragment_duration_percentage, mid_cond_strength, dest_cond_strength, num_inference_steps,
            decode_timestep, image_cond_noise_scale,
            prompt_geral, keyframe_images_state, scene_storyboard, cfg, progress=gr.Progress()):

        total_frames = video_duration_seconds * video_fps
        fragment_duration_in_frames = int(math.floor((fragment_duration_percentage / 100.0) * total_frames))
        fragment_duration_in_frames = max(1, fragment_duration_in_frames)

        for update in run_video_production(
                video_resolution,
                video_duration_seconds, video_fps, eco_video_frames, use_attention_slicing,
                fragment_duration_in_frames, mid_cond_strength, dest_cond_strength, num_inference_steps,
                decode_timestep, image_cond_noise_scale,
                prompt_geral, keyframe_images_state, scene_storyboard, cfg, progress):
            yield update

        yield {
            fragment_duration_state: fragment_duration_in_frames,
            eco_frames_state: eco_video_frames
        }

    animator_button.click(
        fn=updated_animator_click,
        inputs=[
            video_resolution_selector,
            video_duration_slider, video_fps_radio, eco_frames_slider, slicing_checkbox,
            fragment_duration_slider, mid_cond_strength_slider, dest_cond_strength_slider, num_inference_steps_slider,
            decode_timestep_slider, image_cond_noise_scale_slider,
            prompt_geral_state, keyframe_images_state, scene_storyboard_state, cfg_slider
        ],
        outputs=[
            production_log_output, video_gallery_output, fragment_list_state,
            prod_media_start_output, prod_media_mid_output, prod_media_end_output,
            fragment_duration_state, eco_frames_state
        ]
    )

    editor_button.click(
        fn=concatenate_final_video,
        inputs=[fragment_list_state, fragment_duration_state, eco_frames_state],
        outputs=[final_video_output]
    )

if __name__ == "__main__":
    if os.path.exists(WORKSPACE_DIR): shutil.rmtree(WORKSPACE_DIR)
    os.makedirs(WORKSPACE_DIR); Path("prompts").mkdir(exist_ok=True)

    demo.queue().launch(server_name="0.0.0.0", share=True)
flux_kontext_helpers.py
ADDED
@@ -0,0 +1,98 @@
# flux_kontext_helpers.py
# Service module for FluxKontext, with atomic memory management.
# This file is part of the Euia-AducSdr project and is licensed under AGPL v3.
# Copyright (C) August 4, 2025 Carlos Rodrigues dos Santos

import torch
from PIL import Image
import gc
from diffusers import FluxKontextPipeline
import huggingface_hub
import os

class Generator:
    def __init__(self, device_id='cuda:0'):
        self.cpu_device = torch.device('cpu')
        self.gpu_device = torch.device(device_id if torch.cuda.is_available() else 'cpu')
        print(f"WORKER COMPOSITOR: Usando dispositivo: {self.gpu_device}")
        self.pipe = None
        self._load_pipe_to_cpu()

    def _load_pipe_to_cpu(self):
        if self.pipe is None:
            print("WORKER COMPOSITOR: Carregando modelo FluxKontext para a CPU...")
            self.pipe = FluxKontextPipeline.from_pretrained(
                "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
            ).to(self.cpu_device)
            print("WORKER COMPOSITOR: Modelo FluxKontext pronto (na CPU).")

    def to_gpu(self):
        if self.gpu_device.type == 'cpu': return
        print(f"WORKER COMPOSITOR: Movendo modelo para {self.gpu_device}...")
        self.pipe.to(self.gpu_device)
        print(f"WORKER COMPOSITOR: Modelo na GPU {self.gpu_device}.")

    def to_cpu(self):
        if self.gpu_device.type == 'cpu': return
        print(f"WORKER COMPOSITOR: Descarregando modelo da GPU {self.gpu_device}...")
        self.pipe.to(self.cpu_device)
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()

    def _concatenate_images(self, images, direction="horizontal"):
        if not images: return None
        valid_images = [img.convert("RGB") for img in images if img is not None]
        if not valid_images: return None
        if len(valid_images) == 1: return valid_images[0]

        if direction == "horizontal":
            total_width = sum(img.width for img in valid_images)
            max_height = max(img.height for img in valid_images)
            concatenated = Image.new('RGB', (total_width, max_height))
            x_offset = 0
            for img in valid_images:
                y_offset = (max_height - img.height) // 2
                concatenated.paste(img, (x_offset, y_offset))
                x_offset += img.width
        else:
            max_width = max(img.width for img in valid_images)
            total_height = sum(img.height for img in valid_images)
            concatenated = Image.new('RGB', (max_width, total_height))
            y_offset = 0
            for img in valid_images:
                x_offset = (max_width - img.width) // 2
                concatenated.paste(img, (x_offset, y_offset))
                y_offset += img.height
        return concatenated

    @torch.inference_mode()
    def generate_image(self, reference_images, prompt, width, height, seed=42):
        try:
            self.to_gpu()

            concatenated_image = self._concatenate_images(reference_images, "horizontal")
            if concatenated_image is None:
                raise ValueError("Nenhuma imagem de referência válida foi fornecida.")

            # ### FIX ###
            # Ignore the size of the concatenated image and use the provided `width`/`height` instead.
            image = self.pipe(
                image=concatenated_image,
                prompt=prompt,
                guidance_scale=2.5,
                width=width,
                height=height,
                generator=torch.Generator(device="cpu").manual_seed(seed)
            ).images[0]

            return image
        finally:
            self.to_cpu()

# --- Singleton instance ---
print("Inicializando o Compositor de Cenas (FluxKontext)...")
hf_token = os.getenv('HF_TOKEN')
if hf_token: huggingface_hub.login(token=hf_token)
flux_kontext_singleton = Generator(device_id='cuda:0')
print("Compositor de Cenas pronto.")
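# --- Minimal usage sketch (illustration only; app.py drives this singleton in production). ---
# The reference image files and the output path below are hypothetical.
if __name__ == "__main__":
    _refs = [Image.open("ref_a.png"), Image.open("ref_b.png")]
    _img = flux_kontext_singleton.generate_image(
        reference_images=_refs,
        prompt="Combine the character from the first image with the background of the second.",
        width=512, height=512, seed=123,
    )
    _img.save("keyframe_example.png")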
ltx_helpers.py
ADDED
@@ -0,0 +1,190 @@
# ltx_manager_helpers.py
# LTX worker-pool manager for asynchronous round-robin across multiple GPUs.
# This file is part of the Euia-AducSdr project and is licensed under AGPL v3.
# Copyright (C) August 4, 2025 Carlos Rodrigues dos Santos

import torch
import gc
import os
import yaml
import numpy as np
import imageio
from pathlib import Path
import huggingface_hub
import threading
from PIL import Image

# Imports the functions and classes needed from inference.py
from inference import (
    create_ltx_video_pipeline,
    ConditioningItem,
    calculate_padding,
    prepare_conditioning
)

class LtxWorker:
    """
    Represents a single instance of the LTX pipeline, bound to a specific GPU.
    The pipeline is loaded on the CPU by default and moved to the GPU on demand.
    """
    def __init__(self, device_id='cuda:0'):
        self.device = torch.device(device_id if torch.cuda.is_available() else 'cpu')
        print(f"LTX Worker: Inicializando para o dispositivo {self.device} (carregando na CPU)...")

        config_file_path = "configs/ltxv-13b-0.9.8-distilled.yaml"
        with open(config_file_path, "r") as file:
            self.config = yaml.safe_load(file)

        LTX_REPO = "Lightricks/LTX-Video"
        models_dir = "downloaded_models_gradio"

        distilled_model_actual_path = huggingface_hub.hf_hub_download(
            repo_id=LTX_REPO,
            filename=self.config["checkpoint_path"],
            local_dir=models_dir,
            local_dir_use_symlinks=False
        )

        self.pipeline = create_ltx_video_pipeline(
            ckpt_path=distilled_model_actual_path,
            precision=self.config["precision"],
            text_encoder_model_name_or_path=self.config["text_encoder_model_name_or_path"],
            sampler=self.config["sampler"],
            device='cpu'
        )
        print(f"LTX Worker para {self.device} pronto na CPU.")

    def to_gpu(self):
        """Moves the pipeline to its assigned GPU."""
        if self.device.type == 'cpu': return
        print(f"LTX Worker: Movendo pipeline para {self.device}...")
        self.pipeline.to(self.device)
        print(f"LTX Worker: Pipeline na GPU {self.device}.")

    def to_cpu(self):
        """Moves the pipeline back to the CPU and frees GPU memory."""
        if self.device.type == 'cpu': return
        print(f"LTX Worker: Descarregando pipeline da GPU {self.device}...")
        self.pipeline.to('cpu')
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        print(f"LTX Worker: GPU {self.device} limpa.")

    def generate_video_fragment_internal(self, **kwargs):
        """The actual video-generation call; expects the pipeline to already be on the GPU."""
        return self.pipeline(**kwargs)

class LtxPoolManager:
    """
    Manages a pool of LtxWorkers, orchestrating a relay between GPUs so that the
    cleanup of one GPU can run in parallel with computation on another.
    """
    def __init__(self, device_ids=['cuda:2', 'cuda:3']):
        print(f"LTX POOL MANAGER: Criando workers para os dispositivos: {device_ids}")
        self.workers = [LtxWorker(device_id) for device_id in device_ids]
        self.current_worker_index = 0
        self.lock = threading.Lock()
        self.last_cleanup_thread = None

    def _cleanup_worker(self, worker):
        """Target function for the background cleanup thread."""
        print(f"CLEANUP THREAD: Iniciando limpeza da GPU {worker.device} em background...")
        worker.to_cpu()
        print(f"CLEANUP THREAD: Limpeza da GPU {worker.device} concluída.")

    def generate_video_fragment(
        self,
        motion_prompt: str, conditioning_items_data: list,
        width: int, height: int, seed: int, cfg: float, video_total_frames: int,
        video_fps: int, num_inference_steps: int, use_attention_slicing: bool,
        decode_timestep: float, image_cond_noise_scale: float,
        current_fragment_index: int, output_path: str, progress
    ):
        # decode_timestep and image_cond_noise_scale come from the UI (app.py) and are
        # forwarded to the pipeline below.
        worker_to_use = None
        try:
            with self.lock:
                # 1. Wait for the previous cleanup thread, if it is still running.
                if self.last_cleanup_thread and self.last_cleanup_thread.is_alive():
                    print("LTX POOL MANAGER: Aguardando limpeza da GPU anterior...")
                    self.last_cleanup_thread.join()
                    print("LTX POOL MANAGER: Limpeza anterior concluída.")

                # 2. Select the CURRENT worker for this job
                worker_to_use = self.workers[self.current_worker_index]

                # 3. Select the PREVIOUS worker so its cleanup can start
                previous_worker_index = (self.current_worker_index - 1 + len(self.workers)) % len(self.workers)
                worker_to_cleanup = self.workers[previous_worker_index]

                # 4. Kick off the PREVIOUS worker's cleanup in a new thread
                cleanup_thread = threading.Thread(target=self._cleanup_worker, args=(worker_to_cleanup,))
                cleanup_thread.start()
                self.last_cleanup_thread = cleanup_thread

                # 5. Prepare the CURRENT worker for computation
                worker_to_use.to_gpu()

                # 6. Advance the index for the NEXT call
                self.current_worker_index = (self.current_worker_index + 1) % len(self.workers)

            # --- GENERATION HAPPENS OUTSIDE THE LOCK ---
            target_device = worker_to_use.device

            if use_attention_slicing:
                worker_to_use.pipeline.enable_attention_slicing()

            media_paths = [item[0] for item in conditioning_items_data]
            start_frames = [item[1] for item in conditioning_items_data]
            strengths = [item[2] for item in conditioning_items_data]

            padded_h, padded_w = ((height - 1) // 32 + 1) * 32, ((width - 1) // 32 + 1) * 32
            padding_vals = calculate_padding(height, width, padded_h, padded_w)

            conditioning_items = prepare_conditioning(
                conditioning_media_paths=media_paths, conditioning_strengths=strengths,
                conditioning_start_frames=start_frames, height=height, width=width,
                num_frames=video_total_frames, padding=padding_vals, pipeline=worker_to_use.pipeline,
            )

            for item in conditioning_items:
                item.media_item = item.media_item.to(target_device)

            first_pass_config = worker_to_use.config.get("first_pass", {}).copy()
            first_pass_config['num_inference_steps'] = int(num_inference_steps)

            kwargs = {
                "prompt": motion_prompt, "negative_prompt": "blurry, distorted, bad quality, artifacts",
                "height": padded_h, "width": padded_w, "num_frames": video_total_frames,
                "frame_rate": video_fps,
                "generator": torch.Generator(device=target_device).manual_seed(int(seed) + current_fragment_index),
                "output_type": "pt", "guidance_scale": float(cfg),
                "timesteps": first_pass_config.get("timesteps"),
                "conditioning_items": conditioning_items,
                "decode_timestep": decode_timestep,
                "decode_noise_scale": worker_to_use.config.get("decode_noise_scale"),
                "stochastic_sampling": worker_to_use.config.get("stochastic_sampling"),
                "image_cond_noise_scale": image_cond_noise_scale, "is_video": True, "vae_per_channel_normalize": True,
                "mixed_precision": (worker_to_use.config.get("precision") == "mixed_precision"),
                "enhance_prompt": False, "decode_every": 4, "num_inference_steps": int(num_inference_steps)
            }

            progress(0.1, desc=f"[Câmera LTX em {worker_to_use.device}] Filmando Cena {current_fragment_index}...")
            result_tensor = worker_to_use.generate_video_fragment_internal(**kwargs).images

            pad_l, pad_r, pad_t, pad_b = map(int, padding_vals); slice_h = -pad_b if pad_b > 0 else None; slice_w = -pad_r if pad_r > 0 else None
            cropped_tensor = result_tensor[:, :, :video_total_frames, pad_t:slice_h, pad_l:slice_w]
            video_np = (cropped_tensor[0].permute(1, 2, 3, 0).cpu().float().numpy() * 255).astype(np.uint8)

            with imageio.get_writer(output_path, fps=video_fps, codec='libx264', quality=8) as writer:
                for frame in video_np: writer.append_data(frame)

            return output_path, video_total_frames

        finally:
            if use_attention_slicing and worker_to_use and worker_to_use.pipeline:
                worker_to_use.pipeline.disable_attention_slicing()
            # Cleanup of worker_to_use happens on the NEXT call to this function.

# Pool manager singleton.
# Uses cuda:2 and cuda:3 by default; change here if needed.
ltx_manager_singleton = LtxPoolManager(device_ids=['cuda:2', 'cuda:3'])

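# --- Illustration of the GPU relay (not executed in production): with two workers the indices
# simply ping-pong, so one GPU is always free to unload while the other is generating.
if __name__ == "__main__":
    # Dry-run of the rotation arithmetic only; no models are touched here.
    _num_workers = 2
    _idx = 0
    for _call in range(4):
        _cleanup_idx = (_idx - 1 + _num_workers) % _num_workers
        print(f"call {_call}: compute on worker {_idx}, clean worker {_cleanup_idx}")
        _idx = (_idx + 1) % _num_workers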
ltx_upscaler_manager_helpers.py
ADDED
@@ -0,0 +1,62 @@
# ltx_upscaler_manager_helpers.py
# Pool manager for the relay of upscaling workers.
# This file is part of the Euia-AducSdr project and is licensed under AGPL v3.
# Copyright (C) August 4, 2025 Carlos Rodrigues dos Santos

import torch
import gc
import os
import threading
from ltx_worker_upscaler import LtxUpscaler

class LtxUpscalerPoolManager:
    """
    Manages a pool of LtxUpscaler workers, orchestrating a relay between GPUs
    for the upscaling task.
    """
    def __init__(self, device_ids=['cuda:2', 'cuda:3']):
        print(f"LTX UPSCALER POOL MANAGER: Criando workers para os dispositivos: {device_ids}")
        self.workers = [LtxUpscaler(device_id) for device_id in device_ids]
        self.current_worker_index = 0
        self.lock = threading.Lock()
        self.last_cleanup_thread = None

    def _cleanup_worker(self, worker):
        """Target function for the background cleanup thread."""
        print(f"UPSCALER CLEANUP THREAD: Iniciando limpeza da GPU {worker.device}...")
        worker.to_cpu()
        print(f"UPSCALER CLEANUP THREAD: Limpeza da GPU {worker.device} concluída.")

    def upscale_video_fragment(self, video_path_low_res: str, output_path: str, video_fps: int):
        """
        Picks a free worker, upscales one fragment, and schedules cleanup of the previous worker.
        """
        worker_to_use = None
        try:
            with self.lock:
                if self.last_cleanup_thread and self.last_cleanup_thread.is_alive():
                    print("UPSCALER POOL MANAGER: Aguardando limpeza da GPU anterior...")
                    self.last_cleanup_thread.join()

                worker_to_use = self.workers[self.current_worker_index]
                previous_worker_index = (self.current_worker_index - 1 + len(self.workers)) % len(self.workers)
                worker_to_cleanup = self.workers[previous_worker_index]

                cleanup_thread = threading.Thread(target=self._cleanup_worker, args=(worker_to_cleanup,))
                cleanup_thread.start()
                self.last_cleanup_thread = cleanup_thread

                worker_to_use.to_gpu()

                self.current_worker_index = (self.current_worker_index + 1) % len(self.workers)

            print(f"UPSCALER POOL MANAGER: Worker em {worker_to_use.device} iniciando upscale de {os.path.basename(video_path_low_res)}...")
            worker_to_use.upscale_video_fragment(video_path_low_res, output_path, video_fps)
            print(f"UPSCALER POOL MANAGER: Upscale de {os.path.basename(video_path_low_res)} concluído.")

        finally:
            # Cleanup of worker_to_use happens on the next call.
            pass

# --- Upscaling pool manager singleton ---
ltx_upscaler_manager_singleton = LtxUpscalerPoolManager(device_ids=['cuda:2', 'cuda:3'])

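# --- Minimal usage sketch (illustration only; the file names below are hypothetical). ---
if __name__ == "__main__":
    ltx_upscaler_manager_singleton.upscale_video_fragment(
        video_path_low_res="fragment_1_continuous.mp4",
        output_path="fragment_1_continuous_hires.mp4",
        video_fps=24,
    )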
ltx_worker_base.py
ADDED
@@ -0,0 +1,133 @@
# ltx_worker_base.py (GPU-C: cuda:2)
# Worker that generates the video fragments at base resolution.
# This file is part of the Euia-AducSdr project and is licensed under AGPL v3.
# Copyright (C) August 4, 2025 Carlos Rodrigues dos Santos

import torch
import gc
import os
import yaml
import numpy as np
import imageio
from pathlib import Path
import huggingface_hub

from inference import (
    create_ltx_video_pipeline,
    ConditioningItem,
    calculate_padding,
    prepare_conditioning
)

class LtxGenerator:
    def __init__(self, device_id='cuda:2'):
        print(f"WORKER CÂMERA-BASE: Inicializando...")
        self.device = torch.device(device_id if torch.cuda.is_available() else 'cpu')
        print(f"WORKER CÂMERA-BASE: Usando dispositivo: {self.device}")

        config_file_path = "configs/ltxv-13b-0.9.8-distilled.yaml"
        with open(config_file_path, "r") as file:
            self.config = yaml.safe_load(file)

        LTX_REPO = "Lightricks/LTX-Video"
        models_dir = "downloaded_models_gradio"
        Path(models_dir).mkdir(parents=True, exist_ok=True)

        print("WORKER CÂMERA-BASE: Carregando pipeline LTX na CPU (estado de repouso)...")
        distilled_model_actual_path = huggingface_hub.hf_hub_download(
            repo_id=LTX_REPO,
            filename=self.config["checkpoint_path"],
            local_dir=models_dir,
            local_dir_use_symlinks=False
        )

        self.pipeline = create_ltx_video_pipeline(
            ckpt_path=distilled_model_actual_path,
            precision=self.config["precision"],
            text_encoder_model_name_or_path=self.config["text_encoder_model_name_or_path"],
            sampler=self.config["sampler"],
            device='cpu'
        )
        print("WORKER CÂMERA-BASE: Pronto (na CPU).")

    def to_gpu(self):
        if self.pipeline and torch.cuda.is_available():
            print(f"WORKER CÂMERA-BASE: Movendo LTX para {self.device}...")
            self.pipeline.to(self.device)

    def to_cpu(self):
        if self.pipeline:
            print(f"WORKER CÂMERA-BASE: Descarregando LTX da GPU {self.device}...")
            self.pipeline.to('cpu')
            gc.collect()
            if torch.cuda.is_available():
                torch.cuda.empty_cache()

    def generate_video_fragment(
        self, motion_prompt: str, conditioning_items_data: list,
        width: int, height: int, seed: int, cfg: float, video_total_frames: int,
        video_fps: int, num_inference_steps: int, use_attention_slicing: bool,
        current_fragment_index: int, output_path: str, progress
    ):
        progress(0.1, desc=f"[Câmera LTX Base] Filmando Cena {current_fragment_index}...")

        target_device = self.pipeline.device

        if use_attention_slicing:
            self.pipeline.enable_attention_slicing()

        media_paths = [item[0] for item in conditioning_items_data]
        start_frames = [item[1] for item in conditioning_items_data]
        strengths = [item[2] for item in conditioning_items_data]

        padded_h, padded_w = ((height - 1) // 32 + 1) * 32, ((width - 1) // 32 + 1) * 32
        padding_vals = calculate_padding(height, width, padded_h, padded_w)

        conditioning_items = prepare_conditioning(
            conditioning_media_paths=media_paths, conditioning_strengths=strengths,
            conditioning_start_frames=start_frames, height=height, width=width,
            num_frames=video_total_frames, padding=padding_vals, pipeline=self.pipeline,
        )

        for item in conditioning_items:
            item.media_item = item.media_item.to(target_device)

        actual_num_frames = int(round((float(video_total_frames) - 1.0) / 8.0) * 8 + 1)
        first_pass_config = self.config.get("first_pass", {}).copy()
        first_pass_config['num_inference_steps'] = int(num_inference_steps)

        kwargs = {
            "prompt": motion_prompt, "negative_prompt": "blurry, distorted, bad quality, artifacts",
            "height": padded_h, "width": padded_w, "num_frames": actual_num_frames,
            "frame_rate": video_fps,
            "generator": torch.Generator(device=target_device).manual_seed(int(seed) + current_fragment_index),
            "output_type": "pt", "guidance_scale": float(cfg),
            "timesteps": first_pass_config.get("timesteps"),
            "conditioning_items": conditioning_items,
            "decode_timestep": self.config.get("decode_timestep"),
            "decode_noise_scale": self.config.get("decode_noise_scale"),
            "stochastic_sampling": self.config.get("stochastic_sampling"),
            "image_cond_noise_scale": 0.15, "is_video": True, "vae_per_channel_normalize": True,
            "mixed_precision": (self.config.get("precision") == "mixed_precision"),
            "enhance_prompt": False, "decode_every": 4, "num_inference_steps": int(num_inference_steps)
        }

        result_tensor = self.pipeline(**kwargs).images

        pad_l, pad_r, pad_t, pad_b = map(int, padding_vals)
        slice_h = -pad_b if pad_b > 0 else None; slice_w = -pad_r if pad_r > 0 else None

        cropped_tensor = result_tensor[:, :, :actual_num_frames, pad_t:slice_h, pad_l:slice_w]
        video_np = (cropped_tensor[0].permute(1, 2, 3, 0).cpu().float().numpy() * 255).astype(np.uint8)

        with imageio.get_writer(output_path, fps=video_fps, codec='libx264', quality=8) as writer:
            for frame in video_np:
                writer.append_data(frame)

        if use_attention_slicing and self.pipeline:
            self.pipeline.disable_attention_slicing()

        return output_path, actual_num_frames

# --- Singleton instance for the base worker ---
ltx_base_singleton = LtxGenerator(device_id='cuda:2')

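# --- Worked example of the 32-multiple padding used above (illustration only). ---
# LTX operates on spatial sizes that are multiples of 32, so a 720x720 request becomes:
#   padded = ((720 - 1) // 32 + 1) * 32 = 736
# calculate_padding() then distributes the extra 16 pixels around the frame, and the generated
# tensor is cropped back to 720x720 before being written to disk. The frame count is likewise
# snapped to the 8k+1 grid by actual_num_frames.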
ltx_worker_upscaler.py
ADDED
@@ -0,0 +1,99 @@
# ltx_worker_upscaler.py (fixed to load models with dtype=bfloat16)
# Worker that upscales the low-resolution video fragments to high resolution.
# This file is part of the Euia-AducSdr project and is licensed under the AGPL v3.
# Copyright (C) August 4, 2025 Carlos Rodrigues dos Santos

import torch
import gc
import os
import yaml
import numpy as np
import imageio
from pathlib import Path
import huggingface_hub
from einops import rearrange

from inference import create_ltx_video_pipeline
from ltx_video.models.autoencoders.latent_upsampler import LatentUpsampler
from ltx_video.models.autoencoders.vae_encode import vae_encode, vae_decode

class LtxUpscaler:
    def __init__(self, device_id='cuda:2'):
        print(f"WORKER CÂMERA-UPSCALER: Inicializando para {device_id}...")
        self.device = torch.device(device_id if torch.cuda.is_available() else 'cpu')
        self.model_dtype = torch.bfloat16  # dtype used for both the VAE and the latent upsampler

        config_file_path = "configs/ltxv-13b-0.9.8-distilled.yaml"
        with open(config_file_path, "r") as file:
            self.config = yaml.safe_load(file)

        LTX_REPO = "Lightricks/LTX-Video"
        models_dir = "downloaded_models_gradio"
        Path(models_dir).mkdir(parents=True, exist_ok=True)

        print(f"WORKER CÂMERA-UPSCALER ({self.device}): Carregando VAE na CPU...")
        distilled_model_actual_path = huggingface_hub.hf_hub_download(
            repo_id=LTX_REPO, filename=self.config["checkpoint_path"],
            local_dir=models_dir, local_dir_use_symlinks=False
        )
        temp_pipeline = create_ltx_video_pipeline(
            ckpt_path=distilled_model_actual_path, precision=self.config["precision"],
            text_encoder_model_name_or_path=self.config["text_encoder_model_name_or_path"],
            sampler=self.config["sampler"], device='cpu'
        )
        self.vae = temp_pipeline.vae.to(self.model_dtype)  # keep only the VAE, cast to the model dtype
        del temp_pipeline
        gc.collect()

        print(f"WORKER CÂMERA-UPSCALER ({self.device}): Carregando Latent Upsampler na CPU...")
        upscaler_path = huggingface_hub.hf_hub_download(
            repo_id=LTX_REPO, filename=self.config["spatial_upscaler_model_path"],
            local_dir=models_dir, local_dir_use_symlinks=False
        )
        self.latent_upsampler = LatentUpsampler.from_pretrained(upscaler_path).to(self.model_dtype)  # cast to the model dtype
        self.latent_upsampler.to('cpu')

        print(f"WORKER CÂMERA-UPSCALER ({self.device}): Pronto (na CPU).")

    def to_gpu(self):
        if self.latent_upsampler and self.vae and torch.cuda.is_available():
            print(f"WORKER CÂMERA-UPSCALER: Movendo modelos para {self.device}...")
            self.latent_upsampler.to(self.device)
            self.vae.to(self.device)

    def to_cpu(self):
        if self.latent_upsampler and self.vae:
            print(f"WORKER CÂMERA-UPSCALER: Descarregando modelos da GPU {self.device}...")
            self.latent_upsampler.to('cpu')
            self.vae.to('cpu')
            gc.collect()
            if torch.cuda.is_available():
                torch.cuda.empty_cache()

    @torch.no_grad()
    def upscale_video_fragment(self, video_path_low_res: str, output_path: str, video_fps: int):
        print(f"UPSCALER ({self.device}): Processando {os.path.basename(video_path_low_res)}")

        with imageio.get_reader(video_path_low_res) as reader:
            video_frames = [frame for frame in reader]
            video_np = np.stack(video_frames)

        # Critical fix: normalize frames to [-1, 1], reorder to (B, C, F, H, W) and move
        # to the GPU already in the model dtype (bfloat16).
        video_tensor = torch.from_numpy(video_np).permute(0, 3, 1, 2).float() / 255.0
        video_tensor = (video_tensor * 2.0) - 1.0
        video_tensor = video_tensor.unsqueeze(0).permute(0, 2, 1, 3, 4)
        video_tensor = video_tensor.to(self.device, dtype=self.model_dtype)

        latents = vae_encode(video_tensor, self.vae)
        upsampled_latents = self.latent_upsampler(latents)
        upsampled_video_tensor = vae_decode(upsampled_latents, self.vae, is_video=True)

        upsampled_video_tensor = (upsampled_video_tensor.clamp(-1, 1) + 1) / 2.0
        # Back to float32 on the CPU, then to uint8 frames for writing.
        video_np_high_res = (upsampled_video_tensor[0].permute(1, 2, 3, 0).cpu().float().numpy() * 255).astype(np.uint8)

        with imageio.get_writer(output_path, fps=video_fps, codec='libx264', quality=8) as writer:
            for frame in video_np_high_res:
                writer.append_data(frame)

        print(f"UPSCALER ({self.device}): Arquivo salvo em {os.path.basename(output_path)}")
        return output_path
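A minimal sketch of how this worker could be exercised on its own, assuming a low-resolution fragment already exists on disk. The file names and fps value are illustrative; in the Space the instance is presumably managed by ltx_upscaler_manager_helpers.py rather than created ad hoc like this.

```python
# Illustrative only: file names and fps are assumptions, not values from the commit.
from ltx_worker_upscaler import LtxUpscaler

upscaler = LtxUpscaler(device_id='cuda:2')   # models are loaded on the CPU first
upscaler.to_gpu()                            # move VAE + latent upsampler to the GPU
try:
    upscaler.upscale_video_fragment(
        video_path_low_res="fragment_01.mp4",
        output_path="fragment_01_hires.mp4",
        video_fps=24,
    )
finally:
    upscaler.to_cpu()                        # release VRAM for the other workers
```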
requirements.txt
CHANGED
@@ -15,11 +15,13 @@ imageio
 imageio-ffmpeg
 einops
 timm
+safetensors
+
 av
-
+git+https://github.com/huggingface/diffusers.git@main
 torch
 peft
-diffusers==0.31.0
+#diffusers==0.31.0
 transformers
 accelerate
 git+https://github.com/ToTheBeginning/facexlib.git
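Since the `diffusers==0.31.0` pin is commented out in favor of the development branch, a quick post-install sanity check could look like the sketch below; it is illustrative only and not part of the committed files.

```python
# Illustrative post-install check of the environment assembled from requirements.txt.
import diffusers, transformers, accelerate, torch

print("diffusers:", diffusers.__version__)        # expected to report a dev version from git main
print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)
print("torch:", torch.__version__, "CUDA available:", torch.cuda.is_available())
```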