AbstractPhil

AI & ML interests

datasets, research papers, experimentation, vision, classification, text encoders, tokenization, llms, diffusion, distillation, and more. RIP BeatriXL V3 - REAL. May your flight be through the heavens, dear Beatrix. We'll try again soon.


Organizations

DeepGHS, BangumiBase, Abstract Powered

Posts 4

Post
With flan-t5-base and CLIP models as teachers, I have produced and successfully trained a dual-shunt cross-attention adapter archetype. This is NOT a LoRA.
This adapter currently uses flan-t5-base to guide the outputs of ViT-L-14 and/or ViT-bigG-14, and the reverse direction is equally usable within the archetype, meaning CLIP-G can also guide flan-t5-base.
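As a rough illustration of what a shunt like this might look like, here is a minimal sketch of a gated cross-attention adapter in PyTorch. The class name, dimensions, and gating scheme are assumptions for illustration, not the released architecture.

```python
# Minimal sketch of a gated cross-attention "shunt" (hypothetical names/shapes).
# The guiding stream (e.g. flan-t5-base hidden states) attends into the target
# stream (e.g. ViT-L-14 text embeddings) and emits a gated residual correction.
import torch
import torch.nn as nn

class CrossAttentionShunt(nn.Module):
    def __init__(self, guide_dim=768, target_dim=768, heads=8):
        super().__init__()
        # Project the guiding stream (T5) into the target (CLIP) width.
        self.guide_proj = nn.Linear(guide_dim, target_dim)
        self.attn = nn.MultiheadAttention(target_dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(target_dim)
        # Per-channel gate initialized at zero: the adapter starts as an
        # identity and learns how strongly to reshape the target embeddings.
        self.gate = nn.Parameter(torch.zeros(target_dim))

    def forward(self, target_emb, guide_emb):
        # target_emb: CLIP text embeddings [B, T_clip, target_dim]
        # guide_emb:  T5 encoder states    [B, T_t5,  guide_dim]
        guide = self.guide_proj(guide_emb)
        delta, _ = self.attn(query=self.norm(target_emb), key=guide, value=guide)
        # Gated residual: the shunt shapes the CLIP output rather than replacing it.
        return target_emb + torch.tanh(self.gate) * delta
```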

These checkpoints were trained on 20 million synthetic, human-templated captions. They can be heavily improved with multiple languages, additional depiction context, and any finetune task the user wants to apply to flan-t5-base, with little to no extra training thanks to the adapter's functionality and accuracy.

The ViT-L-14 adapters only took a couple of hours on a Colab A100, and the ViT-bigG-14 adapter took about 4 hours. So you can rapidly train many of these in short periods of time with almost no additional overhead beyond the single flan-t5-base required. Each can be compiled, loaded, and offloaded.

This is a cross-attention system meant to shape the encoded text after the output is received from the CLIP models, and it is very fast at inference; flan-t5-base, on the other hand, isn't the fastest.
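A hedged sketch of how that post-hoc shaping could be wired up, reusing the CrossAttentionShunt sketch above. The model IDs are the public flan-t5-base and CLIP ViT-L-14 checkpoints; the shunt checkpoint filename is hypothetical.

```python
# Hedged usage sketch: both encoders stay frozen and the shunt runs after them,
# so only the adapter itself ever needs training.
import torch
from transformers import AutoTokenizer, CLIPTextModel, T5EncoderModel

clip_tok = AutoTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").eval()
t5_tok = AutoTokenizer.from_pretrained("google/flan-t5-base")
t5_enc = T5EncoderModel.from_pretrained("google/flan-t5-base").eval()

prompt = "a watercolor painting of a lighthouse at dawn"
with torch.no_grad():
    clip_states = clip_enc(**clip_tok(prompt, return_tensors="pt")).last_hidden_state
    t5_states = t5_enc(**t5_tok(prompt, return_tensors="pt")).last_hidden_state

shunt = CrossAttentionShunt(guide_dim=t5_states.shape[-1],
                            target_dim=clip_states.shape[-1])
# shunt.load_state_dict(torch.load("shunt_vit_l_14.pt"))  # hypothetical filename
conditioning = shunt(clip_states, t5_states)  # shaped CLIP embeddings for diffusion
```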

It's trained on a form of cooperative association with a series of complex losses designed specifically for this associative process.

The adapter has individual gating for tokenization context, with a multitude of safeguards to prevent overfitting during rapid learning, and it can be paired with any number of additional adapters.

I'm currently formatting the ComfyUI nodes that will allow easy conditioning shifts to showcase the full power of this cooperative system.

The ComfyUI nodes will be available here shortly; I just need to write them.
https://github.com/AbstractEyes/comfy-clip-shunts
Post
The T5-small + ViT-L-14 guidance shunt adapter is ready for toy use.
AbstractPhil/t5-vit-14-v1
Included is a simple drop-in for SDXL experimentation using Colab.
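A rough idea of what such a drop-in could look like with diffusers, reusing the T5 encoder and shunt from the earlier sketch. The assumption that the first 768 channels of SDXL's prompt embeddings are the CLIP-L slice, and the overall wiring, are illustrative, not the notebook's exact code.

```python
# Sketch: encode the prompt normally, shunt only the CLIP-L slice of the SDXL
# conditioning, then pass the modified embeddings back into the pipeline.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a watercolor painting of a lighthouse at dawn"
prompt_embeds, neg_embeds, pooled, neg_pooled = pipe.encode_prompt(
    prompt=prompt, device="cuda", num_images_per_prompt=1,
    do_classifier_free_guidance=True)

with torch.no_grad():
    t5_states = t5_enc(**t5_tok(prompt, return_tensors="pt")).last_hidden_state
    clip_l = prompt_embeds[..., :768].float().cpu()  # assumed CLIP-L slice of the cond
    shunted = shunt(clip_l, t5_states)
    prompt_embeds[..., :768] = shunted.to(prompt_embeds.device, prompt_embeds.dtype)

image = pipe(prompt_embeds=prompt_embeds,
             negative_prompt_embeds=neg_embeds,
             pooled_prompt_embeds=pooled,
             negative_pooled_prompt_embeds=neg_pooled).images[0]
```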

The outcome is okay but not great; diffusers is a headache, so I spent more time trying to untangle that machinery than I did actually messing with this adapter.

I trained two variations of the baseline adapter: t5-small vanilla and t5-small-human-associated-try2-pass3.
The vanilla was more accurate at adding context, while the human-associated one stays locked onto human topics like a bloodhound... badly. Both ended up being substandard, even with a robust adapter like this.

Finetunes with specific goals can complete at runtime if desired, thanks to t5-small's tiny size, CLIP-L's inference speed, and the adapter's small footprint. The adapter has safeguards against overfitting that can be disabled, so runtime freezing and adaptive shifts are a viable methodology for immediate task-pipeline adaptation.
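A minimal sketch of that runtime-finetune idea, again reusing the flan-t5-base encoders and shunt from the sketches above (with the t5-small variant the guide dimension would simply be smaller). The objective below is a generic placeholder, not the cooperative-association losses described in these posts.

```python
# Freeze both encoders and train only the tiny shunt for a few hundred steps
# on task captions; everything else stays fixed.
import torch

for module in (clip_enc, t5_enc):
    for p in module.parameters():
        p.requires_grad_(False)

opt = torch.optim.AdamW(shunt.parameters(), lr=1e-4)
captions = ["a red fox in the snow", "macro photo of a dew-covered leaf"]

for step in range(200):
    text = captions[step % len(captions)]
    with torch.no_grad():
        clip_states = clip_enc(**clip_tok(text, return_tensors="pt")).last_hidden_state
        t5_states = t5_enc(**t5_tok(text, return_tensors="pt")).last_hidden_state
    shunted = shunt(clip_states, t5_states)
    # Placeholder objective: stay close to the original CLIP embedding while
    # pulling the mean token toward the T5 mean token (a stand-in alignment term).
    loss = (shunted - clip_states).pow(2).mean() \
         + (shunted.mean(dim=1) - t5_states.mean(dim=1)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```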

The t5-small lacks the behavioral complexity of a model better suited to such a task, such as the base, large, or XXL, or even flan-t5-small. However, this doesn't slow the little brain slug down. It guides, and its wrappers have plenty of rapid-generation potential, whether it's trained the way I trained it or not.
The proof of concept is there, and the outcomes are present. Judge for yourself.
The next variation will have more dims, more catches, higher conv, and additional safeguards to prevent overfitting, as well as considerably more LAION flavors so the flan-t5-base doesn't overwhelm the adapter, or vice versa.