
Şuayp Talha Kocabay

suayptalha

AI & ML interests

NLP, LLMs, Transformers, Merging, RNNs, CNNs, ANNs, Computer Vision and ML algorithms

Organizations

TÜBİTAK Science High School AI Club, scikit-learn, Blog-explorers, Mathematical Intelligence, Hugging Face Discord Community, Abra Muhara, Nerdy Face, Intelligent Estate, CipherAI, open/ acc, AI Starter Pack, Xyper AI, Harezmi-25, Lyra AI

suayptalha's activity

reacted to sometimesanotion's post with 🔥 about 3 hours ago
I'd like to draw your attention to a Lamarck-based experiment which uses Arcee AI's newly published arcee_fusion merge method for three out of its four merges. Yes, just four. This is a simple one, and its recipe is fully open:

sometimesanotion/Lamarck-14B-v0.7-Fusion

It unifies three branches, all of which feature models that bring Lamarck-14B-v0.7 and Qwenvergence-14B-v12-Prose together. One side features @jpacifico's jpacifico/Chocolatine-2-14B-Instruct-v2.0.3, and the other features @suayptalha's suayptalha/Lamarckvergence-14B paired with my models that were their merge ancestors.

A fusion merge - of a fusion merge and a SLERP of a fusion and an older merge - should demonstrate the new merge method's behavior in interesting ways, especially in the first quarter of the model, where the SLERP has less impact.

I welcome you to kick the tires and learn from it. It has prose quality near Qwenvergence v12's - as you'd expect.

Thank you, @mradermacher and @MaziyarPanahi, for the first-day quantizations! Your work helped get me started. https://huggingface.co/models?other=base_model:quantized:sometimesanotion/Lamarck-14B-v0.7-Fusion
posted an update about 20 hours ago
replied to sometimesanotion's post about 1 month ago
replied to their post about 1 month ago
reacted to sometimesanotion's post with 🔥 about 1 month ago
I've managed a #1 score of 41.22% average for 14B parameter models on the Open LLM Leaderboard. As of this writing, sometimesanotion/Lamarck-14B-v0.7 is #8 for all models up to 70B parameters.

It took a custom toolchain around Arcee AI's mergekit to manage the complex merges, gradients, and LoRAs required to make this happen. I really like seeing features of many quality finetunes in one solid generalist model.
posted an update about 1 month ago
posted an update about 2 months ago
🚀 Introducing the First Hugging Face Integration of minGRU Models from the paper "Were RNNs All We Needed?"

🖥 I have integrated next-generation RNNs, specifically minGRU, which offer faster performance compared to Transformer architectures, into Hugging Face. This allows users to leverage the lighter and more efficient minGRU models with the "transformers" library for both usage and training.

💻 I integrated two main tasks: MinGRUForSequenceClassification and MinGRUForCausalLM.

MinGRUForSequenceClassification:
You can use this class for Sequence Classification tasks. I also trained a Sentiment Analysis model with the stanfordnlp/imdb dataset.
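A minimal usage sketch for the classification head is below. It assumes the checkpoint ships its custom modeling code on the Hub (hence trust_remote_code=True); the repo id is a placeholder, not a real checkpoint name - see the collection under Links for the actual models.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder repo id (assumption); the real checkpoints are in the collection linked below.
model_id = "suayptalha/minGRU-sentiment-analysis-example"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("This movie was surprisingly good!", return_tensors="pt")
logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities, e.g. negative vs. positive
```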

MinGRUForCausalLM:
You can use this class for Causal Language Model tasks, such as GPT- or Llama-style models. I also trained an example model with the roneneldan/TinyStories dataset. You can fine-tune and use it!
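A matching generation sketch, under the same assumptions (custom code on the Hub, placeholder repo id):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder repo id (assumption); see the collection under Links for the TinyStories example model.
model_id = "suayptalha/minGRU-tinystories-example"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```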

🔗 Links:
Models: suayptalha/mingru-676fe8d90760d01b7955d7ab
GitHub: https://github.com/suayptalha/minGRU-hf
LinkedIn Post: https://www.linkedin.com/posts/suayp-talha-kocabay_mingru-a-suayptalha-collection-activity-7278755484172439552-wNY1

📰 Credits:
Paper Link: https://arxiv.org/abs/2410.01201

I am thankful to Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio, and Hossein Hajimirsadeghi for their paper.
reacted to AdinaY's post with 🔥 about 2 months ago
The Chinese community is shipping 🚢

DeepSeek V3 (685B MoE) has quietly been released on the Hub!
Base: deepseek-ai/DeepSeek-V3-Base
Instruct: deepseek-ai/DeepSeek-V3

Can't wait to see what's next!
replied to their post 2 months ago

Thank you for your reply, but I never intended to mislead anyone. This post was not entirely created with AI. We spent hours working on the models and dataset preparation and are proud of our efforts. I expected more constructive criticism.

reacted to their post with 🔥 2 months ago
🚀 Introducing Substitution Cipher Solvers!

As @suayptalha and @Synd209, we are thrilled to share an update!

🔑 This project contains a text-to-text model designed to decrypt English and Turkish text encoded with a substitution cipher. In a substitution cipher, each letter in the plaintext is replaced by a corresponding, unique letter to form the ciphertext. The model leverages statistical and linguistic properties of the target language to make educated guesses about the letter substitutions, aiming to recover the original plaintext message.
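To make the setup concrete, here is a small, self-contained sketch of the cipher itself (not the solver): a random monoalphabetic key encodes the plaintext, and applying the inverse key recovers it. The solver model only ever sees the ciphertext and must infer the mapping.

```python
import random
import string

# Build a random monoalphabetic substitution key: each plaintext letter
# maps to exactly one ciphertext letter.
alphabet = string.ascii_lowercase
shuffled = random.sample(alphabet, len(alphabet))
encode_map = str.maketrans(alphabet, "".join(shuffled))
decode_map = str.maketrans("".join(shuffled), alphabet)

plaintext = "a family member or a support person may stay with a patient during recovery."
ciphertext = plaintext.translate(encode_map)

print(ciphertext)                                     # gibberish-looking text, key-dependent
print(ciphertext.translate(decode_map) == plaintext)  # True: the inverse key recovers the plaintext
```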

These models were fine-tuned on T5-base. They handle monoalphabetic English and Turkish substitution ciphers, and they output both the decoded text and the recovered substitution alphabet with an accuracy that has never been achieved before!

Example:

Encoded text: Z hztwgx tstcsf qf z ulooqfe osfuqb tzx uezx awej z ozewsbe vlfwby fsmqisfx.

Decoded text: A family member or a support person may stay with a patient during recovery.

Model Collection Link: Cipher-AI/substitution-cipher-solvers-6731ebd22f0f0d8e0e2e2e00

Organization Link: https://huggingface.co/Cipher-AI
posted an update 2 months ago
reacted to etemiz's post with 👀 2 months ago
As more synthetic datasets are made, we move slowly away from human alignment.
posted an update 2 months ago
🚀 FastLlama Series is Live!

🦾 Experience faster, lighter, and smarter language models! The new FastLlama makes Meta's LLaMA models work with smaller file sizes, lower system requirements, and higher performance. The model supports 8 languages, including English, German, and Spanish.

🤖 Built on the LLaMA 3.2-1B-Instruct model, fine-tuned with Hugging Face's SmolTalk and MetaMathQA-50k datasets, and powered by LoRA (Low-Rank Adaptation) for groundbreaking mathematical reasoning.
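For readers unfamiliar with LoRA, a minimal fine-tuning sketch follows. The rank, alpha, and target modules are illustrative assumptions, not the actual FastLlama training recipe, and the base repo id is assumed to be Meta's official Llama 3.2 1B Instruct checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Llama-3.2-1B-Instruct"  # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Illustrative LoRA settings; the real FastLlama hyperparameters are not given in the post.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # common choice for Llama-style attention layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the low-rank adapter weights are trainable
```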

💻 Its compact size makes it versatile for a wide range of applications!
💬 Chat with the model:
🔗 Chat Link: suayptalha/Chat-with-FastLlama
🔗 Model Link: suayptalha/FastLlama-3.2-1B-Instruct
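A quick local inference sketch using the released checkpoint; the chat-template usage here is an assumption, so check the model card if it differs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "suayptalha/FastLlama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "What is 12 * 7 + 5?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```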