Shwai He

Shwai

https://shwai-he.github.io/

Shwai-He

AI & ML interests

Deep Learning, Machine Learning, Natural Language Processing.

Shwai's activity

liked 14 models 2 months ago

s1ghhh/Mistral-7B-v0.1-Drop4Attn

Updated Sep 8 • 11 • 2

s1ghhh/Llama-2-13b-Drop8Block

Updated Sep 8 • 12 • 2

s1ghhh/Llama-2-13b-Drop4Block

Updated Sep 8 • 7 • 2

s1ghhh/Llama-2-13b-Drop8Attn

Updated Sep 8 • 9 • 2

s1ghhh/Llama-2-13b-Drop4Attn

Updated Sep 8 • 10 • 2

s1ghhh/Llama-2-13b-Drop4MLP

Updated Sep 8 • 7 • 2

s1ghhh/Llama-2-13b-Drop8MLP

Updated Sep 8 • 6 • 2

s1ghhh/Mistral-7B-v0.1-Drop4Block

Updated Sep 8 • 6 • 2

s1ghhh/Mistral-7B-v0.1-Drop8Block

Updated Sep 8 • 9 • 2

s1ghhh/Mistral-7B-v0.1-Drop8Attn

Updated Sep 8 • 9 • 2

s1ghhh/Mistral-7B-v0.1-Drop4MLP

Updated Sep 8 • 7 • 2

s1ghhh/Mistral-7B-v0.1-Drop8MLP

Updated Sep 8 • 11 • 2

s1ghhh/Llama-3-70b-Drop

Text Generation • Updated Oct 23 • 24 • 3

s1ghhh/Llama-2-70b-Drop

Text Generation • Updated Oct 23 • 20 • 2

authored a paper 2 months ago

Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers

Paper • 2410.13184 • Published Oct 17 • 2

upvoted a collection 2 months ago

LLM-Drop

Model weights of paper "What Matters in Transformers? Not All Attention is Needed" (https://arxiv.org/abs/2406.15786) • 14 items • Updated Oct 23 • 4

upvoted a paper 2 months ago

What Matters in Transformers? Not All Attention is Needed

Paper • 2406.15786 • Published Jun 22 • 29

authored 3 papers 5 months ago

Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning

Paper • 2310.11716 • Published Oct 18, 2023 • 5

Merging Experts into One: Improving Computational Efficiency of Mixture of Experts

Paper • 2310.09832 • Published Oct 15, 2023 • 1

Vega-MT: The JD Explore Academy Translation System for WMT22

Paper • 2209.09444 • Published Sep 20, 2022