<!doctype html>
<html lang="en">
	<head>
		<meta charset="utf-8" />
		<meta name="viewport" content="width=device-width" />
		<title>VPTQ Online Demo</title>

		<!-- Google tag (gtag.js) -->
		<script async src="https://www.googletagmanager.com/gtag/js?id=G-2ZR8R45ZZ1"></script>
		<script>
		  window.dataLayer = window.dataLayer || [];
		  function gtag(){dataLayer.push(arguments);}
		  gtag('js', new Date());

		  gtag('config', 'G-2ZR8R45ZZ1');
		</script>

		<link rel="stylesheet" href="style.css" />
		<script
			type="module"
			src="https://gradio.s3-us-west-2.amazonaws.com/4.36.1/gradio.js"
		></script>
	</head>
	<body class="theme-dependent">
		<div class="description">
			<h2>VPTQ Online Demo</h2>
			<p>
				<b>VPTQ (Vector Post-Training Quantization)</b> is a compression technique that sharply reduces the size of large language models such as the 70B and 405B Llama models. VPTQ compresses these models to 1-2 bits within a few hours, enabling them to run on GPUs with limited memory.
			</p>
			<p style="font-weight: bold; font-size: larger;">
				This demo runs on a free, shared A100 provided by Hugging Face, so loading the model and acquiring an available GPU may take a long time. The demo is intended to showcase the quality of the quantized model, not inference speed.
			</p>
			<p>For more information, visit the following links:</p>
			<ul>
				<li>
					<a href="https://arxiv.org/abs/2409.17066" target="_blank" rel="noopener" class="link-styled">
						<img src="arxiv-logo.png" alt="arXiv" width="20" height="20" /> <b>Paper on arXiv</b>
					</a>
				</li>
				<li>
					<a href="https://github.com/microsoft/VPTQ" target="_blank" rel="noopener" class="link-styled">
						<img src="github-mark.png" alt="GitHub" width="20" height="20" /> <b>GitHub Repository</b>
					</a>
				</li>
				<li>
					<a href="https://huggingface.co/VPTQ-community" target="_blank" rel="noopener" class="link-styled">
						<img src="hf-logo.png" alt="Hugging Face" width="20" height="20" /> <b>Hugging Face Community</b>
					</a>
				</li>
			</ul>
		</div>

        <gradio-app src="https://opensourceronin-vptq-demo-f6c7fc7.hf.space"></gradio-app>
		<style>
			/* Defaults use the dark palette; the media queries below override
			   the custom properties to follow the visitor's color scheme. */
			body.theme-dependent {
				--background-color: #0d1117;
				--text-color: #c9d1d9;
				background-color: var(--background-color);
				color: var(--text-color);
				font-family: Arial, sans-serif;
			}

			.description h2 {
				color: #58a6ff;
			}

			.link-styled {
				color: #58a6ff;
				text-decoration: none;
			}

			.link-styled:hover {
				text-decoration: underline;
			}

			.link-styled:visited {
				color: #8b949e;
			}

			@media (prefers-color-scheme: dark) {
				body.theme-dependent {
					--background-color: #0d1117;
					--text-color: #c9d1d9;
				}
			}

			@media (prefers-color-scheme: light) {
				body.theme-dependent {
					--background-color: #ffffff;
					--text-color: #000000;
				}
			}
		</style>
	</body>
</html>