wrapping
README.md
ADDED
@@ -0,0 +1,85 @@
1 |
+
# Spaces Ship
|
2 |
+
|
3 |
+
This is a spaceship through Spaces.
|
4 |
+
|
5 |
+
I started this mostly as a way to see more Spaces that I was interested in. Since there aren't any search/filtering options outside of full-text search and searching for Space titles, I wanted more ways to look around and get inspired.
|
6 |
+
|
7 |
+
It expanded as I saw what information you can get from leveraging the APIs in the `huggingface_hub` client.
|
8 |
+
|
9 |
+
Short-term, I'm running a lot of this locally, but long-term my goal is to run [this script](https://github.com/jsulz/hf-spaces-stats-builder/blob/main/src/pipeline.py) every 2 weeks, which:
|
10 |
+
|
11 |
+
- Calls `list_spaces` to get all spaces and some high level metadata
|
12 |
+
- Calls `space_info` to get the next level of depth from each space
|
13 |
+
- Stores this into a Dataset on the Hub - [jsulz/space-stats](https://huggingface.co/datasets/jsulz/space-stats)
|
14 |
+
- Inspiration for this came from [cfahlgren1/hub-stats](cfahlgren1/hub-stats), but I wanted one additional level of information (only available by making a lot of API calls)
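
The steps above can be sketched roughly as follows. This is a minimal sketch, not the linked `pipeline.py`: the function name `collect_space_stats` and the row fields are assumptions made for illustration, and `api` is expected to be an object that behaves like `huggingface_hub.HfApi` (exposing `list_spaces` and `space_info`).

```python
def collect_space_stats(api, limit=None):
    """Two-pass crawl: a cheap listing, then one detail call per Space.

    `api` is assumed to behave like `huggingface_hub.HfApi`; the row
    fields below are illustrative, not the real dataset schema.
    """
    rows = []
    for space in api.list_spaces(limit=limit):  # pass 1: high-level metadata
        info = api.space_info(space.id)         # pass 2: one extra call per Space
        runtime = getattr(info, "runtime", None)
        rows.append(
            {
                "id": info.id,
                "author": getattr(info, "author", None),
                "likes": getattr(info, "likes", None),
                "sdk": getattr(info, "sdk", None),
                # Hardware lives on the runtime object, which may be missing.
                "hardware": getattr(runtime, "hardware", None),
            }
        )
    return rows
```

With the real client this might be called as `collect_space_stats(HfApi(), limit=5)` to try it on a handful of Spaces. The per-Space `space_info` call in the second pass is what makes a full crawl slow.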
|
15 |
+
|
16 |
+
I want this to run on a semi-regular cadence, but I also have to respect that a full run takes in the realm of 12-15 hours (with some potential speedup from parallelization).
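
Since most of that time is presumably spent waiting on one network call per Space, a thread pool is one possible way to get the speedup mentioned above. The sketch below is an assumption, not something the current script does: `fetch_all` and `fetch_one` are hypothetical names, and `fetch_one` stands in for whatever makes the per-Space call (for example `HfApi().space_info`). How much this helps in practice would depend on API rate limits.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch_one, space_ids, max_workers=8):
    """Run `fetch_one(space_id)` for every id on a small thread pool.

    Results come back in the same order as `space_ids`; a failed call is
    recorded as None instead of aborting the whole run.
    """

    def safe_fetch(space_id):
        try:
            return fetch_one(space_id)
        except Exception:
            return None  # keep going; the caller can retry or drop these

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe_fetch, space_ids))
```

Keeping `max_workers` small is a deliberate choice here: the goal is to overlap network waits, not to hammer the Hub.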
|
17 |
+
|
18 |
+
This Space consumes that dataset into a Gradio app that has two tabs:
|
19 |
+
|
20 |
+
- Spaces Overview
|
21 |
+
- Spaces Search
|
22 |
+
|
23 |
+
The rest of this README is a breakdown of what's in the Space, tab by tab, along with my thoughts on each after doing some digging.
|
24 |
+
|
25 |
+
# General
|
26 |
+
|
27 |
+
All of this context needs to live in the app in some form alongside the components. Avoiding that for the moment.
|
28 |
+
|
29 |
+
All of the labels and words that do exist need cleanup. Not worried about that for the moment.
|
30 |
+
|
31 |
+
# Spaces Overview
|
32 |
+
|
33 |
+
Charts exist for the following (commentary for each in sub-bullets):
|
34 |
+
|
35 |
+
- Growth of Spaces over Time
|
36 |
+
- This is a line chart that shows the number of spaces created over time. Shows all Spaces, regardless of status.
|
37 |
+
- Distribution of Spaces by SDK
|
38 |
+
- This is a pie chart that shows the distribution of Spaces by SDK. The SDK can be one of gradio, streamlit, docker, or static.
|
39 |
+
- Distribution of Spaces by Emoji
|
40 |
+
- This is a pie chart that shows the distribution of Spaces by Emoji. This is a bit silly, but could be fun to work on this more to make it visually funny/appealing.
|
41 |
+
- Relationship between Number of Spaces Created and Number of Likes
|
42 |
+
- This is a scatter plot that shows the relationship between the number of spaces created by an author and the number of likes. Not very interesting except for the outliers.
|
43 |
+
- Relationship between Space Emoji and Number of Likes
|
44 |
+
- This is a scatter plot that shows the relationship between the emoji used in a space and the number of likes. Similar take as with the other scatter plot.
|
45 |
+
- Hardware in Use
|
46 |
+
- This is a log scale bar chart of hardware in use. More interesting stuff here.
|
47 |
+
- Most Popular Model Authors
|
48 |
+
- Bar chart of most popular model authors whose models are used in Spaces.
|
49 |
+
- Most Used Models
|
50 |
+
- Bar chart of most popular models used in Spaces.
|
51 |
+
- Most Popular Dataset Authors
|
52 |
+
- Bar chart of most popular dataset authors whose datasets are used in Spaces.
|
53 |
+
- Most Used Datasets
|
54 |
+
- Bar chart of most popular datasets used in Spaces.
|
55 |
+
- Number of Duplicates by Space
|
56 |
+
- Table showing the most duplicated Spaces.
|
57 |
+
- Number of Likes by Space
|
58 |
+
- Table showing the most liked Spaces.
|
59 |
+
- Number of Spaces by Author
|
60 |
+
- Table showing the most prolific Spaces authors.
|
61 |
+
- Number of Likes by Author
|
62 |
+
- Table showing the authors with the most cumulative likes across all Spaces.
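
The author tables at the end of this list boil down to two simple aggregations. Here is a minimal sketch of them in plain Python; the app itself appears to work on dataframes, so this is only an illustration, and the `author` and `likes` keys as well as both function names are assumptions.

```python
from collections import Counter


def spaces_by_author(spaces):
    """Count how many Spaces each author has created."""
    return Counter(space["author"] for space in spaces)


def likes_by_author(spaces):
    """Sum likes across all of an author's Spaces."""
    totals = Counter()
    for space in spaces:
        totals[space["author"]] += space["likes"]
    return totals


# Made-up sample rows, purely for illustration.
sample = [
    {"author": "alice", "likes": 10},
    {"author": "alice", "likes": 5},
    {"author": "bob", "likes": 7},
]

print(spaces_by_author(sample).most_common(1))  # [('alice', 2)]
print(likes_by_author(sample).most_common(1))   # [('alice', 15)]
```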
|
63 |
+
|
64 |
+
# Spaces Search
|
65 |
+
|
66 |
+
Filter options exist for the following (commentary for each in sub-bullets):
|
67 |
+
|
68 |
+
- Emojis
|
69 |
+
- Fun, not very useful.
|
70 |
+
- Likes
|
71 |
+
- Easy and helpful to see popular stuff.
|
72 |
+
- Authors
|
73 |
+
- Kinda fun, but so many authors with so little context.
|
74 |
+
- SDK/Tags
|
75 |
+
- Too many tags - lots of one-offs. Would maybe limit this to the top 10ish.
|
76 |
+
- Hardware
|
77 |
+
- More useful than I thought it would be.
|
78 |
+
- License
|
79 |
+
- Meh.
|
80 |
+
- Models
|
81 |
+
- Very cool, but lots of one-offs and not highly used. Would maybe limit this to the top 10ish.
|
82 |
+
- Datasets
|
83 |
+
- Same as models.
|
84 |
+
- Dev Mode
|
85 |
+
- The interesting thing about this is how little it's used.
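
Limiting the tag, model, and dataset filters to the "top 10ish" values, as suggested above, could look something like this. It is a minimal sketch under assumptions: `top_choices` is a hypothetical helper, and each Space is assumed to carry a list of values (tags, models, or datasets).

```python
from collections import Counter


def top_choices(values_per_space, n=10):
    """Return the n most frequent values across all Spaces.

    `values_per_space` is an iterable of lists, one list per Space.
    One-off values fall out naturally because they rank last.
    """
    counts = Counter(value for values in values_per_space for value in values)
    return [value for value, _ in counts.most_common(n)]
```

The returned list could then be passed as the `choices` of the corresponding dropdown instead of every distinct value in the dataset.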
|
app.py
CHANGED
@@ -72,6 +72,7 @@ def filtered_df(
|
|
72 |
filtered_models,
|
73 |
filtered_datasets,
|
74 |
space_licenses,
|
|
|
75 |
):
|
76 |
"""
|
77 |
Filter the dataframe based on the given criteria.
|
@@ -143,6 +144,10 @@ def filtered_df(
|
|
143 |
"r_licenses": "Licenses",
|
144 |
}
|
145 |
)
|
|
|
|
|
|
|
|
|
146 |
|
147 |
return _df[["URL", "Likes", "Models", "Datasets", "Licenses"]]
|
148 |
|
@@ -238,7 +243,7 @@ with gr.Blocks(fill_width=True) as demo:
|
|
238 |
emoji_likes,
|
239 |
x="id",
|
240 |
y="likes",
|
241 |
-
title="Relationship between Emoji and Number of Likes",
|
242 |
labels={"id": "Number of Spaces Created", "likes": "Number of Likes"},
|
243 |
hover_data={"emoji": True},
|
244 |
template="plotly_dark",
|
@@ -399,6 +404,7 @@ with gr.Blocks(fill_width=True) as demo:
|
|
399 |
multiselect=True,
|
400 |
)
|
401 |
|
|
|
402 |
clear = gr.ClearButton(components=[
|
403 |
emoji,
|
404 |
author,
|
@@ -426,6 +432,7 @@ with gr.Blocks(fill_width=True) as demo:
|
|
426 |
"r_models",
|
427 |
"r_datasets",
|
428 |
"r_licenses",
|
|
|
429 |
]
|
430 |
]
|
431 |
)
|
@@ -440,6 +447,7 @@ with gr.Blocks(fill_width=True) as demo:
|
|
440 |
models,
|
441 |
datasets,
|
442 |
space_license,
|
|
|
443 |
],
|
444 |
datatype="html",
|
445 |
wrap=True,
|
|
|
72 |
filtered_models,
|
73 |
filtered_datasets,
|
74 |
space_licenses,
|
75 |
+
filtered_devmode,
|
76 |
):
|
77 |
"""
|
78 |
Filter the dataframe based on the given criteria.
|
|
|
144 |
"r_licenses": "Licenses",
|
145 |
}
|
146 |
)
|
147 |
+
if filtered_devmode:
|
148 |
+
_df = _df[
|
149 |
+
_df["devMode"] == filtered_devmode
|
150 |
+
]
|
151 |
|
152 |
return _df[["URL", "Likes", "Models", "Datasets", "Licenses"]]
|
153 |
|
|
|
243 |
emoji_likes,
|
244 |
x="id",
|
245 |
y="likes",
|
246 |
+
title="Relationship between Space Emoji and Number of Likes",
|
247 |
labels={"id": "Number of Spaces Created", "likes": "Number of Likes"},
|
248 |
hover_data={"emoji": True},
|
249 |
template="plotly_dark",
|
|
|
404 |
multiselect=True,
|
405 |
)
|
406 |
|
407 |
+
devmode = gr.Checkbox(label="Show Dev Mode Spaces")
|
408 |
clear = gr.ClearButton(components=[
|
409 |
emoji,
|
410 |
author,
|
|
|
432 |
"r_models",
|
433 |
"r_datasets",
|
434 |
"r_licenses",
|
435 |
+
'devMode'
|
436 |
]
|
437 |
]
|
438 |
)
|
|
|
447 |
models,
|
448 |
datasets,
|
449 |
space_license,
|
450 |
+
devmode,
|
451 |
],
|
452 |
datatype="html",
|
453 |
wrap=True,
|