gperdrizet committed
Commit 9161371 · verified · 1 Parent(s): 887b877

Added parsed LinkedIn resume, GitHub repository list and job call

tests/test_data/github_repos.json ADDED
@@ -0,0 +1,580 @@
+ [
+ {
+ "name": "ds-12",
+ "description": "Course materials for 4Geeks Academy data science cohort 12",
+ "language": "Jupyter Notebook",
+ "stars": 3,
+ "forks": 1,
+ "updated_at": "2025-07-29T02:49:06Z",
+ "created_at": "2025-06-23T23:17:01Z",
+ "html_url": "https://github.com/gperdrizet/ds-12",
+ "topics": [
+ "data-science",
+ "python"
+ ],
+ "size": 5711,
+ "readme": "# ds-12\nCourse materials for ds-12\n\n1. [YouTube playlist](https://youtu.be/607QEWYZQpU?si=rBIrfjwxsHJk3xf4)\n2. [Module slides](https://github.com/gperdrizet/ds-12/blob/main/pages/slides.md)\n3. [Project solutions](https://github.com/gperdrizet/ds-12/blob/main/pages/solutions.md)\n4. [Data science project MVPs](https://github.com/gperdrizet/ds-12/blob/main/pages/MVPs.md)\n5. [Data science project template repo](https://github.com/gperdrizet/4Geeks_datascience_project)\n6. [How-to guides](https://github.com/gperdrizet/ds-12/blob/main/pages/guides.md)\n\n\n## Extras\n\n### 2025-07-23\n\nYou will need two statistical tests for tonight's assignment: the t-test and ANOVA. Both are in the SciPy stats module.\n\n1. [`ttest_ind`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html): t-test for means in two independent samples.\n2. [`f_oneway`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html): ANOVA for equivalence in means of two or more groups. Note: this test only tells you if one or more groups is significantly different than the others - not which group or groups!\n\n### 2025-07-18\n\nOpenAI just released their ChatGPT based agent yesterday - here are the details:\n\n- Press release/FAQ style overview: [ChatGPT agent](https://help.openai.com/en/articles/11752874-chatgpt-agent)\n- Full technical details: [ChatGPT Agent System Card](https://cdn.openai.com/pdf/839e66fc-602c-48bf-81d3-b21eacc3459d/chatgpt_agent_system_card.pdf)\n\n\n### 2025-07-16\n\nWhile we are on the 'math' portion of the course, one good, if a little obscure, Python library to know about is [SymPy](https://www.sympy.org/en/index.html). It does symbolic math in Python - including derivatives. We won't run into it often, but it's good to know it's out there in case you ever need it. Here's an example from the documentation - calculating the first derivative of a cosine function:\n\n```python\nimport sympy as sp\n\nx = sp.symbols('x')\nderivative = sp.diff(sp.cos(x), x)\n\nprint(f'First derivative: {derivative}')\n```\n```text\nFirst derivative: -sin(x)\n```\n\n\n### 2025-07-14\n\nAs promised, here is an 'extra' assignment which will walk you through hard-coding your own optimizer in Python to fit a linear model to toy data. Highly recommend taking a look - the assignment will give you a good 'gut' feeling for what is happening under the hood when we train machine learning models:\n\n[Linear Regression & Optimization Assignment](https://github.com/4GeeksAcademy/gperdrizet-optimization-bonus-assignment)\n\nThe 2024 Nobel prize in physics was awarded for early research which led to modern neural networks. The prize was shared between two researchers: John Hopfield, who invented the 'Hopfield network' and Geoffrey Hinton, who designed early gradient descent algorithms.\n\n1. [2024 Nobel Prize in Physics](https://www.nobelprize.org/prizes/physics/2024/popular-information/): description of the history and importance of the works\n2. [ADAM: A METHOD FOR STOCHASTIC OPTIMIZATION](https://arxiv.org/pdf/1412.6980): Scientific paper describing ADAM, one of the most common/popular optimization algorithms for training neural networks (note the publication year and the first author's affiliations!).\n\n\n### 2025-07-11\n\nInteresting further topic to read up on while we are learning about APIs: [Model Context Protocol](https://modelcontextprotocol.io/introduction). MCP was originally proposed by Anthropic, but is an open standard that anyone can use. It's basically a type of API designed for LLMs and agents to use. It standardizes communication between the model and data source, allowing a way to easily use and share tools for building agents. See also [A2A](https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/) (Google) and [ACP](https://www.ibm.com/think/topics/agent-communication-protocol) (IBM) - same idea, but for communication between agents.\n\n\n### 2025-07-02\n\nCool talk by Bohan Zhang of OpenAI's infrastructure team - covers their implementation of PostgreSQL and shows what is possible with a cutting edge, production grade SQL database at a top company: [OpenAI: Scaling PostgreSQL to the Next Level](https://www.pixelstech.net/article/1747708863-openai%3a-scaling-postgresql-to-the-next-level).\n\n\n### 2025-06-27\n\nUseful Pandas methods for the real estate data cleanup assignment:\n\n1. `.sort_values()` used to sort a dataframe\n2. `.unique()` & `.nunique()` used to get information about unique values in a dataframe/series\n3. `.isna()` checks for NaN (not a number) missing value placeholders\n4. `.dropna()` used to remove NaN (not a number) missing value placeholders from a dataframe or series\n\nYou can find more information about what these methods do and how to use them in the Pandas [DataFrame](https://pandas.pydata.org/docs/reference/frame.html) and [general function](https://pandas.pydata.org/docs/reference/general_functions.html) documentation.\n\nThere is a whole module about plotting coming up - but for now, a quick skim of the Matplotlib [hist](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html) documentation should be enough to complete the last question."
+ },
+ {
+ "name": "4Geeks_datascience_project",
+ "description": "Boilerplate repository for 4Geeks data science assignments to be completed in GitHub Codespaces.",
+ "language": "Jupyter Notebook",
+ "stars": 1,
+ "forks": 43,
+ "updated_at": "2025-07-28T20:21:12Z",
+ "created_at": "2025-03-03T15:16:14Z",
+ "html_url": "https://github.com/gperdrizet/4Geeks_datascience_project",
+ "topics": [],
+ "size": 25,
+ "readme": "# 4Geeks data science project boilerplate\n\nMinimal Python 3.11 repository for 4Geeks data science assignments. Several useful Python packages and VSCode extensions are installed on Codespace boot-up. Directories for models and data are created within the Codespace but excluded from tracking. The notebooks directory contains `notebook.ipynb`; run this notebook to verify the environment. It can then be deleted or renamed to use for your project.\n\n## 1. Set-up\n\nFork this repository by clicking the *Fork* button at the upper right. Make sure to set 4Geeks as the owner of the new fork - this way 4Geeks pays for your codespace usage. Then start a Codespace on your fork by clicking the green *Code* button and then '**+**' icon under Codespaces in the drop-down menu.\n\n## 2. Environment\n\n### 2.1. Repository structure\n\n```text\n.\n├──.devcontainer\n│ └── devcontainer.json\n│\n├── .gitignore\n├── LICENSE\n├── README.md\n├── data\n├── models\n├── notebooks\n│ └── notebook.ipynb\n│\n└── requirements.txt\n```\n\n### 2.2. Python\n**Base image**: [Python 3.11](https://github.com/devcontainers/images/tree/main/src/python)\n\nPackages installed via `requirements.txt`:\n\n1. [ipykernel 6.30.0](https://pypi.org/project/ipykernel/)\n2. [matplotlib 3.10.3](https://matplotlib.org/stable/index.html)\n3. [numpy 2.3.2](https://numpy.org/doc/stable/index.html)\n4. [pandas 2.3.1](https://pandas.pydata.org/docs/)\n5. [pyarrow 21.0.0](https://arrow.apache.org/docs/python/index.html)\n6. [scipy 1.16.1](https://scipy.org/)\n7. [scikit-learn 1.7.1](https://scikit-learn.org/stable/index.html)\n8. [seaborn 0.13.2](https://seaborn.pydata.org/)\n\nIf you need to install additional Python packages, you can do so via the terminal with: `pip install packagename`.\n\n### 2.3. VSCode extensions\n\nSpecified via `devcontainer.json`.\n\n1. [ms-python.python](https://marketplace.visualstudio.com/items?itemName=ms-python.python)\n2. [ms-toolsai.jupyter](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter)\n3. [streetsidesoftware.code-spell-checker](https://marketplace.visualstudio.com/items?itemName=streetsidesoftware.code-spell-checker)\n\nVSCode extensions can be added via the *Extensions* tab located on the activities panel at the left once inside the Codespace.\n"
+ },
+ {
+ "name": "codespace-spark-cluster",
+ "description": "Server node for GitHub Codespace Spark cluster.",
+ "language": "Shell",
+ "stars": 0,
+ "forks": 4,
+ "updated_at": "2025-07-19T00:36:57Z",
+ "created_at": "2025-03-06T17:01:19Z",
+ "html_url": "https://github.com/gperdrizet/codespace-spark-cluster",
+ "topics": [],
+ "size": 78,
+ "readme": "# Codespace Spark Cluster\n\nGitHub Codespace Spark cluster.\n"
+ },
+ {
+ "name": "unit-four-final-project",
+ "description": "HuggingFace Agents Course - Unit 4: Final Project",
+ "language": "Python",
+ "stars": 0,
+ "forks": 0,
+ "updated_at": "2025-07-05T01:30:55Z",
+ "created_at": "2025-06-25T00:07:35Z",
+ "html_url": "https://github.com/gperdrizet/unit-four-final-project",
+ "topics": [
+ "agents",
+ "ai",
+ "gaia",
+ "generative-ai",
+ "huggingface",
+ "llms"
+ ],
+ "size": 142,
+ "readme": "---\ntitle: Unit Four - Final Project\nsdk: gradio\nsdk_version: 5.25.2\napp_file: app.py\ncolorFrom: green\ncolorTo: gray\npinned: True\nhf_oauth: true\n# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.\nhf_oauth_expiration_minutes: 480\ntags:\n - smolagents\n - agent\n - smolagent\n - tool\n - agent-course\n---\n\nCheck out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference"
+ },
64
+ {
65
+ "name": "unit-two-frameworks",
66
+ "description": "HuggingFace Agents Course - Unit 2: Introduction to Agentic Frameworks",
67
+ "language": "Jupyter Notebook",
68
+ "stars": 0,
69
+ "forks": 0,
70
+ "updated_at": "2025-07-01T12:57:47Z",
71
+ "created_at": "2025-06-21T15:41:26Z",
72
+ "html_url": "https://github.com/gperdrizet/unit-two-frameworks",
73
+ "topics": [
74
+ "agents",
75
+ "ai",
76
+ "generative-ai",
77
+ "huggingface",
78
+ "langchain",
79
+ "langgraph",
80
+ "llms",
81
+ "smolagents"
82
+ ],
83
+ "size": 15461,
84
+ "readme": "# Unit two: frameworks for AI agents\n\nHuggingFace Agents Course - Unit 2: Introduction to Agentic Frameworks demonstration notebooks.\n\n- My main GitHub repository for the course: [HuggingFace agents course](https://github.com/gperdrizet/hf-agents-course).\n- Unit two introduction page on HuggingFace: [Introduction to Agentic Frameworks](https://huggingface.co/learn/agents-course/unit2/introduction)\n\n## Running\n\nTo run the notebooks, you need to provide the following credentials via environment variables. The method to do so will depend on the environment in which you are running (see below).\n\n1. `HF_TOKEN`: A HuggingFace access token with repository read/write and inference permission\n2. `LANGFUSE_PUBLIC_KEY`: A Langfuse public key\n3. `LANGFUSE_SECRET_KEY`: A Langfuse secret key\n4. `OPENAI_API_KEY`: An OpenAI API key\n5. `PHOENIX_API_KEY`: An Arise AI Phoenix API key\n\nAll of these can be generated using a free-tier account from the respective providers. **Note**: you don't need all keys for every notebook. If you are only interested in a specific notebook or notebooks, take a look at what keys are actually used before you set up every credential listed above.\n\nThere are two options to run the notebooks:\n\n### 1. GitHub codespace (recommended)\n\nFork a copy of the repository, then add the credentials mentioned above as codespace secrets: settings → Secrets and variables → Codespaces → New repository secret. Start a new codespace on main.\n\n### 2. Local\n\nClone the repository, create a virtual environment and install requirements.txt via pip. Provide the credentials mentioned above as environment variables. Note: for the vision agent to work, you need to have Chromium installed and chromium-webdriver configured properly.\n\n## Notebooks\n\n### 2.1. smolagents\n\n1. [Code Agents](https://github.com/gperdrizet/unit-two-frameworks/blob/main/2.1-smolagents/code_agents.ipynb)\n2. 
[Tool Calling Agents](https://github.com/gperdrizet/unit-two-frameworks/blob/main/2.1-smolagents/tool_calling_agents.ipynb)\n3. [Tools](https://github.com/gperdrizet/unit-two-frameworks/blob/main/2.1-smolagents/tools.ipynb)\n4. [Retrieval Agents](https://github.com/gperdrizet/unit-two-frameworks/blob/main/2.1-smolagents/retrieval_agents.ipynb)\n5. [Multiagents](https://github.com/gperdrizet/unit-two-frameworks/blob/main/2.1-smolagents/multiagent_notebook.ipynb)\n6. [Vision Agents](https://github.com/gperdrizet/unit-two-frameworks/blob/main/2.1-smolagents/vision_agents.ipynb)\n\n### 2.2. LLamaIndex\n\n### 2.3. LangGraph\n"
85
+ },
86
+ {
87
+ "name": "shit",
88
+ "description": null,
89
+ "language": null,
90
+ "stars": 1,
91
+ "forks": 0,
92
+ "updated_at": "2025-06-30T03:38:16Z",
93
+ "created_at": "2025-06-11T23:16:52Z",
94
+ "html_url": "https://github.com/gperdrizet/shit",
95
+ "topics": [],
96
+ "size": 1,
97
+ "readme": "# Shit\n"
98
+ },
+ {
+ "name": "unit-one-introduction",
+ "description": "HuggingFace Agents Course Unit 1: Introduction to Agents",
+ "language": "Python",
+ "stars": 1,
+ "forks": 0,
+ "updated_at": "2025-06-25T01:17:14Z",
+ "created_at": "2025-06-18T18:59:53Z",
+ "html_url": "https://github.com/gperdrizet/unit-one-introduction",
+ "topics": [
+ "agents",
+ "ai",
+ "huggingface",
+ "llms",
+ "smolagents"
+ ],
+ "size": 123,
+ "readme": "---\ntitle: Unit one - first agent\ncolorFrom: green\ncolorTo: gray\nsdk: gradio\nsdk_version: 5.23.1\napp_file: app.py\npinned: false\ntags:\n- smolagents\n- agent\n- smolagent\n- tool\n- agent-course\n---\n\nCheck out the configuration reference at [spaces-config-reference](https://huggingface.co/docs/hub/spaces-config-reference).\n\n# Unit one project: first agent using smolagents\n\nHands-on tutorial - create a simple agent using smolagents.\n\n- My main GitHub repository for the course: [HuggingFace agents course](https://github.com/gperdrizet/hf-agents-course).\n- Unit one tutorial page on HuggingFace: [Let’s Create Our First Agent Using smolagents](https://huggingface.co/learn/agents-course/unit1/tutorial)\n\n## Features\n\n1. Multi-turn agent with [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) using Gradio and smolagents\n2. Image generation using [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) from Black Forest Labs\n3. Text to speech using [Chatterbox](https://huggingface.co/ResembleAI/chatterbox) from Resemble AI\n4. Web search/site crawling\n5. Time-zone look-up\n\n## Running\n\nFrom your HuggingFace settings dashboard, create a fine-grained access token with inference permissions.\n\n### 1. HuggingFace spaces\n\n[Unit one project: smolagents](https://huggingface.co/spaces/gperdrizet/unit-one-smolagents)\n\nMake your own copy of the space and add your HuggingFace token as `HF_TOKEN` via: settings → Secrets and variables → New secret.\n\n### 2. GitHub codespace\n\n[Unit one project: smolagents](https://github.com/gperdrizet/unit-one-introduction/tree/main)\n\nFork a copy of the repository, then add your HuggingFace token as `HF_TOKEN` via: settings → Secrets and variables → Codespaces → New repository secret. Start a new codespace on main.\n"
+ },
+ {
+ "name": "hf-agents-course",
+ "description": "HuggingFace Agents Course: build and deploy AI agents.",
+ "language": null,
+ "stars": 0,
+ "forks": 0,
+ "updated_at": "2025-06-25T00:24:30Z",
+ "created_at": "2025-06-18T17:56:46Z",
+ "html_url": "https://github.com/gperdrizet/hf-agents-course",
+ "topics": [
+ "agents",
+ "huggingface",
+ "llms"
+ ],
+ "size": 28,
+ "readme": "# HuggingFace Agents Course\n\n[Course home page](https://huggingface.co/learn/agents-course/unit0/introduction)\n\n## Syllabus\n\n| Chapter | Topic | Description |\n|---------|-------|-------------|\n| 0 | [Welcome to the course](https://huggingface.co/learn/agents-course/unit0/onboarding) | Set you up with the tools and platforms that you will use. |\n| 1 | [Introduction to agents](https://huggingface.co/learn/agents-course/unit1/introduction) | Explain Tools, Thoughts, Actions, Observations, and their formats. Explain LLMs, messages, special tokens and chat templates. Show a simple use case using python functions as tools. |\n| 1-bonus | [Fine-tuning an LLM for function calling](https://huggingface.co/learn/agents-course/bonus-unit1/introduction) | Let’s use LoRa and fine-tune a model to perform function calling inside a notebook. |\n| 2 | [Frameworks for AI agents](https://huggingface.co/learn/agents-course/unit2/introduction) | Understand how the fundamentals are implemented in popular libraries: smolagents, LangGraph, LLamaIndex |\n| 2.1 | [The smolagents framework](https://huggingface.co/learn/agents-course/unit2/smolagents/introduction) | |\n| 2.2 | [The LLamaIndex framework](https://huggingface.co/learn/agents-course/unit2/llama-index/introduction) | |\n| 2.3 | [The LangGraph framework](https://huggingface.co/learn/agents-course/unit2/langgraph/introduction) | |\n| 2-bonus | [Agent Observability and Evaluation](https://huggingface.co/learn/agents-course/bonus-unit2/introduction) | Learn how to trace and evaluate your AI agents to make them ready for production. |\n| 3 | [Use Cases for Agentic Rag](https://huggingface.co/learn/agents-course/unit3/agentic-rag/introduction) | Let’s build some real life use cases (open to PRs 🤗 from experienced Agent builders) |\n| 3-bonus | [Agents in Games with Pokemon](https://huggingface.co/learn/agents-course/bonus-unit3/introduction) | |\n| 4 | [Final Assignment](https://huggingface.co/learn/agents-course/unit4/introduction) | Build an agent for a selected benchmark and prove your understanding of Agents on the student leaderboard 🚀 |\n"
+ },
+ {
+ "name": "MCP-hackathon",
+ "description": "RASS (retrieval augmented simple syndication): MCP tools for RSS feeds and agentic RSS feed reader demo.",
+ "language": null,
+ "stars": 4,
+ "forks": 1,
+ "updated_at": "2025-06-14T17:58:37Z",
+ "created_at": "2025-06-03T15:47:30Z",
+ "html_url": "https://github.com/gperdrizet/MCP-hackathon",
+ "topics": [
+ "agents",
+ "anthropic",
+ "gradio",
+ "huggingface",
+ "llms",
+ "mcp",
+ "modal",
+ "rss"
+ ],
+ "size": 210,
+ "readme": ""
+ },
+ {
+ "name": "rss-mcp-client",
+ "description": "LLM agent RSS feed reader client using Model Context Protocol.",
+ "language": "Python",
+ "stars": 0,
+ "forks": 0,
+ "updated_at": "2025-06-13T16:27:38Z",
+ "created_at": "2025-06-03T16:18:56Z",
+ "html_url": "https://github.com/gperdrizet/rss-mcp-client",
+ "topics": [
+ "agents",
+ "anthropic",
+ "gradio",
+ "huggingface-spaces",
+ "mcp",
+ "mcp-client",
+ "rss",
+ "rss-reader"
+ ],
+ "size": 86,
+ "readme": ""
+ },
+ {
+ "name": "rss-mcp-server",
+ "description": "RSS feed reader Model Context Protocol server.",
+ "language": "Python",
+ "stars": 2,
+ "forks": 0,
+ "updated_at": "2025-06-12T02:18:35Z",
+ "created_at": "2025-06-03T16:21:25Z",
+ "html_url": "https://github.com/gperdrizet/rss-mcp-server",
+ "topics": [
+ "gradio",
+ "huggingface",
+ "huggingface-spaces",
+ "mcp",
+ "mcp-server",
+ "rss"
+ ],
+ "size": 111,
+ "readme": ""
+ },
199
+ {
200
+ "name": "GCSB_MLE",
201
+ "description": "Google Cloud Skills Boost Machine Learning Engineer Learning Path.",
202
+ "language": "Jupyter Notebook",
203
+ "stars": 1,
204
+ "forks": 0,
205
+ "updated_at": "2025-06-12T00:43:20Z",
206
+ "created_at": "2024-10-23T12:13:10Z",
207
+ "html_url": "https://github.com/gperdrizet/GCSB_MLE",
208
+ "topics": [],
209
+ "size": 8308,
210
+ "readme": "# GCSB_MLE\n\nThis repository will be used to track and document my progress through the [Google Cloud Skills Boost Machine Learning Engineer Learning Path](https://www.cloudskillsboost.google/paths/17). Each course in the learning path listed below is associated with an issue and a GitHub project is used to track overall progress. Work for each section is completed on a branch which is merged and closed upon completion.\n\n**Note:** The section numbering below follows that given in the [study guide](https://github.com/gperdrizet/GCSB_MLE/blob/main/course_introduction_materials/machine_learning_engineer_study_guide.pdf) where the first two introductory sections listed on the [learning path page](https://www.cloudskillsboost.google/paths/17) are not included in the numbering.\n\n## Learning path outline\n\n### [Course 01. Introduction to AI and Machine Learning on Google Cloud (8 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/593)\n\n- ~~**Module 1**: AI Foundations on Google Cloud~~\n- ~~**Module 2**: AI Development on Google Cloud~~\n- ~~**Module 3**: ML Workflow and Vertex AI~~\n- ~~**Module 4**: Generative AI on Google Cloud~~\n\n### [Course 02. Prepare Data for ML APIs on Google Cloud (6.5 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/631)\n\n- ~~**Lab 1**: Vertex AI: Qwik Start~~\n- ~~**Lab 2**: Dataprep: Qwik Start~~\n- ~~**Lab 3**: Dataflow: Qwik Start - Templates~~\n- ~~**Lab 4**: Dataflow: Qwik Start - Python~~\n- ~~**Lab 5**: Dataproc: Qwik Start - Console~~\n- ~~**Lab 6**: Dataproc: Qwik Start - Command Line~~\n- ~~**Lab 7**: Cloud Natural Language API: Qwik Start~~\n- ~~**Lab 8**: Speech-to-Text API: Qwik Start~~\n- ~~**Lab 9**: Video Intelligence: Qwik Start~~\n- ~~**Lab 10**: Prepare Data for ML APIs on Google Cloud: Challenge Lab~~\n\n### [Course 03. 
Working with Notebooks in Vertex AI (0.75 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/923)\n\n**Mini-course**: 8 lessons\n\n- ~~**Lesson 1**: Working with Notebooks in Vertex AI~~\n- ~~**Lesson 2**: Vertex AI Notebook Solutions~~\n- ~~**Lesson 3**: Vertex AI Colab Enterprise notebooks~~\n- ~~**Lesson 4**: Vertex AI Workbench instance notebooks~~\n- ~~**Summary**~~\n- ~~**Quiz**: Working with Notebooks in Vertex AI~~\n- ~~**Lab 1**: Exploratory Data Analysis using Bigquery and Colab Enterprise (2 hrs)~~\n- ~~**Lab 2**: Exploratory Data Analysis using Bigquery and Workbench Instances (2 hrs)~~\n\n### [Course 04. Create ML Models with BigQuery ML (5.5 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/626)\n\n- **Lab 1**: ~~Getting Started with BigQuery ML~~\n- **Lab 2**: ~~Predict Visitor Purchases with a Classification Model in BigQuery ML~~\n- **Lab 3**: ~~Predict Taxi Fare with a BigQuery ML Forecasting Model~~\n- **Lab 4**: ~~Bracketology with Google Machine Learning~~\n- **Lab 5**: ~~Create ML Models with BigQuery ML: Challenge Lab~~\n\n### [Course 05. Engineer Data for Predictive Modeling with BigQuery ML (4.25 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/627)\n\n- **Lab 1**: ~~Creating a Data Transformation Pipeline with Cloud Dataprep~~\n- **Lab 2**: ~~ETL Processing on Google Cloud Using Dataflow and BigQuery (Python)~~\n- **Lab 3**: ~~Predict Visitor Purchases with a Classification Model in BigQuery ML~~\n- **Lab 4**: ~~Engineer Data for Predictive Modeling with BigQuery ML: Challenge Lab~~\n\n### [Course 06. 
Feature Engineering (24 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/11)\n\n- **Module 1**: ~~Introduction to Vertex AI Feature Store~~\n- **Module 2**: ~~Raw Data to Features~~\n- **Module 3**: ~~Feature Engineering~~\n- **Module 4**: ~~Preprocessing and Feature Creation~~\n- **Module 5**: ~~Feature Crosses: TensorFlow Playground~~\n- **Module 6**: ~~Introduction to TensorFlow Transform~~\n\n### [Course 07. Build, Train and Deploy ML Models with Keras on Google Cloud (15.5 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/12)\n\n- **Module 1**: Introduction to the TensorFlow Ecosystem\n- **Module 2**: Design and Build an Input Data Pipeline\n- **Module 3**: Building Neural Networks with the TensorFlow and Keras API\n- **Module 4**: Training at Scale with Vertex AI\n\n### [Course 08. Production Machine Learning Systems (16 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/17)\n\n- **Module 1**: Architecting Production ML System\n- **Module 2**: Designing Adaptable ML System Designing High-Performance ML Systems\n- **Module 3**: Designing High-Performance ML Systems\n- **Module 4**: Hybrid ML Systems\n- **Module 5**: Troubleshooting ML Production Systems\n\n### [Course 09. Machine Learning Operations (MLOps): Getting Started (8 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/158)\n\n- **Module 1**: Employing Machine Learning Operations\n- **Module 2**: Vertex AI and MLOps on Vertex AI\n\n### [Course 10. Machine Learning Operations (MLOps) with Vertex AI: Manage Features (8 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/584)\n\n- **Module 1**: Introduction to Vertex AI Feature Store\n- **Module 2**: An In-Depth Look\n\n### [Course 11. Introduction to Generative AI (0.75 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/536)\n\n- **Mini-course**: 1 lesson\n\n### [Course 12. 
Introduction to Large Language Models (0.5 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/539)\n\n- **Mini-course**: 1 lesson\n\n### [Course 13. Machine Learning Operations (MLOps) for Generative AI (0.5 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/927)\n\n- **Mini Course**: 5 lessons\n\n### [Course 14. Machine Learning Operations (MLOps) with Vertex AI: Model Evaluation (2.5 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/1080)\n\n- **Module 1**: Introduction to Model Evaluation\n- **Module 2**: Model Evaluation for Generative AI\n\n### [Course 15. ML Pipelines on Google Cloud (2.25 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/191)\n\n- **Module 1**: Introduction to TFX Pipelines\n- **Module 2**: Pipeline Orchestration with TFX\n- **Module 3**: Custom Components and CI/CD for TFX Pipelines\n- **Module 4**: ML Metadata with TFX\n- **Module 5**: Continuous Training with Multiple SDKs, KubeFlow & AI Platform Pipelines\n- **Module 6**: Continuous Training with Cloud Composer\n- **Module 7**: ML Pipelines with MLflow\n\n### [Course 16. Build and Deploy Machine Learning Solutions on Vertex AI (8.25 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/684)\n\n- **Lab 1**: Vertex AI: Qwik Start\n- **Lab 2**: Identify Damaged Car Parts with Vertex AutoML Vision\n- **Lab 3**: Deploy a BigQuery ML Customer Churn Classifier to Vertex AI for Online Predictions\n- **Lab 4**: Vertex Pipelines: Qwik Start\n- **Lab 5**: Build and Deploy Machine Learning Solutions with Vertex AI: Challenge Lab\n\n### [Course 17. Create Generative AI Applications on Google Cloud (4 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/1120)\n\n- **Module 1**: Generative AI Applications\n- **Module 2**: Prompts\n- **Module 3**: Retrieval Augmented Generation (RAG)\n\n### [Course 18. 
Responsible AI for Developers: Fairness and Bias (4 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/985)\n\n- **Module 1**: AI Interpretability and Transparency\n- **Module 2**: Modernizing Infrastructure in the Cloud\n\n### [Course 19. Responsible AI for Developers: Interpretability and Transparency (3 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/989)\n\n- **Module 1**: AI Interpretability and Transparency\n- **Module 2**: Modernizing Infrastructure in the Cloud\n\n### [Course 20. Responsible AI for Developers: Privacy and Safety (5 hours)](https://www.cloudskillsboost.google/paths/17/course_templates/1036)\n\n- **Module 1**: AI Privacy\n- **Module 2**: AI Safety\n"
211
+ },
+ {
+ "name": "OpenSearch",
+ "description": "Wikipedia full text search with OpenSearch vector database.",
+ "language": "Python",
+ "stars": 1,
+ "forks": 0,
+ "updated_at": "2025-06-12T00:42:44Z",
+ "created_at": "2024-04-03T23:17:05Z",
+ "html_url": "https://github.com/gperdrizet/OpenSearch",
+ "topics": [],
+ "size": 1693,
+ "readme": ""
+ },
225
+ {
226
+ "name": "llm_detector",
227
+ "description": "Synthetic text detection service. Google Cloud for Startups grant winner.",
228
+ "language": "Python",
229
+ "stars": 2,
230
+ "forks": 0,
231
+ "updated_at": "2025-06-12T00:42:04Z",
232
+ "created_at": "2024-06-21T14:26:15Z",
233
+ "html_url": "https://github.com/gperdrizet/llm_detector",
234
+ "topics": [
235
+ "generated-text-detection",
236
+ "llms",
237
+ "machine-learning",
238
+ "xgboost"
239
+ ],
240
+ "size": 84850,
241
+ "readme": "# Ask Agatha: synthetic text detection service\n\n## News\n\n**2024-08-27**: Malone (now agatha) has joined the [Google Cloud for Startups](https://cloud.google.com/startup) program! Lot's of excitement here - this success provides significant recognition and compute resources to the project. For now, the only visible change will be a rename of the project to 'Ask Agatha', with the model being colloquially referred to as 'agatha'. The LLM detector is still avalible on telegram via [@ask_agatha_bot](https://t.me/ask_agatha_bot). Please direct any inquiries to <[email protected]>.\n\n**2024-08-17**: Malone is temporarily off-line so that compute resources can be dedicated to benchmarking and improvements to the classifier. Check out what is going on in the [benchmarking](https://github.com/gperdrizet/llm_detector/tree/classifier/benchmarking/notebooks) and [classifier](https://github.com/gperdrizet/llm_detector/tree/classifier/classifier/notebooks) notebooks on the classifier branch. If you would really like to try malone out, get in touch and I will fire it up for you.\n\n**2024-08-07**: Malone was just named a Backdrop Build v5 Finalist! Check out the build page [here](https://backdropbuild.com/builds/cadmus)! Let's gooooo!\n\n**2024-08-01**: Backdrop build v5 [launch video](https://youtu.be/6zdLcsC9I_I?si=R6knOnxMySDIRKDQ) is up on YouTube. Congrats to all of the other Backdrop Build finishers!\n\n**2024-07-30**: Malone is live in Beta on Telegram, give it a try [here](https://t.me/the_malone_bot). Note: some Firefox users have reported issues with the botlink page - seems to be a Telegram issue, not a malone issue. You can also find malone by messaging '*/start*' to @the_malone_bot anywhere you use Telegram.\n\n**2024-07-08**: llm_detector is officially part of the Backdrop Build v5 cohort under the tentative name 'malone' starting today. 
Check out the backdrop [build page](https://backdropbuild.com/builds/v5/cadmus) for updates.\n\n## Project description\n\n![agatha](https://github.com/gperdrizet/llm_detector/blob/main/telegram_bot/assets/agatha_A.jpg?raw=true)\n\nAgatha is a synthetic text detection service available on [Telegram Messenger](https://telegram.org/), written in Python using [HuggingFace](https://huggingface.co), [scikit-learn](https://scikit-learn.org/stable/), [XGBoost](https://github.com/dmlc/xgboost), [Luigi](https://github.com/spotify/luigi) and [python-telegram-bot](https://github.com/python-telegram-bot/python-telegram-bot), supported by [Flask](https://flask.palletsprojects.com/en/3.0.x), [Celery](https://docs.celeryq.dev/en/stable/index.html), [Redis](https://redis.io/) & [Docker](https://www.docker.com/) and served via [Gunicorn](https://gunicorn.org/) and [Nginx](https://nginx.org/). Malone uses an in-house trained gradient boosting classifier to estimate the probability that a given text was generated by an LLM. It uses a set of engineered features derived from the input text, for more details see the [feature engineering notebooks](https://github.com/gperdrizet/llm_detector/tree/main/classifier/notebooks).\n\n## Table of Contents\n\n1. Features\n2. Where to find agatha\n3. Usage\n4. Performance\n5. Demonstration/experimentation notebooks\n6. About the author\n7. Disclaimer\n\n## 1. Features\n\n- **Easily accessible** - use it anywhere you can access Telegram: iOS or Android apps and any web browser.\n- **Simple interface** - no frills, just send the bot text and it will send back the probability that the text was machine generated.\n- **Useful and accurate** - provides a probability that text is synthetic, allowing users to make their own decisions when evaluating content. 
Maximum likelihood classification accuracy ~98% on held-out test data.\n- **Model agnostic** - agatha is not trained to detect the output of a specific LLM, instead, it uses a gradient boosting classifier and a set of numerical features derived from/calibrated on a large corpus of human and synthetic text samples from multiple LLMs.\n- **No logs** - no user data or message contents are ever persisted to disk.\n- **Open source codebase** - agatha is an open source project. Clone it, fork it, extend it, modify it, host it yourself and use it the way you want to use it.\n- **Free**\n\n## 2. Where to find agatha\n\nAgatha is publicly available on Telegram. You can find agatha via the [Telegram bot page](https://t.me/ask_agatha_bot), or just message @ask_agatha_bot with '/*start*' to start using it.\n\nThere are also plans in the works to offer the bare API to interested parties. If that's you, see section 6 below.\n\n## 3. Usage\n\nTo use agatha you will need a Telegram account. Telegram is free to use and available as an app for iOS and Android. There is also a web version for desktop use.\n\nOnce you have a Telegram account, agatha is simple to use. Send the bot any 'suspect' text and it will reply with the probability that the text in question was written by a human or generated by an LLM. For smartphone use, a good trick is long press on 'suspect' text and then share it to agatha's contact on Telegram via the context menu. Agatha is never more that 2 taps away!\n\n![telegram app screenshot](https://github.com/gperdrizet/llm_detector/blob/main/telegram_bot/assets/telegram_screenshot.jpg?raw=true)\n\nAgatha can run in two response modes: 'default' and 'verbose'. Default mode returns the probability associated with the most likely class as a percent (e.g. 75% chance a human wrote this). Verbose mode gives a little more detail about the feature values and prediction metrics. 
Set the mode by messaging '*/set_mode verbose*' or '*/set_mode default*'.\n\nFor best results, submitted text must be between 50 and 500 words.\n\n## 4. Performance\n\nAgatha is >~97.5% accurate on hold-out test data depending on the submitted text length. (see example confusion matrix below). Classification accuracy is lowest on short text and best on text >= 150 words. The miss-classified examples are more or less evenly split between false negatives and false positives.\n\n![XGBoost confusion matrix](https://github.com/gperdrizet/llm_detector/blob/main/classifier/notebooks/figures/05.8.4.5-performance_benchmark_confusion_matrix.jpg)\n\nFor more details on the classifier training and performance see the following notebooks:\n\n1. [Stage I length binned classifier](https://github.com/gperdrizet/llm_detector/blob/main/classifier/notebooks/05.4-stage_one_length_binned_classifier.ipynb)\n2. [Stage II length binned classifier](https://github.com/gperdrizet/llm_detector/blob/main/classifier/notebooks/05.6-stage_two_length_binned_classifier.ipynb)\n3. [v2.0 classifier finalized](https://github.com/gperdrizet/llm_detector/blob/main/classifier/notebooks/05.8-classifier_finalized_v2.0.ipynb)\n\n## 5. Demonstration/experimentation notebooks\n\nThese notebooks are the best way to understand the approach and the engineered features used to train the classifier.\n\n1. [Perplexity ratio data](https://github.com/gperdrizet/llm_detector/blob/main/classifier/notebooks/01.1-perplexity_ratio_data_exploration.ipynb)\n2. [Perplexity ratio score](https://github.com/gperdrizet/llm_detector/blob/main/classifier/notebooks/03.1-perplexity_ratio_score.ipynb)\n3. [TF-IDF score](https://github.com/gperdrizet/llm_detector/blob/main/classifier/notebooks/04.1-TF-IDF_score.ipynb)\n\n## 6. About the author\n\nMy name is Dr. George Perdrizet, I am a biochemistry & molecular biology PhD seeking a career step from academia to professional data science and/or machine learning engineering. 
This project was conceived from the scientific literature and built solo over the course of a few weeks - I strongly believe that I have a lot to offer the right organization. If you or anyone you know is interested in an ex-researcher from University of Chicago turned builder and data scientist, please reach out, I'd love to learn from and contribute to your project.\n\n- **Email**: <[email protected]>\n- **LinkedIn**: [linkedin.com/gperdrizet](https://www.linkedin.com/in/gperdrizet/)\n\n## 7. Disclaimer\n\nAgatha is an experimental research project meant for educational, informational and entertainment purposes only. All predictions are probabilistic in nature and subject to stochastic errors. Text classifications, no matter how high or low the reported probability, should not be interpreted as definitive proof of authorship or lack thereof.\n"
+ },
+ {
+ "name": "ensembleswarm",
+ "description": "Utility for regression on tabular data, implementing ensemble of ensembles with various SciKit-learn estimators.",
+ "language": "Python",
+ "stars": 1,
+ "forks": 0,
+ "updated_at": "2025-05-30T22:16:29Z",
+ "created_at": "2025-05-13T14:44:55Z",
+ "html_url": "https://github.com/gperdrizet/ensembleswarm",
+ "topics": [
+ "ensemble",
+ "machine-learning",
+ "regression"
+ ],
+ "size": 9348,
+ "readme": "# EnsembleSwarm\n\n[![PyPI release](https://github.com/gperdrizet/ensembleswarm/actions/workflows/publish_pypi.yml/badge.svg)](https://github.com/gperdrizet/ensembleswarm/actions/workflows/publish_pypi.yml) [![Python CI](https://github.com/gperdrizet/ensembleswarm/actions/workflows/python_ci.yml/badge.svg)](https://github.com/gperdrizet/ensembleswarm/actions/workflows/python_ci.yml)[![Devcontainer](https://github.com/gperdrizet/ensembleswarm/actions/workflows/codespaces/create_codespaces_prebuilds/badge.svg)](https://github.com/gperdrizet/ensembleswarm/actions/workflows/codespaces/create_codespaces_prebuilds)\n\nUtility for regression on tabular data, implementing ensembles of ensembles with various SciKit-learn estimators.\n\n## 1. Installation\n\nInstall the pre-release alpha from PyPI with:\n\n```bash\npip install ensembleswarm\n```\n"
+ },
+ {
+ "name": "postit",
+ "description": "Text summarization app.",
+ "language": "Python",
+ "stars": 0,
+ "forks": 0,
+ "updated_at": "2025-05-30T18:09:51Z",
+ "created_at": "2025-05-28T20:33:41Z",
+ "html_url": "https://github.com/gperdrizet/postit",
+ "topics": [],
+ "size": 25198,
+ "readme": ""
+ },
+ {
+ "name": "ensembleset",
+ "description": "Ensemble dataset generator for tabular data prediction and modeling projects.",
+ "language": "Python",
+ "stars": 1,
+ "forks": 0,
+ "updated_at": "2025-05-23T06:30:07Z",
+ "created_at": "2025-05-02T12:03:19Z",
+ "html_url": "https://github.com/gperdrizet/ensembleset",
+ "topics": [
+ "classification",
+ "ensemble",
+ "feature-engineering",
+ "machine-learning",
+ "regression",
+ "scikit-learn"
+ ],
+ "size": 9289,
+ "readme": "# EnsembleSet\n\n[![PyPI release](https://github.com/gperdrizet/ensembleset/actions/workflows/publish_pypi.yml/badge.svg)](https://github.com/gperdrizet/ensembleset/actions/workflows/publish_pypi.yml) [![Python CI](https://github.com/gperdrizet/ensembleset/actions/workflows/python_ci.yml/badge.svg)](https://github.com/gperdrizet/ensembleset/actions/workflows/python_ci.yml)[![Devcontainer](https://github.com/gperdrizet/ensembleset/actions/workflows/codespaces/create_codespaces_prebuilds/badge.svg)](https://github.com/gperdrizet/ensembleset/actions/workflows/codespaces/create_codespaces_prebuilds)\n\nEnsembleSet generates dataset ensembles by applying a randomized sequence of feature engineering methods to a randomized subset of input features.\n\n## 1. Installation\n\nInstall the pre-release alpha from PyPI with:\n\n```bash\npip install ensembleset\n```\n\n## 2. Usage\n\nSee the [example usage notebook](https://github.com/gperdrizet/ensembleset/blob/main/examples/regression_calorie_burn.ipynb).\n\nInitialize an EnsembleSet class instance, passing in the label name and training DataFrame. Optionally, include a test DataFrame and/or list of any string features and the path where you want EnsembleSet to put data. Then call the `make_datasets()` to generate an EnsembleSet, specifying:\n\n1. The number of individual datasets to generate.\n2. The fraction of features to randomly select for each feature engineering step.\n3. 
The number of feature engineering steps to run.\n\n```python\nimport ensembleset.dataset as ds\n\ndata_ensemble=ds.DataSet(\n label='label_column_name', # Required\n train_data=train_df, # Required\n test_data=test_df, # Optional, defaults to None\n string_features=['string_feature_column_names'], # Optional, defaults to None\n data_directory='path/to/ensembleset/data' # Optional, defaults to ./data\n)\n\ndata_ensemble.make_datasets(\n n_datasets=10, # Required\n fraction_features=0.1, # Required\n n_steps=5 # Required\n)\n```\n\nThe above call to `make_datasets()` will generate 10 different datasets using a random sequence of 5 feature engineering techniques applied to a randomly selected 10% of features. The feature selection is re-calculated after each feature engineering step. Each feature engineering step is applied to the test set if one is provided with a minimum of data leakage (e.g. gaussian KDE is calculated from training data only and then applied to training and testing data).\n\nBy default, generated datasets will be saved to HDF5 in `data/dataset.h5` using the following structure:\n\n```text\ndataset.h5\n├──train\n│ ├── labels\n| ├── 1\n| ├── .\n| ├── .\n| ├── .\n| └── n\n│\n└──test\n ├── labels\n ├── 1\n ├── .\n ├── .\n ├── .\n └── n\n```\n\n## 3. Feature engineering\n\nThe currently implemented pool of feature engineering methods are:\n\n1. **One-hot encoding** for string features\n2. **Ordinal encoding** for string features\n3. **Log features** with bases 2, e or 10\n4. **Ratio features**\n5. **Exponential features** with base 2 or e\n6. **Sum features** with 2, 3, or 4\n7. **Difference features** with 2, 3 or 4 subtrahends\n8. **Polynomial features** with degree 2 or 3\n9. **Spline features** with degree 2, 3 or 4\n10. **Quantized features** with using randomly selected k-bins\n11. **Smoothed features** with gaussian kernel density estimation\n\nMajor feature engineering parameters are also randomly selected for each step.\n\n"
+ },
+ {
+ "name": "ds9-course-materials",
+ "description": "Extra course materials for 4Geeks data science bootcamp cohort 9.",
+ "language": "Jupyter Notebook",
+ "stars": 1,
+ "forks": 3,
+ "updated_at": "2025-05-09T22:26:01Z",
+ "created_at": "2025-02-28T19:36:22Z",
+ "html_url": "https://github.com/gperdrizet/ds9-course-materials",
+ "topics": [],
+ "size": 3551,
+ "readme": ""
+ },
+ {
+ "name": "longer-limbs",
+ "description": "Wrapper module for SciKit-learn tree-based estimators, falls back to linear regression for predictions outside of training data range.",
+ "language": "Python",
+ "stars": 1,
+ "forks": 0,
+ "updated_at": "2025-05-07T23:25:51Z",
+ "created_at": "2025-05-06T12:49:05Z",
+ "html_url": "https://github.com/gperdrizet/longer-limbs",
+ "topics": [],
+ "size": 540,
+ "readme": "# longer-limbs\nWrapper for SciKit-learn tree-based estimators providing linear regression fallback for inputs outside of training data range.\n\n## Instructions\n\nInstall longer-limbs with:\n\n```bash\npip install longer-limbs\n```\n\nLonger-limbs wraps SciKit-learn's `GradientBoostingRegressor()`. It offers identical `.fit()` and `.predict()` methods. To adapt code which currently uses pure SciKit-learn, change the import of `GradientBoostingRegressor()` from:\n\n```python\nfrom sklearn.ensemble import GradientBoostingRegressor\n```\n\nto:\n\n```python\nfrom longer_limbs.regressors import GradientBoostingRegressor\n```\n\n## Usage\n\nSee the [example regression notebook](https://github.com/gperdrizet/longer-limbs/blob/main/examples/regression.ipynb) for usage demonstration and comparison to SciKit-learn."
+ },
+ {
+ "name": "image-classification",
+ "description": "Image classification with convolutional neural networks in TensorFlow.",
+ "language": "Jupyter Notebook",
+ "stars": 0,
+ "forks": 0,
+ "updated_at": "2025-04-04T02:20:41Z",
+ "created_at": "2025-04-04T00:22:23Z",
+ "html_url": "https://github.com/gperdrizet/image-classification",
+ "topics": [],
+ "size": 8777,
+ "readme": ""
+ },
+ {
+ "name": "SQL_client_server",
+ "description": "Demonstration of SQL client server interactions using GitHub Codespaces.",
+ "language": null,
+ "stars": 0,
+ "forks": 0,
+ "updated_at": "2025-03-17T02:09:02Z",
+ "created_at": "2025-03-17T02:08:36Z",
+ "html_url": "https://github.com/gperdrizet/SQL_client_server",
+ "topics": [],
+ "size": 15,
+ "readme": "# SQL client server\nDemonstration of SQL client server interactions using GitHub Codespaces.\n"
+ },
+ {
+ "name": "HSCT_survival",
+ "description": "Kaggle competition: CIBMTR - Equity in post-HCT Survival Predictions",
+ "language": "Jupyter Notebook",
+ "stars": 0,
+ "forks": 0,
+ "updated_at": "2025-03-06T15:00:50Z",
+ "created_at": "2025-02-04T14:36:28Z",
+ "html_url": "https://github.com/gperdrizet/HSCT_survival",
+ "topics": [],
+ "size": 204179,
+ "readme": ""
+ },
+ {
+ "name": "gperdrizet-data-preprocessing-project-tutorial",
+ "description": null,
+ "language": "Jupyter Notebook",
+ "stars": 2,
+ "forks": 4,
+ "updated_at": "2025-03-05T02:31:12Z",
+ "created_at": "2025-02-12T21:51:25Z",
+ "html_url": "https://github.com/gperdrizet/gperdrizet-data-preprocessing-project-tutorial",
+ "topics": [],
+ "size": 18995,
+ "readme": ""
+ },
+ {
+ "name": "bartleby",
+ "description": "LLM writing assistant and chatbot using HuggingFace.",
+ "language": "Python",
+ "stars": 8,
+ "forks": 2,
+ "updated_at": "2025-02-16T20:50:44Z",
+ "created_at": "2023-11-10T18:00:28Z",
+ "html_url": "https://github.com/gperdrizet/bartleby",
+ "topics": [
+ "chatbot",
+ "discord",
+ "discord-bot",
+ "discord-py",
+ "huggingface",
+ "llm",
+ "matrix-protocol"
+ ],
+ "size": 50001,
+ "readme": ""
+ },
+ {
+ "name": "PUBSUM",
+ "description": "National Library of Medicine PubMed Open Access Collection SQL database creation and LLM based publication abstract summarization.",
+ "language": "Jupyter Notebook",
+ "stars": 1,
+ "forks": 1,
+ "updated_at": "2025-02-05T23:35:34Z",
+ "created_at": "2023-11-10T19:00:16Z",
+ "html_url": "https://github.com/gperdrizet/PUBSUM",
+ "topics": [],
+ "size": 6094,
+ "readme": "# PUBSUM: PUBMED Open Access article abstract summarization\n\nThe project goal is to provide high level summaries of current biomedical scientific findings which span multiple publications (think automatic literature reviews). To accomplish this the plan is to build an API which gives access to plain english summaries of new scientific publications added to the National Library of Medicine's Pub Med Central Open Access collection. Ideally, these summaries would span a publication cycle or more of a specific journal, journals or topic area and present developments in that scientific area.\n\n## Progress\n\n1. Demonstrated proof-of-concept scientific abstract summarization and model fine tuning using Huggingface and the haining/scientific_abstract_simplification model.\n2. Created in house SQL database containing article metadata and text abstracts for all 3.68 million articles in the PUBMED Central Open Access Collection.\n3. Started work on summarizing all or as many of those articles as possible.\n"
+ },
+ {
+ "name": "firecast.ai",
+ "description": "Predicts wildfire ignition risk in California from weather data",
+ "language": "Jupyter Notebook",
+ "stars": 3,
+ "forks": 1,
+ "updated_at": "2025-02-01T16:10:11Z",
+ "created_at": "2020-05-25T20:31:00Z",
+ "html_url": "https://github.com/gperdrizet/firecast.ai",
+ "topics": [],
+ "size": 60665,
+ "readme": ""
+ },
+ {
+ "name": "skylines",
+ "description": "Custom designed, de novo trained, generative adversarial convolutional neural network. Creating mechanically imagined city skylines.",
+ "language": "Python",
+ "stars": 1,
+ "forks": 0,
+ "updated_at": "2024-08-22T14:35:30Z",
+ "created_at": "2024-02-07T15:35:47Z",
+ "html_url": "https://github.com/gperdrizet/skylines",
+ "topics": [
+ "convolutional-neural-networks",
+ "generative-adversarial-network",
+ "generative-art",
+ "machine-learning",
+ "tensorflow"
+ ],
+ "size": 2818956,
+ "readme": ""
+ },
+ {
+ "name": "SQL_with_spark",
+ "description": "Springboard Unit 5.6 miniproject: SQL at Scale with Spark",
+ "language": "Jupyter Notebook",
+ "stars": 1,
+ "forks": 1,
+ "updated_at": "2023-05-24T14:30:42Z",
+ "created_at": "2019-10-26T02:55:24Z",
+ "html_url": "https://github.com/gperdrizet/SQL_with_spark",
+ "topics": [],
+ "size": 47,
+ "readme": ""
+ },
+ {
+ "name": "data_wrangling_at_scale_with_spark",
+ "description": "Springboard Unit 5.8 miniproject: Data Wrangling at Scale with Spark",
+ "language": "Jupyter Notebook",
+ "stars": 1,
+ "forks": 0,
+ "updated_at": "2023-05-24T14:30:39Z",
+ "created_at": "2019-11-25T01:29:40Z",
+ "html_url": "https://github.com/gperdrizet/data_wrangling_at_scale_with_spark",
+ "topics": [],
+ "size": 36530,
+ "readme": ""
+ },
+ {
+ "name": "linear_regression",
+ "description": "Springboard Unit 8.1 miniproject: Linear Regression",
+ "language": "Jupyter Notebook",
+ "stars": 1,
+ "forks": 0,
+ "updated_at": "2023-05-24T14:30:36Z",
+ "created_at": "2019-11-26T23:53:04Z",
+ "html_url": "https://github.com/gperdrizet/linear_regression",
+ "topics": [],
+ "size": 6382,
+ "readme": ""
+ },
+ {
+ "name": "logistic_regression",
+ "description": "Springboard unit 8.1 miniproject: logistic regression",
+ "language": "Jupyter Notebook",
+ "stars": 1,
+ "forks": 0,
+ "updated_at": "2023-05-24T14:30:33Z",
+ "created_at": "2019-12-23T20:43:44Z",
+ "html_url": "https://github.com/gperdrizet/logistic_regression",
+ "topics": [],
+ "size": 2309,
+ "readme": ""
+ },
+ {
+ "name": "tree-based_algorithms",
+ "description": "Springboard unit 8.2 miniproject: tree-based algorithms",
+ "language": "Jupyter Notebook",
+ "stars": 1,
+ "forks": 0,
+ "updated_at": "2023-05-24T14:30:30Z",
+ "created_at": "2020-01-07T21:21:50Z",
+ "html_url": "https://github.com/gperdrizet/tree-based_algorithms",
+ "topics": [],
+ "size": 4926,
+ "readme": ""
+ },
+ {
+ "name": "clustering",
+ "description": "Springboard unit 8.2 miniproject: clustering",
+ "language": "Jupyter Notebook",
+ "stars": 1,
+ "forks": 0,
+ "updated_at": "2023-05-24T14:30:28Z",
+ "created_at": "2020-01-20T22:27:19Z",
+ "html_url": "https://github.com/gperdrizet/clustering",
+ "topics": [],
+ "size": 1991,
+ "readme": ""
+ },
+ {
+ "name": "PandasFromTheInside",
+ "description": "Springboard unit 9: pandas from the inside",
+ "language": null,
+ "stars": 1,
+ "forks": 0,
+ "updated_at": "2023-05-24T14:30:24Z",
+ "created_at": "2020-03-31T21:13:38Z",
+ "html_url": "https://github.com/gperdrizet/PandasFromTheInside",
+ "topics": [],
+ "size": 0,
+ "readme": ""
+ },
+ {
+ "name": "sparkML",
+ "description": "Springboard unit 9.3 miniproject: scalable ml with SparkML",
+ "language": "Jupyter Notebook",
+ "stars": 1,
+ "forks": 0,
+ "updated_at": "2023-05-24T14:30:18Z",
+ "created_at": "2020-04-01T18:58:50Z",
+ "html_url": "https://github.com/gperdrizet/sparkML",
+ "topics": [],
+ "size": 537,
+ "readme": ""
+ },
+ {
+ "name": "gansformer",
+ "description": "Generative Adversarial Transformers",
+ "language": "Python",
+ "stars": 1,
+ "forks": 0,
+ "updated_at": "2023-05-24T14:29:59Z",
+ "created_at": "2021-05-03T03:56:27Z",
+ "html_url": "https://github.com/gperdrizet/gansformer",
+ "topics": [],
+ "size": 836,
+ "readme": "[![PWC](https://img.shields.io/endpoint.svg?style=plastic&url=https://paperswithcode.com/badge/generative-adversarial-transformers/image-generation-on-clevr)](https://paperswithcode.com/sota/image-generation-on-clevr?p=generative-adversarial-transformers)\n[![PWC](https://img.shields.io/endpoint.svg?style=plastic&url=https://paperswithcode.com/badge/generative-adversarial-transformers/image-generation-on-cityscapes)](https://paperswithcode.com/sota/image-generation-on-cityscapes?p=generative-adversarial-transformers)\n[![PWC](https://img.shields.io/endpoint.svg?style=plastic&url=https://paperswithcode.com/badge/generative-adversarial-transformers/image-generation-on-lsun-bedroom-256-x-256)](https://paperswithcode.com/sota/image-generation-on-lsun-bedroom-256-x-256?p=generative-adversarial-transformers)\n\n![Python 3.7](https://img.shields.io/badge/python-3.7-blueviolet.svg?style=plastic)\n![TensorFlow 1.10](https://img.shields.io/badge/tensorflow-1.14-2545e6.svg?style=plastic)\n![cuDNN 7.3.1](https://img.shields.io/badge/cudnn-10.0-b0071e.svg?style=plastic)\n![License CC BY-NC](https://img.shields.io/badge/license-MIT-05b502.svg?style=plastic)\n\n# GANsformer: Generative Adversarial Transformers\n<p align=\"center\">\n <b><a href=\"https://cs.stanford.edu/~dorarad/\">Drew A. Hudson</a>* & <a href=\"http://larryzitnick.org/\">C. Lawrence Zitnick</a></b></span>\n</p>\n\n*_I wish to thank [Christopher D. 
Manning](https://nlp.stanford.edu/~manning/) for the fruitful discussions and constructive feedback in developing the Bipartite Transformer, especially when explored within the language representation area and also in the visual context, as well as for providing the kind financial support that allowed this work to happen!_ :sunflower:\n\n<div align=\"center\">\n <img src=\"https://cs.stanford.edu/people/dorarad/image1.png\" style=\"float:left\" width=\"340px\">\n <img src=\"https://cs.stanford.edu/people/dorarad/image3.png\" style=\"float:right\" width=\"440px\">\n</div>\n<p></p>\n\nThis is an implementation of the [GANsformer](https://arxiv.org/pdf/2103.01209.pdf) model, a novel and efficient type of transformer, explored for the task of image generation. The network employs a _bipartite structure_ that enables long-range interactions across the image, while maintaining computation of linearly efficiency, that can readily scale to high-resolution synthesis. \nThe model iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, to support the refinement of each in light of the other and encourage the emergence of compositional representations of objects and scenes. 
\nIn contrast to the classic transformer architecture, it utilizes multiplicative integration that allows flexible region-based modulation, and can thus be seen as a generalization of the successful StyleGAN network.\n\n<img align=\"right\" src=\"https://cs.stanford.edu/people/dorarad/img3.png\" width=\"270px\">\n\n**Paper**: [https://arxiv.org/pdf/2103.01209](https://arxiv.org/pdf/2103.01209) \n**Contact**: [email protected] \n**Implementation**: [`network.py`](training/network.py)\n\n### Update: All code is now ready!\n\n:white_check_mark: Uploading initial code and readme \n:white_check_mark: Image sampling and visualization script \n:white_check_mark: Code clean-up and refacotiring, adding documentation \n:white_check_mark: Training and data-prepreation intructions \n:white_check_mark: Pretrained networks for all datasets \n:white_check_mark: Extra visualizations and evaluations <!--Extra visualizations/animations and evaluation-->\n\nIf you experience any issues or have suggestions for improvements or extensions, feel free to contact me either thourgh the issues page or at [email protected]. \n\n## Bibtex\n```bibtex\n@article{hudson2021gansformer,\n title={Generative Adversarial Transformers},\n author={Hudson, Drew A and Zitnick, C. Lawrence},\n journal={arXiv preprint:2103.01209},\n year={2021}\n}\n```\n\n## Sample Images\nUsing the pre-trained models (generated after training for ***5-7x*** less steps than StyleGAN2 models! 
Training our models for longer will improve the image quality further):\n<div align=\"center\">\n <img src=\"https://cs.stanford.edu/people/dorarad/samples.png\" width=\"700px\">\n</div>\n\n## Requirements\n<img align=\"right\" src=\"https://cs.stanford.edu/people/dorarad/dia.png\" width=\"190px\">\n\n- Python 3.6 or 3.7 are supported.\n- We recommend TensorFlow 1.14 which was used for development, but TensorFlow 1.15 is also supported.\n- The code was tested with CUDA 10.0 toolkit and cuDNN 7.5.\n- We have performed experiments on Titan V GPU. We assume 12GB of GPU memory (more memory can expedite training).\n- See [`requirements.txt`](requirements.txt) for the required python packages and run `pip install -r requirements.txt` to install them.\n\n## Quickstart & Overview\n\nA minimal example of using a pre-trained GANsformer can be found at [`generate.py`](generate.py). When executed, the 10-lines program downloads a pre-trained modle and uses it to generate some images:\n```python\npython generate.py --gpus 0 --model gdrive:bedrooms-snapshot.pkl --output-dir images --images-num 32\n```\nYou can use `--truncation-psi` to control the generated images quality/diversity trade-off. \nWe recommend setting it to values in the range of `0.6-1.0`.\n\nWe currently provide pretrained models for resolution 256&times;256 but keep training them and will release newer checkpoints as well as pretrained models for resolution 1024&times;1024 soon!\n\nWe can train and evaluate new or pretrained model both quantitatively and qualitative with [`run_netowrk.py`](run_network.py). \nThe model architecutre can be found at [`network.py`](training/network.py). 
The training procedure is implemented at [`training_loop.py`](training/training_loop.py).\n\n## Data preparation\nWe explored the GANsformer model on 4 datasets for images and scenes: [CLEVR](https://cs.stanford.edu/people/jcjohns/clevr/), [LSUN-Bedrooms](https://www.yf.io/p/lsun), [Cityscapes](https://www.cityscapes-dataset.com/) and [FFHQ](https://github.com/NVlabs/ffhq-dataset). The model can be trained on other datasets as well.\nWe trained the model on `256x256` resolution. Higher resolutions are supported too. The model will automatically adapt to the resolution of the images in the dataset.\n\nThe [`prepare_data.py`](prepare_data.py) can either prepare the datasets from our catalog or create new datasets.\n\n### Default Datasets \nTo prepare the datasets from the catalog, run the following command:\n```python\npython prepare_data.py --ffhq --cityscapes --clevr --bedrooms --max-images 100000\n```\n\nSee table below for details about the datasets in the catalog.\n\n**Useful options**: \n* `--data-dir` the output data directory (default: `datasets`) \n* `--shards-num` to select the number of shards for the data (default: adapted to each dataset) \n* `--max-images` to store only a subset of the dataset, in order to reduce the size of the stored `tfrecord` files (default: _max_). \nThis can be particularly useful to save space in case of large datasets, such as LSUN-bedrooms (originaly contains 3M images)\n\n### Custom Datasets\nYou can also use the script to create new custom datasets. 
For instance:\n```python\npython prepare_data.py --task <dataset-name> --images-dir <source-dir> --format png --ratio 0.7 --shards-num 5\n```\nThe script supports several formats: `png`, `jpg`, `npy`, `hdf5`, `tfds` and `lmdb`.\n\n### Dataset Catalog\n| Dataset | # Images | Resolution | Dowhnload Size | TFrecords Size | Gamma | \n| :---------------: | :-------: | :-----------: | :------------: | :--------------: | :---: |\n| **FFHQ** | 70,000 | 256&times;256 | 13GB | 13GB | 10 |\n| **CLEVR** | 100,015 | 256&times;256 | 18GB | 15.5GB | 40 |\n| **Cityscapes** | 24,998 | 256&times;256 | 1.8GB | 8GB | 20 |\n| **LSUN-Bedrooms** | 3,033,042 | 256&times;256 | 42.8GB | Up to 480GB | 100 |\n\nUse `--max-images` to reduce the size of the `tfrecord` files.\n\n## Training\nModels are trained by using the `--train` option. To fine-tune a pretrained GANsformer model:\n```python\npython run_network.py --train --gpus 0 --gansformer-default --expname clevr-pretrained --dataset clevr \\\n --pretrained-pkl gdrive:clevr-snapshot.pkl\n```\nWe provide pretrained models for `bedrooms`, `cityscapes`, `clevr` and `ffhq`.\n\nTo train a GANsformer in its default configuration form scratch:\n```python\npython run_network.py --train --gpus 0 --gansformer-default --expname clevr-scratch --dataset clevr\n```\n\nBy defualt, models training is resumed from the latest snapshot. Use `--restart` to strat a new experiment, or `--pretrained-pkl` to select a particular snapshot to load.\n\nFor comparing to state-of-the-art, we compute metric scores using 50,000 sample imaegs. To expedite training though, we recommend settings `--eval-images-num` to a lower number. Note though that this can impact the precision of the metrics, so we recommend using a lower value during training, and increasing it back up in the final evaluation.\n\nWe support a large variety of command-line options to adjust the model, training, and evaluation. 
Run `python run_network.py -h` for the full list of options!\n\nwe recommend exploring different values for `--gamma` when training on new datasets. If you train on resolution >= 512 and observe OOM issues, consider reducing `--minibatch-size` to a lower value.\n\n### Logging\n* During training, sample images and attention maps will be generated and stored at results/<expname>-<run-id> (`--keep-samples`).\n* Metrics will also be regularly commputed and reported in a `metric-<name>.txt` file. `--metrics` can be set to `fid` for FID, `is` for Inception Score and `pr` for Precision/Recall.\n* Tensorboard logs are also created (`--summarize`) that track the metrics, loss values for the generator and discriminator, and other useful statistics over the course of training.\n\n### Baseline models\nThe codebase suppors multiple baselines in addition to the GANsformer. For instance, to run a vanilla GAN model:\n```python\npython run_network.py --train --gpus 0 --baseline GAN --expname clevr-gan --dataset clevr \n```\n* **[Vanialla GAN](https://arxiv.org/abs/1406.2661)**: `--baseline GAN`, a standard GAN without style modulation.\n* **[StyleGAN2](https://arxiv.org/abs/1912.04958)**: `--baseline StyleGAN2`, with one global latent that modulates the image features.\n* **[k-GAN](https://arxiv.org/abs/1810.10340)**: `--baseline kGAN`, which generates multiple image layers independetly and then merge them into one shared image.\n* **[SAGAN]()**: `--baseline SAGAN`, which performs self-attention between all image features in low-resolution layer (e.g. `32x32`).\n\n## Evaluation\nTo evalute a model, use the `--eval` option:\n```python\npython run_network.py --eval --gpus 0 --expname clevr-exp --dataset clevr\n```\nAdd `--pretrained-pkl gdrive:<dataset>-snapshot.pkl` to evalute a pretrained model.\n\nBelow we provide the FID-50k scores for the GANsformer (_using the pretrained checkpoints above_) as well as baseline models. 
\nNote that these scores are different from those reported in the StyleGAN2 paper, since they run experiments for up to 7x more training steps (5k-15k kimg-steps in our experiments over all models, which takes about 3-4 days with 4 GPUs, vs 50-70k kimg-steps in their experiments, which take over 90 GPU-days).\n\n| Model | CLEVR | LSUN-Bedroom | FFHQ | Cityscapes |\n| :------------: | :----------: | :----------: | :--------: | :--------: |\n| **GAN** | 25.02 | 12.16 | 13.18 | 11.57 |\n| **kGAN** | 28.28 | 69.9 | 61.14 | 51.08 |\n| **SAGAN** | 26.04 | 14.06 | 16.21 | 12.81 |\n| **StyleGAN2** | 16.05 | 11.53 | 16.21 | 8.35 |\n| **VQGAN** | 32.60 | 59.63 | 63.12 | 173.80 |\n| **GANsformer** | ***9.24*** | ***6.15*** | ***7.42*** | ***5.23*** |\n\n<div>\n <img src=\"https://cs.stanford.edu/people/dorarad/plot1.png\" width=\"350px\">\n <img src=\"https://cs.stanford.edu/people/dorarad/plot2.png\" width=\"350px\">\n</div>\n\n### Model Change-log\nCompared to the original GANsformer described in the paper, this repository makes several additional improvements that contribute to the performance:\n* Use `--mapping_ltnt2ltnt` so that the latents communicate with each other directly through self-attention inside the mapping network before starting to generate the image.\n* Add an additional global latent (`--style`) to the `k` latent components, such that first the global latent modulates all the image features uniformly, and then the `k` latents modulate different regions based on the bipartite transformer's attention. \nThe global latent is useful for coordinating holistic aspects of the image such as global lighting conditions and global style properties (e.g. for faces).\n* After making these changes, we observed no additional benefit from adding the transformer to the discriminator, and therefore for simplicity we disabled it.\n\n## Visualization\nThe code supports producing qualitative results and visualizations. 
For instance, to create attention maps for each layer:\n```bash\npython run_network.py --gpus 0 --eval --expname clevr-exp --dataset clevr --vis-layer-maps\n```\n\nBelow you can see sample images and attention maps produced by the GANsformer:\n\n<div align=\"center\">\n <img src=\"https://cs.stanford.edu/people/dorarad/atts.png\" style=\"float:left\" width=\"831px\">\n</div>\n\n## Command-line Options\nIn the following we list some of the most useful model options. \n\n### Training\n* `--gamma`: We recommend exploring different values for the chosen dataset (default: `10`)\n* `--truncation-psi`: Controls the image quality/diversity trade-off (default: `0.7`)\n* `--eval-images-num`: Number of images to compute metrics over. We recommend selecting a lower number to expedite training (default: `50,000`)\n* `--restart`: To restart training from scratch instead of resuming from the latest snapshot\n* `--pretrained-pkl`: To load a pretrained model, either a local one or from drive `gdrive:<dataset>-snapshot.pkl` for the datasets in the catalog.\n* `--data-dir` and `--result-dir`: Directory names for the datasets (`tfrecords`) and logging/results.\n\n### Model (most useful)\n* `--transformer`: To add transformer layers to the generator (GANsformer)\n* `--components-num`: Number of latent components, which will attend to the image. We recommend values in the range of `8-16` (default: `1`)\n* `--latent-size`: Overall latent size (default: `512`). The size of each latent component will then be `latent_size/components_num`\n* `--num-heads`: Number of attention heads (default: `1`)\n* `--integration`: Integration of information in the transformer layer, e.g. 
`add` or `mul` (default: `mul`)\n\n### Model (others)\n* `--g-start-res` and `--g-end-res`: Start and end resolution for the transformer layers (default: all layers up to resolution 2<sup>8</sup>) \n* `--kmeans`: Track and update image-to-latents assignment centroids, used in the duplex attention\n* `--mapping-ltnt2ltnt`: Perform self-attention over latents in the mapping network\n* `--use-pos`: Use trainable positional encodings for the latents.\n* `--style False`: To turn off one-vector global style modulation (StyleGAN2).\n\n### Visualization\n* **Sample images**\n * `--vis-images`: Generate image samples \n * `--vis-latents`: Save source latent vectors\n* **Attention maps**\n * `--vis-maps`: Visualize attention maps of the last layer and first head\n * `--vis-layer-maps`: Visualize attention maps of all layers and heads\n * `--blending-alpha`: Alpha weight when visualizing a blending of images and attention maps\n* **Image interpolations**\n * `--vis-interpolations`: Generate interpolations between pairs of source latents\n * `--interpolation-density`: Number of samples between the two end points of an interpolation (default: `8`)\n* **Others**\n * `--vis-noise-var`: Create noise variation visualization\n * `--vis-style-mix`: Create style mixing visualization\n\nRun `python run_network.py -h` for the full options list.\n\n## Sample images (more examples)\n<div align=\"center\">\n <img src=\"https://cs.stanford.edu/people/dorarad/faces.png\" style=\"float:left\" width=\"750px\">\n <br>\n <img src=\"https://cs.stanford.edu/people/dorarad/bedroom.png\" style=\"float:left\" width=\"750px\">\n <br>\n <img src=\"https://cs.stanford.edu/people/dorarad/clevr_new.png\" style=\"float:left\" width=\"750px\">\n <br>\n <img src=\"https://cs.stanford.edu/people/dorarad/cities_small.png\" style=\"float:left\" width=\"750px\">\n</div>\n\n## CUDA / Installation\nThe model relies on custom TensorFlow ops that are compiled on the fly using 
[NVCC](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html). \n\nTo set up the environment, e.g. for cuda-10.0:\n```bash\nexport PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}\nexport LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}\n```\n\nTo test that your NVCC installation is working correctly, run:\n```bash\nnvcc test_nvcc.cu -o test_nvcc -run\n| CPU says hello.\n| GPU says hello.\n```\n\n## Architecture Overview\nThe GANsformer consists of two networks:\n\n**Generator**: produces the images (`x`) given randomly sampled latents (`z`). The latent `z` has a shape `[batch_size, component_num, latent_dim]`, where `component_num = 1` by default (Vanilla GAN, StyleGAN) but is > 1 for the GANsformer model. We can define the latent components by splitting `z` along the second dimension to obtain `z_1,...,z_k` latent components. The generator likewise consists of two parts:\n* **Mapping network**: converts sampled latents from a normal distribution (`z`) to the intermediate space (`w`) through a series of feed-forward layers. The `k` latent components are either mapped independently from the `z` space to the `w` space or interact with each other through self-attention (optional flag).\n* **Synthesis network**: the intermediate latents `w` are used to guide the generation of new images. Image features begin from a small constant/sampled grid of `4x4`, and then go through multiple layers of convolution and up-sampling until reaching the desired resolution (e.g. `256x256`). After each convolution, the image features are modulated (meaning that their variance and bias are controlled) by the intermediate latent vectors `w`. In the StyleGAN model, one global `w` vector controls all the features equally. 
The GANsformer, in contrast, uses attention so that the `k` latent components specialize to control different regions of the image and create it cooperatively, which improves performance especially when generating images of multi-object scenes.\n* **Attention** can be used in several ways:\n * **Simplex Attention**: attention is applied in one direction only, from the latents to the image features (**top-down**).\n * **Duplex Attention**: attention is applied in two directions: latents to image features (**top-down**) and then image features back to latents (**bottom-up**), so that each representation informs the other iteratively.\n * **Self Attention between latents**: can also be used to enable direct interactions between the latents.\n * **Self Attention between image features** (SAGAN model): prior approaches applied attention directly between the image features, but this method does not scale well due to its quadratic cost in the number of features, which becomes very high at high resolutions.\n \n**Discriminator**: Receives an image and has to predict whether it is real or fake, i.e. originating from the dataset or the generator. The model performs multiple layers of convolution and downsampling on the image, gradually reducing the representation's resolution until making a final prediction. Optionally, attention can be incorporated into the discriminator as well, in which case it has multiple (k) aggregator variables that use attention to adaptively collect information from the image while it is being processed. We observe small improvements in model performance when attention is used in the discriminator, although based on our observations most of the gain from attention arises in the generator.\n\n## Codebase\nThis codebase builds on top of and extends the great [StyleGAN2 repository](https://github.com/NVlabs/stylegan2) by Karras et al. 
\n\nThe GANsformer model can also be seen as a generalization of StyleGAN: while StyleGAN has one global latent vector that controls the style of all image features globally, the GANsformer has *k* latent vectors that cooperate through attention to control regions within the image, thereby better modeling images of multi-object and compositional scenes.\n\nIf you have questions, comments or feedback, please feel free to contact me at [email protected]. Thank you! :)\n"
+ },
+ {
+ "name": "direwolf-arch-rice",
+ "description": "🐺🍚 A guide to replicating my riced Arch Linux set-up.",
+ "language": null,
+ "stars": 1,
+ "forks": 0,
+ "updated_at": "2023-05-24T14:29:55Z",
+ "created_at": "2023-01-05T02:12:31Z",
+ "html_url": "https://github.com/gperdrizet/direwolf-arch-rice",
+ "topics": [],
+ "size": 13286,
+ "readme": "# Ricing Arch Linux\n\n[![Sparkline](https://stars.medv.io/ibrahimbutt/direwolf-arch-rice.svg)](https://stars.medv.io/ibrahimbutt/direwolf-arch-rice)\n\n## Foreword\n\n### Who is this guide for?\n\nThose who are interested in ricing or would like to know what it is, whether they are experienced Linux users or complete beginners.\n\nThose who want control over the way their desktop environment [DE] looks, far beyond the offerings of Windows and OS X.\n\nThose who dislike extra/unneeded features cluttering their DE. With ricing and Linux in general, you can keep what you want/need and remove everything else. This is especially helpful for older systems.\n\n### Hold up... \"ricing\"?\n\nIf the term confuses you, you aren't alone. You're probably thinking, what does rice have to do with computers, at all? Below is the definition of ricing taken from [r/unixporn](https://www.reddit.com/r/unixporn/):\n\n> \"Rice\" is a word that is commonly used to refer to making visual improvements and customizations on one's desktop. It was inherited from the practice of customizing cheap Asian import cars to make them appear to be faster than they actually were - which was also known as \"ricing\". Here on /r/unixporn, the word is accepted by the majority of the community and is used sparingly to refer to a visually attractive desktop upgraded beyond the default.\n\n## What You'll Be Creating Today\n\n![The Setup](https://github.com/IbrahimButt/Direwolf-Arch-Rice/blob/master/images/finishedsetup.png)\n\nThere's not a lot going on, right? Yeah, that was the whole point. I mostly use Firefox and Vim. I don't need much. It's my personal setup and what I'm using at the time of writing. 
If you want more, this guide will teach you the basics and provide a set-up to 'improve' on with your own needs in mind.\n\nVisit [r/unixporn](https://www.reddit.com/r/unixporn/) to see what others have created.\n\n### Overview of Setup\n\n#### Time Commitment\n\nYou should be done in an hour; however, it may take longer depending on your internet connection.\n\n#### Arch Linux\n\nIn a nutshell, [Arch](https://www.archlinux.org/) is an independently developed general-purpose GNU/Linux distribution. The main reason you would choose this over other distributions is that it comes with the bare minimum and zero bloat. This allows you to have a lean system from the beginning.\n\nIf you've heard of Arch, you may have heard the installation isn't so simple. You may even find it off-putting. Don't worry about that. [Anarchy Linux](https://anarchyinstaller.gitlab.io/) makes installation easy. The only difference is that Anarchy Linux has an installer.\n\nInstalling Arch manually is outside the scope of this guide. If you prefer to install it manually, visit the [installation guide](https://wiki.archlinux.org/index.php/installation_guide). Otherwise, use [Anarchy Linux](https://gitlab.com/anarchyinstaller/installer/-/releases).\n\n*Tip: To save time, download Arch/Anarchy Linux while you read on.*\n\n#### Window Manager\n\nWe will be using [i3](https://i3wm.org/) as our WM. It is a dynamic tiling window manager. This means, when a window is opened, it takes up the whole desktop. When you open another window, the new and existing windows will be resized to be equal. This happens each time you open a new window. Mathematically, when two windows are open, each will take one-half of screen space. When a third window is opened, they'll each take one-third of screen space and so on. The same applies if they are opened vertically. Windows can be resized, arranged in tabs and stacks. 
They can also be floated, meaning you can move and resize windows how you would in Windows and OS X.\n\n![Example of i3WM tiling](https://github.com/IbrahimButt/Direwolf-Arch-Rice/blob/master/images/i3wm-example.png)\n\nYou can read the usage documentation [here](https://i3wm.org/docs/userguide.html#_using_i3).\n\n#### Package Installer\n\nBesides Pacman, the default package installer shipped with Arch, we will be installing [Yay](https://aur.archlinux.org/packages/yay):\n\n> Yay, yet another yogurt. Pacman wrapper and AUR helper written in go.\n\nAll you need to know for now is that it saves you a lot of time in the long term. Without it, you would need to go through the manual build process for each package that can't be installed through Pacman. This is one of those things you wish you knew when you were starting out.\n\n#### Terminal Emulator\n\nWe'll be using rxvt-unicode, also known as urxvt. It's fast, lightweight and highly customizable. Furthermore, Wal can automatically apply a generated colorscheme to urxvt.\n\n#### Status Bar\n\nThe Polybar repository tells it best:\n\n> A fast and easy-to-use tool for creating status bars.\n>\n> Polybar aims to help users build beautiful and highly customizable status bars for their desktop environment, without the need of having a black belt in shell scripting. Here are a few screenshots showing you what it can look like:\n\nPolybar is modular: if you want to see what workspace you're on and which ones have an open window, you add a module for that functionality. If you want to see the time and date, you add another module. The one I have configured and included in this guide is very minimal, since I don't need other modules. 
For examples with more modules, visit the Polybar [repository](https://github.com/jaagr/polybar) and/or r/unixporn with a [restricted search](https://www.reddit.com/r/unixporn/search?q=polybar&restrict_sr=on) to see what can be achieved.\n\n#### Application Launcher/Dynamic Menu and File Manager\n\nPersonally, I love application launchers. They make your workflow noticeably more efficient than going through a list of applications and clicking on the one you need. We will be going with dmenu, a simple, fast and lightweight dynamic menu.\n\n[Ranger](https://github.com/ranger/ranger) is a Vim-inspired CLI file manager and is very quick to use once you get the hang of it. Besides, it can match your colour scheme. More on that later.\n\n![Dmenu and ranger in action](https://github.com/IbrahimButt/Direwolf-Arch-Rice/blob/master/images/ranger-dmenu.png)\n\n*Note: i3 by default does not have a feature where you can see all your applications.*\n\n#### Theming\n\nTwo ways in which the colour scheme can be altered are through the .Xresources file and Wal. We will be using the Python version of Wal, called [Pywal](https://github.com/dylanaraps/pywal).\n\nTaken from the [Arch Wiki](https://wiki.archlinux.org/index.php/x_resources):\n\n> Xresources is a user-level configuration dotfile, typically located at ~/.Xresources. It can be used to set X resources, which are configuration parameters for X client applications.\n>\n> They can do many operations, including:\n> * defining terminal colours\n> * configuring terminal preferences\n> * setting DPI, antialiasing, hinting and other X font settings\n> ...\n\nTaken from the Pywal repository:\n> `wal` is a script that takes an image (or a directory of images), generates a colour scheme (using `imagemagick`) and then changes all of your open terminal's colours to the new colour scheme on the fly. 
wal then caches each generated colour scheme so that cycling through wallpapers while changing colour schemes is instantaneous.\n>\n> `wal` also merges the new colour scheme into the Xresources database so that programs on your system such as `Rofi` or `i3` use the new colours automatically. `wal` finally exports the colors into various formats so that you can use the colours in web pages, scripts, other programs etc.\n\nPolybar can also use the colour scheme generated by Wal if you configure it to.\n\n##### Fonts\n\nWe will be using [Overpass](http://overpassfont.org/) by [Red Hat](https://www.redhat.com/). It comes with 8 weight variants and a monospaced version, named Overpass Mono, which you can see in the status bar.\n\n![Overpass Font](https://github.com/IbrahimButt/Direwolf-Arch-Rice/blob/master/images/font.png)\n\n#### Neofetch\n\nTaken from the [Neofetch](https://github.com/dylanaraps/neofetch) repository:\n\n> Neofetch is a CLI system information tool written in BASH. Neofetch displays information about your system next to an image, your OS logo, or any ASCII file of your choice. The main purpose of Neofetch is to be used in screenshots to show other users what OS/Distro you're running, what Theme/Icons you're using etc.\n\nAlthough not necessary, I will be showing you how to work with Neofetch since it's so popular.\n\n#### Text Editor\n\nThroughout this guide, we'll be using [Vim](http://www.vim.org/), a powerful yet lightweight text editor. For those who don't know how to use it, I'll be including the commands needed to follow this guide.\n\n## Let's Get Cooking!\n\n### Getting Started\n\nFirstly, you need to install Arch. If you're doing the manual installation, the Arch guide will walk you through formatting your USB. For those using Anarchy Linux, see below on how to make a bootable USB depending on the OS you are currently using.\n\n#### Windows\n\nDownload [Rufus](https://rufus.akeo.ie/) and open it up. 
Select your USB and down in Format Options, press the button with the disk/hard-drive and select the ISO.\n\nRufus should now match what's in the below screenshot, with the exception of the \"Device\", \"New volume label\" and the ISO image information at the very bottom.\n\n![Rufus Setup](https://github.com/IbrahimButt/Direwolf-Arch-Rice/blob/dev/images/Rufus.PNG)\n\nWhen you're ready, press start. If you are asked for permission to download additional files, allow it.\n\n#### OS X\n\nDownload and use [Etcher](https://etcher.io/). Select the ISO file and USB, then hit Flash.\n\n![Etcher Usage.](https://www.balena.io/static/steps-8006dca57323756b1b84fb9408742409.gif)\n\n#### Linux\n\n![RosaImageWriter](http://wiki.rosalab.ru/en/images/0/0b/RosaImageWriter-2.6-eng.png)\n\nDownload and execute RosaImageWriter with root permissions using `sudo ./RosaImageWriter`, or in KDE, click on the executable.\n\n### Pre-Installation Notes\n\nFor the purpose of this guide, I will assume you are using 'netctl' for managing your wireless connection.\n\nNow go ahead and install Arch.\n\n### If You Already Have Arch Installed\n\nTo follow this guide, you'll need i3, rxvt-unicode and dmenu. Fire up your terminal and run `sudo pacman -S i3 rxvt-unicode dmenu vim`.\n\n### First Boot/Log-In\n\nIf you installed a login manager, make sure to select i3 as the desktop environment. For example, the GNOME login manager has a small settings/cog icon that lets you do so. If you didn't install a graphical login manager, you'll see what appears to be a fullscreen terminal. Enter your username and press enter, then do the same with your password. Once you are logged in, type `startx` and press enter to launch i3.\n\nYou will be prompted to select the windows or alt key as a modifier. The modifier key is used for controlling the window manager. 
After this, select yes to creating a config file.\n\nOpen the terminal by pressing `mod+enter`, then run `sudo wifi-menu` to create a wireless profile and remember its name. Then run `sudo netctl enable <profile_name>`. This automatically connects you to wifi on each boot. Now run `reboot`.\n\n### Screen Resolution\n\nYour screen resolution may be incorrect. Run `xrandr` and identify your display. Then run `xrandr --output <source_name> --mode 2560x1440 --rate <refresh_rate>`. For me it is `xrandr --output DP1-8 --mode 2560x1440 --rate 59.95`. If you have multiple monitors, check out the [documentation](https://wiki.archlinux.org/index.php/Xrandr). The xrandr setting isn't permanent for now; we'll get to that later.\n\n\n### Guide Dependencies\n\nBefore we get to the ricing, we need to install a few things first.\n\n#### Install Dmenu, Vim and Ranger\n\n`sudo pacman -S dmenu vim ranger`\n\nTo use Dmenu, press `mod+d`. Only packages that have a GUI will appear if selected through Dmenu, otherwise it'll seem as if it's not working. This is normal.\n\nTo use Ranger, run `ranger`.\n\n#### Install Yay\n\n```\ncd ~\nmkdir -p /tmp/yay_install\ncd /tmp/yay_install\n\nsudo pacman -S base-devel\n\nsudo pacman -S expac yajl git\n\ngit clone https://aur.archlinux.org/yay.git\ncd yay\nmakepkg -si\n\ncd ~\nrm -rf /tmp/yay_install\n```\n\n#### Install Pywal\n\nPython 3.5 or above is required, so ensure it's installed by running `python -V`. If it isn't, install it: `sudo pacman -S python`.\n\nWhen you're good to go:\n```\nsudo pacman -S feh imagemagick python-pip python-pywal\n```\n*Note: You don't need to view the package build. If you decide to view it, it'll be displayed in Vim. Type `:q` to exit Vim.*\n\n![Wallpaper](https://github.com/IbrahimButt/Direwolf-Arch-Rice/blob/master/images/wallpaper.jpg)\n\nRight-click on the image above and save it as `bg1.jpg`. 
Now do the following:\n```\ncd ~\nmkdir -p ~/Pictures/Wal/\nmv ~/Downloads/bg1.jpg ~/Pictures/Wal/\nwal -i ~/Pictures/Wal/bg1.jpg\n```\n\n#### Install Polybar\n\nFirst you'll need to install the dependencies and then Polybar itself:\n```\nsudo pacman -S cairo libxcb python2 xcb-proto xcb-util-image xcb-util-wm xcb-util-xrm jsoncpp\nyay -S polybar-git\n```\n\n#### Install Dot Files\n\n```\ncd ~\ngit clone https://github.com/IbrahimButt/direwolf-arch-rice.git\ncp -r ~/direwolf-arch-rice/.config/ ~/\n\ncp -r ~/direwolf-arch-rice/.Xresources ~/\nxrdb .Xresources\n```\nYou will need to run `wal -i ~/Pictures/Wal/bg1.jpg` again here, so urxvt uses the colorscheme.\n\nRefresh i3 by pressing `mod+r`.\n\nOnly terminals and windows opened after this point will have those two changes applied to them.\n\n#### Install Fonts\n\n`yay -S otf-overpass`\n\nRefresh i3 to load the changes.\n\n### Make Changes To i3 Config\nRead through the whole config file and understand what's happening. Change anything that's necessary. The comments will give you hints as to what you may want to change. Do not skip this step. It'll teach you how to use i3.\n\n### Preview Images In Ranger\n\nInstall w3m: `sudo pacman -S w3m`. Then run `vim ~/.config/ranger/rc.conf`. Read it and understand it. Lastly, run `ranger --copy-config=scope`.\n\nRun `ranger` in the terminal and use the arrow keys to navigate. Make your way to `~/Pictures/Wal/bg1.jpg` and you should see a preview of it.\n\n### Neofetch System Info and Replace ASCII Logo With Image\n\n`neofetch --w3m --source ~/Pictures/Wal/bg1.jpg`\n\nTo customise what is displayed when you run `neofetch` or the above command, comment in/out lines in `~/.config/neofetch/config`.\n\n### Activate Polybar\n\n`polybar bar`\n\nGo into ranger and type `zh` to display hidden files. Then go to `~/.config/polybar/launch.sh`. Here you'll have a preview of the file. Read it to understand what is happening each time you boot/refresh i3. 
On line 5, replace `DPI-8` with the source name of your display connection from running `xrandr`.\n\n## Done!\n\nYour set-up should be identical to mine now.\n\n## Known Issues\n\nThe xrandr setting needs to be set on each boot if you're using startx. Therefore, I've added it as an `exec_always` in the i3 config. Refresh i3 to apply it on each boot. I'm currently in the process of figuring this out. If you have any other issues, feel free to raise them here.\n\n## Shameless Plug\n\nSee what I'm up to and my latest work, or say hello, on Twitter: [@madebyibrahim](https://twitter.com/madebyibrahim)\n\n\n"
+ },
+ {
+ "name": "seedscan",
+ "description": "Simple python utility using scanimage and ffmpeg to make long duration timelapse videos with a flatbed scanner.",
+ "language": "Python",
+ "stars": 1,
+ "forks": 0,
+ "updated_at": "2023-05-24T14:29:50Z",
+ "created_at": "2021-11-28T22:56:12Z",
+ "html_url": "https://github.com/gperdrizet/seedscan",
+ "topics": [],
+ "size": 19,
+ "readme": "# seedscan\nSimple python utility using scanimage and ffmpeg to make long duration timelapse videos with a flatbed scanner.\n\n## Setup notes\n### Scanner permissions\nBy default the USB scanner can only be accessed by scanimage via sudo. To allow user access, find the scanner's vendor and product hex IDs with **lsusb**. The IDs are the two colon-separated values after 'ID'.\n```\n$ lsusb\n$ Bus 001 Device 002: ID 04b8:0110 Seiko Epson Corp. GT-8200U/GT-8200UF [Perfection 1650/1650 PHOTO]\n```\nThen add the following to a file named **50-usb-epsonscanner.rules** (or something similar) in **/etc/udev/rules.d**, using your vendor and product IDs.\n```\nSUBSYSTEM==\"usb\", ATTRS{idVendor}==\"04b8\", ATTRS{idProduct}==\"0110\", MODE=\"0666\"\n```\nReboot and you should be able to use scanimage without sudo.\n\n### Cron\nScans are triggered via a cron job. Add the following to the user's cronfile (i.e. **crontab -e**). A scan every 10 minutes seems like a good place to start, but this can be changed to fit the experiment.\n```\n*/10 * * * * python /path/to/seedscan/scan.py\n```\n\n### CircuitPython (for sensors)\nTo run the temp/humidity/pressure sensors, we need CircuitPython and the library for the sensor (Adafruit MS8607). I am using a Raspberry Pi Zero W, for which detailed instructions can be found here: [CircuitPython](https://learn.adafruit.com/circuitpython-on-raspberrypi-linux/installing-circuitpython-on-raspberry-pi), [MS8607 library](https://learn.adafruit.com/adafruit-te-ms8607-pht-sensor/python-circuitpython). Here is the short version.\n\nCheck that you are running Python 3 and a matching pip, then install CircuitPython:\n```\n$ sudo pip3 install --upgrade setuptools\n$ sudo pip3 install --upgrade adafruit-python-shell\n$ wget https://raw.githubusercontent.com/adafruit/Raspberry-Pi-Installer-Scripts/master/raspi-blinka.py\n$ sudo python3 raspi-blinka.py\n```\nNote: this will set Python 3 as the system-wide default and requires a reboot to complete. 
Also, output indicates that pre-installing setuptools may be unnecessary.\n\nThen install the library for the MS8607:\n```\nsudo pip3 install adafruit-circuitpython-ms8607\n```\nThe last step is to change permissions so that non-root users can access I2C devices:\n```\n$ sudo groupadd i2c\n$ sudo chown :i2c /dev/i2c-1\n$ sudo chmod g+rw /dev/i2c-1\n$ sudo usermod -aG i2c user\n```\nThen you should be able to access I2C devices without elevating privileges. Test it with:\n```\ni2cdetect -y 1\n```\n"
+ }
+ ]
tests/test_data/job_call.json ADDED
@@ -0,0 +1 @@
+ {"job_title": "AI/ML & Foundational Model Engineer", "company_description": "Neural is a forward-thinking AI company dedicated to building innovative AI solutions, driving innovation across dynamic industries by harnessing Artificial Intelligence and Machine Learning technologies.", "job_description": "Design, train, fine-tune, and deploy large-scale language and multimodal models for geospatial, aerospace, and mission-critical decision systems. Work on foundation model development, supporting capabilities like anomaly detection, autonomous reasoning, and dynamic knowledge graphs.", "key_skills": ["Transformer model architecture", "NLP", "Computer vision", "Machine learning workflows", "Model fine-tuning", "Data annotation", "Production model deployment", "Cross-functional collaboration"], "tools_technologies": ["PyTorch", "TensorFlow", "Hugging Face", "LangChain", "Label Studio", "Snorkel", "Vector databases"], "experience_level": "3-5+ years of hands-on AI/ML engineering experience", "education_requirements": "None specified"}
tests/test_data/linkedin_resume.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "contact_info": "Contact\[email protected]\nwww.linkedin.com/in/gperdrizet\n(LinkedIn)\ngithub.com/gperdrizet (Portfolio)\nTop Skills\nScientific Research\nMachine learning engineering\nApplied Machine Learning",
+ "certifications": "Certifications\nCreate ML Models with BigQuery ML\nSkill Badge\nHugging Face Agents Course\nMachine Learning Engineering\nCareer Track\nAI Agents Fundamentals\nEngineer Data for Predictive\nModeling with BigQuery ML Skill\nBadge\nHonors-Awards\nPoster Presentation Award\nBest presentation by a student\nmember\nMolecular Mechanisms of Cancer\nResearch Fellowship\nRuth L. Kirschstein National\nResearch Service Fellowship\nPublications\nDiscovering RNA-Protein\nInteractome by Using Chemical\nContext Profiling of the RNA-Protein\nInterface\nTranscriptional pausing coordinates\nfolding of the aptamer domain\nand the expression platform of a\nriboswitch\nEffects of iron depletion on\nEntamoeba histolytica alcohol\ndehydrogenase 2 (EhADH2) and\ntrophozoite growth: implications for\nantiamoebic therapyGeorge Perdrizet\nFounder | Machine Learning Engineer | Large Language Models\n(LLMs) | PhD in Biochemistry and Molecular Biology\nYonkers, New York, United States",
+ "summary": "Summary\nMachine learning engineer, research scientist and educator. Seeking\na collaborative environment in which to apply high level quantitative\nreasoning and cutting edge tools to solve problems with data. Ten\nyears experience in diverse data driven fields.",
+ "experience": "Experience\n4Geeks Academy\nSenior Data Science Mentor, Machine Learning Specialist\nNovember 2024 - Present (9 months)\nMiami, Florida, United States\nLed student teams in creating and deploying end-to-end machine learning\napplications.\nImproved open source curriculum by contributing new materials and solutions\nvia Git and GitHub.\nPrepared students from diverse backgrounds for employment by teaching and\ndemonstrating data science and machine learning tools and techniques.\nAsk Agatha\nFounder\nJuly 2024 - Present (1 year 1 month)\nNew York City Metropolitan Area\nReceived $25,000 in Google Cloud credits from the Google Cloud for Startups\nProgram.\nFinalist in Backdrop Build V5 cohort.\nDesigned, built and deployed novel algorithm to detect LLM generated text.\nLos Medanos College\nAdjunct Professor\nAugust 2017 - August 2022 (5 years 1 month)\nImproved student success rate from 75% to greater than 90% in\nundergraduate chemistry courses.\nContributed protocols, methods and quantitative assessment tools to in-house\nlab manual, helping to save over $20,000 annually in materials costs.\nEnhanced educational product by providing practical experience and\ntheoretical knowledge of experimentation, hypothesis development,\nquantitative problem solving and applying an analytical mindset.\nSupported student achievement by collaborating with cross-functional teams of\nfaculty, stockroom staff, student tutors and administration.",
+ "education": "University of Chicago\nDoctor of Philosophy - PhD, Biochemistry and Molecular\nBiology · (2008 - 2014)\nSpringboard\nMachine Learning Engineering Career Track · (2019 - 2020)\nRoger Williams University\nBachelor of Science - BS, Biology and Psychology · (2003 - 2008)"
+ }