Hugging Face Space: holistic-ai/LibVulnWatch


seonglae-holistic committed 11 days ago

Commit 3c3ce5c

2 Parent(s): fdddab8 f198cca

merge: branch 'main' of https://huggingface.co/spaces/holistic-ai/LibVulnWatch


Files changed (4)

README.md +3 -1
app.py +1 -1
assessment-results/agent_development_kit.json +1 -1
src/about.py +32 -19

README.md CHANGED

@@ -7,7 +7,7 @@ sdk: gradio
 app_file: app.py
 pinned: true
 license: mit
-short_description: Duplicate this leaderboard to initialize your own!
+short_description: Vulnerability scores for AI libraries (ACL '25, ICML '25)
 sdk_version: 5.19.0
 ---
@@ -46,3 +46,5 @@ You'll find
 - the main table' columns names and properties in `src/display/utils.py`
 - the logic to read all results and request files, then convert them in dataframe lines, in `src/leaderboard/read_evals.py`, and `src/populate.py`
 - the logic to allow or filter submissions in `src/submission/submit.py` and `src/submission/check_validity.py`
+> **LibVulnWatch** was presented at the **ACL&nbsp;2025 Student Research Workshop** and accepted to the **ICML&nbsp;2025 Technical AI Governance workshop**. The system uncovers hidden security, licensing, maintenance, dependency and regulatory risks in popular AI libraries and publishes a public leaderboard for transparent ecosystem monitoring.

app.py CHANGED

@@ -255,7 +255,7 @@ with demo:
             citation_button = gr.Code(
                 value=CITATION_BUTTON_TEXT,
                 label=CITATION_BUTTON_LABEL,
-                lines=6,
+                lines=14,
                 elem_id="citation-button",
                 language="yaml",
             )
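The old `@article` citation in `src/about.py` spanned six lines, matching `lines=6`; its replacement holds two `@inproceedings` entries that run to about 14 lines as shown, which is presumably why the box height was raised to 14. A minimal sketch of deriving the height from the text instead of hard-coding it; `citation_box_lines` is a hypothetical helper, not something in the Space, and its floor of 6 simply mirrors the old value:

```python
def citation_box_lines(text: str, min_lines: int = 6) -> int:
    """Return a code-box height with one row per line of `text`.

    Hypothetical helper: `min_lines` mirrors the old hard-coded value of 6.
    """
    # str.splitlines() ignores a trailing newline, so "a\nb\n" counts as 2 lines.
    return max(min_lines, len(text.splitlines()))


if __name__ == "__main__":
    two_entries = "\n".join([
        "@inproceedings{first,",
        "  title={First},",
        "  year={2025}",
        "}",
        "@inproceedings{second,",
        "  title={Second},",
        "  year={2025}",
        "}",
    ])
    print(citation_box_lines(two_entries))  # 8
    print(citation_box_lines("@misc{x}"))   # 6 (the floor applies)
```

Under that assumption, `app.py` would pass `lines=citation_box_lines(CITATION_BUTTON_TEXT)` to `gr.Code`, so adding another BibTeX entry would not require touching the layout again.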

assessment-results/agent_development_kit.json CHANGED

@@ -8,7 +8,7 @@
     "last_updated": "2024-06-07T12:00:00Z",
     "active_maintenance": true,
     "independently_verified": true,
-    "report_url": "https://github.com/981526092/LibVulnWatch/raw/main/report/google_adk-python_v1.4.2.html",
+    "report_url": "https://981526092.github.io/LibVulnWatch/google_adk-python_v1.4.2.html",
     "repository_url": "https://github.com/google/adk-python",
     "github_stars": 3800,
     "license": "MIT",
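The new `report_url` moves the HTML report from a `raw` GitHub link to a GitHub Pages address. GitHub's raw endpoint serves files as plain text, so a browser shows the report's HTML source instead of rendering it. A small sketch of a lint that could flag the old form in other result files; the function name and the rule that reports should live on a `*.github.io` host are assumptions drawn from this one change, not a check the repository is known to run:

```python
from urllib.parse import urlparse


def report_url_problems(url: str) -> list[str]:
    """Hypothetical lint for a result file's report_url; returns a list of issues."""
    parsed = urlparse(url)
    problems = []
    if parsed.scheme != "https":
        problems.append("report_url should use https")
    is_raw = parsed.netloc == "raw.githubusercontent.com" or (
        parsed.netloc == "github.com" and "/raw/" in parsed.path
    )
    if is_raw:
        # Raw files are served as plain text, so browsers show the HTML source.
        problems.append("raw GitHub link: the HTML report will not render")
    elif not parsed.netloc.endswith(".github.io"):
        # Assumed policy: rendered reports are hosted on GitHub Pages.
        problems.append(f"unexpected host for report: {parsed.netloc}")
    return problems


old = "https://github.com/981526092/LibVulnWatch/raw/main/report/google_adk-python_v1.4.2.html"
new = "https://981526092.github.io/LibVulnWatch/google_adk-python_v1.4.2.html"
print(report_url_problems(old))  # ['raw GitHub link: the HTML report will not render']
print(report_url_problems(new))  # []
```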

src/about.py CHANGED

@@ -28,28 +28,32 @@ TITLE = """<h1 align="center" id="space-title">LibVulnWatch: Vulnerability Asses
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-## Systematic Vulnerability Assessment and Leaderboard Tracking for Open-Source AI Libraries
-This leaderboard provides continuous vulnerability assessment for open-source AI libraries across five critical risk domains:
-- **License Validation**: Legal risks based on license type, compatibility, and requirements
-- **Security Assessment**: Vulnerability severity and patch responsiveness
-- **Maintenance Health**: Sustainability and governance practices
-- **Dependency Management**: Vulnerability inheritance and supply chain security
-- **Regulatory Compliance**: Compliance readiness for various frameworks
-Lower scores indicate fewer vulnerabilities and lower risk. The Trust Score is an equal-weighted average of all five domains, providing a balanced assessment of overall library trustworthiness.
+## LibVulnWatch – Continuous, Multi-Domain Risk Scoring for AI Libraries
+_As presented at the **ACL 2025 Student Research Workshop** and the **ICML 2025 Technical AI Governance (TAIG) workshop**_, LibVulnWatch provides an evidence-based, end-to-end pipeline that uncovers **hidden vulnerabilities** in open-source AI libraries across five governance-aligned domains:
+• **License Validation** – compatibility, provenance, obligations
+• **Security Assessment** – CVEs, patch latency, exploit primitives
+• **Maintenance Health** – bus-factor, release cadence, contributor diversity
+• **Dependency Management** – transitive risk, SBOM completeness
+• **Regulatory Compliance** – privacy/export controls, policy documentation
+In the paper we apply the framework to **20 popular libraries**, achieving **88 % coverage of OpenSSF Scorecard checks** and surfacing **up to 19 previously-unreported risks per library**.
+Lower scores indicate lower risk, and the **Trust Score** is the equal-weight average of the five domains.
 """
 # Which evaluations are you running? how can people reproduce what you have?
 LLM_BENCHMARKS_TEXT = """
-## How LibVulnWatch Works
-Our assessment methodology evaluates libraries through:
-1. **Static Analysis**: Code review, license parsing, and documentation examination
-2. **Dynamic Analysis**: Vulnerability scanning, dependency checking, and API testing
-3. **Metadata Analysis**: Repository metrics, contributor patterns, and release cadence
-Each library receives a risk score (0-10) in each domain, with lower scores indicating lower risk.
+## Methodology at a Glance
+LibVulnWatch orchestrates a **graph of specialised agents** powered by large language models. Each agent contributes one evidence layer and writes structured findings to a shared memory:
+1️⃣ **Static agents** – licence parsing, secret scanning, call-graph reachability
+2️⃣ **Dynamic agents** – fuzzing harnesses, dependency-confusion probes, CVE replay
+3️⃣ **Metadata agents** – GitHub mining, release-cadence modelling, community health
+4️⃣ **Policy agents** – mapping evidence to NIST SSDF, EU AI Act, and related frameworks
+The aggregator agent converts raw findings into 0–10 scores per domain, producing a reproducible JSON result that is **88 % compatible with OpenSSF Scorecard checks**. All artefacts (SBOMs, logs, annotated evidence) are archived and linked in the public report.
 """
 EVALUATION_QUEUE_TEXT = """
@@ -80,9 +84,18 @@ If your library shows as "FAILED" in the assessment queue, check that:
 """
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
-CITATION_BUTTON_TEXT = r"""@article{LibVulnWatch2025,
-  title={LibVulnWatch: Systematic Vulnerability Assessment and Leaderboard Tracking for Open-Source AI Libraries},
-  author={First Author and Second Author},
-  journal={ICML 2025 Technical AI Governance Workshop},
-  year={2025}
+CITATION_BUTTON_TEXT = r"""@inproceedings{wu2025libvulnwatch,
+  title={LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source {AI} Libraries},
+  author={Zekun Wu and Seonglae Cho and Umar Mohammed and CRISTIAN ENRIQUE MUNOZ VILLALOBOS and Kleyton Da Costa and Xin Guan and Theo King and Ze Wang and Emre Kazim and Adriano Koshiyama},
+  booktitle={ACL 2025 Student Research Workshop},
+  year={2025},
+  url={https://openreview.net/forum?id=yQzYEAL0BT}
+}
+@inproceedings{anonymous2025libvulnwatch,
+  title={LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source {AI} Libraries},
+  author={Zekun Wu and Seonglae Cho and Umar Mohammed and CRISTIAN ENRIQUE MUNOZ VILLALOBOS and Kleyton Da Costa and Xin Guan and Theo King and Ze Wang and Emre Kazim and Adriano Koshiyama},
+  booktitle={ICML Workshop on Technical AI Governance (TAIG)},
+  year={2025},
+  url={https://openreview.net/forum?id=MHhrr8QHgR}
 }"""
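Per `INTRODUCTION_TEXT`, each domain is scored 0-10 (lower means lower risk) and the Trust Score is the equal-weight average of the five domains. The arithmetic is sketched below; the domain key names, the `trust_score` function, and the range check are hypothetical and may not match the field names in `assessment-results/*.json`, and the example values are made up:

```python
from statistics import fmean

# Hypothetical keys for the five domains named in INTRODUCTION_TEXT.
DOMAINS = (
    "license_validation",
    "security_assessment",
    "maintenance_health",
    "dependency_management",
    "regulatory_compliance",
)


def trust_score(scores: dict[str, float]) -> float:
    """Equal-weight mean of the five domain risk scores (0-10, lower = lower risk)."""
    missing = [d for d in DOMAINS if d not in scores]
    if missing:
        raise KeyError(f"missing domain scores: {missing}")
    for domain in DOMAINS:
        # Assumed validation: reject scores outside the documented 0-10 scale.
        if not 0 <= scores[domain] <= 10:
            raise ValueError(f"{domain} score out of range: {scores[domain]}")
    # Each domain carries weight 1/5, so this is a plain arithmetic mean.
    return fmean(scores[d] for d in DOMAINS)


example = {
    "license_validation": 2.0,
    "security_assessment": 4.5,
    "maintenance_health": 3.0,
    "dependency_management": 5.5,
    "regulatory_compliance": 6.0,
}
print(trust_score(example))  # 4.2
```

With equal weights the Trust Score is just the sum of the five domain scores divided by five, so a one-point change in any single domain moves it by 0.2.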