wu981526092 committed on
Commit 3342a22 · 1 Parent(s): 83290d2

Enrich About & README with details from ACL/ICML paper; fix YAML frontmatter

Files changed (2)
  1. README.md +3 -1
  2. src/about.py +18 -14
README.md CHANGED
@@ -7,7 +7,7 @@ sdk: gradio
 app_file: app.py
 pinned: true
 license: mit
-short_description: Duplicate this leaderboard to initialize your own!
+short_description: Continuous multi-domain vulnerability assessment for open-source AI libraries (ACL '25 SRW & ICML '25 TAIG)
 sdk_version: 5.19.0
 ---
 
@@ -46,3 +46,5 @@ You'll find
 - the main table's column names and properties in `src/display/utils.py`
 - the logic to read all results and request files, then convert them into dataframe lines, in `src/leaderboard/read_evals.py` and `src/populate.py`
 - the logic to allow or filter submissions in `src/submission/submit.py` and `src/submission/check_validity.py`
+
+> **LibVulnWatch** was presented at the **ACL 2025 Student Research Workshop** and accepted to the **ICML 2025 Technical AI Governance workshop**. The system uncovers hidden security, licensing, maintenance, dependency, and regulatory risks in popular AI libraries and publishes a public leaderboard for transparent ecosystem monitoring.
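The README points at `src/leaderboard/read_evals.py` and `src/populate.py` for turning result files into leaderboard rows, and the About text defines the Trust Score as the equal-weight average of the five domain scores. A minimal sketch of that conversion, assuming a hypothetical per-library JSON schema (`library` plus a five-domain `scores` map, neither taken from the actual repository):

```python
import json
from pathlib import Path
from statistics import mean

# The five risk domains scored by the leaderboard (0 = low risk, 10 = high risk).
DOMAINS = ["license", "security", "maintenance", "dependency", "regulatory"]

def read_eval(path: Path) -> dict:
    """Flatten one result JSON into a leaderboard row.

    Assumes a hypothetical schema: {"library": str, "scores": {domain: float}}.
    """
    raw = json.loads(path.read_text())
    row = {"library": raw["library"]}
    for domain in DOMAINS:
        row[domain] = raw["scores"][domain]
    # Trust Score: equal-weight average of the five domain scores (lower = safer).
    row["trust_score"] = round(mean(row[d] for d in DOMAINS), 2)
    return row
```

Rows built this way can then be collected into a dataframe for display; the schema and helper name here are illustrative only.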
src/about.py CHANGED
@@ -28,28 +28,32 @@ TITLE = """<h1 align="center" id="space-title">LibVulnWatch: Vulnerability Asses
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-## Systematic Vulnerability Assessment and Leaderboard Tracking for Open-Source AI Libraries
-
-This leaderboard provides continuous vulnerability assessment for open-source AI libraries across five critical risk domains:
-- **License Validation**: Legal risks based on license type, compatibility, and requirements
-- **Security Assessment**: Vulnerability severity and patch responsiveness
-- **Maintenance Health**: Sustainability and governance practices
-- **Dependency Management**: Vulnerability inheritance and supply chain security
-- **Regulatory Compliance**: Compliance readiness for various frameworks
-
-Lower scores indicate fewer vulnerabilities and lower risk. The Trust Score is an equal-weighted average of all five domains, providing a balanced assessment of overall library trustworthiness.
+## LibVulnWatch – Continuous, Multi-Domain Risk Scoring for AI Libraries
+
+_As presented at the **ACL 2025 Student Research Workshop** and the **ICML 2025 Technical AI Governance (TAIG) workshop**_, LibVulnWatch provides an evidence-based, end-to-end pipeline that uncovers **hidden vulnerabilities** in open-source AI libraries across five governance-aligned domains:
+
+- **License Validation** – compatibility, provenance, obligations
+- **Security Assessment** – CVEs, patch latency, exploit primitives
+- **Maintenance Health** – bus factor, release cadence, contributor diversity
+- **Dependency Management** – transitive risk, SBOM completeness
+- **Regulatory Compliance** – privacy/export controls, policy documentation
+
+In the paper we apply the framework to **20 popular libraries**, achieving **88% coverage of OpenSSF Scorecard checks** and surfacing **up to 19 previously unreported risks per library**.
+Lower scores indicate lower risk, and the **Trust Score** is the equal-weight average of the five domains.
 """
 
 # Which evaluations are you running? how can people reproduce what you have?
 LLM_BENCHMARKS_TEXT = """
-## How LibVulnWatch Works
-
-Our assessment methodology evaluates libraries through:
-1. **Static Analysis**: Code review, license parsing, and documentation examination
-2. **Dynamic Analysis**: Vulnerability scanning, dependency checking, and API testing
-3. **Metadata Analysis**: Repository metrics, contributor patterns, and release cadence
-
-Each library receives a risk score (0-10) in each domain, with lower scores indicating lower risk.
+## Methodology at a Glance
+
+LibVulnWatch orchestrates a **graph of specialised agents** powered by large language models. Each agent contributes one evidence layer and writes structured findings to a shared memory:
+
+1. **Static agents** – licence parsing, secret scanning, call-graph reachability
+2. **Dynamic agents** – fuzzing harnesses, dependency-confusion probes, CVE replay
+3. **Metadata agents** – GitHub mining, release-cadence modelling, community health
+4. **Policy agents** – mapping evidence to NIST SSDF, the EU AI Act, and related frameworks
+
+The aggregator agent converts raw findings into 0–10 scores per domain, producing a reproducible JSON result that is **88% compatible with OpenSSF Scorecard checks**. All artefacts (SBOMs, logs, annotated evidence) are archived and linked in the public report.
 """
 
 EVALUATION_QUEUE_TEXT = """