wu981526092 committed on
Commit 3342a22 · 1 Parent(s): 83290d2

Enrich About & README with details from ACL/ICML paper; fix YAML frontmatter

Files changed (2)
  1. README.md +3 -1
  2. src/about.py +18 -14
README.md CHANGED
@@ -7,7 +7,7 @@ sdk: gradio
 app_file: app.py
 pinned: true
 license: mit
-short_description: Duplicate this leaderboard to initialize your own!
+short_description: Continuous multi-domain vulnerability assessment for open-source AI libraries (ACL '25 SRW & ICML '25 TAIG)
 sdk_version: 5.19.0
 ---
 
@@ -46,3 +46,5 @@ You'll find
 - the main table's column names and properties in `src/display/utils.py`
 - the logic to read all results and request files, then convert them into dataframe lines, in `src/leaderboard/read_evals.py` and `src/populate.py`
 - the logic to allow or filter submissions in `src/submission/submit.py` and `src/submission/check_validity.py`
+
+> **LibVulnWatch** was presented at the **ACL 2025 Student Research Workshop** and accepted to the **ICML 2025 Technical AI Governance workshop**. The system uncovers hidden security, licensing, maintenance, dependency, and regulatory risks in popular AI libraries and publishes a public leaderboard for transparent ecosystem monitoring.
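The README points at `src/leaderboard/read_evals.py` and `src/populate.py` for turning result files into leaderboard rows, and the About text defines the Trust Score as the equal-weight average of the five domain scores. A minimal sketch of that conversion, assuming a hypothetical per-library JSON schema (`library` plus a five-domain `scores` map, neither taken from the actual repository):

```python
import json
from pathlib import Path
from statistics import mean

# The five risk domains scored by the leaderboard (0 = low risk, 10 = high risk).
DOMAINS = ["license", "security", "maintenance", "dependency", "regulatory"]

def read_eval(path: Path) -> dict:
    """Flatten one result JSON into a leaderboard row.

    Assumes a hypothetical schema: {"library": str, "scores": {domain: float}}.
    """
    raw = json.loads(path.read_text())
    row = {"library": raw["library"]}
    for domain in DOMAINS:
        row[domain] = raw["scores"][domain]
    # Trust Score: equal-weight average of the five domain scores (lower = safer).
    row["trust_score"] = round(mean(row[d] for d in DOMAINS), 2)
    return row
```

Rows built this way can then be collected into a dataframe for display; the schema and helper name here are illustrative only.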
src/about.py CHANGED
@@ -28,28 +28,32 @@ TITLE = """<h1 align="center" id="space-title">LibVulnWatch: Vulnerability Asses
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-## Systematic Vulnerability Assessment and Leaderboard Tracking for Open-Source AI Libraries
-
-This leaderboard provides continuous vulnerability assessment for open-source AI libraries across five critical risk domains:
-- **License Validation**: Legal risks based on license type, compatibility, and requirements
-- **Security Assessment**: Vulnerability severity and patch responsiveness
-- **Maintenance Health**: Sustainability and governance practices
-- **Dependency Management**: Vulnerability inheritance and supply chain security
-- **Regulatory Compliance**: Compliance readiness for various frameworks
-
-Lower scores indicate fewer vulnerabilities and lower risk. The Trust Score is an equal-weighted average of all five domains, providing a balanced assessment of overall library trustworthiness.
+## LibVulnWatch – Continuous, Multi-Domain Risk Scoring for AI Libraries
+
+_As presented at the **ACL 2025 Student Research Workshop** and the **ICML 2025 Technical AI Governance (TAIG) workshop**_, LibVulnWatch provides an evidence-based, end-to-end pipeline that uncovers **hidden vulnerabilities** in open-source AI libraries across five governance-aligned domains:
+
+- **License Validation** – compatibility, provenance, obligations
+- **Security Assessment** – CVEs, patch latency, exploit primitives
+- **Maintenance Health** – bus factor, release cadence, contributor diversity
+- **Dependency Management** – transitive risk, SBOM completeness
+- **Regulatory Compliance** – privacy/export controls, policy documentation
+
+In the paper we apply the framework to **20 popular libraries**, achieving **88% coverage of OpenSSF Scorecard checks** and surfacing **up to 19 previously unreported risks per library**.
+Lower scores indicate lower risk, and the **Trust Score** is the equal-weight average of the five domains.
 """
 
 # Which evaluations are you running? how can people reproduce what you have?
 LLM_BENCHMARKS_TEXT = """
-## How LibVulnWatch Works
-
-Our assessment methodology evaluates libraries through:
-1. **Static Analysis**: Code review, license parsing, and documentation examination
-2. **Dynamic Analysis**: Vulnerability scanning, dependency checking, and API testing
-3. **Metadata Analysis**: Repository metrics, contributor patterns, and release cadence
-
-Each library receives a risk score (0-10) in each domain, with lower scores indicating lower risk.
+## Methodology at a Glance
+
+LibVulnWatch orchestrates a **graph of specialised agents** powered by large language models. Each agent contributes one evidence layer and writes structured findings to a shared memory:
+
+1. **Static agents** – licence parsing, secret scanning, call-graph reachability
+2. **Dynamic agents** – fuzzing harnesses, dependency-confusion probes, CVE replay
+3. **Metadata agents** – GitHub mining, release-cadence modelling, community health
+4. **Policy agents** – mapping evidence to NIST SSDF, the EU AI Act, and related frameworks
+
+The aggregator agent converts raw findings into 0–10 scores per domain, producing a reproducible JSON result that is **88% compatible with OpenSSF Scorecard checks**. All artefacts (SBOMs, logs, annotated evidence) are archived and linked in the public report.
 """
 
 EVALUATION_QUEUE_TEXT = """