NecessityWorks OWASP Benchmark Scorecard

Youden Index (TPR − FPR): 0.91
True Positive Rate (real vulnerabilities caught): 94%
False Positive Rate (benign code flagged): 3%
CWE Categories: 11/11
Test Cases: 2,740

Youden Index: 0.91

The Youden Index (TPR − FPR) measures a tool's ability to catch real vulnerabilities while avoiding false alarms — 0.0 is a coin flip, 1.0 is perfect. At 0.91 across all 11 OWASP CWE categories, NecessityWorks discriminates vulnerable from safe code far above the legacy-SAST field, whose published OWASP Benchmark scores average about 0.24 and top out near 0.39.
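As a minimal sketch of the arithmetic behind the metric, the snippet below computes TPR − FPR from raw outcome counts. The `ConfusionCounts` container and the counts in `example` are illustrative assumptions chosen to reproduce a 94% TPR and 3% FPR; they are not the benchmark's actual per-case tallies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for one tool on one set of test cases (hypothetical container)."""
    true_positives: int
    false_negatives: int
    false_positives: int
    true_negatives: int


def youden_index(c: ConfusionCounts) -> float:
    """Return TPR - FPR, which ranges from -1.0 (inverted) to +1.0 (perfect)."""
    vulnerable = c.true_positives + c.false_negatives
    safe = c.false_positives + c.true_negatives
    if vulnerable == 0 or safe == 0:
        raise ValueError("need at least one vulnerable and one safe case")
    tpr = c.true_positives / vulnerable  # share of real vulnerabilities caught
    fpr = c.false_positives / safe       # share of safe code flagged
    return tpr - fpr


# Illustrative counts only: 94 of 100 vulnerable cases caught, 3 of 100 safe cases flagged.
example = ConfusionCounts(true_positives=94, false_negatives=6,
                          false_positives=3, true_negatives=97)
print(round(youden_index(example), 2))  # prints 0.91
```

A tool that flags vulnerable and safe code at the same rate scores 0.0 regardless of how much it flags, which is why the index is read as discrimination rather than raw detection.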

Gauge: Youden Index scale from −1.0 (inverted) through 0.0 (random) to +1.0 (perfect), with NecessityWorks marked at 0.91.

Highest published Youden Index on OWASP BenchmarkJava v1.2. 94% of real vulnerabilities caught at a 3% false-positive rate — full suite, no cherry-picking.

Per-CWE Detection Results — Youden Index by Vulnerability Class

Vulnerability Class	CWE	Legacy SAST	NecessityWorks
SQL Injection	CWE-89	0.34	0.97
OS Command Injection	CWE-78	0.31	0.96
Path Traversal	CWE-22	0.19	0.93
Insecure Randomness	CWE-330	0.25	0.92
Cross-Site Scripting	CWE-79	0.28	0.91
Insecure Cookie	CWE-614	0.29	0.90
Weak Hash	CWE-328	0.22	0.89
LDAP Injection	CWE-90	0.21	0.88
XPath Injection	CWE-643	0.20	0.88
Weak Cryptographic Algorithm	CWE-327	0.18	0.86
Trust Boundary Violation	CWE-501	0.15	0.84
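
The per-class figures can be summarized with an unweighted (macro) average, sketched below using the values copied from the table. This is only a consistency check under the assumption that every class counts equally. The page does not state how classes are weighted in the headline figure, which is given as TPR − FPR, so the macro average need not match it exactly.

```python
# Per-CWE Youden scores copied from the table: (legacy SAST, NecessityWorks).
PER_CWE = {
    "CWE-89": (0.34, 0.97),
    "CWE-78": (0.31, 0.96),
    "CWE-22": (0.19, 0.93),
    "CWE-330": (0.25, 0.92),
    "CWE-79": (0.28, 0.91),
    "CWE-614": (0.29, 0.90),
    "CWE-328": (0.22, 0.89),
    "CWE-90": (0.21, 0.88),
    "CWE-643": (0.20, 0.88),
    "CWE-327": (0.18, 0.86),
    "CWE-501": (0.15, 0.84),
}


def macro_average(scores: list[float]) -> float:
    """Unweighted mean: every vulnerability class counts equally."""
    return sum(scores) / len(scores)


legacy_mean = macro_average([legacy for legacy, _ in PER_CWE.values()])
nw_mean = macro_average([nw for _, nw in PER_CWE.values()])
print(f"{legacy_mean:.2f} {nw_mean:.2f}")  # prints 0.24 0.90
```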

Beyond the Benchmark — Additional Verified Findings

NecessityWorks also identified real security issues inside OWASP "safe" test cases that the benchmark does not score; these findings were verified by human analysts. Because a false positive is recorded only when the tool flags the specific CWE under test on code that is safe for that CWE, per-CWE scoring treats these findings as bonus detections rather than false positives. They show the tool surfacing vulnerabilities that pattern-matching SAST tools miss entirely.
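
The per-CWE matching rule described above can be sketched as follows. The function name `score_case`, its parameters, and the `Outcome` enum are hypothetical names for illustration; this is not the official OWASP scorecard code.

```python
from enum import Enum


class Outcome(Enum):
    TRUE_POSITIVE = "true positive"
    FALSE_NEGATIVE = "false negative"
    FALSE_POSITIVE = "false positive"
    TRUE_NEGATIVE = "true negative"


def score_case(cwe_under_test: str, is_vulnerable: bool,
               flagged_cwes: set[str]) -> tuple[Outcome, set[str]]:
    """Score one test case under per-CWE matching.

    Only a flag on the CWE under test affects the outcome. Any other flagged
    CWEs are returned separately as unscored findings (candidates for human
    review), so they can never become false positives.
    """
    hit = cwe_under_test in flagged_cwes
    if is_vulnerable:
        outcome = Outcome.TRUE_POSITIVE if hit else Outcome.FALSE_NEGATIVE
    else:
        outcome = Outcome.FALSE_POSITIVE if hit else Outcome.TRUE_NEGATIVE
    unscored = flagged_cwes - {cwe_under_test}
    return outcome, unscored


# A case that is safe for SQL injection, where the tool reports a weak cipher instead.
outcome, unscored = score_case("CWE-89", is_vulnerable=False, flagged_cwes={"CWE-327"})
print(outcome.value, sorted(unscored))  # prints: true negative ['CWE-327']
```

Under this rule the weak-cipher report leaves the SQL injection score untouched and is set aside for review, which matches the "bonus detection" treatment described above.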

Multi-Agent Analysis Pipeline

01 AST Index
02 Attack Paths
03 Code Intel
04 Existing Controls
05 Preliminary Analysis Engines
▼
Dozens of Specialist Agents, aligned to CWE vulnerability groups

Five preprocessing stages enrich context before AI analysis — each specialist receives AST data, call graphs, reachability maps, and SAST findings alongside the code diff.
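
One way to picture this flow is a list of enrichment stages that each add to a shared context before any specialist runs. The sketch below is purely illustrative: the stage names follow the diagram, but `ReviewContext`, `make_stage`, `build_context`, and `run_specialists` are hypothetical names, and the placeholder stages stand in for internals the page does not describe.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ReviewContext:
    """Context handed to every specialist (field names are assumptions)."""
    code_diff: str
    enrichments: dict[str, object] = field(default_factory=dict)


Stage = Callable[[ReviewContext], None]
Specialist = Callable[[ReviewContext], list[str]]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage; a real stage would attach ASTs, call graphs, etc."""
    def stage(ctx: ReviewContext) -> None:
        ctx.enrichments[name] = f"<{name} output>"
    return stage


# Stage order follows the diagram; what each stage computes is not specified here.
PREPROCESSING: list[Stage] = [make_stage(name) for name in (
    "ast_index", "attack_paths", "code_intel",
    "existing_controls", "preliminary_analysis_engines",
)]


def build_context(code_diff: str) -> ReviewContext:
    """Run every preprocessing stage in order before any specialist sees the diff."""
    ctx = ReviewContext(code_diff=code_diff)
    for stage in PREPROCESSING:
        stage(ctx)
    return ctx


def run_specialists(ctx: ReviewContext,
                    specialists: dict[str, Specialist]) -> dict[str, list[str]]:
    """Give each CWE-group specialist the same enriched context."""
    return {group: analyze(ctx) for group, analyze in specialists.items()}


ctx = build_context("example diff text")
findings = run_specialists(ctx, {"injection": lambda c: [f"saw {len(c.enrichments)} enrichments"]})
print(findings)  # prints {'injection': ['saw 5 enrichments']}
```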

Industry Comparison — OWASP Benchmark Youden Index

Tool	Youden Index
NecessityWorks	0.91
FindSecBugs v1.4.6	0.39
SonarQube v3.14	0.33
Commercial SAST avg (OWASP SAST-01–SAST-06)	0.26
FindBugs · PMD	0.00

Source: OWASP Benchmark v1.2 published scorecards (owasp.org/www-project-benchmark • OWASP-Benchmark/BenchmarkJava). OWASP publishes commercial SAST tools anonymized as SAST-01–SAST-06 — the bar shows their mean Youden (0.26); individually they range 0.17–0.33. Across all published SAST scorecards the average Youden is ≈ 0.24. NecessityWorks measured on the same BenchmarkJava v1.2 suite (2,740 cases).

Methodology

Tested against the full OWASP BenchmarkJava v1.2 suite: 2,740 test cases across 11 vulnerability categories, the same public suite used for the published commercial SAST scorecards. No subset, no private corpus, no fine-tuning on the benchmark. Every finding was generated by the same multi-agent pipeline customers run in production. Each case was submitted as an independent code review; results were scored by the official OWASP Benchmark scorecard and adversarially reviewed against ground truth. Scoring uses per-CWE matching: a false positive is only counted when the tool flags the specific CWE under test on code that is safe for that CWE. Where ground truth was disputed, the more conservative score is reported.

NecessityWorks AI-Native SAST
