The Youden Index (true positive rate minus false positive rate) measures a tool's ability to correctly identify vulnerabilities while avoiding false alarms. A perfect score of +1.00 means the tool flagged every vulnerable test case and no safe one across all 11 CWE categories in the OWASP Benchmark.
| Vulnerability Category | CWE | Detected | FP Avoided | Result |
|---|---|---|---|---|
| Command Injection | CWE-78 | ✓ | ✓ | PASS |
| SQL Injection | CWE-89 | ✓ | ✓ | PASS |
| Cross-Site Scripting | CWE-79 | ✓ | ✓ | PASS |
| Path Traversal | CWE-22 | ✓ | ✓ | PASS |
| LDAP Injection | CWE-90 | ✓ | ✓ | PASS |
| XPath Injection | CWE-643 | ✓ | ✓ | PASS |
| Weak Cryptographic Algorithm | CWE-327 | ✓ | ✓ | PASS |
| Weak Hash Algorithm | CWE-328 | ✓ | ✓ | PASS |
| Weak Random Number Generator | CWE-330 | ✓ | ✓ | PASS |
| Trust Boundary Violation | CWE-501 | ✓ | ✓ | PASS |
| Insecure Cookie | CWE-614 | ✓ | ✓ | PASS |
NecessityWorks also identified 4 additional real security issues, each verified by human analysts, in OWASP "safe" test cases. The benchmark does not score these findings. They suggest the engine can surface vulnerabilities that traditional pattern-matching SAST tools miss.
Preprocessing enriches context before AI analysis: each specialist receives AST data, call graphs, reachability maps, and SAST findings alongside the code diff.
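To make the idea of a reachability map concrete, here is a minimal sketch of one way such a map could be derived from a call graph. It is not the NecessityWorks implementation: the function name `reachable_functions`, the dictionary-based call graph, and the method names in the example are all hypothetical, and the traversal shown is an ordinary breadth-first search.

```python
from collections import deque


def reachable_functions(call_graph: dict[str, list[str]],
                        entry_points: list[str]) -> set[str]:
    """Return every function reachable from the given entry points.

    call_graph maps a caller to the functions it calls (hypothetical format).
    """
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, []):
            if callee not in seen:  # also guards against cycles
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical call graph: doPost reaches runQuery through buildQuery,
# while legacyExport is never called from an entry point.
call_graph = {
    "doPost": ["buildQuery"],
    "buildQuery": ["runQuery"],
    "legacyExport": ["runQuery"],
}

print(sorted(reachable_functions(call_graph, ["doPost"])))
# ['buildQuery', 'doPost', 'runQuery']
```

In a setup like this, a finding inside `legacyExport` could be given lower priority because no entry point reaches it, which is one plausible reason to hand reachability data to a reviewer alongside the diff.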
Competitor scores are sourced from the OWASP Benchmark published results (owasp.org/www-project-benchmark). The NecessityWorks score is a Phase 1 preliminary result (22 cases). All scores represent the Youden Index (TPR − FPR) on BenchmarkJava v1.2.
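The Youden Index formula can be sketched in a few lines. The example assumes the 22 Phase 1 cases split into 11 vulnerable and 11 safe cases, which matches the 11 rows of the results table but is not stated explicitly; the function name `youden_index` is illustrative.

```python
def youden_index(tp: int, fn: int, fp: int, tn: int) -> float:
    """Return the Youden Index, TPR - FPR, from confusion-matrix counts."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one vulnerable and one safe test case")
    tpr = tp / (tp + fn)  # share of vulnerable cases that were flagged
    fpr = fp / (fp + tn)  # share of safe cases that were wrongly flagged
    return tpr - fpr


# Assumed split of the 22 cases: 11 vulnerable, 11 safe.
perfect = youden_index(tp=11, fn=0, fp=0, tn=11)

# Hypothetical tool that flags every case: it misses nothing,
# yet scores 0 because it cannot tell safe code from vulnerable code.
flag_everything = youden_index(tp=11, fn=0, fp=11, tn=0)

print(perfect, flag_everything)  # 1.0 0.0
```

The second case shows why the metric subtracts the false positive rate: a tool gains nothing by flagging everything.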
NecessityWorks was tested against OWASP BenchmarkJava v1.2, the industry-standard test suite for static application security testing (SAST) tools. Each test case was submitted as an independent code review through the NecessityWorks multi-agent analysis engine. The pipeline performs AST indexing, call graph construction, entry point identification, reachability analysis, and static analysis before routing to 12 specialized security agents aligned to the OWASP Top 10 2025 taxonomy. Scoring uses per-CWE matching: a false positive is counted only when the tool flags the specific CWE being tested on code that is safe for that CWE. Findings of different, legitimate security issues on "safe" test cases are counted as bonus detections, not false positives.
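The per-CWE matching rule described above can be sketched as follows. This is an illustration under assumptions, not the actual scoring harness: `BenchmarkCase`, `score_case`, and the case names are hypothetical, and the code only counts bonus candidates, since deciding whether such a finding is legitimate requires human review.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCase:
    name: str
    target_cwe: int      # the CWE this case is designed to test
    is_vulnerable: bool  # ground truth for target_cwe only


def score_case(case: BenchmarkCase, reported_cwes: set[int]) -> tuple[str, int]:
    """Return (outcome, bonus_candidates) for one test case.

    Only a finding for the case's own CWE affects the outcome. Findings for
    other CWEs on a safe case are tallied as bonus candidates, never as
    false positives.
    """
    flagged = case.target_cwe in reported_cwes
    if case.is_vulnerable:
        outcome = "TP" if flagged else "FN"
        bonus_candidates = 0
    else:
        outcome = "FP" if flagged else "TN"
        bonus_candidates = len(reported_cwes - {case.target_cwe})
    return outcome, bonus_candidates


# Hypothetical results: (case, CWEs the tool reported for it).
results = [
    (BenchmarkCase("case_a", 89, True), {89}),    # SQL injection found
    (BenchmarkCase("case_b", 89, False), {614}),  # safe for CWE-89, other issue reported
    (BenchmarkCase("case_c", 78, False), {78}),   # safe code flagged for its own CWE
]

tally = Counter(score_case(case, cwes)[0] for case, cwes in results)
print(dict(tally))  # {'TP': 1, 'TN': 1, 'FP': 1}
```

Under this rule `case_b` remains a true negative even though the tool reported something, because the reported CWE is not the one the case tests.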