Skip to content

Conversation

Akindotcome
Copy link

This PR improves the SARIF formatter with the following changes:

  • Added partialFingerprints (primaryLocationLineHash) for stable deduplication across runs/refactors.
  • Included CWE tags and Bandit test IDs in tags for better rule categorization.
  • Set precision based on Bandit’s confidence levels (HIGH/MEDIUM/LOW → high/medium/low).
  • Ensured absolute Windows paths are preserved in artifactLocation.uri to match test expectations.
  • Added raw original_path property for clarity and debugging.

~ All unit tests in tests/unit/formatters/ pass.
~ Functional tests are unchanged by this PR.

This should make Bandit’s SARIF output more compliant and interoperable with tools that consume SARIF logs.

Closes #646
Related to #737

@Akindotcome
Copy link
Author

Hi, I am still hoping my PR gets reviewed

@Akindotcome
Copy link
Author

@sigmavirus24 , @ericwb , @lukehinds
Hi - friendly ping on this SARIF improvement PR.

It adds partialFingerprints, tags/precision, preserves Windows absolute paths, and unit tests/unit/formatters/ pass.

Could one of you kindly review when you have a moment?

Thanks

@Akindotcome Akindotcome force-pushed the feature/sarif-cwe-fingerprints branch from 15e97ad to 87da4ed Compare September 29, 2025 20:44
""" # noqa: E501
"""
# noqa: E501
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why you moved this to a new line?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these examples are valid restructuredtext.

original_paths = set()

if len(rules) > 0:
for iss in issues:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the same variable names to minimizing diffs.

Suggested change
for iss in issues:
for issue in issues:

original_paths.add(iss.as_dict().get("filename", ""))
except (AttributeError, TypeError, KeyError):
# Best-effort only
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bandit itself would raise an issue about an empty except block.

Copy link
Member

@sigmavirus24 sigmavirus24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of particularly bad LLM slop here. I was promised they were good at writing python because of how much python was out there. I feel lied to.



def create_result(issue, rules, rule_indices):
"""Convert a Bandit Issue into a SARIF Result and ensure its rule
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too long and needs to be rewritten

"""
issue_dict = issue.as_dict()

# Ensure rule exists / get index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add this comment?

""" # noqa: E501
.. versionadded:: 1.7.8
"""
# noqa: E501
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add a noqa here? It does absolutely nothing. Remove it

Comment on lines +302 to +308
def _precision_from_confidence(confidence: str) -> str:
# Bandit uses HIGH/MEDIUM/LOW strings for confidence
c = (confidence or "").upper()
if c in ("HIGH", "MEDIUM", "LOW"):
return c.lower()
return "medium"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessarily inefficient

issue_dict["code"],
)

# Map severity -> SARIF level; omit default "warning" per SARIF (and tests)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments really are unnecessary. They're explaining what happens which the function names do a decent job of already

result_props = {
"issue_confidence": issue_dict["issue_confidence"],
"issue_severity": issue_dict["issue_severity"],
# Ensure raw path appears in serialized SARIF
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also you repeated that comment here. One or both should disappear and be explained with one very clear explanation of why some windows test needs a raw filename (whatever that is)

"original_path": filename_raw,
}

# Add a light-weight tags array on results too
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another useless comment

physical_location, line_range, col_offset, end_col_offset, code
):
"""
Populates physical_location.region (and context_region if code provided).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More invalid, poorly written, style-inconsistent docstrings

)

if code:
# Wider context for viewer UX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a helpful comment

# Tags: always include "security"; include CWE tag only if present
# and non-zero
tags = ["security"]
cwe_id = (issue_dict.get("issue_cwe") or {}).get("id")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this is unnecessarily bad python code. Just,

Suggested change
cwe_id = (issue_dict.get("issue_cwe") or {}).get("id")
cwe_id = issue_dict.get("issue_cwe", {}).get("id")

I don't know why I have to give you feedback for your LLM to process at this point.

Copy link

@gpotter2 gpotter2 Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if it was just us getting spammed by this user so I went see what other PRs they made and found this. It looks suspiciously AI generated, glad you came to the same conclusion.

(4 PRs about the same topic in a week ! each time growing exponentially)
secdev/scapy#4843
secdev/scapy#4844
secdev/scapy#4845
secdev/scapy#4848

Anyway good luck. Personally I think I'll throw in a 30days ban.

@Akindotcome
Copy link
Author

@sigmavirus24, @ericwb
Thank you for the thorough reviews and feedback. I appreciate the time and detail from everyone.

I’ll address the points you raised (docstring/reST fixes, variable naming consistency, removing the empty except, simplifying confidence/CWE handling, and trimming unnecessary comments) and push the updates immediately I have done the corrections. I’ll be available to iterate further if anything else comes up.

@gpotter2
Sorry for the misunderstanding here, I am only trying to contribute to meaningful open source projects and grow knowledge-wise as well. Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for the SARIF (Static Analysis Results Interchange Format)
4 participants