LogLens parser coverage is intentionally visible. Noisy logs should produce a useful coverage shape instead of a quiet success claim.
assets/noisy_auth_sample.log is a sanitized syslog_legacy sample for reviewer inspection. It mixes recognized authentication evidence with common log noise:
- malformed timestamp evidence
- unsupported but bucketed sshd preauth, disconnect, and negotiation lines
- partial PAM evidence that is either lower-confidence parsed evidence or telemetry-only warning evidence
- sudo denial variants that still become typed audit events
- empty, blank, rotated, and irrelevant service lines
- multiple hosts and intentionally unusual synthetic usernames
The locked expected coverage summary lives in tests/fixtures/parser_matrix/noisy_auth_expected.json. It focuses on parser quality fields rather than detector findings:
- total_input_lines: 27
- skipped_blank_lines: 3
- parsed_lines: 8
- unparsed_lines: 16
- parse_success_rate: 0.3333333333
- failure_categories: coarse parser boundary categories for unsupported lines
- top_unknown_patterns: the five most common unsupported-pattern buckets
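The counters above fit together arithmetically: 27 input lines minus 3 blank lines leaves 24 considered lines, of which 8 parsed and 16 did not, and 8 / 24 gives the recorded rate. The sketch below reproduces that calculation. The formula and the 10-decimal rounding are inferred from the locked numbers, and the function name is hypothetical rather than part of LogLens.

```python
def parse_success_rate(total_input_lines: int,
                       skipped_blank_lines: int,
                       parsed_lines: int) -> float:
    """Share of non-blank input lines that parsed, rounded to 10 decimals.

    Assumption: blank lines are excluded from the denominator, which is
    what the locked fixture values imply.
    """
    considered = total_input_lines - skipped_blank_lines
    if considered <= 0:
        # Nothing to parse: report 0.0 rather than dividing by zero.
        return 0.0
    return round(parsed_lines / considered, 10)


# Counters copied from the noisy-sample expectations.
noisy = {
    "total_input_lines": 27,
    "skipped_blank_lines": 3,
    "parsed_lines": 8,
    "unparsed_lines": 16,
}

# Parsed and unparsed lines together account for every non-blank line.
non_blank = noisy["total_input_lines"] - noisy["skipped_blank_lines"]
assert noisy["parsed_lines"] + noisy["unparsed_lines"] == non_blank

rate = parse_success_rate(
    noisy["total_input_lines"],
    noisy["skipped_blank_lines"],
    noisy["parsed_lines"],
)
print(rate)
```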
assets/mixed_auth_corpus.log is a 150-line sanitized syslog_legacy corpus for dirty-input review. It mixes Ubuntu / Debian auth.log-style and RHEL-family secure-style host labels while keeping the same BSD syslog header contract. This is a parser-observability fixture, not a claim of complete distro coverage.
The corpus repeats ten small evidence batches. Each batch includes recognized sshd, sudo, su, pam_unix, pam_faillock, and pam_sss evidence; unsupported sshd preauth and pam_unix session-close telemetry; an unsupported service program; a malformed source IP; an invalid timestamp; and one blank line.
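To make the header-level failure modes concrete, here is a minimal sketch of a BSD-syslog-style header check that separates blank lines, lines with an invalid month token, and lines whose header parses. It is an illustration only: the regular expression, the function name, and the labels other than invalid_month_token are assumptions, not LogLens internals, and the sample lines are synthetic.

```python
import re

# Month abbreviations used by the traditional BSD syslog timestamp.
MONTHS = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}

# Hypothetical header pattern: "Mon DD HH:MM:SS host program[pid]: message".
HEADER = re.compile(
    r"^(?P<month>\S+)\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s?"
    r"(?P<message>.*)$"
)


def classify_header(line: str) -> str:
    """Return a coarse label for one raw log line (labels are illustrative)."""
    if not line.strip():
        return "blank"
    match = HEADER.match(line)
    if match is None:
        return "malformed_header"
    if match.group("month") not in MONTHS:
        return "invalid_month_token"
    return "ok"


# Synthetic sample lines using a documentation IP range and fake names.
samples = [
    "Mar 10 08:15:02 host-a sshd[1234]: Invalid user demo from 192.0.2.10 port 4242",
    "Mxr 10 08:15:03 host-a sshd[1235]: Connection closed by 192.0.2.11 port 4243 [preauth]",
    "",
]
labels = [classify_header(line) for line in samples]
print(labels)
```

A header that passes this check can still end up unparsed at the message level, for example when the program is unsupported or the source IP is malformed.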
For reviewer inspection without running the test suite, assets/mixed_auth_parser_coverage.json captures the deterministic parser coverage view for this corpus: parser-quality counters, normalized event-type counts, unknown-pattern buckets, failure categories, and warning line references.
Locked parser expectations:
- total_input_lines: 150
- skipped_blank_lines: 10
- parsed_lines: 90
- unparsed_lines: 50
- normalized event counts: 10 invalid-user SSH failures, 10 failed-publickey SSH events, 10 accepted-publickey SSH events, 10 sudo command events, 10 sudo auth failures, 30 PAM auth failures, and 10 su auth failures
- failure_categories: 10 each for known_program_unknown_message, malformed_source_ip, unknown_program, unknown_timestamp, and unsupported_pam_variant
- top_unknown_patterns: 10 each for invalid_month_token, malformed_source_ip, pam_unix_session_closed, program_cron, and sshd_connection_closed_preauth
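These locked numbers are internally consistent: the event counts sum to parsed_lines (90), the failure categories sum to unparsed_lines (50), and both together cover the 140 non-blank lines. A small consistency check along those lines might look like the sketch below. The dictionary layout, the event-count key names, and the function name are hypothetical; only the numbers and the failure category names come from the expectations above. It also assumes each unparsed line lands in exactly one failure category, which the totals suggest but do not prove.

```python
from typing import List

# Values copied from the locked expectations. The keys under
# "event_counts" are hypothetical labels chosen for this sketch.
EXPECTED = {
    "total_input_lines": 150,
    "skipped_blank_lines": 10,
    "parsed_lines": 90,
    "unparsed_lines": 50,
    "event_counts": {
        "ssh_invalid_user_failure": 10,
        "ssh_failed_publickey": 10,
        "ssh_accepted_publickey": 10,
        "sudo_command": 10,
        "sudo_auth_failure": 10,
        "pam_auth_failure": 30,
        "su_auth_failure": 10,
    },
    "failure_categories": {
        "known_program_unknown_message": 10,
        "malformed_source_ip": 10,
        "unknown_program": 10,
        "unknown_timestamp": 10,
        "unsupported_pam_variant": 10,
    },
}


def coverage_problems(summary: dict) -> List[str]:
    """Return human-readable inconsistencies; an empty list means consistent."""
    problems = []
    non_blank = summary["total_input_lines"] - summary["skipped_blank_lines"]
    if summary["parsed_lines"] + summary["unparsed_lines"] != non_blank:
        problems.append("parsed + unparsed does not equal non-blank input lines")
    if sum(summary["event_counts"].values()) != summary["parsed_lines"]:
        problems.append("event counts do not sum to parsed_lines")
    if sum(summary["failure_categories"].values()) != summary["unparsed_lines"]:
        problems.append("failure categories do not sum to unparsed_lines")
    return problems


problems = coverage_problems(EXPECTED)
print(problems)
```

Returning a list of messages instead of raising on the first mismatch keeps every inconsistency visible in one run, in keeping with the fixture's goal of explainable coverage.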
A low parse success rate is not automatically a bug for this fixture. The sample is deliberately noisy, and the useful property is that unsupported evidence remains explainable through warnings, failure_categories, and top_unknown_patterns.
The matrix should stay defensive and public-safe: use documentation IP ranges, synthetic hostnames, and synthetic usernames only.
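One way to enforce the documentation-IP rule is a small check built on Python's standard ipaddress module. The sketch below assumes "documentation IP ranges" means the RFC 5737 IPv4 blocks and the RFC 3849 IPv6 prefix. The function name is hypothetical, and the check is not part of LogLens.

```python
import ipaddress

# Address blocks reserved for documentation (RFC 5737 and RFC 3849).
DOC_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_documentation_ip(text: str) -> bool:
    """True only if text is a valid IP address inside a documentation block."""
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        # Not a valid address at all, e.g. an intentionally malformed source IP.
        return False
    return any(
        addr.version == net.version and addr in net
        for net in DOC_NETWORKS
    )


checks = {
    "192.0.2.10": is_documentation_ip("192.0.2.10"),
    "2001:db8::1": is_documentation_ip("2001:db8::1"),
    "8.8.8.8": is_documentation_ip("8.8.8.8"),
    "203.0.113.300": is_documentation_ip("203.0.113.300"),
}
print(checks)
```

Because the corpus deliberately contains malformed source IPs, a fixture lint built on this helper would need to treat unparseable addresses as a separate case rather than as a public-safety violation.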