From 37f7e30a37c6ee8ea0c2d6fe31249ed01c42a9ae Mon Sep 17 00:00:00 2001
From: stacknil
Date: Sat, 4 Jul 2026 13:30:21 +0800
Subject: [PATCH] docs: add parser uncertainty case study

---
 README.md                                     |  3 +
 ...se-study-parser-uncertainty-as-evidence.md | 83 +++++++++++++++++++
 2 files changed, 86 insertions(+)
 create mode 100644 docs/case-study-parser-uncertainty-as-evidence.md

diff --git a/README.md b/README.md
index 592ad21..a0aa957 100644
--- a/README.md
+++ b/README.md
@@ -33,6 +33,9 @@ LogLens is an MVP / early release. The repository is stable enough for public re
 
 Reviewing the project quickly? Start with [`docs/reviewer-path.md`](./docs/reviewer-path.md), [`docs/reviewer-brief.md`](./docs/reviewer-brief.md), and the [`v0.5 Evidence Explainability release note`](./docs/release-v0.5.0.md). The [`quality gates map`](./docs/quality-gates.md) links claims to tests and fixtures. For detection reasoning, follow the [`one-page incident-style case`](./docs/incident-style-case.md), then use the full [`Linux auth brute-force case study`](./docs/case-study-linux-auth-bruteforce.md), [`rule catalog`](./docs/rule-catalog.md), and [`false-positive taxonomy`](./docs/false-positive-taxonomy.md) for depth. For local scale expectations, see the [`performance envelope`](./docs/performance-envelope.md).
 
+For a shorter external review entry point focused on uncertainty handling, read
+[How LogLens Treats Parser Uncertainty as Evidence](./docs/case-study-parser-uncertainty-as-evidence.md).
+
 ## Why This Project Exists
 
 Many small security tools can detect a handful of known log patterns. Fewer tools make their parsing limits visible.
diff --git a/docs/case-study-parser-uncertainty-as-evidence.md b/docs/case-study-parser-uncertainty-as-evidence.md
new file mode 100644
index 0000000..caedae8
--- /dev/null
+++ b/docs/case-study-parser-uncertainty-as-evidence.md
@@ -0,0 +1,83 @@
+# How LogLens Treats Parser Uncertainty as Evidence
+
+A log analysis tool can appear more certain than it is when unsupported input
+quietly disappears. LogLens takes the opposite approach: parser uncertainty is
+part of the review artifact.
+
+The practical review question is simple: when a report contains no finding,
+did the relevant activity fail to meet a rule, or did the parser fail to
+understand the source line? The report should preserve enough evidence to tell
+those cases apart.
+
+## Three visible line outcomes
+
+The [parser contract](parser-contract.md) gives every input line one of three
+outcomes:
+
+1. A recognized authentication line becomes a typed event.
+2. A blank line is counted in `skipped_blank_lines`.
+3. A malformed or unsupported line becomes a parser warning with a line
+   number, failure category, and unknown-pattern bucket.
+
+Unsupported lines do not become detector input. They remain visible as
+coverage telemetry. This keeps a parser gap from being mistaken for negative
+security evidence.
+
+The categories are deliberately coarser than the pattern buckets. For example,
+an unsupported `sshd` pre-authentication close and an unsupported negotiation
+failure can both belong to `known_program_unknown_message` while retaining
+different buckets. The category supports summary review; the bucket preserves
+the narrower engineering question.
+
+## A noisy corpus is useful evidence
+
+The checked-in
+[`mixed_auth_corpus.log`](../assets/mixed_auth_corpus.log) is a sanitized,
+150-line syslog-style fixture. Its paired
+[`mixed_auth_parser_coverage.json`](../assets/mixed_auth_parser_coverage.json)
+records recognized events, warnings, blank lines, failure categories, pattern
+buckets, and source-line references.
+
+The corpus is intentionally noisy. A lower parse-success rate is not hidden or
+reframed as a quality claim. What matters is that the unsupported portion has a
+stable, inspectable shape. The locked expectations are documented in
+[parser coverage notes](parser-coverage-notes.md) and exercised by parser and
+report-contract tests.
+
+## Parsing and detection remain separate
+
+A parsed event is not automatically a detection signal. LogLens keeps that
+boundary explicit through its signal configuration. Supported success and
+audit events can remain reportable context without contributing to a
+brute-force finding. Unsupported lines never cross that boundary.
+
+This separation lets a reviewer ask two different questions:
+
+- Did the parser classify this line as documented?
+- Did the configured rule use that event as evidence?
+
+The [parser conformance matrix](parser-conformance-matrix.md) and
+[rule catalog](rule-catalog.md) provide the corresponding review surfaces.
+
+## Reproduce the contract
+
+From the repository root:
+
+```bash
+cmake -S . -B build
+cmake --build build
+ctest --test-dir build --output-on-failure
+```
+
+For a shorter artifact-first route, use the
+[reviewer path](reviewer-path.md). A useful external review can stay narrow:
+check one supported line, one unsupported line, or one report warning against
+the documented outcome.
+
+## What this does not prove
+
+Visible uncertainty is not complete parser coverage. LogLens does not claim to
+support every Linux distribution, authentication module, or message variant.
+It also does not turn a rule match into a compromise verdict, attribution, or
+blocking recommendation. The case study shows how uncertainty is preserved for
+review, not how uncertainty is eliminated.