Fix HTTP/2-on-HTTP/1 mislabel on resumed TLS sessions and the recv_status spin#889
Merged
Conversation
…atus spin
On a resumed TLS 1.3 session ssl:negotiated_protocol/1 reports nothing, so a
reused HTTP/2 connection was labeled HTTP/1, fed h2 frames to the h1 parser, and
recv_status looped forever on an unconsumable buffer (one CPU pegged).
Carry the ALPN protocol across resumption: a per host and advertised-ALPN memo
records the protocol learned on a full handshake; session_tickets is gated on
that memo so resumption is offered only once the protocol is known, and a resumed
session resolves against the gate-time snapshot only when it actually resumed.
session_tickets is excluded from the pool key so the gate does not churn buckets.
Guard recv_status: read from the socket on {more} and fail fast with
{error, {bad_response, not_http}} when the bytes cannot start an HTTP/1 status
line, instead of spinning.
A custom-ssl_options handshake (not resumption-eligible, does not seed the
ticket store) could overwrite the shared {Host, AlpnProtos} ALPN entry, so a
later default resumed session read a poisoned snapshot and spoke the wrong
protocol.
Only handshakes hackney enabled resumption for (session_tickets in the effective
opts, i.e. the default config, the single ticket source per host) now update the
memo; non-resumable handshakes use their own reported ALPN and leave the entry
untouched.
The previous check treated any session_tickets entry as resumable, so a custom
ssl_options handshake carrying {session_tickets, disabled} or manual could still
overwrite the shared {Host, AlpnProtos} memo. Gate on {session_tickets, auto}
(what effective_opts adds for the resumable default config) via the new
hackney_ssl:auto_tickets/1, so only that ticket source updates the memo.
Merged
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes intermittent, reuse-only hangs against HTTPS/h2 servers where one CPU is pegged.
hackney enables TLS 1.3 session resumption for the default config. On a resumed session
ssl:negotiated_protocol/1returnsprotocol_not_negotiated, so the connection was labeledhttp1while the server spoke h2. The HTTP/1 parser could not parse the h2 SETTINGS frame andrecv_statuslooped forever on the unconsumable buffer.recv_status fail-fast
recv_statusnow reads from the socket on{more}and fails fast with{error, {bad_response, not_http}}when the buffered bytes cannot begin an HTTP/1 status line (after tolerating leading CRLF), instead of spinning.Carry ALPN across resumption
The protocol learned on a full handshake is remembered and reused on resumed sessions:
{Host, advertised-ALPN}memo records the protocol from a full handshake (advertised ALPN kept in offered order).session_ticketsis gated on that memo: resumption is offered only once the protocol is known, and the cached value is snapshotted at the gate and carried into resolution, so a concurrent eviction cannot relabel a resumed connection.ssl:connection_informationreports the session actually resumed. A full handshake that reports no ALPN is a realhttp1and refreshes the memo (no stalehttp2).{session_tickets, auto}, added byeffective_optsfor the default config). A custom-ssl_optionshandshake, including one carrying{session_tickets, disabled}ormanual, uses its own reported ALPN and never writes the shared entry, so it cannot poison a later default resumed session.session_ticketsis excluded from the pool key so the gate does not churn buckets.When a memo entry is evicted while OTP still holds a ticket, the next connection finds no entry, so the gate does a full handshake (no resumption), which reports ALPN and re-learns it. Eviction is never a mislabel, only an extra handshake.
The 4.4.5 changelog entry also covers the previously-merged but unreleased keep-alive pooling fix (#888).