Data race between gc.get_stats() and a concurrent collection under free-threading #151646

@Naserume

Bug report

Bug description:

gc.get_stats() (gc_get_stats_impl) copies the per-generation stats structs
gcstate->generation_stats->{young,old[0],old[1]}.items[index] by plain struct assignment, without holding a lock or using atomics,

cpython/Modules/gcmodule.c, lines 377 to 379 in 9e863fa:

stats[0] = gcstate->generation_stats->young.items[gcstate->generation_stats->young.index];
stats[1] = gcstate->generation_stats->old[0].items[gcstate->generation_stats->old[0].index];
stats[2] = gcstate->generation_stats->old[1].items[gcstate->generation_stats->old[1].index];

while a collection on another thread writes those same items[index] fields (collections++, collected += m, ...) at the end of gc_collect_main (Python/gc_free_threading.c),

/* Update stats */
struct gc_generation_stats *stats = get_stats(gcstate, generation);
stats->ts_start = start;
stats->ts_stop = stop;
stats->collections++;
stats->collected += m;
stats->uncollectable += n;
stats->duration += duration;
stats->candidates += state.candidates;

so gc.get_stats() races the collector.
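
To make one possible consequence concrete, here is a small deterministic model in Python, not the C code itself. `GenStats` and `update_stats` are hypothetical stand-ins for `struct gc_generation_stats` and the update block in `gc_collect_main`, and the `yield` points only mark where another thread could be scheduled. In C the race is undefined behavior, so this shows just one plausible outcome: a snapshot that mixes old and new field values.

```python
from dataclasses import dataclass, replace


@dataclass
class GenStats:
    # Hypothetical stand-in for a subset of struct gc_generation_stats.
    collections: int = 0
    collected: int = 0


def update_stats(stats, m):
    """Model the collector's field-by-field update.

    Each yield marks a point where another thread could run.
    """
    stats.collections += 1
    yield
    stats.collected += m
    yield


stats = GenStats()
writer = update_stats(stats, m=10)

next(writer)               # collector has bumped `collections` only
snapshot = replace(stats)  # reader copies the whole struct mid-update
next(writer)               # collector finishes the update

print("snapshot:", snapshot)
print("final:   ", stats)
```

The snapshot records one collection that has collected nothing, while the final state records one collection that collected 10 objects. A reader that copies under the same lock as the writer, or a writer that publishes a complete struct atomically, would not see that intermediate state.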

Reproducer:

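import gc
from threading import Thread

# Run on a free-threaded build compiled with ThreadSanitizer.

def collector():
    # Each gc.collect() ends by updating the per-generation stats.
    for _ in range(50000):
        a = {}
        a['self'] = a  # reference cycle, so the collection has work to do
        gc.collect()

def reader():
    # Each gc.get_stats() copies those stats without synchronization.
    for _ in range(50000):
        gc.get_stats()

threads = [Thread(target=collector)]
threads += [Thread(target=reader) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()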
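
Since the race only matters when threads really run in parallel, it may help to confirm that the interpreter running the reproducer is a free-threaded build with the GIL disabled. The sketch below is a suggested pre-check rather than part of the original report. `free_threading_status` is a hypothetical helper name, and it assumes `sys._is_gil_enabled()` (available on CPython 3.13 and later) may be absent on older versions.

```python
import sys
import sysconfig


def free_threading_status():
    """Return (built_free_threaded, gil_enabled_now)."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; None or 0 otherwise.
    built = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on CPython 3.13+; assume the GIL is
    # enabled when the function is missing.
    probe = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(probe()) if probe is not None else True
    return built, gil_enabled


built, gil_enabled = free_threading_status()
print(f"free-threaded build: {built}, GIL enabled: {gil_enabled}")
```

On a build where the second value is `True`, the threads are serialized by the GIL and the reproducer is not expected to exercise the race.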

TSAN Report:

==================
WARNING: ThreadSanitizer: data race (pid=2301505)
  Read of size 8 at 0x7fffb41a0180 by thread T2:
    #0 gc_get_stats_impl /cpython/Modules/gcmodule.c:379:16 
    #1 gc_get_stats /cpython/Modules/clinic/gcmodule.c.h:456:12 
    #2 cfunction_vectorcall_NOARGS /cpython/Objects/methodobject.c:508:24 
    #3 _PyObject_VectorcallTstate /cpython/./Include/internal/pycore_call.h:144:11 
    #4 PyObject_Vectorcall /cpython/Objects/call.c:327:12 
    #5 _Py_VectorCallInstrumentation_StackRefSteal /cpython/Python/ceval.c:768:11 
    #6 _PyEval_EvalFrameDefault /cpython/Python/generated_cases.c.h:1846:35 
    ...

  Previous write of size 8 at 0x7fffb41a0180 by thread T1:
    #0 gc_collect_main /cpython/Python/gc_free_threading.c:2288:23 
    #1 _PyGC_Collect /cpython/Python/gc_free_threading.c:2568:12 
    #2 gc_collect_impl /cpython/Modules/gcmodule.c:93:12 
    #3 gc_collect /cpython/Modules/clinic/gcmodule.c.h:143:21 
    #4 _Py_BuiltinCallFastWithKeywords_StackRef /cpython/Python/ceval.c:841:11 
    #5 _PyEval_EvalFrameDefault /cpython/Python/generated_cases.c.h:2508:35 
    ...

SUMMARY: ThreadSanitizer: data race /cpython/Modules/gcmodule.c:379:16 in gc_get_stats_impl
==================
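
As a diagnostic variant, and not a proposed fix, the two call sites can be serialized from Python to check whether the report really comes from these two paths. The sketch below assumes the test controls every caller. `stats_lock`, `locked_collect`, `locked_get_stats`, and `run` are hypothetical names, and automatic collections are switched off for the duration because they would bypass the lock. If the report stems from the unsynchronized copy, it would be expected to disappear with this variant, though that has not been verified here.

```python
import gc
import threading

stats_lock = threading.Lock()  # hypothetical lock, not part of the gc module


def locked_collect():
    with stats_lock:
        return gc.collect()


def locked_get_stats():
    with stats_lock:
        return gc.get_stats()


def run(iterations=200, readers=4):
    def collector():
        for _ in range(iterations):
            a = {}
            a["self"] = a  # reference cycle for the collector to find
            locked_collect()

    def reader():
        for _ in range(iterations):
            locked_get_stats()

    was_enabled = gc.isenabled()
    gc.disable()  # automatic collections would not take stats_lock
    try:
        threads = [threading.Thread(target=collector)]
        threads += [threading.Thread(target=reader) for _ in range(readers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        if was_enabled:
            gc.enable()
    return locked_get_stats()


result = run()
print(len(result), "generations reported")
```

The iteration counts are smaller than in the reproducer to keep the run short; they can be raised to match it.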

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux
