gh-88110: clear concurrent.futures.thread._threads_queues after fork to avoid joining parent process' threads #126098

Merged
serhiy-storchaka merged 11 commits into python:main from Drino:bug88110_tests on Nov 22, 2024

@Drino (Contributor) commented Oct 29, 2024

I've added a test to @marmarek's PR: #101940

_threads_queues is copied as-is into the forked child's memory, but the parent's threads do not exist in the child process, so the child crashes when _python_exit calls t.join().

This is the second time I've hit this issue in the last two years, so I hope it can be fixed :)
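
The failure mode described above can be modelled without touching CPython internals. The following is a minimal sketch, assuming a POSIX system with os.fork(); worker_registry, fork_and_join_registry and reproduce are hypothetical names, and the plain dict only stands in for _threads_queues. It forks from inside a pool worker and lets the child try to join every registered thread, the way an exit hook would.

```python
import os
import threading
import warnings
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for concurrent.futures.thread._threads_queues.
worker_registry = {}


def fork_and_join_registry():
    """Run inside a pool worker: fork, then join registered threads in the child."""
    worker_registry[threading.current_thread()] = None
    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Python 3.12+ warns when a multi-threaded process forks.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()
    if pid == 0:
        # Child: the thread that called fork() is the only one left, and the
        # inherited registry still lists it.
        try:
            try:
                for thread in list(worker_registry):
                    thread.join()
                outcome = "joined"
            except RuntimeError as exc:
                outcome = str(exc)
            os.write(write_fd, outcome.encode())
        finally:
            os._exit(0)  # skip atexit handlers and parent-owned cleanup
    # Parent: collect the child's report.
    os.close(write_fd)
    message = os.read(read_fd, 1024).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)
    return message


def reproduce():
    """Fork from a ThreadPoolExecutor worker and return what the child saw."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(fork_and_join_registry).result()
```

Calling reproduce() should return the same RuntimeError text that appears in the traceback quoted in this PR, because the registered worker thread is the child's current thread.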

marmarek and others added 4 commits October 29, 2024 02:10
… fork

Threads are gone after fork, so clear the queues too. Otherwise the
child process (here created via multiprocessing.Process) crashes on
interpreter exit with:

    Traceback (most recent call last):
      File "/usr/lib64/python3.11/multiprocessing/popen_fork.py", line 72, in _launch
        code = process_obj._bootstrap(parent_sentinel=child_r)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib64/python3.11/multiprocessing/process.py", line 332, in _bootstrap
        threading._shutdown()
      File "/usr/lib64/python3.11/threading.py", line 1561, in _shutdown
        atexit_call()
      File "/usr/lib64/python3.11/concurrent/futures/thread.py", line 31, in _python_exit
        t.join()
      File "/usr/lib64/python3.11/threading.py", line 1109, in join
        raise RuntimeError("cannot join current thread")
    RuntimeError: cannot join current thread

Fixes python#88110
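
The diff itself is not shown in this conversation. As a rough sketch of the idea in the commit message ("clear the queues too"), an after-fork hook can empty a registry in the child while leaving the parent's copy alone. worker_registry, remember_current_thread and entries_seen_by_child are hypothetical names, the plain dict stands in for _threads_queues, and the use of os.register_at_fork is an assumption about how such a fix can be wired up, not a quote from the patch.

```python
import os
import threading
import warnings

# Hypothetical stand-in for concurrent.futures.thread._threads_queues.
worker_registry = {}

# Assumption: clearing the mapping in the child is enough, since none of the
# parent's worker threads exist there.
if hasattr(os, "register_at_fork"):
    os.register_at_fork(after_in_child=worker_registry.clear)


def remember_current_thread():
    """Record the calling thread, as a pool would record its workers."""
    worker_registry[threading.current_thread()] = None


def entries_seen_by_child():
    """Fork and return how many registry entries the child process sees."""
    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Python 3.12+ warns if other threads are alive at fork time.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()
    if pid == 0:
        try:
            os.write(write_fd, str(len(worker_registry)).encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    count = int(os.read(read_fd, 64))
    os.close(read_fd)
    os.waitpid(pid, 0)
    return count
```

After remember_current_thread() in the parent, entries_seen_by_child() is expected to report an empty registry for the child, while the parent's worker_registry keeps its entry.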
@ghost commented Oct 29, 2024

All commit authors signed the Contributor License Agreement.

Co-authored-by: RUANG (James Roy) <longjinyii@outlook.com>
@Drino (Contributor, Author) commented Nov 2, 2024

Friendly ping for this review, @rruuaanng

Hopefully this one line of code can make it to main :)

@Drino Drino changed the title gh-88110: clear concurrent.futures.thread._threads_queues after fork gh-88110: clean concurrent.futures.thread._threads_queues after fork to avoid joining parent process' threads Nov 10, 2024
@Drino Drino changed the title gh-88110: clean concurrent.futures.thread._threads_queues after fork to avoid joining parent process' threads gh-88110: clear concurrent.futures.thread._threads_queues after fork to avoid joining parent process' threads Nov 10, 2024
@ZeroIntensity (Member) commented

cc @serhiy-storchaka

@serhiy-storchaka (Member) left a comment

Please silence the deprecation warning:

test_process_fork_from_a_threadpool (test.test_concurrent_futures.test_thread_pool.ThreadPoolExecutorTest.test_process_fork_from_a_threadpool) ... /home/serhiy/py/cpython-tmp/Lib/multiprocessing/popen_fork.py:67: DeprecationWarning: This process (pid=1243139) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
0.00s ok
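
One generic way to silence this warning is a narrowly scoped warnings filter around the fork call. The sketch below is not the change made in this PR's test; fork_quietly and child_exit_code_with_extra_thread are hypothetical helpers, and the message pattern is copied from the output above.

```python
import os
import threading
import warnings


def fork_quietly():
    """Call os.fork() with the multi-threaded fork DeprecationWarning ignored."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message=r".*use of fork\(\) may lead to deadlocks",
            category=DeprecationWarning,
        )
        return os.fork()


def child_exit_code_with_extra_thread():
    """Fork while a second thread is alive and return the child's exit code."""
    stop = threading.Event()
    helper = threading.Thread(target=stop.wait)
    helper.start()
    try:
        pid = fork_quietly()
        if pid == 0:
            os._exit(0)  # child: exit immediately, no cleanup
        _, status = os.waitpid(pid, 0)
        return os.waitstatus_to_exitcode(status)  # requires Python 3.9+
    finally:
        # Parent only: os._exit() in the child bypasses this block.
        stop.set()
        helper.join()
```

The filter is restored when the with block exits, so other DeprecationWarnings raised elsewhere in a test run stay visible.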

@serhiy-storchaka (Member) left a comment

The test is failing.

@gpshead added the labels "needs backport to 3.12" (only security fixes), "needs backport to 3.13" (bugs and security fixes), and "🔨 test-with-buildbots" (Test PR w/ buildbots; report in status section) on Nov 20, 2024
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @gpshead for commit b8d3f31 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@gpshead gpshead self-assigned this Nov 20, 2024
@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 20, 2024
@serhiy-storchaka serhiy-storchaka merged commit 1848ce6 into python:main Nov 22, 2024
@miss-islington-app

Thanks @Drino for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 22, 2024
… fork to avoid joining parent process' threads (pythonGH-126098)

Threads are gone after fork, so clear the queues too. Otherwise the
child process (here created via multiprocessing.Process) crashes on
interpreter exit.

(cherry picked from commit 1848ce6)

Co-authored-by: Andrei Bodrov <Drino@users.noreply.github.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@bedevere-app

bedevere-app bot commented Nov 22, 2024

GH-127163 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Nov 22, 2024
@bedevere-app

bedevere-app bot commented Nov 22, 2024

GH-127164 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 only security fixes label Nov 22, 2024
serhiy-storchaka added a commit that referenced this pull request Nov 22, 2024
…r fork to avoid joining parent process' threads (GH-126098) (GH-127163)

Threads are gone after fork, so clear the queues too. Otherwise the
child process (here created via multiprocessing.Process) crashes on
interpreter exit.

(cherry picked from commit 1848ce6)

Co-authored-by: Andrei Bodrov <Drino@users.noreply.github.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@Drino Drino deleted the bug88110_tests branch November 24, 2024 21:32
serhiy-storchaka added a commit that referenced this pull request Nov 30, 2024
…r fork to avoid joining parent process' threads (GH-126098) (GH-127164)

Threads are gone after fork, so clear the queues too. Otherwise the
child process (here created via multiprocessing.Process) crashes on
interpreter exit.

(cherry picked from commit 1848ce6)

Co-authored-by: Andrei Bodrov <Drino@users.noreply.github.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025
… fork to avoid joining parent process' threads (pythonGH-126098)

Threads are gone after fork, so clear the queues too. Otherwise the
child process (here created via multiprocessing.Process) crashes on
interpreter exit.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
yxieca pushed a commit to sonic-net/sonic-mgmt that referenced this pull request Feb 11, 2026
What is the motivation for this PR?
To fix the "ansible worker dead" issue observed in sonic-mgmt tests.
The ansible worker is detected as dead when ansible is called from a worker thread in a thread pool.
This is the same issue as python/cpython#88110.

The root cause is that the concurrent.futures.thread thread pool registers a callback that joins its worker threads when the Python interpreter exits, and those workers are stored in the dictionary concurrent.futures.thread._threads_queues. The forked ansible worker process inherits this dictionary, which still refers to threads that do not exist in the child. When the child worker process tries to exit, the callback tries to join those orphaned threads, so the worker process returns 1 and ansible reports the worker as dead.

How did you do it?
Backport the cpython fix python/cpython#126098 if the sonic-mgmt Python version is < 3.12.8.

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
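
A version-gated backport along the lines of this commit message could look roughly like the sketch below. It is not the sonic-mgmt code: needs_backport and apply_backport are hypothetical names, the 3.12.8 threshold is copied from the message above, and registering _threads_queues.clear as an after-fork hook is an assumption about what the backport does. Note that this simple comparison would not flag an early 3.13 release, even though the conversation shows a separate backport to the 3.13 branch.

```python
import os
import sys
import concurrent.futures.thread as cf_thread


def needs_backport(version_info=sys.version_info):
    """Return True for interpreters older than the threshold in the commit message."""
    return tuple(version_info[:3]) < (3, 12, 8)


def apply_backport():
    """Register an after-fork hook that empties _threads_queues in the child.

    Returns True if the hook was installed, False if it was skipped.
    """
    # _threads_queues is a private attribute, so look it up defensively.
    registry = getattr(cf_thread, "_threads_queues", None)
    if registry is None or not hasattr(os, "register_at_fork"):
        return False
    if not needs_backport():
        return False
    os.register_at_fork(after_in_child=registry.clear)
    return True
```

Such a hook has to be installed once, before any worker forks, for example at import time of the module that starts the thread pool.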
nnelluri-cisco pushed a commit to nnelluri-cisco/sonic-mgmt that referenced this pull request Feb 12, 2026
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: nnelluri-cisco <nnelluri@cisco.com>
anilal-amd pushed a commit to anilal-amd/anilal-forked-sonic-mgmt that referenced this pull request Feb 19, 2026
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Zhuohui Tan <zhuohui.tan@amd.com>
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Feb 25, 2026
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
aronovic pushed a commit to aronovic/sonic-mgmt that referenced this pull request Mar 3, 2026
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Mihut Aronovici <aronovic@cisco.com>
ravaliyel pushed a commit to ravaliyel/sonic-mgmt that referenced this pull request Mar 12, 2026
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Ravali Yeluri (WIPRO LIMITED) <v-ryeluri@microsoft.com>
abhishek-nexthop pushed a commit to nexthop-ai/sonic-mgmt that referenced this pull request Mar 17, 2026
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Abhishek <abhishek@nexthop.ai>