Python Multithreading

Multithreading

A normal (non-daemon) thread keeps running even after the main thread has finished; the program only exits once all such threads are done.
- To make the main thread wait for another thread at a specific point, call .join() on that thread.
There are two ways of creating threads in Python:
- Class based
- Function based

Function based way for creating threads

from threading import Thread
def func_4_thread(n_max: int = 1_000_000) -> None:
    n = 0
    while n < n_max:
        n += 1

# args must be a tuple, hence the trailing comma for a single argument
my_thread = Thread(target=func_4_thread, args=(10_000_000,))
my_thread.start()

Class based way for creating threads

Subclass Thread and override its run() method, which is what start() executes in the new thread.

from threading import Thread

class MyThread(Thread):
    def __init__(self, n_max=1_000_000) -> None:
        super().__init__()
        self.n_max = n_max

    def func_thread(self) -> None:
        n = 0
        while n < self.n_max:
            n += 1

    def run(self) -> None:
        self.func_thread()

my_thread = MyThread(n_max=1_000_000)
my_thread.start()

Joining the threads to the main thread

Simple Example:

# SuperFastPython.com

# example of executing a target task function in a separate thread

from time import sleep
from threading import Thread

# a simple task that blocks for a moment and prints a message
def task():
    sleep(1)
    print('This is coming from another thread')

thread = Thread(target=task)

# start the task in a new thread
thread.start()

print('Waiting for the new thread to finish...')
thread.join() # asking the main thread to wait for this thread to finish.
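
With several threads the same pattern applies: start them all first, then join each one. The following is a minimal sketch; the `worker` function and the shared `results` list are hypothetical names introduced for illustration and are not part of the example above.

```python
from threading import Thread
from time import sleep

results = []  # hypothetical shared list; list.append is thread-safe in CPython

def worker(worker_id: int) -> None:
    sleep(0.1)  # simulate a blocking task
    results.append(worker_id)

# start every thread before joining any, so they run concurrently
threads = [Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()

# join() blocks the main thread until the given worker has finished
for t in threads:
    t.join()

print(sorted(results))
```

Joining inside the same loop that starts the threads would run them one after another, which is why the two loops are kept separate.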

Creating Thread objects by hand like this is useful for one-off ad hoc tasks, but it becomes cumbersome when you have many tasks to run.
Each thread that is created requires resources to be allocated (e.g. memory for the thread’s stack space).
- The cost of setting up threads adds up if we create and destroy many of them over and over for ad hoc tasks.
Instead, we would prefer to keep worker threads around for reuse if we expect to run many ad hoc tasks throughout our program.
- This can be achieved using a thread pool.
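
One way to get a thread pool is the standard library's `concurrent.futures.ThreadPoolExecutor`. The text above does not name a specific pool, so treat this choice as an assumption. In the sketch below, `task` and `max_workers=4` are illustrative values only.

```python
from concurrent.futures import ThreadPoolExecutor
from time import sleep

def task(n: int) -> int:
    sleep(0.1)  # simulate a blocking operation such as I/O
    return n * n

# the pool keeps up to 4 worker threads alive and reuses them for all 8 tasks
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in the order of the inputs
    squares = list(pool.map(task, range(8)))

print(squares)
```

Leaving the `with` block waits for all submitted tasks to finish, so no explicit join() is needed.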

Daemon threads

At an OS level, daemons are background processes that run without interaction with the user.
In the context of Python threads, daemons are simply background threads.
The difference from normal threads is that the program does not wait for daemon threads: it exits as soon as only daemon threads are left running.
- In other words, the program waits for normal threads to finish (they are not cancelled); as soon as they are done, any daemons still running are terminated abruptly and the program exits.
To make a thread a daemon, pass daemon=True when creating it (or set thread.daemon = True before calling start()).
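
A minimal sketch of a daemon thread follows. The `background_heartbeat` function is a hypothetical example of a loop that never returns on its own.

```python
import time
from threading import Thread

def background_heartbeat() -> None:
    # hypothetical background loop; it never finishes by itself
    while True:
        time.sleep(0.05)

# daemon=True marks the thread as a background thread
daemon_thread = Thread(target=background_heartbeat, daemon=True)
daemon_thread.start()

print(daemon_thread.daemon)      # True
print(daemon_thread.is_alive())

# the script ends here; the interpreter exits without waiting for daemon_thread
```

Without daemon=True the same script would never exit, because the program would wait for the endless loop to finish.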

Return values from threads

A Thread object has no built-in way of handing back a result: whatever the target function returns is discarded.
- A common workaround is to pass in a Queue, have the thread put its result there, and read it from the main thread.

Example:

import time
from threading import Thread
from queue import Queue

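def func() -> int:
    # hypothetical stand-in for the real computation
    return 42

def func_4_thread(q_out: Queue) -> None:
    print("thread doing work...")
    time.sleep(5)
    func_result = func()
    # hand the result back to the main thread through the queue
    q_out.put(func_result)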

func_result_queue: Queue = Queue(maxsize=0)
thread = Thread(target=func_4_thread, args=(func_result_queue,))
thread.start()

# get() blocks until the worker thread has put its result
func_result = func_result_queue.get()
print(func_result, "from queue")
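
The same queue can collect results from several threads. This is a minimal sketch; `square_into_queue` is a hypothetical worker, and each result is tagged with its input so that the order in which threads finish does not matter.

```python
from queue import Queue
from threading import Thread

def square_into_queue(n: int, q_out: Queue) -> None:
    # put an (input, result) pair so the caller can tell results apart
    q_out.put((n, n * n))

result_queue: Queue = Queue()
threads = [
    Thread(target=square_into_queue, args=(n, result_queue)) for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# all workers are done, so the queue holds exactly one item per thread
results = dict(result_queue.get() for _ in threads)
print(results)
```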

Using locks in threading

When to use a lock?

Ask: are the objects I’m using and the operations I’m performing thread-safe? If not, then we should use a Lock.

Example:

import time
from threading import Thread, Lock

my_cache = {str(i): i for i in range(100)}
lck = Lock()

def check_cache() -> None:
    while True:
        with lck:
            for key, value in my_cache.items():
                # do important stuff with cache items
                pass

def add_to_cache() -> None:
    while True:
        with lck:
            current_time = time.time_ns()
            my_cache[str(current_time)] = current_time
        # sleep outside the lock so check_cache is not blocked while we wait
        time.sleep(0.1)

check_cache_thread = Thread(target=check_cache)
check_cache_thread.start()

add_to_cache_thread = Thread(target=add_to_cache)
add_to_cache_thread.start()

What happens if we don't apply a lock here?
- There is a chance that check_cache() is in the middle of iterating over the cache when the interpreter switches threads and it has to release the GIL.
- add_to_cache() then acquires the GIL and adds a key, changing the size of the dictionary.
- When check_cache() gets the GIL back and continues iterating, it finds that the dictionary size has changed and raises RuntimeError: dictionary changed size during iteration.
The lock does not stop add_to_cache() from acquiring the GIL; it stops it from modifying the cache while check_cache() holds the lock. add_to_cache() simply blocks on lck until the iteration is finished.
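
The example above loops forever, so here is a bounded variant that can be run to completion. It is a sketch under assumptions: the names `reader`, `writer`, `cache_lock` and `errors`, and the iteration counts, are made up for illustration.

```python
from threading import Lock, Thread

cache = {str(i): i for i in range(100)}
cache_lock = Lock()
errors = []  # collects any RuntimeError raised while iterating

def reader(n_passes: int = 200) -> None:
    for _ in range(n_passes):
        try:
            # hold the lock for the whole iteration
            with cache_lock:
                for _key, _value in cache.items():
                    pass
        except RuntimeError as exc:
            errors.append(exc)

def writer(n_items: int = 1_000) -> None:
    for i in range(n_items):
        # take the same lock before changing the dictionary's size
        with cache_lock:
            cache[f"new-{i}"] = i

reader_thread = Thread(target=reader)
writer_thread = Thread(target=writer)
reader_thread.start()
writer_thread.start()
reader_thread.join()
writer_thread.join()

print(len(cache), len(errors))
```

Because both functions take the same lock, the writer can never change the dictionary in the middle of an iteration, so `errors` stays empty. Removing the two `with cache_lock:` lines makes the RuntimeError possible, although whether it actually appears on a given run depends on thread scheduling.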

Threads are a simple way to achieve concurrency.

We don’t need event loops, funky function definitions, or spawning additional processes and communicating with them.

References

Python Concurrency — Threading and the GIL | by Diego Barba | Towards Data Science

Last updated: 2022-11-09