As you are reading this text you are most likely using a device that contains a collection of central processing units (CPUs) running processes and threads, and potentially even uses a graphics processing unit (GPU) to render it nicely for you. Besides all of us using these “things” every day in computers, tablets, and smartphones, processors/cores/nodes are a key element of the toolbox of modern science. Yet, I could never remember the difference between these terms, and if you ever wanted to memorize the difference with a sweet kitchen metaphor then this blog post will serve you well!
For simplicity, I will illustrate two cases: first, the simpler case of a typical laptop or desktop computer, and then a more specific modern GPU-accelerated cluster setup.
1. The restaurant case
If you opened up your computer (it doesn’t matter too much whether it is a desktop or a laptop), you would find a typical hardware setup with the following components performing your computation:
- 1 physical socket holding the CPU chip, which is a collection of
- multiple (typically 8–16) cores, i.e., independent computation units inside that chip
If you connect that to the software side, different processes run on the CPU chip, e.g., a browser application or a Python program performing a simulation. Each of these processes consists of multiple subtasks, the threads.
Let’s map that to the restaurant analogy: The computer corresponds to a restaurant, the socket with the CPU chip to the kitchen, and each core to a cook. A kitchen receives many orders – the processes – that consist of dishes – the threads – which the cooks prepare.
It is then also straightforward to understand the difference between two ways to make your program compute faster: multi-processing vs. multi-threading. In the simplest setup, there is one list of all the orders with their dishes, and that is handed to one cook who serially prepares all of them. Not very efficient, when you have many other cooks just sitting around. Wouldn’t it be much cleverer to distribute the orders among the cooks (= multi-processing) or even share the work at the dish level (= multi-threading)? This is exactly what is done in many cases, and for the people writing their own code: the Message Passing Interface (MPI) is commonly used to communicate between different processes, whereas Open Multi-Processing (OpenMP) is commonly used for multi-threading in compiled languages like C++ and Fortran. High-level languages like Python and Julia have their own equivalents built in.
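To make the orders-and-dishes picture concrete, here is a minimal sketch in Python using its built-in `multiprocessing` and `threading` modules (the function names `cook_order` and `prepare_dish` are just illustrative labels for the analogy, not real library calls): each process handles one whole order, and within a process, threads share the dishes.

```python
import multiprocessing as mp
import threading

def prepare_dish(dish):
    """One unit of work -- a single 'dish' (thread-level task)."""
    return f"{dish} (done)"

def cook_order(order):
    """One process ('cook') handles a whole order; its dishes are
    further shared among threads within that process."""
    results = []
    lock = threading.Lock()

    def worker(dish):
        cooked = prepare_dish(dish)
        with lock:  # protect the shared results list
            results.append(cooked)

    threads = [threading.Thread(target=worker, args=(d,)) for d in order]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    orders = [["soup", "salad"], ["pasta", "pizza"]]
    # Multi-processing: distribute whole orders among worker processes
    with mp.Pool(processes=2) as pool:
        cooked_orders = pool.map(cook_order, orders)
    print(cooked_orders)
```

Note that in CPython, threads of one process do not run Python bytecode truly in parallel because of the global interpreter lock; they mainly help when threads spend time waiting, which is exactly the situation discussed next.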
There is one additional trick to speed up your computation: hyperthreading a single core. Cores often have waiting times between actual computation steps, for example while numbers are read from or written to memory before they can actually be multiplied. Hyperthreading means executing a second thread on the same core by switching quickly back and forth between the two threads. This way, the waiting gaps of one thread can be filled with progress on the other thread. This is similar to the cook who must wait for an ingredient to boil and in the meantime continues chopping other ingredients.
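Hyperthreading itself is a hardware mechanism, but the same overlap-the-waiting idea can be simulated in software. The sketch below (illustrative only; `time.sleep` stands in for any waiting step, like water boiling or an I/O request) compares running two waiting-heavy tasks one after the other versus interleaved:

```python
import threading
import time

def cook_with_waiting(name, results):
    # Simulate a step that mostly waits (e.g. water boiling, I/O)
    time.sleep(0.2)
    results.append(name)

def run_serial():
    """One cook does both tasks back to back: the waits add up."""
    results = []
    start = time.perf_counter()
    for name in ("task-1", "task-2"):
        cook_with_waiting(name, results)
    return time.perf_counter() - start

def run_interleaved():
    """Two threads on one 'cook': one task's wait hides the other's."""
    results = []
    start = time.perf_counter()
    threads = [threading.Thread(target=cook_with_waiting, args=(n, results))
               for n in ("task-1", "task-2")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"serial:      {run_serial():.2f} s")       # roughly 0.4 s
    print(f"interleaved: {run_interleaved():.2f} s")  # roughly 0.2 s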
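```

The interleaved version finishes in about half the time because the two waits overlap, which is the same gain hyperthreading extracts from a core's idle cycles.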
2. Scaling up to the GPU-accelerated restaurant chain
When computations become really heavy, the easiest way to speed up computation time is to use more computing units and distribute the work (if possible) among them. This leads to many computers working in parallel on the same problem: a high-performance computing (HPC) cluster. In this context, you can think of the hardware as many desktop computers stacked into a big rack, and typically an entire room filled with those racks. The individual computers are then termed “nodes”. In our restaurant analogy, we extend the single restaurant to a chain, where every node is now a restaurant. And unlike a typical personal computer, it is common to use multiple sockets (= CPU chips, typically two) per node – so the restaurants get equipped with multiple kitchens.
This is already how many of the most intense scientific calculations are “cooked”. But of course, there is more… Intensive graphics rendering in particular requires a lot of the same type of operation performed massively in parallel, such as matrix multiplications. This led to the development of hardware optimized for exactly such calculations: the GPUs. While a CPU is designed to balance speed with the versatility to perform all the different tasks an operating system needs, a GPU is only good at one thing, but very fast at it. You can imagine this as a kitchen full of specialized ultrafast GPU choppers, compared to the CPU chef who can chop, cook, boil, fry, etc. at a decent speed. But if you know to expect a lot of chopping, you can speed up your food preparation by adding a specialized chopping kitchen and letting it accelerate your chopping. This is exactly the idea behind GPU acceleration.
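To see why matrix multiplication suits this massively parallel hardware, note that every entry of the result is an independent dot product – one “chop” per entry. The pure-Python sketch below is illustrative only (real GPU code would use something like CUDA or a GPU-backed array library); it just makes the independent-tasks structure visible:

```python
def dot(row, col):
    """One independent unit of work: a single dot product."""
    return sum(a * b for a, b in zip(row, col))

def matmul(A, B):
    cols_B = list(zip(*B))  # columns of B
    # Every (i, j) output entry depends only on row i of A and
    # column j of B -- a GPU can compute thousands of these at once.
    return [[dot(row, col) for col in cols_B] for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

Because none of the entries depends on any other, the work scales out to as many “choppers” as the hardware provides.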
So, if we wrap this all up (see the table below): your modern computer is most likely a restaurant with a CPU kitchen socket, which coordinates many multitasking core chefs to prepare order-processes consisting of dish-threads. When it comes to real high-performance cooking, many of those restaurant nodes make up a restaurant chain cluster. And the next time you hear about how GPU acceleration will solve all our problems, remember that it’s just a specialized chopping section, which might be really good at speeding up parts of the problem.
I hope you keep ordering from your computer and don’t have to starve too long for your results!
| High Performance Computing | High Performance Cooking |
|---|---|
| Cluster | Restaurant chain |
| Computer/node | Restaurant |
| Socket/CPU chip | Kitchen |
| Core | Chef |
| Process | Order |
| Thread | Dish (part of an order) |
| Hyperthreading | 1 chef cooking two things at the same time |
| GPU chip | Specialized chopping unit |

