GPU vs. CPU
How does a CPU differ from a GPU (in terms of training ML models)?
Think about CPUs as a 4-lane highway with trucks delivering the computation, and GPUs as a 100-lane highway with little shopping carts. GPUs are great at parallelism, but only for less complex tasks.
Deep learning, specifically, benefits from that since it's mainly batches of matrix multiplication, and these can be parallelized very easily. So training a neural network on a GPU can be 10x faster than on a CPU. But not every model gets that benefit.
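To see why matrix multiplication splits up so easily, here's a minimal plain-Python sketch. The names `output_cell` and `matmul` are just illustrative, not from any library. Each output cell only needs one row of the first matrix and one column of the second, so no cell has to wait on any other, and in principle each one could be handed to a different worker.

```python
def output_cell(a, b, i, j):
    """Compute one cell of A @ B using only row i of A and column j of B."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul(a, b):
    """Multiply two matrices given as lists of rows.

    The cells are computed one after another here, but none depends on
    another, which is what lets a GPU compute many of them at once.
    """
    rows, cols = len(a), len(b[0])
    return [[output_cell(a, b, i, j) for j in range(cols)] for i in range(rows)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product = matmul(a, b)
print(product)
```

This runs on the CPU and is only meant to show the independence of the cells, not to be fast.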
Thank you! This makes a lot more sense now.
Wait, I've got a deep analogy coming!
Let’s say that I am in a very large library, and my goal is to count all of the books.
There’s a librarian. That librarian is super smart and knowledgeable about where books are, how they’re organized, how the library works, and all that. The librarian is the boss! The librarian is perfectly capable of counting the books on their own and they’ll probably be very good and organized about it.
But what if there was a big team of people who could count the books with the librarian? We don’t need these people to be library experts — it’s not like you have to be a librarian to count books — we just need people who can count accurately.
- If you have 3 people who count books, that speeds up your counting.
- If you have 10 people who count books, your counting gets even faster.
- If you have 100 people who count books… that’s awesome!
A CPU is like a librarian.
- Just like you need a librarian running a library, you need a CPU. A CPU can basically do any job that you need done.
- Just like a librarian could count all of the books on their own, a CPU can do math things like building machine learning models.
A GPU is like a team of people counting books.
- Just like counting books is something that can be done by many people without specific library expertise, a GPU makes it much easier to take a job, split it among many different units, and do math things like building machine learning models.
A GPU can usually accomplish certain tasks much faster than a CPU can.
- If you’re part of a team of people who are counting books, maybe the librarian assigns every person a shelf. You count the books on your shelf. At the exact same time, everyone else counts the books on their shelves. Then you all come together, add up your books, and get the total number of books in the library.
This process is called parallelizing, which is just a fancy word for “we split a big job into small chunks, and these small chunks can be done at the same time.” We say parallelizing because we’re doing these jobs “in parallel.” You count your books at the same time as your neighbor counts their books. (Jobs that can't be done in parallel are usually done sequentially, which means "one after another.")
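The split-count-combine idea can be sketched in a few lines of Python. Everything here is made up for illustration: the `shelves` data, the `count_shelf` and `count_library` names, and the choice of a thread pool. The threads are CPU workers standing in for the team of counters, so this shows the shape of a parallel job rather than real GPU code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical library: each inner list is one shelf of book titles.
shelves = [
    ["Dune", "Emma", "Ulysses"],
    ["Beloved", "Hamlet"],
    ["Walden", "Dracula", "Persuasion", "Middlemarch"],
]


def count_shelf(shelf):
    """One counter's job: count the books on a single shelf."""
    return len(shelf)


def count_library(shelves, workers=3):
    """Hand each shelf to a worker, then add up the per-shelf counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_shelf = list(pool.map(count_shelf, shelves))
    return sum(per_shelf)


total = count_library(shelves)
print(total)
```

Setting `workers=1` gives the same total, just counted sequentially, which is the librarian working alone.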
Let’s say you have 100 shelves in your library and it takes 1 minute to count all of the books on 1 shelf.
- If the librarian was the only person counting, they couldn’t parallelize their work, because one person can only count one shelf at a time. So your librarian will take 100 minutes to count all of the books.
- If you have a team of 100 people and they each count their shelves, then every single book is counted in 1 minute. (Now, it will take a little bit of time for the librarian to assign who gets what shelf. It’ll also take a little bit of time to add all of the numbers together from the 100 people. But those parts are relatively fast, so you’re still getting your whole library counted in, maybe, 2-3 minutes.)
Three minutes instead of 100 minutes—that’s way better! Again: GPUs can usually accomplish certain tasks much faster than a CPU can.
There are some cases when a GPU probably isn’t needed
Let’s say you only have one shelf of books. Taking the time for the librarian to assign 100 people to different parts of the shelf, counting, then adding them up probably isn’t worth it. It might be faster for the librarian to just count all of the books.
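The arithmetic behind the 100-shelf example and the one-shelf example can be written as a rough time model. This is only a sketch: the function name `counting_minutes` is made up, the 2-minute overhead for assigning shelves and adding up results is an assumed figure, and a shelf is treated as something only one person counts at a time.

```python
import math


def counting_minutes(shelves, counters, minutes_per_shelf=1.0, overhead_minutes=0.0):
    """Rough time to count a library.

    overhead_minutes covers assigning shelves and adding up results; it is
    an assumed figure, not something measured.
    """
    rounds = math.ceil(shelves / counters)  # each counter takes one shelf per round
    return overhead_minutes + rounds * minutes_per_shelf


librarian_alone = counting_minutes(shelves=100, counters=1)
team_of_100 = counting_minutes(shelves=100, counters=100, overhead_minutes=2.0)
one_shelf_alone = counting_minutes(shelves=1, counters=1)
one_shelf_team = counting_minutes(shelves=1, counters=100, overhead_minutes=2.0)
print(librarian_alone, team_of_100, one_shelf_alone, one_shelf_team)
```

With these assumed numbers the team wins easily on 100 shelves, but on a single shelf the coordination overhead makes the team slower than the librarian alone.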
If a data science job can’t be parallelized (split into small chunks where the small chunks can be done at the same time), then a GPU usually isn’t going to be helpful. Luckily for us, some very smart people have made a lot of common data science tasks parallelizable.
Let’s look at a simple math example: calculating the average of a set of numbers.
If you calculate the average with a CPU, it’s kind of like using your librarian. Your CPU has to add up all of the numbers, then divide by the sample size.
If you leverage a GPU to help calculate the average, it’s kind of like using a full team. Your CPU splits the numbers up into small chunks, then each of your GPU workers (called cores) sums their chunk of numbers. Then, your CPU (librarian) will coordinate combining those numbers back together into the average.
If you’re calculating the average of a set of, say, a billion numbers, it will probably be much faster for your CPU to split that billion into chunks and have separate GPU workers do the addition rather than your CPU doing all of it by itself.
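Here's a small Python sketch of the chunked average. The helper names (`chunked`, `partial_sum`, `chunked_mean`) and the chunk size are made up for illustration. The chunks are summed one after another on the CPU here, but each `partial_sum` call is independent of the others, which is the part a GPU could do all at once.

```python
def chunked(values, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_sum(chunk):
    """One worker's job: add up its own chunk."""
    return sum(chunk)


def chunked_mean(values, chunk_size):
    """Average via per-chunk sums, combined at the end by the coordinator."""
    partial_sums = [partial_sum(chunk) for chunk in chunked(values, chunk_size)]
    return sum(partial_sums) / len(values)


numbers = list(range(1, 101))  # the integers 1 through 100
mean = chunked_mean(numbers, chunk_size=10)
print(mean)
```

Note that the workers return sums, not averages. The division by the full sample size happens once at the end, so chunks of unequal size don't skew the result.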
Let’s look at a more complicated machine learning example: a random forest is basically a large number of decision trees. Let’s say you want to build a random forest with 100 decision trees.
If you build a random forest on a CPU, it’s kind of like using your librarian to do the entire counting on their own. Your CPU basically has to build a first tree, then build a second tree, then build a third tree, and so on.
If you leverage a GPU to help build a random forest, then it’s kind of like using a full team to count your books with the librarian coordinating everyone. One part of your GPU (called a core) will build the first tree. At the same time, another GPU core will build the second tree. This all happens simultaneously!
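As a toy illustration of trees being built independently, here's a standard-library Python sketch. Lots of assumptions: each "tree" is just a single threshold (a decision stump), the data is made up, the names `build_stump`, `build_forest`, and `predict` are hypothetical, and CPU threads stand in for GPU cores. Real GPU random forest libraries may divide the work quite differently.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Toy 1-D data: small values are class 0, large values are class 1.
DATA = [(x, 0) for x in range(0, 10)] + [(x, 1) for x in range(20, 30)]


def build_stump(seed):
    """Fit one tiny 'tree' (a single threshold) on a bootstrap sample."""
    rng = random.Random(seed)  # per-tree seed keeps trees independent and repeatable
    sample = [rng.choice(DATA) for _ in DATA]
    candidates = sorted({x for x, _ in sample})
    # Pick the threshold that misclassifies the fewest sampled points.
    return min(
        candidates,
        key=lambda t: sum((x > t) != bool(label) for x, label in sample),
    )


def build_forest(n_trees, workers=4):
    """Build every tree independently; no tree needs another tree's result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(build_stump, range(n_trees)))


def predict(forest, x):
    """Majority vote across all the trees."""
    votes = Counter(int(x > threshold) for threshold in forest)
    return votes.most_common(1)[0][0]


forest = build_forest(n_trees=100)
print(predict(forest, 3), predict(forest, 25))
```

The key point is in `build_forest`: tree number 2 never needs anything from tree number 1, so the order (or simultaneity) of building them doesn't change the forest.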
Here’s a good image from NVIDIA that helps to compare CPUs to GPUs.
Just like your librarian has to manage all sorts of things in the library (counting, organizing, staffing the front desk, replacing books), your CPU has a bunch of different jobs that it does. The green ALU boxes in the CPU image represent “arithmetic logic units.” These ALUs are used to do mathematical calculations. Your CPU can do some mathematical calculations on its own, just like your librarian can count books! But a lot of your CPU’s room is taken up by those other boxes, because your CPU is responsible for lots of other things, too. It’s not just for mathematical calculations.
Just like your team of counters are there to do one job (count books), your GPU is optimized to basically just do mathematical calculations. It’s way more powerful when it comes to doing math things.
So, in short:
CPUs have many jobs to do. CPUs can do mathematical calculations on their own.
GPUs are highly optimized to do mathematical calculations.
If you have a job that relies on math (like counting or averaging or building a machine learning model), then a GPU can probably do some of the math much faster than a CPU can. This is because we can use the discoveries of very smart people to parallelize (split big jobs into small chunks).
If you have a job that doesn’t rely on math or is very small, a GPU probably isn’t worth it.
Speaking of an analogy, here is a video about, quite literally, an analogy. Specifically, analog CPUs (as opposed to digital). This video is very interesting, very well presented, and gives a full history of CPU and GPU usage wrt AI, and why the next evolution could be analog computers. Well worth watching!!
Ah, I was hoping for a Robot 3 analogy, they are always fantastic 🙂 Thanks all who shared!
A more simplified and general comparison:
CPUs are designed to coordinate AND calculate a bunch of math - they have a bunch of routing set up and they're going to have drivers [or operating systems] built to make that pathing and organizing as easy as the simple calculations. Because they're designed to be a "brain" for a computer, they're built to do it ALL.
GPUs are designed to be specialized for, well, graphics, hence the name. To quickly render video and 3d graphics, you want a bunch of very simple calculations performed all at once - instead of having one "thing" [CPU core] calculating the color for a 1920x1080 display [a total of 2,073,600 pixels], maybe you have 1920 "things" [GPU cores] dedicated to doing one column of pixels each and all running in parallel.
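The pixel arithmetic above can be sketched in Python. The shading rule in `pixel_color` is completely made up (a simple gradient), and `render_column` is a hypothetical name for one worker's share of the frame. With 1920 workers on a 1920x1080 display, each worker's strip is taken here to be one column of 1080 pixels.

```python
WIDTH, HEIGHT = 1920, 1080
total_pixels = WIDTH * HEIGHT  # 2,073,600 pixels per frame


def pixel_color(x, y):
    """Hypothetical shading rule: a simple gradient as an (R, G, B) tuple."""
    return (x * 255 // (WIDTH - 1), y * 255 // (HEIGHT - 1), 128)


def render_column(x):
    """One worker's job: compute every pixel in column x, top to bottom."""
    return [pixel_color(x, y) for y in range(HEIGHT)]


# Each column could go to a different worker, since no pixel needs another's result.
first_column = render_column(0)
last_column = render_column(WIDTH - 1)
print(total_pixels, len(first_column), last_column[-1])
```

Only two columns are rendered here to keep it quick. The point is that `render_column(0)` and `render_column(1919)` share nothing, so all 1920 calls could run at the same time.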
"Split this Hex code for this pixel's color into a separate R, G, and B value and send it to the screen's pixel matrix" is a much simpler task than, say, the "convert this video file into a series of frames, combine them with the current display frame of this other application, be prepared to interrupt this task to catch and respond to keyboard/mouse input, and keep this background process running the whole time..." tasks that a CPU might be doing. Because of this, a GPU can be slower and more limited than a CPU while still being useful, and it might have unique methods to complete its calculations so it can be specialized for X purpose [3d rendering takes more flexibility than "display to screen"]. Maybe it only knows very simple conversions or can't keep track of what it used to be doing - "history" isn't always useful for displaying graphics, especially if there's a CPU and a buffer [RAM] keeping track of history for you.
Since CPUs want to be usable for a lot of different things, there tends to be a lot of Operating Systems/drivers to translate between the higher level code I might write and the machine's specific registers and routing. BUT since a GPU is made with the default assumption "this is going to make basic graphics data more scalable" they often have more specialized machine functionality, and drivers can be much more limited in many cases. It might be harder to find a translator that can tell the GPU how to do the very specific thing that would be helpful in a specific use case, vs the multiple helpful translators ready to explain to your CPU how to do what you need.
Wait, this analogy thing is fun. How about, if your CPU is a teacher, your GPU is a classroom full of elementary school students.
Sometimes it might be worth having the teacher explain to the class how to help with a group project… but it depends on the cost of the teacher having to figure out how to talk to each student in a way they'll understand and listen plus the energy the teacher now has to spend making sure they're doing what they're supposed to and getting the materials they need along the way. Meanwhile, your teacher came pre-trained and already knows how to do a bunch of more complicated tasks and organization!
If it's a project where a lot of unskilled but eager help can make things go faster, then it might be worth using a GPU. But before you can get the benefits, you need to make sure you know what languages each kid in the classroom speaks and what they already know how to do. Sometimes, it's just easier and more helpful to focus on making sure your teachers can do the tasks themselves before recruiting the kids.