I've been studying the way goroutines work under the hood and I'm confused about the memory usage benefits from goroutines.
From what I've read, one of the most-cited advantages of goroutines is that each one starts with only ~2 KB of stack, which grows dynamically with usage, while kernel threads are allocated a few MB of stack by default (Linux, Windows), depending on the platform.
But from what I can understand, the few megabytes allocated for a kernel thread are allocated in virtual memory, and initially only two pages are committed to physical memory (one for the stack and one guard page), which would total 8 KB on a system with 4 KB pages. The OS then commits more stack pages on demand, up to the reserved virtual size. This seems to be true on both Linux and Windows.
Shouldn't this give only about a 4x reduction in memory usage for goroutines compared to kernel threads? (Which is a pretty big reduction, but not of the order often advertised in online articles.)
I also understand that goroutines avoid some of the context-switching overhead of kernel threads, since they avoid some user-space-to-kernel transitions. But in this article, Ron Pressler argues that most of the scalability benefit of user-level threads comes from being able to do more things concurrently (due to lower memory usage), rather than from eliminating task-switching overhead.
So am I correct in believing that there would be only a ~4x improvement in scalability from using goroutines? Are there other limits in play that reduce the scalability of kernel threads?
EDIT: My knowledge of operating-systems concepts is quite rusty, so there might be incorrect information in my question, particularly with regard to how the physical memory of threads is allocated. But I believe the question is still valid, since the idea that the OS does not immediately allocate a couple of megabytes of physical memory seems to be true.
EDIT: If anyone is still interested: after researching a little more, it seems the advantages of goroutines come down to having finer granularity and being able to shrink the stack.
On this HN thread, someone argued the same thing as me:
...; Take stack size for example: assuming the kernel stack is 10kB, if your thread itself uses 10kB of stack, you've cut the theoretical memory advantage of M:N down from the cited 1000x to a mere 2x…
To which Ron Pressler answered:
Ah, except that's not so easy to do. It's very hard to have "tight" stacks that are managed by the kernel for the reasons I mentioned here: (link to answer which I copy pasted below)
Once a page is committed, it cannot be uncommitted until the thread dies, because the OS can't be sure how much of the stack is actually used. It cannot even assume that only addresses above sp are used. Also, the granularity is that of a page, which could be significantly larger than a whole stack of some small, "shallow" thread, and we want lots of small threads.
This seems to be in line with what Joe Armstrong says around 13 minutes into this presentation about Erlang:
Its threads aren't in the programming language; threads are something in the operating system, and they inherit all the problems that they have in the operating system. One of the problems is the granularity of the memory-management system. The memory management in the operating system protects whole pages of memory, so the smallest size a thread can be is the smallest size of a page. That's actually too big.
Hello,
I will try to explain what I mean. I am learning about goroutines, and I understand that they are like Go's "special" threads that run on real OS threads. They are very lightweight, so in my mind, I imagine them as buckets in asynchronous programming. Similar to Node.js, which follows a single-threaded asynchronous request/response model.
My question is: what are the differences in handling requests between goroutines and a single-threaded asynchronous approach like Node.js, when both run on a single thread?
Which, in theory, will be faster?
Goroutines are not the same as OS threads. Threads in C++ are actual OS threads. The answer can be very complex, but in short:
- Go has its own runtime
- Go can run multiple goroutines on one OS thread; this multiplexing is managed by the Go scheduler.
- The Go scheduler is work-stealing. You can find more details here: Scheduling In Go : Part II - Go Scheduler
- Goroutines are lightweight. They have only a few kB of stack (2–8 kB, depending on the Go version; the stack is also dynamic and can be grown or shrunk by the runtime). An OS thread's stack is usually several MB.
- Only three registers are involved (saved/restored) in switching a goroutine: the program counter, the stack pointer, and DX. This is why a goroutine switch is very fast compared to an OS thread switch, and why you can spawn far more goroutines than OS threads.
What I don't understand is how Go handles this situation.
In Go there's a layer, the runtime scheduler, between the actual system threads and your code. This scheduler keeps a pool of real OS threads, which it uses to execute your goroutines.
Why is C++ not set up to handle the situation in the same way?
Because C++ threads are OS threads. You'll have to create a thread pool if you need one, or use coroutines, std::async, or execution policies.