I've been studying the way goroutines work under the hood and I'm confused about the memory usage benefits from goroutines.
From what I've read, one of the most-cited advantages of goroutines is that each one starts with only ~2 KB of stack, which grows dynamically with usage, while kernel threads are allocated a few MB of stack by default (Linux, Windows), depending on the platform.
But from what I can understand, the few megabytes allocated for a kernel thread are allocated in virtual memory, and initially only two pages are committed to physical memory (one for the stack and one guard page), which would total 8 KB on a system with 4 KB pages. The OS then commits more stack pages on demand, up to the reserved virtual size. This seems to be true on both Linux and Windows.
Shouldn't this give only about a 4x reduction in memory usage for goroutines compared to kernel threads? (Which is a pretty big reduction, but not of the order often advertised in online articles.)
I also understand that goroutines avoid some of the context-switching overhead of kernel threads, since they avoid some user-space-to-kernel transitions. But in this article, Ron Pressler argues that most of the scalability benefit of user-level threads comes from being able to do more things concurrently (due to lower memory usage), rather than from eliminating task-switching overhead.
So am I correct in believing that there would be only a ~4x improvement in scalability from using goroutines? Are there other limits in play that reduce the scalability of kernel threads?
EDIT: My knowledge of operating-systems concepts is quite rusty, so there might be incorrect information in my question, particularly with regard to how the physical memory of threads is allocated. But I believe the question is still valid, since the idea that the OS does not immediately allocate a couple of megabytes of physical memory seems to be true.
EDIT: If anyone is still interested: after researching a little more, it seems the advantages of goroutines come down to having finer granularity and being able to shrink the stack.
On this HN thread, someone argued the same thing as me:
...; Take stack size for example: assuming the kernel stack is 10kB, if your thread itself uses 10kB of stack, you've cut the theoretical memory advantage of M:N down from the cited 1000x to a mere 2x…
To which Ron Pressler answered:
Ah, except that's not so easy to do. It's very hard to have "tight" stacks that are managed by the kernel for the reasons I mentioned here: (link to answer which I copy pasted below)
Once a page is committed, it cannot be uncommitted until the thread dies, because the OS can't be sure how much of the stack is actually used. It cannot even assume that only addresses above sp are used. Also, the granularity is that of a page, which could be significantly larger than a whole stack of some small, "shallow" thread, and we want lots of small threads.
This seems to be in line with what Joe Armstrong says around 13 minutes into this presentation about Erlang:
Its threads aren't in the programming language; threads are something in the operating system, and they inherit all the problems that they have in the operating system. One of the problems is the granularity of the memory-management system. The memory management in the operating system protects whole pages of memory, so the smallest size a thread can be is the smallest size of a page. That's actually too big.
Hello,
I will try to explain what I mean. I am learning about goroutines, and I understand that they are like Go's "special" threads that run on real OS threads. They are very lightweight, so in my mind, I imagine them as buckets in asynchronous programming. Similar to Node.js, which follows a single-threaded asynchronous request/response model.
My question is: what are the differences in handling requests between goroutines and a single-threaded asynchronous approach like Node.js, when both run on a single thread?
Which, in theory, will be faster?
Goroutines are not the same as OS threads. Threads in C++ are actual OS threads. The answer can be very complex, but in short:
- Go has its own runtime
- Go can run multiple goroutines on one OS thread; this multiplexing is managed by the Go scheduler.
- The Go scheduler is work-stealing. You can find more details here: Scheduling In Go : Part II - Go Scheduler
- Goroutines are lightweight. They have only a few kB of stack (2–8 kB, depending on the Go version; the stack is also dynamic and can be grown or shrunk by the runtime). An OS thread's stack is usually several MB.
- Only three registers are involved (saved/restored) in switching a goroutine: the program counter, the stack pointer, and DX. This is why a goroutine switch is very fast compared to an OS thread switch, and why you can spawn far more goroutines than OS threads.
What I don't understand is how Go handles this situation.
In Go there's a layer, the runtime scheduler, between the actual system threads and your code. This scheduler keeps a pool of real OS threads, which it uses to execute your goroutines.
Why is C++ not set up to handle the situation in the same way?
Because C++ threads are OS threads. You'll have to create a thread pool if you need one, or use coroutines, std::async, or execution policies.