A thread is a natural OS object, but manipulating threads is expensive: creating, switching, and destroying them requires a transition into the kernel and back, saving and restoring the stack, and so on. Many servers used a thread-per-connection model, but it is unrealistic to keep a large number of threads alive without running out of resources, and synchronizing them is a hard problem in its own right.
So a new concept emerged: the coroutine (or coprogram). Coroutines can be thought of as segments of an execution path between synchronization points: input/output, send/receive, and so on. They are very lightweight and can be scheduled far more efficiently.
In short: "threads, but better".
Answer from Eugene Lisitsky on Stack Overflow: multithreading - the difference between goroutine and thread - Stack Overflow
concurrency - How do goroutines work? (or: goroutines and OS threads relation) - Stack Overflow
Is it worth using Goroutines (or multi-threading in general) when nothing is blocking?
How can goroutines be more scalable than kernel threads, if the kernel threads only use a few pages of physical memory?
If a goroutine is blocking, the runtime will start a new OS thread to handle the other goroutines until the blocking one stops blocking.
Reference: https://groups.google.com/forum/#!topic/golang-nuts/2IdA34yR8gQ
Ok, so here's what I've learned: When you're doing raw syscalls, Go indeed creates a thread per blocking goroutine. For example, consider the following code:
package main

import (
	"fmt"
	"syscall"
)

func block(c chan bool) {
	fmt.Println("block() enter")
	buf := make([]byte, 1024)
	_, _ = syscall.Read(0, buf) // block on an unbuffered read from STDIN
	fmt.Println("block() exit")
	c <- true // tell main() we're done
}

func main() {
	c := make(chan bool)
	for i := 0; i < 1000; i++ {
		go block(c)
	}
	for i := 0; i < 1000; i++ {
		_ = <-c
	}
}
When running it, Ubuntu 12.04 reported 1004 threads for that process.
On the other hand, when utilizing Go's HTTP server and opening 1000 sockets to it, only 4 operating system threads were created:
package main

import (
	"fmt"
	"net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "Hi there, I love %s!", r.URL.Path[1:])
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}
So, it's a mix between an IOLoop and a thread per blocking system call.
This is more of a computer science question, but for a program that has no blocking operations (e.g. file or network IO), and just needs to churn through some data, is it worth parallelising this? If the work needs to be done either way, does adding Goroutines make it any faster?
Sorry if this is a silly question, I've always seen multithreading as a way to handle totally different lines of execution, rather than just approaching the same line of execution with multiple threads.
I've been studying the way goroutines work under the hood and I'm confused about the memory usage benefits from goroutines.
From what I've read, one of the most cited advantages of goroutines is that it only allocates 2kb of memory of stack per goroutine, which grows dynamically according to usage, while kernel threads allocate a few MB of memory (Linux, Windows) by default, depending on the platform.
But from what I can understand, the few megabytes allocated by the kernel threads are allocated in virtual memory and initially only two pages are allocated in physical memory (one for the stack and one guard page), which would total 8kb in a system with 4kb pages. The OS then allocates more stack pages on demand until it reaches the total allocated virtual memory. This seems to be true both on Linux and Windows.
Shouldn't this give only a 4x reduction in memory usage of goroutines when compared to kernel threads? (which is a pretty big reduction, but not in the order that's often advertised in online articles)
I also understand that goroutines avoid some overheads from context switching that are necessary in kernel threads, since they avoid some userspace to kernel transitions. But in this article, Ron Pressler argues that the majority of the scalability benefits from user level threads comes from being able to do more things concurrently (due to less memory usage), rather than the elimination of task switching overheads.
So am I correct in believing that there would be only a ~4x improvement in scalability by using goroutines? Are there other limits in play that reduce the scalability of kernel threads?
EDIT: My knowledge of operating systems concepts is quite rusty so there might be incorrect information in my question, particularly with regards to how the physical memory of threads are allocated. But I believe the question is still valid since the idea of the OS not immediately allocating a couple megabytes of physical memory seems to be true.
EDIT: If anyone is still interested. After researching a little bit more, it seems like the advantages of goroutines come down to having better granularity and being able to shrink the stack size.
On this HN thread, someone argued the same thing as me:
...; Take stack size for example: assuming the kernel stack is 10kB, if your thread itself uses 10kB of stack, you've cut the theoretical memory advantage of M:N down from the cited 1000x to a mere 2x…
To which Ron Pressler answered:
Ah, except that's not so easy to do. It's very hard to have "tight" stacks that are managed by the kernel for the reasons I mentioned here: (link to answer which I copy pasted below)
Once a page is committed, it cannot be uncommitted until the thread dies, because the OS can't be sure how much of the stack is actually used. It cannot even assume that only addresses above sp are used. Also, the granularity is that of a page, which could be significantly larger than a whole stack of some small, "shallow" thread, and we want lots of small threads.
This seems to be in line with what Joe Armstrong says around 13 minutes into this presentation about Erlang.
Threads aren't in the programming language; threads are something in the operating system, and they inherit all the problems that they have in the operating system. One of the problems is the granularity of the memory-management system. The memory management in the operating system protects whole pages of memory, so the smallest size that a thread can be is the smallest size of a page. That's actually too big.