Python concurrency runs on only one thread, because of the global interpreter lock and how asyncio works. No idea on Go; assume it is multithreaded. Answer from nekokattt on reddit.com
🌐
Reddit
reddit.com › r/python › how different is python concurrency vs. golang concurrency?
r/Python on Reddit: How different is python concurrency vs. Golang concurrency?
February 18, 2022 - Well then we just disagree. Having used both, some things are easier in Python vs Go, like timing out a coroutine vs a goroutine (1 line vs 10+).
🌐
Quora
quora.com › Are-Python-coroutines-and-Go-goroutines-the-same
Are Python coroutines and Go goroutines the same? - Quora
The Python coroutines operate within an async/await programming model. Go goroutines use a synchronous channel based programming model. Python coroutines are still limited in concurrency by the Global interpreter lock.
🌐
Reddit
reddit.com › r/golang › goroutine
r/golang on Reddit: Goroutine
June 29, 2023 -

"Goroutines are unique to Go (though some other languages have a concurrency primitive that is similar). They’re not OS threads, and they’re not exactly green threads—threads that are managed by a language’s runtime—they’re a higher level of abstraction known as coroutines."

Hey, I was able to understand this statement to a certain level but can someone make it a bit more clear. I can't wrap my head around what exactly a coroutine is.

I am referring to the book 'Concurrency in Go' by Katherine Cox-Buday. Additionally, if you guys have any suggestions for references, do list them down.

Top answer
1 of 5
36
OS threads are relatively heavyweight. They are great for computation-type workloads, but IO-type workloads, where threads do limited work between waiting for IO (typical in server workloads), are less ideal for OS threads. They also have relatively expensive setup and teardown costs, as well as at least one page of dedicated memory per thread.

Green threads were a technique early versions of the JVM supported, where N Java threads could be multiplexed onto a single OS thread. Despite what Wikipedia says, they are not easy to implement - you need to make sure NO blocking calls are made, or all other threads hang (among other issues, especially at the time). I'm not even sure any shipped version of Java actually used green threads by default.

Go multiplexes N goroutines onto M (where M <= N) OS threads. This is apparently a reasonable approach, as it allows a large number of goroutines, and in theory an occasional blocking call doesn't block everything.

Java gets around some of the cost of using OS threads (green threads were removed pretty early) by using thread pools: threads are not terminated right away and can be reused once the Java thread is logically finished. This reduces the setup/teardown costs, but not necessarily the memory footprint costs.
2 of 5
9
I think threads (including green threads) are simply parallel threads of execution that preemptively switch, usually on a timer. Coroutines perform cooperative multitasking, where they return control to the scheduler at opportune times, such as when they block. So I think the big distinction is that threads use preemptive task switching while coroutines use cooperative switching.
Top answer
1 of 7
88

IMO, a coroutine implies support for explicit means of transferring control to another coroutine. That is, the programmer writes a coroutine so that they decide when it should suspend execution and pass control to another coroutine (either by calling it or by returning/exiting, usually called yielding).

Go's "goroutines" are another thing: they implicitly surrender control at certain indeterminate points1 which happen when the goroutine is about to sleep on some (external) resource like I/O completion, channel send, etc. This approach, combined with sharing state via channels, enables the programmer to write the program logic as a set of sequential light-weight processes, which removes the spaghetti-code problem common to both coroutine- and event-based approaches.

Regarding the implementation, I think they're quite similar to the (unfortunately not too well-known) "State Threads" library, just quite a bit lower-level (as Go doesn't rely on libc or things like this and talks directly to the OS kernel) — you could read the introductory paper for the ST library, where the concept is quite well explained.


Update from 2023-08-25: Russ Cox has written a good essay on why a standard coroutine package for Go would be useful, and what it could look like.


1 In fact, these points are less determinate than those of coroutines but more determinate than with true OS threads under preemptive multitasking, where each thread might be suspended by the kernel at any given point in time and in the flow of the thread's control.
Update on 2021-05-28: actually, since Go 1.14, goroutines are scheduled (almost) preemptively. It should be noted, though, that this is still not the hard-core preemption a typical kernel applies to the threads it manages, but it's quite a bit closer than before; at least it's now impossible for a goroutine to become non-preemptible by entering a busy loop.

2 of 7
68

Not quite. The Go FAQ section Why goroutines instead of threads? explains:

Goroutines are part of making concurrency easy to use. The idea, which has been around for a while, is to multiplex independently executing functions—coroutines—onto a set of threads. When a coroutine blocks, such as by calling a blocking system call, the run-time automatically moves other coroutines on the same operating system thread to a different, runnable thread so they won't be blocked. The programmer sees none of this, which is the point. The result, which we call goroutines, can be very cheap: they have little overhead beyond the memory for the stack, which is just a few kilobytes.

To make the stacks small, Go's run-time uses resizable, bounded stacks. A newly minted goroutine is given a few kilobytes, which is almost always enough. When it isn't, the run-time grows (and shrinks) the memory for storing the stack automatically, allowing many goroutines to live in a modest amount of memory. The CPU overhead averages about three cheap instructions per function call. It is practical to create hundreds of thousands of goroutines in the same address space. If goroutines were just threads, system resources would run out at a much smaller number.

🌐
Reddit
reddit.com › r/golang › structured concurrency & go
r/golang on Reddit: Structured concurrency & Go
February 21, 2026 -

I work at one of those large companies where migration work never stops. We recently acquired a few other companies. To coalesce the platforms of multiple companies, we're rewriting a big chunk of our codebase in Go. The new platform itself is also being built from scratch in Go.

But the catch is we haven't historically been a Go shop. A lot of folks are coming from Python and Kotlin backends. So in our knowledge-sharing channel, we constantly see feature comparisons across these languages.

One thing that came up recently is how hard structured concurrency feels in Go. go func() is unstructured by default unless you wire it up with sync primitives like WaitGroup. A bunch of people also pointed out how Python’s TaskGroup or Kotlin’s coroutineScope make cancellation feel trivial. In Go, cancellation semantics require explicit context checking and manual bailouts.

We had some interesting internal discussions around this that I think would be valuable for others going through a similar journey.

So I summarized some of the key points that came up and added a few examples. I’m curious how others approach structured concurrency in Go. How do you avoid the usual leaks that happen with manual plumbing?

https://rednafi.com/go/structured-concurrency/

🌐
The Content Authority
thecontentauthority.com › home › grammar › word usage › goroutine vs coroutine: unraveling commonly confused terms
Goroutine vs Coroutine: Unraveling Commonly Confused Terms
June 27, 2023 - Goroutines are an efficient way to handle I/O-bound tasks. A coroutine is a cooperative multitasking technique. Coroutines are used in many programming languages, including Python and Lua.
🌐
O'Reilly
oreilly.com › library › view › go-building-web › 9781787123496 › ch19s03.html
Understanding goroutines versus coroutines - Go: Building Web Applications [Book]
August 31, 2016 - A coroutine is a cooperative task control mechanism, but in its most simplistic sense, a coroutine is not concurrent. While coroutines and goroutines are utilized in similar ways, Go's focus on concurrency provides a lot more than just state ...
Authors   Nathan KozyraMat Ryer
Published   2016
Pages   665
🌐
Hacker News
news.ycombinator.com › item
Creating millions of coroutines or "goroutines" isn't really interesting by itse... | Hacker News
October 10, 2018 - 1. is Go scheduler better than Linux scheduler if you have a thousand concurrent goroutines or threads · 2. is really creating goroutines that much faster than creating a native thread
Top answer
1 of 2
12

I think I know part of the answer. I tried to summarize my understanding of the differences, in order of importance, between asyncio tasks and goroutines:

1) Unlike under asyncio, one rarely needs to worry that their goroutine will block for too long. OTOH, memory sharing across goroutines is akin to memory sharing across threads rather than asyncio tasks since goroutine execution order guarantees are much weaker (even if the hardware has only a single core).

asyncio will only switch context on explicit await, yield and certain event loop methods, while Go runtime may switch on far more subtle triggers (such as certain function calls). So asyncio is perfectly cooperative, while goroutines are only mostly cooperative (and the roadmap suggests they will become even less cooperative over time).

A really tight loop (such as with numeric computation) could still block the Go runtime (well, the thread it's running on). If it happens, it's going to have less of an impact than in Python - unless it occurs in multiple threads.

2) Goroutines have off-the-shelf support for parallel computation, which would require a more sophisticated approach under asyncio.

Go runtime can run threads in parallel (if multiple cores are available), and so it's somewhat similar to running multiple asyncio event loops in a thread pool under a GIL-less python runtime, with a language-aware load balancer in front.

3) Go runtime will automatically handle blocking syscalls in a separate thread; this needs to be done explicitly under asyncio (e.g., using run_in_executor).

That said, in terms of memory cost, goroutines are very much like asyncio tasks rather than threads.

2 of 2
1

I suppose you could think of it working that way underneath, sure. It's not really accurate, but, close enough.

But there is a big difference: in Go you can write straight line code, and all the I/O blocking is handled for you automatically. You can call Read, then Write, then Read, in simple straight line code. With Python asyncio, as I understand it, you need to queue up a function to handle the reads, rather than just calling Read.

Find elsewhere
🌐
Python.org
discuss.python.org › python help
Have there some like Goroutines in Python 3.13 or maybe 3.14 - Python Help - Discussions on Python.org
May 15, 2024 - I’m referring to A Tour of Go and I’m not saying that we must use the go keyword but should be fine to have something to manage the multithreading automatically, but… which should be the approach?
🌐
Hacker News
news.ycombinator.com › item
Everything *can* be a coroutine (goroutine) in Go, very simple. Just add `go` be... | Hacker News
February 15, 2024 - It's literally as simple as it sounds: · https://gobyexample.com/goroutines
🌐
Reddit
reddit.com › r/golang › go vs python async?
r/golang on Reddit: Go vs Python async?
October 29, 2017 -

For anyone who's tried both Go and Python async (ex uvloop + sanic, apistar, etc) for their webapp, what are the pros and cons of working in each language?

🌐
DEV Community
dev.to › leapcell › concurrency-in-go-vs-rustc-goroutines-vs-coroutines-27f5
Concurrency in Go vs Rust/C++: Goroutines vs Coroutines - DEV Community
March 27, 2025 - Although the names and implementation methods of coroutines vary among different languages, essentially, coroutines are mainly divided into two categories: stackful coroutines and stackless coroutines. The former is represented by goroutines, and the latter is typified by async/await.
🌐
Vipul Sharma's Blog
vipul.xyz › 2017 › 09 › performance-analysis-goroutine-pythons-coroutine.html
Performance Analysis: Goroutine & Python’s Coroutine
September 3, 2017 - I made 1000 HTTP requests using Goroutines and Python’s Coroutines · Used Go 1.6.2 and Python 3.6 · Implemented in Go using net/http package · Implemented in Python using aiohttp, requests and urllib3 libraries · Ran it over $10 DigitalOcean droplet · Scroll to bottom of this post to see results.
🌐
Reddit
reddit.com › r/golang › why a goroutine is not a thread but a green thread? in the end a green thread is not just a thread?
r/golang on Reddit: Why a goroutine is not a thread but a green thread? In the end a green thread is not just a thread?
November 16, 2022 -

I mean, I get that when we spawn a go routine the runtime generates a green thread which is a lightweight thread... but doesn't this need ultimately a thread to run?

So if a green thread depends on a thread, a goroutine is not a thread but there's no big difference between of being a thread and depending on a thread.

What I'm getting wrong here? What's the actual benefit besides not depending on a specific OS?

Top answer
1 of 12
34
TL;DR: OS-level threads are big and expensive compared to application-level threads.

One: Context switching is much more expensive, because OS threads use a full hardware context switch. A hardware context switch requires switching out all the registers, including hidden registers like your virtual memory maps. Threads don't pay that cost as badly as processes, because they often-but-not-always use the same virtual memory maps, but an intra-process context switch is still over a thousand cycles on Linux.

Two: Scheduling is harder, because the OS doesn't have as much information about what is happening inside the program as an application-level scheduler can have. In short, the only way to tell the kernel "hey, I'm waiting for something, wake me up when it's ready" or "hey, this thing is ready, please wake everyone up and let them know" is to use a system call. If you don't expect to wait for very long, you can use atomics in a loop, a strategy called "busy-waiting", but then you're actually using the CPU while you wait. For longer waits, you should tell the kernel about it; on Linux, the primitive for this is called a "futex", which is what sync.Mutex uses under the hood.

The Go scheduler, on the other hand, knows exactly what's waiting, because goroutines all share the same privileges, and therefore can share the same virtual memory space, and therefore can just write the information about what they're blocked on directly to shared memory, without the overhead of a system call.

System calls themselves require two context switches: one from thread to OS when you make the call, and the other from OS to thread when the call returns. A blocking system call just... doesn't return until it's time to return. Instead, the physical CPU goes off to do something else, and the "something else" is chosen by the kernel scheduler. It might be another process, even a process owned by another user, and the kernel might run more than one something else before getting back to you.
Each of those something-elses will get an entire "timeslice", which on Linux these days can vary in size, but it's often in the realm of microseconds to milliseconds, i.e. thousands to millions of cycles. When you make a blocking system call, i.e. one where the kernel starts doing some work and then comes back to your thread later, you are guaranteed to incur this cost. This especially includes calls to "futex".

The Go scheduler, on the other hand, will keep using the OS thread for as long as the time slice lasts; you don't give up the rest of your current time slice when Go switches between goroutines, and instead it's donated to the goroutine that you wake up. In other words, waking up a sleeping goroutine is just an unconditional jump/branch/goto. This means that there's a huge, huge speed difference between this:

    static int global_value = 0;
    static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t global_cond = PTHREAD_COND_INITIALIZER;

    void* other_main(void* unused) {
        /* may call futex and block, or may busy-wait, based on libpthread's best guess */
        /* whether or not this blocks depends on who wins the race */
        /* if it blocks, then we give up other_main's current timeslice */
        pthread_mutex_lock(&global_mutex);
        /* we just got a new timeslice, assuming that we blocked */
        global_value++;
        /* these will always call futex, but they're non-blocking, so we keep our timeslice */
        pthread_cond_signal(&global_cond);
        pthread_mutex_unlock(&global_mutex);
        /* thread and timeslice both end here */
        /* note that our remaining timeslice is *not* donated to main */
        return NULL;
    }

    int main() {
        pthread_t other;
        pthread_create(&other, NULL, other_main, NULL);
        /* may call futex and block, or may busy-wait, based on libpthread's best guess */
        /* whether or not this blocks depends on who wins the race */
        /* if it blocks, then we give up main's current timeslice */
        pthread_mutex_lock(&global_mutex);
        /* we just got a new timeslice, assuming that we blocked */
        while (global_value == 0) {
            /* this will pretty much always call futex and block */
            pthread_cond_wait(&global_cond, &global_mutex);
            /* we just got a new timeslice */
        }
        /* this will always call futex, but it's non-blocking, so we always keep the
           timeslice that we got from pthread_mutex_lock/pthread_cond_wait returning */
        pthread_mutex_unlock(&global_mutex);
        return global_value;
    }

and this:

    func main() {
        value := 0
        ch := make(chan struct{})
        go func() {
            value++
            close(ch)
        }()
        <-ch
        os.Exit(value)
    }

Unless there are literally no other threads running, not even kernel threads, the second can run thousands to millions of times faster than the first, because the first will have to wait for new timeslices whereas the second can do 100% of the work in a single timeslice. Go also knows that <-ch doesn't need the mutex for very long, because it's only going to check to see if the channel is closed, so theoretically the Go runtime could optimize that implicit mutex into a busy-wait loop, without ever calling "futex" on either end of the channel. I don't know if it actually does that, though. I suspect it does.
2 of 12
20
Goroutines are run on threads, but let's say you are running 1k goroutines: they don't need 1k threads, just about as many OS threads as you have CPU threads (GOMAXPROCS).