It's good. Smart, insightful, and creative. My only complaint is that it's too direct, but that can be solved with prompting.
(Answer from NealAngelo on reddit.com)
r/accelerate on Reddit: People are seriously downplaying the performance of Grok 3
February 18, 2025

I know we all have ill feelings about Elon, but can we seriously not take one second to validate its performance objectively?

People are like "Well, it is still worse than o3," but we do not have access to o3 yet, it uses insane amounts of compute, and Grok's pre-training only stopped a month ago, so there is still plenty of room to train the thinking models to exceed o3. Then there is "Well, it uses 10-15x more compute and is barely an improvement, so it is actually not impressive at all." This is untrue for three reasons.
Firstly, Grok 3 is definitely a big step up from Grok 2.
Secondly, scaling has always been very compute-intensive; there is a reason intelligence was not a winning evolutionary trait for a long time, and in a sense still isn't: it is expensive. If we could predictably get performance improvements like this for every 10-15x scaling in compute, we would have superintelligence in no time, especially now that three scaling paradigms stack on top of each other: pre-training, post-training and RL, and inference-time compute.
Thirdly, the LLaMA paper reported 419 component failures over 54 days of training on 16,000 H100s, while the small xAI team is training on roughly 100,000-200,000 H100s for much longer (rough arithmetic in the sketch below). This is actually quite an achievement.
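As a quick sanity check on that third point, here is my own back-of-the-envelope sketch: it uses only the LLaMA figures cited above and a naive assumption that failures scale linearly with GPU count, so treat it as a ballpark, not a claim about xAI's actual cluster.

```python
# Naive sketch: extrapolate the LLaMA paper's failure rate to a bigger cluster.
# Assumes failures scale linearly with GPU count (a simplification).

llama_failures = 419        # component failures reported over the run
llama_days = 54             # days of training
llama_gpus = 16_000         # H100s used

failures_per_gpu_day = llama_failures / (llama_gpus * llama_days)

for cluster_gpus in (100_000, 200_000):
    per_day = failures_per_gpu_day * cluster_gpus
    print(f"{cluster_gpus:,} GPUs -> roughly {per_day:.0f} component failures per day")

# ~48/day at 100K GPUs and ~97/day at 200K GPUs, versus ~8/day in the LLaMA run,
# which is why keeping a cluster that size training for months is hard.
```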

Then people are also like "Well, GPT-4.5 will easily destroy this any moment now". Maybe, but I would not be so sure. The base Grok 3 performance is honestly ludicrous and people are seriously downplaying it.

When Grok 3 is compared to other base models, it is way ahead of the pack. People have to remember the difference between the old and new Claude 3.5 Sonnet was only 5 points on GPQA, and this is 10 points ahead of Claude 3.5 Sonnet (New). You also have to consider that the practical ceiling of GPQA Diamond is around 80-85 percent, so a non-thinking model is getting close to saturation. Then there is Gemini 2.0 Pro: Google released it just recently, and they are seriously struggling to get any increase in frontier performance out of base models. Then Grok 3 just comes along and pushes the frontier ahead by many points.

I feel like part of why the insane performance of Grok 3 is not appreciated more is thinking models. Before thinking models, performance increases like this would have been absolutely astonishing; now everybody is just meh. I also would not count out the Grok 3 thinking model getting ahead of o3, given its great performance gains while still being in really early development.

The Grok 3 mini base model is approximately on par with the other leading base models, and you can see its reasoning version actually beating Grok 3; more importantly, its performance is not too far off o3's. o3 still has a couple of months until release, and in the meantime we can definitely expect Grok 3 reasoning to improve a fair bit, possibly even beating it.

Maybe I'm just overestimating its performance, but I remember when I tried the new Sonnet 3.5: even though a lot of its performance gains were modest, it really made a difference, and it was (and is) really good. Grok 3 is an even more substantial jump than that, and none of the other labs have produced such a strong base model; Google in particular is struggling with further base-model gains. I honestly think this is a pretty big achievement.

Elon is a piece of shit, but I thought this at least deserved some recognition; not all people on the xAI team are necessarily bad people, even though it would be better if they moved to other companies. Nevertheless, this should at least push the other labs to release their frontier capabilities, so it is going to get really interesting!

r/OpenAI on Reddit: Grok 3 & Grok 3 THINK Tested: Initial Impressions
February 19, 2025

I tested both Grok 3 and Grok 3 THINK on coding, math, reasoning and common sense. Here are a few early observations:

- The non-reasoning model codes better than the thinking model

- The reasoning model is very fast; it looked slightly faster than Gemini 2.0 Flash Thinking, which is itself quite fast

- Grok 3 THINK is very smart and approaches problems like DeepSeek R1 does, even uses "Wait, but..."

- G3-Think doesn't seem to load-balance its effort; like R1, it thinks unnecessarily long at times on easy questions

- Grok 3 didn't seem significantly better than existing top models like Claude 3.5 Sonnet or o3-mini, though we'll finalize testing after API access

- G3-Think is not deterministic; it failed 2 out of 3 attempts at a hard coding problem (the Exercism REST API challenge), with each attempt producing different results (a rough way to probe this is sketched after this list):

> Either it has a higher-than-normal temperature setting,

> the "daily improvements" Elon Musk mentioned introduce regressions,

> or it is load balancing across different model versions

> Coding Challenge GitHub repo: https://github.com/exercism/python/blob/main/exercises/practice/rest-api
> Coding Challenge: https://exercism.org/tracks/python/exercises/rest-api

- For those who just want to see the entire test suite: https://youtu.be/hN9kkyOhRX0
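To put a number on the non-determinism point above, here is a minimal sketch of the kind of check I mean: send the identical prompt a few times at temperature 0 and diff the outputs. The endpoint, model id, and environment variable below are assumptions (xAI's API is OpenAI-compatible, but double-check their docs for the actual values); the logic is the point, not the exact identifiers.

```python
# Rough determinism probe (assumed endpoint/model id -- verify against xAI docs).
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],   # assumed env var name
    base_url="https://api.x.ai/v1",      # assumed OpenAI-compatible endpoint
)

PROMPT = "Solve the Exercism 'rest-api' Python exercise. Return only the code."

outputs = []
for _ in range(3):
    resp = client.chat.completions.create(
        model="grok-3",                  # assumed model id
        temperature=0,                   # greedy decoding, if the server honors it
        messages=[{"role": "user", "content": PROMPT}],
    )
    outputs.append(resp.choices[0].message.content)

# Identical completions at temperature 0 suggest sampling noise isn't the culprit;
# divergent ones point at server-side settings, different model versions behind a
# load balancer, or nondeterministic kernels.
print("all three identical:", len(set(outputs)) == 1)
```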

What are your initial impressions of Grok 3?

r/OpenAI on Reddit: Grok 3 isn't the "best in the world" — but how xAI built it so fast is wild
April 20, 2025

When Grok 3 launched, Elon hyped it up but didn't give us definitive proof that it was better than the other models. Fast forward two months: xAI has opened up its API, so we can finally see how Grok truly performs.

Independent tests show Grok 3 is a strong competitor. It definitely belongs among the top models, but it's not the champion Musk suggested it would be. Plus, in these two months we've seen Gemini 2.5, Claude 3.7, and multiple new GPT models arrive.

But the real story behind Grok is how fast xAI executes:

In about six months, a company less than two years old built one of the world's most advanced data centers, equipped with 200,000 liquid-cooled Nvidia H100 GPUs.

Using this setup, they trained a model ten times bigger than any of their previous models.

So, while Grok 3 itself isn't groundbreaking in terms of performance, the speed at which xAI scaled up is astonishing. By combining engineering skill with a massive financial push, they've earned a spot alongside OpenAI, Google, and Anthropic.

See more details and thoughts in my full analysis here.

I'd really love your thoughts on this—I'm a new author, and your feedback would mean a lot!

r/OpenAI on Reddit: How is Grok 3 the smartest AI on earth? Simply, it's not, but it is really good, if not on the level of o3
February 18, 2025 - As far as a quick ~2-hour vibe check this morning goes, Grok 3 + Thinking feels somewhere around the state-of-the-art territory of OpenAI's strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking.
r/grok on Reddit: Crap, Grok is the best AI right now isn't it?
February 20, 2025

I've been using ChatGPT nearly daily for the last year or so and am very familiar with it. I wondered what else was out there, so I dabbled with Gemini but found it frustratingly PC in its responses, with disclaimers on every little thing. I had it help with some book editing and thought the quality was pretty mid. It contradicted itself more than once.

I pulled up Grok and ran through similar prompts and the understanding it displayed and answers it gave blew Gemini away imho. I feel like the responses are smarter, shorter and more to the point.

I really wish that Grok had a memory system like ChatGPT has. If it did, I'd be tempted to switch permanently. I don't have any problems with ChatGPT, but I guess you always want to be using the best and brightest.

r/artificial on Reddit: Can someone tell me what makes people think Grok is superior to ChatGPT?
July 22, 2025

I am honestly curious what benefit Grok 4 gives aside from being really good at coding and having great access to live info via X. I use ChatGPT for creative-type work, generative AI (images and video), research on products and subjects I am interested in, learning new things, etc. I don't do any coding, so I am just curious: what is so amazing about Grok that people think it is so much better? And some say the Voice Mode is better, but better how? What does $30 SuperGrok get me that ChatGPT Plus doesn't?

r/grok on Reddit: Grok has cooked my brain (u/Wulf_3rdTimesACharm)
Like seriously. Mentally it has fucked me up good. This is a confession: this has made all my gooning fantasies come true, drained up to 6 hours a day, and pretty much ruined my life.
r/grok on Reddit: Grok 3 is great.
September 30, 2024

So I've been using Claude 3.7 Sonnet on the paid plan for a month. Decided to try Grok 3. For code generation it has been excellent. I'm almost ready to cancel Claude and go SuperGrok. It seems to make a lot fewer mistakes and go down fewer rabbit holes.

r/LocalLLaMA on Reddit: I tested Grok 3 against Deepseek r1 on my personal benchmark. Here's what I found out
February 21, 2025

So, Grok 3 is here. As a Whale user, I wanted to know if it's as big a deal as they're making it out to be.

Though I know it's unfair to compare DeepSeek R1 with Grok 3, which was trained on a 100K-H100 behemoth of a cluster.

Still, I was curious how much better Grok 3 is compared to DeepSeek R1, so I tested them on my personal set of questions covering reasoning, mathematics, coding, and writing.

Here are my observations.

Reasoning and Mathematics

  • Grok 3 and Deepseek r1 are practically neck-and-neck in these categories.

  • Both models handle complex reasoning problems and mathematics with ease. Choosing one over the other here doesn't seem to make much of a difference.

Coding

  • Grok 3 leads in this category. Its code quality, accuracy, and overall answers are simply better than Deepseek r1's.

  • Deepseek r1 isn't bad, but it doesn't come close to Grok 3. If coding is your primary use case, Grok 3 is the clear winner.

Writing

  • Both models are about equally good at creative writing, but I personally prefer Grok 3's responses.

  • For my use case, which involves technical stuff, I liked Grok 3 better. DeepSeek has its own uniqueness; I can't get enough of its autistic nature.

Who Should Use Which Model?

  • Grok 3 is the better option if you're focused on coding.

  • For reasoning and math, you can't go wrong with either model. They're equally capable.

  • If technical writing is your priority, Grok 3 seems slightly better than DeepSeek R1 for my personal use cases; for schizo talks, no one can beat DeepSeek R1.

For a more detailed breakdown of Grok 3 vs DeepSeek R1, including specific examples and test cases, see my full analysis.

What are your experiences with the new Grok 3? Did you find the model useful for your use cases?

r/LocalLLM on Reddit: Thoughts on Grok 3?
November 28, 2024

It won't be free; the minimum cost to use it is, I believe, $30 a month. The thing runs on 200K H100s, and I heard they are thinking of switching them all to H200s.

The data center running it is an absolute beast, and current comparisons show it leading in quality, but it will never be free and you can't run it privately.

On one hand, I'm glad more advancements are being made; competition breeds higher-quality products. On the other, hell no, I'm not paying for it, since I only enjoy locally run models, even if they reach only a fraction of the potential because of hardware limitations (a.k.a. cost).

Is anyone here thinking of giving it a try once it's fully out, to see how it does with LLM-based tasks and image generation?

r/grok on Reddit: Has anyone here actually used Grok AI? What was your real experience like—strengths, weaknesses, surprises? Would you recommend it over other AIs?
July 20, 2025

I've been hearing a lot about Grok AI and am curious how it actually performs in real life. If you’ve tried it, what stood out to you—either good or bad? How does it compare to tools like ChatGPT or Gemini? Specifically, are there any tasks or features Grok AI can handle that other AIs can’t, or areas where it truly excels or does something unique? Any specific use cases or stories are welcome! I’d love to hear honest feedback from people who’ve experimented with it.

Top answer (1 of 5):
I find Grok is really, really good whenever super up-to-date or more niche info is needed. ChatGPT feels like it can instantly synthesize the first page of Google search results, which is great if I'm asking about general, well-publicized knowledge. I found Grok is best when asking about things like up-and-coming startups in AI, or development and building with specific technologies like ComfyUI, and so on. It also gives VERY detailed answers in my experience, so as a "co-builder" chat for development or tech it is nice.
Answer 2 of 5:
I pay for ChatGPT, Gemini, and Grok now. I find ChatGPT the most consistent and my go-to for fleshing out ideas and talking through projects, partly because it does the best job of pulling context from other chats, but I trust it the least to be critical and not overly encouraging or validating. I've been really liking that Grok will tell me straight up if my idea is stupid; it does great when web search is required and is also great for coding. I've almost stopped using the Gemini app since I got Grok and now only use it occasionally for coding or deep research, not a lot else (I use AI Studio when I need long context, but will probably cancel my Gemini subscription soon). Grok's also been a lot better than the others about getting technical specs and suggested params for things like AI models. Happy to throw a query (get weird, no judgement here 😂) at any of them if someone doesn't have access and wants to see how it works for their use case.
r/singularity on Reddit: First impressions of Grok 3
March 21, 2024 - Not a big fan of Elon but credit where credit is due, Grok 3 certainly seems to take the top SOTA spot.
r/grok on Reddit: Grok 3 vs. Other AI Tools: What Sets It Apart?
March 4, 2025

I wrote an in-depth comparison of Grok 3 against other popular AI tools like GPT-4, Google Gemini, and DeepSeek V3. Thought you all might find some of the key takeaways interesting:

Grok 3's "Think" and "Big Brain" modes are pretty impressive for complex reasoning tasks. It outperformed GPT-4 and Gemini on some math benchmarks.

However, Grok 3 falls short in real-time data integration. Google Gemini seems to have the edge there.

Accessibility is a mixed bag. Grok 3 is subscription-only at $40/month, while alternatives offer free tiers or are open-source.

For coding tasks, Grok 3 showed a 20% improvement in accuracy compared to its predecessor.

Each tool has its strengths: Grok 3 for reasoning, GPT-4 for creative writing, Gemini for real-time search, and DeepSeek for efficiency on limited resources.

The full article goes into more detail on performance metrics, user experience, and specific use cases for each tool.

You can check it out here if you're interested: https://aigptjournal.com/explore-ai/ai-guides/grok-3-vs-other-ai-tools/

What's your experience been with these AI tools? Any surprises in how they compare to each other?

r/singularity on Reddit: Grok 3 Not Performing Well In Real World Performance: What Does This Say About Benchmarks And Scaling?
February 18, 2025

- 100K Nvidia H100 GPUs, by far the most compute power behind any AI model. (A single H100 costs $30,000.)

- 200 million GPU hours for training.

- Trained on the largest synthetic dataset.

- Uses test-time compute like o1 and o3.

- Likely cost several billion dollars to train (quick arithmetic in the sketch after this list).

- It performed well on benchmarks. Yet many users report that models over a year old still outperform it on various tasks.
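A quick sanity check on those figures, using only the numbers listed above; this is my own ballpark arithmetic and ignores power, networking, and rental-versus-purchase costs entirely.

```python
# Ballpark only: hardware cost and implied run length from the listed figures.
gpus = 100_000
price_per_h100 = 30_000      # USD each, as stated above
gpu_hours = 200_000_000      # total training GPU hours, as stated above

hardware_cost_usd = gpus * price_per_h100
run_length_days = gpu_hours / gpus / 24

print(f"GPUs alone: ~${hardware_cost_usd / 1e9:.0f}B")        # ~$3B in hardware
print(f"implied training run: ~{run_length_days:.0f} days")   # ~83 days if every GPU ran continuously
```

So the "several billion dollars" figure is at least consistent with the hardware bill alone.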

I was actually one of the few people optimistic about Grok 3 because the sheer amount of compute that went into it has implications for the future of LLMs as a whole.

DeepMind flopped with Gemini 2.0 Pro (they realized months ago that it couldn’t outperform Gemini 1.5, yet they released it anyway). Anthropic scrapped 3.5 Opus due to massive performance/cost issues in Fall 2024 and instead released a "new" 3.5 Sonnet, forcing them back to the drawing board. OpenAI kept delaying GPT-4.5/Orion.

Were the LLM critics right all along? Models like Gemini 2, Grok 3, and GPT-5 were supposed to generate tens of thousands of lines of clean, bug-free code and create highly creative, coherent 300+ page novels in one shot. Yet these SOTA models will still refuse to generate anything more than 5-10 pages in length, and when you try to force them, they lose coherency and begin to hallucinate.

No one is rushing to use these next-generation models. People forgot Gemini 2.0 even exists. It remains to be seen if GPT-5 can meet the hype.

But I am starting to suspect that GPT-5 might be yet another slight incremental upgrade over the likes of Gemini 2.0 Pro and Grok 3.

r/grok on Reddit: Grok 3's Think mode is terrible.
March 9, 2025

Think mode compared to regular mode is just terrible. For some reason it doesn't remember anything in a chat session. I am constantly having to re-upload code that I don't have to during regular sessions. Am I using it wrong? It's so frustrating to use. It constantly gives me made-up code instead of using the code I gave it only a few messages before.

r/ArtificialInteligence on Reddit: Grok 3 is leagues above in creative writing. It’s not even close.
April 19, 2025

I finished a pretty good TV show and was super pissed at the ending. I asked ChatGPT (4o), Claude (3.7 Sonnet), Grok (3), and DeepSeek R1 to write fanfictions. I wrote a detailed prompt and copy-pasted it into all of the above.

ChatGPT

It wasn't even close. First off, ChatGPT is total trash at creative writing. It will put your instructions into memory (not context) and if you give detailed instructions, your memory will run out very quickly. It also has terrible context.

At roughly chapter 12 (1,000+ word chapters), it began hallucinating and being extremely repetitive in dialogue. It stopped progressing the story (I had just been typing "next chapter" for each chapter until then; after that I had to give instructions again).

Shameful word limit per response. Cannot write more than 2000 words at a time.

Claude

Claude is okay. It is very good at planning out the story and executing chapters. Claude does artifacts really well - e.g., it will create a markdown/plain-text file within the chat to isolate the chapter from the rest of the conversation. Pretty useful for keeping track of the story. However, it was pretty slow, and its coherence and dialogue did not have the same structural consistency as Grok's.

Can write about 1,000 more words than ChatGPT per response - still not a lot.

Ranking: 2nd

Deepseek

The content itself was very well written. The main problem I faced was that it included totally random characters from other TV shows/movies. I have no idea why it would do this. It had to be given instructions again and again, which makes the reading experience feel like work.

Can write far more than the former two; I was able to get a 5,000-word chapter.

Grok

Grok is leagues ahead; it's not even close. You can ask it to write 10,000 words per chapter and it takes it like a champ. The dialogue is more in tone with the show's characters, and the storylines match the closest. It was also the fastest at text generation. The stories were coherent to the very end - it didn't seem to "lose" the story like ChatGPT did.

So yeah, my 2 cents. I forgot about Gemini; I'll try it out. I read that 1.5 is good at writing.