Hi guys,
I'm using SD on an AWS g4dn.xlarge and I'm pretty happy with the results. Sometimes I get 'Out of memory', but restarting A1111 fixes the problem. I'm going to buy an RTX 3060; comparing its price to what I pay AWS, I can recoup it in 3+ months.
I can't find any comparisons to check. Maybe someone with an RTX 3060 can run a test with this prompt:
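For anyone who wants to redo the payback math with their own numbers, here is a rough sketch. The card price and the hours of use per day are not stated in the post, so both are placeholders; the hourly rate is the g4dn.xlarge figure quoted further down the page. Electricity and the rest of the PC are ignored.

```python
def months_to_break_even(card_price_usd, cloud_rate_usd_per_hour,
                         hours_per_day, days_per_month=30):
    """Months of cloud spending needed to equal the one-off card price."""
    monthly_cloud_cost = cloud_rate_usd_per_hour * hours_per_day * days_per_month
    return card_price_usd / monthly_cloud_cost


# Hypothetical inputs: neither value below comes from the post.
card_price = 300.0   # assumed RTX 3060 price in USD
hourly_rate = 0.58   # g4dn.xlarge rate quoted further down the page

for hours in (2, 4, 6, 8):
    months = months_to_break_even(card_price, hourly_rate, hours)
    print(f"{hours} h/day -> {months:.1f} months to break even")
```

The result is very sensitive to how many hours per day the instance actually runs, so it is worth plugging in real usage from the AWS bill.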
((photo:1.2)), A cute cat mage, glowing fire sword, staff, dramatic lighting, dynamic pose, dynamic camera, masterpiece, best quality, dark shadows, ((dark fantasy)), detailed, realistic, 8k uhd, high quality
Negative prompt: canvas frame, (high contrast:1.2), (over saturated:1.2), (glossy:1.1), cartoon, 3d, ((disfigured)), ((bad art)), ((b&w)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), Photoshop, video game, ugly, tiling, poorly drawn hands, 3d render, ((watermarks)), smooth, plastic, blurry, low-resolution, deep-fried, oversaturated
Steps: 30, Sampler: Euler a, CFG scale: 7, Seed: 408625209, Size: 512x512, Model hash: cc6cb27103, Model: v1-5-pruned-emaonly, Version: v1.5.1
It took about 5 sec. at 5.35it/s - 5.48it/s to generate a 512x512 image on AWS.
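As a sanity check, the time per image can be estimated from the step count and the it/s reading. This is a minimal sketch that assumes one sampler iteration per step (which should hold for Euler a) and ignores model loading and the final decode, so real wall-clock time will be a little higher.

```python
def seconds_per_image(steps, iterations_per_second):
    """Approximate sampling time for one image, ignoring load and decode overhead."""
    return steps / iterations_per_second


steps = 30  # from the generation settings above
for rate in (5.35, 5.48):  # it/s range reported on the g4dn.xlarge
    print(f"{rate} it/s -> {seconds_per_image(steps, rate):.2f} s per image")
```

The same function makes it easy to compare another card: plug in its it/s for the same 30 steps and 512x512 size.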
What it/s do you get on an RTX 3060?
Thank you so much.
I recently did some testing using Dolphin-Llama3 across various (inexpensive-ish) AWS instances to compare performance. The results are in line with what one might expect.
Testing was done using default settings with Ollama. I spun up a new instance on Ubuntu, installed Ollama, and ran Dolphin-Llama3 with --verbose.
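The tests here used the CLI, but the same two rates can also be derived from the timing fields that, as far as I can tell from the Ollama API docs, come back in the final /api/generate response (counts in tokens, durations in nanoseconds). A minimal sketch; the sample values below are made up for illustration and are not from any of the runs in this post.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def rates_from_response(resp):
    """Return (prompt eval rate, eval rate) in tokens/s from an Ollama-style response dict."""
    prompt_rate = (resp["prompt_eval_count"] / resp["prompt_eval_duration"]
                   * NANOSECONDS_PER_SECOND)
    eval_rate = resp["eval_count"] / resp["eval_duration"] * NANOSECONDS_PER_SECOND
    return prompt_rate, eval_rate


# Illustrative numbers only, not a real measurement.
sample = {
    "prompt_eval_count": 26,
    "prompt_eval_duration": 130_000_000,   # 0.13 s in nanoseconds
    "eval_count": 290,
    "eval_duration": 4_000_000_000,        # 4 s in nanoseconds
}

prompt_rate, eval_rate = rates_from_response(sample)
print(f"prompt eval rate: {prompt_rate:.2f} tokens/s")
print(f"eval rate: {eval_rate:.2f} tokens/s")
```

Prompt eval rate measures how fast the prompt is ingested, while eval rate measures how fast new tokens are generated, which is why the two can rank machines differently.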
Key Takeaways:
-Fastest Prompt Eval Rate: AWS g5 (fastest AWS instance tested)
-Fastest Eval Rate: Home PC w/RTX 3080
-Best Cost-Performance Balance: AWS g4dn.xlarge offers a good balance of performance and cost, at $0.58/hr.
-GPU speed is the key differentiator. Within the same instance family (g4dn or g5), the evaluation rates remain consistent across instance sizes. If the model fits in GPU memory, there is no need for more cores/memory.
-I did notice that the more system memory was available, the more tokens were used in the output.
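One way to put a number on "cost-performance" is generated tokens per dollar, using the eval rates and hourly prices from the results in this post. This is just one possible metric, not necessarily the one behind the takeaway above, and it assumes the instance is generating 100% of the time it is billed.

```python
# (eval rate in tokens/s, price in USD per hour), copied from the results in this post
results = {
    "c7g.8xlarge": (25.07, 1.27),
    "r6g.4xlarge": (8.29, 0.88),
    "g4dn.xlarge": (41.71, 0.58),
    "g4dn.2xlarge": (41.74, 0.84),
    "g5.xlarge": (68.08, 1.12),
    "g5.2xlarge": (66.67, 1.35),
    "Ampere 16 Core": (9.01, 0.1276),
    "Ampere 32 Core": (14.11, 0.2796),
}


def tokens_per_dollar(eval_rate, price_per_hour):
    """Tokens generated per dollar, assuming continuous generation."""
    return eval_rate * 3600 / price_per_hour


ranked = sorted(results, key=lambda name: tokens_per_dollar(*results[name]), reverse=True)
for name in ranked:
    print(f"{name:15s} {tokens_per_dollar(*results[name]):>10,.0f} tokens per dollar")
```

On this metric the g4dn.xlarge comes out on top, with the 16-core Ampere close behind, though the Ampere is much slower in absolute terms.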
Test Results
AWS Instances
c7g.8xlarge (Compute Instance)
•32 cores, 64GB RAM
•Prompt Eval Rate: 38.38 tokens/s
•Eval Rate: 25.07 tokens/s
•Price: $1.27/hr, $941.16/mo

r6g.4xlarge (Memory Instance)
•16 cores, 128GB RAM
•Prompt Eval Rate: 10.15 tokens/s
•Eval Rate: 8.29 tokens/s
•Price: $0.88/hr, $657.10/mo

g4dn.xlarge (GPU Instance)
•4 cores, 16GB RAM, 16GB GPU
•Prompt Eval Rate: 222.23 tokens/s
•Eval Rate: 41.71 tokens/s
•Price: $0.58/hr, $434.50/mo

g4dn.2xlarge (GPU Instance)
•8 cores, 32GB RAM, 32GB GPU
•Prompt Eval Rate: 214.25 tokens/s
•Eval Rate: 41.74 tokens/s
•Price: $0.84/hr, $621.24/mo

g5.xlarge (GPU Instance)
•4 cores, 16GB RAM, 24GB GPU
•Prompt Eval Rate: 624.29 tokens/s
•Eval Rate: 68.08 tokens/s
•Price: $1.12/hr, $831.05/mo

g5.2xlarge (GPU Instance)
•8 cores, 32GB RAM, 24GB GPU
•Prompt Eval Rate: 624.48 tokens/s
•Eval Rate: 66.67 tokens/s
•Price: $1.35/hr, $1,000.96/mo
Local Machines
M2 MacMini
•M2, 8GB RAM, <8GB GPU
•Prompt Eval Rate: 66.38 tokens/s
•Eval Rate: 18.33 tokens/s

M1 MacBook Air
•M1, 16GB RAM, <16GB GPU
•Prompt Eval Rate: 71.58 tokens/s
•Eval Rate: 11.46 tokens/s

Home PC w/RTX 3080
•Intel i5, 64GB RAM, 10GB GPU
•Prompt Eval Rate: 185.67 tokens/s
•Eval Rate: 83.79 tokens/s
Oracle Ampere
Ampere 16 Core, 32GB RAM
•Prompt Eval Rate: 11.96 tokens/s (Duration: 1m34.955180835s)
•Eval Rate: 9.01 tokens/s (Duration: 1m28.461256s)
•Price: $0.1276/hr, $95/mo

Ampere 32 Core, 32GB RAM
•Prompt Eval Rate: 22.54 tokens/s (Duration: 47.93207936s)
•Eval Rate: 14.11 tokens/s (Duration: 44.423782s)
•Price: $0.2796/hr, $208/mo
Here's the data formatted as a table for easier viewing - courtesy of u/sergeant113. https://www.reddit.com/r/LocalLLaMA/comments/1dclmwt/comment/l7zrgzm/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button