🌐
Reddit
reddit.com › r/localllm › aws gpu instances that can handle micro llms
r/LocalLLM on Reddit: AWS GPU instances that can handle micro LLMs
December 17, 2023 -

What is the best AWS server instance with a local dedicated GPU capable of running a GPT-3.5-equivalent micro LLM adequate for embedding and summarization? This would be for inference only, no training.

🌐
AWS
aws.amazon.com › amazon ec2 › instance types › p5 instances
Amazon EC2 P5 Instances – AWS
2 days ago - GPU-based EC2 instances, and reduce cost to train ML models by up to 40%. These instances help you iterate on your solutions at a faster pace and get to market more quickly. You can use P5, P5e, and P5en instances for training and deploying complex large language models (LLMs) and diffusion ...
🌐
Chariot Solutions
chariotsolutions.com › home › getting started with llm in the cloud with amazon dlami ec2 instances
Getting started with LLM in the Cloud with Amazon DLAMI EC2 Instances — Chariot Solutions
April 3, 2024 - Still, $6k for a workstation that can delay your need for cloud GPUs in development is not a bad investment. Managed cloud services – Amazon has a wide variety of cloud services available, including Amazon Kendra, which costs $810 / month / developer for the license (yes, there is a free tier of 750 hours available to start), Amazon Bedrock, a serverless pay-as-you-go access platform for LLMs, and their many other ML APIs. EC2 instances tuned for GPU work – if you don't yet want to dig deeply into a managed solution, or you plan to use more than Amazon's own APIs and platforms, and you have a simple workflow to try out on an accelerated platform, try Amazon's Deep Learning AMIs on EC2.
🌐
CodiLime
codilime.com › blog › data › data science › hosting llms on aws
Hosting LLMs on AWS
We explored the differences between ... LLMs on your own. The tutorial covered setting up AWS EC2 instances, particularly G5 instances with NVIDIA GPUs, ideal for demanding machine learning tasks....
🌐
DEV Community
dev.to › aws-builders › deploy-your-llm-on-aws-ec2-2ig3
Deploy Your LLM on AWS EC2 - DEV Community
September 14, 2024 - AWS GPU instance families such as G4, G5, P3, and P4 provide some of the highest performance in Amazon EC2 for deep learning and high-performance computing (HPC).
🌐
Mamezou
developer.mamezou-tech.com › en › blogs › 2025 › 08 › 21 › ec2-gpu-demo
Build Your Own LLM Environment on AWS! A Hands-On Guide to Running AI with EC2 GPU Instances and Ollama | Mamezou Developer Portal
With Ollama, you can run LLMs on the CPU alone, but having a GPU with sufficient VRAM (video memory) offers speed-up benefits. This time, since I plan to run gpt-oss, which OpenAI recently released and is now available via Ollama, I'll choose ...
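As a quick illustration of the setup this article describes, here is a minimal sketch of querying an Ollama server from Python once the EC2 GPU instance is running. The endpoint is Ollama's default; the model tag is an assumption and should match whatever you actually pulled with "ollama pull".

import json
import urllib.request

# Ollama's default local endpoint; adjust if you expose it differently on the instance.
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "gpt-oss:20b",  # assumed tag; use whatever model you pulled
    "prompt": "Summarize why GPU VRAM matters for local LLM inference.",
    "stream": False,         # return one JSON object instead of a token stream
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
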
🌐
Medium
medium.com › @thomasjay200 › run-your-own-llm-ollama-on-aws-with-nvidia-gpu-dab7dc008bfe
Run your own LLM — Ollama on AWS with Nvidia GPU | by Tom Jay | Medium
February 29, 2024 - You will need an AWS account, and you will also need access to GPU-based instances; this is not granted by default, so there is a "Service Request" page where you will need to request the instance type. For what we want, we will request access to ...
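That quota-request step can also be done programmatically. Below is a hedged sketch using boto3's Service Quotas API; the quota name filter and desired vCPU value are assumptions you should verify against your own account.

import boto3

sq = boto3.client("service-quotas", region_name="us-east-1")

# Find the EC2 vCPU quota that governs the GPU family you want (G/VT instances here).
target = None
paginator = sq.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="ec2"):
    for quota in page["Quotas"]:
        if "G and VT instances" in quota["QuotaName"]:  # assumed quota name; verify in your account
            target = quota

if target is not None:
    print(f'Current limit for "{target["QuotaName"]}": {target["Value"]} vCPUs')
    # Request enough vCPUs for, e.g., one g5.xlarge (4 vCPUs).
    resp = sq.request_service_quota_increase(
        ServiceCode="ec2",
        QuotaCode=target["QuotaCode"],
        DesiredValue=4,
    )
    print("Request status:", resp["RequestedQuota"]["Status"])
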
🌐
Stack Overflow
stackoverflow.com › questions › 79381462 › is-it-possible-to-train-llms-on-ec2-gpus-using-lambda-for-on-demand-instance-act
amazon ec2 - Is it possible to train LLMs on EC2 GPUs using Lambda for on-demand instance activation? - Stack Overflow
Use a GPU-capable EC2 instance to host the LLM model, but by default, leave the instance switched off. When training is necessary, programmatically launch the instance using an AWS Lambda function.
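A minimal sketch of that pattern, assuming boto3 inside the Lambda runtime and a placeholder instance ID (in practice you would pass it via an environment variable or the invocation event):

import boto3

ec2 = boto3.client("ec2")

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder: the stopped GPU training instance

def lambda_handler(event, context):
    # Start the instance; the call is a no-op if it is already running.
    ec2.start_instances(InstanceIds=[INSTANCE_ID])

    # Optionally block until the instance reports "running" before triggering training.
    ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])

    return {"status": "started", "instance": INSTANCE_ID}
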
🌐
Medium
medium.com › @chinmayd49 › self-host-llm-with-ec2-vllm-langchain-fastapi-llm-cache-and-huggingface-model-7a2efa2dcdab
Self host LLM with EC2, vLLM, Langchain, FastAPI, LLM cache and huggingFace model | by Chinmay Deshpande | Medium
November 22, 2023 - Let's start the technical discussion ... serve LLMs faster and more efficiently to customers. I would recommend going through AWS G5 and P4 instances, which provide good performance for ML and LLM workloads specifically....
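A rough sketch of the stack named in the title (vLLM behind FastAPI on a GPU instance); the model name is an assumption, and the LangChain and LLM-cache pieces are omitted for brevity.

from fastapi import FastAPI
from pydantic import BaseModel
from vllm import LLM, SamplingParams

app = FastAPI()

# Load the model once at startup; vLLM keeps the weights resident on the GPU.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # assumed model; any HF model that fits in VRAM

class Prompt(BaseModel):
    text: str
    max_tokens: int = 256

@app.post("/generate")
def generate(prompt: Prompt):
    params = SamplingParams(temperature=0.7, max_tokens=prompt.max_tokens)
    outputs = llm.generate([prompt.text], params)
    return {"completion": outputs[0].outputs[0].text}
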
🌐
AWS
aws.amazon.com › blogs › machine-learning › serving-llms-using-vllm-and-amazon-ec2-instances-with-aws-ai-chips
Serving LLMs using vLLM and Amazon EC2 instances with AWS AI chips | Artificial Intelligence
November 26, 2024 - Using vLLM on AWS Trainium and Inferentia makes it possible to host LLMs for high performance inference and scalability. In this post, we will walk you through how you can quickly deploy Meta’s latest Llama models, using vLLM on an Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instance.
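Once vLLM is serving the model on the Inf2 instance, any OpenAI-compatible client can call it. A hedged client-side sketch follows; the host, port, and model name are assumptions and must match what the server actually loaded.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM does not require a real key by default
)

resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # assumed model name
    messages=[{"role": "user", "content": "Give me one sentence on Inferentia2."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
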
🌐
AWS
aws.amazon.com › blogs › publicsector › deploy-llms-in-aws-govcloud-us-regions-using-hugging-face-inference-containers
Deploy LLMs in AWS GovCloud (US) Regions using Hugging Face Inference Containers | AWS Public Sector Blog
June 19, 2024 - Another way this can be achieved is through Hugging Face Inference containers. We’ll utilize Amazon EC2 GPU instances and the Hugging Face Inference Container to host and serve custom LLMs in the AWS GovCloud (US) Regions.
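For illustration, a hedged sketch of calling a Hugging Face Text Generation Inference (TGI) container once it is running on the GPU instance; host, port, and generation parameters are assumptions based on TGI's documented /generate endpoint.

import json
import urllib.request

TGI_URL = "http://localhost:8080/generate"  # assumed port mapping for the TGI container

payload = {
    "inputs": "Summarize the benefits of hosting LLMs in AWS GovCloud.",
    "parameters": {"max_new_tokens": 128, "temperature": 0.7},
}

req = urllib.request.Request(
    TGI_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["generated_text"])
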
🌐
Brandonharris
brandonharris.io › Local-LLMs-Getting-Started-with-LLaMa-and-AWS
Cloud LLaMa - Local LLM's and Getting Started with LLaMa on AWS EC2 – brandonharris.io
Choose the p3.2xlarge instance type to start. There are a variety of GPU instances that AWS and others offer, but typically the main constraint will be GPU memory. The p3.2xlarge offers a GPU with 16GB of GPU memory, which is on the low end but sufficient for our needs here.
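A hedged sketch of launching that p3.2xlarge with boto3; the AMI ID and key pair are placeholders, and you would normally look up the current Deep Learning AMI ID for your region first.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: a Deep Learning AMI in your region
    InstanceType="p3.2xlarge",        # 1x NVIDIA V100 with 16 GB of GPU memory
    KeyName="my-keypair",             # placeholder SSH key pair
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[{
        "DeviceName": "/dev/sda1",
        "Ebs": {"VolumeSize": 200, "VolumeType": "gp3"},  # room for model weights
    }],
)
print("Launched:", resp["Instances"][0]["InstanceId"])
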
🌐
AWS
aws.amazon.com › blogs › machine-learning › optimize-price-performance-of-llm-inference-on-nvidia-gpus-using-the-amazon-sagemaker-integration-with-nvidia-nim-microservices
Optimize price-performance of LLM inference on NVIDIA GPUs using the Amazon SageMaker integration with NVIDIA NIM Microservices | Artificial Intelligence
March 18, 2024 - NIM, part of the NVIDIA AI Enterprise software platform listed on AWS marketplace, is a set of inference microservices that bring the power of state-of-the-art LLMs to your applications, providing natural language processing (NLP) and understanding capabilities, whether you’re developing chatbots, summarizing documents, or implementing other NLP-powered applications. You can use pre-built NVIDIA containers to host popular LLMs that are optimized for specific NVIDIA GPUs for quick deployment or use NIM tools to create your own containers.
🌐
Reddit
reddit.com › r/aws › aws sagemaker or aws ec2 for llm model training
r/aws on Reddit: AWS Sagemaker or AWS EC2 for llm model training
April 21, 2024 -

Hi! I have a question for ML practitioners who are familiar with AWS products.

In my workplace, we are assessing two options: using Amazon SageMaker or having an EC2 instance with a GPU.

We mainly need the computing power (GPU) and nothing more. We are about to train an open-source LLM with our own dataset. We haven't considered cloud GPU services such as RunPod and Vast; the current PoC focus is on AWS products, as our ecosystem is mostly in AWS itself.

Which is better suited for our case cost-wise and from an ease-of-use point of view?

Thank you in advance for your help.

🌐
AWS re:Post
repost.aws › questions › QU5GO1pICeTrWIvHAQK8W_Zw › what-are-the-cost-effective-options-for-on-demand-api-of-fine-tuned-llm-with-gpu
What are the cost effective options for on-demand API of fine tuned llm with gpu | AWS re:Post
September 23, 2024 - EC2 Spot Instances with GPUs are a strong option for cost efficiency, especially if you can automate starting and stopping the instance. Hugging Face offers a more convenient API-based option with usage-based billing.
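A small sketch of the automation that answer alludes to: stopping the GPU instance when it is idle so you only pay while serving or fine-tuning. The instance ID is a placeholder; this could run from a scheduled Lambda or a cron job.

import boto3

ec2 = boto3.client("ec2")

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder: the fine-tuned-LLM serving instance

def stop_if_running(instance_id: str) -> None:
    # Look up the instance's current state before acting on it.
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    state = reservations[0]["Instances"][0]["State"]["Name"]
    if state == "running":
        ec2.stop_instances(InstanceIds=[instance_id])
        print(f"Stopping {instance_id} to avoid idle GPU charges")
    else:
        print(f"{instance_id} is already {state}")

stop_if_running(INSTANCE_ID)
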
🌐
AWS
aws.amazon.com › blogs › hpc › scaling-your-llm-inference-workloads-multi-node-deployment-with-tensorrt-llm-and-triton-on-amazon-eks
Scaling your LLM inference workloads: multi-node deployment with TensorRT-LLM and Triton on Amazon EKS | AWS HPC Blog
December 2, 2024 - Feel free to edit the TensorRT-LLM-specific parameters, like batch size, depending on your workload.
# Replace <PATH_TO_AWSOME_INFERENCE_GITHUB> with path to where you cloned the GitHub repo
bash <PATH_TO_AWSOME_INFERENCE_GITHUB>/2.projects/multinode-triton-trtllm-inference/update_triton_configs.sh
You can find the example_values.yaml file that we use for deploying our application here. The relevant sections of this deployment manifest are: …
gpu: NVIDIA-H100-80GB-HBM3
gpuPerNode: 8
persistentVolumeClaim: efs-claim
tensorrtLLM:
  parallelism:
    tensor: 8
    pipeline: 2
triton:
  image:
    name: ${ACCOU
🌐
Medium
medium.com › @mr.sean.ryan › deploying-a-high-performance-llm-with-user-interface-on-aws-ec2-with-gpu-part-1-of-a-series-cc99a98e3185
Deploying a high performance LLM with user interface on AWS EC2 with GPU [part 1 of a series] | by Sean Ryan | Medium
April 14, 2024 - This series presents step-by-step directions to host an LLM (Large Language Model) with a basic user interface, on Amazon’s AWS cloud. There are various articles and documentation already available…