What is the best AWS instance with a local dedicated GPU capable of running a GPT-3.5-equivalent micro LLM for embedding and summarization? This would be for inference only, no training.
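For context, this is roughly the shape of the workload, nothing heavier than this (a minimal sketch using the Hugging Face libraries; the model names are placeholders, not a commitment):

```python
# Rough sketch of the inference-only workload I have in mind
# (model names are just examples, not a final choice).
from sentence_transformers import SentenceTransformer
from transformers import pipeline

# Embeddings: a small sentence-embedding model fits comfortably on one GPU.
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cuda")
vectors = embedder.encode(["Some document text to embed."])

# Summarization: a small seq2seq model, again inference only.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device=0)
result = summarizer("A long passage of text to summarize ...", max_length=60, min_length=20)
print(vectors.shape, result[0]["summary_text"])
```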
Is there any way to host an LLM on a single-A100 instance? The only option I could find is p4d.24xlarge, which has 8 A100s, and my current workload doesn't justify the cost of that instance.
Also, as I am very new to AWS, any general recommendations on the most effective and efficient way of hosting an LLM on AWS are appreciated. Thank you.
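In case it's useful context, this is how I've been enumerating the GPU instance types so far, and p4d.24xlarge is the only A100 option I see (a rough boto3 sketch; the region and configured credentials are assumptions):

```python
# List EC2 instance types that expose GPUs, to compare single-GPU options
# (assumes boto3 is installed and AWS credentials are configured).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is just an example

for page in ec2.get_paginator("describe_instance_types").paginate():
    for itype in page["InstanceTypes"]:
        gpu_info = itype.get("GpuInfo")
        if not gpu_info:
            continue
        for gpu in gpu_info["Gpus"]:
            print(f'{itype["InstanceType"]}: {gpu["Count"]}x {gpu["Manufacturer"]} {gpu["Name"]}')
```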
Hi! I have a question for ML practitioners who are familiar with AWS products.
In my workplace, we are assessing two options: using Amazon SageMaker or running an EC2 instance with a GPU.
We mainly need the computing power (GPU) and nothing more. We are about to train an open-source LLM on our own dataset. We haven't considered cloud GPU services such as RunPod or Vast.ai; the current PoC focuses on AWS products, since our ecosystem is mostly in AWS already.
Which is better suited to our case, cost-wise and from an ease-of-use point of view? (For concreteness, I've sketched my rough understanding of the SageMaker route below.)
Thank you in advance for your help.
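Here is my rough understanding of what the SageMaker side would look like, versus just SSHing into an EC2 GPU box and running the training script ourselves (a minimal sketch with the SageMaker Python SDK; the script name, role ARN, versions, and S3 paths are all placeholders):

```python
# Rough sketch of a managed SageMaker training job for an open-source LLM
# (entry script, role ARN, instance type, versions, and S3 paths are placeholders).
from sagemaker.huggingface import HuggingFace

estimator = HuggingFace(
    entry_point="train.py",            # our training script (placeholder name)
    source_dir="scripts",              # directory containing train.py
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role ARN
    instance_type="ml.g5.2xlarge",     # example single-GPU instance
    instance_count=1,
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
    hyperparameters={"epochs": 3, "model_name": "tiiuae/falcon-7b"},
)

# Launches the managed training job; the data channel path is a placeholder.
estimator.fit({"train": "s3://our-bucket/train-data/"})
```

With EC2 we would instead provision the GPU instance, install drivers and dependencies, and run the same script by hand, so my question is really whether SageMaker's managed layer is worth it for a one-off PoC like ours.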