Please help me work out the math here, as I think I am doing this wrong.
A Lambda of 128 MB costs $0.0000000021/ms, which works out to $0.00756/hour. A Lambda of 512 MB costs $0.0000000083/ms, which works out to $0.02988/hour.
Now if you look at EC2:
t4g.nano: $0.0042/hour (0.5 GiB RAM), t4g.micro: $0.0084/hour (1 GiB RAM).
But... the Lambda will likely not run 100% of the time, and it only stays warm for about 10 minutes (not sure here?). And RAM would be much better utilized by a single function than by an entire VM.
Given all that, if the function can run with 128 MB or less, it seems like a no-brainer to use Lambda.
However, if the function is bigger, it would only make sense to put it on an EC2 instance if it runs more than roughly 28% of the time ($0.0084/hour for a t4g.micro divided by $0.02988/hour for a 512 MB Lambda).
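Putting that break-even in code form (the rates are the ones quoted above; utilization is the only variable):

# Break-even utilization: the fraction of the hour a 512 MB Lambda can run
# before an always-on t4g.micro becomes cheaper.
LAMBDA_512MB_PER_HOUR = 0.0000000083 * 3_600_000   # $/hour while running = $0.02988
T4G_MICRO_PER_HOUR = 0.0084                         # $/hour, always on

break_even = T4G_MICRO_PER_HOUR / LAMBDA_512MB_PER_HOUR
print(f"break-even utilization: {break_even:.0%}")  # ~28%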
So why is everyone against Lambdas citing costs as the primary reason...?
Hi everyone, I have a question regarding cost between Lambda and EC2. I am building a simple Node application using Puppeteer that will only act as an API. I obviously don't expect heavy usage for now, but overall I've read that Lambda will be cheaper to run up until heavy usage? Just wanted to get your thoughts on whether EC2 is better than Lambda in both performance and cost. Again, this will not be heavily used in either case. Also, which one's performance will be better?
Say that I have 5,000 requests hitting an application every 5 seconds, and I need to consume an object and write it to a database. Would a Lambda with an HTTP trigger that does an insert be more cost-effective than an EC2 instance doing the same thing?
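My rough math on the request volume alone (assuming the standard $0.20 per million Lambda request charge; duration/GB-second cost would be on top of this):

# 5,000 requests every 5 seconds = 1,000 requests/second, sustained.
requests_per_second = 5000 / 5
requests_per_month = requests_per_second * 60 * 60 * 24 * 30     # ~2.59 billion

LAMBDA_PER_MILLION_REQUESTS = 0.20   # USD, request charge only
request_cost = requests_per_month / 1_000_000 * LAMBDA_PER_MILLION_REQUESTS
print(f"{requests_per_month:,.0f} requests/month -> ${request_cost:,.2f} in request fees alone")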
Edit: Just want to say you all are awesome! I am coming from MSFT world and never experienced so many folks being so helpful. Makes me think I should have switched to AWS ages ago. Thanks everybody.
Hello, pretty new to building on AWS, so I pretty much just threw all the heavy compute into Lambda and have light polling on EC2. I am doing CPU-intensive and somewhat memory-intensive work that lasts around 1-7 minutes in async Lambda functions, which send a webhook back to the polling bot (free t2.micro) when they complete. For my entire project, Lambda is accruing over 50% of the total costs, which seems somewhat high as I have around 10 daily users on my service.
Perhaps it is better to wait it out and see how my SaaS stabilises, as we are in a volatile period as we enter the market, so it's kinda hard to forecast our expected usage over the coming months with any precision.
Am I better off having an EC2 instance do all of the computation asynchronously, or is it better to just keep it in Lambda? Better can mean many things, but here I mean long-term economic scalability. I tried to read up on the economics of Lambda vs EC2, but it wasn't that clear, and I still lack the intuition for when to use Lambda and when not to.
It will take some time to move everything onto an EC2 instance first of all, and then to configure everything to run asynchronously and scale nicely, so I imagine the learning curve is steeper, but it would be cheaper as a result?
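For my own sanity, here is the per-invocation math I'm working from (the memory size, duration and run counts below are placeholders for my real settings; the GB-second rate is the standard x86 Lambda price as I understand it):

# Rough Lambda cost per invocation: GB-seconds * rate, plus a small per-request fee.
GB_SECOND_RATE = 0.0000166667    # USD per GB-second
REQUEST_RATE = 0.20 / 1_000_000  # USD per request

def invocation_cost(memory_mb, duration_s):
    gb_seconds = (memory_mb / 1024) * duration_s
    return gb_seconds * GB_SECOND_RATE + REQUEST_RATE

# Placeholder numbers: a 4-minute job at 1024 MB, 10 users x 5 runs/day for a month.
per_run = invocation_cost(memory_mb=1024, duration_s=240)
print(f"per run: ${per_run:.5f}, per month: ${per_run * 10 * 5 * 30:.2f}")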
I am working on a project; it's a pretty simple project on the face of it:
Background :
I have an Excel file (with financial data in it) with many sheets. There is a sheet for every month.
The data runs from June 2020 till now. It is updated every day, and each day's new data is appended to that month's sheet.
I want to perform some analytics on that data, things like finding the maximum/minimum volume and value of transactions carried out in a month and a year.
Obviously I am thinking of using Python for this.
The way I see it, there are two approaches:

- store all the data of all the months in pandas DataFrames
- store the data in a DB
My question is, what seems better for this? EC2 or Lambda?
I feel Lambda is more suited for this workload, as I want to run this app in such a way that I get weekly or monthly data statistics, and the entire computation would last a few minutes at most.
Hence I felt Lambda is much more suitable; however, if I wanted to store all the data in a DB, I feel like using an EC2 instance is the better choice.
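For context, the pandas approach I have in mind would look roughly like this (the file, sheet and column names are just placeholders for what's actually in my workbook):

import pandas as pd

# Read every monthly sheet into a dict of DataFrames keyed by sheet name.
sheets = pd.read_excel("financials.xlsx", sheet_name=None)

for month, df in sheets.items():
    print(
        "{0}: max volume={1}, min volume={2}, max value={3}".format(
            month, df["Volume"].max(), df["Volume"].min(), df["Value"].max()
        )
    )

# Yearly stats: concatenate all months, then aggregate.
all_data = pd.concat(sheets.values(), ignore_index=True)
print("yearly max volume:", all_data["Volume"].max())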
Sorry if it's a noob question (I've never worked with cloud before, fresher here)
PS: I will be using the free tiers of both services, since I feel like the free tier is enough for my workload.
Any suggestions or help is welcome!!
Thanks in advance
I know that there's a trend towards serverless architecture like Lambda.
I know Lambda is an on-demand AWS service where you don't have to provision and allocate resources (EC2) based on forecasting. With Lambda, your compute resources can scale based on real-time demand.
In what situations would you want to use a non-managed solution like EC2 vs Lambda?
I'm under the impression that the less work you need to do in terms of operations, the better; that's basically why AWS exists. If you use EC2 directly, you would have to define auto scaling rules, etc.
So I decided AWS Lambda had been around long enough for me to do my own analysis.
Lambda comes with a few problems, like only supporting Python 2.7 (they will fix that in the future, I guess), library restrictions, etc. Also, I had read some blog posts like Flynn's.
But skipping those problems, I wanted to know... what about money?
Please check my numbers to see if I got something wrong.
I will compare a t2.nano (1 vCPU + 512 MB) and Lambda at 128 MB.
With 2 SSH consoles open to my EC2 instance, I still had 139 MB of RAM left, and this was the first +1 for EC2.
So, I designed a small task which runs a regex pattern to catch the attributes within div tags in an HTML file. The input file's size was 322 KB, and the same task was run 1000 times:
import re, time

def task(code):
    # Run the regex 1000 times over the 322 KB HTML input and report the average time.
    rep = 1000
    pattern = '<div([^>]+)>'
    ini = time.time()
    for i in range(rep):
        results = re.findall(pattern, code)
    message = "{0} Repetitions. Avg = {1:.3f} ms".format(rep, (time.time() - ini) * 1000 / rep)
    print(message)

This task only took 30 MB in both cases, and here I saw the first surprise:
- Lambda: 1000 Repetitions. Avg = 7.261 ms
- EC2: 1000 Repetitions. Avg = 0.480 ms
From AWS CloudWatch: Duration: 7823.48 ms, Billed Duration: 7900 ms, Memory Size: 128 MB, Max Memory Used: 30 MB
Lambda took 562 ms to get the file from S3, import the libraries and so on, and then 7.261 seconds to run the task 1000 times, about 15 times slower than EC2.
So, for small fast tasks you are paying ~562 ms of overhead on every call, and for heavy tasks you will be 15 times slower.
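For reference, the Lambda side can be a thin wrapper like this (a sketch with placeholder bucket and key names, calling the task() function above):

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Fetch the 322 KB HTML file from S3 (placeholder bucket/key), then run the regex task on it.
    obj = s3.get_object(Bucket='my-test-bucket', Key='input.html')
    code = obj['Body'].read().decode('utf-8')
    task(code)
    return 'done'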
I didn't test a group of EC2 machines under heavy load for a long period; efficiency probably drops a lot under heavy workloads. In Digital Ocean I had this problem: I contacted them saying that some of my machines were 5 times slower than others, and their reply was something like, "Please tell us which ones and we will move them to another server". That kind of solution is not tolerable when you are working with an auto scaling cloud.
Let's say EC2 instances also need 562 ms to get the file from S3, import libraries and so on. So I designed 2 functions to calculate the cost:
import math

def lambda_cost(mem=128, runtime=7, num_calls=10000000, free_gbs=400000, free_calls=1000000, loading_data_time=562):
    '''
    runtime and loading_data_time in ms
    '''
    price = 0.00001667  # USD per GB-second
    billed_time = math.ceil((runtime + loading_data_time) / 100) * 100  # billed in 100 ms blocks
    total_seconds_runtime = num_calls * billed_time / 1000
    total_gbs = total_seconds_runtime * mem / 1024
    charged_by_runtime = max(0, (total_gbs - free_gbs) * price)
    charged_by_calls = max(0.2 * ((num_calls - free_calls) / 1000000), 0)  # $0.20 per million requests
    print('Billed time: {0}'.format(billed_time))
    print('GB-seconds: {0}'.format(total_gbs))
    print('Charged by runtime: ${0:.3f}'.format(charged_by_runtime))
    print('Charged by calls: ${0:.3f}'.format(charged_by_calls))
    print('Total charged: ${0:.3f}'.format(charged_by_runtime + charged_by_calls))

def ec2_cost(mem=512, runtime=0.48, num_calls=10000000, free_hours=750, loading_data_time=562):
    '''
    runtime and loading_data_time in ms
    '''
    costs = {  # instance RAM in MB: [on-demand $/hour, reserved $/hour]
        512: [0.0065, 0.0045],
        1024: [0.013, 0.009],
    }
    if not costs.get(mem): return False
    on_demand_hour = costs[mem][0]
    reserved_hour = costs[mem][1]
    total_runtime_seconds = (runtime + loading_data_time) * num_calls / 1000
    total_runtime_hours = total_runtime_seconds / 3600
    total_cost_on_demand = (total_runtime_hours - free_hours) * on_demand_hour
    total_cost_reserved = (total_runtime_hours - free_hours) * reserved_hour
    print('Cost on demand: ${0:.3f}'.format(total_cost_on_demand))
    print('Cost reserved: ${0:.3f}'.format(total_cost_reserved))

Now, what if we want to run this script 10 million times?
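The numbers below come from calling those two functions, along these lines:

lambda_cost(num_calls=10000000)                             # Lambda, with the free tier
lambda_cost(num_calls=10000000, free_gbs=0, free_calls=0)   # Lambda, no free tier
ec2_cost(num_calls=10000000)                                # EC2, with 750 free hours
ec2_cost(num_calls=10000000, free_hours=0)                  # EC2, no free hours
# ...and the same calls with num_calls=100000000 for the 100 million case.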
- Lambda with 400k free GB-seconds and 1 million free calls: $5.551
- Lambda without free GB-seconds or free calls: $12.419
- EC2 with 750 free hours + on demand: $5.281
- EC2 with 750 free hours + reserved: $3.656
- EC2 + on demand: $10.156
- EC2 + reserved: $7.031

And 100 million times?

- Lambda with 400k free GB-seconds and 1 million free calls: $117.320
- Lambda without free GB-seconds or free calls: $124.188
- EC2 with 750 free hours + on demand: $96.684
- EC2 with 750 free hours + reserved: $66.935
- EC2 + on demand: $101.559
- EC2 + reserved: $70.310
The EC2 750 free hours are for a t2.micro, not a t2.nano (1 GB instead of 512 MB), and only for new accounts (less than 1 year old). On the other hand, Lambda's 400k free GB-seconds are for everyone, which is one of the few points in Lambda's favor.
Another point is that EC2 performance can be improved with multithreading in some cases. It won't help with the regex task, for example, but it's very useful for urllib and the like. Something like this:
import threading
import urllib2, time

def ec2_loading_data_time(url='http://www.nytimes.com/'):
    # Download the page and measure how long it takes.
    ini = time.time()
    response = urllib2.urlopen(url)
    code = response.read().decode('utf-8')
    loading_data_time = time.time() - ini
    print('Loading data time: {0}'.format(loading_data_time))
    return code

class TestingThreadPerformance(threading.Thread):
    def run(self):
        ini = time.time()
        code = ec2_loading_data_time()
        task(code)  # the regex task defined earlier
        print('Total thread time = {0:.3f}'.format(time.time() - ini))

for i in range(10): TestingThreadPerformance().start()

Also, EC2 can be loaded with fully custom systems. Any project you already have or get will just work. Learning curve + surprises = 0.
My conclusion is that Lambda isn't worth the cost. It's not only more expensive, but also rigid, fragile and poorly documented.
In my opinion you should only use it in small projects that need less than 400k free GB-seconds and 1 million calls.
P.S. I repeat, I didn't test this under heavy load for a long period, neither on the EC2 side nor on the Lambda side.
......................................... UPDATE ............................................
I forgot to talk about EC2's pricing options vs Lambda's.
Lambda has no volume discounts. If you want 1,000,000,000 tasks, it will cost 1,000 times more than 1,000,000 tasks, and you can't reserve resources to make it cheaper. Lambda calculator
EC2 instead has 3 different options:

- On demand: take it whenever you want, and even then you get a price around 20% cheaper than Lambda.
- Reserved instances: you specify when you will need them and how many machines, and you get a price around 40% cheaper than Lambda.
- Spot instances: you choose the price and it runs whenever free resources are available. They can be taken from you at any moment, unlike the other two, but it's much cheaper, and if you don't finish the hour, you get that hour for free.

Another thing I want to highlight again: EC2 has more resources you can exploit with multithreading. How? While you are fetching the next file, process the current one. Just with this you can cut the price by 25%-50%, independently of the previous discounts.
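A minimal sketch of that overlap idea, assuming placeholder fetch_file and process functions for the download and the CPU work:

import threading

def pipeline(keys, fetch_file, process):
    """Overlap downloads with processing: fetch key i+1 while processing key i.
    fetch_file and process are placeholders for your own download/CPU functions."""
    if not keys:
        return
    current = fetch_file(keys[0])
    for key in keys[1:]:
        box = {}
        t = threading.Thread(target=lambda k=key: box.update(data=fetch_file(k)))
        t.start()
        process(current)   # CPU-bound work runs while the next download is in flight
        t.join()
        current = box['data']
    process(current)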
......................................... UPDATE 2 ............................................
Anyone who wants to test Lambda for free without registration can do it here.
Replying here to a few new questions:

- EC2 can also do auto scaling + load balancing.
- If you build a Docker image from the official Ubuntu repository, you will get the latest updates every time it builds. Also, you can keep your EC2 instances without a public IP, so nobody can get in = NO SECURITY ISSUES.
- With supervisord and Celery, a task can check for changes in your S3, etc., so it works like a trigger.
- You can make an SVN folder for your .py scripts and tell supervisord to run everything from there. Later synchronize it with your GitHub, SVN, etc., so neither you nor your team needs to log in to AWS.
- You can reserve swap memory to avoid crashes in case of an unexpectedly big task. In Lambda, if you make it too small it will crash, and if too big you waste money.
- You can run any library; it doesn't even need to be pure Python. As u/kakamaru pointed out in his reply, you can get around this limitation in Lambda with something like lambda-packages. I don't know how many limitations that has anyhow.
- EC2 instances can be limited by permissions, for example read-only for Database A, write for the S3 folder /result/. So even code errors or hacks will not cause problems.
- If you don't know how to do something, Google it or ask for help. Many people know Ubuntu, few know Lambda.
Hi guys, I'm extremely new to AWS. I watched a few of the Amazon AWS training modules and was introduced to Lambda and EC2. From my understanding, they are both compute services. I googled the difference between the two, and what I read was that Lambda is used for smaller and quicker executions, whereas EC2 is used for more high-performance executions. But why would you use Lambda in the first place if all you're doing is computing simple tasks? Wouldn't it be easier to compute it locally? I would appreciate it if someone could give examples of the kinds of situations where you would use each resource. Sorry if the question is a bit elementary; I was just introduced to the subject.
Hi, I am building an application as a personal project for which I plan to use AWS services.
Without going into too much detail, the application is mostly just a CRUD application with the additional need to run a function on the database on the 1st of every month.
I will be using a DynamoDB table for this because it is the cheapest option (a major requirement for me is low cost).
To build the application itself I have two choices:

- Use API Gateway and Lambda to create all the endpoints I need, which I will call from my frontend hosted as a static site on S3.
- Build a Flask or Django app that interacts with DynamoDB and deploy it on an EC2 instance. In that case I can serve my frontend as static pages from the same instance.
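To give a feel for the API Gateway + Lambda option, here is a minimal sketch of one endpoint (the table name and request format are placeholders, assuming the API Gateway proxy integration):

import json
import boto3

# Placeholder table name.
table = boto3.resource('dynamodb').Table('my-app-items')

def lambda_handler(event, context):
    # With the proxy integration, the request body arrives as a JSON string.
    item = json.loads(event['body'])
    table.put_item(Item=item)
    return {'statusCode': 200, 'body': json.dumps({'ok': True})}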
Which option would you guys recommend?
I am not going to have too many users using this app. It is only going to be me. So there shouldn't be concurrent requests being made to the server.
Any help or advice would be appreciated.
Hello everyone, I was studying for my AWS CSAA exam, and while I've only lightly used both services and played around with them in lab environments, I can't wrap my head around one question: when would I use EC2 over Lambda, given that Lambda is much cheaper and still provides scalability for a backend solution? For most simple web apps, a REST API served through Lambda seems like a great option (especially when utilizing AWS API Gateway). When would I directly use EC2 (as well as ELB, ASG, etc.)? I guess I'm trying to understand when a "serverless" web app makes sense. Thanks everyone for your time.
EDIT: added serverless clarification
While preparing for an Architect certificate exam, I got this example question from a website:
An application allows a manufacturing site to upload files. Each uploaded 3 GB file is processed to extract metadata, and this process takes a few seconds per file. The frequency at which the uploading happens is unpredictable. For instance, there may be no upload for hours, followed by several files being uploaded concurrently. Which architecture will address this workload in the most cost-efficient manner?
The answer ended up being: store the file in an S3 bucket and use an Amazon S3 event notification to invoke a Lambda function for file processing. However, this got me thinking: wouldn't using spot instances also be a good option, since the files would be stored in an S3 bucket and spot instances could then be spun up once or twice a day?
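For reference, the Lambda side of the chosen answer would look roughly like this (the metadata extraction itself is just a placeholder; head_object only reads object headers, so the 3 GB file is not downloaded here):

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # S3 event notifications deliver one or more records per invocation.
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        # Size, content type, etc. come back without fetching the object body;
        # real metadata extraction would stream and inspect the file instead.
        head = s3.head_object(Bucket=bucket, Key=key)
        print(key, head['ContentLength'], head.get('ContentType'))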