It's interesting how AWS can keep prices so high on these. But that's just the beginning; the real money comes once they convince you to run over a dozen VMs/containers (all needing storage etc., of course).
You need to be triply redundant across 3 availability zones (3x), for both the RDS db cluster and the app containers (2x). Then you have separate dev/staging/prod envs (3x). That's 3 x 2 x 3 = 18x.
You can then get a pat on the head ("pass AWS well-architected review"). Then they innovate "serverless" stuff, where you can pour developer man-months and years into making everything more complex, harder to monitor/debug, and dependent on new AWS-specific tech, all to get faux savings on these overpriced things. Here they're buying your long-term lock-in by getting AWS specialists and advocates on your long-term payroll. They're doing certifications and investing in AWS careers that can carry over to the next employer too.
And don't even get me started on how much work by this time has gone into building the infra-as-code to manage the Rube Goldberg machine; you'll (seriously) have more lines of CDK code than app logic, and the CDK was slower to develop and harder to debug, per line, than your actual app code.
Just about now, someone in your newly grown cloud-herding engineering org champions the idea: "I know what will help, let's start using Kubernetes". They've probably never costed or self-administered a server in production use, or if they have, they know to keep quiet.
As an "onPrem" & "Big Data" guy I've been astonished at how fast the bill can run up.
We are in the middle of a cloud adoption where the big proponent went from "its just money, devs time costs more.. do what you gotta do" to "hey so we need to consolidate these 3 DBs, and X is on vacation so lets spin this service down, and lets not worry about multi AZ/region yet" .. all before we even have a PROD launch, lol.
In any case, I was recently amazed at the cost of an RDS instance to host a meager 2B total records of vendor data. For the annual price I could buy one of the beefy servers I use to host 20% of my entire on-prem estate, with 100s of vendor data sets. For my on-prem plant, 2B records is table stakes; we have datasets where we receive that much data every day. Probably something like 10T records across all datasets, if I had to speculate.
Similarly on the ops/infra cost savings: for every on-prem infra guy they think they can RIF, they've hired a cloud infra guy who for sure costs more.
Likewise for the redundancy/backups/etc. being "easy/free" in the cloud: we've already lost data in the object store because of a poorly executed combination of configuration and actions while trying to do some permission changes / copy migration. It was completely unintentional and not noticed for weeks. No one actually executed a literal rm command at the time. Just because the object store advertises zero loss / versioning / etc., you really do still need to do backups.
Clearly there's a lot of ephemeral and burst compute / variable-cost stuff that totally belongs in the cloud. However, for internal apps at a firm where usage largely scales with staffing levels, the amount of fixed compute makes it hard to argue that every app and every use case belongs in the cloud.
That is the core argument. It can be true, but it's not always true. The larger your cloud footprint gets, the less true it often becomes. It's also not true if you are doing something that really slams one of the cloud's high cost areas like outbound bandwidth or tons of sustained compute.
If you are running a business or department it's your job to run these numbers properly and not listen to mindless cloud salespeople or developers brainwashed by them.
The cloud is great for prototyping. As you scale there usually comes a point at which the cloud cost line crosses the "hire one or two people" line. If you think you have to hire five or ten people, your architecture is probably too complex.
Of course the cloud industry really pushes complex architectures hard because they know this. Complexity works in their favor on both ends. It makes it harder to get off managed services by adding labor costs and it also means you have to run more stuff in their cloud.
It's interesting to think about the process that led to this: "its just money, devs time costs more.. do what you gotta do"
Are cloud vendor salesmen doing Jedi mind tricks? Or are these decisions just made by incompetent people? Who researches this kind of stuff? It's the sort of thing a history of management trends would cover.
There’s also a lot of politics around OpEx vs. CapEx.
In prior firms we’d have to go hat-in-hand for $3M of hardware every 3 years on an upgrade cycle. Of course few were around long enough to be on the requesting or approving side of this upgrade cycle and it would drag out painfully. Sometimes we’d try to get creative and go more just-in-time and come back for $500K every 6 months but the pain would just be more frequent.
On the other hand, $150k/mo slowly growing adds up to more in the long term, but no senior manager ever has to approve a single $500K-$3M purchase request.
Depends on where the budgets go - salary goes into something they have to answer for, while "cloud infrastructure" goes elsewhere in the budget, where they don't have to answer for it.
There's also the death by a thousand cuts - if you want a big server, you will have to do lots of discussion and arguing about the cost thereof, but if instead you're adding a small monthly cost, there isn't as much arguing.
You keep doing that and suddenly someone notices that half the budget is AWS, at which point the "move onsite" dance begins (until the next time the big server argument happens).
I think Elon Musk was technically demoted, or at any rate shuffled around, for wanting to use Microsoft for the servers; PayPal ended up using Unix instead.
If RDS is expensive, you can always spin up your own DB deployment: you can use an EC2 instance, or you can just run it via ECS or EKS.
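For what it's worth, a minimal sketch of the EC2 route in CDK (TypeScript, aws-cdk-lib v2). The instance size, volume size, and Amazon Linux 2023 package names are placeholder assumptions, and you still own backups, patching, and failover yourself:

    import { Stack, StackProps } from 'aws-cdk-lib';
    import { Construct } from 'constructs';
    import * as ec2 from 'aws-cdk-lib/aws-ec2';

    export class SelfManagedDbStack extends Stack {
      constructor(scope: Construct, id: string, props?: StackProps) {
        super(scope, id, props);

        const vpc = new ec2.Vpc(this, 'Vpc', { maxAzs: 2 });

        // One plain EC2 box running Postgres; size and disk are illustrative.
        const db = new ec2.Instance(this, 'Postgres', {
          vpc,
          instanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.LARGE),
          machineImage: ec2.MachineImage.latestAmazonLinux2023(),
          blockDevices: [{
            deviceName: '/dev/xvda',
            volume: ec2.BlockDeviceVolume.ebs(200), // GiB, adjust to taste
          }],
        });

        // Install and start Postgres on first boot (AL2023 package name assumed).
        db.addUserData(
          'dnf install -y postgresql15-server',
          'postgresql-setup --initdb',
          'systemctl enable --now postgresql',
        );
      }
    }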
You buy those beefy servers, and where do you put them? You need a data center, with data-center-grade internet lines, and you need staff to maintain all that.
> Similarly on the ops/infra cost savings: for every on-prem infra guy they think they can RIF, they've hired a cloud infra guy who for sure costs more.
I disagree: you will need fewer cloud infra people, who will likely cost more individually, but everything will end up being more reliable than having a legion manually maintain everything.
> Likewise for the redundancy/backups/etc. being "easy/free" in the cloud: we've already lost data in the object store because of a poorly executed combination of configuration and actions while trying to do some permission changes / copy migration. It was completely unintentional and not noticed for weeks. No one actually executed a literal rm command at the time. Just because the object store advertises zero loss / versioning / etc., you really do still need to do backups.
Did you ever lose data by properly putting something into S3 or Glacier and not doing anything with it?
> However, for internal apps at a firm where usage largely scales with staffing levels, the amount of fixed compute makes it hard to argue that every app and every use case belongs in the cloud.
It's actually easy to argue:
- the world is not going to stay stagnant; if I need 1 more server tomorrow, I can get it in minutes on the cloud, use it, and get rid of it whenever I no longer need it.
- the on-prem costs are downplayed, people usually just talk about the cost of the metal and forget the upkeep and extra staff required
- the on-prem reliability can be very questionable, hardware failures can cause disasters
- cloud stacks can be easier to maintain and hand over, you have less hardware and less operating systems to worry about
This is the playbook though. It's used by everybody.
Microsoft, for example, has all of their certified developer programs. Companies that use Microsoft development infrastructure like SQL Server end up getting huge discounts based on how many developers they have on staff with MS certs. This biases companies toward hiring people with those certs, which reinforces the perceived value that developers get from earning the certs in the first place. Then, when you have a ton of people on staff with MS certs, you're NEVER moving away from MS technology, because at that point the company is so heavily invested in it.
It's very much an evil genius situation.
A few years back I worked for one of these companies and even though I wasn't doing much with the Microsoft stack in the company (they acquired a Rails company) I was still asked to go get an MS certification just to help with the discounts. I'm now a Microsoft certified HTML5/CSS3 developer, which after going through the IE6 years felt like the most ironic cert I could get.
The lock in mentality is very real though. As part of an architecture committee at that company, we went through many months of analysis for some of the issues that were being faced at the company. The root cause was simply limitations of SQL Server combined with horizontal scale out limitations due to licensing costs. There was absolutely nothing that could be done though, because the company wouldn't move away from SQL Server for any reason.
> And don't even get me started on how much work by this time has gone into building the infra-as-code to manage the Rube Goldberg machine; you'll (seriously) have more lines of CDK code than app logic, and the CDK was slower to develop and harder to debug, per line, than your actual app code.
Of all the AWS complaints, CDK is one of yours? I absolutely love CDK and its take on infra-as-code where you construct OO constructs imperatively in a sane language (please, please no more YAML-based DSLs...). I've found that CDK is the only one to have given us the code reusability that all of these solutions always promise while still being overtly hackable.
Debugging CDK code? What kind of wacky stuff are you trying to do? I don't think I've ever had to debug CDK outside of typical "what IAM action am I missing?" or "how do I click some checkbox in CDK?".
I'm curious what you would consider to be better alternatives to CDK.
CDK code also isn't executed at runtime. Even if it somehow is slower to develop and harder to debug (maybe if you're new to it), it's ideal that you have more of it than actual app code per line. That implies you have substantially less app code than you otherwise would, which (of course) is the entire purpose of these cloud-based abstractions.
The amount of app code involved to stitch Kinesis, Lambda, and S3 together is minimal. The amount of app code to solve the same problems without them is orders of magnitude more plentiful and complex, which is why we avoid it.
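As a rough illustration of how little glue that stitching takes, here's a hedged CDK sketch (TypeScript, aws-cdk-lib v2); the runtime, handler path, and batch size are made-up assumptions, not anyone's production setup:

    import { Stack, StackProps } from 'aws-cdk-lib';
    import { Construct } from 'constructs';
    import * as kinesis from 'aws-cdk-lib/aws-kinesis';
    import * as lambda from 'aws-cdk-lib/aws-lambda';
    import * as s3 from 'aws-cdk-lib/aws-s3';
    import { KinesisEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

    export class PipelineStack extends Stack {
      constructor(scope: Construct, id: string, props?: StackProps) {
        super(scope, id, props);

        const stream = new kinesis.Stream(this, 'Events');
        const bucket = new s3.Bucket(this, 'Archive');

        // Handler code assumed to live in ./lambda; runtime is illustrative.
        const fn = new lambda.Function(this, 'Processor', {
          runtime: lambda.Runtime.NODEJS_18_X,
          handler: 'index.handler',
          code: lambda.Code.fromAsset('lambda'),
          environment: { BUCKET_NAME: bucket.bucketName },
        });

        // Wire the stream to the function and let it write to the bucket;
        // the IAM policies for both are generated for you.
        fn.addEventSource(new KinesisEventSource(stream, {
          startingPosition: lambda.StartingPosition.TRIM_HORIZON,
          batchSize: 100,
        }));
        bucket.grantWrite(fn);
      }
    }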
> The amount of app code to solve the same problems without them is orders of magnitude more plentiful and complex, which is why we avoid it.
The amount of application code to write to a filesystem is... zero.
The amount of application code required to setup a webserver in Java Spring is... zero.
Not sure what Kinesis does, but the amount of application code required to connect to a database is zero (spring), to connect to a rabbitmq/kafka is zero (spring).
And the best part is, it is less complex, mostly open source, and locally runnable compared to the Amazon Rube Goldberg machine.
There will be substantial amounts of code you are writing to provision, set up, maintain, scale, and troubleshoot all of those things. Whether you call it app code or not, it's something that you (and your team) will be on the hook for building and maintaining over time.
Getting your Java Spring web server to scale to thousands of concurrent instances will require a lot of undifferentiated work outside the scope of Java Spring. Same with the care and feeding of RabbitMQ and Kafka. And to suggest that's the approach that's less of a Rube Goldberg machine is... questionable.
What sort of application are you building where you need a thousand instances of a spring application?
Like even hosted on low-powered hardware, Java services can easily deal with on the order of 100 requests/second. This is home turf; this is what Java is really good at. That 100 rps may be lowballing it. So you're what, processing 100k+ API requests per second? I can't imagine that would translate to any less than a billion users.
At that point you are either making good headway toward becoming a new letter in the FAANG acronym and ought to have the budget to run your own data center, or your attention is better put toward optimizing your comically inefficient application.
What sort of app code are you writing where you can keep data, authentication and user data all on the desktop?
Just because you can write a docker-compose.yml (or do without) that describes a system that holds together, doesn't mean the system you've produced stands the chance at SOC2 or whatever standard you hold dear.
Having just finished a 27001 audit, I promise, "we rely on AWS for X" went a lot farther than "we have a homegrown system that does X", every single time. AWS - box checked, homegrown? Let's dig in for two more hours.
Thousands of instances is easy to need - if you have an active CI/CD lifecycle - you just might not need them all at once. Even just one per PR and it adds up super fast if you've got a competent developer team.
Or this Java-based Spring application is one of dozens or hundreds of ones built by different teams, for different customers, or for different purposes. Some of them may even be applications not built in Java or Spring.
> There will be substantial amounts of code you are writing to provision, set up, maintain, scale, and troubleshoot all of those things.
And the amount of infra code you have to write to do that in AWS is any less? No, it's just as much, except more arcane.
> Getting your Java Spring web server to scale to thousands of concurrent instances will require a lot of undifferentiated work outside the scope of Java Spring.
Scaling a RESTful webserver is an exercise in slapping a load balancer in front of it and running a few more instances. Not exactly rocket science.
> Same with the care and feeding of RabbitMQ and Kafka
Same on AWS, except now you are at the mercy of someone else's whims to get this working when it fails.
> And the amount of infra code you have to write to do that in AWS is any less? No, it's just as much, except more arcane.
It's substantially less, and declarative. It also includes stuff commonly pushed to phase 2 in on-prem deployments, like TLS between entities, proper IAM, and built-in metrics/logging.
> Scaling a RESTful webserver is an exercise in slapping a load balancer in front of it and running a few more instances. Not exactly rocket science.
If it's stateless, yes, though I was just responding to an example. Not all technical problems are so simple.
> Same on AWS, except now you are at the mercy of someone else's whims to get this working when it fails.
The business is always at the mercy of someone "else," whether it's you, the person who replaced you, or AWS. And AWS is way better at keeping S3 running at scale and maintaining this stuff over time than your average IT / DevOps employee. Same with Lambda, Kinesis, DynamoDB, SQS, etc. Unless you've got special requirements, you're wasting time and resources reinventing the wheel.
There are declarative ways to manage bare-metal machines, such as Terraform, Nix and Puppet.
> And AWS is way better at keeping S3 running at scale and maintaining this stuff over time than your average IT / DevOps employee
Yeah no, the outages we faced _because_ of AWS were way, way worse than what we faced with our average IT/DevOps employees at my old jobs. This keeps getting repeated and it's going to keep being wrong.
I would also be interested in what the alternative outside the cloud would be. AWS deployments can get complicated due to the IAM rules, networking rules, and security groups. But if I told someone, "bring up a stack like that on bare metal, and make it repeatable", what would that look like?
For those of us that have worked on systems pre-cloud it does feel sometimes that it was easier in the past. But if you went to explain to someone new what to do, you realize how much you are taking for granted the knowledge and effort involved. Also the long term maintenance.
I love CDK but it does have issues. For instance, if you wanted to launch your load balancer in a new AZ, you would need to structure the code to carefully create the new infrastructure, pivot your DNS to point to the new load balancer, and only then delete the old infra. It’s more complicated than just adding more subnets, unfortunately.
> You need to be triply redundant across 3 availability zones (3x), for both the RDS db cluster and the app containers (2x). Then you have separate dev/staging/prod envs (3x). That's 3 x 2 x 3 = 18x.
Why would you care about the redundancy on staging / dev?
If you're not testing AZ failovers you're probably just wasting money, like untested backups... But it's true that most people don't know this and aren't testing it, because they don't know that AZ failovers don't automatically just work(tm). And the second downside, of course, would be asymmetry between the envs and divergence in the IaC etc., resulting in more complexity and engineering work.
(Of course you're probably wasting money anyway, since business-wise you don't actually need better uptime than a single AZ provides, and your complexity-induced human fumbles will cause far more outages anyway; but this was a main selling point of the decision to go to AWS, so the requirement needs to be defended.)
Yes, you can build automation to keep the redundant stuff up only some of the time, if you eat the engineering effort and complexity in your IaC and build automation... in the general vein of justifying engineering spend to offset AWS operating costs, where running containers is very expensive!
TLDR: either way you end up paying for the very high markup in compute prices, it'll just be easier to excuse jumping through expensive hoops to "save money" on it.
If you’re building your own infrastructure in a data center then sure you absolutely want to test your redundancy.
But with AWS it’s a checkbox. It’s transparent to you and your applications. The infrastructure to host in multiple AZs is already in place. The only real issue with MultiAZ is that the RDS failover, depending on the database you use, could take seconds or tens of seconds.
IME not so in the real world: you'll have accidental state in your distributed system outside the DBs. You'll have some stuff that, in normal circumstances, actually always runs in one AZ, and your integration partner in another org has whitelisted only its IP. Etc. etc.; everything that is untested will find ways to conspire to rot. Especially if you haven't seen these bugs before, so you'd know to avoid them.
Also you won't have a clear experience and understanding of what happens in the failover, and you won't know how to avoid failover-breaking mistakes in your VPC configs, security groups, frontend-backend shared state, etc. (And by "you" I mean "your dev team"; it's not enough that one guy gets it.)
Also^2, if you read the news about all the outages, it's very common for failover systems to fail in general, not just on AWS. The general engineering wisdom is: always test your failovers. And there's no substitute for testing it end-to-end, instead of individually testing each layer/module. (Bad: "we can skip testing db failover"; good: "let's test that the whole system works when there's an AZ failure".)
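To make "test it end-to-end" concrete, here's a hedged sketch of the simplest such drill for the RDS layer: force a Multi-AZ failover (RebootDBInstance with ForceFailover is a real RDS API) and time how long your app, not just the DB, takes to recover. The instance identifier and health endpoint are hypothetical, and this exercises only DB failover, not a whole-AZ outage:

    import { RDSClient, RebootDBInstanceCommand } from '@aws-sdk/client-rds';

    const rds = new RDSClient({});
    const healthUrl = 'https://app.example.com/health'; // hypothetical endpoint

    async function main() {
      // Kick the primary over to the standby (Multi-AZ instances only).
      await rds.send(new RebootDBInstanceCommand({
        DBInstanceIdentifier: 'my-db', // hypothetical identifier
        ForceFailover: true,
      }));

      // Measure recovery at the application edge, not the DB endpoint.
      const start = Date.now();
      for (;;) {
        try {
          const res = await fetch(healthUrl);
          if (res.ok) break;
        } catch {
          // still failing over
        }
        await new Promise((r) => setTimeout(r, 1000));
      }
      console.log(`recovered after ${(Date.now() - start) / 1000}s`);
    }

    main();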
Dealing with this now for a client. Can't test Redshift AZ relocation feature because there's no way to simulate AZ failure. Only safe bet is full multi region with DNS switcheroo.
Back in colo days, I saw a lot of post-mortems that read “the thing we thought was redundant wasn’t”, leading me to call them “redundant like an appendix [rather than like a kidney]”.
We instituted quarterly “game day testing” where we forcibly turned off one of the redundant items in all of our systems. It took us about 6 such cycles before these tests didn’t turn up outages that were just waiting for us.
Thinking back on those, it’s hard for me to believe that most cloud hosted companies are prepared by checking a box without actually testing.
> Thinking back on those, it’s hard for me to believe that most cloud hosted companies are prepared by checking a box without actually testing.
We are talking about MultiAZ, Availability Zones. Not different regions. Setting up redundancy across regions is not easy. But for majority of the people using AWS a single region with MultiAZ is good enough.
About 5 or 6 years ago we had an alert in the middle of the night that our RDS instance had died. It failed over in about 15 seconds (SQL Server, so it’s a bit slow compared to PostgreSQL), but the MultiAZ worked as advertised. The downside is AWS never told us why it occurred.
I’ve seen a few AWS instance hardware failures; they happen with some regularity.
You can handle single instance failure without being multi AZ.
Testing an actual AZ failure, as in the whole AZ going offline or getting partitioned from the other AZs, is pretty much impossible.
Those are basic (they don't cover flapping or glacial-speed slowdown degradation modes, they apply to some services only, etc.) but at least a starting point that can be extended.
>But with AWS it’s a checkbox. It’s transparent to you and your applications. The infrastructure to host in multiple AZs is already in place. The only real issue with MultiAZ is that the RDS failover, depending on the database you use, could take seconds or tens of seconds.
Have you actually seen this work on your project in practice? Like, a region goes down, another region picks up automatically, and it keeps working just because you switched a checkbox?
Multi AZ is multiple availability zones. Not multi region. Distribution over multiple regions is obviously harder than within the same region and different zones.
And I'd test the checkbox all the same. We learned just this week that one of our setups, which checks the cloud-provider provided box to have its VMs distributed across 3 AZs, is susceptible to the loss of a single AZ. Why? Because the resulting VMs … aren't actually distributed across 3 AZs as requested. (The provider has "reasons" for this, but they're dumb, IMO. It should have been as easy as checking the box.)
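Testing that checkbox can be as cheap as counting where your instances actually landed. A sketch using the AWS SDK for JavaScript v3, assuming AWS/EC2 for concreteness (the provider above wasn't named):

    import { EC2Client, paginateDescribeInstances } from '@aws-sdk/client-ec2';

    const ec2 = new EC2Client({});

    // Count running instances per AZ; a skewed count means the "spread across
    // 3 AZs" checkbox didn't do what you hoped.
    async function main() {
      const counts: Record<string, number> = {};
      for await (const page of paginateDescribeInstances({ client: ec2 }, {
        Filters: [{ Name: 'instance-state-name', Values: ['running'] }],
      })) {
        for (const reservation of page.Reservations ?? []) {
          for (const instance of reservation.Instances ?? []) {
            const az = instance.Placement?.AvailabilityZone ?? 'unknown';
            counts[az] = (counts[az] ?? 0) + 1;
          }
        }
      }
      console.table(counts);
    }

    main();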
They are, from the user's POV, just separate networks where you can deploy separate copies of services, replicated DB cluster nodes and whatnot. Each service handles (knock on wood) an AZ becoming unreachable/slow/crazy independently. Which can become fun with service interdependencies and a mix of your self-implemented services + AWS-provided ones.
Even if you don't want it, sometimes you're forced to run multiple AZs (e.g. EKS requires 2x). But that 18x figure is nuts. VPCs, AZs, subnets, IAM etc. are free; the cost comes from what you deploy into them. So separate environments don't have to be as expensive as production: you can scale them to zero, use smaller compute instances, run self-managed versions of expensive stuff (like DBs), or simply run small single instances of RDS instead of large redundant clusters. Non-prod environments are a great place to experiment with aggressive scale-down on cost while observing performance.
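A sketch of that pattern in CDK (TypeScript, aws-cdk-lib v2): one template for every environment, with a hypothetical isProd flag deciding who pays for Multi-AZ and big instances:

    import { Stack, StackProps } from 'aws-cdk-lib';
    import { Construct } from 'constructs';
    import * as ec2 from 'aws-cdk-lib/aws-ec2';
    import * as rds from 'aws-cdk-lib/aws-rds';

    interface EnvProps extends StackProps {
      isProd: boolean; // hypothetical flag, e.g. derived from CDK context
    }

    export class DbStack extends Stack {
      constructor(scope: Construct, id: string, props: EnvProps) {
        super(scope, id, props);

        const vpc = new ec2.Vpc(this, 'Vpc', { maxAzs: props.isProd ? 3 : 2 });

        // Same template everywhere; only prod pays for Multi-AZ and big boxes.
        new rds.DatabaseInstance(this, 'Db', {
          engine: rds.DatabaseInstanceEngine.postgres({
            version: rds.PostgresEngineVersion.VER_15,
          }),
          vpc,
          multiAz: props.isProd,
          instanceType: props.isProd
            ? ec2.InstanceType.of(ec2.InstanceClass.M5, ec2.InstanceSize.XLARGE)
            : ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.MICRO),
          deletionProtection: props.isProd,
        });
      }
    }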
Had to use a NATGW temporarily on AWS Lambdas to access the database and make remote HTTP calls. But now it all works without the NATGW. Haven't had any other need for one.
> Why would you care about the redundancy on staging / dev?
If you deploy to multiple regions then it would make no sense at all to have a single preprod stage, especially to run integration tests.
Also, keep in mind that professional apps support localization, and localization has a significant affinity with regional deployments. I mean, if you support Japanese and have a prod stage in Japan, would you think it was a good idea to run localization tests in a US deployment?
Currently have a deployment in Oregon and cloudfront servicing most of the world. With localisation. No need to deploy in Japan to support Japan. The only tricky one is China because of the firewall. Though that hasn’t been an issue for the last few years as they haven’t been blocking cloudfront completely. (This has been running for 10 years in AWS without issue)
> Currently have a deployment in Oregon and cloudfront servicing most of the world. With localisation.
So you're serving static assets through CloudFront with a single backing service. Congrats, you managed to have a service that doesn't offer regional services.
Also, you definitely don't support clients in China, or enjoy shooting yourself in the foot.
Most professional applications with a global deployment and paying customers don't have the benefit of uploading HTML and calling it done.
So 10+ years of supporting broadcasters and creative agencies around the world from 1 region is not supporting those countries. Got it. I mean, there are services in those regions for the tasks performed, but even then it doesn’t require the level of testing you’re assuming it does.
If you're deploying to multiple regions in AWS, then presumably you'd have to roll out the infrastructure for 3 regions yourself if you weren't using AWS? In which case I've got to assume AWS is probably a lot more straightforward right off the bat than rolling your own solution in three different data centers?
The original machine has a base-plate of prefabulated amulite, surmounted by a malleable logarithmic casing in such a way that the two spurving bearings were in a direct line with the pentametric fan. The latter consisted simply of six hydrocoptic marzelvanes, so fitted to the ambifacient lunar waneshaft that side fumbling was effectively prevented. The main winding was of the normal lotus-o-delta type placed in panendermic semiboloid slots in the stator, every seventh conductor being connected by a non-reversible tremie pipe to the differential girdlespring on the "up" end of the grammeters.
I think this is the line that triggered the parent:
"Cloudfront can distribute static content and backends running on Lambda@Edge."
The really scary thing is that the AWS products being lampooned are the mainstream ones. Amazon has, what, another couple hundred "offerings"?
Here on HN, there are people that say if you're using AWS only for the basic EC2 you're doing it wrong. They can be correct in the tactical sense, but in the strategic sense, AWS is cackling with glee.
If I were building an IT org, I'd have an infra-as-code foundation that supported, from the ground up, both AWS and other options that are far cheaper. AWS for prototyping and DR/redundancy, but it "seamlessly" (I can hear the eyeballs roll, and you're all not wrong) will deploy to non-AWS.
It's crazy that the basic tooling for EC2 / Google Compute / Azure / etc. to do this is only at the quarter-state. Hey HN, isn't there a massive opportunity for a company to meta-manage this?
This seems overwhelmingly pessimistic. VMs, serverless, and Kubernetes all panned without a suggestion of an alternative. The theme is knocking on AWS, so is your suggestion to go back to colos and self-hosting? Does that really sound better than infra-as-code?
You can go from an empty cloud account to a fully-provisioned application stack in minutes with the right infra-as-code. Provisioning infrastructure at that level is mutually exclusive with self-hosting.
But true, even with self-hosting, you can use infra-as-code at various levels (Ansible, OpenStack, self-hosted Kubernetes). It's just not really comparable to having your entire stack defined and provisioned that way.
Who cares when you can accidentally bankrupt your company with the wrong keystroke?
Time to spin up is nice for prototypes, it can be very freeing.
The mistake that people make is assuming AWS can host things better than they can, since they’re only mere mortals: it leads to a sort of learned helplessness where you assume the cost of a thing is its true value and not an egregious markup.
My biggest gripe with these cloud providers (except maybe GCP) is that they promise less ops work and it can be true in the beginning; but over time the ops work becomes basically the same burden except esoteric to the provider.
It’s IBM mainframes again, with specialists on IBM pushing more IBM because it’s their bread.
But overall, if you know your problem then time to create an instance isn’t very valuable, believe it or not: the majority of workloads are not excessively elastic, at least the upper bound is not unlimited like many people seem to claim.
Apart from that !cloud != self-hosted. There’s plenty of hardware providers that can get you a dozen machines in under an hour; even discounting virtual host providers like vultr and Tulsa.
The "bankrupt your company with the wrong keystroke" is not entirely accurate. AWS does work with companies (or even individuals) if they genuinely made an error that wracked up a huge bill. Personally they have dropped bills of $1000's when I made a bone headed mistake and have seen companies get $100 000's of bills credited due to the same issue. They are not in the business of ripping people off in the short time who would spend a lot more than that in the long term.
> Who cares when you can accidentally bankrupt your company with the wrong keystroke?
I think it's fair to say millions of AWS customers manage to avoid bankruptcy quite successfully. If you're incapable of following basic best practices, setting billing alarms, and using IaC, I agree AWS is probably not the best choice for you.
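For example, a basic billing alarm is a few lines of CDK (TypeScript, aws-cdk-lib v2). The threshold is an arbitrary assumption; note that billing metrics only exist in us-east-1 and require billing alerts to be enabled on the account:

    import { Duration, Stack, StackProps } from 'aws-cdk-lib';
    import { Construct } from 'constructs';
    import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';

    export class BillingAlarmStack extends Stack {
      constructor(scope: Construct, id: string, props?: StackProps) {
        super(scope, id, props);

        // Billing metrics live in us-east-1 and must be enabled in account settings.
        const charges = new cloudwatch.Metric({
          namespace: 'AWS/Billing',
          metricName: 'EstimatedCharges',
          dimensionsMap: { Currency: 'USD' },
          statistic: 'Maximum',
          period: Duration.hours(6),
        });

        new cloudwatch.Alarm(this, 'MonthlyBillAlarm', {
          metric: charges,
          threshold: 500, // USD; pick whatever "someone should look" means for you
          evaluationPeriods: 1,
          comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
        });
      }
    }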
> The mistake that people make is assuming AWS can host things better than they can, since they’re only mere mortals: it leads to a sort of learned helplessness where you assume the cost of a thing is its true value and not an egregious markup.
Of course they can: they do it all day, every day, at Internet scale. I don't want to build S3 everywhere I go. I just want to use it. You could call it "learned helplessness" in the same way that I don't want to build my own RDBMS, either. I'd rather just use Postgres. This is just the next level of abstraction.
> My biggest gripe with these cloud providers (except maybe GCP) is that they promise less ops work and it can be true in the beginning; but over time the ops work becomes basically the same burden except esoteric to the provider.
It really depends on your team and architecture. If you try to be as cloud-agnostic as possible and abstract away AWS from your devs, then absolutely: you're going to be churning through a lot of ops work that feels like it could be done elsewhere. The more cloud-agnostic you try to be, the less value you'll get from AWS. I've seen numerous companies learning this with EKS (hosted Kubernetes).
That's not the only approach, though. The managed services (like S3, Lambda, DynamoDB, and Kinesis) really add a whole lot of value with substantially less code if your devs are willing to use them. They can even cost less, and especially so if you factor in time spent toward development and building/maintaining alternatives.
> But overall, if you know your problem then time to create an instance isn’t very valuable, believe it or not: the majority of workloads are not excessively elastic, at least the upper bound is not unlimited like many people seem to claim.
A lot of time you don't know your problem, and the minimum instance size for a dev environment in a cloud-agnostic architecture can be significant. Scale to zero is a big help not just in deployment time, but also developer productivity and autonomy.
> Apart from that !cloud != self-hosted. There’s plenty of hardware providers that can get you a dozen machines in under an hour; even discounting virtual host providers like vultr and Tulsa.
I acknowledge they exist, but I do think they're a fraction of the market for good reason. The value adds of cloud providers often outweigh the costs and complexity of the undifferentiated heavy lifting needed for necessary feature parity.
I think one of the major contributors that is often not mentioned with cloud providers is the actual setup/human-maintenance savings. As is, the premium added to the 'managed' services is usually much less than what a DevOps engineer would cost to maintain/run the equivalent. I.e., it is safe to assume (at least in the market that I'm in) that 1 hour of DevOps services will cost 150 USD; 10 hours would be 1.5K USD. That is actually enough to host a PHP-based, medium-sized ecommerce solution on purely managed services for about a month (<500 orders/day, using AWS ECS, RDS, ALB, OpenSearch, ElastiCache). Sure, there is still a DevOps cost attached to such solutions, but it is drastically lower than on-premises. I've been using these arguments with my customers to migrate them to AWS for the past few years (since Fargate's wide adoption) and it's been great so far. I do not think that saving money should be the primary reason to migrate to the major cloud providers. The primary reasons should be the increased robustness, ease of maintenance, and the additional tools that you get.
PS. This is written entirely from the perspective of running multiple small/medium ecommerce solutions (500 - 2000 usd/mo AWS bill) using the stack mentioned above. I have no real world experience doing grand scale setups.
Perhaps I am confused or we are talking about different things and consequently talking past each other, but I can literally open a new AWS/GCP/Hetzner/DO account, plug the credentials into my local configuration (in code), and then run a command with NixOps to provision an entire network of machines with custom specifications, and to automatically install all the software I need on those machines.
Perhaps you aren't familiar with what NixOps and similar tools can give you?
It sounds like we're using different definitions of the word "self-hosted." If you have your own on-premises lab or colo rental, you're not using AWS/GCP/Hetzner/DO, and you've got a lot of undifferentiated heavy lifting before your NixOps kick in (including maintenance going forward).
If your point is that you can avoid a serverless architecture, still use cloud, and still use infra as code: of course you can. We've got to be disagreeing on what "self-hosted" means. OP criticized the cost and complexity of deploying EC2 instances and RDS databases across AZs, so presumably infra-as-code wouldn't help him here. OP didn't present an alternative solution, but reading between the lines is to not use cloud infrastructure (e.g. on-prem or colo).
That said, if you just rent dedicated servers from them, you don’t have to worry about maintenance, but don’t have to pay the ridiculous cloud markups either.
I didn't knock on VMs or infra as code generally, or suggest you go all the way to self hosting. Doing IaC of course is orthogonal to AWS. Your best alternatives depend on your needs. It might be some higher level app platform too, like various current Heroku style platforms etc. Just know what you're getting into.
You've got it backwards - which is to some extent why you have this problem:
If you're fully utilizing your instances, you're working from the bottom of the pricing up, and all these things (3x redundancy, etc.) are marginal costs of doing business because you're successful. What you're opting out of is the data-center workers, the sysadmins, the operators, and the expertise that comes with them - if that offered you value.
Your hundreds of lines of CDK stand in for people you're not employing to do the equivalent work (it's your job, I guess?), applying the AWS systems and services to do that work for you. Given you've chosen CDK, I have to assume you've bought into AWS anyhow, and someone in the chain sees the value extracted or the savings approach.
Where AWS burns you is when you don't know what you're doing - you have a PHP app, so you put in on an instance, you need it reliable so you have two, you need shared storage so you put it on EFS, and so on and so forth.
If you're at the other end of the spectrum, and you want to buy the level of reliability and redundancy you get from three of the smallest instances in AWS spread in three AZ's, you're looking at a six figure investment. So yes, it's pricy if you use it like you use the desktop under you (8 hours a day, usually at 20% load), but you can also run millions of hits for pennies if you learn that most of what you're doing is overhead that gets built into, well, serverless.
The idea of deploying an app to a machine sitting next to me, and serving the few requests per second just from that, is highly appealing. The only thing I'd always want is offsite backups which are best done to the cloud. You've inspired me!
Indeed, this is often the next step. The new architect must have lots of AWS certifications. With the new architecture, the long-term cost savings will be exponential compared to the current projected cost curve!
This can optionally combine with the Kubernetes scenario.
This was a great read, thank you. I’m just now removing my application-focused blinders, but I already see the same technical, organizational, and financial issues.
> This can optionally combine with the Kubernetes scenario.
Some people like to poke fun at the expense of Kubernetes, but anyone who has done professional work would, at the first failed deployment, sell their firstborn to have something similar to Kubernetes' deployment rollback feature.
We don't use kubes. Deployments sometimes fail and we deploy the old versions easily. No firstborns sold.
I have used Kubernetes blue/green deployments at an old job and it was beautiful. But to say Kubernetes is that much better than rsyncing your executables to the server and restarting the service is plain wrong. It's a bit easier and a bit more declarative, sure, but manual rollbacks were a thing fifty years ago and still are.
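For the record, that fifty-year-old pattern is about a dozen lines. A hedged sketch in TypeScript (host, paths, and service name are all hypothetical): ship a timestamped release, flip a symlink, restart; rollback is the same flip aimed at the previous release:

    import { execSync } from 'node:child_process';

    const host = 'deploy@app01.example.com'; // hypothetical host
    const release = new Date().toISOString().replace(/[:.]/g, '-');

    // Ship the build into a timestamped release dir, then atomically flip a symlink.
    execSync(`ssh ${host} mkdir -p /srv/app/releases/${release}`);
    execSync(`rsync -az ./build/ ${host}:/srv/app/releases/${release}/`);
    execSync(`ssh ${host} "ln -sfn /srv/app/releases/${release} /srv/app/current && systemctl restart app"`);

    // Rollback: point the symlink at the previous release and restart:
    //   ssh <host> "ln -sfn /srv/app/releases/<prev> /srv/app/current && systemctl restart app"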
> You need to be triply redundant across 3 availability zones (3x), for both the RDS db cluster and the app containers (2x). Then you have separate dev/staging/prod envs (3x). That's 3 x 2 x 3 = 18x.
I don't believe this has anything to do with the cloud; if you want the same thing in your own data center, it's going to be 18x there as well.
> You can then get a pat on the head ("pass AWS well-architected review").
Or you can be pragmatic, the cloud will allow you to do whatever you want, nobody is stopping you.
> Then they innovate "serverless" stuff
Serverless does not replace everything, there are some things it's good for, but not everything, if you use it for something that it's not useful for then that's an issue with your judgement, not with the fact that it exists as an option.
> And don't even get me started on how much work by this time has gone into building the infra-as-code to manage the Rube Goldberg machine; you'll (seriously) have more lines of CDK code than app logic, and the CDK was slower to develop and harder to debug, per line, than your actual app code.
I have thousands of lines of YAML and MD that I can give to anyone skilled enough to copy and paste, and that person can 100% replicate the environment without me helping them once.
If a component fails, I can rebuild it very quickly and 100% accurately by just re-importing the CloudFormation code; if I need an environment that is 100% the same as the one I already have, I just re-run a few scripts and I have it.
What credible alternative can you propose that is better than infrastructure as code?
> They've probably never costed or self-administered a server in production use, or if they have, they know to keep quiet.
I could make a similarly snarky remark about sysadmins not understanding that the world is not static; instead, I would focus on the fact that acquisition of hardware is nowhere near the total cost of the infrastructure.
According to a 3-second Google search, sysadmins in the US can cost around 75k-140k. Sysadmins can also slow down development by having to perform more manual actions and by being overprotective; that adds additional costs, because you will still have to pay your idle developers while they wait for the sysadmin to action the ticket where they requested write access to their own home folder.
You could spend less than 18x the AWS cost, because AWS compute is so expensive, depending on what hosting you choose. A self-owned DC isn't the best option for most. You might also skip the (for most cases) overkill AZ redundancy, because it wasn't pushed on you. You might even go with a much more managed platform, like the Heroku-style ones. Depends on what you build and for whom.
I'm not knocking IaC, but the unnecessarily huge amounts of IaC needed to manage unnecessarily complex AWS service infra, once you can't afford the monolith + db model and get roped into DynamoDB, Lambda@Edge, API Gateway, Step Functions, etc., all requiring IAM roles, security groups, observability tooling, CI/CD and version-control complexity, config services, etc. etc.: all the downsides you read about in microservice horror stories. And you can't even ssh in and strace or tcpdump the stuff like you could with your own microservices; they're black boxes.
I also don't want separate sys admins, at least the bad kind you describe. But having your own servers, or knowing what they cost, doesn't mean you would.
You can, but I'm sure you are making a trade-off to achieve that. Sure, that trade-off might be worth it for your use case, but I seriously doubt you could get exactly the same thing you get from AWS for substantially less; you need to factor in the reliability and flexibility aspects. If an EC2 instance fails, I can just get a new one; if your own server fails, hopefully you have another one on hand.
There are other things that make the AWS offering compelling: you get everything in one place and you can expect to find IT people who know how to operate AWS, whereas that is probably not true for smaller competitors who might be cheap but also have less to offer in terms of services and mindshare.
> AWS compute is so expensive
I do auto scaling, which means that if nobody is using the application, I'm paying for a single small instance, and I use ECS tasks for offline computation, which means I end up paying exactly for what I use.
If I were hosting my own on-prem solution, I would still end up building something similar.
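Something similar, in the hedged-sketch sense, assuming an existing ECS Fargate service; autoScaleTaskCount and scaleOnCpuUtilization are the actual CDK calls, while the capacities and target are placeholders:

    import * as ecs from 'aws-cdk-lib/aws-ecs';

    // Assume an existing Fargate service; `declare` keeps the sketch self-contained.
    declare const service: ecs.FargateService;

    // Idle down to a single small task, out to 10 under CPU load.
    const scaling = service.autoScaleTaskCount({ minCapacity: 1, maxCapacity: 10 });
    scaling.scaleOnCpuUtilization('CpuScaling', {
      targetUtilizationPercent: 60,
    });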
> AZ redundancy because it wasn't pushed on you.
If you work for a client that wants AZ redundancy, likely they are a client that is more than happy to pay the premium. For some clients, reliability matters more than infrastructure cost. For those clients infrastructure cost might even look like a rounding error compared to all the other costs.
I generally have no problem implementing a requirement that I don't believe is technically needed, if the business wants it, if they believe 3x the cost is worth it for the additional resiliency.
> you can't afford the monolith + db model and get roped into DynamoDB, Lambda@Edge, API Gateway, Step Functions, etc., all requiring IAM roles, security groups, observability tooling, CI/CD and version-control complexity, config services, etc. etc.: all the downsides you read about in microservice horror stories.
1 instance x 2.32 USD hourly x 730 hours in a month = 1,693.60 USD (Aurora PostgreSQL-Compatible DB)
A single developer will cost you at minimum 10k per month.
From a business perspective, I'm more than happy to pay thousands more for RDS and potentially overpowered EC2 instances than wasting far more expensive developer time on going nuts with serverless. For small things where it makes sense, sure I'm happy to go with serverless and save a few pennies, but not at massive developer time costs.
> And you can't even ssh in and strace or tcpdump the stuff like you could with your own microservices
Then you should just stick to EC2 instances if that's your cup of tea, though the idea of debugging individual instances becomes distant when you start auto scaling or start using containers.