Domain-Oriented Microservice Architecture

sagichmal · on July 25, 2020

> In other words, organizations adopt microservices for an operational benefit at the expense of performance.

...no, not at all. There are a couple of operational benefits, but vastly more drawbacks, and on balance microservices are phenomenally harder to operate than monoliths.

Organizations adopt microservices when the logistical overhead of coordinating teams against a monolith becomes so large that it starts affecting product velocity. Microservices are a way to release that organizational/logistical friction, at great technical cost.

With that said, the domain-oriented bounded context is indeed the right way to think about service delineation.

scarface74 · on July 25, 2020

The last company I worked for was very much an “API first” company. We were a B2B company and our APIs were used by our relatively lightly used website. But they were also used for sporadic and large ETL jobs when files from customers came in and most importantly we sold access to our APIs that were used for our customer’s mobile apps and heavily trafficked website.

We went microservices for the canonical reason you should - to be able to release and scale independently.

sagichmal · on July 25, 2020

That reason is oft touted but experience has pretty unambiguously revealed that it never pans out in practice. I mean you do get that benefit but the price you pay for it is absolutely disproportionate if that's the only thing you want. Opting in to microservices purely for presumed technical benefits is absolutely a mistake.

scarface74 · on July 25, 2020

How is it a “presumed” benefit when we actually did sell access to our APIs to clients? Some of our APIs initially had very low usage during the day until a batch job came in, others saw a spike in usage by over 100% when we bright a new client in.

We were used by health care networks. Can you imagine the increased use post-Covid?

We even had some that were both hosted on Fargate (Serverless Docker) with lower latency, more expensive but slower scaling for online use and hosted on Lambda for internal batch use - higher latency, faster scaling, less expensive for batch use. The CI/CD pipeline deployed to both.

sagichmal · on July 25, 2020

My point is that you don't need to rearchitect your application in an entirely different style to be able to scale in response to load.

scarface74 · on July 25, 2020

If only part of your “application” has more load than others what do you suggest?

That instead of being able to granular take part of the application that had a larger load, and run it on a Firecracker micro VM (the underlying VM for lambda and Fargate) with 256MB RAM and 1 core, we add enough VMs to scale a monolithic app - even the parts that don’t need it on a full scale VM with 8GB RAM and four cores?

We did actually have to do something similar for a legacy Windows app. We scaled the entire process up based on the number of messages in the queue. It was extremely wasteful. It required at least a 4GB/2 CPU VM compared.

sagichmal · on July 25, 2020

Scale the whole thing. The cost of doing that is usually orders of magnitude less than the true CapEx and OpEx of microservices.

edit: If breaking out one service from your monolith worked for your use case, that's great. I'm not trying to deny your experience. It is atypical, however.

scarface74 · on July 25, 2020

We are talking about a 16x difference in resources. Would you also suggest scaling a database that was more read heavy than write heavy instead of splitting reads and writes when you can deal with eventual consistency and just autoscale the read replicas?

sagichmal · on July 25, 2020

The comparison is not equivalent.

cutemonster · on July 26, 2020

Agreed, I think the comparison is not equivalent,

because a read replica actually is the same database application (same "monolith") just started with different parameters, to behave as a replica

scarface74 · on July 26, 2020

The database is the same but you have to then separate your code into reader and writer services with different connection strings, you have to make sure that anything that can’t be eventually consistent uses the writer connection string, etc.

It’s not just a matter of spinning up a database.

Also, since many enterprise apps live stores procedures and putting business logic in the database, that’s another ball of wax you have to untangle.

rumanator · on July 25, 2020

> Organizations adopt microservices when the logistical overhead of coordinating teams against a monolith becomes so large that it starts affecting product velocity.

That's certainly one of the operational limits, but arguably not the most important one.

You distribute your system so that parts of it may scale independently, for example. Traffic fluctuates along time and the only option you have to scale your system is adding nodes as you go.

sagichmal · on July 25, 2020

The ability to scale services independently is not as impactful as many people believe, and changing your entire architecture for no other reason than to get that specific benefit is a mistake in the vast majority of cases.

Microservices are a solution to organizational problems, not technical problems. On balance they create far more technical issues than they solve.

Twisol · on July 24, 2020

Depending on what angle you come at this from, you could say that DOMA groups services into clusters ("domains"), as Uber has done here, or that services always should have been domain-driven, and DOMA welcomes the networking layer inside the bounded context as well.

I've spent a lot of time trying to understand when a microservices architecture makes sense, what the caveats are, and what philosophy one should take to building services. All the material I've read seems to point in the direction of services being ideally coupled to domain boundaries.

It seems to me that Uber's services proliferated beyond the framing of bounded contexts, and DOMA is their attempt to reign it back in again. I think it's an excellent strategy, and arguably a very good approach for other companies who find themselves in this position.

I don't think DOMA is a good place to stay at. The network should only be tolerated as long as it provides benefits that outweigh the costs. "Monolith" is not synonymous with "poor design". Seeing that these enclaves of services sitting within a domain depend on eachother in the way the OP describes, it really makes me think that they'd find further benefits by expelling the network from each domain.

adamkl · on July 25, 2020

> The network should only be tolerated as long as it provides benefits that outweigh the costs.

I agree. There should be no need to use network calls to enforce interface boundaries if you have a cohesive bounded context.

Or to be snarky about it, welcome to 2004 Uber! Eric Evans sends his regards!

gen220 · on July 25, 2020

Unfortunately, picking and enforcing bounded contexts is hard work in a big organization, even with strong code review processes.

There will always be people who don’t want to respect where the boundaries are drawn (not in a constructive, “it could be better” kind of way, but in a “but this works, too” kind of way). If a group of such people get together, microservices compartmentalizes their capacity to drag down the ship, so to speak. I think this risk compartmentalization is a benefit that must be weighed against the costs (in terms of latency, maintenance of shared libs, opentracing, etc). These days the costs are vanishing, as tools are quite good and becoming easier to manage.

All that said, if you’re a small team of senior engineers working with a shared mental model, a single binary with internally-bounded contexts works really well and I agree with you, having seen it done well.

adamkl · on July 25, 2020

I was lucky enough to be in the right place, at the right time, to lead a group that scaled this approach across half a dozen different development teams.

Fortunately everyone bought into the architecture, and respected the boundaries. Not everyone was senior, and not all the code was great, but we adopted the viewpoint that so long as the bad code is in the right spot (and not talking to things it shouldn’t) everything would be ok in the end. And it was.

Half a dozen teams working in one codebase was definitely pushing it though, and the need to scale much beyond that would have definitely required some service-level compartmentalization to keep the ship from sinking, as you said.

Even then, it would still be a far cry from the “microservices should be small enough to re-write in 2 weeks” approach.

coryodaniel · on July 25, 2020

It’s important to respect the boundaries, but also have the flexibility to change them as the business changes.

I’ve seen many times where people were afraid to change boundaries because they assumed the first person got the architecture exactly right.

pjmlp · on July 25, 2020

If the programming language is compiled and modules are distributed as binaries across teams, they have no means than to comply with modularity.

blunderfel · on July 25, 2020

I'm a big fan of the modular monolith pattern. I usually make domain modules as independent as possible and invert dependencies for web and persistence layers. If you design well you can break off domain-based services whenever the advantages warrant it.

gen220 · on July 25, 2020

If you ever have to start a big company from scratch, this is how to do it.

There aren’t any major drawbacks to this model when a business is young (first couple years). The downsides appear when you have different parts of your application with very different load requirements.

It also takes a lot of discipline to write code this way. Without strict code review and more experienced hands, the bounded contexts fall apart.

One of the advantages of the microservices model is it limits the damage people can do. :)

It forces a bounded context on a team of engineers and says “hey, play in this sandbox and follow these SLAs. If your internal designs are awful, good luck.”

fiddlerwoaroof · on July 25, 2020

It mostly hides the damage: instead of a code base no one understands, you have network traffic no one understands.

ct520 · on July 25, 2020

I’ll take the latter any day with good patterns of aggregating to a grey log. Having to triage production issues in a multiple application saas environment, The latter has always been easier to me. Don’t get me started on trouble shooting someone else’s crazy event queues

fiddlerwoaroof · on July 25, 2020

This is really interesting to me: I’ve always found dealing with code preferable to dealing with network communication. It might be because the languages I work with (common lisp, Clojure and Scala/Java/Kotlin) all have excellent code navigation abilities.

ct520 · on July 25, 2020

That would make sense in a perfect world to me. I would prefer it too. This company I worked for was leader in its industry and many vertices. It had legacy apps from 80s still running through today. Probably 30 SAAS based applications or so. Many many different languages used. Many using internal services and queues to communicate with each other. With 100's if not 1000's of B2B integrations, pumping millions of requests through their portal at any given time. (they also maintained their own data centers.) Anyways, given my experience and troubleshooting message queuing etc and jumping in new code bases all the time. I was more like a blend of SRE / Dev / IT / Product manager. (yeah I know). Given I worked across the SDLC I always found it easiest to, establish the problem statement and the behavior around it. The expected behavior, then dive into the gray log with a unique piece of information that should be logged and trace it from there. To each their own. Unfortunetly with architecture this way I commonly see "segmentation" between Support/Ops/Dev were a problem can end up in limbo. That's were I would hop in.

gen220 · on July 25, 2020

If you have a monorepo, you get both! (You can grep for logging messages in the services you’re calling).

fiddlerwoaroof · on July 25, 2020

What would be even better is some way to use swagger/graphql to jump from frontend api calls to backend code

theptip · on July 25, 2020

As far as I can tell from reading Evans' DDD, there's nothing forbidding network calls inside a service. For a trivial example, you make network calls from your API to your DB. And also to your Redis cache, and then if your service runs async/periodic tasks, to your task queue, say your RMQ & Celery instances.

So to me the OP reads like "we're coming up with some new terminology for a bounded context, and also defining how those contexts should be allowed to layer in order to simplify/control failure modes".

The layering stuff is more interesting than Uber's rediscovering bounded contexts, though it's definitely interesting that they have come into agreement with Evans (and the rest of the DDD community) on the "Service == Bounded Context" principle.

Twisol · on July 25, 2020

> As far as I can tell from reading Evans' DDD, there's nothing forbidding network calls inside a service.

You're quite right.

> [...] defining how those contexts should be allowed to layer [...]

This, however, seems to go against the spirit of things. There is a consistent "ubiquitous language" within a bounded context, where domain terms are concrete and unambiguous. (Or rather, the context disambiguates the language.) The concept of "layered contexts" seems to neuter the concept. Does each layer successively disambiguate the one above? Or does it add new terms that didn't exist?

The layering here sounds much more technically-motivated than domain-oriented. And my argument is that the networking internal to their "domains" is largely an artifact of having build DOMA out of a plethora of disorganized microservices. Doubtless there will be some necessary networking remaining, as you remarked on, such as between processors and databases. But the origin here suggests most of it is left over from what came before.

theptip · on July 27, 2020

Playing devil's advocate for a second, I'm wondering if, at Uber's scale (thousands of microservices, perhaps that means hundreds of bounded contexts after applying "DORA"?), the observations that make hexagonal/layered architectures a useful design within a single service become relevant at the system level.

I'm not sure how many systems have been built using DDD with hundreds of interacting bounded contexts, but I suppose I could believe that _some_ structure would be beneficial. (If you know of any case studies here I'd love to hear of them, I've not actually seen anything published on this topic.)

In general the concept of an "infrastructure bounded context" seemed a bit weird to me from my understanding of DDD, but then I though about Kubernetes, and you could make a case that it is an example of such a bounded context; it has its own ubiquitous language, etc. It would be weird for your infrastructure to have any understanding of the domain objects running on top of it, so a hierarchy makes sense.

Likewise if you have BFFs for your different API clients; the domain services underneath them could be abstracted away from things like REST, if all your internal services use gRPC (for example). You could consider this the UI layer in DDD's layered architecture.

I'm struggling to come up with more sensical layers than that though; in DDD there's the Application and Domain layers; I don't really see how you'd pull "Application" vs. "Domain" bounded context layers together in a way that made sense.

> But the origin here suggests most of it is left over from what came before.

I'd certainly agree with this -- it seems like lots of the intra-BC complexity is excessive compared to what you'd get if you built your services with a BC in mind from the start.

I don't think I'd emulate their intra-BC structure, it's only the inter-BC organization that I think has any merit for other systems (and even then I'm not fully convinced yet).

systematical · on July 25, 2020

I'm looking forward to the article about the problems DOMA introduced for them and what they came up with next. Wonder if we will see a full circle back to monoliths.

I know I'm being snarky and I have used micro-services myself, but only when it was smacking me in the face as the best tool for the job. Is the fools-gold rush still on to do everything as a micro-service from the get-go?

Side note, I can't wait to hear my boss use "DOMA" in a meeting in the coming months. FML.

staticassertion · on July 25, 2020

I'm a fool who has done microservices first and is very happy about it. Just want to throw that out there into the sea of negativity I see towards microservices.

random_kris · on July 25, 2020

I was hired and was told I'll be building ONE of the microservises for their upcoming kubernetes platform. A year later I have written 5 microservises and I am responsible for managing all 5 of them. It is hell.

serhatozgel · on July 25, 2020

We broke a monolith to mini-services 3 years ago. We went from 3 deployments (dev, stg, prod) to over 200. I think we've gone a bit too far and will cut down a bit but could not be happier overall. The move allowed the business to massively scale.

pjmlp · on July 25, 2020

Sun RPC marketing message, "The network is the computer".

So yeah, we keep going at this, and then people discover that monoliths written in a correct modular way, with libraries, happen to be easier to debug and reason about without a network in the middle.

Main problem seems to be that not many developers bother to read about modular programming, large scale development (like Lakos books) and what features their language of choice offers for such endeavours.

eternalban · on July 25, 2020

Are you sure that was just a marketing line for "RPC"? I thought the scope was broader.

IMO Micro-Services were addressing the skill gap in designing comprehensive schemas, not so much the object layer between user and data. So not modular "programming" but rather "modular design".

pjmlp · on July 25, 2020

Might have been, I just remember it from being printed somewhere on the page on the programming manuals.

yjftsjthsd-h · on July 25, 2020

Alright, you've piqued my interest; any pointers for where to learn this stuff, or should I just Google all those terms?

pjmlp · on July 25, 2020

Here are some pointers:

"Large-Scale C++ Software Design"

https://www.amazon.com/Large-Scale-Software-Design-John-Lako...

Although oriented towards C++, many architecture tips apply to other languages as well.

John Lakos is in the process of writing updated versions of the book.

"Large-Scale C++ Volume I: Process and Architecture"

https://www.amazon.com/Large-Scale-Architecture-Addison-Wesl...

"Large-Scale C++ Volume II: Design and Implementation"

https://www.amazon.com/Large-Scale-Implementation-Addison-We...

Then going back into the old days, you have

"Software Engineering in Modula 2: An Object Oriented Approach"

https://www.amazon.de/-/en/Jill-Hewitt/dp/0333515188

"Data Structures and Program Design in Modula-2"

https://www.amazon.de/Larry-R-Nyhoff/dp/0023886218/ref=sr_1_...

"Code Complete: A Practical Handbook of Software Construction"

https://www.amazon.com/dp/0735619670/ref=sr_1_1

"AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis"

https://www.amazon.com/AntiPatterns-William-J-Brown/dp/04711...

"Component Software: Beyond Object-Oriented Programming"

https://www.amazon.com/Component-Software-Object-Oriented-Pr...

"Use Cases Combined With Booch/Omt/Uml: Process and Products"

https://www.amazon.com/Use-Cases-Combined-Booch-Omt/dp/01372...

Just some pointers to get you started.

playing_colours · on July 25, 2020

Can you recommend amytging to read? Lakos seems to be C++ oriented.

pjmlp · on July 25, 2020

It is, but many of the ideas apply to other languages as well.

I have posted some links to books on sibling comment.

javajosh · on July 25, 2020

>Is the fools-gold rush still on to do everything as a micro-service from the get-go?

Yes.

rumanator · on July 25, 2020

> Wonder if we will see a full circle back to monoliths.

For global-scale web applications? Obviously you won't. High-availability, low latency, resilience, scalability, performance. You don't get any of that by running your app on a single box. That ship has sailed two or three decades ago. Physics establishes all the limits, not software architects.

Distributed system critics, where they fixate on trendy microservices architectures or poopoo other suggestions like DOMA, should take a step back and look at themselves and what they are actually complaining about. Yes, a solution with no moving parts is simpler than a solution with some moving parts. But have you really noticed what problems are being solved by adding these pieces?

inopinatus · on July 25, 2020

Monoliths don't run on a single box. My entire current business runs on a single monolithic application, and it's running across multiple AZs with five different instance roles (consumer site, API endpoint, admin site, background jobs, and reporting), sized by memory demand and contention efficiency, then scaled horizontally according to workload demand.

They are all, however, running exactly the same code, just in different configuration. I'd say it's roughly speaking 80% core libraries and the remaining 20% varies by role.

I have a command-line control & observation tool that comes in bin/ of the same repository, and it is again wrapped around the same code besides.

That's the modern monolith in production.

mytailorisrich · on July 25, 2020

A monolith does run on a single box. If you split it in order to run on several boxes you have by definition turned your monolith into a distributed system.

inopinatus · on July 25, 2020

Deployment architecture does not define monolithic.

The empirical example is that, say, Shopify and Basecamp both described their applications as monolithic.

However, there's a more comprehensive demonstration, in that we can't define "box" without contradiction. When you consider the many shells of virtualisation we use, the fact that any web application is by definition accessed over a network (and therefore distributed), and the internal architecture of a modern server, which is practically a distributed federation, or even the coordination between threads in a single process, it leads inevitably to contradiction (if you're careful) or just a messy quagmire of conflicting definitions.

The final nail in this dichotomous coffin is that the converse is also untrue, since any microservice-based application can be deployed on one box. Whichever way you look through the scope, deployment model turns out orthogonal to the taxonomy of software architecture.

mytailorisrich · on July 25, 2020

There is no conflicting definition and I'm not quite sure what your reply is about.

Quite obviously the point is not whether a microservice-based app can be deployed on a single box. It obviously can.

The point is that you split your app in order to deploy different pieces to different boxes then you have a distributed system and no longer a monolith.

inopinatus · on July 25, 2020

My point is that this definition has nothing to do with software, and in fact would mean that the last monolithic system was a Burroughs mainframe ca. 1960. This either invalidates the definition reductio ad absurdum, or confirms it but renders the term basically unusable.

mytailorisrich · on July 26, 2020

This has everything to do with software and is not about any deployment architecture.

A monolith approach means a single program. This is a software architecture approach and implies by definition that the whole app runs as one in a single box.

I don't understand the point of arguing.

inopinatus · on July 26, 2020

Aside from my desktop calculator there is no computing device or application in operation today that meets the definition of a single program on a single box.

There is indeed no point in arguing about that.

And I’m not even sure about the calculator.

mytailorisrich · on July 26, 2020

It is quite clear what a monolith approach means. It is well defined. If you build a webapp as, e.g. a Ruby on Rails app you have for practical purposes the whole of your app in a single program. You cannot have a few functions running here and other functions running there. It's all or nothing in the same place. This is not a judgement on the merits of different approach, just a simple statement of fact and I am surprised that this should trigger so much nitpicking and sarcastic put-downs.

inopinatus · on July 26, 2020

It is a Ruby on Rails app.

rumanator · on July 25, 2020

It seems you should revise your concept of monolith because your description is anything but it.

I mean, you yourself talk about "different instance roles".

You should pay attention to your own claims: if you have a distributed deployment comprised of different nodes, and you have specialized nodes that you yourself state that are ran to handle limited and very specific responsibilities, then just because you decided, for any reason that only you can think about, to bundle everything in a single project... That doesn't make it a monolith, does it?

Just to be absolutely clear, "monolith" is not a reference to how you chose to organize your source tree. Monolith is a software architecture concept that defines how your whole application is organized and deployed. A distributed system comprised of multiple specialized processeses running independently is not a monolith, even if you somehow believed it was a good idea to pick which role you run through configuration.

inopinatus · on July 25, 2020

> you should

> you yourself

> You should pay attention

> you decided

> you yourself state

> even if you somehow believed it was a good idea

This isn't language anyone should respond to, and not merely because it's staking out a fine example of the No True Scotsman fallacy.

> you yourself state that are ran to handle limited and very specific responsibilities

I didn't.

> Just to be absolutely clear, "monolith" is not a reference to how you chose to organize your source tree

Again, that's a straw man - I never said it was. Although I can certainly see how someone who was absolutely determined to make an unnecessarily bitter remonstration as personal as possible might - through either branch of Hanlon's razor - misconstrue the words "the same repository" adversarially for the purposes of their ego trip.

It's a monolithic application because any of the instances could perform any of the roles, and they're all running exactly the same code. They're distinguished in production for the purposes of operational sanity, because only a flaming idiot would, say, run reporting workloads on the API host.

But when I stand up a demo / showcase environment, for example, it has exactly one instance that does everything, and we can (and do) develop with the whole thing running single process on our laptops.

I shouldn't need to clarify any of this, because the point being made was a rebuttal to the "single box" thesis, not whether I met some gatekeeper's opinion about my standing to discuss the topic.

mytailorisrich · on July 25, 2020

Absolutely. Having a single binary that runs different services depending on the configuration and is deployed as such creates a distributed system, not a "monolith". I am not clear about the advantages of doing that.

pjmlp · on July 25, 2020

Yes, doing distributed systems programming since around 1996.

List of stacks I have used in some form since then, raw TCP/IP for in-house RPC protocol, SUN RPC, RMI, COM/DCOM, XML-RPC, SOAP, CORBA, REST, WCF and apparently gRPC is the new fashion.

At the same time, I also done modular development with teams responsible for modules, where the language features for creating modules, defining interfaces, and use binary dependencies are taken into use.

What I usually see with most "distributed systems" is that they are used as a physical solution for teams that never written a proper module in their life.

If monoliths with total lack of modularity are hard to debug, spaghetti network calls are even less fun.

kopos · on July 25, 2020

Feels like a lot of old knowledge was discounted / ignored and the mistakes are being rediscovered.

Feels a bit like the Hexagonal Architecture being rediscovered.

Micro service with half life of 1.5 years? Does that mean that enough planning is not being done? Or leadership failure at software planning level?

Collaboration between teams at scale is very hard but that is what the leadership layer is for - to collaborate more not build more micro services.

sixdimensional · on July 25, 2020

Or how about plain old onion architecture... even older than hexagonal.

This is literally scaled/distributed domain-driven design (DDD).

I have felt strongly that folks got so caught up in the hype of yet another new thing they forgot how to extend what came before.

It feels like a reinvention of existing ideas at larger scale, and a pattern we keep repeating.

I'm not complaining, of course - new things are possible and being learned through this innovation. I do feel we should be more careful on the cutting edge, to see how it relates to where we came from.

Put another way, reinventing the wheel is not pointless if you come out with a better thing, or better wheel. But don't forget what was good about the previous wheel before you throw it out?

staticassertion · on July 25, 2020

I'm confused. You're saying that people are reinventing existing ideas at larger scale, but you also are saying that people don't know how to extend what came before? Those sound like the same exact thing.

sixdimensional · on July 25, 2020

You're not confused, but there is a nuance. I'm trying to walk a fine line of not criticizing them for discovering this late into their process (maybe they knew all along but were busy inventing), but also questioning why they couldn't see this sooner. Trying to get to the heart of what took this innovation/recognition/learning to happen.

I am making an observation - when each new "fad" or "hype cycle" tech starts, it seems as though the pattern knowledge of what came before is discarded, or, disregarded as "legacy" or possibly even just forgotten. It feels like a knowledge transfer is missing. It would be terrible if, we, as an industry aren't passing down knowledge and reinventing hard won pattern discoveries efficiently.

Did this pattern Uber discovered come from studying onion architecture, DDD, etc first, finding the limitations, and then scaling them?

Or did this arise from throwing away everything that came before (or not knowing about it), forging an undiscovered path, and then rediscovering the old patterns could be applied? If the latter, what can be learned to make this process of discovery and linkage to existing patterns more efficient?

I think this article shows innovation is tricky, or, the risk (and potential reward) at the bleeding edge. Leaving behind design constraints of what came before might be necessary.

Maybe I'm trying to say, as an industry we need to balance exploitation of previous knowledge and our attitudes about how we feel about "legacy", with the unquenchable thirst for the next new innovation?

pjmlp · on July 25, 2020

It means CV driven development for selling conference talks, blog posts,....

tannhaeuser · on July 25, 2020

I have to agree. Articles about software architecture, figuratively speaking, read like the result of playing connect-the-dots using the buzzwords du jour, as thrown into the mix by cloud providers to advertise solutions to problems no one is having. It's not an entirely new phenomenon, but consumerism, big media dynamics, advertising to clueless decision makers, and self-fulfilling resume padding in IT seems to be the norm this decade.

SergeAx · on July 25, 2020

I felt bad for Uber engineering force while reading whole article. They swamped themselves into that microservices hell due to pressure for teams velocity and TTM, and created enormous tech debt in the process. Now they are trying to patch it up with ridiculous means like putting JSON inside of protobuf messages. I really hope this article could be a warning for management of growing startups for what could happen if you squeeze features out of your team, but managers usually speed-reading those articles just to order "do like Uber did".

harel · on July 25, 2020

I'm not entirely sure why this is sold as innovative new approach. Over the last few years, I've seen and developed micro services that are indeed cut down to specific functionality, and I've done ones where the domain was the scope. The latter has proven to produce bigger services, but a more maintainable ecosystem. Also, the division for micro services should also be dictated by the context of the business, and not just the current whims of the devs.

xcskier56 · on July 25, 2020

While I work on much smaller codebases and so don’t have any particular insight into how DOMA will work for such large organizations, the piece that I really liked here is how the separate their different general layers of services.

The key to any service being usable outside of the exact context it was first written in is to ensure no product specific business logic is added to it.

By splitting out infrastructure from general business from product specific services, you can do a much better job of understanding, and therefore controlling where where product specific logic is allowed. This in turn will make your lower level services far less coupled to the exact context they are first used in.

For smaller orgs, this is by far the more useful information, rather than how to deal with 2200 microservices.

levischoen · on July 25, 2020

Yup. So micro services of micro services. Clearly defined interfaces. Single entry point for other services/domains to interact with. seperation of concerns. The fundamental law will always be true : https://en.m.wikipedia.org/wiki/Fundamental_theorem_of_softw...

dkersten · on July 25, 2020

Somewhat off topic but maybe someone can enlighten me:

> Uber has grown to around 2,200 critical microservices

Even thinking about the very largest systems I’ve worked on, I can’t think of what could possibly be split into 2000 separate individual services. What are these thousands of microservices and how micro are they?

drol3 · on July 25, 2020

I am working for a company that has moved from 3 to 15 backend developers and 25 to 50 services over the last year.

The amount of services is really quite meaninglessness compared to the amount of developers. As with anything the efficiency good down as the amount of people goes up. There is no way that 2000 people can agree on anything. So most likely there is lots and lots of overlap.

You could probably refactor some of this duplication out, but by the time you would be done new duplication would have emerged. Keeping many people in sync is just difficult :)

Anyway the point is that 2200 services is meaningless without telling how many employees are working on said services :)

daxfohl · on July 25, 2020

For every important service there is the current one, the old one that is still in use, the previous previous version that one dumb team still hasn't migrated from, the next gen prototype, the service that deploys the service, the service that monitors it, the data analysis tool, data migration services, the intern project, several upstream services that continuously test the integration points, blah blah blah.

Then you look at the verticals and there's maps, payments, analytics, hosting, marketing, security, identity, partnerships, third party integrations, and a million other things you don't think about.

It's pretty easy to get into those numbers if you want zero downtime and code your own verticals, and have the resources to do it.

jmchuster · on July 25, 2020

Hmm, would it be easier to imagine PayPal or Stripe requiring a lot of services? Given its volume of payments, Uber has basically had to re-implement PayPal internally, custom-tailored to its own flow. Companies at this size/scale basically contain multiple whole companies within them. Think of like all the SAAS services that your company relies on, and then imagine that your company now owns all of them and all the services that come with them.

clintonb · on July 25, 2020

Uber uses Stripe and PayPal to process payments. They don’t need to reimplement either, and doing so would be a bit of a waste.

It’s worth pointing out that not all of the 2200 services may be user-facing. Some may be internal, such as admin tooling or CI services. That said, 2200 seems like a lot!

jmchuster · on July 25, 2020

Sorry, I think you are a bit mistaken about how payment platforms work. Stripe and PayPal provide an interface to a country's banks. In exchange for abstracting away the underlying bank infrastructure, you pay them a fee for every transaction. Stripe and PayPal support a limited number of countries and a subset of the bank's functionality. So if you want to interact with any new countries or access any non-supported financial instruments, you'll have to directly integrate with the banks and implement those yourselves. To give you a sense of scale, Uber's payments volume is like 10% of Stripe's.

wpnx · on July 25, 2020

Curious; where did you get the 10% number from

jmchuster · on July 25, 2020

Here's an estimate of Stripe's yearly transaction volume at $200 billion https://www.forbes.com/sites/christianowens/2019/09/21/strip...

Here's Uber's announced results of $18 billion in gross bookings https://investor.uber.com/news-events/news/press-release-det...

Round that to significant figures and you get 10%.

nevir · on July 25, 2020

I'd be curious how that compares to Amazon's count of services (my guess is 10k+)

beastcoast · on July 25, 2020

I’d say we’re closer to an order of magnitude larger When you consider all the different businesses and domains. My org alone has about 1000 services. But we have massive scale and the employees to support that number of services. You also won’t find one architecture; it runs the gamut from lambda to REST/RPC to pubsub, event bus and workflow architectures. Typically new orgs start out with a few larger services that get broken out over time, as the complexity and number of employees continues to grow.

jeswin · on July 25, 2020

I don't find this surprising at all. Uber operates across 5 dozen countries (or more) each with different regulations and offerings.

For instance, in India you get autorickshaws (called "Autos" sometimes) in addition to cabs as a seperate option. Autos have different rules, billing, driver compliance and safety standards. There are probably a bunch of services around Autos.

Similarly, safety regulatons differ in each country, and in some cases teams are forced to act quickly. Having them in separate microservices again makes sense. Same goes for offers, eats etc.

2000 services is not inconceivable for a company operating across the world.

Ozzie_osman · on July 25, 2020

Not at Uber, but I've heard a service there can be as small as a database table or two with some CRUD. Or a lambda-style function or two that provides some simple logic when queried or aggregates from a couple other services.

Not saying that's a good way to build systems but it's definitely one way.

dkersten · on July 25, 2020

Who has 2000 database tables though? Or 2000 lambda functions for that matter.

jessaustin · on July 25, 2020

Any actual business that's been around a few years and enforces 3NF is going to have close to 500 rows in the DB. Stretch that business to multiple continents and service types and one would expect 2000.

jessaustin · on July 25, 2020

Haha of course I should have said "tables" not "rows".

mekael · on July 25, 2020

Multi-line insurance systems generally start at 2000 tables.

Heck, simple auto insurance systems with multiple products in a single large state can hit 5k really easy.

coryodaniel · on July 25, 2020

pico services.

eikenberry · on July 25, 2020

Remember that those aren't 2200 separate things going on, but 2200 instances of the services. I'm sure that includes very large clusters running many copies of the same service.

wikibob · on July 25, 2020

That is an incorrect interpretation.

I assure you there are /significantly/ more than 2200 instances of the services running.

They do indeed mean 2200 separate deployable services.

exnuber · on July 25, 2020

They say they draw inspiration from DDD and CQRS, yet seem to miss one crucial factor. And no, I'm not talking about bounded contexts. They still focus in re-use of commodity rather than addressing why their microservice approach didn't work. Their strategy for defining how functionality moves down the stack is not bad, but it's not anchored towards the right thing. It's the consistency boundary (or aggregate root if you will). It's the circle you draw around your business rules and say "this group needs to remain consistently enforced". This defines your unit of scale, not re-use. So while they are headed in the right direction, I fear that they are still missing a fundamental piece of information to steer on.

I was employed by Uber (I quit after a couple months), and the idea's that they now hint towards were largely rejected by the engineers, and that was less than a year ago. Uber is just in the position to throw a large sum of money at making wrong decisions and getting away with it, because it's not their money. It's VC funny money.

DISCLAIMER: Yes, I did indeed create this account to be able to reply anonymously.

vmurthy · on July 25, 2020

I recently read Domain Modelling made Functional [0] and one of the under-recognised benefits of using Domain Driven or oriented designs is that (if done right), business workflows and processes become well explained for developers and minimises surprises from business closer to release dates. For e.g., an order processing system will have lesser assumptions by developers and tech leads because someone with enough knowledge of the business process from order processing will be able to guide you.

[0]https://pragprog.com/titles/swdddf/

nfoz · on July 25, 2020

I don't like that they define microservices as components communicating via RPC, because this precludes other models like pub-sub.

ianamartin · on July 25, 2020

I despise Uber as a company, but I've always loved them as a technology group ever since I used to go to their meetups in NYC. You can always count on Uber to throw a great party and also come up with the worst possible technology.

If you want to know how not to do things, Uber is a very good place to look.

hinkley · on July 27, 2020

And all the people who worked there will consider the fact that they did 1) something and 2) were successful means that that 'something' had anything to do with it, and therefore we should keep doing it.

Not that this is unique to Uber, mind. I just described half of my coworkers too.

coryodaniel · on July 25, 2020

I’m honestly not trying to be snarky here but I didn’t know there was any other way to design microservices (with intention) that weren’t domain oriented.

timclark · on July 25, 2020

I’ve seen one service per database table, and have had architects seriously think that a table equals a bounded context. Then they try to work out how to do two-phase commit across micro services.

coryodaniel · on July 25, 2020

Oh that. I’ve had to merge services like that.

People are always grabbing nouns and thinking they’re microservices.

fma · on July 25, 2020

I'm on the same boat.

"Previously product teams would have to call numerous downstream services to leverage a domain; they now have to call just one....Furthermore, we were able to classify 2200 microservices into 70 domains."

So they went from 2200 to 70 microservices? One extreme to the other. The answer is somewhere in the middle.

sdan · on July 24, 2020

Basically clumping microservices

deltaeerie · on July 25, 2020

I believe this is only the first point. It's also establishing a hierarchy of these "clumps" (layers), strict APIs representing a single clump for other clumps to consume (gateways), and a pattern for clumps to extend functionality of other clumps without polluting their models (extension architecture).

keyraycheck · on July 25, 2020

> as Uber grew from 10s to 100s of engineers with multiple teams owning pieces of the tech stack, the monolithic architecture tied the fate of teams together and made it difficult to operate independently.

I think this is key sentence. Microservices does not make a lot of sense for teams of 10s, while are a great tool for team of 100s.

recursivedoubts · on July 24, 2020

Seems to me that big progress will be made when we can decouple system boundaries from network calls.

bob1029 · on July 25, 2020

I still think the approach of baking the entire enterprise into 1 gigantic binary is the best. Stack traces are so much easier to work with compared to distributed logs and side effects.

We are already at a point where you could hypothetically do this for a reasonably-sized organization. A 64 core CPU can support a huge number of clients. Stuff like .NET Core scales really well if you want to build something complex like this. One big binary that occupies an entire physical host is an extremely compelling development model. Literally everything becomes a direct method invocation. You can also have type enforcement and atomic releases for the entire enterprise. Also makes a monorepo an obvious choice for source code management.

jzoch · on July 25, 2020

This falls apart when you care a lot about availability and therefore want a particular service to be stateless. Then you add some feature that needs a tiny bit of state and suddenly you can't scale up and down as easily and need to start paying attention. This rolls forward into a big ball of mud - a monolith can be tricky too at scale

bob1029 · on July 25, 2020

One trick is to leverage the fact that you have 1 type domain, and to develop a common persistence abstraction for the entire enterprise. Then, all of your business entities can be pushed through it for a consolidated replication flow. This can be [a]synchronously replicated to additional node(s) as required, potentially with replication rulesets defined in application code as well.

It sounds like a complex monster until you build it one time. Then you are basically done. The leverage you get when you have 1 way to do everything is extremely powerful. I do recognize there are scenarios where you cant force one persistence abstraction on all use cases, but there's no reason you couldn't have a TimeSeriesEntity (keyed by time) in addition to a typical BusinessEntity (keyed by a unique integer). Both could have unique replication implementations, but it would still be standardized and all nodes would be speaking the same protocol because they all derive from the same source.

The monolith only remains a monolith in operational terms until the engineering team develops some imagination.

jen20 · on July 25, 2020

Indeed - this is the premise of the "4+1 View Model of Software Architecture" paper from 1995 [1].

[1]: https://www.cs.ubc.ca/~gregor/teaching/papers/4+1view-archit...

Twisol · on July 25, 2020

Thanks for the paper. Where do you find that it discusses decoupling the concepts of network calls and system boundaries?

I must be honest: aside from the context in which it was linked, I came away terribly unimpressed by this paper. I found myself disagreeing on a number of points. Having only spent half an hour on it, I can't claim to have a major problem with it, but I wouldn't couch my architectural work against it.

kerng · on July 25, 2020

Very interesting and seems like a logical next step with things get a bit too messy with micro services.

One thing that was not touched on - how is security done in DOMA?

* Is auth happening at domain gateways or at each service?

* Similarly for encryption, does it terminate at Domain GW or at each service?

alpmeow · on Aug 4, 2020

Hopefully both

stevenalowe · on July 25, 2020

Nice to see Uber rediscovering domain-driven design principles.

deterministic · on July 27, 2020

If you can't make a monolith work, then you will fail even harder trying to make a Microservice Architecture work.

pgt · on July 25, 2020

Microservices are about scaling teams, not tech.

sroussey · on July 25, 2020

Uber thinks they only have that many services? Yikes, double, or triple the number... several years ago. Maybe they have fewer now?