Decomposing a Sitecore monolith to microservices - the experience so far and what we’ve learnt.

Over a year ago we embarked on a journey to re-architect a large Sitecore monolith.

Before we debate the definition of a microservice I should clarify - we are actually decomposing to small(ish) vertical web applications bounded by cohesive business functionality. That title didn’t have quite the same ring to it!

We look to utilise the benefits of a microservice architecture but do accept some constraints.

Why did we do it?

Our Sitecore monolith has grown large and cumbersome over many years of optimising for time to market rather than the long term health of the platform.

It’s easy to jump to microservices, and not all monoliths are a bad design choice. Ours, however, has grown to exhibit two key constraints.

The first is how well the application supports our ability to deliver new functionality to customers. We must release everything together as one large unit of deployment. This process is slow and therefore we tend to release infrequently.
The other constraint is development scale. As teams write more code the solution becomes more difficult to maintain, and cost of change gradually increases. Importantly we limit the number of teams who are able to work in parallel.

What were we looking to achieve?

We looked to solve these problems by unlocking two main capabilities of the platform:

Move towards smaller, quicker, incremental isolated units of deployment.
Partition the platform so that cross functional teams are able to work and release code independently of each other.

The distributed monolith anti-pattern

Team autonomy is crucial in a distributed architecture. A release should have no impact on other teams. If we need to release Application B due to updates to Application A we expose ourselves to the drawbacks of the original monolith but also have all the challenges of a more distributed architecture.

How quickly do we start realising our objectives?

We use new business initiatives to determine which areas of the application to partition. Having decided which area of the platform to focus on, we then determine how to partition it: how big each application should be and which business functionality each should cover.

Usually we would partition by bounded context unless there was a technical capability we were looking to leverage for a particular application. For example an advantage of smaller units is that each application is independently scalable and fault tolerant. We might decide high volume areas of the site should be isolated. It can also be prudent to consider how often different areas of the site change, and which areas change together.
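One rough way to see which areas change together is to count how often pairs of areas appear in the same commit. The sketch below assumes you have already extracted, per commit, the set of areas touched (for example from `git log --name-only`). The area names and history are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how often each pair of areas is modified in the same commit.

    `commits` is an iterable of collections of area names, one per commit.
    """
    pairs = Counter()
    for areas in commits:
        # Sort so ("basket", "checkout") and ("checkout", "basket") share a key.
        for a, b in combinations(sorted(set(areas)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical history: each entry is the set of areas one commit touched.
history = [
    {"checkout", "basket"},
    {"checkout", "basket", "account"},
    {"search"},
    {"account"},
]

counts = co_change_counts(history)
```

Pairs with a high count are candidates to live in the same application. Areas that rarely appear alongside anything else are easier to break out on their own.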

It is not necessary to decompose the whole monolith to start realising value from the new architecture. After releasing one or two applications in production we were able to assign ownership to individual teams. Applications are able to be released quickly and independently of the monolith. Teams are able to make incremental updates to those applications.

In starting to realise these outcomes we allow ourselves time to iterate and learn. The architecture can evolve and we can improve based on past learnings.

We are able to further decompose (a now smaller monolith) when the business is ready, again using business initiatives as a catalyst to break out new applications.

What does this look like?

There are a couple of approaches to decomposition. We opted for a common technique known as the strangler pattern.

Effectively the original monolith remains unchanged at first. We build new applications in separate codebases and utilise a reverse proxy (CDN in our case) to redirect requests to the new application. The proxy also provides a convenient mechanism for rollback and risk management via traffic splitting.
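The routing decision the proxy makes can be sketched roughly as below. This is a simplified illustration in Python, not CDN configuration: the route table, application names and percentages are all hypothetical, and a real proxy would express the same rules in its own configuration language.

```python
import hashlib

# Hypothetical route table: path prefix -> (new application, share of traffic 0-100).
ROUTES = {
    "/account": ("account-app", 100),  # fully migrated
    "/checkout": ("checkout-app", 10),  # canary: roughly 10% of visitors
}
MONOLITH = "sitecore-monolith"


def bucket(visitor_id: str) -> int:
    """Map a visitor to a stable bucket in 0..99 so they always hit the same origin."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def choose_origin(path: str, visitor_id: str, routes=ROUTES) -> str:
    """Pick the origin for a request; anything unmatched stays on the monolith."""
    for prefix, (app, percent) in routes.items():
        # Match the prefix on a path boundary so "/accounting" is not "/account".
        if path == prefix or path.startswith(prefix + "/"):
            return app if bucket(visitor_id) < percent else MONOLITH
    return MONOLITH
```

Rolling back in this model means setting a route's percentage to 0, which sends all of that traffic back to the monolith without a deployment of either application.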

Each vertical slice of functionality has two key components:

First, a micro frontend utilising a headless CMS (Sitecore JSS, which was the enabler for this architecture).
Second, a lightweight RESTful API (effectively a BFF variation) which serves the micro frontend and proxies commands and queries into backend service layers (not maintained by our teams).
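As a rough illustration of the BFF idea, the sketch below composes two backend calls into one response shaped for a single frontend. The function names, field names and stand-in backends are hypothetical. In practice the backend calls would be HTTP requests into the service layer.

```python
def build_account_summary(customer_id, get_profile, get_orders):
    """Compose one frontend-shaped response from two backend calls."""
    profile = get_profile(customer_id)
    orders = get_orders(customer_id)
    return {
        # Expose only what this frontend needs, in the shape it needs.
        "displayName": profile["firstName"] + " " + profile["lastName"],
        "recentOrders": [
            {"id": o["orderId"], "total": o["total"]} for o in orders[:3]
        ],
    }


# Stand-ins for the backend service layer.
def fake_profile(customer_id):
    return {"firstName": "Ada", "lastName": "Lovelace", "internalFlags": ["x"]}


def fake_orders(customer_id):
    return [{"orderId": n, "total": 10 * n, "warehouse": "w1"} for n in range(1, 6)]


summary = build_account_summary("c-1", fake_profile, fake_orders)
```

Passing the backend clients in as arguments keeps the composition logic easy to test without a network.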

Continuous deployment and automation are the backbone of this architecture.

[Figures: architecture then; architecture now]

Notice those unpleasant red lines? Dependencies on the monolith are not ideal, but we try to treat Sitecore as a service (headless CMS) in this model. This is one of the constraints vs true microservices and something we will discuss next.

What we learnt

No architecture is perfect. We make a compromise somewhere and hope not to close the door on changing direction based on what we learn.

In this scenario we have both the challenge which comes with a distributed platform whilst also working around some of the constraints that Sitecore forces upon us, most notably the shared database. In a distributed architecture a shared database is usually a big red flag as it couples applications together. True independence and isolation are much more difficult to achieve.

Sitecore is the constraint here; however, there are short and long term techniques we can leverage to solve this problem.

Short term we did a few things to enable release independence:

Ensured all content updates are backwards compatible.
Partitioned the content so each app owned an area of the Sitecore tree. We have an automated content deployment process and run automated integration tests on a staging environment.
To expose new data to the frontend applications via JSS’s Layout Service API you usually need to write code in your Sitecore solution (the monolith in the visualisation above). This obviously creates a release dependency on the monolith. To solve this we created a dynamic, content-based resolver that is more powerful and flexible than those Sitecore provides out of the box. There are also options to leverage GraphQL.
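To illustrate the general idea of a content-driven resolver, here is a simplified sketch. It is not our actual resolver and does not use Sitecore's APIs: items are plain dictionaries, and the `fields` and `includeChildren` configuration keys are hypothetical. The point is that the list of fields to expose lives in content, so exposing a new field needs no code change in the monolith.

```python
def resolve_contents(datasource, resolver_config):
    """Build the JSON payload for a component from content-defined configuration.

    `resolver_config["fields"]` lists the fields to expose, so adding a field
    is a content change rather than a code change.
    """
    payload = {}
    for name in resolver_config["fields"]:
        # Skip fields the item does not have rather than failing, so older
        # content stays compatible with newer configuration.
        if name in datasource["fields"]:
            payload[name] = datasource["fields"][name]
    if resolver_config.get("includeChildren"):
        payload["children"] = [
            resolve_contents(child, resolver_config)
            for child in datasource.get("children", [])
        ]
    return payload


# Hypothetical content item and configuration.
item = {
    "fields": {"title": "Offers", "body": "text", "internalNotes": "secret"},
    "children": [{"fields": {"title": "Offer 1"}}],
}
config = {"fields": ["title", "body"], "includeChildren": True}

result = resolve_contents(item, config)
```

Fields not named in the configuration (`internalNotes` here) never reach the frontend, and a child item missing a configured field is serialised without it.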

Longer term we have other options too.

Applications which don’t need to be content editable (the content changes infrequently) are able to consume their content from another source if it makes sense to simplify in this manner. We can also look to a future Sitecore SaaS offering (potentially moving further towards a more MACH-esque architecture), utilise techniques such as Static Site Generation, or use a content cache close to the application. Any techniques we utilise can vary per app based on the context and business capability exposed via that application.

[Figure: next architecture]

Other challenges

There are truly shared areas of concern across both APIs and frontends, such as authorisation, which need consideration.

It is a challenge to keep code and patterns consistent where you want consistency between codebases, and therefore between teams. In a true microservice ecosystem autonomy would usually include teams having ownership of technology choices.

The frontend architecture becomes more complex. I will likely follow up with another post on frontend architecture as it can be easy to become stuck. There will be code you will want to package and version to share between apps which can be a challenge when a primary concern is ensuring applications remain loosely coupled. There will also be UI that teams will want to reuse between apps.

It’s all too easy to abstract too early and bake the business logic of multiple applications into a UI component. Where do you draw the line between DRY and prudent duplication? One approach can be to utilise atomic design principles and treat shared UI as ‘dumb’ Lego which can be consumed by applications, which then bake in their own domain logic.
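The split between ‘dumb’ shared UI and application-owned logic can be sketched as follows. The example is in Python for brevity rather than a frontend framework, and the component, the discount rule and the CSS class names are all hypothetical.

```python
from html import escape


def price_tag(label: str, amount: str, highlight: bool = False) -> str:
    """Shared 'dumb' component: renders what it is given and knows no business rules."""
    css = "price-tag price-tag--highlight" if highlight else "price-tag"
    return f'<span class="{css}">{escape(label)}: {escape(amount)}</span>'


def checkout_price(basket_total_pence: int, is_member: bool) -> str:
    """Application-owned wrapper: the (hypothetical) member discount rule lives here."""
    total = basket_total_pence * 90 // 100 if is_member else basket_total_pence
    label = "Member price" if is_member else "Total"
    return price_tag(label, f"GBP {total / 100:.2f}", highlight=is_member)
```

If another application needs a different pricing rule it writes its own wrapper and reuses `price_tag` unchanged, so the shared package never needs to know about either rule.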

Other areas of concern are source control, continuous integration and deployment (a container-based deployment strategy can be extremely powerful). There is value in utilising the monorepo pattern but also reasons why this approach might not be right for your context.

If you got this far I hope some of this was useful. The journey for this architecture is far from over.
