Performance Optimization in AEM (Part 1): Root Cause Analysis and Strategies

This article looks at the organizational causes of performance problems in enterprise Web projects. It is the first part of a series on performance optimization. In the following parts, we will discuss technical causes and measurement techniques, as well as some possible solutions.

How it all begins

In the beginning, the customer created the project. And the project was desolate and empty. But it didn't stay that way for long, as the smartest and most capable people from all over the world came together to help the customer build the project - and, of course, to hold out their hands for payment.

However, as with many other major projects, care was not always taken to ensure that all the talented craftsmen and craftswomen spoke the same language. In addition, the customer sometimes took the liberty of exchanging personnel here and there, which, alongside the positive effects of innovation and change, also brings a negative one: new effort is needed to make sure that everyone involved in the project speaks "the same language" again. And so - more often than one would like - "desolate and empty" turns into a hopeless muddle.

Don't panic. This is not about the Tower of Babel, and I'll spare you the Bible metaphors from now on; after all, it's only about software. Still, the image of a "common language" suggests itself for the problem we are discussing here.

Dreamboat Enterprise

Enterprise software, some would say, is software with a Java version less than or equal to 8 and a front-end stack whose origins are at least four years old. Less wicked tongues, like Wikipedia, define enterprise software as software that meets the needs of an organization rather than those of individual users. Even with user-oriented applications such as web content management systems, user requirements are rarely collected and unified in an enterprise context. Instead, different views for different user groups are brought in and implemented side by side by different people in charge - without speaking the same language, or sometimes without speaking to each other at all. The hierarchical structures that are common in large corporations, and things that happen "for corporate policy reasons", also contribute to this. And so, more often than not, the various departments do "their thing" and prefer to make sure that "delivery is on time" before they fall behind the others.

But the fact that a software project is designed to last for many years, and the further implications arising from personnel management and the political organization of large companies, are not the only reasons the performance of a software project declines over time. Sure, outdated technologies can be a problem. But, you might think, if all those programmers are so talented and capable, why isn't each delivered component great in its own right? Why aren't all these fulfilled requirements equally maintainable, snappy and, most importantly, performant?

function programmers_are_evil() { /* curse their frozen hearts */ }

Well, it is not only the evil, hierarchical big corporations; the developers and designers also contribute their part to the chaos. If we are talking about a common language, maybe we should also talk about talking itself - a hobby that is cultivated far too seldom in the programming community. If people don't want to talk to each other regularly, you can convene as many weeklies, meta-dailies, architecture meetups or workshops as you like: in the end, everyone does what they want.

The factor just mentioned could be controlled: imagine a modern form of project organization in which some people concentrate more on developing, some on coordinating, and some on communicating. But when another trait of programmers is added - one as typical as it is contradictory - the two factors mentioned above multiply. This trait might be called "coding borderline syndrome": we either think our code is soooo great that we are firmly convinced there can be none besides it - or we are so embarrassed about what we did "back then" (which may have been only three weeks ago) that, as a last resort, we push it out of our memory entirely.

The third factor behind the chaos in the implementation of the various program components - the chaos that far too often ends in performance problems - is a classic chicken-and-egg problem. The great thing about development is that you can suddenly solve in a few lines of code what previously took hours of work and countless tools: slide rule, pen, paper, compasses, stopwatch and whatever else. And who feels motivated to write code accordingly? Sure, the lazy among us. Granted, not all programmers are lazy gamers who just hang around at home and code themselves a pizza. But a certain fondness for convenience is - I think - part of the decision to learn programming in the first place.

Buzzword Driven Development

Let's not kid ourselves: we live in a time in which marketing dominates everyday life and the economy. We buy fizzy drinks in tall cans for 2 € - or, if we want to be a bit different, fizzy bitter tea. Banks need provocative advertising, a cool corporate identity and fancy credit cards to attract investors - and employment offices now prefer to be called job centers.

Development is not immune to this either. A flood of new, cool JavaScript libraries, design concepts, database technologies, continuous integration tools and whatever else vies daily for the interest and favor of programmers - with cool logos, cool names, and fists waving at the competition: GraphQL and Angular (REST is dead!), React and Redux (forget jQuery!), Vue (forget React!), and so on. Add to that popular patterns and technologies: SPAs, PWAs, microservices, AI, virtualization. If you don't set clear boundaries here, but instead try "a little bit of everything", you will end up not only with the advantages of all these technologies, but also with all of their disadvantages.

Performance as a commons

Last but not least, a small digression: a commons is a form of common property. In agriculture, for example, it is cooperative ownership as opposed to parceled farmland. Other examples of commons are roads and fish stocks.

It makes sense to regard the performance - and likewise the maintainability - of an enterprise application as a commons. In practice, these two areas are usually considered the responsibility of feature teams: if a feature is developed by a team, that team is also responsible for maintaining it. Often, however, what is missing is a view of how the various parties and technologies interact in the overall context. A component, CSS pattern or JS library may be performant and maintainable on its own - but if the surrounding approaches are contradictory enough, the overall result can be just the opposite. And so the "tragedy of the commons" can also befall the commons "performance": if too many owners have the (de facto) right to use a resource, no effective usage rules exist, and no one has the (de facto) right to exclude others from using it, the resource will be overused.

What now?

In most cases, the topic of performance probably only comes to the table when the processes described above have already progressed a long way: service providers have been replaced several times and have left behind a wild mix of architectures, patterns and technologies - pressure from and on enterprise POs and PMs has done its part. Developers disagree on which CSS pattern to use - and end up using three different ones. And the responsibility for the performance problems is then sought within each team individually - in vain.

But it wouldn't have been that hard: give the developers, POs, PMs and QAs a reference book - a common manifesto to which they have to commit - and the teams could be held accountable. The developer who turned a blind eye during code review, or the PM who said "this needs to go out today; if in doubt, we'll ship a hotfix", could be held to performance guidelines, strict rules for CSS selectors, and blacklists or whitelists for third-party JS libraries. And in the optimal case, linting of frontend and backend code would have prevented the worst bugs from the start. Whether this is even possible in enterprise projects, given the circumstances described above, I can't say. It is at least as ambitious as it is desirable.
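Such guidelines only have teeth when they are enforced automatically in every build. As a minimal sketch of what a manifesto-backed check could look like - the threshold and the naive regex-based extraction are purely illustrative, a real setup would use an existing CSS parser and linter:

```javascript
// Minimal sketch of a build-time guideline check (hypothetical rule):
// flag CSS selectors whose nesting depth exceeds an agreed maximum.
const MAX_SELECTOR_DEPTH = 3; // example threshold a team might agree on

function selectorDepth(selector) {
  // Count simple selectors separated by combinators (space, >, +, ~).
  return selector.split(/[\s>+~]+/).filter(Boolean).length;
}

function lintSelectors(cssSource) {
  // Very naive extraction: everything before each "{" is a selector list.
  // A real implementation would use a proper CSS parser.
  const violations = [];
  const rules = cssSource.match(/[^{}]+(?=\{)/g) || [];
  for (const rule of rules) {
    for (const selector of rule.split(",")) {
      const depth = selectorDepth(selector.trim());
      if (depth > MAX_SELECTOR_DEPTH) {
        violations.push({ selector: selector.trim(), depth });
      }
    }
  }
  return violations;
}

// Example: the second selector is nested too deeply (depth 5).
const css = `
.teaser .title { color: red; }
.page .main .teaser .title span { color: blue; }
`;
console.log(lintSelectors(css));
```

Wired into the CI pipeline with a non-zero exit code on violations, a check like this turns a paper guideline into something no team can quietly ignore.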

Performance as a cross-cutting feature

However, if such binding factors are missing in the development process, one must speak of architectural omissions from day one - omissions that can only be resolved by reinterpreting the performance of an application: from a common good that everyone helps themselves to as needed, into a "feature" that is respected by everyone equally and maintained by a dedicated team. Only with considerable effort - first fixing the omissions through small quick wins and large-scale refactorings, then continuously monitoring performance, reacting quickly, and communicating constantly - can this feature be kept under control.
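One concrete shape "continuously monitoring and reacting quickly" can take: the dedicated team defines a performance budget and checks every measurement against it. A minimal sketch - the metric names and all numbers are invented for illustration, not taken from any real project:

```javascript
// Minimal sketch of a performance budget check (all numbers hypothetical).
// A dedicated team owns the budget; every measurement is compared against it.
const budget = {
  serverResponseMs: 300, // time to first byte
  totalJsKb: 500,        // shipped JavaScript, compressed
  domNodes: 1500,        // DOM size after render
};

function checkBudget(measurement) {
  // Returns the list of metrics that exceed their budgeted limit.
  return Object.entries(budget)
    .filter(([metric, limit]) => measurement[metric] > limit)
    .map(([metric, limit]) => ({
      metric,
      limit,
      actual: measurement[metric],
    }));
}

// Example measurement from a (hypothetical) monitoring run:
const latest = { serverResponseMs: 280, totalJsKb: 740, domNodes: 1620 };
console.log(checkBudget(latest));
// two metrics exceed their budget: totalJsKb and domNodes
```

The point is less the code than the ownership model: the budget lives in one place, is versioned like any other artifact, and a violation is a regression that someone is explicitly responsible for fixing.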

In part 2, we will describe how the topic of performance can be reinterpreted as a measurable factor: what does monitoring look like? What and whom do we want to look at? Which KPIs are suitable for analysis? And what baselines do we establish in order to measure later success?

Part 3 is then dedicated to the tools we can use to measure problems, competition and success: which tools do we use for analysis? How do we identify concrete problems and possible solutions? Which components of the analyses are relevant at all? And how do we communicate the successes?

Further articles will then discuss the concrete measures that can be taken on a technical level to achieve the first successes: where are most measures usually needed? How can the complex and important issue of caching be optimized? How can the load be reduced? And how can the successes be made to last?