The performance and availability of a company’s applications are critical to maintaining uninterrupted business processes, and even small performance issues can have costly consequences. For example, in customer-facing solutions, slow system response is likely to hurt SEO rankings and conversion rates and, consequently, reduce revenue and even damage brand loyalty. According to Akamai research, a 100-millisecond delay in website load time can hurt conversion rates by 7 percent.
Modern applications, which are large and often distributed and asynchronous, are especially vulnerable to failures and slowdowns and thus require a holistic approach to maintaining their health and availability. As there is little guidance on how companies can take thorough care of application performance in large IT environments, we present the following application performance management 101.
Application performance management ≠ application monitoring
As an application management provider, we’ve often seen the relatively new concept of “application performance management” misinterpreted. Most often, people mistake it for application monitoring and use the two terms interchangeably. However, application monitoring is only a way to reveal in detail how the system behaves over time. Despite its substantial value, pure monitoring is not enough to meet the needs of complex modern applications and is only a supporting activity for performance management. Application performance management is a much broader term that comprises, apart from monitoring, problem resolution, problem prevention, and continuous application improvement.
The benefits of proper application performance don’t end with improved conversion rates and stronger brand loyalty. For the organization, the practical benefits of application performance management include:
Increased business efficiency
Properly administered application performance management helps make badly performing business applications a thing of the past. In particular, it helps deal with delays in overloaded processes, downtime, and disruptions, which can significantly hinder employees’ performance and almost double the time a task takes.
Reduced application TCO
- Minimized costs of further improvements. Application changes become less expensive because code integration problems can be identified and solved before an application goes into production.
- Lowered support costs. The data collected in the course of application performance management makes support staff more efficient. Support engineers can address performance and availability issues faster and solve more problems in the same amount of time.
- Well-planned cloud capacity. A few extra milliseconds of request processing time will hardly make a difference to your users, but across many requests it can add up to significant cloud spending. Continuous application performance management lets you estimate your optimal cloud capacity needs and use computing resources more effectively, for example, by introducing dynamic consumption of cloud resources.
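A back-of-the-envelope calculation shows why small per-request savings matter. It assumes duration-based billing, as in many serverless offerings. The request volume, processing times, and per-second price below are hypothetical placeholders, not real provider rates.

```python
# Back-of-the-envelope estimate of how per-request processing time
# translates into billed compute time. All numbers are hypothetical
# and assume duration-based billing.

def monthly_compute_cost(requests_per_month: int,
                         seconds_per_request: float,
                         price_per_compute_second: float) -> float:
    """Return the monthly cost of the compute time consumed by request processing."""
    compute_seconds = requests_per_month * seconds_per_request
    return compute_seconds * price_per_compute_second


if __name__ == "__main__":
    REQUESTS = 100_000_000   # hypothetical monthly request volume
    PRICE = 0.00002          # hypothetical price per compute-second
    before = monthly_compute_cost(REQUESTS, 0.250, PRICE)  # 250 ms per request
    after = monthly_compute_cost(REQUESTS, 0.200, PRICE)   # 200 ms per request
    print(f"before: {before:.2f}, after: {after:.2f}, saved: {before - after:.2f}")
```

Under these assumed numbers, cutting 50 milliseconds per request (a difference users would barely notice) reduces the monthly bill by a fifth.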
SLA monitoring and reporting
Application performance management data helps bring transparency into collaboration with third-party vendors (be it a SaaS provider or a provider of application development and management outsourcing services) and ensures that the quality of service stays at the level your business expects. Application performance metrics can be turned into KPIs such as average page load time, the number of service unavailability incidents, and more.
Application performance management starts with anomaly detection and localization. To do this, the responsible team applies a wide range of techniques.
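The sketch below illustrates how raw availability checks can be turned into the two KPIs mentioned above, plus an availability percentage. The `Check` record is our own construct for the example. So is the rule that a run of consecutive failed checks counts as one incident. Neither is a standard SLA definition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Check:
    """One hypothetical monitoring sample for a vendor-operated service."""
    load_time_ms: float   # measured page load time (ignored when unavailable)
    available: bool       # whether the service responded successfully


def sla_kpis(checks: List[Check]) -> dict:
    """Turn raw monitoring samples into SLA-style KPIs."""
    up = [c for c in checks if c.available]
    avg_load: Optional[float] = (
        sum(c.load_time_ms for c in up) / len(up) if up else None
    )
    # Assumption: a run of consecutive failed checks is one unavailability incident.
    incidents = 0
    previously_available = True
    for c in checks:
        if not c.available and previously_available:
            incidents += 1
        previously_available = c.available
    availability = len(up) / len(checks) * 100 if checks else None
    return {
        "avg_page_load_ms": avg_load,
        "unavailability_incidents": incidents,
        "availability_pct": availability,
    }
```

In practice, the samples would come from your monitoring tool over the SLA reporting period.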
Application component monitoring
Component monitoring involves tracking the performance metrics and availability of all application tiers and components: servers, OS, services, integration components, third-party APIs, and databases.
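Here is a minimal sketch of component monitoring. It assumes each component exposes a cheap probe, such as a database ping or an HTTP call to a third-party API, that raises an exception when the component is unavailable. The probe functions and the one-second "slow" threshold are hypothetical. A real setup would run such checks on a schedule and feed the results into a monitoring tool.

```python
import time
from typing import Callable, Dict


def check_components(probes: Dict[str, Callable[[], None]],
                     slow_threshold_s: float = 1.0) -> Dict[str, str]:
    """Run one probe per component and classify it as up, slow, or down.

    A probe is any callable (hypothetical here) that raises an exception
    when its component is unavailable.
    """
    statuses = {}
    for name, probe in probes.items():
        start = time.perf_counter()
        try:
            probe()
        except Exception:
            statuses[name] = "down"
            continue
        elapsed = time.perf_counter() - start
        statuses[name] = "slow" if elapsed > slow_threshold_s else "up"
    return statuses
```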
Business transaction monitoring
Business transaction monitoring involves tracking critical business transactions across the entire application infrastructure. This means ensuring that transactions complete and that their timing is acceptable, as well as identifying weak points along the request’s journey. Transaction health monitoring is particularly relevant for complex distributed transactions across internal or external systems where message loss is critical.
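The sketch below makes the idea concrete. It groups events by transaction ID and checks each transaction for completeness, overall duration, and the slowest hop. It assumes every component logs a `(transaction_id, step, timestamp)` event. The step names in `EXPECTED_STEPS` are hypothetical stand-ins for a real business transaction.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical steps of a business transaction, in their expected order.
EXPECTED_STEPS = ["order_received", "payment_authorized", "order_confirmed"]


def analyze_transactions(events: List[Tuple[str, str, float]],
                         max_duration_s: float) -> Dict[str, dict]:
    """Group (transaction_id, step, timestamp) events into per-transaction reports."""
    by_txn: Dict[str, Dict[str, float]] = defaultdict(dict)
    for txn_id, step, ts in events:
        by_txn[txn_id][step] = ts

    reports = {}
    for txn_id, steps in by_txn.items():
        missing = [s for s in EXPECTED_STEPS if s not in steps]
        report = {"complete": not missing, "missing_steps": missing}
        if not missing:
            times = [steps[s] for s in EXPECTED_STEPS]
            report["duration_s"] = times[-1] - times[0]
            report["too_slow"] = report["duration_s"] > max_duration_s
            # The hop with the largest gap is the weak point of this request's journey.
            gaps = {f"{a}->{b}": steps[b] - steps[a]
                    for a, b in zip(EXPECTED_STEPS, EXPECTED_STEPS[1:])}
            report["slowest_hop"] = max(gaps, key=gaps.get)
        reports[txn_id] = report
    return reports
```

Incomplete transactions are flagged with their missing steps. These are the cases where a message may have been lost between systems.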
Real user monitoring
Real user monitoring (as done by tools such as Google Analytics) is the passive collection of performance data for the application services that clients access directly. It provides insights into real traffic and errors on the server and in the frontend. It also identifies the most popular functionality and reveals differences in performance when the application is accessed from different devices, browsers, or parts of the world.
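Real user monitoring tools typically send a small beacon with timing data from each user’s browser. The sketch below assumes these beacons have already been collected as `(segment, load_time_ms)` pairs. The segment is whatever dimension you want to compare, such as device type, browser, or region. For each segment, the sketch reports the median and the 95th percentile, computed with the nearest-rank method.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def p95(values: List[float]) -> float:
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def summarize_rum(beacons: Iterable[Tuple[str, float]]) -> Dict[str, dict]:
    """Aggregate (segment, load_time_ms) beacons sent from real users' browsers."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for segment, load_ms in beacons:
        groups[segment].append(load_ms)
    return {
        seg: {"count": len(vals), "median_ms": median(vals), "p95_ms": p95(vals)}
        for seg, vals in groups.items()
    }
```

We report the 95th percentile alongside the median because the slowest users are often the first to leave.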
Synthetic monitoring
For synthetic monitoring, developers create special scripts that systematically simulate user actions in the application. This helps find flaws in the application before real users are affected.
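A synthetic check can be as simple as replaying a fixed user journey and timing each step. In the sketch below, the journey and its URLs are hypothetical placeholders. The `fetch` parameter lets the check be tested without network access. Real requests go through the standard-library `urllib.request`.

```python
import time
import urllib.request
from typing import Callable, List, Tuple

# Hypothetical user journey: (step name, URL). Replace with your application's pages.
JOURNEY = [
    ("open home page", "https://shop.example.com/"),
    ("search product", "https://shop.example.com/search?q=laptop"),
    ("open cart", "https://shop.example.com/cart"),
]


def http_get(url: str) -> int:
    """Fetch a URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status


def run_synthetic_check(journey: List[Tuple[str, str]],
                        fetch: Callable[[str], int] = http_get,
                        max_step_s: float = 2.0) -> List[str]:
    """Replay the journey step by step and return a list of detected problems."""
    problems = []
    for step, url in journey:
        start = time.perf_counter()
        try:
            status = fetch(url)
        except Exception as exc:  # network errors, timeouts, HTTP error statuses
            problems.append(f"{step}: failed ({exc})")
            break  # later steps depend on this one, as in a real user session
        elapsed = time.perf_counter() - start
        if status != 200:
            problems.append(f"{step}: unexpected status {status}")
        elif elapsed > max_step_s:
            problems.append(f"{step}: slow ({elapsed:.2f}s)")
    return problems
```

Scheduled every few minutes from several locations, a check like this can surface a broken page before real users run into it.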
Metrics can only tell you that something is wrong. To find the real source of a problem, we turn to logs. The application performance management team can either scroll through log data manually or use dedicated log analysis tools (such as Logstash, Graylog, Logmatic, or Splunk). To keep log processing effective and enable advanced log analysis methods, it’s good practice to require developers to keep logs structured and well described and to follow standards, e.g., ISO 8601 for dates and times.
The tools used for application performance management can be roughly grouped into several segments, each with pros and cons for a particular monitoring case. Two key distinctions are where the tools come from and how they are deployed.
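As an example of structured logging, the sketch below uses Python’s standard `logging` module with a custom formatter. The formatter writes each record as a single JSON object with an ISO 8601 UTC timestamp. The field names and the `context` key for extra data are our own conventions, not a requirement of any log analysis tool.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object with an ISO 8601 UTC timestamp."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed via logging's `extra={"context": {...}}` argument.
        context = getattr(record, "context", None)
        if context:
            entry["context"] = context
        return json.dumps(entry)


def make_logger(name: str) -> logging.Logger:
    """Create a logger that writes JSON lines to stderr."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    log = make_logger("checkout")
    log.info("payment authorized",
             extra={"context": {"order_id": "A-1001", "duration_ms": 184}})
```

Because every line is valid JSON with a consistent timestamp format, log analysis tools can index and filter the fields without custom parsing rules.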
Custom or off-the-shelf tools
You can either choose from the variety of tools available on the market (AppDynamics, Stackify, Dynatrace, etc.) or use homegrown software. The latter is increasingly popular among companies with complex, mature IT infrastructures that know exactly where their performance weak points are and want to address them in a more targeted way.
Agent-based or agentless tools
With agent-based tools, a part of the tool (the agent) is installed directly on the server or service and collects data from inside it. This provides more detailed information about how the software performs but may consume a noticeable amount of server resources and slow down the monitored component.
Agentless tools assess software state remotely. They are easier and faster to deploy but offer more limited metric coverage.
It’s not the tools that are critical in application performance management but the measures you take based on the data received from them. Mature performance management runs as follows:
Two of the stages deserve special mention. To be truly effective, alerting should reach only the relevant stakeholders and concern only serious issues. Reporting on the detected problems, how they were solved, and the impact they had shouldn’t be neglected either. In the long term, proper reporting helps choose a better application development path and make the right decisions about the application’s evolution.
The main driver of application performance management varies from company to company. It can be the responsibility of a performance engineer or a DevOps team, or part of the responsibilities of technical support, which is inherently accountable for application health and availability. A designated person or a small group of stakeholders should own application performance management as a process within the company to ensure its efficiency and consistency and keep the effort focused.
However, none of them can bear full responsibility for (or have all the skills required for) maintaining and managing overall application performance, as the sources of performance degradation can be spread across all software layers and components. They may include:
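The alerting principle can be sketched as a filter plus a routing table. Only events at or above a severity threshold reach people, and only the owners of the affected component are notified. The component names, team names, and severity levels below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

SEVERITY = {"info": 0, "warning": 1, "critical": 2}

# Hypothetical mapping of components to the teams that own them.
OWNERS: Dict[str, List[str]] = {
    "database": ["dba-oncall"],
    "payment-api": ["payments-team", "devops-oncall"],
}
DEFAULT_OWNERS = ["devops-oncall"]  # fallback for components without an owner


@dataclass
class Event:
    component: str
    severity: str
    message: str


def route_alerts(events: List[Event], min_severity: str = "critical") -> Dict[str, List[str]]:
    """Return {recipient: [messages]} for events at or above the severity threshold."""
    threshold = SEVERITY[min_severity]
    outbox: Dict[str, List[str]] = {}
    for ev in events:
        if SEVERITY[ev.severity] < threshold:
            continue  # below threshold: not sent to anyone (keep it for dashboards and reports)
        for recipient in OWNERS.get(ev.component, DEFAULT_OWNERS):
            outbox.setdefault(recipient, []).append(f"[{ev.component}] {ev.message}")
    return outbox
```

Here, the database warning never reaches the on-call DBA. This keeps alerts rare enough that people still act on them.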
- Spikes in traffic.
- Slow web pages.
- Overloaded or incomplete transactions.
- Exceeded queue limits.
- Tangled code structures.
- Slow SQL queries or too many database queries.
- Inefficient memory usage.
- Slow or unreliable third-party services and failed calls to external HTTP web services.
And the list is not exhaustive. Thus, for application performance management to succeed, the first big step should be bringing together all stakeholders across the application life cycle. A full application performance management team requires the part-time involvement of at least:
- QA specialists.
- The operations team.
- Application administrators.
- Business analysts.
- Application architects.
- DevOps engineers.
- The monitoring team.
- Performance engineers (if any).
- Representatives from the concerned departments.
The major costs of application performance management lie in:
- Review of the current state of the IT infrastructure. This includes activities such as a code health check, identifying the primary candidates for application performance management and creating a backlog, and initial sizing of the future monitoring solution.
- Development of monitoring tools, purchase of product licenses (annual/monthly fees), and setup and maintenance of the monitoring solution.
- A dedicated monitoring team. It will manage the monitoring solutions, interpret the monitoring metrics for the whole application management team, and look after the database that stores them.
Where application performance management will have the greatest impact
If you want to experience the most significant improvements and get a larger ROI from application performance management investments:
- Start with business-critical applications that directly influence the company’s revenue (critical business processes) or the availability of the company’s services to clients. Examples of such applications include content management systems, customer and self-service portals, ecommerce solutions, and order processing modules.
- Cover high-load business transactions and external interfaces.
Lastly, let’s look at the challenges application performance management faces with certain types of software.
When opting for SaaS or PaaS, it’s very important to monitor the response time, errors, and availability of the cloud services (e.g., a cloud storage service). Although you can’t pinpoint exactly where a problem lies, the collected data can be used to file a request with the provider’s service desk and to monitor SLA compliance.
Again, you don’t have access to the source code, so major changes to improve application performance aren’t possible. However, application performance management is still worthwhile: at the very least, you’ll be able to quickly identify performance problems, detect flaws caused by recent customizations, scale up resources, or optimize the application’s database.
IoT, big data
The main challenge of application performance management for IoT and big data solutions is the sheer volume of monitoring data. To mitigate it, make sure that only the necessary data is collected, that it is combined into batches, and that longer intervals are set between transmissions.
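These three measures can be combined on the device side, as in the sketch below. Unneeded metrics are dropped and the rest are buffered. A batch is sent only when a transmission interval has elapsed or the buffer fills up. The `send` callback, the metric names, and the default interval are assumptions. The injectable `clock` makes the behavior easy to test.

```python
import time
from typing import Callable, Dict, List, Set


class TelemetryBatcher:
    """Buffer monitoring samples on a device and send them in batches.

    - Only metrics listed in `wanted` are kept (collect only the data you need).
    - Whenever a new sample is recorded, the buffer is sent if `interval_s`
      has passed since the last transmission or it has reached `max_batch`.
    """

    def __init__(self, send: Callable[[List[Dict]], None], wanted: Set[str],
                 interval_s: float = 300.0, max_batch: int = 500,
                 clock: Callable[[], float] = time.monotonic):
        self.send = send              # hypothetical transport, e.g., an HTTP or MQTT upload
        self.wanted = wanted
        self.interval_s = interval_s
        self.max_batch = max_batch
        self.clock = clock
        self.buffer: List[Dict] = []
        self.last_sent = clock()

    def record(self, metric: str, value: float) -> None:
        if metric not in self.wanted:
            return  # discard metrics nobody uses
        self.buffer.append({"metric": metric, "value": value, "t": self.clock()})
        if (len(self.buffer) >= self.max_batch
                or self.clock() - self.last_sent >= self.interval_s):
            self.flush()

    def flush(self) -> None:
        """Send everything buffered so far as one batch."""
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []
        self.last_sent = self.clock()
```

Longer intervals mean fewer, larger transmissions, at the cost of seeing the data later. The interval should therefore match how quickly you need to react to a problem.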
Wrapping it up
In this article, we’ve discussed what application performance management is and what we see as the best-practice approach to it. We advocate creating an extended application performance management team and favoring performance management over pure performance monitoring. Monitoring should remain a thoughtfully balanced component of performance management, without unnecessary metrics, excessive logging, or noisy alerting.