Rethinking Market Data Monitoring: Enhanced Operational Security and Cost Reduction with Prometheus at a Swiss Bank

About the Client

Anonymous Swiss Client: Swiss Bank

Challenge

The client needed to monitor a complex, distributed system. The market data platform consisted of approximately 30 components, including a central component based on the Solace Event Broker and third-party software. Without centralized monitoring, errors were detected late and data could be lost.

Distributed Architecture: The market data platform was spread across multiple Red Hat OpenShift clusters and network zones.
Complexity: The overall system consisted of around 30 individual components, including third-party software.
Lack of Monitoring: A comprehensive, centralized monitoring system did not exist. Monitoring was primarily assigned to application management.

Solution Approach

To address these challenges, we adopted a standards-based approach that seamlessly integrated into the client’s existing OpenShift environment. Since Prometheus-based monitoring was already used on the OpenShift platform, no additional licensing costs were incurred, and the open solution provided maximum flexibility.

OpenShift Monitoring Stack (Prometheus): Built on the OpenShift-integrated monitoring stack, which includes Prometheus, Alertmanager, and Grafana, we established a centralized monitoring solution.
Helm & ArgoCD: The deployment of all monitoring components was automated and reproducible via ArgoCD and Helm.
Kubernetes & Application Monitoring: In addition to generic Kubernetes rules for infrastructure monitoring (CPU, memory, pod status, etc.), specific service monitors and Prometheus rules were defined for each individual component of the market data platform.
Business Monitoring: In addition to technical monitoring, business metrics were also integrated into Prometheus. This enabled the monitoring of business processes and alerts in case of business errors.
AlertmanagerConfig: A complex Alertmanager configuration was implemented to define detailed alert routing and mute intervals. In total, around 30 Prometheus rules were established.
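
To illustrate the business monitoring idea, here is a minimal sketch of how an application could render a business metric in the Prometheus text exposition format so that a service monitor can scrape it. It is not the client's implementation: the metric name marketdata_last_update_age_seconds, the feed label, and the helper functions are hypothetical and chosen only for this example. A production service would more likely use an official Prometheus client library.

```python
from typing import Mapping, Sequence, Tuple

# One sample is a set of labels plus a numeric value.
Sample = Tuple[Mapping[str, str], float]


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: Sequence[Sample]) -> str:
    """Render one gauge metric family in the Prometheus text exposition format.

    Assumes help_text contains no backslashes or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Sort labels so the output is stable between scrapes.
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical business metric: seconds since the last price update per feed.
    print(render_gauge(
        "marketdata_last_update_age_seconds",
        "Seconds since the last price update was received (hypothetical metric).",
        [({"feed": "equities"}, 1.5), ({"feed": "fx"}, 42.0)],
    ))
```

A Prometheus rule could then alert when such a value exceeds a threshold agreed with the business department, which is where the domain knowledge comes in.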

Results

The implemented monitoring solution led to measurable improvements and qualitative benefits for the client:

Increased Operational Security: Comprehensive monitoring was a critical factor in the successful introduction of the market data platform and ensured its operational security.
Standardization: By leveraging the OpenShift Monitoring Stack as the foundation, the solution is easily extendable and can serve as a blueprint for future monitoring requirements.
Central Monitoring Platform: The solution enables global integration into the existing monitoring platform.
End-to-End System Monitoring: The monitoring platform now oversees all technical and business components.
Cost Reduction: The implementation and operational costs of the monitoring solution were significantly reduced.

Client Statement

Although we are not allowed to use a formal quote, a LinkedIn post humorously expressed the client's satisfaction: «You proverbially saved our a…. Top job!»

Conclusion & Lessons Learned

This project confirmed that the OpenShift Monitoring Stack provides an excellent foundation for application monitoring. Key takeaways include:

OpenShift Monitoring as a Foundation: The OpenShift Monitoring Stack offers a solid base for comprehensive application monitoring in OpenShift environments.
Domain Expertise for Business Monitoring: Integrating business metrics requires deep domain knowledge and close collaboration with business departments.
Complexity of AlertmanagerConfig: Extensive Alertmanager configurations can quickly become complex and require expertise in maintenance and rule management.
Avoiding Alert Fatigue: An iterative approach to alert configuration is essential. Alerts must be actionable, intelligent, and urgent to avoid being ignored.
Prometheus Monitoring Stack Delivers: The Prometheus Monitoring Stack has proven to be a powerful and flexible solution.
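
Two common levers against alert fatigue are requiring a condition to persist before an alert fires and suppressing notifications during known quiet windows. The sketch below models both ideas in plain Python. It only loosely mirrors the concepts of a "for" duration in a Prometheus alerting rule and a mute interval in Alertmanager. It is not how either tool is implemented, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional


@dataclass
class MuteInterval:
    """Daily window during which notifications are suppressed (illustrative)."""
    start: time
    end: time

    def contains(self, moment: datetime) -> bool:
        t = moment.time()
        if self.start <= self.end:
            return self.start <= t < self.end
        # Window wraps past midnight, e.g. 22:00 to 06:00.
        return t >= self.start or t < self.end


@dataclass
class PendingAlert:
    """Fires only after the condition has held continuously for hold_for."""
    hold_for: timedelta
    active_since: Optional[datetime] = None

    def evaluate(self, condition: bool, now: datetime) -> bool:
        if not condition:
            # Condition cleared: reset, so a short blip never fires.
            self.active_since = None
            return False
        if self.active_since is None:
            self.active_since = now
        return now - self.active_since >= self.hold_for


def should_notify(alert: PendingAlert, condition: bool,
                  now: datetime, mute: MuteInterval) -> bool:
    """Notify only if the alert is firing and the moment is not muted."""
    firing = alert.evaluate(condition, now)
    return firing and not mute.contains(now)
```

Tuning the hold duration and the mute windows per alert is typically part of the iterative process described above: start conservative, observe which alerts are ignored, and adjust.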

Do you face similar monitoring challenges? Contact us for a non-binding conversation.