BankCo is a leading digital-first financial institution transforming the way people interact with money. BankCo is pioneering mobile-first banking with simple, transparent products designed to eliminate unnecessary fees and put customers first. Today, BankCo serves millions of customers across multiple countries, offering a full suite of digital financial solutions powered by a proprietary cloud-based platform. BankCo's technology stack is built on modern microservices and advanced data infrastructure, enabling rapid innovation, seamless scalability, and unmatched reliability.
Finance
10,000 employees
BankCo (or “the Customer”) is one of the world’s largest digital consumer banks. It serves millions of customers, offering core banking services, peer-to-peer money transfer, loan and credit card products, and other related services.
The Problem
BankCo, a digital-native company, has built its Kubernetes-based application stack entirely on AWS. This is a classic cloud-native, microservices-heavy architecture. Thousands of developers push deployments into an environment with thousands of microservices. The result: thousands of alerts per day, several hundred terabytes of logs per day, and millions of unique metrics to monitor across Opsgenie, Splunk, and Prometheus, respectively. At this volume, debugging any production incident or anomaly is a time-consuming, tedious process.
The Solution
Flip AI engaged with BankCo in a proof of concept (or “POC”) in September 2023, with the intent of using our generative AI-powered observability intelligence platform. We started by onboarding a single business unit into production as soon as it could provide us with access to the relevant observability data. Within one month of being granted access to all necessary data, we deployed Flip into production for a subset of developers. Flip saw organic growth within BankCo as it helped developers close high-severity events in minutes. The stated success criteria of the POC were to reduce MTTR for incidents that Flip would automatically debug and to generate positive interest and feedback from the developers using the tool. It was clear what BankCo wanted to do next: become a licensed enterprise customer and roll Flip out to their entire developer force. Within two weeks of signing the contract, we onboarded developers from a second business unit to start the process of scaling out.
Enterprise Rollout
Having experienced the ROI of Flip AI firsthand, BankCo was ready for an enterprise rollout. This meant not only the systematic onboarding of the remaining business units and their respective applications, but also new capabilities, including expanded debugging capabilities for our LLM and new workflow integrations. Since the start of BankCo’s enterprise agreement, Flip AI has continued to innovate on behalf of our customer.
- Custom Database Debugging: The Customer uses a custom in-memory database, forked from an open source project, that is a critical horizontal component on which all applications depend. Flip AI fine-tuned our LLMs within a matter of weeks and successfully debugged incidents involving this database.
- Asynchronous Event-Driven Architecture Debugging: BankCo uses a combination of both HTTP-based synchronous service architecture and Kafka-based asynchronous event-driven service architecture. While the POC was scoped only to the former, Flip AI has delivered Kafka-oriented debugging to help BankCo developers get a more complete understanding of production incidents.
- Opsgenie Workflow Integration: BankCo uses Opsgenie for alerting. During the POC, Flip AI integrated with Opsgenie to automatically debug incidents that were within scope. To complete the workflow, we implemented a reverse integration, whereby RCAs generated by Flip AI are programmatically pushed back to the Opsgenie alert that triggered them in the first place. This reduced context-switching for developers and put each RCA back into the system where the incident was first raised.
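
For readers curious what a reverse integration of this kind can look like, here is a minimal sketch in Python. It is not Flip AI's actual implementation, which this case study does not describe. It assumes the RCA is attached as a note through the Opsgenie Alert API's "add note" endpoint; the function name `build_rca_note_request`, the `source` label, and the sample alert ID are hypothetical. The function only builds the HTTP request and does not send it.

```python
import json
import urllib.parse
import urllib.request

# Default Opsgenie API host; EU-hosted accounts use api.eu.opsgenie.com instead.
OPSGENIE_API_BASE = "https://api.opsgenie.com"


def build_rca_note_request(alert_id, rca_text, api_key, source="rca-sketch"):
    """Build (but do not send) a request that adds an RCA as a note on an alert.

    alert_id: ID of the Opsgenie alert that triggered the investigation.
    rca_text: the root cause analysis to attach (hypothetical content).
    api_key:  an Opsgenie API integration key.
    source:   free-text label for where the note came from (hypothetical value).
    """
    safe_id = urllib.parse.quote(alert_id, safe="")
    url = f"{OPSGENIE_API_BASE}/v2/alerts/{safe_id}/notes?identifierType=id"
    body = json.dumps({"note": rca_text, "source": source}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"GenieKey {api_key}",
            "Content-Type": "application/json",
        },
    )


# Example with placeholder values; nothing is sent over the network here.
request = build_rca_note_request(
    alert_id="example-alert-id",
    rca_text="Probable root cause: connection pool exhaustion in a downstream service.",
    api_key="PLACEHOLDER_KEY",
)
```

Passing the resulting object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the payload easy to inspect and test before anything is written back to a live alert.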
Shaping Tomorrow
There was a recognition at BankCo that generative AI is ushering in a totally new way of “observing” and “monitoring” systems, one where the burden of interpreting logs, metrics, traces, and other incident-related data is left to machine intelligence, leaving developers to focus on the results of such automation and the ensuing remediation efforts. There was also a recognition that, as Flip AI becomes the world’s most trusted software debugger, enterprises can start to refocus their engineering investments in customer innovation, rather than being mired in the costly activity of development operations.