DBdoctor: A Fine-grained and Non-intrusive Performance Diagnosis Platform for Databases
Published in 42nd IEEE International Conference on Data Engineering, 2026
A non-intrusive performance diagnosis platform for databases that leverages eBPF for fine-grained metric collection and white-box anomaly diagnosis.
Abstract
Database performance anomalies are prevalent in large-scale deployments and often result in substantial business losses, making online diagnosis and resolution of anomalies indispensable in production environments. Existing anomaly diagnosis frameworks rely on sample-based external monitoring tools, which suffer from two critical limitations: (1) an unfavorable granularity–overhead trade-off, where low-frequency sampling misses transient anomalies while high-frequency sampling introduces significant overhead, and (2) black-box inference due to a lack of internal database context. To address these limitations, we propose DBdoctor, a non-intrusive performance diagnosis platform applicable across different database engines. DBdoctor introduces a novel event-based metric collection framework that leverages eBPF to collect fine-grained external and internal database metrics with low overhead. These metrics are modeled as SQL temporal resource metrics and dependency graphs to enable white-box anomaly diagnosis, supporting precise root cause identification. DBdoctor has been deployed in large enterprises such as China Unicom and Hisense to manage thousands of database instances. Experimental evaluations using popular benchmarks and real-world workloads demonstrate that DBdoctor achieves higher diagnosis accuracy than existing approaches, with comparable or lower performance overhead.
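
The granularity–overhead trade-off described in the abstract can be illustrated with a small simulation. This is only a toy sketch under stated assumptions: it does not use eBPF and is not DBdoctor's implementation. The names `Event`, `sampled_view`, and `event_view` are hypothetical, and the timings are invented for illustration. It compares a poller that inspects activity at fixed intervals with a collector that records each operation when it completes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    start_ms: int      # when the operation began (illustrative value)
    duration_ms: int   # how long the operation stayed active

def sampled_view(events, interval_ms, horizon_ms):
    """Poll at fixed intervals; return (indices of events seen, number of polls)."""
    seen = set()
    polls = 0
    for t in range(0, horizon_ms, interval_ms):
        polls += 1
        for i, e in enumerate(events):
            # An event is visible only if it is active at the instant of a poll.
            if e.start_ms <= t < e.start_ms + e.duration_ms:
                seen.add(i)
    return seen, polls

def event_view(events):
    """Record every event once, as a completion-triggered hook would."""
    return set(range(len(events))), len(events)

# One long operation and two short-lived ones (hypothetical workload).
events = [Event(0, 2500), Event(1200, 300), Event(3100, 50)]

coarse_seen, coarse_polls = sampled_view(events, interval_ms=1000, horizon_ms=5000)
fine_seen, fine_polls = sampled_view(events, interval_ms=10, horizon_ms=5000)
hook_seen, hook_records = event_view(events)

print("1 s sampling  :", sorted(coarse_seen), "polls:", coarse_polls)
print("10 ms sampling:", sorted(fine_seen), "polls:", fine_polls)
print("event-based   :", sorted(hook_seen), "records:", hook_records)
```

In this toy workload, coarse sampling observes only the long-running operation, fine sampling observes all three at the cost of many more polls, and the event-based collector observes all three with one record per operation. Real collection costs depend on the probe mechanism and the workload, so the counts here indicate the shape of the trade-off rather than measured overhead.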
