Timely localization of the root causes of gray failures is essential for maintaining the stability of the server OS. Existing gray failure localization methods are typically intrusive: they require modifying application source code, which limits their practical deployment. In this paper, we propose \textit{GrayScope}, a method for non-intrusively localizing the root causes of gray failures based on metric data from the server OS. Its core idea is to combine expert knowledge with causal learning techniques to capture more reliable inter-metric causal relationships. It then incorporates metric correlations and anomaly degrees to identify potential root causes of gray failures. Additionally, it infers the gray failure propagation paths between metrics, which provides interpretability and improves operators’ efficiency in mitigating gray failures. We evaluate \textit{GrayScope} on 1241 injected gray failure cases and 135 cases from industrial experiments at Huawei. \textit{GrayScope} achieves an $AC@5$ of 90\% and an interpretability accuracy of 81\%, significantly outperforming popular root cause localization methods. We have made the code publicly available to facilitate further research.