Machine learning is at the heart of many recent developments in science and technology. Unfortunately, the critical role of humans is often overlooked in the field. Whether humans use machine learning classifiers explicitly as tools or deploy models within other products, a vital concern remains: if users do not trust a model or a prediction, they may not use it. With the development and widespread adoption of Deep Neural Networks (DNNs), the performance of AI models has improved significantly. As the impact of highly developed black-box machine learning models grows in the big-data era and draws recognition from several communities, the interpretability of artificial intelligence has been studied in various contexts. Studies of personalized agents, such as recommendation systems, and of critical decision-making have underscored the importance of machine learning explanation and AI transparency for end users. As a step in this direction, the European Union General Data Protection Regulation (GDPR) establishes a legal right to explanation. While present regulations focus primarily on the protection of personal data, broader requirements for algorithmic transparency and clarification from AI systems are expected to follow. However, the interpretability of DNNs is a considerable obstacle not only for end users but also for AI scientists and engineers, and the demand for predictable and accountable AI grows as tasks with higher sensitivity and social impact are increasingly entrusted to AI services.

Explainable Artificial Intelligence (XAI) systems are designed to explain the reasoning behind system decisions and predictions to end users. These explanations, whether provided on demand or in the form of a model description, can help users in many ways, for example by providing assurance about the safety and fairness of the AI decisions they rely on.

Nevertheless, the precision of the results produced by XAI methods remains loosely defined. In essence, an explanation is a way to verify the output decision made by an AI model or algorithm. For a cancer detection model operating on microscopic images, an explanation can be a map of the input pixels that contribute to the model's output. For a speech recognition model, an explanation might be the power-spectrum components that contributed most to the output decision at a particular time.

However, definitions of XAI are often generic, can be misleading, and should integrate some form of reasoning. Moreover, before this work, no general metric was available to measure the accuracy of an XAI method. Consequently, this paper proposes a new XAI evaluation metric and reviews the evaluation metrics of some recent work.

The rest of this paper is organized as follows: a background on XAI methods, related work, our proposed methods, and experiments. Our main contributions are summarized as follows:

– We propose a novel evaluation metric for common XAI methods, named Determining the Highest-Impact Segments (DHIS). It builds on the idea of the K-Means clustering algorithm, which partitions the image into distinct clusters of pixels based on their color proximity in the image plane (a sketch of this segmentation step follows this list).

– We also introduce an Intersection over Union (IOU) metric, an evaluation method that produces a numerical score from the comparison between bounding boxes highlighted by an XAI method (a sketch of the IOU computation follows this list).

– We present a comparative evaluation of the three most popular XAI methods, LIME, SHAP, and CAM, in which we measure the impact of regions on the model's prediction. We also report each XAI method's best-practice parameters on a specific, well-explained problem, which helps practitioners know when and why they should apply XAI methods to a model.
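
The listing below is a minimal sketch of the color-based K-Means segmentation step that DHIS builds on, assuming an RGB image stored as a NumPy array and scikit-learn's KMeans; the function name, the number of clusters, and the occlusion-based impact scoring shown in the usage comments are illustrative assumptions rather than the paper's exact procedure.

import numpy as np
from sklearn.cluster import KMeans

def segment_by_color(image, n_segments=8, seed=0):
    """Cluster pixels by color and return an (H, W) map of segment labels."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)   # one row per pixel (color features)
    kmeans = KMeans(n_clusters=n_segments, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(pixels)                # segment index per pixel
    return labels.reshape(h, w)

# Illustrative usage: rank segments by how much occluding each one changes
# the model's prediction (the model call below is hypothetical).
# segments = segment_by_color(img, n_segments=8)
# for s in range(segments.max() + 1):
#     occluded = img.copy()
#     occluded[segments == s] = 0
#     # impact[s] = model(img) - model(occluded)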
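
The following is a minimal sketch of the IOU score for two axis-aligned bounding boxes given as (x1, y1, x2, y2) corner coordinates; the box format and the example values are assumptions for illustration, not the paper's exact setup.

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Example: a box highlighted by an XAI method compared against another box.
# iou((10, 10, 60, 60), (30, 30, 80, 80))  # approx. 0.2195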