explainer¶
- class virtualitics_sdk.assets.explainer.Explainer(model, training_data, mode, label, name, feature_names=None, output_names=None, explain_class=1, kernel_width=None, use_shap=False, use_lime=False, n_classes=2, description=None, version=0, metadata={}, seed=None, **kwargs)¶
Bases:
Asset
The Explainer class takes in a model and a dataset and allows the user to easily generate NLP and graphical explanations of the model’s behavior. It does this primarily by explaining the importance of different features for a specific instance’s prediction.
- Parameters:
  - model (Model) – A machine learning model. For a classifier, the model must have a predict_proba function; for a regressor, the model must have a predict function. The data input type to the model is assumed to be the same as the input data itself.
  - training_data (Dataset) – The training set used by the explainers. It is recommended to use the same dataset the model was trained on. The dataset should also have been created with any additional parameters needed to make conversion to 'ordinal' encoding possible (especially if the dataset is one-hot encoded).
  - mode (Union[str, ExplanationTypes]) – The type of model and corresponding explanation. Must be either 'classification' or 'regression'.
  - label (str) – Label for Asset, see its documentation for more details.
  - name (str) – Name for Asset, see its documentation for more details.
  - feature_names (Optional[List[str]]) – List of the names of features in the dataset training_data. If None, they are inferred from the column names of training_data.
  - output_names (Optional[List[str]]) – List of names of the model’s outputs. For classifiers, output_names should be the names of each class output of the model’s predict_proba function. For regressors, output_names should be a singleton list containing the name of the target. If None, a non-descriptive output name is used.
  - explain_class (int) – Only used for classification explainers. The index of the model’s output class to be explained.
  - kernel_width (Optional[float]) – Hyperparameter for LimeTabularExplainer. Please refer to LIME’s documentation for details.
  - use_shap (bool) – Whether to initialize the SHAP explainer.
  - use_lime (bool) – Whether to initialize the LIME explainer.
  - n_classes (int) – Only used for classification explainers. Number of output classes of the model’s predict_proba function.
  - version (int) – Version of Asset, see its documentation for more details.
  - metadata (Optional[dict]) – Asset metadata, see its documentation for more details.
EXAMPLE:
# Imports
from virtualitics_sdk.assets.explainer import Explainer
. . .
# Example usage
data_train = store_interface.get_dataset(label="example", name="data train")
train_mins = data_train.get_object().min()
train_maxs = data_train.get_object().max()
bounds = {key: [train_mins[key], train_maxs[key]] for key in data_train.get_object().columns}
data_test = store_interface.get_dataset(label="example", name="data test")
data_attributes = store_interface.get_input("Data Attributes")
graph_data = store_interface.get_dataset(label="example", name="graph data")
kmeans_anomaly = store_interface.get_model(label="example", name="kmeans")
# explain instance
explainer = Explainer(model=kmeans_anomaly,
                      training_data=data_train,
                      label="example code",
                      name="kmeans explainer",
                      feature_names=data_attributes['features'],
                      output_names=['Normal', 'Anomaly'],
                      mode='classification',
                      explain_class=1,
                      kernel_width=0.5,
                      use_shap=True)
- check_lime_usage_()¶
- check_shap_usage_()¶
- explain(data=None, indices=None, method='manual', n=10, encoding=None, titles=None, instance_sets=[], num_features_explain=3, nlp_explanation_method='shap', return_as='plots', waterfall_positive=None, waterfall_negative=None, expected_title=None, predicted_title=None, top_n=None, show_title=True, show_description=True)¶
Takes in a list of instances and returns a list of cards/images of explanations for each instance. Can also be used to perform smart instance selection and describe specific interesting subsets of the input data.
- Parameters:
  - data (Union[DataFrame, Dataset, None]) – Set of data to be explained. With the manual instance selection method, instances are explained directly from this dataset; the entire dataset is used unless 'indices' is also specified, in which case only that subset is used. With the smart instance selection method, this dataset is used to identify relevant subsets of the data according to different criteria. Once smart instance selection has been performed, this argument does not need to be specified again in subsequent explanations using the smart method. Defaults to None.
  - indices (Optional[List]) – Indices of 'data' to be used in either the manual or smart instance selection method, defaults to None.
  - method (Union[str, InstanceSelectionMethod]) – Can be either 'manual' or 'smart'. If 'manual', the entire 'data' dataframe is explained. If 'smart', smart instance selection is performed, and the instance sets to explain can be specified with the 'instance_sets' argument. Defaults to 'manual'.
  - n (int) – Parameter for smart instance selection. Number of instances to put in each identified subset, defaults to 10.
  - encoding (Union[str, DataEncoding, None]) – Encoding of 'data'; can be 'ordinal', 'verbose', or 'one_hot'. If not specified, 'data' is assumed to be in the same encoding format as the model, as specified in this class’s constructor. Defaults to None.
  - titles (List[str]) – Titles of the returned images or cards. If not specified but instance_sets is specified, the instance set names are used as titles instead; otherwise, non-descriptive titles are used. Defaults to None.
  - instance_sets (List[str]) – Names of the subsets created by smart instance selection to explain. Only used when method is 'smart', defaults to [].
  - num_features_explain (int) – Number of features to include in the NLP explanation of each instance, defaults to 3.
  - nlp_explanation_method (Union[str, NLPExplanationMethod]) – Type of NLP explanation to use. Can be either LIME or SHAP; the output explanation is ordered by the feature importances attributed by the corresponding explainer. Defaults to 'shap'.
  - return_as (str) – Whether to return the output as "cards", "images", or "plots", defaults to "plots".
  - waterfall_positive (Optional[str]) – The color of the positive waterfall plot bars, defaults to "#3B82F6".
  - waterfall_negative (Optional[str]) – The color of the negative waterfall plot bars, defaults to "#EF4444".
  - expected_title (Optional[str]) – The title to show for the expected value on the generated plot, defaults to "Expected Value".
  - predicted_title (Optional[str]) – The title to show for the final prediction on the generated plot, defaults to "Final Prediction".
  - top_n (Optional[int]) – If set, only the top N most significant values are shown in the waterfall plot, defaults to None.
  - show_title (bool) – Whether to show the title on the page when rendered, defaults to True.
  - show_description (bool) – Whether to show the description on the page when rendered, defaults to True.
- Raises:
  - ValueError – If top_n is an invalid number.
  - NotImplementedError – If the requested return_as type is not yet supported.
- Return type:
  Union[List[Image], List[Card], List[WaterfallPlot]]
- Returns:
  List of plots/cards/images explaining the input instances.
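A minimal sketch of a manual explanation call, reusing the explainer and data_test objects from the constructor example above; the indices, titles, and top_n value are illustrative, not required:
EXAMPLE:
# Explain three specific test-set rows as waterfall plots
plots = explainer.explain(data=data_test,
                          indices=[0, 1, 2],
                          method='manual',
                          titles=['Row 0', 'Row 1', 'Row 2'],
                          num_features_explain=3,
                          nlp_explanation_method='shap',
                          return_as='plots',
                          top_n=5)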
- filter_features(X, encoding=None)¶
- Return type:
DataFrame
- get_data_likelihood(instance, encoding=None)¶
Returns a string indicating the likelihood of the given instance occurring in the training dataset.
- Parameters:
  - instance – The instance for which the function finds the likelihood. Should have all the same features as instances from the training dataset.
  - encoding (Union[str, DataEncoding, None]) – The encoding that the given instance is in. If None, assumes it is the same encoding as the training dataset. Defaults to None.
- Return type:
str
- Returns:
One of two strings, “Unlikely” or “Likely”
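A minimal sketch, reusing the explainer and data_test objects from the constructor example above and assuming a single row of the dataset can be passed as the instance:
EXAMPLE:
# Check how typical the first test row is of the training data
instance = data_test.get_object().iloc[0]  # assumes a single row is an acceptable instance
likelihood = explainer.get_data_likelihood(instance)  # "Likely" or "Unlikely"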
- get_feature_explanation(instance, encoding=None, explanation_method=None, include_importance=True)¶
Returns a dictionary mapping feature names to explanations of their relationship to the rest of the training dataset. For example, a numerical feature will be described using the quantile it falls under.
- Parameters:
  - instance – The instance to be explained. Should have all the same features as instances from the training dataset.
  - encoding (Union[str, DataEncoding, None]) – The encoding that the given instance is in. If None, assumes it is the same encoding as the training dataset. Defaults to None.
  - explanation_method (Union[str, NLPExplanationMethod, None]) – The method by which to create the NLP explanation, defaults to NLPExplanationMethod.SHAP.
  - include_importance (bool) – If True, adds an additional string describing the impact the feature made on the prediction, defaults to True.
- Return type:
  Dict[str, str]
- Returns:
  A dictionary mapping feature names to their explanation strings.
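A minimal sketch, under the same assumptions as the get_data_likelihood example above:
EXAMPLE:
# Describe each feature of the instance relative to the training data
explanations = explainer.get_feature_explanation(instance, include_importance=True)
for feature, text in explanations.items():
    print(f"{feature}: {text}")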
- get_instance_hash(instance, encoding=None)¶
- get_lime_explanation(instance, encoding=None, n_explanation_features=10, model_name='BayesianRidge', **kwargs)¶
- get_shap_explanation(instance, encoding=None)¶
- get_text_explanation(instance, encoding=None, num_features_explain=1, explanation_method='shap', include_importance=True)¶
- Return type:
str
- static get_waterfall_explanation()¶
- initialize_dataset_model_stats()¶
- initialize_encodings(feature_names=None)¶
- Return type:
None
- initialize_explainer_lime()¶
- Return type:
None
- initialize_explainer_shap()¶
- Return type:
None
- initialize_explainers()¶
- Return type:
None
- initialize_explanation_mode(exp_type)¶
- Return type:
None
- initialize_model(model, training_data)¶
- initialize_output_names(output_names=None)¶
- Return type:
None
- lime_target_func(x)¶
- make_model_regressor(model_name, **kwargs)¶
- model_preprocess(X)¶
- Return type:
DataFrame
- pick_instance(instance_set, n_pick=1)¶
- plot_waterfall_instance(instance, encoding=None)¶
- Return type:
Figure
- shap_target_func(x, preprocess=True)¶
- show_significant_vars(instance, encoding=None, title='')¶
- smart_instance_selection(data, n=10, encoding=None)¶
- train_data_likelihood_model(retrain=False, *args, **kwargs)¶
- Return type:
None
- class virtualitics_sdk.assets.explainer.ExplainerReturnTypes(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
ExtendedEnum
- CARDS = 'cards'¶
- IMAGES = 'images'¶
- PLOTS = 'plots'¶
- class virtualitics_sdk.assets.explainer.ExplanationTypes(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
ExtendedEnum
- CLASSIFICATION = 'classification'¶
- REGRESSION = 'regression'¶
- class virtualitics_sdk.assets.explainer.InstanceSelectionMethod(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
ExtendedEnum
- MANUAL = 'manual'¶
- SMART = 'smart'¶
- class virtualitics_sdk.assets.explainer.InstanceSet(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
Enum
- CERTAIN_HIT = 'certain_hit'¶
- CERTAIN_MISS = 'certain_miss'¶
- UNCERTAIN_HIT = 'uncertain_hit'¶
- UNCERTAIN_MISS = 'uncertain_miss'¶
- class virtualitics_sdk.assets.explainer.NLPExplanationMethod(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
ExtendedEnum
- LIME = 'lime'¶
- SHAP = 'shap'¶