xai_utils

predict_backend.utils.xai_utils.data_percentile(feature_name, data_val, categorical_names, X)

For numerical features, computes the percentile of a data value. For categorical features, just returns the name of the category.

Parameters:

feature_name (str) – The name of the column that data_val belongs to.
data_val (Union[int, float, complex, number, str, object]) – The value of the feature that this function will evaluate.
categorical_names (List[str]) – The list of categorical columns in the dataset.
X (DataFrame) – The dataframe from which relevant statistics will be computed.

Return type:

Union[str, float]

Returns:

If the feature is categorical, returns the data value as a string. If the feature is numerical, returns the percentile of the value, expressed as a fraction between 0 and 1.
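
The behavior described above can be sketched in plain Python. This is a hypothetical re-creation, not the library's implementation: the name sketch_data_percentile is made up, a dict of lists stands in for the DataFrame X, and the percentile is assumed to be the fraction of reference values less than or equal to data_val.

```python
from typing import List, Mapping, Sequence, Union

Number = Union[int, float]


def sketch_data_percentile(
    feature_name: str,
    data_val: Union[Number, str],
    categorical_names: List[str],
    columns: Mapping[str, Sequence],
) -> Union[str, float]:
    """Hypothetical stand-in for data_percentile (not the library code)."""
    # Categorical features: return the category name as a string.
    if feature_name in categorical_names:
        return str(data_val)
    # Numerical features: fraction of reference values <= data_val.
    # This percentile definition is an assumption made for illustration.
    values = columns[feature_name]
    return sum(v <= data_val for v in values) / len(values)


# A dict of lists stands in for the DataFrame X.
columns = {"age": [20, 30, 40, 50], "color": ["red", "blue", "red", "green"]}
age_pct = sketch_data_percentile("age", 30, ["color"], columns)
color_val = sketch_data_percentile("color", "red", ["color"], columns)
```

Here age_pct is 0.5 because two of the four reference ages are at or below 30, and color_val is simply "red".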

predict_backend.utils.xai_utils.dataset_stats(feature_names, categorical_names, train_data)

Creates a dictionary describing attributes of features in the dataset. Returns relevant descriptions of the features for both numerical and categorical data.

Parameters:

feature_names (List[str]) – The feature names to evaluate statistics for.
categorical_names (List[str]) – The columns of the dataset which are categorical.
train_data (DataFrame) – The dataframe from which statistics are calculated.

Return type:

Dict[str, Dict[str, Any]]

Returns:

The dictionary's keys are feature names, and its values are dictionaries describing the attributes of each feature. Every value dictionary has a key type, which specifies whether the feature is numerical or categorical. Numerical features also have the entries min and max, which hold the minimum and maximum of that column, respectively. Categorical features have one entry per category name. Each category entry contains count and frequency, giving the number of occurrences of that category in the dataset and that number as a proportion of the whole dataset. Each category entry also contains a string description that describes how the category relates to the other categories, i.e. whether it is the most common, least common, or a relatively common or uncommon category.
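
To make the shape of the returned dictionary concrete, here is a minimal sketch that builds a similar structure. It is an illustration under assumptions, not the library's code: sketch_dataset_stats is a hypothetical name, a dict of lists stands in for train_data, the type values "numerical" and "categorical" are assumed spellings, and the per-category description string is omitted because its exact wording is not documented here.

```python
from collections import Counter
from typing import Any, Dict, List, Mapping, Sequence


def sketch_dataset_stats(
    feature_names: List[str],
    categorical_names: List[str],
    columns: Mapping[str, Sequence],
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical stand-in for dataset_stats (not the library code)."""
    stats: Dict[str, Dict[str, Any]] = {}
    for name in feature_names:
        values = columns[name]
        if name in categorical_names:
            # One entry per category, holding its count and relative frequency.
            entry: Dict[str, Any] = {"type": "categorical"}
            for category, count in Counter(values).items():
                entry[category] = {
                    "count": count,
                    "frequency": count / len(values),
                }
        else:
            # Numerical features carry the column minimum and maximum.
            entry = {"type": "numerical", "min": min(values), "max": max(values)}
        stats[name] = entry
    return stats


# A dict of lists stands in for the training DataFrame.
columns = {"age": [20, 30, 40, 50], "color": ["red", "blue", "red", "green"]}
stats = sketch_dataset_stats(["age", "color"], ["color"], columns)
```

With this toy data, stats["age"] holds the type, min, and max of the column, and stats["color"]["red"] holds a count of 2 and a frequency of 0.5.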

predict_backend.utils.xai_utils.idx_pick_extreme(vals, idx_to_check=None, mode='low', n=50)

Returns indices of extreme values from a list.

Parameters:

vals (array) – Values to pick from.
idx_to_check (Optional[array]) – Indices of the subset of vals to pick from. If None, all of vals is used. Defaults to None.
mode (str) – Which type of extreme value to choose. Can be “high” or “low”. Defaults to “low”.
n (int) – Number of extreme values to pick. Must be less than the length of vals. Defaults to 50.

Return type:

List[int]

Returns:

Indices of the selected extreme values.
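
One plausible reading of this function is "rank the candidate indices by value and keep the first n". The sketch below follows that reading. It is an assumption made for illustration: sketch_idx_pick_extreme is a hypothetical name, and the library may select or order the indices differently.

```python
from typing import List, Optional, Sequence


def sketch_idx_pick_extreme(
    vals: Sequence[float],
    idx_to_check: Optional[Sequence[int]] = None,
    mode: str = "low",
    n: int = 50,
) -> List[int]:
    """Hypothetical stand-in for idx_pick_extreme (not the library code)."""
    if mode not in ("low", "high"):
        raise ValueError("mode must be 'low' or 'high'")
    # Use every index unless a subset was supplied.
    candidates = range(len(vals)) if idx_to_check is None else idx_to_check
    # Rank candidate indices by their value; descending order for mode='high'.
    ranked = sorted(candidates, key=lambda i: vals[i], reverse=(mode == "high"))
    return ranked[:n]


vals = [0.9, 0.1, 0.5, 0.3, 0.7]
lowest = sketch_idx_pick_extreme(vals, n=2)
highest = sketch_idx_pick_extreme(vals, mode="high", n=2)
subset_low = sketch_idx_pick_extreme(vals, idx_to_check=[0, 2, 4], n=1)
```

In this example lowest is [1, 3], highest is [0, 4], and subset_low is [2], since 0.5 is the smallest value among indices 0, 2, and 4.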

predict_backend.utils.xai_utils.idx_pick_percentile(vals, idx_to_check=None, percentile_range=(0.0, 1.0), n=50)

Returns indices of values in a given percentile range.

Parameters:

vals (array) – Values to pick from.
idx_to_check (Optional[array]) – Indices of the subset of vals to pick from. If None, all of vals is used. Defaults to None.
percentile_range (Tuple[float, float]) – Lower and upper bound of the percentile range to choose from. Defaults to (0.0, 1.0).
n (int) – Number of values to pick. Defaults to 50.

Return type:

List[int]

Returns:

Indices of the selected values.
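
A minimal sketch of percentile-range selection follows. Several details are assumptions, since the documentation above does not specify them: sketch_idx_pick_percentile is a hypothetical name, a value's percentile is taken to be the fraction of candidate values less than or equal to it, the range bounds are treated as inclusive, and when more than n indices qualify the first n in candidate order are kept.

```python
from typing import List, Optional, Sequence, Tuple


def sketch_idx_pick_percentile(
    vals: Sequence[float],
    idx_to_check: Optional[Sequence[int]] = None,
    percentile_range: Tuple[float, float] = (0.0, 1.0),
    n: int = 50,
) -> List[int]:
    """Hypothetical stand-in for idx_pick_percentile (not the library code)."""
    if idx_to_check is None:
        candidates = list(range(len(vals)))
    else:
        candidates = list(idx_to_check)
    low, high = percentile_range
    picked: List[int] = []
    for i in candidates:
        # Assumed percentile: share of candidate values <= this value.
        pct = sum(vals[j] <= vals[i] for j in candidates) / len(candidates)
        if low <= pct <= high:
            picked.append(i)
    # Keep at most n indices, in candidate order (an assumption).
    return picked[:n]


vals = [0.9, 0.1, 0.5, 0.3, 0.7]
middle = sketch_idx_pick_percentile(vals, percentile_range=(0.3, 0.9))
```

For these five values the assumed percentiles are 1.0, 0.2, 0.6, 0.4, and 0.8, so middle contains the indices 2, 3, and 4.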

predict_backend.utils.xai_utils.importance_string(importance)

Returns a partial sentence describing the importance of a feature based on the importance value.

Parameters:

importance (float) – A float from 0 to 1 giving the feature's proportional contribution to a model prediction.

Return type:

str

Returns:

A string describing the importance level.

predict_backend.utils.xai_utils.list_to_string(l)

Concatenates a list of strings by separating each element using the Oxford comma style.

Parameters:

l (List[str]) – A list of strings.

Return type:

str

Returns:

A concatenated string using the Oxford comma to delineate items.
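
Oxford-comma joining can be sketched as below. The name sketch_list_to_string is hypothetical, and both the conjunction "and" and the handling of lists with fewer than three items are assumptions, since the documentation above does not state them.

```python
from typing import List


def sketch_list_to_string(items: List[str]) -> str:
    """Hypothetical stand-in for list_to_string (not the library code)."""
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        # Two items need no comma.
        return f"{items[0]} and {items[1]}"
    # Three or more items: comma-separate, with a comma before the final "and".
    return ", ".join(items[:-1]) + f", and {items[-1]}"


joined = sketch_list_to_string(["age", "income", "tenure"])
pair = sketch_list_to_string(["age", "income"])
```

Here joined is "age, income, and tenure" and pair is "age and income".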

predict_backend.utils.xai_utils.make_ordinal(n)

Converts an integer into its ordinal representation.

Return type:

str

Returns:

A string representation of the number. For instance, 0 maps to 0th, 3 maps to 3rd, and 122 maps to 122nd.
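
The mapping in the examples above follows the usual English suffix rule, which can be sketched as follows. The name sketch_make_ordinal is hypothetical and the code is not taken from the library. It assumes a non-negative integer input.

```python
def sketch_make_ordinal(n: int) -> str:
    """Hypothetical stand-in for make_ordinal (not the library code)."""
    # 11, 12, and 13 (and 111, 112, ...) always take "th".
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        # Otherwise the last digit decides: 1 -> st, 2 -> nd, 3 -> rd, else th.
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


ordinals = [sketch_make_ordinal(n) for n in (0, 3, 11, 122)]
```

The resulting list is ["0th", "3rd", "11th", "122nd"], which matches the documented examples for 0, 3, and 122.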

predict_backend.utils.xai_utils.model_stats(model_func, X)

Computes statistics about a machine learning model for use in the XAI module.

Parameters:

model_func (Callable) – A model to compute statistics for.
X (DataFrame) – The dataset on which these statistics will be computed. Should be the training dataset.

Return type:

Dict[str, Any]

Returns:

A dictionary with one entry, expected, containing the average prediction value over the training dataset.
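
The returned structure can be illustrated with a small sketch. It is an assumption-laden illustration rather than the library's code: sketch_model_stats and toy_model are hypothetical names, a list of row dicts stands in for the DataFrame X, and model_func is assumed to take the whole dataset and return one prediction per row.

```python
from statistics import fmean
from typing import Any, Callable, Dict, List, Sequence


def sketch_model_stats(
    model_func: Callable[[Sequence[Dict[str, float]]], List[float]],
    rows: Sequence[Dict[str, float]],
) -> Dict[str, Any]:
    """Hypothetical stand-in for model_stats (not the library code)."""
    # Assumes model_func maps the full dataset to one prediction per row.
    predictions = model_func(rows)
    return {"expected": fmean(predictions)}


def toy_model(rows: Sequence[Dict[str, float]]) -> List[float]:
    # A made-up linear model used only for this example.
    return [2.0 * row["x"] + 1.0 for row in rows]


rows = [{"x": 0.0}, {"x": 1.0}, {"x": 2.0}]
model_summary = sketch_model_stats(toy_model, rows)
```

The toy model predicts 1.0, 3.0, and 5.0, so model_summary is {"expected": 3.0}.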

predict_backend.utils.xai_utils.quantile_bin(feature_name, data_val, categorical_names, X)

For numerical inputs, returns a string describing the bin in which the feature value falls relative to the rest of the data. For categorical inputs, returns the category name.

Parameters:

feature_name (str) – The name of the feature to bin.
data_val (Union[int, float, complex, number, str, object]) – The value of the feature that this function will evaluate.
categorical_names (List[str]) – The categorical columns of the dataset.
X (DataFrame) – The reference dataset from which to calculate the feature value percentile. This dataframe should contain a column with name feature_name.

Return type:

str

Returns:

A string description of the feature in relation to the rest of the dataset.