schema
- class virtualitics_sdk.assets.schema.Schema(schema=None, exact_match=False, valid_inputs=None, label=None, metadata=None, name=None, description=None, version=None, **kwargs)
Bases: Asset
A schema asset validates a DataFrame against a pre-specified schema. The schema is a pd.Series whose index contains the expected column names and whose values are the corresponding expected dtypes. Two levels of validation can be performed. Level 1 validation ensures that the DataFrame contains every column specified in the schema, optionally ensures that no additional columns are present, and checks that each column's dtype matches the schema. Optional level 2 validation performs further checks: numerical columns must lie within a specified range of values, and categorical columns must take values from a specified list. If any check fails, an exception is raised; otherwise the validation function returns True.
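For illustration, the level 1 checks described above can be approximated in plain pandas. This is a minimal sketch, not the SDK's implementation: `check_level_1` is a hypothetical helper, and raising `ValueError` is an assumption, since the exact exception type is not documented here.

```python
import pandas as pd


def check_level_1(df: pd.DataFrame, schema: pd.Series, exact_match: bool = False) -> bool:
    """Hypothetical re-creation of the documented level 1 checks (not the SDK code)."""
    expected = set(schema.index)
    actual = set(df.columns)

    # Every column named in the schema must be present in the DataFrame.
    missing = expected - actual
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    # With exact_match=True, columns not named in the schema are also an error.
    extra = actual - expected
    if exact_match and extra:
        raise ValueError(f"unexpected columns: {sorted(extra)}")

    # Each column's dtype must match the schema. Normalizing through pandas_dtype
    # lets the schema hold either dtype objects or dtype names such as "int64".
    for col, expected_dtype in schema.items():
        want = pd.api.types.pandas_dtype(expected_dtype)
        if str(df[col].dtype) != str(want):
            raise ValueError(f"column {col!r} has dtype {df[col].dtype}, expected {want}")
    return True


schema = pd.Series({"age": "int64", "state": "object"})
df = pd.DataFrame({
    "age": pd.Series([34, 51], dtype="int64"),
    "state": pd.Series(["CA", "NY"], dtype=object),
})
print(check_level_1(df, schema))  # True
```

Comparing dtypes by their string names keeps the sketch simple; a stricter implementation might compare dtype objects directly.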
- Parameters:
  - schema (Optional[Series]) – A schema object. Its index should contain the expected column names and its values the expected dtypes.
  - exact_match (bool) – If True, the DataFrame must not contain any columns beyond those expected in the schema, or an exception is raised. In addition, when performing level 2 validation, ‘valid_inputs’ must have a key/value entry for every column in the schema. If False, neither check is enforced.
  - valid_inputs (Optional[Dict[str, List]]) – A dictionary mapping column names (str) to lists describing their valid inputs. For a column with a numerical dtype, the value is expected to be [min, max], where min and max are the minimum and maximum possible values in that column. For a column with an “object” dtype (i.e., a string/categorical column), the value is expected to be a list of all possible values in that column. If valid_inputs is None, level 2 validation is not performed.
  - label (Optional[str]) – Label for Asset, see its documentation for more details.
  - metadata (Optional[dict]) – Metadata for Asset, see its documentation for more details.
  - name (Optional[str]) – Name for Asset, see its documentation for more details.
  - description (Optional[str]) – Description of Asset, see its documentation for more details.
  - version (Optional[int]) – Version for Asset, see its documentation for more details.
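The ‘valid_inputs’ rules above (a [min, max] pair for numerical columns, a list of allowed values for “object” columns, and full coverage of the schema when exact_match is set) can be illustrated with a similar pandas sketch. `check_level_2` is again a hypothetical helper, not part of virtualitics_sdk; treating the range bounds as inclusive and raising `ValueError` are assumptions that this documentation does not confirm.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_level_2(df: pd.DataFrame, schema: pd.Series, valid_inputs: dict,
                  exact_match: bool = False) -> bool:
    """Hypothetical re-creation of the documented level 2 checks (not the SDK code)."""
    # With exact_match=True, every schema column needs a valid_inputs entry.
    if exact_match:
        uncovered = [col for col in schema.index if col not in valid_inputs]
        if uncovered:
            raise ValueError(f"no valid_inputs entry for: {uncovered}")

    for col, allowed in valid_inputs.items():
        values = df[col]
        if is_numeric_dtype(values):
            # Numerical column: allowed is [min, max]; bounds assumed inclusive here.
            low, high = allowed
            if not values.between(low, high).all():
                raise ValueError(f"column {col!r} has values outside [{low}, {high}]")
        else:
            # Object (string/categorical) column: allowed lists every permitted value.
            unexpected = set(values[~values.isin(allowed)])
            if unexpected:
                raise ValueError(
                    f"column {col!r} has unexpected values: {sorted(unexpected, key=str)}"
                )
    return True


schema = pd.Series({"age": "int64", "state": "object"})
valid_inputs = {"age": [0, 120], "state": ["CA", "NY", "TX"]}
df = pd.DataFrame({
    "age": pd.Series([34, 51], dtype="int64"),
    "state": pd.Series(["CA", "NY"], dtype=object),
})
print(check_level_2(df, schema, valid_inputs, exact_match=True))  # True
```

With the real asset, the same `schema` Series and `valid_inputs` dictionary would be passed to the constructor, for example `Schema(schema=schema, exact_match=True, valid_inputs=valid_inputs)`, and the DataFrame checked with `validate(df)`.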
- convert_strings_to_dtype(arr)
- validate(df)
- Return type: bool
- validate_level_1(df)
- Return type: bool
- validate_level_2(df)
- Return type: bool