store_interface
- class virtualitics_sdk.store.store_interface.DatasetIndex
Bases: TypedDict
Type hint for specifying dataset asset indexes in StoreInterface.save_datastore_asset.
- columns: List[str]
- unique: Optional[bool]
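As a minimal sketch of what an index specification could look like, the block below redefines DatasetIndex locally so it runs without the SDK. The helper make_index and the column names are hypothetical; only the two keys, columns and unique, come from the reference above.

```python
from typing import List, Optional, TypedDict


class DatasetIndex(TypedDict):
    """Local mirror of the documented type hint (assumption: same two keys)."""
    columns: List[str]
    unique: Optional[bool]


def make_index(columns: List[str], unique: Optional[bool] = None) -> DatasetIndex:
    """Hypothetical helper: build one index spec, rejecting an empty column list."""
    if not columns:
        raise ValueError("an index needs at least one column")
    return DatasetIndex(columns=list(columns), unique=unique)


# A list of index specs, in the shape of the documented example
# [{'columns': ['column1'], 'unique': True}].
indexes = [
    make_index(["customer_id"], unique=True),
    make_index(["region", "order_date"]),
]
```

A list like this would presumably be passed as the indexes argument of save_datastore_asset.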
- class virtualitics_sdk.store.store_interface.StoreInterface(flow_id, user=None, step_name=None, is_action=False, bucket_name=None)
Bases: object
The StoreInterface class is the main interface for storing and retrieving metadata. It provides convenience methods for storing input data, assets, and flow metadata, as well as methods for retrieving previously saved data.
EXAMPLE:
    # Imports
    from virtualitics_sdk.store.store_interface import StoreInterface
    . . .

    # Example usage
    class ExStep(Step):
        def run(self, flow_metadata):
            store_interface = StoreInterface(**flow_metadata)
            page = store_interface.get_page()
            data_source = DataSource(
                title="Upload data here!",
                options=["csv"],
                description="Simple data upload example",
                required=True,
            )
            data_card = Card(title="Data Upload Card", content=[data_source])
            page.add_card_to_section(data_card, "")
            store_interface.update_page(page)
- create_element_link(element, step_name=None)
- create_future_element(elem, future_step_name)
- db_to_pandas(query, conn_name, connection_owner=None, **kwargs)
NOTICE: As of version 1.23.0 this function is deprecated.
Given a SQL query and the name of a connection stored in the connection store, execute the query against that data store connection and return the result set as a pandas data frame.
- Parameters:
query (str) – The SQL query to execute against the supplied data store.
conn_name (str) – The name of the connection whose database credentials, host, etc. will be retrieved from the connection store.
connection_owner (Optional[str]) – (optional) The owner of the connection being retrieved; defaults to the current user.
kwargs – Additional keyword arguments. For Databricks connections, http_path can be supplied here to override the default http_path stored in the connection store.
- Returns:
A pandas data frame
- get_asset(label=None, type=None, name=None, time_created=None, asset_id=None)
Retrieve a saved Asset. This function returns at most one asset; use get_assets to retrieve a list of assets that match the supplied argument values. This function will only return an asset that the requesting user has access to.
- Parameters:
label (Optional[str]) – The label of the Asset, defaults to None.
type (Optional[AssetType]) – The type of asset, defaults to None.
name (Optional[str]) – The name for the asset, defaults to None.
time_created (Optional[str]) – The time the asset was created. Only needed when you want to retrieve an Asset by timestamp in addition to other metadata, defaults to None.
asset_id (Optional[str]) – The unique identifier of a specific asset, defaults to None.
- Raises:
ValueError – If the asset label and type are both None.
- Return type:
Asset
- Returns:
The Asset object.
- get_asset_by_id(asset_id)
Retrieve a saved asset using the asset_id.
- Parameters:
asset_id (str) – The unique identifier of a specific asset.
- Return type:
Asset
- Returns:
The Asset object.
- get_assets(label=None, type=None, name=None, asset_id=None)
Retrieve multiple saved Assets. Providing any of the attributes filters the available assets down to those that match the given combination of label, type, and name. Providing none of these descriptors retrieves all available assets.
- Parameters:
label (Optional[str]) – The label of the Asset, defaults to None.
type (Optional[AssetType]) – The type of asset, defaults to None.
name (Optional[str]) – The name for the asset, defaults to None.
asset_id (Optional[str]) – The unique identifier of a specific asset, defaults to None.
- Return type:
List[Asset]
- Returns:
List of Asset objects.
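The filtering rule described for get_assets (each supplied argument narrows the result, and supplying none returns everything) can be illustrated without the SDK. The sketch below rests on assumptions: AssetRecord and filter_assets are hypothetical names, type is modeled as a plain string rather than an AssetType, and the real lookup happens inside StoreInterface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AssetRecord:
    """Hypothetical stand-in for stored asset metadata (not the SDK's Asset class)."""
    label: str
    type: str
    name: str


def filter_assets(
    records: List[AssetRecord],
    label: Optional[str] = None,
    type: Optional[str] = None,
    name: Optional[str] = None,
) -> List[AssetRecord]:
    """Keep records matching every argument that is not None; None means 'no filter'."""
    wanted = {"label": label, "type": type, "name": name}
    return [
        record for record in records
        if all(value is None or getattr(record, key) == value
               for key, value in wanted.items())
    ]


records = [
    AssetRecord(label="churn", type="dataset", name="train"),
    AssetRecord(label="churn", type="model", name="xgb"),
    AssetRecord(label="sales", type="dataset", name="q1"),
]
all_assets = filter_assets(records)  # no descriptors: everything is returned
churn_datasets = filter_assets(records, label="churn", type="dataset")
```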
- get_boto3_s3_client_from_connection_store(connection_id, **kwargs)
Using a connection stored in the connection store, create and return a boto3 client configured with the credentials stored in the connection store.
- Parameters:
connection_id (str) – The UID of a connection stored in the connection store.
kwargs – Additional keyword arguments to pass to the boto3.Session or boto3.client objects.
- Return type:
client
- Returns:
a boto3.client('s3')
- get_current_step_user_input(data_source_title)
Get the raw bytes that were uploaded in the current step (prior to the step action). This can be useful for validating the uploaded data in a dynamic page update function. When data is uploaded using the DataSource element, it is not converted into a dataframe until the step action is run (Next button), which also puts the data on the subsequent step's in-link. This means that you cannot access the uploaded object using the common methods that retrieve data from the in-link.
- Parameters:
data_source_title (str) – The title of the DataSource element.
- Return type:
BytesIO
- Returns:
a BytesIO object of the data that was uploaded
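The validation use case mentioned above might look something like the following sketch, which checks a CSV header using only the standard library. The helper missing_columns and the sample column names are hypothetical; in a real step the BytesIO would come from get_current_step_user_input rather than being built by hand.

```python
import csv
import io
from typing import List


def missing_columns(raw: io.BytesIO, required: List[str]) -> List[str]:
    """Return the required column names that are absent from the CSV header row."""
    # getvalue() reads the whole buffer without moving the stream position.
    text = raw.getvalue().decode("utf-8-sig")
    header = next(csv.reader(io.StringIO(text)), [])
    return [col for col in required if col not in header]


# Stand-in for the BytesIO that get_current_step_user_input would return.
uploaded = io.BytesIO(b"id,value\n1,2\n")
problems = missing_columns(uploaded, ["id", "label"])
```

A non-empty result could then be surfaced to the user in the page update before the step action runs.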
- get_dataset(label=None, name=None)
This is a convenience method for getting Assets that have a "Dataset" type.
- Parameters:
label (Optional[str]) – The label of the asset, defaults to None.
name (Optional[str]) – The name of the asset, defaults to None.
- Return type:
- Returns:
The Dataset asset.
- get_element(step_name, elem_title, quiet=False)
Get an element from any step by its title.
- Parameters:
step_name (str) – The name of the step where the element was created.
elem_title (str) – The title of the element to look up.
quiet (bool) – If True, return None instead of raising an error if the element does not exist. Defaults to False.
- Return type:
Element
- Returns:
- get_element_by_id(step_name, elem_id)
Get an element from any step by its auto-generated ID.
- Parameters:
step_name (str) – The name of the step where the element was created.
elem_id (str) – The element ID.
- Return type:
Element
- Returns:
an Element object
- get_element_value(step_name, elem_title, quiet=False)
Get the value of an element the user interacted with in the Virtualitics AI Platform.
- Parameters:
step_name (str) – The name of the step the element was in.
elem_title (str) – The title of the element to select.
quiet (bool) – If True, return None instead of raising an error if the element does not exist. Defaults to False.
- Returns:
The value of that element interaction. This will differ by input element type.
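One way to use quiet=True is to fall back to a default when an element is absent. The sketch below is illustrative only: FakeStore is a hypothetical stand-in for StoreInterface so the code runs without the SDK, the ValueError it raises is an assumption (the reference does not name the error type), and element_value_or_default is not part of the SDK.

```python
from typing import Any, Dict, Tuple


class FakeStore:
    """Hypothetical stand-in that mimics the documented get_element_value signature."""

    def __init__(self, values: Dict[Tuple[str, str], Any]):
        self._values = values

    def get_element_value(self, step_name, elem_title, quiet=False):
        key = (step_name, elem_title)
        if key not in self._values:
            if quiet:
                return None
            # Assumption: the real error type is not documented.
            raise ValueError(f"No element titled {elem_title!r} in step {step_name!r}")
        return self._values[key]


def element_value_or_default(store, step_name: str, elem_title: str, default: Any) -> Any:
    """Look up an element value quietly and substitute a default when it is None."""
    value = store.get_element_value(step_name, elem_title, quiet=True)
    return default if value is None else value


store = FakeStore({("Upload", "Threshold"): 0.8})
found = element_value_or_default(store, "Upload", "Threshold", 0.5)
fallback = element_value_or_default(store, "Upload", "Missing", 0.5)
```

Note that this pattern cannot distinguish a missing element from one whose value is legitimately None.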
- get_explainer(label=None, name=None)
This is a convenience method for getting Assets that have an "Explainer" type.
- Parameters:
label (Optional[str]) – The label of the asset, defaults to None.
name (Optional[str]) – The name of the asset, defaults to None.
- Return type:
- Returns:
The Explainer asset.
- get_input(name, step_name=None)
Get an input value previously stored with save_output.
- Parameters:
name (str) – The input name.
step_name (Optional[str]) – [Optional] The step name to retrieve the input data from.
- Raises:
ValueError – If the name is not found in the database.
- Returns:
The previously saved data.
- get_model(label=None, name=None)
This is a convenience method for getting Assets that have a "Model" type.
- Parameters:
label (Optional[str]) – The label of the asset, defaults to None.
name (Optional[str]) – The name of the asset, defaults to None.
- Return type:
- Returns:
The Model asset.
- get_page(_out_link=None)
Get the most up-to-date version of the page for this step.
- Parameters:
_out_link – Optionally pass an existing outlink reference to prevent repeated calls to get_outlink() and to improve performance.
- Return type:
Page
- Returns:
The most up-to-date Page for this step.
- get_previous_step_name()
- Return type:
Optional[str]
- get_raw_data_source_data(step_name=None, element_id=None, element_title=None)
- Return type:
BytesIO
- get_s3_asset(path)
Retrieve an asset from a pre-specified bucket. To use this method, initialize the store interface with the bucket_name parameter. TODO: this asset won't exist in the asset store; this function might need to be removed if possible.
- Parameters:
path (str) – The path to the asset in the S3 bucket.
- Returns:
The asset from S3.
- get_schema(label=None, name=None)
This is a convenience method for getting Assets that have a "Schema" type.
- Parameters:
label (Optional[str]) – The label of the asset, defaults to None.
name (Optional[str]) – The name of the asset, defaults to None.
- Return type:
- Returns:
The Schema asset.
- is_pyvip_connected()
Check if a user has an active pyvip connection to explore.
- Return type:
bool
- Returns:
bool
- static list_fixture_data(prefix=None)
List all of the objects stored within the deployment's fixture path in s3://{meta-data-bucket}/fixture/{prefix}.
- Parameters:
prefix (Optional[str]) – Optional prefix to filter objects.
- Return type:
List[str]
- Returns:
list of S3 keys
- pandas_to_db(_df, table, conn_name, connection_owner=None, if_exists='fail', **kwargs)
NOTICE: As of version 1.23.0 this function is deprecated.
Write the contents of a dataframe to the supplied data store table. DB connection details are retrieved using the supplied connection name and (optionally) the connection owner.
- Parameters:
_df (DataFrame) – The pandas dataframe to be written.
table (str) – The destination table where data will be written.
conn_name (str) – The name of the connection whose database credentials, host, etc. will be retrieved from the connection store.
connection_owner (Optional[str]) – (optional) The owner of the connection being retrieved; defaults to the current user.
if_exists (str) – What to do if the table already exists: 'fail', 'replace', or 'append'.
kwargs – Additional keyword arguments. For Databricks connections, http_path can be supplied here to override the default http_path stored in the connection store.
- Returns:
- query_datastore_asset(model, select_=None, where_=None, name=None, asset_id=None)
Query a previously saved datastore asset (saved with StoreInterface.save_datastore_asset()). Provide a SQLAlchemy BaseModel that describes the table where the asset is stored, plus optional select and where clauses. Returns a pandas dataframe that represents the ResultSet.
- Return type:
DataFrame
- save_asset(asset, overwrite=False, asset_id=None)
Save an Asset. This is useful for storing objects, datasets, and models to be used in other apps or within the current flow. Assets are persisted until they are deleted (even if the flow they were created in is deleted).
- Parameters:
asset (Asset) – The asset object to save.
overwrite (bool) – Overwrite the existing asset with the same label and type if it exists.
asset_id (Optional[str]) – An optional asset_id to use when overwriting.
- save_datastore_asset(data, name, asset_id=None, description='', overwrite_if_exists=True, encode_columns=None, indexes=None)
Write a pandas dataframe to a postgres table and create an asset record. This allows for more efficient querying of the underlying data for certain use cases: instead of reading the entire dataset into an in-memory dataframe to perform transform, filter, select, etc. operations, you can use the query_datastore_asset function, which executes those operations in the database and returns a smaller result set.
- Parameters:
data (DataFrame) – A pandas dataframe containing all the data to be written.
name (str) – A name for this dataset; this should be a unique identifier that refers to the dataset.
asset_id (Optional[str]) – If overwriting an existing datastore asset, providing the asset_id specifies which asset will be replaced.
description (Optional[str]) – A description of the asset, displayed in the Assets page.
overwrite_if_exists (bool) – Whether to overwrite the existing data with this name.
indexes (Optional[DatasetIndex]) – An optional list of indexes to specify, e.g. [{'columns': ['column1'], 'unique': True}].
- Returns:
the asset_id of the saved datastore asset
EXAMPLE:
- save_output(data, name)
Save an intermediate value for access in a later step.
- Parameters:
data (Union[DataFrame, Series, dict, int, float, BytesIO]) – The data to save.
name (str) – The label to use to access this data in a later step.
- Raises:
ValueError – If the data passed in is not picklable.
ValueError – If an invalid persistence method is used.
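Because save_output rejects data that cannot be pickled, it may help to check a value before saving it. The sketch below uses only the standard pickle module; is_picklable is a hypothetical helper, and whether the SDK performs its own check in the same way is an assumption.

```python
import pickle
from typing import Any


def is_picklable(obj: Any) -> bool:
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


# A plain dict of built-in types pickles cleanly.
ok = is_picklable({"threshold": 0.8, "columns": ["a", "b"]})
# Generators cannot be pickled, so this value would need converting (e.g. to a list).
bad = is_picklable(i for i in range(3))
```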
- static str_2_connection_type(list_connection_type)
Converts a list of strings representing connection types to a list of ConnectionType.
- Parameters:
list_connection_type (List[str]) – List of strings representing the connection type.
- Return type:
List[ConnectionType]
- Returns:
list of ConnectionType
- update_page(page)
Update a page. This is usually called from within a step to dynamically update content on the page as the step is running.
- Parameters:
page (Page) – The Page object containing updates.
- update_progress(completion, message, page_update=False)
Update the progress of the step as it's running. It is recommended to use this when steps have operations that can take a long time.
- Parameters:
completion (Union[float, int]) – The progress to completion (0 to 100).
message (str) – The message to show at this level of completion.
- Returns:
True if the progress message was sent successfully.
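A long-running step could report progress after each unit of work, roughly as sketched below. RecordingStore is a hypothetical stand-in that only records calls so the example runs without the SDK, and run_with_progress and the task names are illustrative; only the update_progress(completion, message) call shape comes from the reference above.

```python
from typing import Callable, List, Tuple


class RecordingStore:
    """Hypothetical stand-in for StoreInterface that records progress updates."""

    def __init__(self) -> None:
        self.calls: List[Tuple[int, str]] = []

    def update_progress(self, completion, message, page_update=False):
        self.calls.append((completion, message))
        return True


def run_with_progress(store, tasks: List[Tuple[str, Callable[[], None]]]) -> None:
    """Run each task in order and report completion on the documented 0 to 100 scale."""
    total = len(tasks)
    for done, (label, task) in enumerate(tasks, start=1):
        task()
        store.update_progress(round(100 * done / total), f"Finished {label}")


store = RecordingStore()
run_with_progress(store, [("load", lambda: None), ("train", lambda: None)])
```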