store_interface

class virtualitics_sdk.store.store_interface.DatasetIndex

Bases: TypedDict

Type hint for specifying dataset asset indexes passed to StoreInterface.save_datastore_asset.

columns: List[str]

unique: Optional[bool]
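Because DatasetIndex is a TypedDict, index specifications are ordinary dictionaries. The sketch below is a minimal illustration that assumes only the two documented fields. It re-declares an equivalent TypedDict locally so it runs without the SDK installed. The check_indexes helper and the column names are hypothetical and are not part of virtualitics_sdk.

```python
from typing import List, Optional, TypedDict


# Local mirror of virtualitics_sdk.store.store_interface.DatasetIndex
# (only the two documented fields), so this runs without the SDK.
class DatasetIndex(TypedDict):
    columns: List[str]
    unique: Optional[bool]


def check_indexes(indexes: List[DatasetIndex], df_columns: List[str]) -> None:
    """Hypothetical helper: raise ValueError if an index names an unknown column."""
    for index in indexes:
        if not index["columns"]:
            raise ValueError("an index needs at least one column")
        missing = [c for c in index["columns"] if c not in df_columns]
        if missing:
            raise ValueError(f"unknown index columns: {missing}")


# One unique single-column index and one non-unique composite index.
indexes: List[DatasetIndex] = [
    {"columns": ["order_id"], "unique": True},
    {"columns": ["region", "order_date"], "unique": False},
]
check_indexes(indexes, ["order_id", "region", "order_date", "amount"])
```

A list built this way has the shape shown for the indexes parameter of save_datastore_asset.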

class virtualitics_sdk.store.store_interface.StoreInterface(flow_id, user=None, step_name=None, is_action=False, bucket_name=None)

Bases: object

StoreInterface is the main interface for storing and retrieving metadata. It provides convenience methods for storing input data, assets, and flow metadata, as well as methods for retrieving previously saved data.

EXAMPLE:

# Imports
from virtualitics_sdk import StoreInterface
# . . . (imports for Step, DataSource and Card elided)

# Example usage
class ExStep(Step):
    def run(self, flow_metadata):
        # Build a StoreInterface for this step and fetch the current page
        store_interface = StoreInterface(**flow_metadata)
        page = store_interface.get_page()
        # A required CSV upload element
        data_source = DataSource(title="Upload data here!",
                                 options=["csv"],
                                 description="Simple data upload example",
                                 required=True)
        data_card = Card(title="Data Upload Card", content=[data_source])
        # Add the card to a section (here with an empty title), then persist the page
        page.add_card_to_section(data_card, "")
        store_interface.update_page(page)

create_element_link(element, step_name=None)

create_future_element(elem, future_step_name)

db_to_pandas(query, conn_name, connection_owner=None, **kwargs)

NOTICE: As of version 1.23.0, this function is deprecated.

Given a SQL query and the name of a connection in the connection store, execute the query against that data store connection and return the result set as a pandas DataFrame.

Parameters:

query (str) – The SQL query to execute against the supplied data store
conn_name (str) – The name of the connection in the connection store from which the database credentials, host, etc. are retrieved
connection_owner (Optional[str]) – The owner of the connection being retrieved; defaults to the current user
kwargs – Additional keyword arguments; for Databricks connections, http_path can be supplied here to override the default http_path stored in the connection store

Returns:

A pandas DataFrame containing the result set

get_asset(label=None, type=None, name=None, time_created=None, asset_id=None)

Retrieve a saved Asset. This function returns at most one asset; use get_assets to retrieve a list of all assets that match the supplied argument values. Only an asset that the requesting user has access to is returned.

Parameters:

label (Optional[str]) – The label of the Asset, defaults to None.
type (Optional[AssetType]) – The type of asset, defaults to None.
name (Optional[str]) – The name for the asset, defaults to None.
time_created (Optional[str]) – The time the asset was created. This is only needed when you want to retrieve an Asset by timestamp in addition to other metadata; defaults to None.
asset_id (Optional[str]) – The unique identifier of a specific asset, defaults to None.

Raises:

ValueError – If the asset label and type are both None.

Return type:

Asset

Returns:

The Asset object.

get_asset_by_id(asset_id)

Retrieve a saved asset using its asset_id.

Parameters: asset_id (str) – The unique identifier of a specific asset.
Return type: Asset
Returns: The Asset object.

get_assets(label=None, type=None, name=None, asset_id=None)

Retrieve multiple saved Assets. Providing any of these attributes filters the available assets down to those that match the given combination of label, type, name, and asset_id. Providing none of them retrieves all available assets.

Parameters:

label (Optional[str]) – The label of the Asset, defaults to None.
type (Optional[AssetType]) – The type of asset, defaults to None.
name (Optional[str]) – The name for the asset, defaults to None.
asset_id (Optional[str]) – The unique identifier of a specific asset, defaults to None.

Return type:

List[Asset]

Returns:

List of Asset objects.
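get_assets treats every argument left as None as "no filter" and returns only the assets that match all supplied values. The sketch below models that documented matching rule over plain records. AssetRecord and filter_assets are hypothetical stand-ins rather than SDK classes. The type values are plain strings here, whereas the SDK uses AssetType, and the access checks the real method applies are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AssetRecord:
    """Hypothetical stand-in for an Asset's searchable metadata."""
    asset_id: str
    label: str
    type: str
    name: str


def filter_assets(
    assets: List[AssetRecord],
    label: Optional[str] = None,
    type: Optional[str] = None,
    name: Optional[str] = None,
    asset_id: Optional[str] = None,
) -> List[AssetRecord]:
    """Keep assets matching every non-None argument; with no arguments, keep all."""
    criteria = {"label": label, "type": type, "name": name, "asset_id": asset_id}
    return [
        a for a in assets
        if all(getattr(a, key) == value
               for key, value in criteria.items() if value is not None)
    ]


assets = [
    AssetRecord("a1", "churn", "Model", "xgb-v1"),
    AssetRecord("a2", "churn", "Dataset", "training-data"),
    AssetRecord("a3", "sales", "Model", "forecast"),
]
print([a.asset_id for a in filter_assets(assets, label="churn")])  # ['a1', 'a2']
```

Narrowing with more arguments, for example label="churn" and type="Model", leaves a single match. This is the situation where get_asset, which returns at most one asset, is the more convenient call.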

get_boto3_s3_client_from_connection_store(connection_id, **kwargs)

Using a connection stored in the connection store, create and return a boto3 S3 client configured with the credentials stored for that connection.

Parameters:

connection_id (str) – The UID of a connection stored in the connection store
kwargs – Additional keyword arguments to pass to the boto3.Session or boto3.client objects

Return type:

client

Returns:

A boto3 S3 client, as created by boto3.client('s3')

get_current_step_user_input(data_source_title)

Get the raw bytes that were uploaded in the current step, before the step action runs. This is useful for validating uploaded data in a dynamic page update function. Data uploaded through a DataSource element is not converted into a DataFrame until the step action runs (when the user clicks Next), which also places the data on the subsequent step's in-link. Until then, the uploaded object cannot be accessed with the usual methods that retrieve data from the in-link.

Parameters: data_source_title (str) – The title of the DataSource element
Return type: BytesIO
Returns: A BytesIO object containing the uploaded data
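Because the return value is a BytesIO, an uploaded CSV can be checked with the standard library before the step action runs. The sketch below assumes a UTF-8 CSV upload. The missing_columns helper and the column names are hypothetical. In a step, the buffer would come from store_interface.get_current_step_user_input("Upload data here!"), where the argument is the DataSource element's title.

```python
import csv
import io
from typing import List


def missing_columns(upload: io.BytesIO, required: List[str]) -> List[str]:
    """Hypothetical validator: return required columns absent from the CSV header."""
    upload.seek(0)  # the buffer may already have been read
    text = io.TextIOWrapper(upload, encoding="utf-8", newline="")
    try:
        header = next(csv.reader(text), [])
    finally:
        text.detach()  # keep the underlying BytesIO open for later reads
    return [col for col in required if col not in header]


# Stand-in for the bytes a user uploaded through a DataSource element.
upload = io.BytesIO(b"order_id,region,amount\n1,west,9.5\n")
print(missing_columns(upload, ["order_id", "customer_id"]))  # ['customer_id']
```

Detaching the text wrapper matters here. Otherwise, garbage-collecting the wrapper would close the BytesIO that the platform handed to the step.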

get_dataset(label=None, name=None)

This is a convenience method for getting Assets that have a “Dataset” type.

Parameters:

label (Optional[str]) – The label of the asset, defaults to None.
name (Optional[str]) – The name of the asset, defaults to None.

Return type:

Dataset

Returns:

The Dataset asset.

get_element(step_name, elem_title, quiet=False)

Get an element from any step by its title.

Parameters:

step_name (str) – The name of the step where the element was created
elem_title (str) – The title of the element to look up
quiet (bool) – If True, return None instead of raising an error if the element does not exist. Defaults to False.

Return type:

Element

Returns:

The matching Element, or None if quiet is True and the element does not exist.

get_element_by_id(step_name, elem_id)

Get an element from any step by its auto-generated ID.

Parameters:

step_name (str) – the step name where the element was created
elem_id (str) – the element id

Return type:

Element

Returns:

an Element object

get_element_value(step_name, elem_title, quiet=False)

Get the value of an element the user interacted with in the Virtualitics AI Platform.

Parameters:

step_name (str) – The name of the step the element was in.
elem_title (str) – The title of the element to select.
quiet (bool) – If True, return None instead of raising an error if the element does not exist. Defaults to False.

Returns:

The value of that element interaction. This will differ by input element type.
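The quiet flag lets a step treat an optional element as absent instead of failing. The sketch below imitates that documented behavior with a hypothetical in-memory lookup. lookup_value, the step name, and the element titles are illustrative. The exception raised when quiet is False is not documented, so KeyError is an assumption. With the SDK, the equivalent call would be store_interface.get_element_value(step_name, elem_title, quiet=True).

```python
from typing import Any, Dict, Optional


def lookup_value(
    values: Dict[str, Dict[str, Any]],
    step_name: str,
    elem_title: str,
    quiet: bool = False,
) -> Optional[Any]:
    """Hypothetical model of the quiet flag on get_element_value.

    The real method's exception type is undocumented; KeyError is an assumption.
    """
    try:
        return values[step_name][elem_title]
    except KeyError:
        if quiet:
            return None
        raise


# Values entered by the user, keyed by step name and element title (illustrative).
values = {"Configure": {"Threshold": 0.75}}
threshold = lookup_value(values, "Configure", "Threshold")
max_rows = lookup_value(values, "Configure", "Max Rows", quiet=True)
if max_rows is None:
    max_rows = 1000  # fall back to a default when the optional element is absent
```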

get_explainer(label=None, name=None)

This is a convenience method for getting Assets that have an “Explainer” type.

Parameters:

label (Optional[str]) – The label of the asset, defaults to None.
name (Optional[str]) – The name of the asset, defaults to None.

Return type:

Explainer

Returns:

The Explainer asset.

get_input(name, step_name=None)

Get a value previously stored with save_output.

Parameters:

name (str) – The input name.
step_name (Optional[str]) – The name of the step to retrieve the input data from

Raises:

ValueError – If the name is not found in the database.

Returns:

The previously saved data.

get_model(label=None, name=None)

This is a convenience method for getting Assets that have a “Model” type.

Parameters:

label (Optional[str]) – The label of the asset, defaults to None.
name (Optional[str]) – The name of the asset, defaults to None.

Return type:

Model

Returns:

The Model asset.

get_page(_out_link=None)

Get the most up-to-date version of the page for this step.

Parameters: _out_link – Optionally pass an existing out-link reference to avoid repeated calls to get_outlink() and improve performance
Return type: Page
Returns: The most up-to-date Page for this step.

get_previous_step_name()

Return type: Optional[str]

get_raw_data_source_data(step_name=None, element_id=None, element_title=None)

Return type: BytesIO

get_s3_asset(path)

Retrieve an asset from a pre-specified S3 bucket. To use this method, initialize the StoreInterface with the bucket_name parameter. Note that an object retrieved this way does not exist in the asset store, and this function may be removed in a future release.

Parameters: path (str) – The path to the asset in the S3 bucket.
Returns: The asset retrieved from S3.

get_schema(label=None, name=None)

This is a convenience method for getting Assets that have a “Schema” type.

Parameters:

label (Optional[str]) – The label of the asset, defaults to None.
name (Optional[str]) – The name of the asset, defaults to None.

Return type:

Schema

Returns:

The Schema asset.

is_pyvip_connected()

Check whether the user has an active pyvip connection to Explore.

Return type: bool
Returns: True if the user has an active pyvip connection, otherwise False.

static list_fixture_data(prefix=None)

List all objects stored within the deployment’s fixture path in s3://{meta-data-bucket}/fixture/{prefix}.

Parameters: prefix (Optional[str]) – Optional prefix to filter objects
Return type: List[str]
Returns: A list of S3 keys

pandas_to_db(_df, table, conn_name, connection_owner=None, if_exists='fail', **kwargs)

NOTICE: As of version 1.23.0, this function is deprecated.

Write the contents of a DataFrame to the supplied data store table. Database connection details are retrieved using the connection name and, optionally, the connection owner.

Parameters:

_df (DataFrame) – The pandas dataframe to be written
table (str) – The destination table where data will be written
conn_name (str) – The name of the connection in the connection store from which the database credentials, host, etc. are retrieved
connection_owner (Optional[str]) – The owner of the connection being retrieved; defaults to the current user
if_exists (str) – What to do if the table already exists: one of ‘fail’, ‘replace’, or ‘append’
kwargs – Additional keyword arguments; for Databricks connections, http_path can be supplied here to override the default http_path stored in the connection store

Returns:

query_datastore_asset(model, select_=None, where_=None, name=None, asset_id=None)

Query a previously saved datastore asset (saved with StoreInterface.save_datastore_asset()). Provide a SQLAlchemy BaseModel that describes the table where the asset is stored, along with optional select and where clauses. The result set is returned as a pandas DataFrame.

Return type: DataFrame
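The point of query_datastore_asset is that column selection and filtering run inside the database, so only matching rows come back. The sketch below illustrates that idea with an in-memory SQLite table from the Python standard library standing in for the Postgres table. It does not use the SDK, SQLAlchemy, or the select_/where_ arguments, whose expected forms are not shown here. The table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite table standing in for the Postgres table behind a datastore asset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "west", 9.5), (2, "east", 20.0), (3, "west", 42.0), (4, "north", 3.25)],
)

# Select and filter in the database: only two columns of the matching rows
# are transferred, instead of loading the whole table and filtering in memory.
rows = conn.execute(
    "SELECT order_id, amount FROM orders WHERE region = ? AND amount > ?",
    ("west", 10.0),
).fetchall()
print(rows)  # [(3, 42.0)]
```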

save_asset(asset, overwrite=False, asset_id=None, serialization_method=None)

Save an Asset. This is useful for storing objects, datasets, and models for use in other apps or within the current flow. Assets persist until they are deleted, even if the flow they were created in is deleted.

Parameters:

asset (Asset) – The asset object to save.
overwrite (bool) – Overwrite the existing asset with the same label and type, if it exists
asset_id (Optional[str]) – An optional asset_id to use when overwriting
serialization_method (Optional[AssetPersistenceMethod]) – An optional argument to force serialization with a specific method

save_datastore_asset(data, name, asset_id=None, description='', overwrite_if_exists=True, encode_columns=None, indexes=None)

Write a pandas DataFrame to a Postgres table and create an asset record. This allows more efficient querying of the underlying data in certain use cases. Rather than reading the entire dataset into an in-memory DataFrame to transform, filter, or select from it, you can use query_datastore_asset to run those operations in the database and get back a smaller result set.

Parameters:

data (DataFrame) – a pandas dataframe containing all the data to be written
name (str) – a name for this dataset; this should be a unique identifier that refers to the dataset
asset_id (Optional[str]) – if overwriting an existing datastore asset, providing the asset_id specifies which asset will be replaced
description (Optional[str]) – a description of the asset, displayed in the Assets page
overwrite_if_exists (bool) – whether to overwrite existing data with the same name
indexes (Optional[DatasetIndex]) – an optional list of indexes to create, e.g. [{‘columns’: [‘column1’], ‘unique’: True}]

Returns:

the asset_id of the saved datastore asset
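Each DatasetIndex entry describes one database index on the stored table. The sketch below shows one plausible translation of the indexes argument into CREATE INDEX statements. It runs against SQLite for portability. The SDK's actual index names and Postgres DDL are not documented here, so index_ddl and its idx_<table>_<n> naming scheme are hypothetical.

```python
import sqlite3
from typing import Any, Dict, List


def index_ddl(table: str, indexes: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical: build one CREATE INDEX statement per DatasetIndex-style dict.

    Identifiers are interpolated directly, so only use trusted table/column names.
    """
    statements = []
    for i, spec in enumerate(indexes):
        columns = ", ".join(spec["columns"])
        unique = "UNIQUE " if spec.get("unique") else ""
        statements.append(f"CREATE {unique}INDEX idx_{table}_{i} ON {table} ({columns})")
    return statements


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, order_date TEXT)")
ddl = index_ddl("orders", [
    {"columns": ["order_id"], "unique": True},
    {"columns": ["region", "order_date"], "unique": None},
])
for statement in ddl:
    conn.execute(statement)
```

A unique index also acts as a constraint. After the statements above, inserting two rows with the same order_id fails. This is worth keeping in mind before marking an index as unique.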


save_output(data, name)

Save an intermediate value for access in a later step.

Parameters:

data (Union[DataFrame, Series, dict, int, float, BytesIO]) – The data to save.
name (str) – The label used to access this data in a later step.

Raises:

ValueError – If the data passed in is not picklable.
ValueError – If an invalid persistence method is used.
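save_output requires picklable data and raises ValueError otherwise, and get_input reads the value back by name in a later step. The sketch below models that round trip with an in-memory dictionary. OutputStore is hypothetical and does not reflect how the SDK actually persists data. It only illustrates the documented picklability check and the ValueError raised for an unknown name.

```python
import pickle
from typing import Any, Dict


class OutputStore:
    """Hypothetical in-memory model of save_output/get_input."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def save_output(self, data: Any, name: str) -> None:
        # Serialize eagerly so unpicklable data fails at save time.
        try:
            self._blobs[name] = pickle.dumps(data)
        except (pickle.PicklingError, TypeError, AttributeError) as exc:
            raise ValueError(f"{name!r} is not picklable") from exc

    def get_input(self, name: str) -> Any:
        if name not in self._blobs:
            raise ValueError(f"{name!r} was not found")
        return pickle.loads(self._blobs[name])


store = OutputStore()
store.save_output({"threshold": 0.75, "rows": 1200}, "run_config")  # e.g. in step 1
config = store.get_input("run_config")  # e.g. in a later step
```

Generators, open file handles, and locally defined functions are typical values that fail this check. Converting them to plain data, such as lists or dicts, before saving avoids the error.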

static str_2_connection_type(list_connection_type)

Convert a list of strings representing connection types to a list of ConnectionType values.

Parameters: list_connection_type (List[str]) – List of strings representing the connection types
Return type: List[ConnectionType]
Returns: A list of ConnectionType values
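str_2_connection_type is a string-to-enum conversion. The sketch below shows the same pattern with a hypothetical ConnectionType enum. The SDK's real members and string values are not listed in this reference, so the members below are placeholders. In this sketch, an unknown string raises ValueError, which is standard Enum behavior. The SDK's handling of unknown strings is not documented.

```python
from enum import Enum
from typing import List


class ConnectionType(Enum):
    """Hypothetical stand-in; the SDK's real members may differ."""
    postgresql = "postgresql"
    s3 = "s3"
    databricks = "databricks"


def str_2_connection_type(list_connection_type: List[str]) -> List[ConnectionType]:
    """Look up each string as an enum value; unknown strings raise ValueError."""
    return [ConnectionType(value) for value in list_connection_type]


types = str_2_connection_type(["s3", "databricks"])
```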

update_page(page)

Update a page. This is usually called from within a step to dynamically update content on the page while the step is running.

Parameters: page (Page) – The Page object containing updates.

update_progress(completion, message, page_update=False)

Update the progress of the step as it’s running. It is recommended to use this when steps have operations that can take a long time.

Parameters:

completion (Union[float, int]) – The progress to completion (0 to 100).
message (str) – The message to show at this level of completion.

Returns:

True if the progress message was sent successfully.
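For long-running loops, completion is usually derived from the fraction of work done so far. The sketch below computes a 0 to 100 value after each batch and reports it through a callback. In a step, that callback would be store_interface.update_progress, whose first two positional parameters are completion and message. process_in_batches and the squaring "work" are hypothetical placeholders.

```python
from typing import Callable, List, Sequence


def process_in_batches(
    items: Sequence[int],
    batch_size: int,
    report: Callable[[float, str], object],
) -> List[int]:
    """Hypothetical long-running job that reports progress after each batch."""
    results: List[int] = []
    total = len(items)
    for start in range(0, total, batch_size):
        batch = items[start:start + batch_size]
        results.extend(x * x for x in batch)  # placeholder for real work
        done = min(start + batch_size, total)
        report(round(100 * done / total, 1), f"Processed {done} of {total} records")
    return results


calls = []
process_in_batches(list(range(10)), 4, lambda completion, message: calls.append((completion, message)))
print(calls[-1])  # (100.0, 'Processed 10 of 10 records')
```

Inside a step, the call would look like process_in_batches(rows, 500, store_interface.update_progress). Reporting once per batch rather than once per item keeps the number of progress messages small.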