store_interface#

When writing a flow, there are several ways to store generated data that you want to use in later steps or even in other flows. The StoreInterface provides a convenient way to interact with the Predict data stores.

class predict_backend.store.store_interface.StoreInterface(flow_id, user=None, step_name=None, is_action=False, bucket_name=None, profile_name='govcloud')#

Bases: object

The StoreInterface class is the flow writer’s main interface for storing and retrieving metadata. It provides convenience methods for storing input data, assets, and flow metadata, as well as methods for retrieving previously saved data.

create_future_element(elem, future_step_name)#

db_to_pandas(query, conn_name, connection_owner=None, **kwargs)#

Given a SQL query and the name of a connection stored in the connection store, execute the query against that data store connection and return the result set as a pandas data frame.

Parameters:
  • query (str) – The SQL query to execute against the supplied data store

  • conn_name (str) – The name of the connection whose database credentials, host, etc. will be retrieved from the connection store

  • connection_owner (str) – (optional) The owner of the connection being retrieved; defaults to the current user

  • kwargs – Additional keyword arguments. For Databricks connections, http_path can be supplied here to override the default http_path stored in the connection store

Returns:

A pandas data frame
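The http_path override described for kwargs can be pictured as call-time keyword arguments taking precedence over stored connection details. The sketch below is only an illustration of that precedence, not the library's implementation: `resolve_connection`, `stored_connection`, and the placeholder values are all hypothetical.

```python
# Hypothetical illustration only: none of these names come from predict_backend.
def resolve_connection(stored_connection: dict, **kwargs) -> dict:
    """Return connection details with call-time kwargs taking precedence."""
    resolved = dict(stored_connection)  # copy so the stored details stay untouched
    resolved.update(kwargs)             # e.g. http_path=... replaces the stored value
    return resolved


# Placeholder details standing in for an entry in the connection store.
stored_connection = {"host": "example-host", "http_path": "/sql/default"}

# Comparable in spirit to passing http_path=... through **kwargs.
resolved = resolve_connection(stored_connection, http_path="/sql/override")
print(resolved["http_path"])
```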

get_asset(label=None, type=None, name=None, time_created=None, asset_id=None)#

Retrieve a saved Asset. This function returns at most one asset; use get_assets to retrieve a list of assets that match the supplied argument values. This function will only return an asset that the requesting user has access to.

Parameters:
  • label (Optional[str]) – The label of the Asset, defaults to None.

  • type (Optional[AssetType]) – The type of asset, defaults to None.

  • name (Optional[str]) – The name for the asset, defaults to None.

  • time_created (Optional[str]) – The time the asset was created. This is only needed when you want to retrieve an Asset by timestamp in addition to other metadata; defaults to None.

  • asset_id (Optional[str]) – The unique identifier of a specific asset, defaults to None.

Raises:

ValueError – If the asset label and type are both None.

Return type:

Asset

Returns:

The Asset object.

get_asset_by_id(asset_id)#

Retrieve a saved asset using the asset_id

Parameters:

asset_id (str) – The unique identifier of a specific asset.

Return type:

Asset

Returns:

The Asset object.

get_assets(label=None, type=None, name=None, asset_id=None)#

Retrieve multiple saved Assets. Providing any of the attributes filters the available assets down to those that match the given combination of label, type, and name. Providing none of these descriptors retrieves all available assets.

Parameters:
  • label (Optional[str]) – The label of the Asset, defaults to None.

  • type (Optional[AssetType]) – The type of asset, defaults to None.

  • name (Optional[str]) – The name for the asset, defaults to None.

  • asset_id (Optional[str]) – The unique identifier of a specific asset, defaults to None

Return type:

List[Asset]

Returns:

List of Asset objects.
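The difference between get_assets (filter by whichever attributes are supplied) and get_asset (at most one result, ValueError when label and type are both None) can be sketched with an in-memory stand-in. `FakeAsset` and `FakeAssetStore` are hypothetical classes written for this illustration; returning the first match from get_asset is an assumption, since the selection rule is not documented here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeAsset:
    """Hypothetical stand-in for an Asset; only the filter fields are modelled."""
    label: str
    type: str
    name: str


class FakeAssetStore:
    """Hypothetical in-memory stand-in that mirrors the documented filtering."""

    def __init__(self, assets: List[FakeAsset]):
        self._assets = assets

    def get_assets(self, label=None, type=None, name=None) -> List[FakeAsset]:
        # Each supplied attribute narrows the result; None means "do not filter".
        wanted = {"label": label, "type": type, "name": name}
        return [
            asset for asset in self._assets
            if all(v is None or getattr(asset, k) == v for k, v in wanted.items())
        ]

    def get_asset(self, label=None, type=None, name=None) -> Optional[FakeAsset]:
        # Mirrors the documented ValueError when label and type are both None.
        if label is None and type is None:
            raise ValueError("label and type cannot both be None")
        matches = self.get_assets(label=label, type=type, name=name)
        return matches[0] if matches else None  # assumption: first match wins


store = FakeAssetStore([
    FakeAsset("iris", "dataset", "raw"),
    FakeAsset("iris", "model", "rf"),
    FakeAsset("wine", "dataset", "raw"),
])
iris_assets = store.get_assets(label="iris")        # two assets share this label
all_assets = store.get_assets()                     # no filters: everything
one = store.get_asset(label="iris", type="model")   # a single asset or None
```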

get_current_step_user_input(data_source_title)#

Get the raw bytes that were uploaded in the current step (prior to the step action). This can be useful for validating the uploaded data in a dynamic page update function. When data is uploaded using the DataSource element, it is not converted into a dataframe until the step action is run (Next button), which also puts the data on the subsequent step’s in-link. This means that you cannot access the uploaded object using the common methods that retrieve data from the in-link.

Parameters:

data_source_title (str) – the title of the DataSource element

Return type:

BytesIO

Returns:

a BytesIO object of the data that was uploaded
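As an illustration of the validation use case, the sketch below checks the header of an uploaded CSV held in a BytesIO object. It assumes the upload is UTF-8 encoded CSV; `missing_columns` is a hypothetical helper, and the BytesIO is built by hand here to stand in for the object this method is documented to return.

```python
import csv
import io
from io import BytesIO
from typing import List


def missing_columns(uploaded: BytesIO, required: List[str]) -> List[str]:
    """Return the required column names that are absent from the CSV header."""
    # getvalue() reads the whole buffer without moving the stream position.
    text = uploaded.getvalue().decode("utf-8")  # assumption: UTF-8 CSV upload
    header = next(csv.reader(io.StringIO(text)), [])
    return [column for column in required if column not in header]


# Stand-in for the BytesIO returned for the current step's upload.
uploaded = BytesIO(b"id,amount\n1,9.5\n2,3.0\n")
missing = missing_columns(uploaded, ["id", "amount", "date"])
print(missing)
```

A dynamic page update function could then surface the missing names to the user before the step action runs.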

get_dataset(label=None, name=None)#

This is a convenience method for getting Assets that have a “Dataset” type

Parameters:
  • label (Optional[str]) – The label of the asset, defaults to None.

  • name (Optional[str]) – The name of the asset, defaults to None.

Return type:

Dataset

Returns:

The Dataset asset.

get_element(step_name, elem_title, quiet=False)#

Get an element from any step by its title.

Parameters:
  • step_name (str) – the step name where the element was created

  • elem_title (str) – the title of the element to lookup

  • quiet (bool) – if True, return None instead of raising an error if the element does not exist. Defaults to False.

Return type:

Element

Returns:

an Element object

get_element_by_id(step_name, elem_id)#

Get an element from any step by its auto-generated ID

Parameters:
  • step_name (str) – the step name where the element was created

  • elem_id (str) – the element id

Return type:

Element

Returns:

an Element object

get_element_value(step_name, elem_title, quiet=False)#

Get the value of an element the user interacted with in Predict.

Parameters:
  • step_name (str) – The name of the step the element was in.

  • elem_title (str) – The title of the element to select.

  • quiet (bool) – if True, return None instead of raising an error if the element does not exist. Defaults to False.

Returns:

The value of that element interaction. This will differ by input element type.
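One way to use the quiet flag is to fall back to a default when an element is absent. The following is a minimal sketch against a stand-in class: `FakeStore` and `value_or_default` are hypothetical, the step and element names are made up, and the KeyError raised when quiet is False is an assumption because the real exception type is not documented here.

```python
class FakeStore:
    """Hypothetical stand-in that mimics the documented quiet behaviour."""

    def __init__(self, values):
        self._values = values  # {(step_name, elem_title): value}

    def get_element_value(self, step_name, elem_title, quiet=False):
        key = (step_name, elem_title)
        if key not in self._values:
            if quiet:
                return None
            # Assumption: the real exception type is not documented.
            raise KeyError(f"no element {elem_title!r} in step {step_name!r}")
        return self._values[key]


def value_or_default(store, step_name, elem_title, default):
    """Look up an element value, substituting a default when it is missing."""
    value = store.get_element_value(step_name, elem_title, quiet=True)
    return default if value is None else value


store = FakeStore({("Upload", "Target column"): "price"})
target = value_or_default(store, "Upload", "Target column", "label")
threshold = value_or_default(store, "Upload", "Threshold", 0.5)
```

Note that this pattern cannot distinguish a missing element from one whose value is genuinely None.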

get_explainer(label=None, name=None)#

This is a convenience method for getting Assets that have an “Explainer” type

Parameters:
  • label (Optional[str]) – The label of the asset, defaults to None.

  • name (Optional[str]) – The name of the asset, defaults to None.

Return type:

Explainer

Returns:

The Explainer asset.

get_flow_status()#

get_input(name, step_name=None)#

Get an input value that was previously stored with save_output.

Parameters:
  • name (str) – The input name.

  • step_name (str) – (optional) The step name to retrieve the input data from.

Raises:

ValueError – If the name is not found in the database.

Returns:

The previously saved data.
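The round trip between save_output in one step and get_input in a later step can be sketched with an in-memory stand-in. `FakeStore` is hypothetical and does not reflect how the library persists data; using pickle is an assumption drawn from the documented "pickleable" requirement, and the real has_input checks the step's in_link rather than a dictionary.

```python
import pickle


class FakeStore:
    """Hypothetical in-memory stand-in for the documented save/get round trip."""

    def __init__(self):
        self._outputs = {}

    def save_output(self, data, name):
        try:
            self._outputs[name] = pickle.dumps(data)
        except Exception as exc:  # documented: ValueError if not pickleable
            raise ValueError(f"{name!r} is not pickleable") from exc

    def has_input(self, name):
        return name in self._outputs

    def get_input(self, name, step_name=None):
        if name not in self._outputs:  # documented: ValueError if not found
            raise ValueError(f"{name!r} not found")
        return pickle.loads(self._outputs[name])


store = FakeStore()

# In an earlier step: save an intermediate value under a name.
store.save_output({"rows": 120, "columns": 8}, "summary")

# In a later step: guard the lookup with has_input instead of catching ValueError.
summary = store.get_input("summary") if store.has_input("summary") else {}
other = store.get_input("other") if store.has_input("other") else None
```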

get_model(label=None, name=None)#

This is a convenience method for getting Assets that have a “Model” type

Parameters:
  • label (Optional[str]) – The label of the asset, defaults to None.

  • name (Optional[str]) – The name of the asset, defaults to None.

Return type:

Model

Returns:

The Model asset.

get_page()#

Get the most up to date version of the page for this step.

Return type:

Page

Returns:

The most up to date Page for this step.

get_previous_step_name()#

Return type:

Optional[str]

get_raw_data_source_data(step_name=None, element_id=None, element_title=None)#

Return type:

BytesIO

get_s3_asset(path)#

Retrieve an asset from a pre-specified bucket. To use this method, initialize the store interface with the bucket_name parameter. Note: this asset will not exist in the asset store, so this function may be removed.

Parameters:

path (str) – The path to the asset in the s3 bucket.

Returns:

returns the asset from s3

get_schema(label=None, name=None)#

This is a convenience method for getting Assets that have a “Schema” type

Parameters:
  • label (Optional[str]) – The label of the asset, defaults to None.

  • name (Optional[str]) – The name of the asset, defaults to None.

Return type:

Schema

Returns:

The Schema asset.

has_input(name)#

Check if a step contains a link to an input

Parameters:

name (str) – the name to check for

Return type:

bool

Returns:

True if the name exists within the step’s in_link

static list_fixture_data(prefix=None)#

List all of the objects stored within the deployment’s fixture path in s3://{meta-data-bucket}/fixture/{prefix}

Parameters:

prefix (Optional[str]) – Optional prefix to filter objects

Return type:

List[str]

Returns:

list of s3 keys

pandas_to_db(_df, table, conn_name, connection_owner=None, if_exists='fail', **kwargs)#

Write the contents of a dataframe to the supplied data store table. Retrieve DB connection details by supplying the connection name and the connection owner (optional)

Parameters:
  • _df (DataFrame) – The pandas dataframe to be written

  • table (str) – The destination table where data will be written

  • conn_name (str) – The name of the connection whose database credentials, host, etc. will be retrieved from the connection store

  • connection_owner (str) – (optional) The owner of the connection being retrieved; defaults to the current user

  • if_exists (str) – What to do if the table already exists: ‘fail’, ‘overwrite’, ‘append’

  • kwargs – Additional keyword arguments. For Databricks connections, http_path can be supplied here to override the default http_path stored in the connection store
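The three if_exists options can be illustrated with a small sqlite3 sketch. This is not how pandas_to_db is implemented: `write_rows` is a hypothetical helper, the table layout is made up, and treating ‘overwrite’ as drop-and-recreate is an assumption based on the conventional meaning of the option.

```python
import sqlite3


def write_rows(conn, table, rows, if_exists="fail"):
    """Write (id, value) rows to `table`, honouring an if_exists policy."""
    if if_exists not in ("fail", "overwrite", "append"):
        raise ValueError(f"unknown if_exists value: {if_exists!r}")
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (table,)
    ).fetchone() is not None
    if exists and if_exists == "fail":
        raise ValueError(f"table {table!r} already exists")
    if exists and if_exists == "overwrite":
        conn.execute(f'DROP TABLE "{table}"')  # assumption: overwrite = drop and recreate
        exists = False
    if not exists:
        conn.execute(f'CREATE TABLE "{table}" (id INTEGER, value REAL)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)


conn = sqlite3.connect(":memory:")
write_rows(conn, "scores", [(1, 0.5), (2, 0.7)])            # table is created
write_rows(conn, "scores", [(3, 0.9)], if_exists="append")  # rows are added
count_after_append = conn.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
write_rows(conn, "scores", [(4, 0.1)], if_exists="overwrite")  # old rows are dropped
count_after_overwrite = conn.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
```

With the default ‘fail’, a second write to an existing table raises instead of touching the data, which is the safest choice when a flow may be re-run.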


save_asset(asset, overwrite=False, asset_id=None)#

Save an Asset. This is useful for storing objects, datasets, and models to be used in other flows or within the current flow. Assets are persisted until they are deleted (even if the flow they were created in is deleted).

Parameters:
  • asset (Asset) – The asset object to save.

  • overwrite (Optional[bool]) – Overwrite the existing asset with the same label and type if it exists

  • asset_id (Optional[str]) – An optional asset_id to use when overwriting

save_output(data, name)#

Save the intermediate value of some information for access in a later step

Parameters:
  • data (Union[DataFrame, Series, dict, int, float, BytesIO]) – The data to save.

  • name (str) – The label to use to access this data in a later step.

Raises:
  • ValueError – If the data passed in is not pickleable.

  • ValueError – If an invalid persistence method is used.

static str_2_connection_type(list_connection_type)#

Converts a list of strings representing the connection type to a list of ConnectionType

Parameters:

list_connection_type (List[str]) – list of strings representing the connection type

Return type:

List[ConnectionType]

Returns:

list of ConnectionType

update_page(page)#

Update a page. This is usually called from within a step to dynamically update content on the page as the step is running.

Parameters:

page (Page) – The Page object containing updates.

update_progress(completion, message, page_update=False)#

Update the progress of the step as it’s running. It is recommended to use this when steps have operations that can take a long time.

Parameters:
  • completion – The progress to completion (0 to 100).

  • message – The message to show at this level of completion.

Returns:

True if the progress message was sent successfully.
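A typical pattern for a long-running step is to report progress after each unit of work. The sketch below uses a stand-in that records updates instead of sending them; `FakeStore`, `process_in_batches`, and the placeholder work are hypothetical, and only the update_progress signature is taken from the documentation above.

```python
class FakeStore:
    """Hypothetical stand-in that records progress updates instead of sending them."""

    def __init__(self):
        self.updates = []

    def update_progress(self, completion, message, page_update=False):
        self.updates.append((completion, message))
        return True


def process_in_batches(store, batches):
    """Run a long operation batch by batch, reporting progress after each one."""
    total = len(batches)
    results = []
    for index, batch in enumerate(batches, start=1):
        results.append(sum(batch))  # placeholder for the real work
        percent = round(100 * index / total)  # completion on the 0 to 100 scale
        store.update_progress(percent, f"Processed batch {index} of {total}")
    return results


store = FakeStore()
results = process_in_batches(store, [[1, 2], [3, 4], [5, 6], [7, 8]])
```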