store_interface
When writing a flow, there are several ways to store generated data for use in later steps or even in other flows. The StoreInterface provides a convenient way to interact with the Predict data stores.
- class predict_backend.store.store_interface.StoreInterface(flow_id, user=None, step_name=None, is_action=False, bucket_name=None, profile_name='govcloud')
Bases: object
The StoreInterface class is the flow writer’s main interface for storing and retrieving metadata. It provides convenience methods for storing input data, assets, and flow metadata, as well as methods for retrieving previously saved data.
- create_element_link(element, step_name=None)
- create_future_element(elem, future_step_name)
- db_to_pandas(query, conn_name, connection_owner=None, **kwargs)
Given a SQL query and the name of a connection stored in the connection store, execute the query against that data store connection and return the result set as a pandas data frame.
- Parameters:
query (str) – The SQL query to execute against the supplied data store
conn_name (str) – The name of the connection whose database credentials, host, etc. will be retrieved from the connection store
connection_owner (str) – (optional) The owner of the connection being retrieved; defaults to the current user
kwargs – Additional keyword arguments. For Databricks connections, http_path can be supplied here to override the default http_path stored in the connection store
- Returns:
A pandas data frame
- get_asset(label=None, type=None, name=None, time_created=None, asset_id=None)
Retrieve a saved Asset. This function returns at most one asset; use get_assets to retrieve a list of assets that match the supplied argument values. This function will only return an asset that the requesting user has access to.
- Parameters:
label (Optional[str]) – The label of the Asset, defaults to None.
type (Optional[AssetType]) – The type of asset, defaults to None.
name (Optional[str]) – The name for the asset, defaults to None.
time_created (Optional[str]) – The time the asset was created. This is only necessary when you want to retrieve an Asset by timestamp in addition to other metadata, defaults to None.
asset_id (Optional[str]) – The unique identifier of a specific asset, defaults to None.
- Raises:
ValueError – If the asset label and type are both None.
- Return type:
- Returns:
The Asset object.
- get_asset_by_id(asset_id)
Retrieve a saved asset using the asset_id.
- Parameters:
asset_id (str) – The unique identifier of a specific asset.
- Return type:
- Returns:
The Asset object.
- get_assets(label=None, type=None, name=None, asset_id=None)
Retrieve multiple saved Assets. Providing any of the attributes filters the available assets down to those that match the given combination of label, type, and name. Providing none of these descriptors retrieves all available assets.
- Parameters:
label (Optional[str]) – The label of the Asset, defaults to None.
type (Optional[AssetType]) – The type of asset, defaults to None.
name (Optional[str]) – The name for the asset, defaults to None.
asset_id (Optional[str]) – The unique identifier of a specific asset, defaults to None.
- Return type:
List[Asset]
- Returns:
List of Asset objects.
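The filtering rule described for get_assets can be illustrated without the Predict backend. The sketch below is a stand-in, not the library's implementation: FakeAsset and filter_assets are hypothetical names, assets are modelled as plain records, and asset types are plain strings rather than AssetType values. It shows that each supplied descriptor narrows the result and that supplying none returns everything.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeAsset:
    """Hypothetical stand-in for an Asset record (not the Predict class)."""
    label: str
    type: str
    name: str


def filter_assets(
    assets: List[FakeAsset],
    label: Optional[str] = None,
    type: Optional[str] = None,  # mirrors the documented parameter name
    name: Optional[str] = None,
) -> List[FakeAsset]:
    """Keep the assets that match every descriptor that is not None."""
    wanted = {"label": label, "type": type, "name": name}
    return [
        asset for asset in assets
        if all(v is None or getattr(asset, k) == v for k, v in wanted.items())
    ]


catalog = [
    FakeAsset("churn", "Model", "v1"),
    FakeAsset("churn", "Dataset", "train"),
    FakeAsset("fraud", "Model", "v1"),
]

all_assets = filter_assets(catalog)                    # no descriptors: everything
churn_assets = filter_assets(catalog, label="churn")   # one descriptor
churn_models = filter_assets(catalog, label="churn", type="Model")
```

With a real StoreInterface the equivalent call would presumably be get_assets(label=..., type=...), with the matching done by the asset store rather than in the flow.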
- get_current_step_user_input(data_source_title)
Get the raw bytes that were uploaded in the current step (prior to the step action). This can be useful for validating uploaded data in a dynamic page update function. When data is uploaded using the DataSource element, it is not converted into a dataframe until the step action is run (Next button), which also puts the data on the subsequent step’s in-link. This means that you cannot access the uploaded object using the common methods that retrieve data from the in-link.
- Parameters:
data_source_title (str) – the title of the DataSource element
- Return type:
BytesIO
- Returns:
a BytesIO object of the data that was uploaded
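Because the upload is only available as raw bytes before the step action runs, any validation has to work directly on a BytesIO. The following is a minimal sketch, assuming the upload is a UTF-8 encoded CSV file. The helper missing_columns and the column names are hypothetical, and the BytesIO is built by hand here where a flow would obtain it from get_current_step_user_input.

```python
import csv
import io
from typing import List


def missing_columns(upload: io.BytesIO, required: List[str]) -> List[str]:
    """Return the required column names that are absent from the CSV header."""
    # getvalue() reads the whole buffer regardless of the current position
    # and leaves the BytesIO untouched for later use.
    text = upload.getvalue().decode("utf-8")
    header = next(csv.reader(io.StringIO(text)), [])
    return [col for col in required if col not in header]


# Assumption: in a flow this object would come from
# store_interface.get_current_step_user_input("<DataSource title>").
uploaded = io.BytesIO(b"id,amount\n1,9.5\n2,3.0\n")
problems = missing_columns(uploaded, ["id", "amount", "date"])
```

A dynamic page update function could use a non-empty result to show a message to the user before the step action is run.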
- get_dataset(label=None, name=None)
This is a convenience method for getting Assets that have a “Dataset” type.
- Parameters:
label (Optional[str]) – The label of the asset, defaults to None.
name (Optional[str]) – The name of the asset, defaults to None.
- Return type:
- Returns:
The Dataset asset.
- get_element(step_name, elem_title, quiet=False)
Get an element from any step by its title.
- Parameters:
step_name (str) – the step name where the element was created
elem_title (str) – the title of the element to look up
quiet (bool) – if True, return None instead of raising an error if the element does not exist. Defaults to False.
- Return type:
- Returns:
- get_element_by_id(step_name, elem_id)
Get an element from any step by its auto-generated ID.
- Parameters:
step_name (str) – the step name where the element was created
elem_id (str) – the element id
- Return type:
- Returns:
an Element object
- get_element_value(step_name, elem_title, quiet=False)
Get the value of an element the user interacted with in Predict.
- Parameters:
step_name (str) – The name of the step the element was in.
elem_title (str) – The title of the element to select.
quiet (bool) – if True, return None instead of raising an error if the element does not exist. Defaults to False.
- Returns:
The value of that element interaction. This will differ by input element type.
- get_explainer(label=None, name=None)
This is a convenience method for getting Assets that have an “Explainer” type.
- Parameters:
label (Optional[str]) – The label of the asset, defaults to None.
name (Optional[str]) – The name of the asset, defaults to None.
- Return type:
- Returns:
The Explainer asset.
- get_flow_status()
- get_input(name, step_name=None)
Get an input value previously stored with save_output.
- Parameters:
name (str) – The input name.
step_name (str) – (optional) The step name to retrieve the input data from
- Raises:
ValueError – If the name is not found in the database.
- Returns:
The previously saved data.
- get_model(label=None, name=None)
This is a convenience method for getting Assets that have a “Model” type.
- Parameters:
label (Optional[str]) – The label of the asset, defaults to None.
name (Optional[str]) – The name of the asset, defaults to None.
- Return type:
- Returns:
The Model asset.
- get_page()
Get the most up to date version of the page for this step.
- Return type:
- Returns:
The most up to date Page for this step.
- get_previous_step_name()
- Return type:
Optional[str]
- get_raw_data_source_data(step_name=None, element_id=None, element_title=None)
- Return type:
BytesIO
- get_s3_asset(path)
Retrieve an asset from a pre-specified bucket. To use this method, initialize the store interface with the bucket_name parameter. TODO: this asset won’t exist in the asset store; this function might need to be removed if possible.
- Parameters:
path (str) – The path to the asset in the S3 bucket.
- Returns:
The asset from S3
- get_schema(label=None, name=None)
This is a convenience method for getting Assets that have a “Schema” type.
- Parameters:
label (Optional[str]) – The label of the asset, defaults to None.
name (Optional[str]) – The name of the asset, defaults to None.
- Return type:
Schema
- Returns:
The Schema asset.
- has_input(name)
Check whether a step contains a link to an input.
- Parameters:
name (str) – the name to check for
- Return type:
bool
- Returns:
True if the name exists within the step’s in_link
- static list_fixture_data(prefix=None)
List all of the objects stored within the deployment’s fixture path in s3://{meta-data-bucket}/fixture/{prefix}
- Parameters:
prefix (Optional[str]) – Optional prefix to filter objects
- Return type:
List[str]
- Returns:
list of s3 keys
- pandas_to_db(_df, table, conn_name, connection_owner=None, if_exists='fail', **kwargs)
Write the contents of a dataframe to the supplied data store table. The DB connection details are retrieved using the connection name and, optionally, the connection owner.
- Parameters:
_df (DataFrame) – The pandas dataframe to be written
table (str) – The destination table where data will be written
conn_name (str) – The name of the connection whose database credentials, host, etc. will be retrieved from the connection store
connection_owner (str) – (optional) The owner of the connection being retrieved; defaults to the current user
if_exists (str) – What to do if the table already exists: ‘fail’, ‘overwrite’, ‘append’
kwargs – Additional keyword arguments. For Databricks connections, http_path can be supplied here to override the default http_path stored in the connection store
- Returns:
- save_asset(asset, overwrite=False, asset_id=None)
Save an Asset. This is useful for storing objects, datasets, and models to be used in other flows or within the current flow. Assets are persisted until they are deleted (even if the flow they were created in is deleted).
- Parameters:
asset (Asset) – The asset object to save.
overwrite (Optional[bool]) – Overwrite the existing asset with the same label and type if it exists
asset_id (Optional[str]) – An optional asset_id to use when overwriting
- save_output(data, name)
Save an intermediate value for access in a later step.
- Parameters:
data (Union[DataFrame, Series, dict, int, float, BytesIO]) – The data to save.
name (str) – The label to use to access this data in a later step.
- Raises:
ValueError – If the data passed in is not picklable.
ValueError – An invalid persistence method is used.
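The hand-off between save_output, has_input, and get_input can be sketched with an in-memory stand-in. InMemoryStore below is hypothetical and only mirrors the three documented method names and their documented ValueError cases. It ignores steps and in-links entirely and assumes pickle as the serialization format, so it says nothing about how Predict persists the data.

```python
import pickle
from typing import Any, Dict


class InMemoryStore:
    """Hypothetical stand-in mirroring three documented method names."""

    def __init__(self) -> None:
        self._outputs: Dict[str, bytes] = {}

    def save_output(self, data: Any, name: str) -> None:
        # Serialize first so that unpicklable data is rejected up front.
        try:
            payload = pickle.dumps(data)
        except (pickle.PicklingError, AttributeError, TypeError) as exc:
            raise ValueError(f"{name!r} is not picklable") from exc
        self._outputs[name] = payload

    def has_input(self, name: str) -> bool:
        return name in self._outputs

    def get_input(self, name: str) -> Any:
        if name not in self._outputs:
            raise ValueError(f"{name!r} was not found")
        return pickle.loads(self._outputs[name])


store = InMemoryStore()

# An earlier step saves an intermediate value under a name...
store.save_output({"threshold": 0.8, "rows": 1200}, "training_summary")

# ...and a later step checks for it before reading it back.
summary = None
if store.has_input("training_summary"):
    summary = store.get_input("training_summary")

# Data that cannot be pickled (here, a lambda) is rejected with ValueError.
try:
    store.save_output(lambda x: x, "not_picklable")
    pickle_error = None
except ValueError as exc:
    pickle_error = str(exc)
```

Checking has_input first avoids the ValueError that get_input is documented to raise for an unknown name.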
- static str_2_connection_type(list_connection_type)
Converts a list of strings representing the connection type to a list of ConnectionType
- Parameters:
list_connection_type (List[str]) – list of strings representing the connection type
- Return type:
List[ConnectionType]
- Returns:
list of ConnectionType
- update_page(page)
Update a page. This is usually called from within a step to dynamically update content on the page as the step is running.
- Parameters:
page (Page) – The Page object containing updates.
- update_progress(completion, message, page_update=False)
Update the progress of the step as it’s running. It is recommended to use this when steps have operations that can take a long time.
- Parameters:
completion – The progress to completion (0 to 100).
message – The message to show at this level of completion.
- Returns:
True if the progress message was sent successfully.
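One way to report progress from a long-running step is to process the work in batches and call update_progress after each one. This is a sketch under stated assumptions: RecordingStore is a hypothetical stand-in that records calls instead of sending them, process_in_batches is an illustrative helper, and the doubling stands in for the real slow work.

```python
from typing import List, Tuple


class RecordingStore:
    """Hypothetical stand-in that records progress calls instead of sending them."""

    def __init__(self) -> None:
        self.calls: List[Tuple[int, str]] = []

    def update_progress(self, completion, message, page_update=False) -> bool:
        self.calls.append((completion, message))
        return True


def process_in_batches(store, items, batch_size):
    """Process items in batches, reporting the percentage done after each batch."""
    total = len(items)
    results = []
    for start in range(0, total, batch_size):
        batch = items[start:start + batch_size]
        results.extend(x * 2 for x in batch)  # placeholder for slow work
        done = min(start + batch_size, total)
        # completion is expressed on the documented 0 to 100 scale
        store.update_progress(done * 100 // total, f"Processed {done} of {total} items")
    return results


store = RecordingStore()
doubled = process_in_batches(store, list(range(10)), batch_size=4)
```

Reporting once per batch rather than once per item keeps the number of progress messages small while still giving the user regular feedback.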