sqllite#

class predict_backend.ml.nlp.handlers.sqllite.SqlLiteHandler(db_path, uri, narrative_feature, document_identifier, feature_names, components, overwrite_data=True)#

Bases: PersistenceHandler

Handles data produced by the NLP module. Every component that is SqlCompliant is compatible with this PersistenceHandler. This handler may be deprecated soon and contains methods that are not implemented.

Parameters:
  • db_path (str) – DB connection string

  • uri (bool) – Whether db_path is a URI.

  • narrative_feature (str) – The feature containing the text.

  • document_identifier (str) – The id of the dataset.

  • feature_names (List[str]) – Features you want to save.

  • components (List[Type[SqlCompliant]]) – List of SqlCompliant classes, so that the handler knows how to work with these components.

  • overwrite_data (bool) – Whether to overwrite pre-existing data, defaults to True.
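
The db_path and uri parameters resemble the arguments of sqlite3.connect in the Python standard library. The following is a minimal sketch of the path versus URI distinction using only the standard library. Whether SqlLiteHandler forwards these values to sqlite3.connect is an assumption, and open_connection is a hypothetical helper that is not part of this API.

```python
import sqlite3


def open_connection(db_path: str, uri: bool = False) -> sqlite3.Connection:
    """Hypothetical helper: open a SQLite connection from a path or a URI."""
    # With uri=True, sqlite3 interprets db_path as a URI such as
    # "file:data.db?mode=ro" instead of a plain filesystem path.
    return sqlite3.connect(db_path, uri=uri)


# Plain form: ":memory:" creates a private in-memory database.
plain = open_connection(":memory:")

# URI form: the query string selects an in-memory database.
from_uri = open_connection("file:nlp_demo?mode=memory", uri=True)

plain.execute("CREATE TABLE docs (doc_id TEXT)")
from_uri.execute("CREATE TABLE docs (doc_id TEXT)")
```

The URI form is mainly useful when options such as read-only mode need to be passed together with the database location.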

get_doc_data(doc_id)#
Parameters:

doc_id – Identifier of the doc.

Return type:

Dict

Returns:

The base data extracted from the doc, plus the columns of the original dataset.

get_doc_entities(doc_id)#
Parameters:

doc_id – Identifier of the doc.

Return type:

DataFrame

Returns:

The result of the query on the entities table.

get_doc_events(doc_id)#
Parameters:

doc_id – Identifier of the doc.

Return type:

DataFrame

Returns:

The result of the query on the events table.
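
The per-document lookups described for get_doc_entities and get_doc_events can be pictured as a filtered query on a single table. The sketch below uses only the standard library and returns plain dictionaries instead of a DataFrame. The entities schema and the entities_for_doc helper are hypothetical, since the real table layout is not documented here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be converted to dicts by column name

# Hypothetical schema, used for illustration only.
conn.execute("CREATE TABLE entities (doc_id TEXT, text TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO entities VALUES (?, ?, ?)",
    [("d1", "Rome", "GPE"), ("d1", "ACME", "ORG"), ("d2", "Paris", "GPE")],
)


def entities_for_doc(connection, doc_id):
    """Hypothetical helper: return the entity rows that belong to one doc."""
    # The doc_id is passed as a bound parameter, never formatted into the SQL.
    cursor = connection.execute(
        "SELECT text, label FROM entities WHERE doc_id = ? ORDER BY text",
        (doc_id,),
    )
    return [dict(row) for row in cursor]


d1_entities = entities_for_doc(conn, "d1")
```

When pandas is available, the same query could be passed to pandas.read_sql_query to obtain a DataFrame, which matches the documented return type more closely.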

get_doc_ids()#
Returns:

A list of document ids, each with its ingestion time.

get_table(table)#
Parameters:

table (str) – Name of the table you’re interested in.

Returns:

DataFrame representing the data table produced by a component.
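
Because a table name cannot be passed as a bound SQL parameter, a lookup by table name usually needs some validation. The following sketch shows one possible approach with the standard library: check the name against sqlite_master before building the query. The read_table helper and the events schema are hypothetical, and nothing here is taken from the actual get_table implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for one produced by a component.
conn.execute("CREATE TABLE events (doc_id TEXT, trigger_text TEXT)")
conn.execute("INSERT INTO events VALUES ('d1', 'acquired')")


def read_table(connection, table):
    """Hypothetical helper: return (column_names, rows) for a known table."""
    known = {
        name
        for (name,) in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    if table not in known:
        raise ValueError(f"unknown table: {table!r}")
    # The name was checked against the schema, so it is safe to interpolate.
    cursor = connection.execute(f'SELECT * FROM "{table}"')
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()


columns, rows = read_table(conn, "events")
```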

init_components()#

Init component persistence.

init_persistence()#

Init the persistence handler.

initialize_database()#

Init the db with the required tables.
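
One way to picture how table creation could interact with the overwrite_data constructor flag is sketched below. This is an assumption about the behaviour, not the documented implementation. The initialize_tables helper and the docs schema are hypothetical.

```python
import sqlite3


def initialize_tables(connection, overwrite_data=True):
    """Hypothetical helper: create a docs table, optionally dropping old data."""
    if overwrite_data:
        connection.execute("DROP TABLE IF EXISTS docs")
    connection.execute(
        "CREATE TABLE IF NOT EXISTS docs (doc_id TEXT PRIMARY KEY, ingested_at TEXT)"
    )
    connection.commit()


conn = sqlite3.connect(":memory:")
initialize_tables(conn)
conn.execute("INSERT INTO docs VALUES ('d1', '2024-01-01T00:00:00')")

initialize_tables(conn, overwrite_data=False)  # keeps the existing row
kept = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]

initialize_tables(conn, overwrite_data=True)  # drops and recreates the table
cleared = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
```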

insert_doc(doc, row_data)#

Insert a doc into the persistence.

Parameters:
  • doc (Doc) – The spaCy Doc object to insert into the buffer.

  • row_data – The extra features of the doc

start_buffered_ingestion()#

Initialize the buffer to speed up the ingestion process.

stop_buffered_ingestion()#

Consume the buffer, merge the data and delete the buffer.
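
The start, insert, and stop sequence follows a common buffering pattern: rows are collected in memory and written in one batch. The sketch below illustrates that pattern with the standard library. BufferedWriter is a hypothetical stand-in, not the handler itself, and the docs schema is assumed for illustration.

```python
import sqlite3


class BufferedWriter:
    """Hypothetical stand-in illustrating start/insert/stop buffering."""

    def __init__(self, connection):
        self.connection = connection
        self.buffer = None  # None means buffering is not active

    def start(self):
        self.buffer = []

    def insert(self, doc_id, text):
        if self.buffer is None:
            raise RuntimeError("call start() before insert()")
        self.buffer.append((doc_id, text))

    def stop(self):
        if self.buffer is None:
            raise RuntimeError("call start() before stop()")
        # Write all buffered rows in one transaction, then drop the buffer.
        with self.connection:
            self.connection.executemany(
                "INSERT INTO docs VALUES (?, ?)", self.buffer
            )
        self.buffer = None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (doc_id TEXT, text TEXT)")

writer = BufferedWriter(conn)
writer.start()
writer.insert("d1", "first narrative")
writer.insert("d2", "second narrative")
pending = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
writer.stop()
stored = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
```

Nothing reaches the database until stop() is called, so a single transaction covers the whole batch. This is typically much faster than committing each row on its own.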