pandas

class predict_backend.ml.nlp.handlers.pandas.PandasHandler(narrative_feature, document_identifier, feature_names, components, overwrite_data=True, buffer_size=1000)

Bases: PersistenceHandler

Handles data produced by the NLP module. Every component that is PandasCompliant is compatible with this PersistenceHandler. The handler is responsible for extracting the data attached to a spaCy Doc object.

Parameters:
  • narrative_feature (str) – The feature containing the text.

  • document_identifier (str) – The id of the dataset.

  • feature_names (List[str]) – Features you want to save.

  • components (List[Type[PandasCompliant]]) – List of PandasCompliant classes, so that the handler knows how to work with these components.

  • overwrite_data (bool) – Whether to overwrite pre-existing data, defaults to True.

  • buffer_size (int) – Size of the buffer, defaults to 1000.

get_doc_data(doc_id)
Parameters:

doc_id – Identifier of the doc.

Return type:

Dict

Returns:

The base data extracted from the doc, plus the columns of the original dataset.

get_doc_entities(doc_id)
Parameters:

doc_id – Identifier of the doc.

Return type:

DataFrame

Returns:

The result of querying the entities table for the given doc.

get_doc_events(doc_id)
Parameters:

doc_id – Identifier of the doc.

Return type:

DataFrame

Returns:

The result of querying the events table for the given doc.
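
Both get_doc_entities and get_doc_events return the rows of a component table that belong to one document. A minimal sketch of that kind of lookup with plain pandas follows. The table contents and the column names `doc_id`, `text` and `label` are hypothetical, and the actual handler may query its tables differently.

```python
import pandas as pd

# Hypothetical entities table; the real column names may differ.
entities = pd.DataFrame(
    {
        "doc_id": ["a1", "a1", "b2"],
        "text": ["engine", "runway", "flap"],
        "label": ["PART", "LOCATION", "PART"],
    }
)


def rows_for_doc(table: pd.DataFrame, doc_id: str) -> pd.DataFrame:
    # Boolean-mask selection keeps only the rows of the requested document.
    return table[table["doc_id"] == doc_id].reset_index(drop=True)


doc_entities = rows_for_doc(entities, "a1")
```

An unknown identifier simply yields an empty DataFrame in this sketch; whether the real methods behave the same way is not stated in this reference.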

get_doc_ids()
Returns:

A list of document ids, each with its corresponding ingestion time.

get_table(table)
Parameters:

table (str) – Name of the table you’re interested in.

Return type:

DataFrame

Returns:

DataFrame representing the data table produced by a component.

init_persistence()

Initialize the persistence handler.

insert_doc(doc, row_data)

Insert a doc into the persistence store. It also checks whether the operation has to be buffered.

Parameters:
  • doc (Doc) – The spaCy Doc object to insert into the buffer.

  • row_data – The extra features of the doc.

start_buffered_ingestion()

Initialize the buffer to speed up the ingestion process.

stop_buffered_ingestion()

Consume the buffer, merge the data and delete the buffer.
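
The buffered ingestion methods above suggest a lifecycle of start, insert, stop. The sketch below illustrates that pattern with a small stand-in class written in plain Python. `BufferedStore`, its internals, and the flush-when-full behaviour are assumptions made for illustration and are not the actual PandasHandler implementation; only the method names mirror the documented API.

```python
from typing import Any, Dict, List, Optional


class BufferedStore:
    """Hypothetical stand-in that mimics the documented method names."""

    def __init__(self, buffer_size: int = 1000) -> None:
        self.buffer_size = buffer_size
        self.rows: List[Dict[str, Any]] = []  # the "persisted" rows
        self._buffer: Optional[List[Dict[str, Any]]] = None  # None = not buffering

    def start_buffered_ingestion(self) -> None:
        # Create an empty buffer; subsequent inserts land here first.
        self._buffer = []

    def insert_doc(self, doc: str, row_data: Dict[str, Any]) -> None:
        # The real handler receives a spaCy Doc; a plain string is used here.
        row = {"text": doc, **row_data}
        if self._buffer is None:
            # Not buffering: write straight to the store.
            self.rows.append(row)
            return
        self._buffer.append(row)
        # Assumption: a full buffer is merged into the store immediately.
        if len(self._buffer) >= self.buffer_size:
            self._flush()

    def stop_buffered_ingestion(self) -> None:
        # Merge whatever is left, then drop the buffer.
        if self._buffer is not None:
            self._flush()
            self._buffer = None

    def _flush(self) -> None:
        self.rows.extend(self._buffer)
        self._buffer.clear()


store = BufferedStore(buffer_size=2)
store.start_buffered_ingestion()
for i, text in enumerate(["first note", "second note", "third note"]):
    store.insert_doc(text, {"doc_id": i})
pending = len(store._buffer)  # rows still waiting in the buffer
store.stop_buffered_ingestion()
```

Batching writes this way trades a little memory for fewer merge operations, which is the usual motivation for a buffer of configurable size.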