pandas
- class predict_backend.ml.nlp.handlers.pandas.PandasHandler(narrative_feature, document_identifier, feature_names, components, overwrite_data=True, buffer_size=1000)
Bases: PersistenceHandler
Handles data produced by the NLP module. Every PandasCompliant component is compatible with this PersistenceHandler, which is responsible for extracting the data attached to a spaCy Doc object.
- Parameters:
narrative_feature (str) – The feature containing the text.
document_identifier (str) – The id of the dataset.
feature_names (List[str]) – Features you want to save.
components (List[Type[PandasCompliant]]) – List of PandasCompliant classes, so that the handler knows how to work with these components.
overwrite_data (bool) – Whether to overwrite pre-existing data, defaults to True.
buffer_size – Size of the buffer, defaults to 1000.
- get_doc_data(doc_id)
- Parameters:
doc_id – Identifier of the doc.
- Return type:
Dict
- Returns:
The base data extracted from the doc, plus the columns of the original dataset.
- get_doc_entities(doc_id)
- Parameters:
doc_id – Identifier of the doc.
- Return type:
DataFrame
- Returns:
The query result applied on the entities table.
- get_doc_events(doc_id)
- Parameters:
doc_id – Identifier of the doc.
- Return type:
DataFrame
- Returns:
The query result applied on the events table.
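One way to picture the two per-document getters above is as a row filter on a table keyed by document id. The sketch below is only an illustration under stated assumptions: the column name `doc_id`, the sample rows, and the helper `rows_for_doc` are hypothetical and are not taken from the library, whose actual query may differ.

```python
import pandas as pd

# Hypothetical entities table; the column names are assumptions for illustration.
entities = pd.DataFrame(
    {
        "doc_id": ["d1", "d1", "d2"],
        "text": ["Berlin", "Monday", "Paris"],
        "label": ["GPE", "DATE", "GPE"],
    }
)


def rows_for_doc(table: pd.DataFrame, doc_id: str, id_column: str = "doc_id") -> pd.DataFrame:
    """Return the rows of `table` whose id column equals `doc_id`."""
    # Boolean-mask selection keeps only the matching rows;
    # reset_index gives the result a clean 0..n-1 index.
    return table[table[id_column] == doc_id].reset_index(drop=True)


d1_entities = rows_for_doc(entities, "d1")
```

An id that is absent from the table yields an empty DataFrame with the same columns, so callers can check `len(result) == 0` instead of handling an exception.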
- get_doc_ids()
- Returns:
A list of document ids, each with its ingestion time.
- get_table(table)
- Parameters:
table (str) – Name of the table you’re interested in.
- Return type:
DataFrame
- Returns:
DataFrame representing the data table produced by a component.
- init_persistence()
Initialize the persistence handler.
- insert_doc(doc, row_data)
Insert a doc into the persistence store. It also checks whether the operation has to be buffered.
- Parameters:
doc (Doc) – The spaCy Doc object to insert into the buffer.
row_data – The extra features of the doc.
- start_buffered_ingestion()
Initialize the buffer to speed up the ingestion process.
- stop_buffered_ingestion()
Consume the buffer, merge the data and delete the buffer.
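The start / insert / stop cycle described above can be sketched with a small stand-in class. This is not the PandasHandler implementation: `BufferedRows`, its `insert` method, and the `flush_count` counter are hypothetical names used only to show how a buffer of size `buffer_size` might batch writes and be merged when ingestion stops.

```python
from typing import Any, Dict, List, Optional


class BufferedRows:
    """Hypothetical stand-in that mimics a buffered ingestion flow."""

    def __init__(self, buffer_size: int = 1000) -> None:
        self.buffer_size = buffer_size
        self.rows: List[Dict[str, Any]] = []  # plays the role of the stored table
        self._buffer: Optional[List[Dict[str, Any]]] = None  # None: buffering is off
        self.flush_count = 0  # number of batch merges, kept for illustration

    def start_buffered_ingestion(self) -> None:
        # Create an empty buffer; inserts are batched from now on.
        self._buffer = []

    def insert(self, row: Dict[str, Any]) -> None:
        if self._buffer is None:
            # Unbuffered mode: write straight to the table.
            self.rows.append(row)
            return
        self._buffer.append(row)
        if len(self._buffer) >= self.buffer_size:
            self._flush()

    def _flush(self) -> None:
        # Merge the buffered rows into the table in one step.
        if self._buffer:
            self.rows.extend(self._buffer)
            self._buffer.clear()
            self.flush_count += 1

    def stop_buffered_ingestion(self) -> None:
        if self._buffer is None:
            return
        self._flush()        # consume whatever is left
        self._buffer = None  # delete the buffer


store = BufferedRows(buffer_size=2)
store.start_buffered_ingestion()
for i in range(5):
    store.insert({"doc_id": i})
store.stop_buffered_ingestion()
```

With a buffer size of 2 and five inserts, the sketch merges two full batches during ingestion and one partial batch on stop. Batching like this trades a little memory for fewer, larger writes.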