medcat.utils.ner.metrics

Attributes

logger

Functions

`metrics`(p[, return_df, plus_recall, tokenizer, ...])	Calculate metrics for a model's predictions, based on the tokenized output of a MedCATTrainer project.
`_anno_within_pred_list`(label, preds)	Check if a label is within a list of predictions.
`evaluate_predictions`(true_annotations, all_preds, ...)	Evaluate predictions against the labels collected and output from a MedCATTrainer project.

Module Contents

medcat.utils.ner.metrics.logger

medcat.utils.ner.metrics.metrics(p, return_df=False, plus_recall=0, tokenizer=None, dataset=None, merged_negative={0, 1, -100}, padding_label=-100, csize=15, subword_label=1, verbose=False)

Calculate metrics for a model’s predictions, based on the tokenized output of a MedCATTrainer project.

Parameters:

p – The model’s predictions.
return_df – Whether to return a DataFrame of metrics.
plus_recall – The recall to add to the model’s predictions.
tokenizer – The tokenizer used to tokenize the texts.
dataset – The dataset used to train the model.
merged_negative – The negative labels to merge.
padding_label – The padding label.
csize – The size of the context window.
subword_label – The subword label.
verbose – Whether to print the metrics.

Returns:

Dict – A dictionary of metrics.
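
The roles of padding_label and merged_negative can be illustrated with a small token-level sketch. This is not the library's implementation: the function name token_metrics_sketch, the flat label lists, and the exact counting rules are assumptions made for illustration. It assumes that positions carrying the padding label are skipped and that every label in merged_negative is treated as a single "no entity" class.

```python
from typing import Dict, FrozenSet, List


def token_metrics_sketch(
    labels: List[int],
    preds: List[int],
    merged_negative: FrozenSet[int] = frozenset({0, 1, -100}),
    padding_label: int = -100,
) -> Dict[str, float]:
    """Hypothetical token-level precision/recall (assumed semantics)."""
    tp = fp = fn = 0
    for label, pred in zip(labels, preds):
        if label == padding_label:
            continue  # assumption: padded positions are never scored
        label_is_entity = label not in merged_negative
        pred_is_entity = pred not in merged_negative
        if label_is_entity and pred == label:
            tp += 1
        else:
            if pred_is_entity:
                fp += 1  # predicted an entity that does not match the label
            if label_is_entity:
                fn += 1  # a labelled entity token was not recovered
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall}


# Toy label ids: 2 and 3 stand in for entity classes.
labels = [-100, 0, 2, 2, 1, 3, -100]
preds = [0, 0, 2, 0, 1, 2, 0]
result = token_metrics_sketch(labels, preds)
```

In this toy input one entity token is matched, one is missed outright, and one is assigned the wrong entity class, so the wrong-class token counts against both precision and recall.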

medcat.utils.ner.metrics._anno_within_pred_list(label, preds)

Check if a label is within a list of predictions.

Parameters:

label (Dict) – An annotation, likely from a MedCATTrainer project.
preds (List[Dict]) – A list of predictions, likely the output of cat.__call__.

Returns:

bool – True if the label is within the list of predictions, False otherwise

Return type:

bool
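
A minimal sketch of such a containment check follows. It assumes that annotations and predictions are dicts with integer "start" and "end" character offsets and that "within" means a prediction span fully encloses the label span. The helper name anno_within_preds and the "cui" values are hypothetical; the actual function may apply different or additional criteria.

```python
from typing import Dict, List


def anno_within_preds(label: Dict, preds: List[Dict]) -> bool:
    """Return True if any prediction span fully encloses the label span.

    Assumes "start"/"end" character offsets on both dicts.
    """
    return any(
        p["start"] <= label["start"] and label["end"] <= p["end"]
        for p in preds
    )


label = {"start": 10, "end": 18, "cui": "C0000001"}  # placeholder CUI
preds = [
    {"start": 0, "end": 5, "cui": "C0000002"},
    {"start": 8, "end": 20, "cui": "C0000001"},
]

enclosed = anno_within_preds(label, preds)  # span 8-20 encloses 10-18
partial = anno_within_preds({"start": 4, "end": 9}, preds)  # only overlaps
```

Under these assumptions a prediction that merely overlaps the label, without covering it entirely, does not count as a match.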

medcat.utils.ner.metrics.evaluate_predictions(true_annotations, all_preds, texts, cui2preferred_name)

Evaluate predictions against the labels collected and output from a MedCATTrainer project. A prediction counts as correct if it fully encloses the label.

Parameters:

true_annotations (list[list[dict]]) – Ground-truth annotations by text
all_preds (list[list[dict]]) – Model predictions by text
texts (list[str]) – Original list of texts
cui2preferred_name (dict[str, str]) – Dictionary of CUI to preferred name, likely to be cat.cdb.cui2preferred_name.

Returns:

tuple[pd.DataFrame, Dict] – A tuple containing a DataFrame of evaluation metrics and a dictionary of missed annotations per CUI.

Return type:

tuple[pandas.DataFrame, dict]
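
The enclosure rule and the per-CUI bookkeeping can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the function evaluate_sketch is hypothetical, it assumes "cui", "start" and "end" keys on every dict, it additionally requires the predicted CUI to equal the annotated one, and it returns plain dicts rather than a DataFrame. The cui2preferred_name lookup is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def evaluate_sketch(
    true_annotations: List[List[Dict]],
    all_preds: List[List[Dict]],
    texts: List[str],
) -> Tuple[Dict[str, Dict[str, int]], Dict[str, List[str]]]:
    """Count found/missed annotations per CUI (hypothetical helper)."""
    counts: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"found": 0, "missed": 0}
    )
    missed: Dict[str, List[str]] = defaultdict(list)
    for annos, preds, text in zip(true_annotations, all_preds, texts):
        for anno in annos:
            # Assumption: a hit needs the same CUI and a fully enclosing span.
            hit = any(
                p["cui"] == anno["cui"]
                and p["start"] <= anno["start"]
                and anno["end"] <= p["end"]
                for p in preds
            )
            if hit:
                counts[anno["cui"]]["found"] += 1
            else:
                counts[anno["cui"]]["missed"] += 1
                # Keep the missed surface form for later inspection.
                missed[anno["cui"]].append(text[anno["start"]:anno["end"]])
    return dict(counts), dict(missed)


texts = ["Patient has type 2 diabetes and hypertension."]
true_annotations = [[
    {"cui": "C1", "start": 19, "end": 27},  # "diabetes" (placeholder CUI)
    {"cui": "C2", "start": 32, "end": 44},  # "hypertension" (placeholder CUI)
]]
all_preds = [[{"cui": "C1", "start": 12, "end": 27}]]  # "type 2 diabetes"

counts, missed = evaluate_sketch(true_annotations, all_preds, texts)
```

Here the prediction "type 2 diabetes" encloses the annotated "diabetes", so that annotation counts as found, while "hypertension" has no enclosing prediction and is recorded as missed under its CUI.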