medcat.stats.kfold ================== .. py:module:: medcat.stats.kfold Attributes ---------- .. autoapisummary:: medcat.stats.kfold.MedCATTrainerExportProjectInfo medcat.stats.kfold.IntValuedMetric medcat.stats.kfold.FloatValuedMetric Classes ------- .. autoapisummary:: medcat.stats.kfold.CAT medcat.stats.kfold.MedCATTrainerExport medcat.stats.kfold.MedCATTrainerExportProject medcat.stats.kfold.MedCATTrainerExportDocument medcat.stats.kfold.MedCATTrainerExportAnnotation medcat.stats.kfold.SplitType medcat.stats.kfold.FoldCreator medcat.stats.kfold.SimpleFoldCreator medcat.stats.kfold.PerDocsFoldCreator medcat.stats.kfold.PerAnnsFoldCreator medcat.stats.kfold.WeightedDocumentsCreator medcat.stats.kfold.PerCUIMetrics Functions --------- .. autoapisummary:: medcat.stats.kfold.captured_state_cdb medcat.stats.kfold.get_stats medcat.stats.kfold.count_all_annotations medcat.stats.kfold.count_all_docs medcat.stats.kfold.get_nr_of_annotations medcat.stats.kfold.iter_anns medcat.stats.kfold.iter_docs medcat.stats.kfold.get_fold_creator medcat.stats.kfold.get_per_fold_metrics medcat.stats.kfold._merge_examples medcat.stats.kfold._add_helper medcat.stats.kfold._add_weighted_helper medcat.stats.kfold.get_metrics_mean medcat.stats.kfold.get_k_fold_stats Module Contents --------------- .. py:class:: CAT(cdb, vocab = None, config = None, model_load_path = None) Bases: :py:obj:`medcat.storage.serialisables.AbstractSerialisable` This is a collection of serialisable model parts. .. py:method:: __init__(cdb, vocab = None, config = None, model_load_path = None) .. py:attribute:: cdb .. py:attribute:: vocab :value: None .. py:attribute:: config :value: None .. py:attribute:: _trainer :type: Optional[medcat.trainer.Trainer] :value: None .. py:attribute:: _pipeline .. py:attribute:: usage_monitor .. py:method:: _recreate_pipe(model_load_path = None) .. py:method:: get_init_attrs() :classmethod: .. py:method:: ignore_attrs() :classmethod: .. py:method:: __call__(text) .. 
py:method:: _ensure_not_training() Method to ensure config is not set to train. `config.components.linking.train` should only be True while training and not during inference. This also corrects the setting if necessary. .. py:method:: get_entities(text: str, only_cui: Literal[False] = False) -> medcat.data.entities.Entities get_entities(text: str, only_cui: Literal[True] = True) -> medcat.data.entities.OnlyCUIEntities get_entities(text: str, only_cui: bool = False) -> Union[dict, medcat.data.entities.Entities, medcat.data.entities.OnlyCUIEntities] Get the entities recognised and linked within the provided text. This will run the text through the pipeline and annotate the recognised and linked entities. :param text: The text to use. :type text: str :param only_cui: Whether to only output the CUIs rather than the entire context. Defaults to False. :type only_cui: bool, optional :Returns: **Union[dict, Entities, OnlyCUIEntities]** -- The entities found and linked within the text. .. py:method:: _mp_worker_func(texts_and_indices) .. py:method:: _generate_batches_by_char_length(text_iter, batch_size_chars, only_cui) .. py:method:: _generate_batches(text_iter, batch_size, batch_size_chars, only_cui) .. py:method:: _generate_simple_batches(text_iter, batch_size, only_cui) .. py:method:: _mp_one_batch_per_process(executor, batch_iter, external_processes) .. py:method:: get_entities_multi_texts(texts, only_cui = False, n_process = 1, batch_size = -1, batch_size_chars = 1000000) Get entities from multiple texts (potentially in parallel). If `n_process` > 1, `n_process - 1` new processes will be created and data will be processed on those as well as the main process in parallel. :param texts: The input texts. Either an iterable of raw texts or an iterable of `(text_index, text)` tuples. :type texts: Union[Iterable[str], Iterable[tuple[str, str]]] :param only_cui: Whether to only return CUIs rather than other information like start/end and annotated value. 
Defaults to False. :type only_cui: bool :param n_process: Number of processes to use. Defaults to 1. :type n_process: int :param batch_size: The number of texts to batch at a time. A batch of the specified size will be given to each worker process. Defaults to -1 and in this case the character count will be used instead. :type batch_size: int :param batch_size_chars: The maximum number of characters to process in a batch. Each process will be given a batch of texts with a total number of characters not exceeding this value. Defaults to 1,000,000 characters. Set to -1 to disable. :type batch_size_chars: int :Yields: *Iterator[tuple[str, Union[dict, Entities, OnlyCUIEntities]]]* -- The results in the format of (text_index, entities). .. py:method:: _get_entity(ent, doc_tokens, cui) .. py:method:: get_addon_output(ent) Get the addon output for the entity. This includes a key-value pair for each addon that provides some. Sometimes same-type addons may combine their output under the same key. :param ent: The entity in question. :type ent: MutableEntity :raises ValueError: If unable to merge multiple addon outputs. :Returns: **dict[str, dict]** -- All the addon output. .. py:method:: _doc_to_out_entity(ent, doc_tokens, only_cui) .. py:method:: _doc_to_out(doc, only_cui, out_with_text = False) .. py:property:: trainer The trainer object. .. py:method:: save_model_pack(target_folder, pack_name = DEFAULT_PACK_NAME, serialiser_type = 'dill', make_archive = True, only_archive = False, add_hash_to_pack_name = True, change_description = None) Save model pack. The resulting model pack name will have the hash of the model pack in its name if (and only if) the default model pack name is used. :param target_folder: The folder to save the pack in. :type target_folder: str :param pack_name: The model pack name. Defaults to DEFAULT_PACK_NAME. :type pack_name: str, optional :param serialiser_type: The serialiser type. Defaults to 'dill'. 
:type serialiser_type: Union[str, AvailableSerialisers], optional :param make_archive: Whether to make the archive (.zip file). Defaults to True. :type make_archive: bool :param only_archive: Whether to clear the non-compressed folder. Defaults to False. :type only_archive: bool :param add_hash_to_pack_name: Whether to add the hash to the pack name. This is only relevant if pack_name is specified. Defaults to True. :type add_hash_to_pack_name: bool :param change_description: If provided, this description will be added to the model description. Defaults to None. :type change_description: Optional[str] :Returns: **str** -- The final model pack path. .. py:method:: _get_hash() .. py:method:: _versioning(change_description) .. py:method:: attempt_unpack(zip_path) :classmethod: Attempt to unpack the zip to a folder and get the model pack path. If the folder already exists, no unpacking is done. :param zip_path: The ZIP path. :type zip_path: str :Returns: **str** -- The model pack path. .. py:method:: load_model_pack(model_pack_path) :classmethod: Load the model pack from file. :param model_pack_path: The model pack path. :type model_pack_path: str :raises ValueError: If the saved data does not represent a model pack. :Returns: **CAT** -- The loaded model pack. .. py:method:: load_cdb(model_pack_path) :classmethod: Load the concept database from the provided model pack path. :param model_pack_path: Path to the model pack, zip or dir. :type model_pack_path: str :Returns: **CDB** -- The loaded concept database. .. py:method:: get_model_card(as_dict: Literal[True]) -> medcat.data.model_card.ModelCard get_model_card(as_dict: Literal[False]) -> str Get the model card as either a (nested) `dict` or a JSON string. :param as_dict: Whether to return as dict. Defaults to False. :type as_dict: bool :Returns: **Union[str, ModelCard]** -- The model card. .. py:method:: __eq__(other) .. py:method:: add_addon(addon) .. py:method:: get_strategy() .. 
py:method:: include_properties() :classmethod: .. py:function:: captured_state_cdb(cdb, save_state_to_disk = False) A context manager that captures and re-applies the initial CDB state. The context manager captures/copies the initial state of the CDB when entering. It then allows the user to modify the state (e.g. by training). Upon exit, it re-applies the initial CDB state. If RAM is an issue, it is recommended to use `save_state_to_disk`. Otherwise the copy of the original state will be held in memory. If saved on disk, a temporary file is used and removed afterwards. :param cdb: The CDB to use. :param save_state_to_disk: Whether to save state on disk or hold in memory. Defaults to False. :type save_state_to_disk: bool :Yields: None .. py:function:: get_stats(cat, data, epoch = 0, use_project_filters = False, use_overlaps = False, extra_cui_filter = None, do_print = True) TODO: Refactor and make nice. Print metrics on a dataset (F1, P, R); it will also print the concepts that have the most FP, FN, TP. :param cat: (CAT): The model pack. :param data: The JSON object that we get from MedCATtrainer on export. :type data: dict :param epoch: Used during training, so we know which epoch it is. :type epoch: int :param use_project_filters: Each project in MedCATtrainer can have filters; this sets whether to respect those filters when calculating metrics. :type use_project_filters: bool :param use_overlaps: Allow overlapping entities, nearly always False as it is very difficult to annotate overlapping entities. :type use_overlaps: bool :param use_cui_doc_limit: If True the metrics for a CUI will be only calculated if that CUI appears in a document, in other words if the document was annotated for that CUI. Useful in very specific situations when during the annotation process the set of CUIs changed. :type use_cui_doc_limit: bool :param use_groups: If True concepts that have groups will be combined and stats will be reported on groups. 
:type use_groups: bool :param extra_cui_filter: This filter will be intersected with all other filters, or if all others are not set then only this one will be used. :type extra_cui_filter: Optional[set] :param do_print: Whether to print stats out. Defaults to True. :type do_print: bool :Returns: * **fps** (*dict*) -- False positives for each CUI. * **fns** (*dict*) -- False negatives for each CUI. * **tps** (*dict*) -- True positives for each CUI. * **cui_prec** (*dict*) -- Precision for each CUI. * **cui_rec** (*dict*) -- Recall for each CUI. * **cui_f1** (*dict*) -- F1 for each CUI. * **cui_counts** (*dict*) -- Number of occurrences for each CUI. * **examples** (*dict*) -- Examples for each of the fp, fn, tp. Format will be examples['fp']['cui'][]. .. py:class:: MedCATTrainerExport Bases: :py:obj:`typing_extensions.TypedDict` dict() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2) .. py:attribute:: projects :type: list[MedCATTrainerExportProject] .. py:method:: __contains__() True if the dictionary has the specified key, else False. .. py:method:: __delattr__() Implement delattr(self, name). .. py:method:: __delitem__() Delete self[key]. .. py:method:: __dir__() Default dir() implementation. .. py:method:: __eq__() Return self==value. .. py:method:: __format__() Default object formatter. .. py:method:: __ge__() Return self>=value. .. py:method:: __getattribute__() Return getattr(self, name). .. py:method:: __getitem__() x.__getitem__(y) <==> x[y] .. py:method:: __gt__() Return self>value. .. py:method:: __init__() Initialize self. See help(type(self)) for accurate signature. .. py:method:: __ior__() Return self|=value. .. py:method:: __iter__() Implement iter(self). .. 
py:method:: __le__() Return self<=value. .. py:method:: __len__() Return len(self). .. py:method:: __lt__() Return self<value. .. py:method:: __sizeof__() D.__sizeof__() -> size of D in memory, in bytes .. py:method:: __str__() Return str(self). .. py:method:: __subclasshook__() Abstract classes can override this to customize issubclass(). This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached). .. py:method:: clear() D.clear() -> None. Remove all items from D. .. py:method:: copy() D.copy() -> a shallow copy of D .. py:method:: get() Return the value for key if key is in the dictionary, else default. .. py:method:: items() D.items() -> a set-like object providing a view on D's items .. py:method:: keys() D.keys() -> a set-like object providing a view on D's keys .. py:method:: pop() D.pop(k[,d]) -> v, remove specified key and return the corresponding value. If the key is not found, return the default if given; otherwise, raise a KeyError. .. py:method:: popitem() Remove and return a (key, value) pair as a 2-tuple. Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty. .. py:method:: setdefault() Insert key with a value of default if key is not in the dictionary. Return the value for key if key is in the dictionary, else default. .. py:method:: update() D.update([E, ]**F) -> None. Update D from dict/iterable E and F. If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k] .. py:method:: values() D.values() -> an object providing a view on D's values .. 
py:class:: MedCATTrainerExportProject Bases: :py:obj:`typing_extensions.TypedDict` dict() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2) .. py:attribute:: name :type: str .. py:attribute:: id :type: Any .. py:attribute:: cuis :type: str .. py:attribute:: tuis :type: Optional[str] .. py:attribute:: documents :type: list[MedCATTrainerExportDocument] .. py:method:: __contains__() True if the dictionary has the specified key, else False. .. py:method:: __delattr__() Implement delattr(self, name). .. py:method:: __delitem__() Delete self[key]. .. py:method:: __dir__() Default dir() implementation. .. py:method:: __eq__() Return self==value. .. py:method:: __format__() Default object formatter. .. py:method:: __ge__() Return self>=value. .. py:method:: __getattribute__() Return getattr(self, name). .. py:method:: __getitem__() x.__getitem__(y) <==> x[y] .. py:method:: __gt__() Return self>value. .. py:method:: __init__() Initialize self. See help(type(self)) for accurate signature. .. py:method:: __ior__() Return self|=value. .. py:method:: __iter__() Implement iter(self). .. py:method:: __le__() Return self<=value. .. py:method:: __len__() Return len(self). .. py:method:: __lt__() Return self<value. .. py:method:: __sizeof__() D.__sizeof__() -> size of D in memory, in bytes .. py:method:: __str__() Return str(self). .. py:method:: __subclasshook__() Abstract classes can override this to customize issubclass(). This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached). .. py:method:: clear() D.clear() -> None. Remove all items from D. .. 
py:method:: copy() D.copy() -> a shallow copy of D .. py:method:: get() Return the value for key if key is in the dictionary, else default. .. py:method:: items() D.items() -> a set-like object providing a view on D's items .. py:method:: keys() D.keys() -> a set-like object providing a view on D's keys .. py:method:: pop() D.pop(k[,d]) -> v, remove specified key and return the corresponding value. If the key is not found, return the default if given; otherwise, raise a KeyError. .. py:method:: popitem() Remove and return a (key, value) pair as a 2-tuple. Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty. .. py:method:: setdefault() Insert key with a value of default if key is not in the dictionary. Return the value for key if key is in the dictionary, else default. .. py:method:: update() D.update([E, ]**F) -> None. Update D from dict/iterable E and F. If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k] .. py:method:: values() D.values() -> an object providing a view on D's values .. py:class:: MedCATTrainerExportDocument Bases: :py:obj:`typing_extensions.TypedDict` dict() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2) .. py:attribute:: name :type: str .. py:attribute:: id :type: Any .. py:attribute:: last_modified :type: str .. py:attribute:: text :type: str .. py:attribute:: annotations :type: list[MedCATTrainerExportAnnotation] .. py:method:: __contains__() True if the dictionary has the specified key, else False. .. py:method:: __delattr__() Implement delattr(self, name). 
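As a rough illustration of the document fields listed for this class (`name`, `id`, `last_modified`, `text`, `annotations`), the sketch below builds a tiny export out of plain dictionaries and counts its documents and annotations. The `TypedDict` classes and the two counting helpers are local stand-ins written only for this example: they mirror the documented field names but are assumptions, not the library's own definitions, and the CUI value is a made-up placeholder.

```python
from typing import Any, Optional, TypedDict, Union


# Local stand-ins mirroring the documented field names.
# These are assumed shapes for illustration, not imports from medcat.
class Annotation(TypedDict):
    id: Union[str, int]
    validated: Optional[bool]
    start: int
    end: int
    cui: str
    value: str


class Document(TypedDict):
    name: str
    id: Any
    last_modified: str
    text: str
    annotations: list[Annotation]


class Project(TypedDict):
    name: str
    id: Any
    cuis: str
    tuis: Optional[str]
    documents: list[Document]


class Export(TypedDict):
    projects: list[Project]


def count_docs(export: Export) -> int:
    """Count documents across all projects (hypothetical helper)."""
    return sum(len(proj["documents"]) for proj in export["projects"])


def count_annotations(export: Export) -> int:
    """Count annotations across all documents (hypothetical helper)."""
    return sum(
        len(doc["annotations"])
        for proj in export["projects"]
        for doc in proj["documents"]
    )


text = "Patient has diabetes."
export: Export = {
    "projects": [
        {
            "name": "demo-project",
            "id": 1,
            "cuis": "",
            "tuis": None,
            "documents": [
                {
                    "name": "doc-1",
                    "id": 10,
                    "last_modified": "2024-01-01",
                    "text": text,
                    "annotations": [
                        {
                            "id": 100,
                            "validated": True,
                            "start": 12,  # character offsets into `text`
                            "end": 20,
                            "cui": "C0000001",  # placeholder CUI
                            "value": "diabetes",
                        }
                    ],
                }
            ],
        }
    ]
}

n_docs = count_docs(export)
n_anns = count_annotations(export)
```

Because a `TypedDict` is an ordinary `dict` at runtime, an export loaded from JSON can be passed to helpers like these without any conversion step.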
.. py:method:: __delitem__() Delete self[key]. .. py:method:: __dir__() Default dir() implementation. .. py:method:: __eq__() Return self==value. .. py:method:: __format__() Default object formatter. .. py:method:: __ge__() Return self>=value. .. py:method:: __getattribute__() Return getattr(self, name). .. py:method:: __getitem__() x.__getitem__(y) <==> x[y] .. py:method:: __gt__() Return self>value. .. py:method:: __init__() Initialize self. See help(type(self)) for accurate signature. .. py:method:: __ior__() Return self|=value. .. py:method:: __iter__() Implement iter(self). .. py:method:: __le__() Return self<=value. .. py:method:: __len__() Return len(self). .. py:method:: __lt__() Return self<value. .. py:method:: __sizeof__() D.__sizeof__() -> size of D in memory, in bytes .. py:method:: __str__() Return str(self). .. py:method:: __subclasshook__() Abstract classes can override this to customize issubclass(). This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached). .. py:method:: clear() D.clear() -> None. Remove all items from D. .. py:method:: copy() D.copy() -> a shallow copy of D .. py:method:: get() Return the value for key if key is in the dictionary, else default. .. py:method:: items() D.items() -> a set-like object providing a view on D's items .. py:method:: keys() D.keys() -> a set-like object providing a view on D's keys .. py:method:: pop() D.pop(k[,d]) -> v, remove specified key and return the corresponding value. If the key is not found, return the default if given; otherwise, raise a KeyError. .. py:method:: popitem() Remove and return a (key, value) pair as a 2-tuple. Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty. .. py:method:: setdefault() Insert key with a value of default if key is not in the dictionary. 
Return the value for key if key is in the dictionary, else default. .. py:method:: update() D.update([E, ]**F) -> None. Update D from dict/iterable E and F. If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k] .. py:method:: values() D.values() -> an object providing a view on D's values .. py:class:: MedCATTrainerExportAnnotation Bases: :py:obj:`MedCATTrainerExportAnnotationRequired` dict() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2) .. py:attribute:: id :type: Union[str, int] .. py:attribute:: validated :type: Optional[bool] .. py:attribute:: start :type: int .. py:attribute:: end :type: int .. py:attribute:: cui :type: str .. py:attribute:: value :type: str .. py:method:: __contains__() True if the dictionary has the specified key, else False. .. py:method:: __delattr__() Implement delattr(self, name). .. py:method:: __delitem__() Delete self[key]. .. py:method:: __dir__() Default dir() implementation. .. py:method:: __eq__() Return self==value. .. py:method:: __format__() Default object formatter. .. py:method:: __ge__() Return self>=value. .. py:method:: __getattribute__() Return getattr(self, name). .. py:method:: __getitem__() x.__getitem__(y) <==> x[y] .. py:method:: __gt__() Return self>value. .. py:method:: __init__() Initialize self. See help(type(self)) for accurate signature. .. py:method:: __ior__() Return self|=value. .. py:method:: __iter__() Implement iter(self). .. py:method:: __le__() Return self<=value. .. py:method:: __len__() Return len(self). .. 
py:method:: __lt__() Return self<value. .. py:method:: __sizeof__() D.__sizeof__() -> size of D in memory, in bytes .. py:method:: __str__() Return str(self). .. py:method:: __subclasshook__() Abstract classes can override this to customize issubclass(). This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached). .. py:method:: clear() D.clear() -> None. Remove all items from D. .. py:method:: copy() D.copy() -> a shallow copy of D .. py:method:: get() Return the value for key if key is in the dictionary, else default. .. py:method:: items() D.items() -> a set-like object providing a view on D's items .. py:method:: keys() D.keys() -> a set-like object providing a view on D's keys .. py:method:: pop() D.pop(k[,d]) -> v, remove specified key and return the corresponding value. If the key is not found, return the default if given; otherwise, raise a KeyError. .. py:method:: popitem() Remove and return a (key, value) pair as a 2-tuple. Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty. .. py:method:: setdefault() Insert key with a value of default if key is not in the dictionary. Return the value for key if key is in the dictionary, else default. .. py:method:: update() D.update([E, ]**F) -> None. Update D from dict/iterable E and F. If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k] .. py:method:: values() D.values() -> an object providing a view on D's values .. py:function:: count_all_annotations(export) Count the number of annotations in a trainer export. :param export: The trainer export. :type export: MedCATTrainerExport :Returns: **int** -- The total number of annotations. .. 
py:function:: count_all_docs(export) Count the number of documents in a trainer export. :param export: The trainer export. :type export: MedCATTrainerExport :Returns: **int** -- The total number of documents. .. py:function:: get_nr_of_annotations(doc) Get the number of annotations for a trainer export document. :param doc: The trainer export document. :type doc: MedCATTrainerExportDocument :Returns: **int** -- The number of annotations within the document. .. py:function:: iter_anns(export) Iterate over all the annotations in a trainer export. :param export: The trainer export. :type export: MedCATTrainerExport :Yields: Iterator[tuple[MedCATTrainerExportProjectInfo, MedCATTrainerExportDocument, MedCATTrainerExportAnnotation]]: The project info, the document, and the annotation. .. py:function:: iter_docs(export) Iterate over all the docs in a trainer export. :param export: The trainer export. :type export: MedCATTrainerExport :Yields: Iterator[tuple[MedCATTrainerExportProjectInfo, MedCATTrainerExportDocument]]: The project info and the document. .. py:data:: MedCATTrainerExportProjectInfo The project name, project ID, CUIs str, and TUIs str .. py:class:: SplitType Bases: :py:obj:`enum.Enum` The split type. .. py:attribute:: DOCUMENTS Split over number of documents. .. py:attribute:: ANNOTATIONS Split over number of annotations. .. py:attribute:: DOCUMENTS_WEIGHTED Split over number of documents based on the number of annotations. So essentially this ensures that the same document isn't in 2 folds while trying to more equally distribute documents with different numbers of annotations. For example: If we have 6 documents that we want to split into 3 folds. 
The number of annotations per document are as follows: [40, 40, 20, 10, 5, 5] If we were to split this trivially over documents, we'd end up with the 3 folds with number of annotations that are far from even: [80, 30, 10] However, if we use the annotations as weights, we would be able to create folds that have more evenly distributed annotations, e.g: [[D1,], [D2], [D3, D4, D5, D6]] where D# denotes the number of the documents, with the number of annotations being equal: [ 40, 40, 20 + 10 + 5 + 5 = 40] .. py:method:: __new__(value) .. py:method:: _generate_next_value_(start, count, last_values) Generate the next value when not given. name: the name of the member start: the initial start value or None count: the number of existing members last_value: the last value assigned or None .. py:method:: _missing_(value) :classmethod: .. py:method:: __repr__() .. py:method:: __str__() .. py:method:: __dir__() Returns all members and all public methods .. py:method:: __format__(format_spec) Returns format using actual value type unless __str__ has been overridden. .. py:method:: __hash__() .. py:method:: __reduce_ex__(proto) .. py:method:: name() The name of the Enum member. .. py:method:: value() The value of the Enum member. .. py:class:: FoldCreator(mct_export, nr_of_folds) Bases: :py:obj:`abc.ABC` The FoldCreator based on a MCT export. :param mct_export: The MCT export dict. :type mct_export: MedCATTrainerExport :param nr_of_folds: Number of folds to create. :type nr_of_folds: int :param use_annotations: Whether to fold on number of annotations or documents. :type use_annotations: bool .. py:method:: __init__(mct_export, nr_of_folds) .. py:attribute:: mct_export .. py:attribute:: nr_of_folds .. py:method:: _find_or_add_doc(project, orig_doc) .. py:method:: _create_new_project(proj_info) .. py:method:: _create_export_with_documents(relevant_docs) .. py:method:: create_folds() :abstractmethod: Create folds. :raises ValueError: If something went wrong. 
:Returns: **list[MedCATTrainerExport]** -- The created folds. .. py:attribute:: __slots__ :value: () .. py:class:: SimpleFoldCreator(mct_export, nr_of_folds, counter) Bases: :py:obj:`FoldCreator` The FoldCreator based on a MCT export. :param mct_export: The MCT export dict. :type mct_export: MedCATTrainerExport :param nr_of_folds: Number of folds to create. :type nr_of_folds: int :param use_annotations: Whether to fold on number of annotations or documents. :type use_annotations: bool .. py:method:: __init__(mct_export, nr_of_folds, counter) .. py:attribute:: _counter .. py:attribute:: total .. py:attribute:: per_fold .. py:method:: _init_per_fold() .. py:method:: _create_fold(fold_nr) :abstractmethod: .. py:method:: create_folds() Create folds. :raises ValueError: If something went wrong. :Returns: **list[MedCATTrainerExport]** -- The created folds. .. py:attribute:: mct_export .. py:attribute:: nr_of_folds .. py:method:: _find_or_add_doc(project, orig_doc) .. py:method:: _create_new_project(proj_info) .. py:method:: _create_export_with_documents(relevant_docs) .. py:attribute:: __slots__ :value: () .. py:class:: PerDocsFoldCreator(mct_export, nr_of_folds) Bases: :py:obj:`FoldCreator` The FoldCreator based on a MCT export. :param mct_export: The MCT export dict. :type mct_export: MedCATTrainerExport :param nr_of_folds: Number of folds to create. :type nr_of_folds: int :param use_annotations: Whether to fold on number of annotations or documents. :type use_annotations: bool .. py:method:: __init__(mct_export, nr_of_folds) .. py:attribute:: nr_of_docs .. py:attribute:: per_doc_simple .. py:attribute:: _all_docs .. py:method:: _create_fold(fold_nr) .. py:method:: create_folds() Create folds. :raises ValueError: If something went wrong. :Returns: **list[MedCATTrainerExport]** -- The created folds. .. py:attribute:: mct_export .. py:attribute:: nr_of_folds .. py:method:: _find_or_add_doc(project, orig_doc) .. py:method:: _create_new_project(proj_info) .. 
py:method:: _create_export_with_documents(relevant_docs) .. py:attribute:: __slots__ :value: () .. py:class:: PerAnnsFoldCreator(mct_export, nr_of_folds) Bases: :py:obj:`SimpleFoldCreator` The FoldCreator based on a MCT export. :param mct_export: The MCT export dict. :type mct_export: MedCATTrainerExport :param nr_of_folds: Number of folds to create. :type nr_of_folds: int :param use_annotations: Whether to fold on number of annotations or documents. :type use_annotations: bool .. py:method:: __init__(mct_export, nr_of_folds) .. py:method:: _add_target_ann(project, orig_doc, ann) .. py:method:: _targets(start_at) .. py:method:: _create_fold(fold_nr) .. py:attribute:: _counter .. py:attribute:: total .. py:attribute:: per_fold .. py:method:: _init_per_fold() .. py:method:: create_folds() Create folds. :raises ValueError: If something went wrong. :Returns: **list[MedCATTrainerExport]** -- The created folds. .. py:attribute:: mct_export .. py:attribute:: nr_of_folds .. py:method:: _find_or_add_doc(project, orig_doc) .. py:method:: _create_new_project(proj_info) .. py:method:: _create_export_with_documents(relevant_docs) .. py:attribute:: __slots__ :value: () .. py:class:: WeightedDocumentsCreator(mct_export, nr_of_folds, weight_calculator) Bases: :py:obj:`FoldCreator` The FoldCreator based on a MCT export. :param mct_export: The MCT export dict. :type mct_export: MedCATTrainerExport :param nr_of_folds: Number of folds to create. :type nr_of_folds: int :param use_annotations: Whether to fold on number of annotations or documents. :type use_annotations: bool .. py:method:: __init__(mct_export, nr_of_folds, weight_calculator) .. py:attribute:: _weight_calculator .. py:attribute:: _weighted_docs .. py:method:: create_folds() Create folds. :raises ValueError: If something went wrong. :Returns: **list[MedCATTrainerExport]** -- The created folds. .. py:attribute:: mct_export .. py:attribute:: nr_of_folds .. py:method:: _find_or_add_doc(project, orig_doc) .. 
py:method:: _create_new_project(proj_info) .. py:method:: _create_export_with_documents(relevant_docs) .. py:attribute:: __slots__ :value: () .. py:function:: get_fold_creator(mct_export, nr_of_folds, split_type) Get the appropriate fold creator. :param mct_export: The MCT export. :type mct_export: MedCATTrainerExport :param nr_of_folds: Number of folds to use. :type nr_of_folds: int :param split_type: The type of split to use. :type split_type: SplitType :raises ValueError: In case of an unknown split type. :Returns: **FoldCreator** -- The corresponding fold creator. .. py:function:: get_per_fold_metrics(cat, folds, use_project_filters, *args, **kwargs) Get per fold metrics for a given set of folds. This method captures the state of the CDB before processing each fold. For each fold, it trains on all other folds, and runs metrics on the fold itself. :param cat: The model pack. :type cat: CAT :param folds: The folds. :type folds: list[MedCATTrainerExport] :param use_project_filters: Whether to use project filters. :type use_project_filters: bool :Returns: **list[tuple]** -- The metrics for each fold. .. py:function:: _merge_examples(all_examples, cur_examples) .. py:data:: IntValuedMetric .. py:data:: FloatValuedMetric .. py:class:: PerCUIMetrics(/, **data) Bases: :py:obj:`pydantic.BaseModel` Usage docs: https://docs.pydantic.dev/2.9/concepts/models/ A base class for creating Pydantic models. .. attribute:: __class_vars__ The names of the class variables defined on the model. .. attribute:: __private_attributes__ Metadata about the private attributes of the model. .. attribute:: __signature__ The synthesized `__init__` [`Signature`][inspect.Signature] of the model. .. attribute:: __pydantic_complete__ Whether model building is completed, or if there are still undefined fields. .. attribute:: __pydantic_core_schema__ The core schema of the model. .. attribute:: __pydantic_custom_init__ Whether the model has a custom `__init__` function. .. 
attribute:: __pydantic_decorators__ Metadata containing the decorators defined on the model. This replaces `Model.__validators__` and `Model.__root_validators__` from Pydantic V1. .. attribute:: __pydantic_generic_metadata__ Metadata for generic models; contains data used for a similar purpose to __args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these. .. attribute:: __pydantic_parent_namespace__ Parent namespace of the model, used for automatic rebuilding of models. .. attribute:: __pydantic_post_init__ The name of the post-init method for the model, if defined. .. attribute:: __pydantic_root_model__ Whether the model is a [`RootModel`][pydantic.root_model.RootModel]. .. attribute:: __pydantic_serializer__ The `pydantic-core` `SchemaSerializer` used to dump instances of the model. .. attribute:: __pydantic_validator__ The `pydantic-core` `SchemaValidator` used to validate instances of the model. .. attribute:: __pydantic_extra__ A dictionary containing extra values, if [`extra`][pydantic.config.ConfigDict.extra] is set to `'allow'`. .. attribute:: __pydantic_fields_set__ The names of fields explicitly set during instantiation. .. attribute:: __pydantic_private__ Values of private attributes set on the model instance. .. py:attribute:: weights :type: list[Union[int, float]] :value: [] .. py:attribute:: vals :type: list[Union[int, float]] :value: [] .. py:method:: add(val, weight = 1) .. py:method:: get_mean() .. py:method:: get_std() .. py:attribute:: model_config :type: ClassVar[pydantic.config.ConfigDict] Configuration for the model, should be a dictionary conforming to [`ConfigDict`][pydantic.config.ConfigDict]. .. py:attribute:: model_fields :type: ClassVar[Dict[str, pydantic.fields.FieldInfo]] Metadata about the fields defined on the model, mapping of field names to [`FieldInfo`][pydantic.fields.FieldInfo] objects. This replaces `Model.__fields__` from Pydantic V1. .. 
py:attribute:: model_computed_fields :type: ClassVar[Dict[str, pydantic.fields.ComputedFieldInfo]] A dictionary of computed field names and their corresponding `ComputedFieldInfo` objects. .. py:attribute:: __class_vars__ :type: ClassVar[set[str]] The names of the class variables defined on the model. .. py:attribute:: __private_attributes__ :type: ClassVar[Dict[str, pydantic.fields.ModelPrivateAttr]] Metadata about the private attributes of the model. .. py:attribute:: __signature__ :type: ClassVar[inspect.Signature] The synthesized `__init__` [`Signature`][inspect.Signature] of the model. .. py:attribute:: __pydantic_complete__ :type: ClassVar[bool] :value: False Whether model building is completed, or if there are still undefined fields. .. py:attribute:: __pydantic_core_schema__ :type: ClassVar[pydantic_core.CoreSchema] The core schema of the model. .. py:attribute:: __pydantic_custom_init__ :type: ClassVar[bool] Whether the model has a custom `__init__` method. .. py:attribute:: __pydantic_decorators__ :type: ClassVar[pydantic._internal._decorators.DecoratorInfos] Metadata containing the decorators defined on the model. This replaces `Model.__validators__` and `Model.__root_validators__` from Pydantic V1. .. py:attribute:: __pydantic_generic_metadata__ :type: ClassVar[pydantic._internal._generics.PydanticGenericMetadata] Metadata for generic models; contains data used for a similar purpose to __args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these. .. py:attribute:: __pydantic_parent_namespace__ :type: ClassVar[Dict[str, Any] | None] :value: None Parent namespace of the model, used for automatic rebuilding of models. .. py:attribute:: __pydantic_post_init__ :type: ClassVar[None | Literal['model_post_init']] The name of the post-init method for the model, if defined. .. py:attribute:: __pydantic_root_model__ :type: ClassVar[bool] :value: False Whether the model is a [`RootModel`][pydantic.root_model.RootModel]. .. 
py:attribute:: __pydantic_serializer__ :type: ClassVar[pydantic_core.SchemaSerializer] The `pydantic-core` `SchemaSerializer` used to dump instances of the model. .. py:attribute:: __pydantic_validator__ :type: ClassVar[pydantic_core.SchemaValidator | pydantic.plugin._schema_validator.PluggableSchemaValidator] The `pydantic-core` `SchemaValidator` used to validate instances of the model. .. py:attribute:: __pydantic_extra__ :type: dict[str, Any] | None A dictionary containing extra values, if [`extra`][pydantic.config.ConfigDict.extra] is set to `'allow'`. .. py:attribute:: __pydantic_fields_set__ :type: set[str] The names of fields explicitly set during instantiation. .. py:attribute:: __pydantic_private__ :type: dict[str, Any] | None Values of private attributes set on the model instance. .. py:attribute:: __slots__ :value: ('__dict__', '__pydantic_fields_set__', '__pydantic_extra__', '__pydantic_private__') .. py:method:: __init__(/, **data) Create a new model by parsing and validating input data from keyword arguments. Raises [`ValidationError`][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model. `self` is explicitly positional-only to allow `self` as a field name. .. py:property:: model_extra :type: dict[str, Any] | None Get extra fields set during validation. :Returns: **A dictionary of extra fields, or `None` if `config.extra` is not set to `"allow"`.** .. py:property:: model_fields_set :type: set[str] Returns the set of fields that have been explicitly set on this model instance. :Returns: **A set of strings representing the fields that have been set,** -- i.e. that were not filled from defaults. .. py:method:: model_construct(_fields_set = None, **values) :classmethod: Creates a new instance of the `Model` class with validated data. Creates a new model setting `__dict__` and `__pydantic_fields_set__` from trusted or pre-validated data. Default values are respected, but no other validation is performed. !!! 
note `model_construct()` generally respects the `model_config.extra` setting on the provided model. That is, if `model_config.extra == 'allow'`, then all extra passed values are added to the model instance's `__dict__` and `__pydantic_extra__` fields. If `model_config.extra == 'ignore'` (the default), then all extra passed values are ignored. Because no validation is performed with a call to `model_construct()`, having `model_config.extra == 'forbid'` does not result in an error if extra values are passed, but they will be ignored. :param _fields_set: A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the [`model_fields_set`][pydantic.BaseModel.model_fields_set] attribute. Otherwise, the field names from the `values` argument will be used. :param values: Trusted or pre-validated data dictionary. :Returns: **A new instance of the `Model` class with validated data.** .. py:method:: model_copy(*, update = None, deep = False) Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#model_copy Returns a copy of the model. :param update: Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data. :param deep: Set to `True` to make a deep copy of the model. :Returns: **New model instance.** .. py:method:: model_dump(*, mode = 'python', include = None, exclude = None, context = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False, round_trip = False, warnings = True, serialize_as_any = False) Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#modelmodel_dump Generate a dictionary representation of the model, optionally specifying which fields to include or exclude. :param mode: The mode in which `to_python` should run. If mode is 'json', the output will only contain JSON serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects. 
:param include: A set of fields to include in the output. :param exclude: A set of fields to exclude from the output. :param context: Additional context to pass to the serializer. :param by_alias: Whether to use the field's alias in the dictionary key if defined. :param exclude_unset: Whether to exclude fields that have not been explicitly set. :param exclude_defaults: Whether to exclude fields that are set to their default value. :param exclude_none: Whether to exclude fields that have a value of `None`. :param round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T]. :param warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a [`PydanticSerializationError`][pydantic_core.PydanticSerializationError]. :param serialize_as_any: Whether to serialize fields with duck-typing serialization behavior. :Returns: **A dictionary representation of the model.** .. py:method:: model_dump_json(*, indent = None, include = None, exclude = None, context = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False, round_trip = False, warnings = True, serialize_as_any = False) Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#modelmodel_dump_json Generates a JSON representation of the model using Pydantic's `to_json` method. :param indent: Indentation to use in the JSON output. If None is passed, the output will be compact. :param include: Field(s) to include in the JSON output. :param exclude: Field(s) to exclude from the JSON output. :param context: Additional context to pass to the serializer. :param by_alias: Whether to serialize using field aliases. :param exclude_unset: Whether to exclude fields that have not been explicitly set. :param exclude_defaults: Whether to exclude fields that are set to their default value. :param exclude_none: Whether to exclude fields that have a value of `None`. 
:param round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T]. :param warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a [`PydanticSerializationError`][pydantic_core.PydanticSerializationError]. :param serialize_as_any: Whether to serialize fields with duck-typing serialization behavior. :Returns: **A JSON string representation of the model.** .. py:method:: model_json_schema(by_alias = True, ref_template = DEFAULT_REF_TEMPLATE, schema_generator = GenerateJsonSchema, mode = 'validation') :classmethod: Generates a JSON schema for a model class. :param by_alias: Whether to use attribute aliases or not. :param ref_template: The reference template. :param schema_generator: To override the logic used to generate the JSON schema, as a subclass of `GenerateJsonSchema` with your desired modifications :param mode: The mode in which to generate the schema. :Returns: **The JSON schema for the given model class.** .. py:method:: model_parametrized_name(params) :classmethod: Compute the class name for parametrizations of generic classes. This method can be overridden to achieve a custom naming scheme for generic BaseModels. :param params: Tuple of types of the class. Given a generic class `Model` with 2 type variables and a concrete model `Model[str, int]`, the value `(str, int)` would be passed to `params`. :Returns: **String representing the new class where `params` are passed to `cls` as type variables.** :raises TypeError: Raised when trying to generate concrete names for non-generic models. .. py:method:: model_post_init(__context) Override this method to perform additional initialization after `__init__` and `model_construct`. This is useful if you want to do some validation that requires the entire model to be initialized. .. 
py:method:: model_rebuild(*, force = False, raise_errors = True, _parent_namespace_depth = 2, _types_namespace = None) :classmethod: Try to rebuild the pydantic-core schema for the model. This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails. :param force: Whether to force the rebuilding of the model schema, defaults to `False`. :param raise_errors: Whether to raise errors, defaults to `True`. :param _parent_namespace_depth: The depth level of the parent namespace, defaults to 2. :param _types_namespace: The types namespace, defaults to `None`. :Returns: * **Returns `None` if the schema is already "complete" and rebuilding was not required.** * **If rebuilding _was_ required, returns `True` if rebuilding was successful, otherwise `False`.** .. py:method:: model_validate(obj, *, strict = None, from_attributes = None, context = None) :classmethod: Validate a pydantic model instance. :param obj: The object to validate. :param strict: Whether to enforce types strictly. :param from_attributes: Whether to extract data from object attributes. :param context: Additional context to pass to the validator. :raises ValidationError: If the object could not be validated. :Returns: **The validated model instance.** .. py:method:: model_validate_json(json_data, *, strict = None, context = None) :classmethod: Usage docs: https://docs.pydantic.dev/2.9/concepts/json/#json-parsing Validate the given JSON data against the Pydantic model. :param json_data: The JSON data to validate. :param strict: Whether to enforce types strictly. :param context: Extra variables to pass to the validator. :Returns: **The validated Pydantic model.** :raises ValidationError: If `json_data` is not a JSON string or the object could not be validated. .. 
py:method:: model_validate_strings(obj, *, strict = None, context = None) :classmethod: Validate the given object with string data against the Pydantic model. :param obj: The object containing string data to validate. :param strict: Whether to enforce types strictly. :param context: Extra variables to pass to the validator. :Returns: **The validated Pydantic model.** .. py:method:: __get_pydantic_core_schema__(source, handler, /) :classmethod: Hook into generating the model's CoreSchema. :param source: The class we are generating a schema for. This will generally be the same as the `cls` argument if this is a classmethod. :param handler: A callable that calls into Pydantic's internal CoreSchema generation logic. :Returns: **A `pydantic-core` `CoreSchema`.** .. py:method:: __get_pydantic_json_schema__(core_schema, handler, /) :classmethod: Hook into generating the model's JSON schema. :param core_schema: A `pydantic-core` CoreSchema. You can ignore this argument and call the handler with a new CoreSchema, wrap this CoreSchema (`{'type': 'nullable', 'schema': current_schema}`), or just call the handler with the original schema. :param handler: Call into Pydantic's internal JSON schema generation. This will raise a `pydantic.errors.PydanticInvalidForJsonSchema` if JSON schema generation fails. Since this gets called by `BaseModel.model_json_schema` you can override the `schema_generator` argument to that function to change JSON schema generation globally for a type. :Returns: **A JSON schema, as a Python object.** .. py:method:: __pydantic_init_subclass__(**kwargs) :classmethod: This is intended to behave just like `__init_subclass__`, but is called by `ModelMetaclass` only after the class is actually fully initialized. In particular, attributes like `model_fields` will be present when this is called. 
This is necessary because `__init_subclass__` will always be called by `type.__new__`, and it would require a prohibitively large refactor to the `ModelMetaclass` to ensure that `type.__new__` was called in such a manner that the class would already be sufficiently initialized. This will receive the same `kwargs` that would be passed to the standard `__init_subclass__`, namely, any kwargs passed to the class definition that aren't used internally by pydantic. :param \*\*kwargs: Any keyword arguments passed to the class definition that aren't used internally by pydantic. .. py:method:: __class_getitem__(typevar_values) :classmethod: .. py:method:: __copy__() Returns a shallow copy of the model. .. py:method:: __deepcopy__(memo = None) Returns a deep copy of the model. .. py:method:: __getattr__(item) .. py:method:: _check_frozen(name, value) .. py:method:: __getstate__() .. py:method:: __setstate__(state) .. py:method:: __eq__(other) .. py:method:: __init_subclass__(**kwargs) :classmethod: This signature is included purely to help type-checkers check arguments to class declaration, which provides a way to conveniently set model_config key/value pairs. ```py from pydantic import BaseModel class MyModel(BaseModel, extra='allow'): ... ``` However, this may be deceiving, since the _actual_ calls to `__init_subclass__` will not receive any of the config arguments, and will only receive any keyword arguments passed during class initialization that are _not_ expected keys in ConfigDict. (This is due to the way `ModelMetaclass.__new__` works.) :param \*\*kwargs: Keyword arguments passed to the class definition, which set model_config .. note:: You may want to override `__pydantic_init_subclass__` instead, which behaves similarly but is called *after* the class is fully initialized. .. py:method:: __iter__() So `dict(model)` works. .. py:method:: __repr__() .. py:method:: __repr_args__() .. py:attribute:: __repr_name__ .. py:attribute:: __repr_str__ .. 
py:attribute:: __pretty__ .. py:attribute:: __rich_repr__ .. py:method:: __str__() .. py:property:: __fields__ :type: dict[str, pydantic.fields.FieldInfo] .. py:property:: __fields_set__ :type: set[str] .. py:method:: dict(*, include = None, exclude = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False) .. py:method:: json(*, include = None, exclude = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False, encoder = PydanticUndefined, models_as_dict = PydanticUndefined, **dumps_kwargs) .. py:method:: parse_obj(obj) :classmethod: .. py:method:: parse_raw(b, *, content_type = None, encoding = 'utf8', proto = None, allow_pickle = False) :classmethod: .. py:method:: parse_file(path, *, content_type = None, encoding = 'utf8', proto = None, allow_pickle = False) :classmethod: .. py:method:: from_orm(obj) :classmethod: .. py:method:: construct(_fields_set = None, **values) :classmethod: .. py:method:: copy(*, include = None, exclude = None, update = None, deep = False) Returns a copy of the model. !!! warning "Deprecated" This method is now deprecated; use `model_copy` instead. If you need `include` or `exclude`, use: ```py data = self.model_dump(include=include, exclude=exclude, round_trip=True) data = {**data, **(update or {})} copied = self.model_validate(data) ``` :param include: Optional set or mapping specifying which fields to include in the copied model. :param exclude: Optional set or mapping specifying which fields to exclude in the copied model. :param update: Optional dictionary of field-value pairs to override field values in the copied model. :param deep: If True, the values of fields that are Pydantic models will be deep-copied. :Returns: **A copy of the model with included, excluded and updated fields as specified.** .. py:method:: schema(by_alias = True, ref_template = DEFAULT_REF_TEMPLATE) :classmethod: .. 
py:method:: schema_json(*, by_alias = True, ref_template = DEFAULT_REF_TEMPLATE, **dumps_kwargs) :classmethod: .. py:method:: validate(value) :classmethod: .. py:method:: update_forward_refs(**localns) :classmethod: .. py:method:: _iter(*args, **kwargs) .. py:method:: _copy_and_set_values(*args, **kwargs) .. py:method:: _get_value(*args, **kwargs) :classmethod: .. py:method:: _calculate_keys(*args, **kwargs) .. py:function:: _add_helper(joined, single) .. py:function:: _add_weighted_helper(joined, single, cui2count) .. py:function:: get_metrics_mean(metrics, include_std) Get the mean of the provided metrics. :param metrics: The metrics. :type metrics: list[tuple[dict, dict, dict, dict, dict, dict, dict, dict]] :param include_std: Whether to include the standard deviation. :type include_std: bool :Returns: * **fps** (*dict*) -- False positives for each CUI. * **fns** (*dict*) -- False negatives for each CUI. * **tps** (*dict*) -- True positives for each CUI. * **cui_prec** (*dict*) -- Precision for each CUI. * **cui_rec** (*dict*) -- Recall for each CUI. * **cui_f1** (*dict*) -- F1 for each CUI. * **cui_counts** (*dict*) -- Number of occurrences for each CUI. * **examples** (*dict*) -- Examples for each of the fp, fn, tp. Format will be examples['fp']['cui'][]. .. py:function:: get_k_fold_stats(cat, mct_export_data, k = 3, use_project_filters = False, split_type = SplitType.DOCUMENTS_WEIGHTED, include_std = False, *args, **kwargs) Get the k-fold stats for the model with the specified data. First this will split the MCT export into `k` folds. You can do this either per-document or per-annotation. For each of the `k` folds, it will start from the base model, train it with the other `k-1` folds and record the metrics. After that the base model state is restored before doing the next fold. After all the folds have been done, the metrics are averaged. :param cat: The model pack. :type cat: CAT :param mct_export_data: The MCT export.
:type mct_export_data: MedCATTrainerExport :param k: The number of folds. Defaults to 3. :type k: int :param use_project_filters: Whether to use per project filters. Defaults to `False`. :type use_project_filters: bool :param split_type: Whether to use annotations or docs. Defaults to DOCUMENTS_WEIGHTED. :type split_type: SplitType :param include_std: Whether to include standard deviation. Defaults to False. :type include_std: bool :param \*args: Arguments passed to the `CAT.train_supervised_raw` method. :param \*\*kwargs: Keyword arguments passed to the `CAT.train_supervised_raw` method. :Returns: **tuple** -- The averaged metrics. Potentially with their corresponding standard deviations.
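
The procedure described for `get_k_fold_stats` (split into `k` folds, train on the other `k-1` folds, record metrics, restore the base state, then average) can be illustrated with a toy, MedCAT-free sketch. Everything named here (`ToyModel`, `make_folds`, `k_fold_recall`) is hypothetical and exists only for illustration; it is not the library's implementation, and the round-robin split is an assumption rather than any of the `SplitType` strategies.

```python
import copy
from statistics import mean


def make_folds(docs, k):
    """Round-robin split of documents into k folds (hypothetical helper)."""
    return [docs[i::k] for i in range(k)]


class ToyModel:
    """Stand-in for a trainable model: it only remembers the words it has seen."""

    def __init__(self):
        self.known = set()

    def train(self, docs):
        for doc in docs:
            self.known.update(doc.split())

    def recall(self, docs):
        # Fraction of held-out words the model already knows.
        words = [w for doc in docs for w in doc.split()]
        return sum(w in self.known for w in words) / len(words)


def k_fold_recall(model, docs, k=3):
    folds = make_folds(docs, k)
    base_state = copy.deepcopy(model.known)  # capture the base model state
    per_fold = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the held-out one.
        train_docs = [d for j, fold in enumerate(folds) if j != i for d in fold]
        model.train(train_docs)
        per_fold.append(model.recall(held_out))
        model.known = copy.deepcopy(base_state)  # restore before the next fold
    return mean(per_fold), per_fold


docs = ["a b", "c d", "e f", "a c", "b e", "d f"]
model = ToyModel()
avg, per_fold = k_fold_recall(model, docs, k=3)
```

Restoring the captured state after each fold keeps the folds independent: without it, the model evaluated on a later fold would already have been trained on that fold's documents during an earlier iteration.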