medcat.components.addons.meta_cat
=================================

.. py:module:: medcat.components.addons.meta_cat


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/medcat/components/addons/meta_cat/data_utils/index
   /autoapi/medcat/components/addons/meta_cat/mctokenizers/index
   /autoapi/medcat/components/addons/meta_cat/meta_cat/index
   /autoapi/medcat/components/addons/meta_cat/ml_utils/index
   /autoapi/medcat/components/addons/meta_cat/models/index


Attributes
----------

.. autoapisummary::

   medcat.components.addons.meta_cat.__all__
   medcat.components.addons.meta_cat._EXTRA_NAME


Classes
-------

.. autoapisummary::

   medcat.components.addons.meta_cat.MetaCAT
   medcat.components.addons.meta_cat.MetaCATAddon
   medcat.components.addons.meta_cat.MetaAnnotationValue


Functions
---------

.. autoapisummary::

   medcat.components.addons.meta_cat.ensure_optional_extras_installed
   medcat.components.addons.meta_cat.get_meta_annotations


Package Contents
----------------

.. py:function:: ensure_optional_extras_installed(package_name, extra_name)

   Ensure that an optional dependency set is installed.

   :param package_name: The base package name.
   :type package_name: str
   :param extra_name: The name of the extra dependency.
   :type extra_name: str

   :raises MissingDependenciesError: If the extra dependency isn't provided.


.. py:class:: MetaCAT(tokenizer = None, embeddings = None, config = None, _model_state_dict = None)

   Bases: :py:obj:`medcat.storage.serialisables.AbstractSerialisable`

   The MetaCAT class used for training 'Meta-Annotation' models, i.e. annotations of clinical
   concept annotations. These are also known as properties or attributes of recognised entities
   in similar tools such as MetaMap and cTakes.

   This is a flexible, model-agnostic class that can learn any meta-annotation task, i.e. any
   multi-class classification task for recognised terms.

   :param tokenizer: The Huggingface tokenizer instance.
      This can be a pre-trained tokenizer instance from a BERT-style model, or trained from
      scratch for the Bi-LSTM (w. attention) model that is currently used in most deployments.
   :type tokenizer: TokenizerWrapperBase
   :param embeddings: Embedding mapping from (sub)word input id to n-dim (sub)word embedding.
   :type embeddings: Tensor, numpy.ndarray
   :param config: The configuration for MetaCAT. Parameter descriptions are available in the ConfigMetaCAT docs.
   :type config: ConfigMetaCAT


   .. py:attribute:: name
      :value: 'meta_cat'


   .. py:attribute:: _component_lock


   .. py:method:: get_init_attrs()
      :classmethod:


   .. py:method:: ignore_attrs()
      :classmethod:


   .. py:method:: include_properties()
      :classmethod:


   .. py:property:: _model_state_dict


   .. py:method:: __init__(tokenizer = None, embeddings = None, config = None, _model_state_dict = None)


   .. py:attribute:: config
      :value: None


   .. py:attribute:: tokenizer
      :value: None


   .. py:attribute:: embeddings


   .. py:attribute:: model


   .. py:method:: _reset_tokenizer_info()


   .. py:method:: get_model(embeddings)

      Get the model.

      :param embeddings: The embedding tensor
      :type embeddings: Optional[Tensor]

      :raises ValueError: If the meta model is not LSTM or BERT

      :Returns: **nn.Module** -- The module


   .. py:method:: get_hash()

      A partial hash trying to catch differences between models.

      :Returns: **str** -- The hex hash.


   .. py:method:: train_from_json(json_path, save_dir_path = None, data_oversampled = None, overwrite = False)

      Train or continue training a model given a json_path containing a MedCATtrainer export. It will
      continue training if an existing model is loaded or start new training if the model is blank/new.

      :param json_path: Path/Paths to a MedCATtrainer export containing the meta_annotations we want to train for.
      :type json_path: Union[str, list]
      :param save_dir_path: In case we have auto_save_model (meaning during the training the best model will be saved)
                            we need to set a save path. Defaults to `None`.
      :type save_dir_path: Optional[str]
      :param data_oversampled: In case of oversampling being performed, the data will be passed in the parameter
                               allowing the model to be trained on original + synthetic data.
      :type data_oversampled: Optional[list]
      :param overwrite: Whether to allow overwriting the file if/when appropriate.
      :type overwrite: bool

      :Returns: **dict** -- The resulting report.


   .. py:method:: train_raw(data_loaded, save_dir_path = None, data_oversampled = None, overwrite = False)

      Train or continue training a model given raw data. It will continue training if an existing model
      is loaded or start new training if the model is blank/new.

      The raw data is expected in the following format::

          {
              'projects': [  # list of projects
                  {
                      'name': '',
                      'documents': [  # list of documents
                          {
                              'name': '',
                              'text': '',
                              'annotations': [  # list of annotations
                                  {
                                      # start index of the annotation
                                      'start': -1,
                                      'end': 1,  # end index of the annotation
                                      'cui': 'cui',
                                      'value': ''
                                  },
                                  ...
                              ],
                          },
                          ...
                      ]
                  },
                  ...
              ]
          }

      :param data_loaded: The raw data we want to train for.
      :type data_loaded: dict
      :param save_dir_path: In case we have auto_save_model (meaning during the training the best model will be saved)
                            we need to set a save path. Defaults to `None`.
      :type save_dir_path: Optional[str]
      :param data_oversampled: In case of oversampling being performed, the data will be passed in the parameter
                               allowing the model to be trained on original + synthetic data.
                               The expected format is:
                               ``[[['text','of','the','document'], [index of medical entity], "label"],
                               [['text','of','the','document'], [index of medical entity], "label"]]``
      :type data_oversampled: Optional[list]
      :param overwrite: Whether to allow overwriting the file if/when appropriate.
      :type overwrite: bool

      :Returns: **dict** -- The resulting report.

      :raises Exception: If no save path is specified, or category name not in data.
      :raises AssertionError: If no tokeniser is set
      :raises FileNotFoundError: If phase_number is set to 2 and model.dat file is not found
      :raises KeyError: If phase_number is set to 2 and model.dat file contains mismatched architecture


   .. py:method:: eval(json_path)

      Evaluate from json.

      :param json_path: The json file path
      :type json_path: str

      :Returns: **dict** -- The resulting model dict

      :raises AssertionError: If self.tokenizer is not set
      :raises Exception: If the category name does not exist


   .. py:method:: get_ents(doc)


   .. py:method:: prepare_document(doc, input_ids, offset_mapping, lowercase)

      Prepares document.

      :param doc: The document
      :type doc: Doc
      :param input_ids: Input ids
      :type input_ids: list
      :param offset_mapping: Offset mappings
      :type offset_mapping: list
      :param lowercase: Whether to use lower case replace center
      :type lowercase: bool

      :Returns: **tuple[dict, list]** -- Entity id to index mapping and Samples


   .. py:method:: batch_generator(stream, batch_size_chars)
      :staticmethod:

      Generator for batch of documents.

      :param stream: The document stream
      :type stream: Iterable[MutableDocument]
      :param batch_size_chars: Number of characters per batch
      :type batch_size_chars: int

      :Yields: *list[MutableDocument]* -- The batch of documents.


   .. py:method:: _set_meta_anns(doc, id2category_value)


   .. py:method:: __call__(doc)

      Process one document, used in the spacy pipeline for sequential
      document processing.

      :param doc: A spacy document
      :type doc: Doc

      :Returns: **Doc** -- The same spacy document.


   .. py:method:: get_model_card(as_dict = False)

      A minimal model card.

      :param as_dict: Return the model card as a dictionary instead of a str. Defaults to `False`.
      :type as_dict: bool

      :Returns: **Union[str, dict]** -- An indented JSON object, or a JSON object in dict form.


   .. py:method:: __repr__()

      Prints the model_card for this MetaCAT instance.

      :Returns: **str** -- The 'Model Card' for this MetaCAT instance. This includes NER+L
                config and any MetaCATs.


   .. py:method:: get_strategy()


   .. py:method:: __eq__(other)


.. py:class:: MetaCATAddon(config, base_tokenizer, meta_cat)

   Bases: :py:obj:`medcat.components.addons.addons.AddonComponent`

   Base/abstract addon component class.


   .. py:attribute:: addon_type
      :value: 'meta_cat'


   .. py:attribute:: output_key
      :value: 'meta_anns'


   .. py:attribute:: config
      :type: medcat.config.config_meta_cat.ConfigMetaCAT


   .. py:method:: __init__(config, base_tokenizer, meta_cat)


   .. py:attribute:: base_tokenizer


   .. py:attribute:: _mc


   .. py:attribute:: _name


   .. py:property:: mc
      :type: MetaCAT


   .. py:method:: create_new(config, base_tokenizer, tknzer_preprocessor = None)
      :classmethod:

      Factory method to create a new MetaCATAddon instance.


   .. py:method:: create_new_component(cnf, tokenizer, cdb, vocab, model_load_path)
      :classmethod:

      Create a new component, or load one from disk if a load path is provided.

      This may raise an exception if the wrong type of config is provided.

      :param cnf: The config relevant to this component.
      :type cnf: ComponentConfig
      :param tokenizer: The base tokenizer.
      :type tokenizer: BaseTokenizer
      :param cdb: The CDB.
      :type cdb: CDB
      :param vocab: The Vocab.
      :type vocab: Vocab
      :param model_load_path: Model load path (if present).
      :type model_load_path: Optional[str]

      :Returns: **Self** -- The new component.


   .. py:method:: load_existing(cnf, base_tokenizer, load_path)
      :classmethod:

      Factory method to load an existing MetaCATAddon from disk.


   .. py:property:: name
      :type: str

      The name of the component.


   .. py:method:: __call__(doc)


   .. py:method:: load(folder_path)


   .. py:method:: _load_tokenizer(config, tokenizer_folder)
      :classmethod:


   .. py:method:: _get_meta_cat_and_tokenizer_paths(folder_path)
      :classmethod:


   .. py:method:: save(folder_path)


   .. py:method:: _init_data_paths()


   .. py:property:: include_in_output
      :type: bool


   .. py:method:: get_output_key_val(ent)


   .. py:method:: serialise_to(folder_path)


   .. py:method:: deserialise_from(folder_path, **init_kwargs)
      :classmethod:


   .. py:method:: get_strategy()


   .. py:method:: get_init_attrs()
      :classmethod:


   .. py:method:: ignore_attrs()
      :classmethod:


   .. py:method:: include_properties()
      :classmethod:


   .. py:method:: get_hash()


   .. py:attribute:: NAME_PREFIX
      :type: str
      :value: 'addon_'


   .. py:attribute:: NAME_SPLITTER
      :type: str
      :value: '.'


   .. py:method:: is_core()

      Whether the component is a core component or not.

      :Returns: **bool** -- Whether this is a core component.


   .. py:method:: get_folder_name_for_addon_and_name(addon_type, name)
      :classmethod:


   .. py:method:: get_folder_name()


   .. py:property:: full_name
      :type: str

      Name with the component type (e.g. ner, linking, meta).


   .. py:attribute:: __slots__
      :value: ()


   .. py:attribute:: _is_protocol
      :value: True


   .. py:attribute:: _is_runtime_protocol
      :value: False


   .. py:method:: __init_subclass__(*args, **kwargs)
      :classmethod:


   .. py:method:: __class_getitem__(params)
      :classmethod:


.. py:function:: get_meta_annotations(entity)

.. py:class:: MetaAnnotationValue

   Bases: :py:obj:`TypedDict`

   dict() -> new empty dictionary
   dict(mapping) -> new dictionary initialized from a mapping object's
       (key, value) pairs
   dict(iterable) -> new dictionary initialized as if via:
       d = {}
       for k, v in iterable:
           d[k] = v
   dict(**kwargs) -> new dictionary initialized with the name=value pairs
       in the keyword argument list.  For example:  dict(one=1, two=2)


   .. py:attribute:: name
      :type: str


   .. py:attribute:: value
      :type: str


   .. py:attribute:: confidence
      :type: float


   .. py:method:: __contains__()

      True if the dictionary has the specified key, else False.


   .. py:method:: __delattr__()

      Implement delattr(self, name).


   .. py:method:: __delitem__()

      Delete self[key].


   .. py:method:: __dir__()

      Default dir() implementation.


   .. py:method:: __eq__()

      Return self==value.


   .. py:method:: __format__()

      Default object formatter.


   .. py:method:: __ge__()

      Return self>=value.


   .. py:method:: __getattribute__()

      Return getattr(self, name).


   .. py:method:: __getitem__()

      x.__getitem__(y) <==> x[y]


   .. py:method:: __gt__()

      Return self>value.


   .. py:method:: __init__()

      Initialize self.  See help(type(self)) for accurate signature.


   .. py:method:: __ior__()

      Return self|=value.


   .. py:method:: __iter__()

      Implement iter(self).


   .. py:method:: __le__()

      Return self<=value.


   .. py:method:: __len__()

      Return len(self).


   .. py:method:: __lt__()

      Return self<value.


   .. py:method:: __sizeof__()

      D.__sizeof__() -> size of D in memory, in bytes


   .. py:method:: __str__()

      Return str(self).


   .. py:method:: __subclasshook__()

      Abstract classes can override this to customize issubclass().

      This is invoked early on by abc.ABCMeta.__subclasscheck__().
      It should return True, False or NotImplemented.  If it returns
      NotImplemented, the normal algorithm is used.  Otherwise, it
      overrides the normal algorithm (and the outcome is cached).


   .. py:method:: clear()

      D.clear() -> None.  Remove all items from D.


   .. py:method:: copy()

      D.copy() -> a shallow copy of D


   .. py:method:: get()

      Return the value for key if key is in the dictionary, else default.


   .. py:method:: items()

      D.items() -> a set-like object providing a view on D's items


   .. py:method:: keys()

      D.keys() -> a set-like object providing a view on D's keys


   .. py:method:: pop()

      D.pop(k[,d]) -> v, remove specified key and return the corresponding value.

      If the key is not found, return the default if given; otherwise,
      raise a KeyError.


   .. py:method:: popitem()

      Remove and return a (key, value) pair as a 2-tuple.

      Pairs are returned in LIFO (last-in, first-out) order.
      Raises KeyError if the dict is empty.


   .. py:method:: setdefault()

      Insert key with a value of default if key is not in the dictionary.

      Return the value for key if key is in the dictionary, else default.


   .. py:method:: update()

      D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
      If E is present and has a .keys() method, then does:  for k in E: D[k] = E[k]
      If E is present and lacks a .keys() method, then does:  for k, v in E: D[k] = v
      In either case, this is followed by: for k in F:  D[k] = F[k]


   .. py:method:: values()

      D.values() -> an object providing a view on D's values


.. py:data:: __all__
   :value: ['MetaCAT', 'MetaCATAddon', 'get_meta_annotations', 'MetaAnnotationValue']


.. py:data:: _EXTRA_NAME
   :value: 'meta-cat'
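

The nested ``projects`` / ``documents`` / ``annotations`` layout that ``MetaCAT.train_raw`` expects for ``data_loaded`` can be sketched in plain Python. This is a minimal illustration only, not part of MedCAT: ``make_raw_export`` and ``check_raw_export`` are hypothetical helper names, and the span range check is an assumption of this sketch. It also assumes ``'value'`` holds the annotated text, which the docstring does not spell out.

```python
from typing import Any


def make_raw_export(project: str, doc_name: str, text: str,
                    annotations: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap one document in the nested projects/documents layout (hypothetical helper)."""
    return {
        'projects': [
            {
                'name': project,
                'documents': [
                    {'name': doc_name, 'text': text, 'annotations': annotations},
                ],
            },
        ],
    }


def check_raw_export(data: dict[str, Any]) -> int:
    """Check the layout and return the number of annotations (hypothetical helper)."""
    required = {'start', 'end', 'cui', 'value'}
    count = 0
    for project in data['projects']:
        for doc in project['documents']:
            text = doc['text']
            for ann in doc['annotations']:
                missing = required - ann.keys()
                if missing:
                    raise ValueError(f"annotation is missing keys: {sorted(missing)}")
                # Assumption of this sketch: spans are character offsets into the text.
                if not 0 <= ann['start'] < ann['end'] <= len(text):
                    raise ValueError(f"annotation span out of range: {ann}")
                count += 1
    return count


text = "Patient denies chest pain."
start = text.index("chest pain")
export = make_raw_export(
    "demo project", "doc-1", text,
    # 'cui' is a placeholder identifier, as in the format description.
    [{'start': start, 'end': start + len("chest pain"),
      'cui': 'cui', 'value': 'chest pain'}],
)
n_annotations = check_raw_export(export)
```

A dictionary built this way has the shape described for ``data_loaded``. Whether a given export trains successfully still depends on the MetaCAT configuration, for example the category name, which this sketch does not model.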