medcat2.utils.legacy.conversion_all
===================================

.. py:module:: medcat2.utils.legacy.conversion_all

Attributes
----------

.. autoapisummary::

   medcat2.utils.legacy.conversion_all.logger

Classes
-------

.. autoapisummary::

   medcat2.utils.legacy.conversion_all.CAT
   medcat2.utils.legacy.conversion_all.CoreComponentType
   medcat2.utils.legacy.conversion_all.AvailableSerialisers
   medcat2.utils.legacy.conversion_all.NoActionLinker
   medcat2.utils.legacy.conversion_all.Converter

Functions
---------

.. autoapisummary::

   medcat2.utils.legacy.conversion_all.get_cdb_from_old
   medcat2.utils.legacy.conversion_all.get_config_from_old
   medcat2.utils.legacy.conversion_all.get_vocab_from_old
   medcat2.utils.legacy.conversion_all.get_meta_cat_from_old
   medcat2.utils.legacy.conversion_all.get_trf_ner_from_old
   medcat2.utils.legacy.conversion_all.unpack

Module Contents
---------------

.. py:class:: CAT(cdb, vocab = None, config = None, model_load_path = None)

   Bases: :py:obj:`medcat2.storage.serialisables.AbstractSerialisable`

   This is a collection of serialisable model parts.

   .. py:method:: __init__(cdb, vocab = None, config = None, model_load_path = None)

   .. py:attribute:: cdb

   .. py:attribute:: vocab
      :value: None

   .. py:attribute:: config
      :value: None

   .. py:attribute:: _trainer
      :type: Optional[medcat2.trainer.Trainer]
      :value: None

   .. py:attribute:: _pipeline

   .. py:method:: _recrate_pipe(model_load_path = None)

   .. py:method:: get_init_attrs()
      :classmethod:

   .. py:method:: ignore_attrs()
      :classmethod:

   .. py:method:: __call__(text)

   .. py:method:: _ensure_not_training()

      Method to ensure config is not set to train.

      `config.components.linking.train` should only be True while training
      and not during inference.
      This also corrects the setting if necessary.

   .. py:method:: get_entities(text, only_cui = False)

      Get the entities recognised and linked within the provided text.

      This will run the text through the pipeline and annotate the
      recognised and linked entities.

      :param text: The text to use.
      :type text: str
      :param only_cui: Whether to only output the CUIs rather than the entire context. Defaults to False.
      :type only_cui: bool, optional

      :Returns: **Union[dict, Entities, OnlyCUIEntities]** -- The entities found and linked within the text.

   .. py:method:: _get_entity(ent, doc_tokens, cui)

   .. py:method:: _doc_to_out_entity(ent, doc_tokens, only_cui)

   .. py:method:: _doc_to_out(doc, only_cui, out_with_text = False)

   .. py:property:: trainer

      The trainer object.

   .. py:method:: save_model_pack(target_folder, pack_name = DEFAULT_PACK_NAME, serialiser_type = 'dill', make_archive = True)

      Save model pack.

      :param target_folder: The folder to save the pack in.
      :type target_folder: str
      :param pack_name: The model pack name. Defaults to DEFAULT_PACK_NAME.
      :type pack_name: str, optional
      :param serialiser_type: The serialiser type. Defaults to 'dill'.
      :type serialiser_type: Union[str, AvailableSerialisers], optional
      :param make_archive: Whether to make the archive (.zip) file. Defaults to True.
      :type make_archive: bool

      :Returns: **str** -- The final model pack path.

   .. py:method:: load_model_pack(model_pack_path)
      :classmethod:

      Load the model pack from file.

      :param model_pack_path: The model pack path.
      :type model_pack_path: str

      :raises ValueError: If the saved data does not represent a model pack.

      :Returns: **CAT** -- The loaded model pack.

   .. py:method:: __eq__(other)

   .. py:method:: add_addon(addon)

   .. py:method:: get_strategy()

   .. py:method:: include_properties()
      :classmethod:

.. py:class:: CoreComponentType

   Bases: :py:obj:`enum.Enum`

   Generic enumeration.

   Derive from this class to define new enumerations.

   .. py:attribute:: tagging

   .. py:attribute:: token_normalizing

   .. py:attribute:: ner

   .. py:attribute:: linking

   .. py:method:: __new__(value)

   .. py:method:: _generate_next_value_(start, count, last_values)

      Generate the next value when not given.

      name: the name of the member
      start: the initial start value or None
      count: the number of existing members
      last_value: the last value assigned or None

   .. py:method:: _missing_(value)
      :classmethod:

   .. py:method:: __repr__()

   .. py:method:: __str__()

   .. py:method:: __dir__()

      Returns all members and all public methods

   .. py:method:: __format__(format_spec)

      Returns format using actual value type unless __str__ has been overridden.

   .. py:method:: __hash__()

   .. py:method:: __reduce_ex__(proto)

   .. py:method:: name()

      The name of the Enum member.

   .. py:method:: value()

      The value of the Enum member.

.. py:class:: AvailableSerialisers

   Bases: :py:obj:`enum.Enum`

   Describes the available serialisers.

   .. py:attribute:: dill

   .. py:method:: write_to(file_path)

   .. py:method:: from_file(file_path)
      :classmethod:

   .. py:method:: __new__(value)

   .. py:method:: _generate_next_value_(start, count, last_values)

      Generate the next value when not given.

      name: the name of the member
      start: the initial start value or None
      count: the number of existing members
      last_value: the last value assigned or None

   .. py:method:: _missing_(value)
      :classmethod:

   .. py:method:: __repr__()

   .. py:method:: __str__()

   .. py:method:: __dir__()

      Returns all members and all public methods

   .. py:method:: __format__(format_spec)

      Returns format using actual value type unless __str__ has been overridden.

   .. py:method:: __hash__()

   .. py:method:: __reduce_ex__(proto)

   .. py:method:: name()

      The name of the Enum member.

   .. py:method:: value()

      The value of the Enum member.

.. py:class:: NoActionLinker

   Bases: :py:obj:`medcat2.components.types.AbstractCoreComponent`

   Base class for protocol classes.

   Protocol classes are defined as::

       class Proto(Protocol):
           def meth(self) -> int:
               ...

   Such classes are primarily used with static type checkers that recognize
   structural subtyping (static duck-typing), for example::

       class C:
           def meth(self) -> int:
               return 0

       def func(x: Proto) -> int:
           return x.meth()

       func(C())  # Passes static type check

   See PEP 544 for details. Protocol classes decorated with
   @typing.runtime_checkable act as simple-minded runtime protocols that check
   only the presence of given attributes, ignoring their type signatures.
   Protocol classes can be generic, they are defined as::

       class GenProto(Protocol[T]):
           def meth(self) -> T:
               ...

   .. py:attribute:: name
      :value: 'no_action'

      The name of the component.

   .. py:method:: get_type()

   .. py:method:: __call__(doc)

   .. py:method:: get_init_args(tokenizer, cdb, vocab, model_load_path)
      :classmethod:

      Get the init arguments for the component.

      :param tokenizer: The tokenizer.
      :type tokenizer: BaseTokenizer
      :param cdb: The CDB.
      :type cdb: CDB
      :param vocab: The Vocab.
      :type vocab: Vocab
      :param model_load_path: The model load path (or None).
      :type model_load_path: Optional[str]

      :Returns: **list[Any]** -- The list of init arguments.

   .. py:method:: get_init_kwargs(tokenizer, cdb, vocab, model_load_path)
      :classmethod:

      Get init keyword arguments for the component.

      :param tokenizer: The tokenizer.
      :type tokenizer: BaseTokenizer
      :param cdb: The CDB.
      :type cdb: CDB
      :param vocab: The Vocab.
      :type vocab: Vocab
      :param model_load_path: The model load path (or None).
      :type model_load_path: Optional[str]

      :Returns: **dict[str, Any]** -- The keyword arguments.

   .. py:attribute:: NAME_PREFIX
      :value: 'core_'

   .. py:property:: full_name
      :type: str

      Name with the component type (e.g. ner, linking, meta).

   .. py:method:: is_core()

      Whether the component is a core component or not.

      :Returns: **bool** -- Whether this is a core component.

   .. py:attribute:: __slots__
      :value: ()

   .. py:attribute:: _is_protocol
      :value: True

   .. py:attribute:: _is_runtime_protocol
      :value: False

   .. py:method:: __init_subclass__(*args, **kwargs)
      :classmethod:

   .. py:method:: __class_getitem__(params)
      :classmethod:

.. py:function:: get_cdb_from_old(old_path)

   Get the v2 CDB from a v1 CDB path.

   :param old_path: The v1 CDB path.
   :type old_path: str

   :Returns: **CDB** -- The v2 CDB.

.. py:function:: get_config_from_old(path)

   Convert the saved v1 config into a v2 Config.

   :param path: The v1 config path.
   :type path: str

   :Returns: **Config** -- The v2 config.

.. py:function:: get_vocab_from_old(old_path)

   Convert a v1 vocab file to a v2 Vocab.

   :param old_path: The v1 vocab file path.
   :type old_path: str

   :Returns: **Vocab** -- The v2 Vocab.

.. py:function:: get_meta_cat_from_old(old_path, tokenizer)

   Convert a v1 MetaCAT folder to a v2 MetaCAT.

   :param old_path: The v1 MetaCAT file path.
   :type old_path: str
   :param tokenizer: The tokenizer.
   :type tokenizer: BaseTokenizer

   :Returns: **MetaCATAddon** -- The v2 MetaCAT.

.. py:function:: get_trf_ner_from_old(old_path, tokenizer)

.. py:data:: logger

.. py:class:: Converter(medcat1_model_pack_path, new_model_pack_path, ser_type = AvailableSerialisers.dill)

   Converts v1 models to v2 models.

   .. py:attribute:: cdb_name
      :value: 'cdb.dat'

   .. py:attribute:: vocab_name
      :value: 'vocab.dat'

   .. py:attribute:: config_name
      :value: 'config.json'

   .. py:method:: __init__(medcat1_model_pack_path, new_model_pack_path, ser_type = AvailableSerialisers.dill)

   .. py:attribute:: old_model_folder

   .. py:attribute:: new_model_folder

   .. py:attribute:: ser_type

   .. py:property:: expected_files_in_folder

      The base names of the required files in a folder for a v1 model.

   .. py:method:: _validate()

   .. py:method:: convert()

      Use the gathered information to convert to a v2 model.

      This converts the CDB, Vocab, and Config, in that order, and then
      creates the model pack. If `self.new_model_folder` is set, the model
      will be saved as well.

      :Returns: **CAT** -- The model pack.

.. py:function:: unpack(model_zip_path, target_folder)

   Unpack v1 model into target folder.

   :param model_zip_path: ZIP path.
   :type model_zip_path: str
   :param target_folder: Target folder.
   :type target_folder: str
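
The conversion workflow described in this module (unpack a v1 archive, check that the required files are present, then run ``Converter.convert()``) might be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: ``unpack_v1_pack``, ``missing_v1_files``, ``convert_v1_pack`` and ``demo`` are hypothetical helper names, the required file names are taken from the ``Converter`` attributes, and it is an assumption that ``Converter`` is given the unpacked folder as its first argument. The ``medcat2`` import is deferred so that the standard-library helpers run without the package installed.

```python
import os
import shutil
import tempfile
import zipfile

# File names taken from the documented Converter attributes
# (cdb_name, vocab_name, config_name).
REQUIRED_V1_FILES = ("cdb.dat", "vocab.dat", "config.json")


def unpack_v1_pack(model_zip_path: str, target_folder: str) -> None:
    """Hypothetical stand-in for `unpack`: extract a ZIP into a folder."""
    os.makedirs(target_folder, exist_ok=True)
    shutil.unpack_archive(model_zip_path, target_folder, format="zip")


def missing_v1_files(folder: str) -> list:
    """Hypothetical check: list the required v1 files absent from `folder`."""
    present = set(os.listdir(folder))
    return [name for name in REQUIRED_V1_FILES if name not in present]


def convert_v1_pack(old_pack_path: str, new_pack_folder: str):
    """Run the documented Converter; requires medcat2 to be installed.

    Assumption: `old_pack_path` points at whatever Converter expects
    (the reference above does not say whether that is a folder or archive).
    """
    from medcat2.utils.legacy.conversion_all import Converter

    return Converter(old_pack_path, new_pack_folder).convert()


def demo() -> tuple:
    """Build a dummy archive, unpack it, and report which files are missing."""
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = os.path.join(tmp, "v1_pack.zip")
        with zipfile.ZipFile(zip_path, "w") as zf:
            # Placeholder contents only; a real pack holds serialised models.
            zf.writestr("cdb.dat", b"")
            zf.writestr("vocab.dat", b"")
        target = os.path.join(tmp, "unpacked")
        unpack_v1_pack(zip_path, target)
        return sorted(os.listdir(target)), missing_v1_files(target)
```

Calling ``demo()`` exercises only the standard-library helpers. ``convert_v1_pack`` is the one function that touches ``medcat2``, and per the reference above it returns the converted ``CAT`` model pack.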