medcat2.utils.legacy.conversion_all
Attributes
Classes
CAT | This is a collection of serialisable model parts.
CoreComponentType | Generic enumeration.
AvailableSerialisers | Describes the available serialisers.
NoActionLinker | Base class for protocol classes.
Converter | Converts v1 models to v2 models.
Functions
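get_cdb_from_old(old_path) | Get the v2 CDB from a v1 CDB path.
get_config_from_old(path) | Convert the saved v1 config into a v2 Config.
get_vocab_from_old(old_path) | Convert a v1 vocab file to a v2 Vocab.
get_meta_cat_from_old(old_path, tokenizer) | Convert a v1 MetaCAT folder to a v2 MetaCAT.
get_trf_ner_from_old(old_path, tokenizer) |
unpack(model_zip_path, target_folder) | Unpack v1 model into target folder.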
Module Contents
- class medcat2.utils.legacy.conversion_all.CAT(cdb, vocab=None, config=None, model_load_path=None)
Bases:
medcat2.storage.serialisables.AbstractSerialisable
This is a collection of serialisable model parts.
- Parameters:
cdb (medcat2.cdb.CDB)
vocab (Union[medcat2.vocab.Vocab, None])
config (Optional[medcat2.config.config.Config])
model_load_path (Optional[str])
- __init__(cdb, vocab=None, config=None, model_load_path=None)
- Parameters:
cdb (medcat2.cdb.CDB)
vocab (Union[medcat2.vocab.Vocab, None])
config (Optional[medcat2.config.config.Config])
model_load_path (Optional[str])
- Return type:
None
- cdb
- vocab = None
- config = None
- _trainer: medcat2.trainer.Trainer | None = None
- _pipeline
- _recrate_pipe(model_load_path=None)
- Parameters:
model_load_path (Optional[str])
- Return type:
- classmethod get_init_attrs()
- Return type:
list[str]
- classmethod ignore_attrs()
- Return type:
list[str]
- __call__(text)
- Parameters:
text (str)
- Return type:
- _ensure_not_training()
Method to ensure config is not set to train.
config.components.linking.train should only be True while training, not during inference. This method also corrects the setting if necessary.
- Return type:
None
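The guard described for _ensure_not_training can be sketched as below. This is a minimal illustration, not MedCAT's implementation: the function name `ensure_not_training` and the `SimpleNamespace` stand-in config are assumptions, and only the attribute path `config.components.linking.train` comes from the description above.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def ensure_not_training(config) -> bool:
    """Reset the linking train flag if it was left switched on.

    Returns True if the flag had to be corrected, False otherwise.
    """
    linking = config.components.linking
    if linking.train:
        logger.warning("linking.train was True outside of training; resetting it")
        linking.train = False
        return True
    return False


# Stand-in config object (hypothetical; a real Config carries many more fields).
demo_config = SimpleNamespace(
    components=SimpleNamespace(linking=SimpleNamespace(train=True))
)
was_corrected = ensure_not_training(demo_config)
```

Returning a flag makes the correction visible to the caller, which is a choice of this sketch; the documented method returns None.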
- get_entities(text, only_cui=False)
Get the entities recognised and linked within the provided text.
This will run the text through the pipeline and annotate the recognised and linked entities.
- Parameters:
text (str) – The text to use.
only_cui (bool, optional) – Whether to only output the CUIs rather than the entire context. Defaults to False.
- Returns:
Union[dict, Entities, OnlyCUIEntities] – The entities found and linked within the text.
- Return type:
Union[dict, medcat2.data.entities.Entities, medcat2.data.entities.OnlyCUIEntities]
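A small helper built on the documented get_entities(text, only_cui=False) signature might look like the sketch below. The helper `annotate_texts`, the protocol `SupportsGetEntities` and the `_EchoModel` stand-in are hypothetical names introduced here so the example runs without a real model pack; the structure of the real return value is not shown on this page and is not reproduced.

```python
from typing import Any, Iterable, Protocol


class SupportsGetEntities(Protocol):
    """Anything exposing get_entities(text, only_cui=False)."""

    def get_entities(self, text: str, only_cui: bool = False) -> Any: ...


def annotate_texts(model: SupportsGetEntities, texts: Iterable[str],
                   only_cui: bool = False) -> list[Any]:
    """Run every text through the model and collect the raw outputs."""
    return [model.get_entities(text, only_cui=only_cui) for text in texts]


class _EchoModel:
    """Hypothetical stand-in; it only echoes its arguments back."""

    def get_entities(self, text: str, only_cui: bool = False) -> Any:
        return {"text": text, "only_cui": only_cui}


results = annotate_texts(_EchoModel(), ["first note", "second note"], only_cui=True)
```

With a loaded CAT instance in place of `_EchoModel()`, the same helper would return one result per input text.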
- _get_entity(ent, doc_tokens, cui)
- Parameters:
doc_tokens (list[str])
cui (str)
- Return type:
- _doc_to_out_entity(ent, doc_tokens, only_cui)
- Parameters:
doc_tokens (list[str])
only_cui (bool)
- Return type:
tuple[int, Union[medcat2.data.entities.Entity, str]]
- _doc_to_out(doc, only_cui, out_with_text=False)
- Parameters:
only_cui (bool)
out_with_text (bool)
- Return type:
Union[medcat2.data.entities.Entities, medcat2.data.entities.OnlyCUIEntities]
- property trainer
The trainer object.
- save_model_pack(target_folder, pack_name=DEFAULT_PACK_NAME, serialiser_type='dill', make_archive=True)
Save model pack.
The resulting model pack name will have the hash of the model pack in its name if (and only if) the default model pack name is used.
- Parameters:
target_folder (str) – The folder to save the pack in.
pack_name (str, optional) – The model pack name. Defaults to DEFAULT_PACK_NAME.
serialiser_type (Union[str, AvailableSerialisers], optional) – The serialiser type. Defaults to ‘dill’.
make_archive (bool) – Whether to make the archive (.zip file). Defaults to True.
- Returns:
str – The final model pack path.
- Return type:
str
- _versioning()
- Return type:
str
- classmethod load_model_pack(model_pack_path)
Load the model pack from file.
- Parameters:
model_pack_path (str) – The model pack path.
- Raises:
ValueError – If the saved data does not represent a model pack.
- Returns:
CAT – The loaded model pack.
- Return type:
CAT
- get_model_card(as_dict: Literal[True]) medcat2.data.model_card.ModelCard
- get_model_card(as_dict: Literal[False]) str
Get the model card as either a (nested) dict or a JSON string.
- Parameters:
as_dict (bool) – Whether to return as dict. Defaults to False.
- Returns:
Union[str, ModelCard] – The model card.
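The two get_model_card signatures above follow the typing.overload pattern, where a Literal flag selects the return type. The sketch below shows that pattern on a free function; `render_card` and the example card contents are hypothetical and are not MedCAT's code or its model card fields.

```python
import json
from typing import Any, Literal, Union, overload


@overload
def render_card(card: dict[str, Any], as_dict: Literal[True]) -> dict[str, Any]: ...
@overload
def render_card(card: dict[str, Any], as_dict: Literal[False] = ...) -> str: ...


def render_card(card: dict[str, Any],
                as_dict: bool = False) -> Union[str, dict[str, Any]]:
    """Return the card as a dict copy or as a JSON string."""
    if as_dict:
        return dict(card)
    return json.dumps(card, indent=2, sort_keys=True)


# Placeholder card contents, chosen only for the demonstration.
demo_card = {"name": "demo", "version": 1}
card_as_dict = render_card(demo_card, True)
card_as_json = render_card(demo_card)
```

A static type checker can then infer `dict` for the first call and `str` for the second without a cast.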
- __eq__(other)
- Parameters:
other (Any)
- Return type:
bool
- add_addon(addon)
- Parameters:
- Return type:
None
- get_strategy()
- Return type:
- classmethod include_properties()
- Return type:
list[str]
- class medcat2.utils.legacy.conversion_all.CoreComponentType
Bases:
enum.Enum
Generic enumeration.
Derive from this class to define new enumerations.
- tagging
- token_normalizing
- ner
- linking
- __new__(value)
- _generate_next_value_(start, count, last_values)
Generate the next value when not given.
name: the name of the member
start: the initial start value or None
count: the number of existing members
last_value: the last value assigned or None
- classmethod _missing_(value)
- __repr__()
- __str__()
- __dir__()
Returns all members and all public methods
- __format__(format_spec)
Returns format using actual value type unless __str__ has been overridden.
- __hash__()
- __reduce_ex__(proto)
- name()
The name of the Enum member.
- value()
The value of the Enum member.
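As a reminder of how the inherited Enum behaviour applies to these four members, here is a stand-alone sketch. `ComponentType` is a hypothetical copy of the member names listed above; the member values are not shown on this page, so `auto()` is used purely as a placeholder.

```python
from enum import Enum, auto


class ComponentType(Enum):
    """Stand-in enum with the member names listed above (values assumed)."""

    tagging = auto()
    token_normalizing = auto()
    ner = auto()
    linking = auto()


# Iteration follows definition order; lookup by name uses square brackets.
names = [member.name for member in ComponentType]
by_name = ComponentType["ner"]
```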
- class medcat2.utils.legacy.conversion_all.AvailableSerialisers
Bases:
enum.Enum
Describes the available serialisers.
- dill
- write_to(file_path)
- Parameters:
file_path (str)
- Return type:
None
- classmethod from_file(file_path)
- Parameters:
file_path (str)
- Return type:
- __new__(value)
- _generate_next_value_(start, count, last_values)
Generate the next value when not given.
name: the name of the member
start: the initial start value or None
count: the number of existing members
last_value: the last value assigned or None
- classmethod _missing_(value)
- __repr__()
- __str__()
- __dir__()
Returns all members and all public methods
- __format__(format_spec)
Returns format using actual value type unless __str__ has been overridden.
- __hash__()
- __reduce_ex__(proto)
- name()
The name of the Enum member.
- value()
The value of the Enum member.
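The write_to / from_file pair on AvailableSerialisers suggests that the chosen serialiser is recorded in a file and read back later. The sketch below shows one way such a pair could work; `SerialiserKind` is a hypothetical stand-in, and storing the member name as plain text is an assumption, not the library's actual on-disk format.

```python
import os
import tempfile
from enum import Enum, auto


class SerialiserKind(Enum):
    """Stand-in enum; only the member name 'dill' comes from the page."""

    dill = auto()

    def write_to(self, file_path: str) -> None:
        # Assumed format: the member name as plain text.
        with open(file_path, "w", encoding="utf-8") as f:
            f.write(self.name)

    @classmethod
    def from_file(cls, file_path: str) -> "SerialiserKind":
        with open(file_path, encoding="utf-8") as f:
            return cls[f.read().strip()]


with tempfile.TemporaryDirectory() as tmp_dir:
    marker_path = os.path.join(tmp_dir, "serialiser.txt")
    SerialiserKind.dill.write_to(marker_path)
    restored = SerialiserKind.from_file(marker_path)
```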
- class medcat2.utils.legacy.conversion_all.NoActionLinker
Bases:
medcat2.components.types.AbstractCoreComponent
Base class for protocol classes.
Protocol classes are defined as:
    class Proto(Protocol):
        def meth(self) -> int:
            ...
Such classes are primarily used with static type checkers that recognize structural subtyping (static duck-typing), for example:
    class C:
        def meth(self) -> int:
            return 0

    def func(x: Proto) -> int:
        return x.meth()

    func(C())  # Passes static type check
See PEP 544 for details. Protocol classes decorated with @typing.runtime_checkable act as simple-minded runtime protocols that check only the presence of given attributes, ignoring their type signatures.
Protocol classes can be generic, they are defined as:
    class GenProto(Protocol[T]):
        def meth(self) -> T:
            ...
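The remark that runtime-checkable protocols only check for the presence of attributes can be seen in a short standard-library sketch. The class names are illustrative and unrelated to MedCAT.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasMeth(Protocol):
    def meth(self) -> int: ...


class Good:
    def meth(self) -> int:
        return 0


class WrongSignature:
    # Has an attribute called meth, but with a different signature.
    def meth(self, extra: str) -> str:
        return extra


class Missing:
    pass


# isinstance() looks only for an attribute named meth, not at its signature.
checks = (
    isinstance(Good(), HasMeth),
    isinstance(WrongSignature(), HasMeth),
    isinstance(Missing(), HasMeth),
)
```

`WrongSignature` passes the runtime check even though a static type checker would reject it, which is why the docstring calls these protocols simple-minded.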
- name = 'no_action'
The name of the component.
- get_type()
- __call__(doc)
- Parameters:
- Return type:
- classmethod get_init_args(tokenizer, cdb, vocab, model_load_path)
Get the init arguments for the component.
- Parameters:
tokenizer (BaseTokenizer) – The tokenizer.
cdb (CDB) – The CDB.
vocab (Vocab) – The Vocab.
model_load_path (Optional[str]) – The model load path (or None).
- Returns:
list[Any] – The list of init arguments.
- Return type:
list[Any]
- classmethod get_init_kwargs(tokenizer, cdb, vocab, model_load_path)
Get init keyword arguments for the component.
- Parameters:
tokenizer (BaseTokenizer) – The tokenizer.
cdb (CDB) – The CDB.
vocab (Vocab) – The Vocab.
model_load_path (Optional[str]) – The model load path (or None).
- Returns:
dict[str, Any] – The keyword arguments.
- Return type:
dict[str, Any]
- NAME_PREFIX = 'core_'
- property full_name: str
Name with the component type (e.g. ner, linking, meta).
- Return type:
str
- is_core()
Whether the component is a core component or not.
- Returns:
bool – Whether this is a core component.
- Return type:
bool
- __slots__ = ()
- _is_protocol = True
- _is_runtime_protocol = False
- classmethod __init_subclass__(*args, **kwargs)
- classmethod __class_getitem__(params)
- medcat2.utils.legacy.conversion_all.get_cdb_from_old(old_path)
Get the v2 CDB from a v1 CDB path.
- Parameters:
old_path (str) – The v1 CDB path.
- Returns:
CDB – The v2 CDB.
- Return type:
medcat2.cdb.CDB
- medcat2.utils.legacy.conversion_all.get_config_from_old(path)
Convert the saved v1 config into a v2 Config.
- Parameters:
path (str) – The v1 config path.
- Returns:
Config – The v2 config.
- Return type:
medcat2.config.config.Config
- medcat2.utils.legacy.conversion_all.get_vocab_from_old(old_path)
Convert a v1 vocab file to a v2 Vocab.
- Parameters:
old_path (str) – The v1 vocab file path.
- Returns:
Vocab – The v2 Vocab.
- Return type:
medcat2.vocab.Vocab
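The three conversion functions above could be combined by hand roughly as follows. This is a sketch under assumptions: the file names are taken from the cdb_name, vocab_name and config_name attributes of Converter, `convert_parts` is a hypothetical helper, and whether this mirrors what Converter.convert does internally is not confirmed by this page.

```python
import os


def convert_parts(v1_folder: str):
    """Convert the CDB, config and vocab of an unpacked v1 model folder."""
    # Imported lazily so this sketch can be loaded without medcat2 installed.
    from medcat2.utils.legacy.conversion_all import (
        CAT,
        get_cdb_from_old,
        get_config_from_old,
        get_vocab_from_old,
    )

    # Assumed layout: file names as listed on the Converter class.
    cdb = get_cdb_from_old(os.path.join(v1_folder, "cdb.dat"))
    config = get_config_from_old(os.path.join(v1_folder, "config.json"))
    vocab = get_vocab_from_old(os.path.join(v1_folder, "vocab.dat"))
    # CAT(cdb, vocab=None, config=None, model_load_path=None) as documented.
    return CAT(cdb, vocab, config)
```

For whole model packs, the Converter class is the documented entry point.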
- medcat2.utils.legacy.conversion_all.get_meta_cat_from_old(old_path, tokenizer)
Convert a v1 MetaCAT folder to a v2 MetaCAT.
- Parameters:
old_path (str) – The v1 MetaCAT folder path.
tokenizer (BaseTokenizer) – The tokenizer.
- Returns:
MetaCATAddon – The v2 MetaCAT.
- Return type:
- medcat2.utils.legacy.conversion_all.get_trf_ner_from_old(old_path, tokenizer)
- Parameters:
old_path (str)
tokenizer (medcat2.tokenizing.tokenizers.BaseTokenizer)
- Return type:
- medcat2.utils.legacy.conversion_all.logger
- class medcat2.utils.legacy.conversion_all.Converter(medcat1_model_pack_path, new_model_pack_path, ser_type=AvailableSerialisers.dill)
Converts v1 models to v2 models.
- Parameters:
medcat1_model_pack_path (str)
new_model_pack_path (Optional[str])
- cdb_name = 'cdb.dat'
- vocab_name = 'vocab.dat'
- config_name = 'config.json'
- __init__(medcat1_model_pack_path, new_model_pack_path, ser_type=AvailableSerialisers.dill)
- Parameters:
medcat1_model_pack_path (str)
new_model_pack_path (Optional[str])
- old_model_folder
- new_model_folder
- ser_type
- property expected_files_in_folder
The base names of the required files in a folder for a v1 model.
- _validate()
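A check in the spirit of expected_files_in_folder and _validate could be sketched like this. The helper `missing_v1_files` is hypothetical; the file names come from the class attributes above, while the exact checks _validate performs and whether every file is mandatory are not documented here.

```python
import os
import tempfile

# Base names taken from cdb_name, vocab_name and config_name above.
EXPECTED_FILES = ("cdb.dat", "vocab.dat", "config.json")


def missing_v1_files(folder: str, expected=EXPECTED_FILES) -> list[str]:
    """Return the expected base names that are absent from the folder."""
    present = set(os.listdir(folder))
    return [name for name in expected if name not in present]


with tempfile.TemporaryDirectory() as tmp_dir:
    # Create two of the three files to demonstrate the check.
    for name in ("cdb.dat", "config.json"):
        open(os.path.join(tmp_dir, name), "wb").close()
    missing = missing_v1_files(tmp_dir)
```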
- convert()
Use the gathered information to convert to a v2 model.
This converts the CDB, Vocab, and Config, in that order, and then creates the model pack.
If self.new_model_folder is set, the model will be saved as well.
- Returns:
CAT – The model pack.
- Return type:
CAT
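Typical use of Converter, based only on the constructor and convert() signatures shown above, might look like the sketch below. `convert_v1_model_pack` is a hypothetical wrapper, and the form the v1 path must take (folder or archive) is not stated on this page.

```python
from typing import Optional


def convert_v1_model_pack(v1_pack_path: str, v2_target: Optional[str] = None):
    """Convert a v1 model pack and return the resulting v2 model pack object."""
    # Imported lazily so this sketch can be loaded without medcat2 installed.
    from medcat2.utils.legacy.conversion_all import Converter

    # ser_type is left at its documented default (AvailableSerialisers.dill).
    converter = Converter(v1_pack_path, v2_target)
    # Per the description above, the model is also saved when a new model
    # folder is set.
    return converter.convert()
```

Passing `None` as the second argument should then give an in-memory conversion only, going by the "If self.new_model_folder is set" wording.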
- medcat2.utils.legacy.conversion_all.unpack(model_zip_path, target_folder)
Unpack v1 model into target folder.
- Parameters:
model_zip_path (str) – ZIP path.
target_folder (str) – Target folder.
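What unpacking a ZIP into a target folder involves can be sketched with the standard library alone. `unpack_zip` is a hypothetical stand-in and not the implementation of unpack; the archive built here contains a single placeholder file.

```python
import os
import tempfile
import zipfile


def unpack_zip(model_zip_path: str, target_folder: str) -> None:
    """Extract every member of the ZIP archive into target_folder."""
    os.makedirs(target_folder, exist_ok=True)
    with zipfile.ZipFile(model_zip_path) as archive:
        archive.extractall(target_folder)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a tiny placeholder archive to unpack.
    zip_path = os.path.join(tmp_dir, "v1_model.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("config.json", "{}")
    out_dir = os.path.join(tmp_dir, "unpacked")
    unpack_zip(zip_path, out_dir)
    unpacked_names = sorted(os.listdir(out_dir))
```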