medcat2.utils.legacy.conversion_all

Attributes

logger

Classes

CAT

This is a collection of serialisable model parts.

CoreComponentType

Generic enumeration.

AvailableSerialisers

Describes the available serialisers.

NoActionLinker

Base class for protocol classes.

Converter

Converts v1 models to v2 models.

Functions

get_cdb_from_old(old_path)

Get the v2 CDB from a v1 CDB path.

get_config_from_old(path)

Convert the saved v1 config into a v2 Config.

get_vocab_from_old(old_path)

Convert a v1 vocab file to a v2 Vocab.

get_meta_cat_from_old(old_path, tokenizer)

Convert a v1 MetaCAT folder to a v2 MetaCAT.

get_trf_ner_from_old(old_path, tokenizer)

unpack(model_zip_path, target_folder)

Unpack v1 model into target folder.

Module Contents

class medcat2.utils.legacy.conversion_all.CAT(cdb, vocab=None, config=None, model_load_path=None)

Bases: medcat2.storage.serialisables.AbstractSerialisable

This is a collection of serialisable model parts.

__init__(cdb, vocab=None, config=None, model_load_path=None)
Return type:

None

cdb
vocab = None
config = None
_trainer: medcat2.trainer.Trainer | None = None
_pipeline
_recrate_pipe(model_load_path=None)
Parameters:

model_load_path (Optional[str])

Return type:

medcat2.pipeline.pipeline.Pipeline

classmethod get_init_attrs()
Return type:

list[str]

classmethod ignore_attrs()
Return type:

list[str]

__call__(text)
Parameters:

text (str)

Return type:

Optional[medcat2.tokenizing.tokens.MutableDocument]

_ensure_not_training()

Method to ensure config is not set to train.

config.components.linking.train should only be True while training, not during inference. This also corrects the setting if necessary.

Return type:

None
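The check described above can be illustrated with a minimal sketch. The dataclasses below are hypothetical stand-ins for the real config classes; only the attribute path config.components.linking.train comes from this reference, and the library's actual implementation may differ.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the real config classes. Only the attribute
# path config.components.linking.train is taken from the documentation.
@dataclass
class Linking:
    train: bool = False


@dataclass
class Components:
    linking: Linking = field(default_factory=Linking)


@dataclass
class Config:
    components: Components = field(default_factory=Components)


def ensure_not_training(config: Config) -> bool:
    """Reset the linking train flag; return True if it had to be changed."""
    if config.components.linking.train:
        config.components.linking.train = False
        return True
    return False


config = Config()
config.components.linking.train = True  # e.g. left over from a training run
changed = ensure_not_training(config)
```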

get_entities(text, only_cui=False)

Get the entities recognised and linked within the provided text.

This will run the text through the pipeline and annotate the recognised and linked entities.

Parameters:
  • text (str) – The text to use.

  • only_cui (bool, optional) – Whether to only output the CUIs rather than the entire context. Defaults to False.

Returns:

Union[dict, Entities, OnlyCUIEntities] – The entities found and linked within the text.

Return type:

Union[dict, medcat2.data.entities.Entities, medcat2.data.entities.OnlyCUIEntities]
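As a rough usage sketch, get_entities can be applied to several texts in a loop. The helper name annotate_texts and the stub class are hypothetical and exist only so the example runs without a model pack; a real CAT instance would return Entities or OnlyCUIEntities rather than the echo dictionary used here.

```python
def annotate_texts(cat, texts, only_cui=False):
    """Run each text through cat.get_entities and collect the results."""
    return [cat.get_entities(text, only_cui=only_cui) for text in texts]


class _StubCAT:
    """Hypothetical stand-in so the sketch runs without a model pack."""

    def get_entities(self, text, only_cui=False):
        # A real CAT returns Entities / OnlyCUIEntities; this only echoes inputs.
        return {"text": text, "only_cui": only_cui}


results = annotate_texts(_StubCAT(), ["first note", "second note"], only_cui=True)
```

With a loaded model, the same helper would be called as annotate_texts(cat, texts).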

_get_entity(ent, doc_tokens, cui)
Return type:

medcat2.data.entities.Entity

_doc_to_out_entity(ent, doc_tokens, only_cui)
Return type:

tuple[int, Union[medcat2.data.entities.Entity, str]]

_doc_to_out(doc, only_cui, out_with_text=False)
Return type:

Union[medcat2.data.entities.Entities, medcat2.data.entities.OnlyCUIEntities]

property trainer

The trainer object.

save_model_pack(target_folder, pack_name=DEFAULT_PACK_NAME, serialiser_type='dill', make_archive=True)

Save model pack.

Parameters:
  • target_folder (str) – The folder to save the pack in.

  • pack_name (str, optional) – The model pack name. Defaults to DEFAULT_PACK_NAME.

  • serialiser_type (Union[str, AvailableSerialisers], optional) – The serialiser type. Defaults to 'dill'.

  • make_archive (bool) – Whether to make the archive (.zip file). Defaults to True.

Returns:

str – The final model pack path.

Return type:

str

classmethod load_model_pack(model_pack_path)

Load the model pack from file.

Parameters:

model_pack_path (str) – The model pack path.

Raises:

ValueError – If the saved data does not represent a model pack.

Returns:

CAT – The loaded model pack.

Return type:

CAT
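A save and reload round trip might look like the sketch below. The helper name save_and_reload and the pack name are hypothetical; the sketch assumes that the path returned by save_model_pack can be passed straight to load_model_pack, which this reference suggests but does not state outright.

```python
def save_and_reload(cat, target_folder, pack_name="example_pack"):
    """Save a model pack and load it back from the returned path."""
    pack_path = cat.save_model_pack(target_folder, pack_name=pack_name)
    # load_model_pack is a classmethod, so call it on the instance's class.
    return type(cat).load_model_pack(pack_path)
```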

__eq__(other)
Parameters:

other (Any)

Return type:

bool

add_addon(addon)
Parameters:

addon (medcat2.components.addons.addons.AddonComponent)

Return type:

None

get_strategy()
Return type:

SerialisingStrategy

classmethod include_properties()
Return type:

list[str]

class medcat2.utils.legacy.conversion_all.CoreComponentType

Bases: enum.Enum

Generic enumeration.

Derive from this class to define new enumerations.

tagging
token_normalizing
ner
linking
__new__(value)
_generate_next_value_(start, count, last_values)

Generate the next value when not given.

name: the name of the member
start: the initial start value or None
count: the number of existing members
last_value: the last value assigned or None

classmethod _missing_(value)
__repr__()
__str__()
__dir__()

Returns all members and all public methods

__format__(format_spec)

Returns format using actual value type unless __str__ has been overridden.

__hash__()
__reduce_ex__(proto)
name()

The name of the Enum member.

value()

The value of the Enum member.

class medcat2.utils.legacy.conversion_all.AvailableSerialisers

Bases: enum.Enum

Describes the available serialisers.

dill
write_to(file_path)
Parameters:

file_path (str)

Return type:

None

classmethod from_file(file_path)
Parameters:

file_path (str)

Return type:

AvailableSerialisers
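The write_to and from_file pair suggests that the chosen serialiser is recorded in a file and recovered later. The sketch below shows one way an enum could do that; SerialiserKind and the marker file name are hypothetical, and the on-disk format used by AvailableSerialisers is not documented here and may differ.

```python
import os
import tempfile
from enum import Enum, auto


class SerialiserKind(Enum):
    """Hypothetical enum that persists its member name to a marker file."""

    dill = auto()

    def write_to(self, file_path: str) -> None:
        # Store the member name as plain text.
        with open(file_path, "w", encoding="utf-8") as f:
            f.write(self.name)

    @classmethod
    def from_file(cls, file_path: str) -> "SerialiserKind":
        # Look the member up by the name stored in the file.
        with open(file_path, encoding="utf-8") as f:
            return cls[f.read().strip()]


with tempfile.TemporaryDirectory() as tmp:
    marker = os.path.join(tmp, "serialiser.txt")
    SerialiserKind.dill.write_to(marker)
    restored = SerialiserKind.from_file(marker)
```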

__new__(value)
_generate_next_value_(start, count, last_values)

Generate the next value when not given.

name: the name of the member
start: the initial start value or None
count: the number of existing members
last_value: the last value assigned or None

classmethod _missing_(value)
__repr__()
__str__()
__dir__()

Returns all members and all public methods

__format__(format_spec)

Returns format using actual value type unless __str__ has been overridden.

__hash__()
__reduce_ex__(proto)
name()

The name of the Enum member.

value()

The value of the Enum member.

class medcat2.utils.legacy.conversion_all.NoActionLinker

Bases: medcat2.components.types.AbstractCoreComponent

Base class for protocol classes.

Protocol classes are defined as:

class Proto(Protocol):
    def meth(self) -> int:
        ...

Such classes are primarily used with static type checkers that recognize structural subtyping (static duck-typing), for example:

class C:
    def meth(self) -> int:
        return 0

def func(x: Proto) -> int:
    return x.meth()

func(C())  # Passes static type check

See PEP 544 for details. Protocol classes decorated with @typing.runtime_checkable act as simple-minded runtime protocols that check only the presence of given attributes, ignoring their type signatures. Protocol classes can be generic, they are defined as:

class GenProto(Protocol[T]):
    def meth(self) -> T:
        ...

name = 'no_action'

The name of the component.

get_type()
__call__(doc)
Parameters:

doc (medcat2.tokenizing.tokens.MutableDocument)

Return type:

medcat2.tokenizing.tokens.MutableDocument
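Judging by its name and the __call__ signature, this component presumably passes the document through unchanged. That behaviour is an assumption, not something this reference confirms. A minimal sketch of such a pass-through component, using a hypothetical class and a plain dictionary in place of a MutableDocument:

```python
class NoActionComponent:
    """Hypothetical pass-through component; not the library class."""

    name = "no_action"

    def __call__(self, doc):
        # Assumed behaviour: hand the document back untouched.
        return doc


doc = {"text": "example"}  # stand-in for a MutableDocument
result = NoActionComponent()(doc)
```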

classmethod get_init_args(tokenizer, cdb, vocab, model_load_path)

Get the init arguments for the component.

Parameters:
  • tokenizer (BaseTokenizer) – The tokenizer.

  • cdb (CDB) – The CDB.

  • vocab (Vocab) – The Vocab.

  • model_load_path (Optional[str]) – The model load path (or None).

Returns:

list[Any] – The list of init arguments.

Return type:

list[Any]

classmethod get_init_kwargs(tokenizer, cdb, vocab, model_load_path)

Get init keyword arguments for the component.

Parameters:
  • tokenizer (BaseTokenizer) – The tokenizer.

  • cdb (CDB) – The CDB.

  • vocab (Vocab) – The Vocab.

  • model_load_path (Optional[str]) – The model load path (or None).

Returns:

dict[str, Any] – The keyword arguments.

Return type:

dict[str, Any]

NAME_PREFIX = 'core_'
property full_name: str

Name with the component type (e.g. ner, linking, meta).

Return type:

str

is_core()

Whether the component is a core component or not.

Returns:

bool – Whether this is a core component.

Return type:

bool

__slots__ = ()
_is_protocol = True
_is_runtime_protocol = False
classmethod __init_subclass__(*args, **kwargs)
classmethod __class_getitem__(params)
medcat2.utils.legacy.conversion_all.get_cdb_from_old(old_path)

Get the v2 CDB from a v1 CDB path.

Parameters:

old_path (str) – The v1 CDB path.

Returns:

CDB – The v2 CDB.

Return type:

medcat2.cdb.CDB

medcat2.utils.legacy.conversion_all.get_config_from_old(path)

Convert the saved v1 config into a v2 Config.

Parameters:

path (str) – The v1 config path.

Returns:

Config – The v2 config.

Return type:

medcat2.config.Config

medcat2.utils.legacy.conversion_all.get_vocab_from_old(old_path)

Convert a v1 vocab file to a v2 Vocab.

Parameters:

old_path (str) – The v1 vocab file path.

Returns:

Vocab – The v2 Vocab.

Return type:

medcat2.vocab.Vocab
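The three conversion functions above could be combined by hand, roughly as in the following sketch. The helper name convert_parts is hypothetical. Passing the converted parts directly to the CAT constructor is an assumption based on the signatures listed in this reference; Converter.convert may perform additional steps.

```python
def convert_parts(cdb_path, vocab_path, config_path):
    """Convert v1 parts one at a time and assemble a v2 CAT."""
    # Lazy import: keeps the sketch definable without medcat2 installed.
    from medcat2.utils.legacy.conversion_all import (
        CAT,
        get_cdb_from_old,
        get_config_from_old,
        get_vocab_from_old,
    )

    cdb = get_cdb_from_old(cdb_path)
    vocab = get_vocab_from_old(vocab_path)
    config = get_config_from_old(config_path)
    # Assumption: the converted parts can be handed to CAT as-is.
    return CAT(cdb, vocab=vocab, config=config)
```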

medcat2.utils.legacy.conversion_all.get_meta_cat_from_old(old_path, tokenizer)

Convert a v1 MetaCAT folder to a v2 MetaCAT.

Parameters:
  • old_path (str) – The v1 MetaCAT folder path.

  • tokenizer (BaseTokenizer) – The tokenizer.

Returns:

MetaCATAddon – The v2 MetaCAT.

Return type:

medcat2.components.addons.meta_cat.meta_cat.MetaCATAddon

medcat2.utils.legacy.conversion_all.get_trf_ner_from_old(old_path, tokenizer)
Return type:

medcat2.components.ner.trf.transformers_ner.TransformersNER

medcat2.utils.legacy.conversion_all.logger
class medcat2.utils.legacy.conversion_all.Converter(medcat1_model_pack_path, new_model_pack_path, ser_type=AvailableSerialisers.dill)

Converts v1 models to v2 models.

cdb_name = 'cdb.dat'
vocab_name = 'vocab.dat'
config_name = 'config.json'
__init__(medcat1_model_pack_path, new_model_pack_path, ser_type=AvailableSerialisers.dill)
old_model_folder
new_model_folder
ser_type
property expected_files_in_folder

The base names of the required files in a folder for a v1 model.
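A check for the required files could be sketched as follows. The file names come from the cdb_name, vocab_name and config_name class attributes; the helper missing_v1_files is hypothetical and is not the library's _validate implementation.

```python
import os
import tempfile

# File names taken from the Converter class attributes.
EXPECTED_FILES = ("cdb.dat", "vocab.dat", "config.json")


def missing_v1_files(folder: str) -> list[str]:
    """Return the expected v1 file names that are absent from folder."""
    present = set(os.listdir(folder))
    return [name for name in EXPECTED_FILES if name not in present]


with tempfile.TemporaryDirectory() as tmp:
    # Create two of the three expected files, leaving vocab.dat out.
    for name in ("cdb.dat", "config.json"):
        open(os.path.join(tmp, name), "wb").close()
    missing = missing_v1_files(tmp)
```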

_validate()
convert()

Use the gathered information to convert to a v2 model.

This converts the CDB, Vocab, and Config, in that order, and then creates the model pack.

If self.new_model_folder is set, the model will be saved as well.

Returns:

CAT – The model pack.

Return type:

medcat2.cat.CAT
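A typical conversion might look like the sketch below. The helper name convert_legacy_pack is hypothetical. Allowing None for the new model pack path is an assumption drawn from the note that the model is only saved when self.new_model_folder is set.

```python
def convert_legacy_pack(old_pack_path, new_pack_folder=None):
    """Convert a v1 model pack and return the resulting v2 CAT."""
    # Imported lazily so this sketch can be defined without medcat2 installed.
    from medcat2.utils.legacy.conversion_all import Converter

    # Assumption: a new_pack_folder of None skips saving.
    converter = Converter(old_pack_path, new_pack_folder)
    return converter.convert()
```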

medcat2.utils.legacy.conversion_all.unpack(model_zip_path, target_folder)

Unpack v1 model into target folder.

Parameters:
  • model_zip_path (str) – ZIP path.

  • target_folder (str) – Target folder.
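For orientation, unpacking a ZIP archive into a folder can be done with the standard library alone. The sketch below uses a hypothetical unpack_zip helper built on zipfile; it illustrates the general technique and is not necessarily how unpack is implemented.

```python
import os
import tempfile
import zipfile


def unpack_zip(model_zip_path: str, target_folder: str) -> None:
    """Extract every member of the ZIP archive into target_folder."""
    os.makedirs(target_folder, exist_ok=True)
    with zipfile.ZipFile(model_zip_path) as archive:
        archive.extractall(target_folder)


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny archive to unpack.
    zip_path = os.path.join(tmp, "model.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("config.json", "{}")
    out_dir = os.path.join(tmp, "unpacked")
    unpack_zip(zip_path, out_dir)
    extracted = sorted(os.listdir(out_dir))
```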