medcat.utils.legacy.convert_meta_cat

Attributes

logger

Classes

`MetaCAT`	The MetaCAT class used for training 'Meta-Annotation' models.
`MetaCATAddon`	Base/abstract addon component class.
`TokenizerWrapperBase`	Helper class that provides a standard way to create an ABC using inheritance.
`ConfigMetaCAT`	The MetaCAT part of the config
`BaseTokenizer`	The base tokenizer protocol.

Functions

`load_tokenizer`(config, tokenizer_folder)
`fix_old_style_cnf`(data[, remove, take_from])
`_load_legacy`(config, save_dir_path)
`load_cnf`(cnf_path)
`get_meta_cat_from_old`(old_path, tokenizer)	Convert a v1 MetaCAT folder to a v2 MetaCAT.

Module Contents

class medcat.utils.legacy.convert_meta_cat.MetaCAT(tokenizer=None, embeddings=None, config=None, _model_state_dict=None)

Bases: medcat.storage.serialisables.AbstractSerialisable

The MetaCAT class used for training ‘Meta-Annotation’ models, i.e. annotations of clinical concept annotations. These are also known as properties or attributes of recognised entities in similar tools such as MetaMap and cTAKES.

This is a flexible, model-agnostic class that can learn any meta-annotation task, i.e. any multi-class classification task for recognised terms.

Parameters:

tokenizer (TokenizerWrapperBase) – The Hugging Face tokenizer instance. This can be a pre-trained tokenizer instance from a BERT-style model, or one trained from scratch for the Bi-LSTM (with attention) model that is currently used in most deployments.
embeddings (Tensor, numpy.ndarray) – Embedding matrix mapping each (sub)word input id to an n-dimensional (sub)word embedding.
config (ConfigMetaCAT) – the configuration for MetaCAT. Param descriptions available in ConfigMetaCAT docs.
_model_state_dict (Optional[dict[str, Any]])

name = 'meta_cat'

_component_lock

classmethod get_init_attrs()

Return type:: list[str]

classmethod ignore_attrs()

Return type:: list[str]

classmethod include_properties()

Return type:: list[str]

property _model_state_dict

__init__(tokenizer=None, embeddings=None, config=None, _model_state_dict=None)

Parameters:

tokenizer (Optional[medcat.components.addons.meta_cat.mctokenizers.tokenizers.TokenizerWrapperBase])
embeddings (Optional[Union[torch.Tensor, numpy.ndarray]])
config (Optional[medcat.config.config_meta_cat.ConfigMetaCAT])
_model_state_dict (Optional[dict[str, Any]])

Return type:

None

config = None

tokenizer = None

embeddings

model

_reset_tokenizer_info()

get_model(embeddings)

Get the model

Parameters:: embeddings (Optional[Tensor]) – The embedding tensor
Raises:: ValueError – If the meta model is not LSTM or BERT
Returns:: nn.Module – The module
Return type:: torch.nn.Module

get_hash()

A partial hash trying to catch differences between models.

Returns:: str – The hex hash.
Return type:: str
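
As a rough illustration of what a "partial hash" can look like, the sketch below hashes a JSON dump of a configuration dict with SHA-256. The function name `partial_hash`, the choice of inputs, and the example keys are assumptions made for this sketch; the inputs MetaCAT actually hashes are not described here.

```python
import hashlib
import json
from typing import Any


def partial_hash(config: dict[str, Any]) -> str:
    """Hash a JSON-serialisable config dict into a hex digest (illustrative only)."""
    # sort_keys makes the dump independent of key insertion order.
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical config fragments: same content in a different order, then a real change.
cfg_a = {"model": {"model_name": "lstm"}, "general": {"category_name": "Status"}}
cfg_b = {"general": {"category_name": "Status"}, "model": {"model_name": "lstm"}}
cfg_c = {"model": {"model_name": "bert"}, "general": {"category_name": "Status"}}
```

A hash of this kind detects differences only in the parts that are fed into it, which is why it is "partial": two models can share a hash and still differ elsewhere.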

train_from_json(json_path, save_dir_path=None, data_oversampled=None, overwrite=False)

Train or continue training a model given a json_path containing a MedCATtrainer export. It will continue training if an existing model is loaded or start new training if the model is blank/new.

Parameters:

json_path (Union[str, list]) – Path/Paths to a MedCATtrainer export containing the meta_annotations we want to train for.
save_dir_path (Optional[str]) – If auto_save_model is enabled (meaning the best model is saved during training), a save path must be set. Defaults to None.
data_oversampled (Optional[list]) – If oversampling is performed, the oversampled data is passed in this parameter so that the model is trained on original + synthetic data.
overwrite (bool) – Whether to allow overwriting the file if/when appropriate.

Returns:

dict – The resulting report.

Return type:

dict

train_raw(data_loaded, save_dir_path=None, data_oversampled=None, overwrite=False)

Train or continue training a model given raw data. It will continue training if an existing model is loaded or start new training if the model is blank/new.

The raw data is expected in the following format:

    {
        'projects': [  # list of projects
            {
                'name': '<project_name>',
                'documents': [  # list of documents
                    {
                        'name': '<document_name>',
                        'text': '<text_of_document>',
                        'annotations': [  # list of annotations
                            {
                                'start': -1,  # start index of the annotation
                                'end': 1,  # end index of the annotation
                                'cui': 'cui',
                                'value': '<annotation_value>'
                            }
                        ],
                    }
                ]
            }
        ]
    }

Parameters:

data_loaded (dict) – The raw data we want to train for.
save_dir_path (Optional[str]) – If auto_save_model is enabled (meaning the best model is saved during training), a save path must be set. Defaults to None.
data_oversampled (Optional[list]) – If oversampling is performed, the oversampled data is passed in this parameter so that the model is trained on original + synthetic data. The expected format is:

    [
        [['text', 'of', 'the', 'document'], [index of medical entity], "label"],
        [['text', 'of', 'the', 'document'], [index of medical entity], "label"]
    ]

overwrite (bool) – Whether to allow overwriting the file if/when appropriate.

Returns:

dict – The resulting report.

Raises:

Exception – If no save path is specified, or category name not in data.
AssertionError – If no tokeniser is set
FileNotFoundError – If phase_number is set to 2 and model.dat file is not found
KeyError – If phase_number is set to 2 and model.dat file contains mismatched architecture

Return type:

dict
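
To make the nested layout described for train_raw concrete, here is a minimal sketch that builds one project with one document and one annotation, then walks the structure to check the spans. The helper names `make_raw_data` and `check_raw_data`, the sample text, and the placeholder CUI are all made up for illustration; only the keys listed in the format description are used.

```python
from typing import Any


def make_raw_data() -> dict[str, Any]:
    """Build a tiny data set in the layout described for train_raw."""
    text = "Patient denies chest pain."
    start = text.index("chest pain")
    return {
        "projects": [
            {
                "name": "demo_project",
                "documents": [
                    {
                        "name": "doc_1",
                        "text": text,
                        "annotations": [
                            {
                                "start": start,
                                "end": start + len("chest pain"),
                                "cui": "C0000001",  # placeholder, not a real concept id
                                "value": "chest pain",
                            }
                        ],
                    }
                ],
            }
        ]
    }


def check_raw_data(data: dict[str, Any]) -> int:
    """Check that every span lies inside its document; return the annotation count."""
    count = 0
    for project in data["projects"]:
        for doc in project["documents"]:
            for ann in doc["annotations"]:
                if not 0 <= ann["start"] < ann["end"] <= len(doc["text"]):
                    raise ValueError(f"bad span in {doc['name']}: {ann}")
                count += 1
    return count


raw = make_raw_data()
first_doc = raw["projects"][0]["documents"][0]
first_ann = first_doc["annotations"][0]
```

A check like this is cheap to run before training and catches off-by-one span errors early; a real MedCATtrainer export may carry additional keys that this sketch does not model.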

eval(json_path)

Evaluate from json.

Parameters:

json_path (str) – The json file path

Returns:

dict – The resulting model dict

Raises:

AssertionError – If self.tokenizer is not set
Exception – If the category name does not exist

Return type:

dict

get_ents(doc)

Parameters:: doc (medcat.tokenizing.tokens.MutableDocument)
Return type:: Iterable[medcat.tokenizing.tokens.MutableEntity]

prepare_document(doc, input_ids, offset_mapping, lowercase)

Prepares document.

Parameters:

doc (Doc) – The document
input_ids (list) – Input ids
offset_mapping (list) – Offset mappings
lowercase (bool) – Whether to lowercase the replace_center value

Returns:

tuple[dict, list] – Entity id to index mapping and Samples

Return type:

tuple[dict, list]

static batch_generator(stream, batch_size_chars)

Generator for batch of documents.

Parameters:

stream (Iterable[MutableDocument]) – The document stream
batch_size_chars (int) – Number of characters per batch

Yields:

list[MutableDocument] – The batch of documents.

Return type:

Iterable[list[medcat.tokenizing.tokens.MutableDocument]]
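
The idea of batching by a character budget can be sketched as follows. This is not the MedCAT implementation: plain strings stand in for MutableDocument objects, the function name `batch_by_chars` is hypothetical, and the exact boundary behaviour of the real method may differ.

```python
from typing import Iterable, Iterator


def batch_by_chars(stream: Iterable[str], batch_size_chars: int) -> Iterator[list[str]]:
    """Yield lists of texts whose combined length stays within a character budget."""
    batch: list[str] = []
    chars_in_batch = 0
    for text in stream:
        # Close the current batch if adding this text would exceed the budget.
        if batch and chars_in_batch + len(text) > batch_size_chars:
            yield batch
            batch, chars_in_batch = [], 0
        batch.append(text)
        chars_in_batch += len(text)
    if batch:
        yield batch  # flush whatever is left at the end of the stream


batches = list(batch_by_chars(["aaaa", "bb", "cccccc", "d"], batch_size_chars=6))
```

In this sketch a text longer than the budget still ends up in a batch of its own rather than being dropped or split.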

_set_meta_anns(doc, id2category_value)

Parameters:

doc (medcat.tokenizing.tokens.MutableDocument)
id2category_value (dict)

Return type:

medcat.tokenizing.tokens.MutableDocument

__call__(doc)

Process one document, used in the spacy pipeline for sequential document processing.

Parameters:: doc (Doc) – A spacy document
Returns:: Doc – The same spacy document.
Return type:: medcat.tokenizing.tokens.MutableDocument

get_model_card(as_dict=False)

A minimal model card.

Parameters:: as_dict (bool) – Return the model card as a dictionary instead of a str. Defaults to False.
Returns:: Union[str, dict] – An indented JSON string, or the same content as a dict if as_dict is True.
Return type:: Union[str, dict]
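
The as_dict switch follows a common pattern: build the card once as a dict and serialise it only when a string is wanted. A minimal sketch, assuming a hypothetical `model_card` helper and made-up card fields (the real card contents are not listed here):

```python
import json
from typing import Any, Union


def model_card(info: dict[str, Any], as_dict: bool = False) -> Union[str, dict[str, Any]]:
    """Return the card as a dict, or as an indented JSON string by default."""
    if as_dict:
        return info
    return json.dumps(info, indent=2, sort_keys=True)


# Hypothetical card content, for illustration only.
card_info = {"Category Name": "Status", "Classes": {"Affirmed": 0, "Other": 1}}
card_text = model_card(card_info)
```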

__repr__()

Prints the model_card for this MetaCAT instance.

Returns:

The ‘Model Card’ for this MetaCAT instance. This includes NER+L config and any MetaCATs.

get_strategy()

Return type:: SerialisingStrategy

__eq__(other)

Parameters:: other (Any)
Return type:: bool

class medcat.utils.legacy.convert_meta_cat.MetaCATAddon(config, base_tokenizer, meta_cat)

Bases: medcat.components.addons.addons.AddonComponent

Base/abstract addon component class.

Parameters:

config (medcat.config.config_meta_cat.ConfigMetaCAT)
base_tokenizer (medcat.tokenizing.tokenizers.BaseTokenizer)
meta_cat (Optional[MetaCAT])

addon_type = 'meta_cat'

output_key = 'meta_anns'

config: medcat.config.config_meta_cat.ConfigMetaCAT

__init__(config, base_tokenizer, meta_cat)

Parameters:

config (medcat.config.config_meta_cat.ConfigMetaCAT)
base_tokenizer (medcat.tokenizing.tokenizers.BaseTokenizer)
meta_cat (Optional[MetaCAT])

Return type:

None

base_tokenizer

_mc

_name

property mc: MetaCAT

Return type:: MetaCAT

classmethod create_new(config, base_tokenizer, tknzer_preprocessor=None)

Factory method to create a new MetaCATAddon instance.

Parameters:

config (medcat.config.config_meta_cat.ConfigMetaCAT)
base_tokenizer (medcat.tokenizing.tokenizers.BaseTokenizer)
tknzer_preprocessor (TokenizerPreprocessor)

Return type:

MetaCATAddon

classmethod create_new_component(cnf, tokenizer, cdb, vocab, model_load_path)

Create a new component or load one off disk if load path presented.

This may raise an exception if the wrong type of config is provided.

Parameters:

cnf (ComponentConfig) – The config relevant to this component.
tokenizer (BaseTokenizer) – The base tokenizer.
cdb (CDB) – The CDB.
vocab (Vocab) – The Vocab.
model_load_path (Optional[str]) – Model load path (if present).

Returns:

Self – The new component.

Return type:

MetaCATAddon

classmethod load_existing(cnf, base_tokenizer, load_path)

Factory method to load an existing MetaCATAddon from disk.

Parameters:

cnf (medcat.config.config_meta_cat.ConfigMetaCAT)
base_tokenizer (medcat.tokenizing.tokenizers.BaseTokenizer)
load_path (str)

Return type:

MetaCATAddon

property name: str

The name of the component.

Return type:: str

__call__(doc)

Parameters:: doc (medcat.tokenizing.tokens.MutableDocument)
Return type:: medcat.tokenizing.tokens.MutableDocument

load(folder_path)

Parameters:: folder_path (str)
Return type:: MetaCAT

classmethod _load_tokenizer(config, tokenizer_folder)

Parameters:

config (medcat.config.config_meta_cat.ConfigMetaCAT)
tokenizer_folder (str)

Return type:

Optional[medcat.components.addons.meta_cat.mctokenizers.tokenizers.TokenizerWrapperBase]

classmethod _get_meta_cat_and_tokenizer_paths(folder_path)

Parameters:: folder_path (str)
Return type:: tuple[str, str]

save(folder_path)

Parameters:: folder_path (str)
Return type:: None

_init_data_paths()

property include_in_output: bool

Return type:: bool

get_output_key_val(ent)

Parameters:: ent (medcat.tokenizing.tokens.MutableEntity)
Return type:: tuple[str, dict[str, MetaAnnotationValue]]

serialise_to(folder_path)

Parameters:: folder_path (str)
Return type:: None

classmethod deserialise_from(folder_path, **init_kwargs)

Parameters:: folder_path (str)
Return type:: MetaCATAddon

get_strategy()

Return type:: medcat.storage.serialisables.SerialisingStrategy

classmethod get_init_attrs()

Return type:: list[str]

classmethod ignore_attrs()

Return type:: list[str]

classmethod include_properties()

Return type:: list[str]

get_hash()

Return type:: str

NAME_PREFIX: str = 'addon_'

NAME_SPLITTER: str = '.'

is_core()

Whether the component is a core component or not.

Returns:: bool – Whether this is a core component.
Return type:: bool

classmethod get_folder_name_for_addon_and_name(addon_type, name)

Parameters:

addon_type (str)
name (str)

Return type:

str

get_folder_name()

Return type:: str
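
The constants NAME_PREFIX and NAME_SPLITTER suggest how an addon's folder name is put together. The sketch below assumes the name is the prefix, the addon type, the splitter, and the addon name, in that order; this composition is an inference from the constant names, not something stated above, and both helper functions are hypothetical.

```python
NAME_PREFIX = "addon_"
NAME_SPLITTER = "."


def folder_name_for(addon_type: str, name: str) -> str:
    """Compose prefix + addon type + splitter + name (assumed layout)."""
    return f"{NAME_PREFIX}{addon_type}{NAME_SPLITTER}{name}"


def split_folder_name(folder: str) -> tuple[str, str]:
    """Invert folder_name_for; raises ValueError on names that do not match."""
    if not folder.startswith(NAME_PREFIX) or NAME_SPLITTER not in folder:
        raise ValueError(f"not an addon folder: {folder!r}")
    addon_type, name = folder[len(NAME_PREFIX):].split(NAME_SPLITTER, 1)
    return addon_type, name


example_folder = folder_name_for("meta_cat", "Status")
```

Splitting on the first splitter only keeps names that themselves contain a dot intact.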

property full_name: str

Name with the component type (e.g. ner, linking, meta).

Return type:: str

__slots__ = ()

_is_protocol = True

_is_runtime_protocol = False

classmethod __init_subclass__(*args, **kwargs)

classmethod __class_getitem__(params)

class medcat.utils.legacy.convert_meta_cat.TokenizerWrapperBase(hf_tokenizer=None)

Bases: abc.ABC

Helper class that provides a standard way to create an ABC using inheritance.

Parameters:: hf_tokenizer (Optional[tokenizers.Tokenizer])

name: str

__init__(hf_tokenizer=None)

Parameters:: hf_tokenizer (Optional[tokenizers.Tokenizer])
Return type:: None

hf_tokenizers = None

__call__(text: str) → dict
__call__(text: list[str]) → list[dict]

abstract save(dir_path)

Parameters:: dir_path (str)
Return type:: None

abstract classmethod load(dir_path, model_variant='', **kwargs)

Parameters:

dir_path (str)
model_variant (Optional[str])

Return type:

tokenizers.Tokenizer

abstract get_size()

Return type:: int

abstract token_to_id(token)

Parameters:: token (str)
Return type:: Union[int, list[int]]

abstract get_pad_id()

Return type:: Union[Optional[int], list[int]]

ensure_tokenizer()

Return type:: tokenizers.Tokenizer

__slots__ = ()
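
To show what implementing the abstract methods involves, here is a self-contained sketch. `WrapperInterface` is a stand-in that mirrors the abstract method names listed above; it is not the MedCAT class. `ToyVocabWrapper`, its file name, and its whitespace tokenisation are invented for illustration, and its `load` returns the wrapper itself, which is a simplification of the documented return type.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Optional


class WrapperInterface(ABC):
    """Stand-in mirroring the abstract methods listed above (not the MedCAT class)."""

    @abstractmethod
    def save(self, dir_path: str) -> None: ...

    @classmethod
    @abstractmethod
    def load(cls, dir_path: str, model_variant: Optional[str] = "", **kwargs): ...

    @abstractmethod
    def get_size(self) -> int: ...

    @abstractmethod
    def token_to_id(self, token: str) -> int: ...

    @abstractmethod
    def get_pad_id(self) -> Optional[int]: ...


class ToyVocabWrapper(WrapperInterface):
    """Hypothetical whitespace tokenizer backed by a token -> id dict."""

    FILE_NAME = "toy_vocab.json"  # made-up file name for this sketch

    def __init__(self, vocab: dict[str, int]) -> None:
        self.vocab = vocab

    def __call__(self, text: str) -> dict:
        unk = self.vocab["<unk>"]
        return {"input_ids": [self.vocab.get(tok, unk) for tok in text.split()]}

    def save(self, dir_path: str) -> None:
        with open(os.path.join(dir_path, self.FILE_NAME), "w", encoding="utf-8") as f:
            json.dump(self.vocab, f)

    @classmethod
    def load(cls, dir_path: str, model_variant: Optional[str] = "", **kwargs):
        with open(os.path.join(dir_path, cls.FILE_NAME), encoding="utf-8") as f:
            return cls(json.load(f))

    def get_size(self) -> int:
        return len(self.vocab)

    def token_to_id(self, token: str) -> int:
        return self.vocab.get(token, self.vocab["<unk>"])

    def get_pad_id(self) -> Optional[int]:
        return self.vocab.get("<pad>")


def round_trip(wrapper: ToyVocabWrapper) -> ToyVocabWrapper:
    """Save to a temporary folder and load back."""
    with tempfile.TemporaryDirectory() as tmp:
        wrapper.save(tmp)
        return ToyVocabWrapper.load(tmp)


toy = ToyVocabWrapper({"<pad>": 0, "<unk>": 1, "chest": 2, "pain": 3})
reloaded = round_trip(toy)
```

Because every abstract method has a concrete override, `ToyVocabWrapper` can be instantiated, whereas instantiating `WrapperInterface` directly raises a TypeError.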

medcat.utils.legacy.convert_meta_cat.load_tokenizer(config, tokenizer_folder)

Parameters:

config (medcat.config.config_meta_cat.ConfigMetaCAT)
tokenizer_folder (str)

Return type:

Optional[TokenizerWrapperBase]

class medcat.utils.legacy.convert_meta_cat.ConfigMetaCAT(/, **data)

Bases: medcat.config.config.ComponentConfig

The MetaCAT part of the config

Parameters:: data (Any)

comp_name: str = 'meta_cat'

The name of the component.

If a custom implementation is required, it needs to be registered using `medcat.components.types.register_core_component(<core component type>, <component name>, <implementing class>)`

By default, only the ‘default’ component is registered.

general: General

model: Model

train: Train

class Config

extra = 'allow'

validate_assignment = True

_is_dirty: bool = False

__setattr__(name, value)

Parameters:

name (str)
value (Any)

property is_dirty: bool

Return type:: bool

mark_clean()
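
The members `_is_dirty`, `__setattr__`, `is_dirty` and `mark_clean()` point to a dirty-flag pattern: assignments flip a flag that can later be reset. The sketch below shows that pattern in plain Python. `DirtyTrackingConfig` is a hypothetical class written for this illustration, and the real config's bookkeeping (which is built on pydantic) may differ.

```python
from typing import Any


class DirtyTrackingConfig:
    """Hypothetical config object that records whether any attribute was reassigned."""

    def __init__(self, **values: Any) -> None:
        # Bypass our own __setattr__ so that construction does not count as a change.
        object.__setattr__(self, "_is_dirty", False)
        for key, value in values.items():
            object.__setattr__(self, key, value)

    def __setattr__(self, name: str, value: Any) -> None:
        object.__setattr__(self, name, value)
        object.__setattr__(self, "_is_dirty", True)

    @property
    def is_dirty(self) -> bool:
        return self._is_dirty

    def mark_clean(self) -> None:
        object.__setattr__(self, "_is_dirty", False)


cfg = DirtyTrackingConfig(batch_size=16)
dirty_before = cfg.is_dirty
cfg.batch_size = 32
dirty_after = cfg.is_dirty
cfg.mark_clean()
```

A flag like this lets calling code decide whether a config needs to be saved again or whether dependent state needs rebuilding.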

get_strategy()

Return type:: medcat.storage.serialisables.SerialisingStrategy

classmethod get_init_attrs()

Return type:: list[str]

classmethod ignore_attrs()

Return type:: list[str]

classmethod include_properties()

Return type:: list[str]

merge_config(other)

Merge this config with another config’s (partial) model dump.

The expectation is that the other dict is a partial model dump. Values specified there are overwritten into the current config. Values not specified there are left intact.

The other config can have keys/values that do not exist in the config or sub-config. And they will be added where possible.

Parameters:: other (dict) – The model dump
Raises:: IncorrectConfigValues – If unable to set the attribute, trying to set incorrect value, or trying to set sub-config values in an incorrect format (non-dict).
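
The merge semantics described above (overwrite what is specified, keep what is not, add unknown keys, reject non-dict values for sub-configs) can be sketched on plain nested dicts. The function `merge_partial` and the example keys are illustrative assumptions; the real method works on config objects and raises IncorrectConfigValues, for which ValueError stands in here.

```python
from typing import Any


def merge_partial(current: dict[str, Any], other: dict[str, Any]) -> None:
    """Merge a partial dump into `current` in place, recursing into sub-dicts."""
    for key, value in other.items():
        existing = current.get(key)
        if isinstance(existing, dict):
            if not isinstance(value, dict):
                # Stand-in for the documented failure on sub-config values
                # given in a non-dict format.
                raise ValueError(f"sub-config {key!r} must be given as a dict")
            merge_partial(existing, value)
        else:
            current[key] = value  # overwrite, or add a key that did not exist


# Hypothetical config content, for illustration only.
config = {"general": {"category_name": "Status", "seed": 13}, "train": {"nepochs": 50}}
merge_partial(config, {"general": {"seed": 42, "new_option": True}, "extra": 1})
```

After the call, `seed` is overwritten, `category_name` and `train` are untouched, and the previously unknown `new_option` and `extra` keys are added.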

classmethod load(path)

Parameters:: path (str)
Return type:: typing_extensions.Self

model_config: ClassVar[pydantic.config.ConfigDict]: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, pydantic.fields.FieldInfo]]

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

model_computed_fields: ClassVar[Dict[str, pydantic.fields.ComputedFieldInfo]]: A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

__class_vars__: ClassVar[set[str]]: The names of the class variables defined on the model.

__private_attributes__: ClassVar[Dict[str, pydantic.fields.ModelPrivateAttr]]: Metadata about the private attributes of the model.

__signature__: ClassVar[inspect.Signature]: The synthesized __init__ [Signature][inspect.Signature] of the model.

__pydantic_complete__: ClassVar[bool] = False: Whether model building is completed, or if there are still undefined fields.

__pydantic_core_schema__: ClassVar[pydantic_core.CoreSchema]: The core schema of the model.

__pydantic_custom_init__: ClassVar[bool]: Whether the model has a custom __init__ method.

__pydantic_decorators__: ClassVar[pydantic._internal._decorators.DecoratorInfos]: Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__: ClassVar[pydantic._internal._generics.PydanticGenericMetadata]: Metadata for generic models; contains data used for a similar purpose to __args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these.

__pydantic_parent_namespace__: ClassVar[Dict[str, Any] | None] = None: Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__: ClassVar[None | Literal['model_post_init']]: The name of the post-init method for the model, if defined.

__pydantic_root_model__: ClassVar[bool] = False: Whether the model is a [RootModel][pydantic.root_model.RootModel].

__pydantic_serializer__: ClassVar[pydantic_core.SchemaSerializer]: The pydantic-core SchemaSerializer used to dump instances of the model.

__pydantic_validator__: ClassVar[pydantic_core.SchemaValidator | pydantic.plugin._schema_validator.PluggableSchemaValidator]: The pydantic-core SchemaValidator used to validate instances of the model.

__pydantic_extra__: dict[str, Any] | None: A dictionary containing extra values, if [extra][pydantic.config.ConfigDict.extra] is set to ‘allow’.

__pydantic_fields_set__: set[str]: The names of fields explicitly set during instantiation.

__pydantic_private__: dict[str, Any] | None: Values of private attributes set on the model instance.

__slots__ = ('__dict__', '__pydantic_fields_set__', '__pydantic_extra__', '__pydantic_private__')

__init__(/, **data)

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:: data (Any)
Return type:: None

property model_extra: dict[str, Any] | None

Get extra fields set during validation.

Returns:: A dictionary of extra fields, or `None` if `config.extra` is not set to `"allow"`.
Return type:: dict[str, Any] | None

property model_fields_set: set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:: A set of strings representing the fields that have been set, i.e. that were not filled from defaults.
Return type:: set[str]

classmethod model_construct(_fields_set=None, **values)

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == ‘allow’, then all extra passed values are added to the model instance’s __dict__ and __pydantic_extra__ fields. If model_config.extra == ‘ignore’ (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == ‘forbid’ does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set (set[str] | None) – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the [model_fields_set][pydantic.BaseModel.model_fields_set] attribute. Otherwise, the field names from the values argument will be used.
values (Any) – Trusted or pre-validated data dictionary.

Returns:

A new instance of the `Model` class with validated data.

Return type:

typing_extensions.Self

model_copy(*, update=None, deep=False)

Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#model_copy

Returns a copy of the model.

Parameters:

update (dict[str, Any] | None) – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep (bool) – Set to True to make a deep copy of the model.

Returns:

New model instance.

Return type:

typing_extensions.Self

model_dump(*, mode='python', include=None, exclude=None, context=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False, round_trip=False, warnings=True, serialize_as_any=False)

Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#modelmodel_dump

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode (Literal['json', 'python'] | str) – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include (IncEx | None) – A set of fields to include in the output.
exclude (IncEx | None) – A set of fields to exclude from the output.
context (Any | None) – Additional context to pass to the serializer.
by_alias (bool) – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset (bool) – Whether to exclude fields that have not been explicitly set.
exclude_defaults (bool) – Whether to exclude fields that are set to their default value.
exclude_none (bool) – Whether to exclude fields that have a value of None.
round_trip (bool) – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings (bool | Literal['none', 'warn', 'error']) – How to handle serialization errors. False/”none” ignores them, True/”warn” logs errors, “error” raises a [PydanticSerializationError][pydantic_core.PydanticSerializationError].
serialize_as_any (bool) – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

Return type:

dict[str, Any]

model_dump_json(*, indent=None, include=None, exclude=None, context=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False, round_trip=False, warnings=True, serialize_as_any=False)

Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#modelmodel_dump_json

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:

indent (int | None) – Indentation to use in the JSON output. If None is passed, the output will be compact.
include (IncEx | None) – Field(s) to include in the JSON output.
exclude (IncEx | None) – Field(s) to exclude from the JSON output.
context (Any | None) – Additional context to pass to the serializer.
by_alias (bool) – Whether to serialize using field aliases.
exclude_unset (bool) – Whether to exclude fields that have not been explicitly set.
exclude_defaults (bool) – Whether to exclude fields that are set to their default value.
exclude_none (bool) – Whether to exclude fields that have a value of None.
round_trip (bool) – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings (bool | Literal['none', 'warn', 'error']) – How to handle serialization errors. False/”none” ignores them, True/”warn” logs errors, “error” raises a [PydanticSerializationError][pydantic_core.PydanticSerializationError].
serialize_as_any (bool) – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

Return type:

str

classmethod model_json_schema(by_alias=True, ref_template=DEFAULT_REF_TEMPLATE, schema_generator=GenerateJsonSchema, mode='validation')

Generates a JSON schema for a model class.

Parameters:

by_alias (bool) – Whether to use attribute aliases or not.
ref_template (str) – The reference template.
schema_generator (type[pydantic.json_schema.GenerateJsonSchema]) – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications
mode (pydantic.json_schema.JsonSchemaMode) – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

Return type:

dict[str, Any]

classmethod model_parametrized_name(params)

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:: params (tuple[type[Any], Ellipsis]) – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns:: String representing the new class where `params` are passed to `cls` as type variables.
Raises:: TypeError – Raised when trying to generate concrete names for non-generic models.
Return type:: str

model_post_init(__context)

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

Parameters:: __context (Any)
Return type:: None

classmethod model_rebuild(*, force=False, raise_errors=True, _parent_namespace_depth=2, _types_namespace=None)

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force (bool) – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors (bool) – Whether to raise errors, defaults to True.
_parent_namespace_depth (int) – The depth level of the parent namespace, defaults to 2.
_types_namespace (dict[str, Any] | None) – The types namespace, defaults to None.

Returns:

Returns `None` if the schema is already “complete” and rebuilding was not required.
If rebuilding _was_ required, returns `True` if rebuilding was successful, otherwise `False`.

Return type:

bool | None

classmethod model_validate(obj, *, strict=None, from_attributes=None, context=None)

Validate a pydantic model instance.

Parameters:

obj (Any) – The object to validate.
strict (bool | None) – Whether to enforce types strictly.
from_attributes (bool | None) – Whether to extract data from object attributes.
context (Any | None) – Additional context to pass to the validator.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

Return type:

typing_extensions.Self

classmethod model_validate_json(json_data, *, strict=None, context=None)

Usage docs: https://docs.pydantic.dev/2.9/concepts/json/#json-parsing

Validate the given JSON data against the Pydantic model.

Parameters:

json_data (str | bytes | bytearray) – The JSON data to validate.
strict (bool | None) – Whether to enforce types strictly.
context (Any | None) – Extra variables to pass to the validator.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

Return type:

typing_extensions.Self

classmethod model_validate_strings(obj, *, strict=None, context=None)

Validate the given object with string data against the Pydantic model.

Parameters:

obj (Any) – The object containing string data to validate.
strict (bool | None) – Whether to enforce types strictly.
context (Any | None) – Extra variables to pass to the validator.

Returns:

The validated Pydantic model.

Return type:

typing_extensions.Self

classmethod __get_pydantic_core_schema__(source, handler, /)

Hook into generating the model’s CoreSchema.

Parameters:

source (type[BaseModel]) – The class we are generating a schema for. This will generally be the same as the cls argument if this is a classmethod.
handler (pydantic.annotated_handlers.GetCoreSchemaHandler) – A callable that calls into Pydantic’s internal CoreSchema generation logic.

Returns:

A `pydantic-core` `CoreSchema`.

Return type:

pydantic_core.CoreSchema

classmethod __get_pydantic_json_schema__(core_schema, handler, /)

Hook into generating the model’s JSON schema.

Parameters:

core_schema (pydantic_core.CoreSchema) – A pydantic-core CoreSchema. You can ignore this argument and call the handler with a new CoreSchema, wrap this CoreSchema ({‘type’: ‘nullable’, ‘schema’: current_schema}), or just call the handler with the original schema.
handler (pydantic.annotated_handlers.GetJsonSchemaHandler) – Call into Pydantic’s internal JSON schema generation. This will raise a pydantic.errors.PydanticInvalidForJsonSchema if JSON schema generation fails. Since this gets called by BaseModel.model_json_schema you can override the schema_generator argument to that function to change JSON schema generation globally for a type.

Returns:

A JSON schema, as a Python object.

Return type:

pydantic.json_schema.JsonSchemaValue

classmethod __pydantic_init_subclass__(**kwargs)

This is intended to behave just like __init_subclass__, but is called by ModelMetaclass only after the class is actually fully initialized. In particular, attributes like model_fields will be present when this is called.

This is necessary because __init_subclass__ will always be called by type.__new__, and it would require a prohibitively large refactor to the ModelMetaclass to ensure that type.__new__ was called in such a manner that the class would already be sufficiently initialized.

This will receive the same kwargs that would be passed to the standard __init_subclass__, namely, any kwargs passed to the class definition that aren’t used internally by pydantic.

Parameters:: **kwargs (Any) – Any keyword arguments passed to the class definition that aren’t used internally by pydantic.
Return type:: None

classmethod __class_getitem__(typevar_values)

Parameters:: typevar_values (type[Any] | tuple[type[Any], Ellipsis])
Return type:: type[BaseModel] | pydantic._internal._forward_ref.PydanticRecursiveRef

__copy__()

Returns a shallow copy of the model.

Return type:: typing_extensions.Self

__deepcopy__(memo=None)

Returns a deep copy of the model.

Parameters:: memo (dict[int, Any] | None)
Return type:: typing_extensions.Self

__getattr__(item)

Parameters:: item (str)
Return type:: Any

_check_frozen(name, value)

Parameters:

name (str)
value (Any)

Return type:

None

__getstate__()

Return type:: dict[Any, Any]

__setstate__(state)

Parameters:: state (dict[Any, Any])
Return type:: None

__eq__(other)

Parameters:: other (Any)
Return type:: bool

classmethod __init_subclass__(**kwargs)

This signature is included purely to help type-checkers check arguments to class declaration, which provides a way to conveniently set model_config key/value pairs.

```py
from pydantic import BaseModel

class MyModel(BaseModel, extra='allow'):
    ...
```

However, this may be deceiving, since the _actual_ calls to __init_subclass__ will not receive any of the config arguments, and will only receive any keyword arguments passed during class initialization that are _not_ expected keys in ConfigDict. (This is due to the way ModelMetaclass.__new__ works.)

Parameters:: **kwargs (typing_extensions.Unpack[pydantic.config.ConfigDict]) – Keyword arguments passed to the class definition, which set model_config

Note

You may want to override __pydantic_init_subclass__ instead, which behaves similarly but is called after the class is fully initialized.

__iter__()

So dict(model) works.

Return type:: TupleGenerator

__repr__()

Return type:: str

__repr_args__()

Return type:: pydantic._internal._repr.ReprArgs

__repr_name__

__repr_str__

__pretty__

__rich_repr__

__str__()

Return type:: str

property __fields__: dict[str, pydantic.fields.FieldInfo]

Return type:: dict[str, pydantic.fields.FieldInfo]

property __fields_set__: set[str]

Return type:: set[str]

dict(*, include=None, exclude=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False)

Parameters:

include (IncEx | None)
exclude (IncEx | None)
by_alias (bool)
exclude_unset (bool)
exclude_defaults (bool)
exclude_none (bool)

Return type:

Dict[str, Any]

json(*, include=None, exclude=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=PydanticUndefined, models_as_dict=PydanticUndefined, **dumps_kwargs)

Parameters:

include (IncEx | None)
exclude (IncEx | None)
by_alias (bool)
exclude_unset (bool)
exclude_defaults (bool)
exclude_none (bool)
encoder (Callable[[Any], Any] | None)
models_as_dict (bool)
dumps_kwargs (Any)

Return type:

str

classmethod parse_obj(obj)

Parameters:: obj (Any)
Return type:: typing_extensions.Self

classmethod parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)

Parameters:

b (str | bytes)
content_type (str | None)
encoding (str)
proto (pydantic.deprecated.parse.Protocol | None)
allow_pickle (bool)

Return type:

typing_extensions.Self

classmethod parse_file(path, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)

Parameters:

path (str | pathlib.Path)
content_type (str | None)
encoding (str)
proto (pydantic.deprecated.parse.Protocol | None)
allow_pickle (bool)

Return type:

typing_extensions.Self

classmethod from_orm(obj)

Parameters:: obj (Any)
Return type:: typing_extensions.Self

classmethod construct(_fields_set=None, **values)

Parameters:

_fields_set (set[str] | None)
values (Any)

Return type:

typing_extensions.Self

copy(*, include=None, exclude=None, update=None, deep=False)

Returns a copy of the model.

Warning (deprecated): This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

```py
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include (pydantic._internal._utils.AbstractSetIntStr | pydantic._internal._utils.MappingIntStrAny | None) – Optional set or mapping specifying which fields to include in the copied model.
exclude (pydantic._internal._utils.AbstractSetIntStr | pydantic._internal._utils.MappingIntStrAny | None) – Optional set or mapping specifying which fields to exclude in the copied model.
update (Dict[str, Any] | None) – Optional dictionary of field-value pairs to override field values in the copied model.
deep (bool) – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

Return type:

typing_extensions.Self

classmethod schema(by_alias=True, ref_template=DEFAULT_REF_TEMPLATE)

Parameters:

by_alias (bool)
ref_template (str)

Return type:

Dict[str, Any]

classmethod schema_json(*, by_alias=True, ref_template=DEFAULT_REF_TEMPLATE, **dumps_kwargs)

Parameters:

by_alias (bool)
ref_template (str)
dumps_kwargs (Any)

Return type:

str

classmethod validate(value)

Parameters:: value (Any)
Return type:: typing_extensions.Self

classmethod update_forward_refs(**localns)

Parameters:: localns (Any)
Return type:: None

_iter(*args, **kwargs)

Parameters:

args (Any)
kwargs (Any)

Return type:

Any

_copy_and_set_values(*args, **kwargs)

Parameters:

args (Any)
kwargs (Any)

Return type:

Any

classmethod _get_value(*args, **kwargs)

Parameters:

args (Any)
kwargs (Any)

Return type:

Any

_calculate_keys(*args, **kwargs)

Parameters:

args (Any)
kwargs (Any)

Return type:

Any

class medcat.utils.legacy.convert_meta_cat.BaseTokenizer

Bases: Protocol

The base tokenizer protocol.

create_entity(doc, token_start_index, token_end_index, label)

Create an entity from a document.

Parameters:

doc (MutableDocument) – The document to use.
token_start_index (int) – The token start index.
token_end_index (int) – The token end index.
label (str) – The label.

Returns:

MutableEntity – The resulting entity.

Return type:

medcat.tokenizing.tokens.MutableEntity

entity_from_tokens(tokens)

Get an entity from the list of tokens.

Parameters:: tokens (list[MutableToken]) – List of tokens.
Returns:: MutableEntity – The resulting entity.
Return type:: medcat.tokenizing.tokens.MutableEntity

__call__(text)

Parameters:: text (str)
Return type:: medcat.tokenizing.tokens.MutableDocument

classmethod create_new_tokenizer(config)

Parameters:: config (medcat.config.Config)
Return type:: typing_extensions.Self

get_doc_class()

Get the document implementation class used by the tokenizer.

This can be used, e.g., to register addon paths.

Returns:: Type[MutableDocument] – The document class.
Return type:: Type[medcat.tokenizing.tokens.MutableDocument]

get_entity_class()

Get the entity implementation class used by the tokenizer.

Returns:: Type[MutableEntity] – The entity class.
Return type:: Type[medcat.tokenizing.tokens.MutableEntity]

__slots__ = ()

_is_protocol = True

_is_runtime_protocol = False

classmethod __init_subclass__(*args, **kwargs)

classmethod __class_getitem__(params)
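Because BaseTokenizer is a Protocol with _is_runtime_protocol = False, conformance is structural and meant for static type checkers rather than isinstance checks. The sketch below shows the idea with a deliberately reduced, hypothetical protocol (TokenizerLike) and a toy WhitespaceTokenizer; neither is part of medcat, and the real protocol returns a MutableDocument rather than a list of strings.

```python
from typing import Protocol


class TokenizerLike(Protocol):
    """Hypothetical, trimmed-down protocol with only a __call__ method."""

    def __call__(self, text: str) -> list[str]: ...


class WhitespaceTokenizer:
    """Toy tokenizer; it does not inherit from the protocol."""

    def __call__(self, text: str) -> list[str]:
        return text.split()


def count_tokens(tokenizer: TokenizerLike, text: str) -> int:
    # Any object with a matching __call__ satisfies the annotation.
    return len(tokenizer(text))


n_tokens = count_tokens(WhitespaceTokenizer(), "chest pain for two days")

# A protocol that is not runtime-checkable rejects isinstance() checks.
try:
    isinstance(WhitespaceTokenizer(), TokenizerLike)
    runtime_check_allowed = True
except TypeError:
    runtime_check_allowed = False
```

If runtime checks were needed, the protocol would have to be decorated with typing.runtime_checkable, which the attribute listing above suggests is not the case for BaseTokenizer.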

medcat.utils.legacy.convert_meta_cat.fix_old_style_cnf(data, remove={'py/object', '__fields_set__', '__private_attribute_values__'}, take_from='py/state.__dict__')

Parameters:

data (dict)
remove (set[str])
take_from (str)
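The behaviour of fix_old_style_cnf is not described here, so the following is only one plausible reading of its signature, not the medcat implementation. The sketch assumes the function unwraps a nested payload found at the dotted take_from path and drops the keys listed in remove. The helper name flatten_legacy_cnf and the sample legacy dictionary are hypothetical.

```python
from typing import Any


def flatten_legacy_cnf(data: dict, remove: set[str], take_from: str) -> dict:
    """Hypothetical helper, NOT the medcat implementation."""
    # Follow the dotted path (e.g. 'py/state.__dict__') as far as it exists.
    inner: Any = data
    for key in take_from.split("."):
        if not isinstance(inner, dict) or key not in inner:
            inner = None
            break
        inner = inner[key]
    # Use the nested payload when found, otherwise the dict itself.
    source = inner if isinstance(inner, dict) else data
    cleaned = {}
    for key, value in source.items():
        if key in remove:
            continue
        # Nested sections may carry the same wrapper, so recurse into dicts.
        if isinstance(value, dict):
            value = flatten_legacy_cnf(value, remove, take_from)
        cleaned[key] = value
    return cleaned


# Made-up legacy structure, for illustration only.
legacy = {
    "py/object": "some.module.ConfigClass",
    "py/state": {
        "__dict__": {
            "general": {
                "py/object": "some.module.General",
                "py/state": {"__dict__": {"seed": 13}},
            },
        },
        "__fields_set__": {"py/set": ["general"]},
        "__private_attribute_values__": {},
    },
}

flat = flatten_legacy_cnf(
    legacy,
    remove={"py/object", "__fields_set__", "__private_attribute_values__"},
    take_from="py/state.__dict__",
)
print(flat)
```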

medcat.utils.legacy.convert_meta_cat.logger

medcat.utils.legacy.convert_meta_cat._load_legacy(config, save_dir_path)

Parameters:

config (medcat.config.config_meta_cat.ConfigMetaCAT)
save_dir_path (str)

Return type:

medcat.components.addons.meta_cat.MetaCAT

medcat.utils.legacy.convert_meta_cat.load_cnf(cnf_path)

Parameters:: cnf_path (str)
Return type:: medcat.config.config_meta_cat.ConfigMetaCAT

medcat.utils.legacy.convert_meta_cat.get_meta_cat_from_old(old_path, tokenizer)

Convert a v1 MetaCAT folder to a v2 MetaCAT.

Parameters:

old_path (str) – The path to the v1 MetaCAT folder.
tokenizer (BaseTokenizer) – The tokenizer.

Returns:

MetaCATAddon – The v2 MetaCAT.

Return type:

medcat.components.addons.meta_cat.MetaCATAddon
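A minimal usage sketch for get_meta_cat_from_old, based only on the signature documented above. The wrapper name convert_legacy_meta_cat is hypothetical, and the up-front directory check is an assumption drawn from the description of old_path as a v1 MetaCAT folder. The tokenizer argument is expected to be an object satisfying BaseTokenizer, obtained elsewhere.

```python
import os


def convert_legacy_meta_cat(old_path, tokenizer):
    """Hypothetical wrapper around get_meta_cat_from_old."""
    # Assumption: a v1 MetaCAT model is stored as a folder on disk.
    if not os.path.isdir(old_path):
        raise FileNotFoundError(f"No v1 MetaCAT folder at: {old_path}")
    # Imported lazily so this module can be loaded without medcat installed.
    from medcat.utils.legacy.convert_meta_cat import get_meta_cat_from_old

    # Returns a MetaCATAddon, per the documented return type.
    return get_meta_cat_from_old(old_path, tokenizer)
```

Failing early on a missing folder keeps path mistakes separate from any errors raised during the conversion itself.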