medcat.utils.legacy.convert_meta_cat
====================================

.. py:module:: medcat.utils.legacy.convert_meta_cat


Attributes
----------

.. autoapisummary::

   medcat.utils.legacy.convert_meta_cat.logger


Classes
-------

.. autoapisummary::

   medcat.utils.legacy.convert_meta_cat.MetaCAT
   medcat.utils.legacy.convert_meta_cat.MetaCATAddon
   medcat.utils.legacy.convert_meta_cat.TokenizerWrapperBase
   medcat.utils.legacy.convert_meta_cat.ConfigMetaCAT
   medcat.utils.legacy.convert_meta_cat.BaseTokenizer


Functions
---------

.. autoapisummary::

   medcat.utils.legacy.convert_meta_cat.load_tokenizer
   medcat.utils.legacy.convert_meta_cat.fix_old_style_cnf
   medcat.utils.legacy.convert_meta_cat._load_legacy
   medcat.utils.legacy.convert_meta_cat.load_cnf
   medcat.utils.legacy.convert_meta_cat.get_meta_cat_from_old


Module Contents
---------------

.. py:class:: MetaCAT(tokenizer = None, embeddings = None, config = None, _model_state_dict = None)

   Bases: :py:obj:`medcat.storage.serialisables.AbstractSerialisable`


   The MetaCAT class used for training 'Meta-Annotation' models,
   i.e. annotations of clinical concept annotations. These are also
   known as properties or attributes of recognised entities in similar
   tools such as MetaMap and cTakes.

   This is a flexible, model-agnostic class that can learn any
   meta-annotation task, i.e. any multi-class classification task
   for recognised terms.

   :param tokenizer: The Huggingface tokenizer instance. This can be a pre-trained
                     tokenizer instance from a BERT-style model, or trained from
                     scratch for the Bi-LSTM (w. attention) model that is currently
                     used in most deployments.
   :type tokenizer: TokenizerWrapperBase
   :param embeddings: Embedding mapping from (sub)word input id to n-dim (sub)word embedding.
   :type embeddings: Tensor, numpy.ndarray
   :param config: the configuration for MetaCAT. Param descriptions available in
                  ConfigMetaCAT docs.
   :type config: ConfigMetaCAT


   .. py:attribute:: name
      :value: 'meta_cat'


   .. py:attribute:: _component_lock


   .. py:method:: get_init_attrs()
      :classmethod:


   .. py:method:: ignore_attrs()
      :classmethod:


   .. py:method:: include_properties()
      :classmethod:


   .. py:property:: _model_state_dict


   .. py:method:: __init__(tokenizer = None, embeddings = None, config = None, _model_state_dict = None)


   .. py:attribute:: config
      :value: None


   .. py:attribute:: tokenizer
      :value: None


   .. py:attribute:: embeddings


   .. py:attribute:: model


   .. py:method:: _reset_tokenizer_info()


   .. py:method:: get_model(embeddings)

      Get the model

      :param embeddings: The embedding tensor
      :type embeddings: Optional[Tensor]

      :raises ValueError: If the meta model is not LSTM or BERT

      :Returns: **nn.Module** -- The module


   .. py:method:: get_hash()

      A partial hash trying to catch differences between models.

      :Returns: **str** -- The hex hash.


   .. py:method:: train_from_json(json_path, save_dir_path = None, data_oversampled = None, overwrite = False)

      Train or continue training a model given a json_path containing
      a MedCATtrainer export. It will continue training if an existing
      model is loaded or start new training if the model is blank/new.

      :param json_path: Path/Paths to a MedCATtrainer export containing the
                        meta_annotations we want to train for.
      :type json_path: Union[str, list]
      :param save_dir_path: In case we have auto_save_model (meaning during the
                            training the best model will be saved) we need to
                            set a save path. Defaults to `None`.
      :type save_dir_path: Optional[str]
      :param data_oversampled: In case of oversampling being performed, the data
                               will be passed in the parameter allowing the
                               model to be trained on original + synthetic data.
      :type data_oversampled: Optional[list]
      :param overwrite: Whether to allow overwriting the file if/when appropriate.
      :type overwrite: bool

      :Returns: **dict** -- The resulting report.


   .. py:method:: train_raw(data_loaded, save_dir_path = None, data_oversampled = None, overwrite = False)

      Train or continue training a model given raw data. It will continue
      training if an existing model is loaded or start new training if
      the model is blank/new.

      The raw data is expected in the following format::

         {
             'projects': [  # list of projects
                 {
                     'name': '<project_name>',
                     'documents': [  # list of documents
                         {
                             'name': '<document_name>',
                             'text': '<text_of_document>',
                             'annotations': [  # list of annotations
                                 {
                                     # start index of the annotation
                                     'start': -1,
                                     'end': 1,    # end index of the annotation
                                     'cui': 'cui',
                                     'value': '<annotation_value>'
                                 },
                                 ...
                             ],
                         },
                         ...
                     ]
                 },
                 ...
             ]
         }

      :param data_loaded: The raw data we want to train for.
      :type data_loaded: dict
      :param save_dir_path: In case we have auto_save_model (meaning during the training
                            the best model will be saved) we need to set a save path.
                            Defaults to `None`.
      :type save_dir_path: Optional[str]
      :param data_oversampled: In case of oversampling being performed, the data will be
                               passed in the parameter allowing the model to be trained on
                               original + synthetic data. The expected format is:
                               [[['text','of','the','document'], [index of medical entity],
                               "label"],
                               [['text','of','the','document'], [index of medical entity],
                               "label"]]
      :type data_oversampled: Optional[list]
      :param overwrite: Whether to allow overwriting the file if/when appropriate.
      :type overwrite: bool

      :Returns: **dict** -- The resulting report.

      :raises Exception: If no save path is specified, or category name
          not in data.
      :raises AssertionError: If no tokeniser is set
      :raises FileNotFoundError: If phase_number is set to 2 and model.dat
          file is not found
      :raises KeyError: If phase_number is set to 2 and model.dat file
          contains mismatched architecture
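
      As a minimal sketch of the raw data layout described above, the
      snippet below builds one project with one document and one annotation
      using plain Python, and walks the nesting. All names, the text, the
      placeholder CUI and the ``"Negated"`` label are made up for
      illustration; nothing here calls MedCAT.

      ```python
      # Hypothetical, hand-written data in the documented layout.
      text = "Patient has no fever."
      start = text.index("fever")
      raw_data = {
          "projects": [
              {
                  "name": "demo_project",
                  "documents": [
                      {
                          "name": "doc_1",
                          "text": text,
                          "annotations": [
                              {
                                  "start": start,
                                  "end": start + len("fever"),
                                  "cui": "C0000000",  # placeholder CUI
                                  "value": "fever",
                              }
                          ],
                      }
                  ],
              }
          ]
      }

      # One oversampled entry: [tokens, [index of medical entity], label].
      oversampled = [
          [["patient", "has", "no", "fever"], [3], "Negated"],
      ]


      def iter_annotations(data):
          """Yield (document text, annotation) pairs from the nested layout."""
          for project in data["projects"]:
              for document in project["documents"]:
                  for annotation in document["annotations"]:
                      yield document["text"], annotation


      # Recover the annotated span of each annotation from its offsets.
      spans = [
          doc_text[ann["start"]:ann["end"]]
          for doc_text, ann in iter_annotations(raw_data)
      ]
      ```

      Computing ``start`` and ``end`` from the text, as done here, keeps
      the offsets consistent with the document they refer to.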


   .. py:method:: eval(json_path)

      Evaluate from json.

      :param json_path: The json file path
      :type json_path: str

      :Returns: **dict** -- The resulting model dict

      :raises AssertionError: If self.tokenizer is not set
      :raises Exception: If the category name does not exist


   .. py:method:: get_ents(doc)


   .. py:method:: prepare_document(doc, input_ids, offset_mapping, lowercase)

      Prepares document.

      :param doc: The document
      :type doc: Doc
      :param input_ids: Input ids
      :type input_ids: list
      :param offset_mapping: Offset mappings
      :type offset_mapping: list
      :param lowercase: Whether to lowercase the text that replaces the center (entity) tokens
      :type lowercase: bool

      :Returns: **tuple[dict, list]** -- The entity id to index mapping
                and the samples.


   .. py:method:: batch_generator(stream, batch_size_chars)
      :staticmethod:


      Generator for batch of documents.

      :param stream: The document stream
      :type stream: Iterable[MutableDocument]
      :param batch_size_chars: Number of characters per batch
      :type batch_size_chars: int

      :Yields: *list[MutableDocument]* -- The batch of documents.
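
      The batching idea can be sketched with plain strings standing in for
      documents. This is an illustration under stated assumptions, not the
      MedCAT implementation: the helper name ``batch_by_chars`` is
      hypothetical, and the rule that a batch is emitted once its
      accumulated character count reaches ``batch_size_chars`` is an
      assumption.

      ```python
      from typing import Iterable, Iterator


      def batch_by_chars(texts: Iterable[str], batch_size_chars: int) -> Iterator[list[str]]:
          """Group texts into batches of roughly ``batch_size_chars`` characters."""
          batch: list[str] = []
          char_count = 0
          for text in texts:
              batch.append(text)
              char_count += len(text)
              if char_count >= batch_size_chars:
                  # Character budget reached: emit the batch and start a new one.
                  yield batch
                  batch, char_count = [], 0
          if batch:
              # Emit whatever is left over at the end of the stream.
              yield batch


      batches = list(batch_by_chars(["aaaa", "bb", "cccccc", "d"], batch_size_chars=5))
      ```

      Because the function is a generator, the input stream is consumed
      lazily and only one batch is held in memory at a time.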


   .. py:method:: _set_meta_anns(doc, id2category_value)


   .. py:method:: __call__(doc)

      Process one document, used in the spacy pipeline for sequential
      document processing.

      :param doc: A spacy document
      :type doc: Doc

      :Returns: **Doc** -- The same spacy document.


   .. py:method:: get_model_card(as_dict = False)

      A minimal model card.

      :param as_dict: Return the model card as a dictionary instead of a str.
                      Defaults to `False`.
      :type as_dict: bool

      :Returns: **Union[str, dict]** -- An indented JSON string, or the same
                content in dict form if `as_dict` is `True`.
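
      The two return forms can be illustrated with a short standalone
      sketch. The helper name ``render_model_card`` and the card contents
      are made up for illustration and are not part of MedCAT; only the
      ``as_dict`` switch mirrors the parameter described above.

      ```python
      import json
      from typing import Union


      def render_model_card(card: dict, as_dict: bool = False) -> Union[str, dict]:
          """Return the card as a dict, or as an indented JSON string."""
          if as_dict:
              return card
          return json.dumps(card, indent=2, sort_keys=True)


      # Hypothetical card contents.
      card = {"Category Name": "Status", "Description": "demo"}
      as_text = render_model_card(card)
      as_mapping = render_model_card(card, as_dict=True)
      ```

      The string form suits logging and display, while the dict form is
      easier to inspect programmatically.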


   .. py:method:: __repr__()

      Prints the model_card for this MetaCAT instance.

      :Returns: **str** -- The 'Model Card' for this MetaCAT instance. This includes
                NER+L config and any MetaCATs.


   .. py:method:: get_strategy()


   .. py:method:: __eq__(other)


.. py:class:: MetaCATAddon(config, base_tokenizer, meta_cat)

   Bases: :py:obj:`medcat.components.addons.addons.AddonComponent`


   Base/abstract addon component class.


   .. py:attribute:: addon_type
      :value: 'meta_cat'


   .. py:attribute:: output_key
      :value: 'meta_anns'


   .. py:attribute:: config
      :type:  medcat.config.config_meta_cat.ConfigMetaCAT


   .. py:method:: __init__(config, base_tokenizer, meta_cat)


   .. py:attribute:: base_tokenizer


   .. py:attribute:: _mc


   .. py:attribute:: _name


   .. py:property:: mc
      :type: MetaCAT


   .. py:method:: create_new(config, base_tokenizer, tknzer_preprocessor = None)
      :classmethod:


      Factory method to create a new MetaCATAddon instance.


   .. py:method:: create_new_component(cnf, tokenizer, cdb, vocab, model_load_path)
      :classmethod:


      Create a new component or load one off disk if a load path is given.

      This may raise an exception if the wrong type of config is provided.

      :param cnf: The config relevant to this component.
      :type cnf: ComponentConfig
      :param tokenizer: The base tokenizer.
      :type tokenizer: BaseTokenizer
      :param cdb: The CDB.
      :type cdb: CDB
      :param vocab: The Vocab.
      :type vocab: Vocab
      :param model_load_path: Model load path (if present).
      :type model_load_path: Optional[str]

      :Returns: **Self** -- The new component.


   .. py:method:: load_existing(cnf, base_tokenizer, load_path)
      :classmethod:


      Factory method to load an existing MetaCATAddon from disk.


   .. py:property:: name
      :type: str


      The name of the component.


   .. py:method:: __call__(doc)


   .. py:method:: load(folder_path)


   .. py:method:: _load_tokenizer(config, tokenizer_folder)
      :classmethod:


   .. py:method:: _get_meta_cat_and_tokenizer_paths(folder_path)
      :classmethod:


   .. py:method:: save(folder_path)


   .. py:method:: _init_data_paths()


   .. py:property:: include_in_output
      :type: bool


   .. py:method:: get_output_key_val(ent)


   .. py:method:: serialise_to(folder_path)


   .. py:method:: deserialise_from(folder_path, **init_kwargs)
      :classmethod:


   .. py:method:: get_strategy()


   .. py:method:: get_init_attrs()
      :classmethod:


   .. py:method:: ignore_attrs()
      :classmethod:


   .. py:method:: include_properties()
      :classmethod:


   .. py:method:: get_hash()


   .. py:attribute:: NAME_PREFIX
      :type:  str
      :value: 'addon_'


   .. py:attribute:: NAME_SPLITTER
      :type:  str
      :value: '.'


   .. py:method:: is_core()

      Whether the component is a core component or not.

      :Returns: **bool** -- Whether this is a core component.


   .. py:method:: get_folder_name_for_addon_and_name(addon_type, name)
      :classmethod:


   .. py:method:: get_folder_name()


   .. py:property:: full_name
      :type: str


      Name with the component type (e.g. ner, linking, meta).


   .. py:attribute:: __slots__
      :value: ()


   .. py:attribute:: _is_protocol
      :value: True


   .. py:attribute:: _is_runtime_protocol
      :value: False


   .. py:method:: __init_subclass__(*args, **kwargs)
      :classmethod:


   .. py:method:: __class_getitem__(params)
      :classmethod:


.. py:class:: TokenizerWrapperBase(hf_tokenizer = None)

   Bases: :py:obj:`abc.ABC`


   Helper class that provides a standard way to create an ABC using
   inheritance.


   .. py:attribute:: name
      :type:  str


   .. py:method:: __init__(hf_tokenizer = None)


   .. py:attribute:: hf_tokenizers
      :value: None


   .. py:method:: __call__(text: str) -> dict
                  __call__(text: list[str]) -> list[dict]
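
      The two overloads above mean that a single string gives one dict and
      a list of strings gives a list of dicts. The sketch below shows that
      dispatch with a toy whitespace tokenizer. ``ToyTokenizer`` is
      hypothetical and does not subclass the real base class, and the
      ``input_ids`` / ``offset_mapping`` keys of the returned dict are an
      assumption.

      ```python
      from typing import Union


      class ToyTokenizer:
          """Hypothetical whitespace tokenizer; not a MedCAT class."""

          def __init__(self) -> None:
              self.vocab: dict[str, int] = {"<pad>": 0}

          def token_to_id(self, token: str) -> int:
              # Assign the next free id to tokens not seen before.
              return self.vocab.setdefault(token, len(self.vocab))

          def _encode(self, text: str) -> dict:
              input_ids, offsets, pos = [], [], 0
              for token in text.split():
                  start = text.index(token, pos)
                  pos = start + len(token)
                  input_ids.append(self.token_to_id(token))
                  offsets.append((start, pos))
              return {"input_ids": input_ids, "offset_mapping": offsets}

          def __call__(self, text: Union[str, list[str]]) -> Union[dict, list[dict]]:
              # A single string gives one dict; a list gives a list of dicts.
              if isinstance(text, str):
                  return self._encode(text)
              return [self._encode(t) for t in text]


      tok = ToyTokenizer()
      single = tok("no fever")
      many = tok(["no fever", "fever"])
      ```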


   .. py:method:: save(dir_path)
      :abstractmethod:


   .. py:method:: load(dir_path, model_variant = '', **kwargs)
      :classmethod:
      :abstractmethod:


   .. py:method:: get_size()
      :abstractmethod:


   .. py:method:: token_to_id(token)
      :abstractmethod:


   .. py:method:: get_pad_id()
      :abstractmethod:


   .. py:method:: ensure_tokenizer()


   .. py:attribute:: __slots__
      :value: ()


.. py:function:: load_tokenizer(config, tokenizer_folder)

.. py:class:: ConfigMetaCAT(/, **data)

   Bases: :py:obj:`medcat.config.config.ComponentConfig`


   The MetaCAT part of the config


   .. py:attribute:: comp_name
      :type:  str
      :value: 'meta_cat'


      The name of the component.

      If a custom implementation is required, it needs to be registered
      using `medcat.components.types.register_core_component(
      <core component type>, <component name>, <implementing class>)`.
      By default, only the 'default' component is registered.


   .. py:attribute:: general
      :type:  General


   .. py:attribute:: model
      :type:  Model


   .. py:attribute:: train
      :type:  Train


   .. py:class:: Config

      .. py:attribute:: extra
         :value: 'allow'


      .. py:attribute:: validate_assignment
         :value: True


   .. py:attribute:: _is_dirty
      :type:  bool
      :value: False


   .. py:method:: __setattr__(name, value)


   .. py:property:: is_dirty
      :type: bool


   .. py:method:: mark_clean()


   .. py:method:: get_strategy()


   .. py:method:: get_init_attrs()
      :classmethod:


   .. py:method:: ignore_attrs()
      :classmethod:


   .. py:method:: include_properties()
      :classmethod:


   .. py:method:: merge_config(other)

      Merge this config with another config's (partial) model dump.

      The expectation is that the `other` dict is a partial model dump.
      Values specified there are overwritten into the current config.
      Values not specified there are left intact.

      The `other` config can have keys/values that do not exist in the
      config or sub-config; they will be added where possible.

      :param other: The model dump
      :type other: dict

      :raises IncorrectConfigValues: If unable to set the attribute,
          trying to set incorrect value, or trying to set sub-config
          values in an incorrect format (non-dict).
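
      The merge semantics described above can be sketched on plain nested
      dicts. This is an illustration, not the MedCAT implementation: the
      helper ``merge_partial`` is hypothetical, it raises ``ValueError``
      where the real method raises ``IncorrectConfigValues``, and the keys
      used are made up for the example.

      ```python
      def merge_partial(current: dict, other: dict) -> None:
          """Merge ``other`` (a partial dump) into ``current`` in place."""
          for key, value in other.items():
              existing = current.get(key)
              if isinstance(existing, dict):
                  if not isinstance(value, dict):
                      # Sub-config values must be given as a dict.
                      raise ValueError(f"Expected a dict for sub-config {key!r}")
                  merge_partial(existing, value)
              else:
                  # Overwrite an existing value or add a new key.
                  current[key] = value


      config = {"general": {"seed": 13, "device": "cpu"}, "train": {"nepochs": 50}}
      merge_partial(config, {"general": {"device": "cuda"}, "train": {"extra_option": True}})
      ```

      After the call, ``device`` is overwritten, ``seed`` and ``nepochs``
      are left intact, and ``extra_option`` is added as a new key.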


   .. py:method:: load(path)
      :classmethod:


   .. py:attribute:: model_config
      :type:  ClassVar[pydantic.config.ConfigDict]

      Configuration for the model, should be a dictionary conforming to :py:obj:`pydantic.config.ConfigDict`.


   .. py:attribute:: model_fields
      :type:  ClassVar[Dict[str, pydantic.fields.FieldInfo]]

      Metadata about the fields defined on the model,
      mapping of field names to :py:obj:`pydantic.fields.FieldInfo` objects.

      This replaces `Model.__fields__` from Pydantic V1.


   .. py:attribute:: model_computed_fields
      :type:  ClassVar[Dict[str, pydantic.fields.ComputedFieldInfo]]

      A dictionary of computed field names and their corresponding `ComputedFieldInfo` objects.


   .. py:attribute:: __class_vars__
      :type:  ClassVar[set[str]]

      The names of the class variables defined on the model.


   .. py:attribute:: __private_attributes__
      :type:  ClassVar[Dict[str, pydantic.fields.ModelPrivateAttr]]

      Metadata about the private attributes of the model.


   .. py:attribute:: __signature__
      :type:  ClassVar[inspect.Signature]

      The synthesized `__init__` :py:obj:`inspect.Signature` of the model.


   .. py:attribute:: __pydantic_complete__
      :type:  ClassVar[bool]
      :value: False


      Whether model building is completed, or if there are still undefined fields.


   .. py:attribute:: __pydantic_core_schema__
      :type:  ClassVar[pydantic_core.CoreSchema]

      The core schema of the model.


   .. py:attribute:: __pydantic_custom_init__
      :type:  ClassVar[bool]

      Whether the model has a custom `__init__` method.


   .. py:attribute:: __pydantic_decorators__
      :type:  ClassVar[pydantic._internal._decorators.DecoratorInfos]

      Metadata containing the decorators defined on the model.
      This replaces `Model.__validators__` and `Model.__root_validators__` from Pydantic V1.


   .. py:attribute:: __pydantic_generic_metadata__
      :type:  ClassVar[pydantic._internal._generics.PydanticGenericMetadata]

      Metadata for generic models; contains data used for a similar purpose to
      __args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these.


   .. py:attribute:: __pydantic_parent_namespace__
      :type:  ClassVar[Dict[str, Any] | None]
      :value: None


      Parent namespace of the model, used for automatic rebuilding of models.


   .. py:attribute:: __pydantic_post_init__
      :type:  ClassVar[None | Literal['model_post_init']]

      The name of the post-init method for the model, if defined.


   .. py:attribute:: __pydantic_root_model__
      :type:  ClassVar[bool]
      :value: False


      Whether the model is a :py:obj:`pydantic.root_model.RootModel`.


   .. py:attribute:: __pydantic_serializer__
      :type:  ClassVar[pydantic_core.SchemaSerializer]

      The `pydantic-core` `SchemaSerializer` used to dump instances of the model.


   .. py:attribute:: __pydantic_validator__
      :type:  ClassVar[pydantic_core.SchemaValidator | pydantic.plugin._schema_validator.PluggableSchemaValidator]

      The `pydantic-core` `SchemaValidator` used to validate instances of the model.


   .. py:attribute:: __pydantic_extra__
      :type:  dict[str, Any] | None

      A dictionary containing extra values, if :py:obj:`pydantic.config.ConfigDict.extra` is set to `'allow'`.


   .. py:attribute:: __pydantic_fields_set__
      :type:  set[str]

      The names of fields explicitly set during instantiation.


   .. py:attribute:: __pydantic_private__
      :type:  dict[str, Any] | None

      Values of private attributes set on the model instance.


   .. py:attribute:: __slots__
      :value: ('__dict__', '__pydantic_fields_set__', '__pydantic_extra__', '__pydantic_private__')


   .. py:method:: __init__(/, **data)

      Create a new model by parsing and validating input data from keyword arguments.

      Raises :py:obj:`pydantic_core.ValidationError` if the input data cannot be
      validated to form a valid model.

      `self` is explicitly positional-only to allow `self` as a field name.


   .. py:property:: model_extra
      :type: dict[str, Any] | None


      Get extra fields set during validation.

      :Returns: **A dictionary of extra fields, or `None` if `config.extra` is not set to `"allow"`.**


   .. py:property:: model_fields_set
      :type: set[str]


      Returns the set of fields that have been explicitly set on this model instance.

      :Returns: **A set of strings representing the fields that have been set, i.e. that were not filled from defaults.**


   .. py:method:: model_construct(_fields_set = None, **values)
      :classmethod:


      Creates a new instance of the `Model` class with validated data.

      Creates a new model setting `__dict__` and `__pydantic_fields_set__` from trusted or pre-validated data.
      Default values are respected, but no other validation is performed.

      .. note::

         `model_construct()` generally respects the `model_config.extra` setting on the provided model.
         That is, if `model_config.extra == 'allow'`, then all extra passed values are added to the model instance's `__dict__`
         and `__pydantic_extra__` fields. If `model_config.extra == 'ignore'` (the default), then all extra passed values are ignored.
         Because no validation is performed with a call to `model_construct()`, having `model_config.extra == 'forbid'` does not result in
         an error if extra values are passed, but they will be ignored.

      :param _fields_set: A set of field names that were originally explicitly set during instantiation. If provided,
                          this is directly used for the :py:obj:`pydantic.BaseModel.model_fields_set` attribute.
                          Otherwise, the field names from the `values` argument will be used.
      :param values: Trusted or pre-validated data dictionary.

      :Returns: **A new instance of the `Model` class with validated data.**


   .. py:method:: model_copy(*, update = None, deep = False)

      Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#model_copy

      Returns a copy of the model.

      :param update: Values to change/add in the new model. Note: the data is not validated
                     before creating the new model. You should trust this data.
      :param deep: Set to `True` to make a deep copy of the model.

      :Returns: **New model instance.**


   .. py:method:: model_dump(*, mode = 'python', include = None, exclude = None, context = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False, round_trip = False, warnings = True, serialize_as_any = False)

      Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#modelmodel_dump

      Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

      :param mode: The mode in which `to_python` should run.
                   If mode is 'json', the output will only contain JSON serializable types.
                   If mode is 'python', the output may contain non-JSON-serializable Python objects.
      :param include: A set of fields to include in the output.
      :param exclude: A set of fields to exclude from the output.
      :param context: Additional context to pass to the serializer.
      :param by_alias: Whether to use the field's alias in the dictionary key if defined.
      :param exclude_unset: Whether to exclude fields that have not been explicitly set.
      :param exclude_defaults: Whether to exclude fields that are set to their default value.
      :param exclude_none: Whether to exclude fields that have a value of `None`.
      :param round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T].
      :param warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors,
                       "error" raises a :py:obj:`pydantic_core.PydanticSerializationError`.
      :param serialize_as_any: Whether to serialize fields with duck-typing serialization behavior.

      :Returns: **A dictionary representation of the model.**


   .. py:method:: model_dump_json(*, indent = None, include = None, exclude = None, context = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False, round_trip = False, warnings = True, serialize_as_any = False)

      Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#modelmodel_dump_json

      Generates a JSON representation of the model using Pydantic's `to_json` method.

      :param indent: Indentation to use in the JSON output. If None is passed, the output will be compact.
      :param include: Field(s) to include in the JSON output.
      :param exclude: Field(s) to exclude from the JSON output.
      :param context: Additional context to pass to the serializer.
      :param by_alias: Whether to serialize using field aliases.
      :param exclude_unset: Whether to exclude fields that have not been explicitly set.
      :param exclude_defaults: Whether to exclude fields that are set to their default value.
      :param exclude_none: Whether to exclude fields that have a value of `None`.
      :param round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T].
      :param warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors,
                       "error" raises a :py:obj:`pydantic_core.PydanticSerializationError`.
      :param serialize_as_any: Whether to serialize fields with duck-typing serialization behavior.

      :Returns: **A JSON string representation of the model.**


   .. py:method:: model_json_schema(by_alias = True, ref_template = DEFAULT_REF_TEMPLATE, schema_generator = GenerateJsonSchema, mode = 'validation')
      :classmethod:


      Generates a JSON schema for a model class.

      :param by_alias: Whether to use attribute aliases or not.
      :param ref_template: The reference template.
      :param schema_generator: To override the logic used to generate the JSON schema, as a subclass of
                               `GenerateJsonSchema` with your desired modifications
      :param mode: The mode in which to generate the schema.

      :Returns: **The JSON schema for the given model class.**


   .. py:method:: model_parametrized_name(params)
      :classmethod:


      Compute the class name for parametrizations of generic classes.

      This method can be overridden to achieve a custom naming scheme for generic BaseModels.

      :param params: Tuple of types of the class. Given a generic class
                     `Model` with 2 type variables and a concrete model `Model[str, int]`,
                     the value `(str, int)` would be passed to `params`.

      :Returns: **String representing the new class where `params` are passed to `cls` as type variables.**

      :raises TypeError: Raised when trying to generate concrete names for non-generic models.


   .. py:method:: model_post_init(__context)

      Override this method to perform additional initialization after `__init__` and `model_construct`.
      This is useful if you want to do some validation that requires the entire model to be initialized.


   .. py:method:: model_rebuild(*, force = False, raise_errors = True, _parent_namespace_depth = 2, _types_namespace = None)
      :classmethod:


      Try to rebuild the pydantic-core schema for the model.

      This may be necessary when one of the annotations is a ForwardRef which could not be resolved during
      the initial attempt to build the schema, and automatic rebuilding fails.

      :param force: Whether to force the rebuilding of the model schema, defaults to `False`.
      :param raise_errors: Whether to raise errors, defaults to `True`.
      :param _parent_namespace_depth: The depth level of the parent namespace, defaults to 2.
      :param _types_namespace: The types namespace, defaults to `None`.

      :Returns: * **Returns `None` if the schema is already "complete" and rebuilding was not required.**
                * **If rebuilding _was_ required, returns `True` if rebuilding was successful, otherwise `False`.**


   .. py:method:: model_validate(obj, *, strict = None, from_attributes = None, context = None)
      :classmethod:


      Validate a pydantic model instance.

      :param obj: The object to validate.
      :param strict: Whether to enforce types strictly.
      :param from_attributes: Whether to extract data from object attributes.
      :param context: Additional context to pass to the validator.

      :raises ValidationError: If the object could not be validated.

      :Returns: **The validated model instance.**


   .. py:method:: model_validate_json(json_data, *, strict = None, context = None)
      :classmethod:


      Usage docs: https://docs.pydantic.dev/2.9/concepts/json/#json-parsing

      Validate the given JSON data against the Pydantic model.

      :param json_data: The JSON data to validate.
      :param strict: Whether to enforce types strictly.
      :param context: Extra variables to pass to the validator.

      :Returns: **The validated Pydantic model.**

      :raises ValidationError: If `json_data` is not a JSON string or the object could not be validated.


   .. py:method:: model_validate_strings(obj, *, strict = None, context = None)
      :classmethod:


      Validate the given object with string data against the Pydantic model.

      :param obj: The object containing string data to validate.
      :param strict: Whether to enforce types strictly.
      :param context: Extra variables to pass to the validator.

      :Returns: **The validated Pydantic model.**


   .. py:method:: __get_pydantic_core_schema__(source, handler, /)
      :classmethod:


      Hook into generating the model's CoreSchema.

      :param source: The class we are generating a schema for.
                     This will generally be the same as the `cls` argument if this is a classmethod.
      :param handler: A callable that calls into Pydantic's internal CoreSchema generation logic.

      :Returns: **A `pydantic-core` `CoreSchema`.**


   .. py:method:: __get_pydantic_json_schema__(core_schema, handler, /)
      :classmethod:


      Hook into generating the model's JSON schema.

      :param core_schema: A `pydantic-core` CoreSchema.
                          You can ignore this argument and call the handler with a new CoreSchema,
                          wrap this CoreSchema (`{'type': 'nullable', 'schema': current_schema}`),
                          or just call the handler with the original schema.
      :param handler: Call into Pydantic's internal JSON schema generation.
                      This will raise a `pydantic.errors.PydanticInvalidForJsonSchema` if JSON schema
                      generation fails.
                      Since this gets called by `BaseModel.model_json_schema` you can override the
                      `schema_generator` argument to that function to change JSON schema generation globally
                      for a type.

      :Returns: **A JSON schema, as a Python object.**


   .. py:method:: __pydantic_init_subclass__(**kwargs)
      :classmethod:


      This is intended to behave just like `__init_subclass__`, but is called by `ModelMetaclass`
      only after the class is actually fully initialized. In particular, attributes like `model_fields` will
      be present when this is called.

      This is necessary because `__init_subclass__` will always be called by `type.__new__`,
      and it would require a prohibitively large refactor to the `ModelMetaclass` to ensure that
      `type.__new__` was called in such a manner that the class would already be sufficiently initialized.

      This will receive the same `kwargs` that would be passed to the standard `__init_subclass__`, namely,
      any kwargs passed to the class definition that aren't used internally by pydantic.

      :param \*\*kwargs: Any keyword arguments passed to the class definition that aren't used internally
                         by pydantic.


   .. py:method:: __class_getitem__(typevar_values)
      :classmethod:


   .. py:method:: __copy__()

      Returns a shallow copy of the model.


   .. py:method:: __deepcopy__(memo = None)

      Returns a deep copy of the model.


   .. py:method:: __getattr__(item)


   .. py:method:: _check_frozen(name, value)


   .. py:method:: __getstate__()


   .. py:method:: __setstate__(state)


   .. py:method:: __eq__(other)


   .. py:method:: __init_subclass__(**kwargs)
      :classmethod:


      This signature is included purely to help type-checkers check arguments to class declaration, which
      provides a way to conveniently set model_config key/value pairs.

      ```py
      from pydantic import BaseModel

      class MyModel(BaseModel, extra='allow'): ...
      ```

      However, this may be misleading, since the *actual* calls to `__init_subclass__` will not receive any
      of the config arguments; they receive only the keyword arguments passed in the class definition
      that are *not* expected keys in `ConfigDict`. (This is due to the way `ModelMetaclass.__new__` works.)

      :param \*\*kwargs: Keyword arguments passed to the class definition, which set model_config

      .. note::

         You may want to override `__pydantic_init_subclass__` instead, which behaves similarly but is called
         *after* the class is fully initialized.


   .. py:method:: __iter__()

      Iterate over `(field name, value)` pairs, so that `dict(model)` works.
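
      The mechanism can be sketched in plain Python: `dict()` accepts any iterable of key/value
      pairs, so a class whose `__iter__` yields such pairs can be passed to it directly. The
      `Settings` class and its values below are hypothetical stand-ins and do not use pydantic.

      ```python
      class Settings:
          """Plain-Python stand-in for a model (hypothetical, not pydantic)."""

          def __init__(self, **values):
              # Store every keyword argument as an instance attribute.
              self.__dict__.update(values)

          def __iter__(self):
              # Yield (name, value) pairs, which is the shape dict() expects.
              yield from self.__dict__.items()


      settings = Settings(model_name="lstm", batch_size=16)
      as_dict = dict(settings)
      ```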


   .. py:method:: __repr__()


   .. py:method:: __repr_args__()


   .. py:attribute:: __repr_name__


   .. py:attribute:: __repr_str__


   .. py:attribute:: __pretty__


   .. py:attribute:: __rich_repr__


   .. py:method:: __str__()


   .. py:property:: __fields__
      :type: dict[str, pydantic.fields.FieldInfo]


   .. py:property:: __fields_set__
      :type: set[str]


   .. py:method:: dict(*, include = None, exclude = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False)


   .. py:method:: json(*, include = None, exclude = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False, encoder = PydanticUndefined, models_as_dict = PydanticUndefined, **dumps_kwargs)


   .. py:method:: parse_obj(obj)
      :classmethod:


   .. py:method:: parse_raw(b, *, content_type = None, encoding = 'utf8', proto = None, allow_pickle = False)
      :classmethod:


   .. py:method:: parse_file(path, *, content_type = None, encoding = 'utf8', proto = None, allow_pickle = False)
      :classmethod:


   .. py:method:: from_orm(obj)
      :classmethod:


   .. py:method:: construct(_fields_set = None, **values)
      :classmethod:


   .. py:method:: copy(*, include = None, exclude = None, update = None, deep = False)

      Returns a copy of the model.

      .. warning::

         Deprecated: this method is now deprecated; use `model_copy` instead.

      If you need `include` or `exclude`, use:

      ```py
      data = self.model_dump(include=include, exclude=exclude, round_trip=True)
      data = {**data, **(update or {})}
      copied = self.model_validate(data)
      ```

      :param include: Optional set or mapping specifying which fields to include in the copied model.
      :param exclude: Optional set or mapping specifying which fields to exclude in the copied model.
      :param update: Optional dictionary of field-value pairs to override field values in the copied model.
      :param deep: If True, the values of fields that are Pydantic models will be deep-copied.

      :Returns: **A copy of the model with included, excluded and updated fields as specified.**


   .. py:method:: schema(by_alias = True, ref_template = DEFAULT_REF_TEMPLATE)
      :classmethod:


   .. py:method:: schema_json(*, by_alias = True, ref_template = DEFAULT_REF_TEMPLATE, **dumps_kwargs)
      :classmethod:


   .. py:method:: validate(value)
      :classmethod:


   .. py:method:: update_forward_refs(**localns)
      :classmethod:


   .. py:method:: _iter(*args, **kwargs)


   .. py:method:: _copy_and_set_values(*args, **kwargs)


   .. py:method:: _get_value(*args, **kwargs)
      :classmethod:


   .. py:method:: _calculate_keys(*args, **kwargs)


.. py:class:: BaseTokenizer

   Bases: :py:obj:`Protocol`


   The base tokenizer protocol.
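
   Because ``BaseTokenizer`` is a ``Protocol``, a tokenizer conforms by providing the listed
   methods; it does not have to inherit from the protocol. The sketch below illustrates this
   structural typing with a reduced, hypothetical protocol ``TokenizerLike`` that declares only
   ``__call__``, and with a simplified return type. ``WhitespaceTokenizer``, ``count_tokens`` and
   ``supports_isinstance`` are also illustration-only names. It also shows that a protocol not
   marked ``@runtime_checkable`` rejects ``isinstance()`` checks, which is the behaviour to
   expect when ``_is_runtime_protocol`` is ``False``.

   ```python
   from typing import Protocol


   class TokenizerLike(Protocol):
       """Hypothetical, reduced stand-in for BaseTokenizer (only __call__)."""

       def __call__(self, text: str) -> list[str]: ...


   class WhitespaceTokenizer:
       """Conforms structurally; it does not inherit from TokenizerLike."""

       def __call__(self, text: str) -> list[str]:
           return text.split()


   def count_tokens(tokenizer: TokenizerLike, text: str) -> int:
       # A static type checker accepts WhitespaceTokenizer for this parameter.
       return len(tokenizer(text))


   def supports_isinstance(obj: object) -> bool:
       # Protocols without @runtime_checkable raise TypeError on isinstance().
       try:
           isinstance(obj, TokenizerLike)
       except TypeError:
           return False
       return True
   ```

   Conformance to such a protocol is therefore checked by a static type checker rather than
   at runtime.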


   .. py:method:: create_entity(doc, token_start_index, token_end_index, label)

      Create an entity from a document.

      :param doc: The document to use.
      :type doc: MutableDocument
      :param token_start_index: The token start index.
      :type token_start_index: int
      :param token_end_index: The token end index.
      :type token_end_index: int
      :param label: The label.
      :type label: str

      :Returns: **MutableEntity** -- The resulting entity.


   .. py:method:: entity_from_tokens(tokens)

      Get an entity from the list of tokens.

      :param tokens: List of tokens.
      :type tokens: list[MutableToken]

      :Returns: **MutableEntity** -- The resulting entity.


   .. py:method:: __call__(text)


   .. py:method:: create_new_tokenizer(config)
      :classmethod:


   .. py:method:: get_doc_class()

      Get the document implementation class used by the tokenizer.

      This can be used, e.g., to register addon paths.

      :Returns: **Type[MutableDocument]** -- The document class.


   .. py:method:: get_entity_class()

      Get the entity implementation class used by the tokenizer.

      :Returns: **Type[MutableEntity]** -- The entity class.


   .. py:attribute:: __slots__
      :value: ()


   .. py:attribute:: _is_protocol
      :value: True


   .. py:attribute:: _is_runtime_protocol
      :value: False


   .. py:method:: __init_subclass__(*args, **kwargs)
      :classmethod:


   .. py:method:: __class_getitem__(params)
      :classmethod:


.. py:function:: fix_old_style_cnf(data, remove = {'py/object', '__fields_set__', '__private_attribute_values__'}, take_from = 'py/state.__dict__')

.. py:data:: logger

.. py:function:: _load_legacy(config, save_dir_path)

.. py:function:: load_cnf(cnf_path)

.. py:function:: get_meta_cat_from_old(old_path, tokenizer)

   Convert a v1 MetaCAT folder to a v2 MetaCAT.

   :param old_path: The path to the v1 MetaCAT folder.
   :type old_path: str
   :param tokenizer: The tokenizer.
   :type tokenizer: BaseTokenizer

   :Returns: **MetaCATAddon** -- The v2 MetaCAT.
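
   A minimal usage sketch follows. The wrapper ``convert_legacy_meta_cat`` and its folder check
   are hypothetical additions for illustration; only the import path and the
   ``get_meta_cat_from_old(old_path, tokenizer)`` call come from this page. How the tokenizer is
   obtained depends on the surrounding pipeline and is left to the caller.

   ```python
   from pathlib import Path


   def convert_legacy_meta_cat(old_path, tokenizer):
       """Hypothetical wrapper: check the v1 folder exists, then convert it."""
       folder = Path(old_path)
       if not folder.is_dir():
           raise FileNotFoundError(f"v1 MetaCAT folder not found: {folder}")
       # Imported lazily so the path check runs even where medcat is not installed.
       from medcat.utils.legacy.convert_meta_cat import get_meta_cat_from_old

       # Returns the v2 MetaCATAddon, as documented above.
       return get_meta_cat_from_old(str(folder), tokenizer)
   ```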