medcat.utils.legacy.convert_meta_cat ==================================== .. py:module:: medcat.utils.legacy.convert_meta_cat Attributes ---------- .. autoapisummary:: medcat.utils.legacy.convert_meta_cat.logger Classes ------- .. autoapisummary:: medcat.utils.legacy.convert_meta_cat.MetaCAT medcat.utils.legacy.convert_meta_cat.MetaCATAddon medcat.utils.legacy.convert_meta_cat.TokenizerWrapperBase medcat.utils.legacy.convert_meta_cat.ConfigMetaCAT medcat.utils.legacy.convert_meta_cat.BaseTokenizer Functions --------- .. autoapisummary:: medcat.utils.legacy.convert_meta_cat.load_tokenizer medcat.utils.legacy.convert_meta_cat.fix_old_style_cnf medcat.utils.legacy.convert_meta_cat._load_legacy medcat.utils.legacy.convert_meta_cat.load_cnf medcat.utils.legacy.convert_meta_cat.get_meta_cat_from_old Module Contents --------------- .. py:class:: MetaCAT(tokenizer = None, embeddings = None, config = None, _model_state_dict = None) Bases: :py:obj:`medcat.storage.serialisables.AbstractSerialisable` The MetaCAT class used for training 'Meta-Annotation' models, i.e. annotations of clinical concept annotations. These are also known as properties or attributes of recognised entities in similar tools such as MetaMap and cTAKES. This is a flexible, model-agnostic class that can learn any meta-annotation task, i.e. any multi-class classification task for recognised terms. :param tokenizer: The Huggingface tokenizer instance. This can be a pre-trained tokenizer instance from a BERT-style model, or trained from scratch for the Bi-LSTM (w. attention) model that is currently used in most deployments. :type tokenizer: TokenizerWrapperBase :param embeddings: Embedding mapping from (sub)word input id to n-dim (sub)word embedding. :type embeddings: Tensor, numpy.ndarray :param config: The configuration for MetaCAT. Param descriptions available in ConfigMetaCAT docs. :type config: ConfigMetaCAT .. py:attribute:: name :value: 'meta_cat' .. py:attribute:: _component_lock ..
py:method:: get_init_attrs() :classmethod: .. py:method:: ignore_attrs() :classmethod: .. py:method:: include_properties() :classmethod: .. py:property:: _model_state_dict .. py:method:: __init__(tokenizer = None, embeddings = None, config = None, _model_state_dict = None) .. py:attribute:: config :value: None .. py:attribute:: tokenizer :value: None .. py:attribute:: embeddings .. py:attribute:: model .. py:method:: _reset_tokenizer_info() .. py:method:: get_model(embeddings) Get the model. :param embeddings: The embedding tensor. :type embeddings: Optional[Tensor] :raises ValueError: If the meta model is not LSTM or BERT. :Returns: **nn.Module** -- The module. .. py:method:: get_hash() A partial hash trying to catch differences between models. :Returns: **str** -- The hex hash. .. py:method:: train_from_json(json_path, save_dir_path = None, data_oversampled = None, overwrite = False) Train or continue training a model given a json_path containing a MedCATtrainer export. It will continue training if an existing model is loaded or start new training if the model is blank/new. :param json_path: Path or list of paths to a MedCATtrainer export containing the meta_annotations we want to train for. :type json_path: Union[str, list] :param save_dir_path: If auto_save_model is enabled (meaning the best model is saved during training), a save path must be set. Defaults to `None`. :type save_dir_path: Optional[str] :param data_oversampled: If oversampling is performed, the oversampled data is passed in this parameter, allowing the model to be trained on original + synthetic data. :type data_oversampled: Optional[list] :param overwrite: Whether to allow overwriting the file if/when appropriate. :type overwrite: bool :Returns: **dict** -- The resulting report. .. py:method:: train_raw(data_loaded, save_dir_path = None, data_oversampled = None, overwrite = False) Train or continue training a model given raw data.
It will continue training if an existing model is loaded or start new training if the model is blank/new. The raw data is expected in the following format::

    {
        'projects': [  # list of projects
            {
                'name': '',
                'documents': [  # list of documents
                    {
                        'name': '',
                        'text': '',
                        'annotations': [  # list of annotations
                            {
                                'start': -1,  # start index of the annotation
                                'end': 1,  # end index of the annotation
                                'cui': 'cui',
                                'value': ''
                            },
                            ...
                        ],
                    },
                    ...
                ]
            },
            ...
        ]
    }

:param data_loaded: The raw data we want to train for. :type data_loaded: dict :param save_dir_path: If auto_save_model is enabled (meaning the best model is saved during training), a save path must be set. Defaults to `None`. :type save_dir_path: Optional[str] :param data_oversampled: If oversampling is performed, the oversampled data is passed in this parameter, allowing the model to be trained on original + synthetic data. The expected format is: [[['text','of','the','document'], [index of medical entity], "label"], [['text','of','the','document'], [index of medical entity], "label"]] :type data_oversampled: Optional[list] :param overwrite: Whether to allow overwriting the file if/when appropriate. :type overwrite: bool :Returns: **dict** -- The resulting report. :raises Exception: If no save path is specified, or category name not in data. :raises AssertionError: If no tokeniser is set. :raises FileNotFoundError: If phase_number is set to 2 and model.dat file is not found. :raises KeyError: If phase_number is set to 2 and model.dat file contains mismatched architecture. .. py:method:: eval(json_path) Evaluate from json. :param json_path: The json file path. :type json_path: str :Returns: **dict** -- The resulting model dict. :raises AssertionError: If self.tokenizer is not set. :raises Exception: If the category name does not exist. .. py:method:: get_ents(doc) .. py:method:: prepare_document(doc, input_ids, offset_mapping, lowercase) Prepares document.
:param doc: The document :type doc: Doc :param input_ids: Input ids :type input_ids: list :param offset_mapping: Offset mappings :type offset_mapping: list :param lowercase: Whether to use lower case replace center :type lowercase: bool :Returns: **tuple[dict, list]** -- Entity id to index mapping and Samples .. py:method:: batch_generator(stream, batch_size_chars) :staticmethod: Generator for batch of documents. :param stream: The document stream :type stream: Iterable[MutableDocument] :param batch_size_chars: Number of characters per batch :type batch_size_chars: int :Yields: *list[MutableDocument]* -- The batch of documents. .. py:method:: _set_meta_anns(doc, id2category_value) .. py:method:: __call__(doc) Process one document, used in the spacy pipeline for sequential document processing. :param doc: A spacy document :type doc: Doc :Returns: **Doc** -- The same spacy document. .. py:method:: get_model_card(as_dict = False) A minimal model card. :param as_dict: Return the model card as a dictionary instead of a str. Defaults to `False`. :type as_dict: bool :Returns: **Union[str, dict]** -- An indented JSON object. OR A JSON object in dict form. .. py:method:: __repr__() Prints the model_card for this MetaCAT instance. :Returns: * **the 'Model Card' for this MetaCAT instance. This includes NER+L** * **config and any MetaCATs** .. py:method:: get_strategy() .. py:method:: __eq__(other) .. py:class:: MetaCATAddon(config, base_tokenizer, meta_cat) Bases: :py:obj:`medcat.components.addons.addons.AddonComponent` Base/abstract addon component class. .. py:attribute:: addon_type :value: 'meta_cat' .. py:attribute:: output_key :value: 'meta_anns' .. py:attribute:: config :type: medcat.config.config_meta_cat.ConfigMetaCAT .. py:method:: __init__(config, base_tokenizer, meta_cat) .. py:attribute:: base_tokenizer .. py:attribute:: _mc .. py:attribute:: _name .. py:property:: mc :type: MetaCAT .. 
py:method:: create_new(config, base_tokenizer, tknzer_preprocessor = None) :classmethod: Factory method to create a new MetaCATAddon instance. .. py:method:: create_new_component(cnf, tokenizer, cdb, vocab, model_load_path) :classmethod: Create a new component, or load one from disk if a load path is provided. This may raise an exception if the wrong type of config is provided. :param cnf: The config relevant to this component. :type cnf: ComponentConfig :param tokenizer: The base tokenizer. :type tokenizer: BaseTokenizer :param cdb: The CDB. :type cdb: CDB :param vocab: The Vocab. :type vocab: Vocab :param model_load_path: Model load path (if present). :type model_load_path: Optional[str] :Returns: **Self** -- The new component. .. py:method:: load_existing(cnf, base_tokenizer, load_path) :classmethod: Factory method to load an existing MetaCATAddon from disk. .. py:property:: name :type: str The name of the component. .. py:method:: __call__(doc) .. py:method:: load(folder_path) .. py:method:: _load_tokenizer(config, tokenizer_folder) :classmethod: .. py:method:: _get_meta_cat_and_tokenizer_paths(folder_path) :classmethod: .. py:method:: save(folder_path) .. py:method:: _init_data_paths() .. py:property:: include_in_output :type: bool .. py:method:: get_output_key_val(ent) .. py:method:: serialise_to(folder_path) .. py:method:: deserialise_from(folder_path, **init_kwargs) :classmethod: .. py:method:: get_strategy() .. py:method:: get_init_attrs() :classmethod: .. py:method:: ignore_attrs() :classmethod: .. py:method:: include_properties() :classmethod: .. py:method:: get_hash() .. py:attribute:: NAME_PREFIX :type: str :value: 'addon_' .. py:attribute:: NAME_SPLITTER :type: str :value: '.' .. py:method:: is_core() Whether the component is a core component or not. :Returns: **bool** -- Whether this is a core component. .. py:method:: get_folder_name_for_addon_and_name(addon_type, name) :classmethod: .. py:method:: get_folder_name() ..
py:property:: full_name :type: str Name with the component type (e.g. ner, linking, meta). .. py:attribute:: __slots__ :value: () .. py:attribute:: _is_protocol :value: True .. py:attribute:: _is_runtime_protocol :value: False .. py:method:: __init_subclass__(*args, **kwargs) :classmethod: .. py:method:: __class_getitem__(params) :classmethod: .. py:class:: TokenizerWrapperBase(hf_tokenizer = None) Bases: :py:obj:`abc.ABC` Helper class that provides a standard way to create an ABC using inheritance. .. py:attribute:: name :type: str .. py:method:: __init__(hf_tokenizer = None) .. py:attribute:: hf_tokenizers :value: None .. py:method:: __call__(text: str) -> dict __call__(text: list[str]) -> list[dict] .. py:method:: save(dir_path) :abstractmethod: .. py:method:: load(dir_path, model_variant = '', **kwargs) :classmethod: :abstractmethod: .. py:method:: get_size() :abstractmethod: .. py:method:: token_to_id(token) :abstractmethod: .. py:method:: get_pad_id() :abstractmethod: .. py:method:: ensure_tokenizer() .. py:attribute:: __slots__ :value: () .. py:function:: load_tokenizer(config, tokenizer_folder) .. py:class:: ConfigMetaCAT(/, **data) Bases: :py:obj:`medcat.config.config.ComponentConfig` The MetaCAT part of the config. .. py:attribute:: comp_name :type: str :value: 'meta_cat' The name of the component. If a custom implementation is required, it needs to be registered using `medcat.components.types.register_core_component( , , )`. By default, only the 'default' component is registered. .. py:attribute:: general :type: General .. py:attribute:: model :type: Model .. py:attribute:: train :type: Train .. py:class:: Config .. py:attribute:: extra :value: 'allow' .. py:attribute:: validate_assignment :value: True .. py:attribute:: _is_dirty :type: bool :value: False .. py:method:: __setattr__(name, value) .. py:property:: is_dirty :type: bool .. py:method:: mark_clean() .. py:method:: get_strategy() .. py:method:: get_init_attrs() :classmethod: ..
py:method:: ignore_attrs() :classmethod: .. py:method:: include_properties() :classmethod: .. py:method:: merge_config(other) Merge this config with another config's (partial) model dump. The expectation is that the `other` dict is a partial model dump. Values specified there overwrite those in the current config. Values not specified there are left intact. The `other` config can have keys/values that do not exist in the config or sub-config; these are added where possible. :param other: The model dump. :type other: dict :raises IncorrectConfigValues: If unable to set the attribute, trying to set incorrect value, or trying to set sub-config values in an incorrect format (non-dict). .. py:method:: load(path) :classmethod: .. py:attribute:: model_config :type: ClassVar[pydantic.config.ConfigDict] Configuration for the model, should be a dictionary conforming to [`ConfigDict`][pydantic.config.ConfigDict]. .. py:attribute:: model_fields :type: ClassVar[Dict[str, pydantic.fields.FieldInfo]] Metadata about the fields defined on the model, mapping of field names to [`FieldInfo`][pydantic.fields.FieldInfo] objects. This replaces `Model.__fields__` from Pydantic V1. .. py:attribute:: model_computed_fields :type: ClassVar[Dict[str, pydantic.fields.ComputedFieldInfo]] A dictionary of computed field names and their corresponding `ComputedFieldInfo` objects. .. py:attribute:: __class_vars__ :type: ClassVar[set[str]] The names of the class variables defined on the model. .. py:attribute:: __private_attributes__ :type: ClassVar[Dict[str, pydantic.fields.ModelPrivateAttr]] Metadata about the private attributes of the model. .. py:attribute:: __signature__ :type: ClassVar[inspect.Signature] The synthesized `__init__` [`Signature`][inspect.Signature] of the model. .. py:attribute:: __pydantic_complete__ :type: ClassVar[bool] :value: False Whether model building is completed, or if there are still undefined fields. ..
py:attribute:: __pydantic_core_schema__ :type: ClassVar[pydantic_core.CoreSchema] The core schema of the model. .. py:attribute:: __pydantic_custom_init__ :type: ClassVar[bool] Whether the model has a custom `__init__` method. .. py:attribute:: __pydantic_decorators__ :type: ClassVar[pydantic._internal._decorators.DecoratorInfos] Metadata containing the decorators defined on the model. This replaces `Model.__validators__` and `Model.__root_validators__` from Pydantic V1. .. py:attribute:: __pydantic_generic_metadata__ :type: ClassVar[pydantic._internal._generics.PydanticGenericMetadata] Metadata for generic models; contains data used for a similar purpose to __args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these. .. py:attribute:: __pydantic_parent_namespace__ :type: ClassVar[Dict[str, Any] | None] :value: None Parent namespace of the model, used for automatic rebuilding of models. .. py:attribute:: __pydantic_post_init__ :type: ClassVar[None | Literal['model_post_init']] The name of the post-init method for the model, if defined. .. py:attribute:: __pydantic_root_model__ :type: ClassVar[bool] :value: False Whether the model is a [`RootModel`][pydantic.root_model.RootModel]. .. py:attribute:: __pydantic_serializer__ :type: ClassVar[pydantic_core.SchemaSerializer] The `pydantic-core` `SchemaSerializer` used to dump instances of the model. .. py:attribute:: __pydantic_validator__ :type: ClassVar[pydantic_core.SchemaValidator | pydantic.plugin._schema_validator.PluggableSchemaValidator] The `pydantic-core` `SchemaValidator` used to validate instances of the model. .. py:attribute:: __pydantic_extra__ :type: dict[str, Any] | None A dictionary containing extra values, if [`extra`][pydantic.config.ConfigDict.extra] is set to `'allow'`. .. py:attribute:: __pydantic_fields_set__ :type: set[str] The names of fields explicitly set during instantiation. .. 
py:attribute:: __pydantic_private__ :type: dict[str, Any] | None Values of private attributes set on the model instance. .. py:attribute:: __slots__ :value: ('__dict__', '__pydantic_fields_set__', '__pydantic_extra__', '__pydantic_private__') .. py:method:: __init__(/, **data) Create a new model by parsing and validating input data from keyword arguments. Raises [`ValidationError`][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model. `self` is explicitly positional-only to allow `self` as a field name. .. py:property:: model_extra :type: dict[str, Any] | None Get extra fields set during validation. :Returns: **A dictionary of extra fields, or `None` if `config.extra` is not set to `"allow"`.** .. py:property:: model_fields_set :type: set[str] Returns the set of fields that have been explicitly set on this model instance. :Returns: **A set of strings representing the fields that have been set,** -- i.e. that were not filled from defaults. .. py:method:: model_construct(_fields_set = None, **values) :classmethod: Creates a new instance of the `Model` class with validated data. Creates a new model setting `__dict__` and `__pydantic_fields_set__` from trusted or pre-validated data. Default values are respected, but no other validation is performed. !!! note `model_construct()` generally respects the `model_config.extra` setting on the provided model. That is, if `model_config.extra == 'allow'`, then all extra passed values are added to the model instance's `__dict__` and `__pydantic_extra__` fields. If `model_config.extra == 'ignore'` (the default), then all extra passed values are ignored. Because no validation is performed with a call to `model_construct()`, having `model_config.extra == 'forbid'` does not result in an error if extra values are passed, but they will be ignored. :param _fields_set: A set of field names that were originally explicitly set during instantiation. 
If provided, this is directly used for the [`model_fields_set`][pydantic.BaseModel.model_fields_set] attribute. Otherwise, the field names from the `values` argument will be used. :param values: Trusted or pre-validated data dictionary. :Returns: **A new instance of the `Model` class with validated data.** .. py:method:: model_copy(*, update = None, deep = False) Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#model_copy Returns a copy of the model. :param update: Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data. :param deep: Set to `True` to make a deep copy of the model. :Returns: **New model instance.** .. py:method:: model_dump(*, mode = 'python', include = None, exclude = None, context = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False, round_trip = False, warnings = True, serialize_as_any = False) Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#modelmodel_dump Generate a dictionary representation of the model, optionally specifying which fields to include or exclude. :param mode: The mode in which `to_python` should run. If mode is 'json', the output will only contain JSON serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects. :param include: A set of fields to include in the output. :param exclude: A set of fields to exclude from the output. :param context: Additional context to pass to the serializer. :param by_alias: Whether to use the field's alias in the dictionary key if defined. :param exclude_unset: Whether to exclude fields that have not been explicitly set. :param exclude_defaults: Whether to exclude fields that are set to their default value. :param exclude_none: Whether to exclude fields that have a value of `None`. :param round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T]. 
:param warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a [`PydanticSerializationError`][pydantic_core.PydanticSerializationError]. :param serialize_as_any: Whether to serialize fields with duck-typing serialization behavior. :Returns: **A dictionary representation of the model.** .. py:method:: model_dump_json(*, indent = None, include = None, exclude = None, context = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False, round_trip = False, warnings = True, serialize_as_any = False) Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#modelmodel_dump_json Generates a JSON representation of the model using Pydantic's `to_json` method. :param indent: Indentation to use in the JSON output. If None is passed, the output will be compact. :param include: Field(s) to include in the JSON output. :param exclude: Field(s) to exclude from the JSON output. :param context: Additional context to pass to the serializer. :param by_alias: Whether to serialize using field aliases. :param exclude_unset: Whether to exclude fields that have not been explicitly set. :param exclude_defaults: Whether to exclude fields that are set to their default value. :param exclude_none: Whether to exclude fields that have a value of `None`. :param round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T]. :param warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a [`PydanticSerializationError`][pydantic_core.PydanticSerializationError]. :param serialize_as_any: Whether to serialize fields with duck-typing serialization behavior. :Returns: **A JSON string representation of the model.** .. 
py:method:: model_json_schema(by_alias = True, ref_template = DEFAULT_REF_TEMPLATE, schema_generator = GenerateJsonSchema, mode = 'validation') :classmethod: Generates a JSON schema for a model class. :param by_alias: Whether to use attribute aliases or not. :param ref_template: The reference template. :param schema_generator: To override the logic used to generate the JSON schema, as a subclass of `GenerateJsonSchema` with your desired modifications :param mode: The mode in which to generate the schema. :Returns: **The JSON schema for the given model class.** .. py:method:: model_parametrized_name(params) :classmethod: Compute the class name for parametrizations of generic classes. This method can be overridden to achieve a custom naming scheme for generic BaseModels. :param params: Tuple of types of the class. Given a generic class `Model` with 2 type variables and a concrete model `Model[str, int]`, the value `(str, int)` would be passed to `params`. :Returns: **String representing the new class where `params` are passed to `cls` as type variables.** :raises TypeError: Raised when trying to generate concrete names for non-generic models. .. py:method:: model_post_init(__context) Override this method to perform additional initialization after `__init__` and `model_construct`. This is useful if you want to do some validation that requires the entire model to be initialized. .. py:method:: model_rebuild(*, force = False, raise_errors = True, _parent_namespace_depth = 2, _types_namespace = None) :classmethod: Try to rebuild the pydantic-core schema for the model. This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails. :param force: Whether to force the rebuilding of the model schema, defaults to `False`. :param raise_errors: Whether to raise errors, defaults to `True`. 
:param _parent_namespace_depth: The depth level of the parent namespace, defaults to 2. :param _types_namespace: The types namespace, defaults to `None`. :Returns: * **Returns `None` if the schema is already "complete" and rebuilding was not required.** * **If rebuilding _was_ required, returns `True` if rebuilding was successful, otherwise `False`.** .. py:method:: model_validate(obj, *, strict = None, from_attributes = None, context = None) :classmethod: Validate a pydantic model instance. :param obj: The object to validate. :param strict: Whether to enforce types strictly. :param from_attributes: Whether to extract data from object attributes. :param context: Additional context to pass to the validator. :raises ValidationError: If the object could not be validated. :Returns: **The validated model instance.** .. py:method:: model_validate_json(json_data, *, strict = None, context = None) :classmethod: Usage docs: https://docs.pydantic.dev/2.9/concepts/json/#json-parsing Validate the given JSON data against the Pydantic model. :param json_data: The JSON data to validate. :param strict: Whether to enforce types strictly. :param context: Extra variables to pass to the validator. :Returns: **The validated Pydantic model.** :raises ValidationError: If `json_data` is not a JSON string or the object could not be validated. .. py:method:: model_validate_strings(obj, *, strict = None, context = None) :classmethod: Validate the given object with string data against the Pydantic model. :param obj: The object containing string data to validate. :param strict: Whether to enforce types strictly. :param context: Extra variables to pass to the validator. :Returns: **The validated Pydantic model.** .. py:method:: __get_pydantic_core_schema__(source, handler, /) :classmethod: Hook into generating the model's CoreSchema. :param source: The class we are generating a schema for. This will generally be the same as the `cls` argument if this is a classmethod. 
:param handler: A callable that calls into Pydantic's internal CoreSchema generation logic. :Returns: **A `pydantic-core` `CoreSchema`.** .. py:method:: __get_pydantic_json_schema__(core_schema, handler, /) :classmethod: Hook into generating the model's JSON schema. :param core_schema: A `pydantic-core` CoreSchema. You can ignore this argument and call the handler with a new CoreSchema, wrap this CoreSchema (`{'type': 'nullable', 'schema': current_schema}`), or just call the handler with the original schema. :param handler: Call into Pydantic's internal JSON schema generation. This will raise a `pydantic.errors.PydanticInvalidForJsonSchema` if JSON schema generation fails. Since this gets called by `BaseModel.model_json_schema` you can override the `schema_generator` argument to that function to change JSON schema generation globally for a type. :Returns: **A JSON schema, as a Python object.** .. py:method:: __pydantic_init_subclass__(**kwargs) :classmethod: This is intended to behave just like `__init_subclass__`, but is called by `ModelMetaclass` only after the class is actually fully initialized. In particular, attributes like `model_fields` will be present when this is called. This is necessary because `__init_subclass__` will always be called by `type.__new__`, and it would require a prohibitively large refactor to the `ModelMetaclass` to ensure that `type.__new__` was called in such a manner that the class would already be sufficiently initialized. This will receive the same `kwargs` that would be passed to the standard `__init_subclass__`, namely, any kwargs passed to the class definition that aren't used internally by pydantic. :param \*\*kwargs: Any keyword arguments passed to the class definition that aren't used internally by pydantic. .. py:method:: __class_getitem__(typevar_values) :classmethod: .. py:method:: __copy__() Returns a shallow copy of the model. .. py:method:: __deepcopy__(memo = None) Returns a deep copy of the model. .. 
py:method:: __getattr__(item) .. py:method:: _check_frozen(name, value) .. py:method:: __getstate__() .. py:method:: __setstate__(state) .. py:method:: __eq__(other) .. py:method:: __init_subclass__(**kwargs) :classmethod: This signature is included purely to help type-checkers check arguments to class declaration, which provides a way to conveniently set model_config key/value pairs. ```py from pydantic import BaseModel class MyModel(BaseModel, extra='allow'): ... ``` However, this may be deceiving, since the _actual_ calls to `__init_subclass__` will not receive any of the config arguments, and will only receive any keyword arguments passed during class initialization that are _not_ expected keys in ConfigDict. (This is due to the way `ModelMetaclass.__new__` works.) :param \*\*kwargs: Keyword arguments passed to the class definition, which set model_config .. note:: You may want to override `__pydantic_init_subclass__` instead, which behaves similarly but is called *after* the class is fully initialized. .. py:method:: __iter__() So `dict(model)` works. .. py:method:: __repr__() .. py:method:: __repr_args__() .. py:attribute:: __repr_name__ .. py:attribute:: __repr_str__ .. py:attribute:: __pretty__ .. py:attribute:: __rich_repr__ .. py:method:: __str__() .. py:property:: __fields__ :type: dict[str, pydantic.fields.FieldInfo] .. py:property:: __fields_set__ :type: set[str] .. py:method:: dict(*, include = None, exclude = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False) .. py:method:: json(*, include = None, exclude = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False, encoder = PydanticUndefined, models_as_dict = PydanticUndefined, **dumps_kwargs) .. py:method:: parse_obj(obj) :classmethod: .. py:method:: parse_raw(b, *, content_type = None, encoding = 'utf8', proto = None, allow_pickle = False) :classmethod: .. 
py:method:: parse_file(path, *, content_type = None, encoding = 'utf8', proto = None, allow_pickle = False) :classmethod: .. py:method:: from_orm(obj) :classmethod: .. py:method:: construct(_fields_set = None, **values) :classmethod: .. py:method:: copy(*, include = None, exclude = None, update = None, deep = False) Returns a copy of the model. !!! warning "Deprecated" This method is now deprecated; use `model_copy` instead. If you need `include` or `exclude`, use: ```py data = self.model_dump(include=include, exclude=exclude, round_trip=True) data = {**data, **(update or {})} copied = self.model_validate(data) ``` :param include: Optional set or mapping specifying which fields to include in the copied model. :param exclude: Optional set or mapping specifying which fields to exclude in the copied model. :param update: Optional dictionary of field-value pairs to override field values in the copied model. :param deep: If True, the values of fields that are Pydantic models will be deep-copied. :Returns: **A copy of the model with included, excluded and updated fields as specified.** .. py:method:: schema(by_alias = True, ref_template = DEFAULT_REF_TEMPLATE) :classmethod: .. py:method:: schema_json(*, by_alias = True, ref_template = DEFAULT_REF_TEMPLATE, **dumps_kwargs) :classmethod: .. py:method:: validate(value) :classmethod: .. py:method:: update_forward_refs(**localns) :classmethod: .. py:method:: _iter(*args, **kwargs) .. py:method:: _copy_and_set_values(*args, **kwargs) .. py:method:: _get_value(*args, **kwargs) :classmethod: .. py:method:: _calculate_keys(*args, **kwargs) .. py:class:: BaseTokenizer Bases: :py:obj:`Protocol` The base tokenizer protocol. .. py:method:: create_entity(doc, token_start_index, token_end_index, label) Create an entity from a document. :param doc: The document to use. :type doc: MutableDocument :param token_start_index: The token start index. :type token_start_index: int :param token_end_index: The token end index. 
:type token_end_index: int :param label: The label. :type label: str :Returns: **MutableEntity** -- The resulting entity. .. py:method:: entity_from_tokens(tokens) Get an entity from the list of tokens. :param tokens: List of tokens. :type tokens: list[MutableToken] :Returns: **MutableEntity** -- The resulting entity. .. py:method:: __call__(text) .. py:method:: create_new_tokenizer(config) :classmethod: .. py:method:: get_doc_class() Get the document implementation class used by the tokenizer. This can be used (e.g) to register addon paths. :Returns: **Type[MutableDocument]** -- The document class. .. py:method:: get_entity_class() Get the entity implementation class used by the tokenizer. :Returns: **Type[MutableEntity]** -- The entity class. .. py:attribute:: __slots__ :value: () .. py:attribute:: _is_protocol :value: True .. py:attribute:: _is_runtime_protocol :value: False .. py:method:: __init_subclass__(*args, **kwargs) :classmethod: .. py:method:: __class_getitem__(params) :classmethod: .. py:function:: fix_old_style_cnf(data, remove = {'py/object', '__fields_set__', '__private_attribute_values__'}, take_from = 'py/state.__dict__') .. py:data:: logger .. py:function:: _load_legacy(config, save_dir_path) .. py:function:: load_cnf(cnf_path) .. py:function:: get_meta_cat_from_old(old_path, tokenizer) Convert a v1 MetaCAT folder to a v2 MetaCAT. :param old_path: The v1 MetaCAT file path. :type old_path: str :param tokenizer: The tokenizer. :type tokenizer: BaseTokenizer :Returns: **MetaCATAddon** -- The v2 MetaCAT.
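The ``data_loaded`` layout documented for ``MetaCAT.train_raw`` (projects, then documents, then annotations with ``start``, ``end``, ``cui`` and ``value``) is easy to get subtly wrong when it is assembled by hand. The following is a minimal sketch of a pre-flight check, not part of medcat: ``find_format_problems``, the example text, the placeholder CUI and the ``"Negated"`` value are all hypothetical, and the span check assumes that ``start`` and ``end`` are character offsets into ``text``.

```python
from typing import Any

# Hypothetical helper (not part of medcat): checks a dict against the layout
# documented for the ``data_loaded`` argument of ``MetaCAT.train_raw``.
REQUIRED_ANNOTATION_KEYS = {"start", "end", "cui", "value"}


def find_format_problems(data: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the layout matches."""
    problems: list[str] = []
    projects = data.get("projects")
    if not isinstance(projects, list):
        return ["top-level 'projects' must be a list"]
    for p_idx, project in enumerate(projects):
        for d_idx, document in enumerate(project.get("documents", [])):
            where = f"projects[{p_idx}].documents[{d_idx}]"
            text = document.get("text")
            if not isinstance(text, str):
                problems.append(f"{where}: 'text' must be a string")
                continue
            for a_idx, ann in enumerate(document.get("annotations", [])):
                ann_where = f"{where}.annotations[{a_idx}]"
                missing = REQUIRED_ANNOTATION_KEYS.difference(ann)
                if missing:
                    problems.append(f"{ann_where}: missing {sorted(missing)}")
                    continue
                # Assumption: start/end are character offsets into 'text'.
                if not 0 <= ann["start"] < ann["end"] <= len(text):
                    problems.append(f"{ann_where}: span outside text")
    return problems


# Illustrative data only; the CUI and value label are placeholders.
example_data = {
    "projects": [
        {
            "name": "demo-project",
            "documents": [
                {
                    "name": "doc-1",
                    "text": "Patient denies chest pain.",
                    "annotations": [
                        # Characters 15..25 cover "chest pain".
                        {"start": 15, "end": 25, "cui": "C0000000", "value": "Negated"},
                    ],
                }
            ],
        }
    ]
}

print(find_format_problems(example_data))  # → []
```

A check like this would run before handing the dict to ``train_raw``. It only validates the shape described in this reference, so it says nothing about whether the category name or labels match the loaded config.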