medcat.components.addons.meta_cat.mctokenizers.tokenizers ========================================================= .. py:module:: medcat.components.addons.meta_cat.mctokenizers.tokenizers Attributes ---------- .. autoapisummary:: medcat.components.addons.meta_cat.mctokenizers.tokenizers.FAKE_TOKENIZER_PATH Classes ------- .. autoapisummary:: medcat.components.addons.meta_cat.mctokenizers.tokenizers.ConfigMetaCAT medcat.components.addons.meta_cat.mctokenizers.tokenizers.TokenizerWrapperBase Functions --------- .. autoapisummary:: medcat.components.addons.meta_cat.mctokenizers.tokenizers.init_tokenizer medcat.components.addons.meta_cat.mctokenizers.tokenizers.load_tokenizer Module Contents --------------- .. py:class:: ConfigMetaCAT(/, **data) Bases: :py:obj:`medcat.config.config.ComponentConfig` The MetaCAT part of the config .. py:attribute:: comp_name :type: str :value: 'meta_cat' The name of the component. If a custom implementation is required, it needs to be registered using `medcat.components.types.register_core_component( , , ) By default, only the 'default' component is registered. .. py:attribute:: general :type: General .. py:attribute:: model :type: Model .. py:attribute:: train :type: Train .. py:class:: Config .. py:attribute:: extra :value: 'allow' .. py:attribute:: validate_assignment :value: True .. py:attribute:: _is_dirty :type: bool :value: False .. py:method:: __setattr__(name, value) .. py:property:: is_dirty :type: bool .. py:method:: mark_clean() .. py:method:: get_strategy() .. py:method:: get_init_attrs() :classmethod: .. py:method:: ignore_attrs() :classmethod: .. py:method:: include_properties() :classmethod: .. py:method:: merge_config(other) Merge this config with another config's (partial) model dump. The exepctation is that the `other` dict is a partial model dump. Values specified there are overwritten into the current config. Values not specified there are left intact. The `other` config can have keys/values that do not exist in the config or sub-config. And they will be added where possible. :param other: The model dump :type other: dict :raises IncorrectConfigValues: If unable to set the attribute, trying to set incorrect value, or trying to set sub-config values in an incorrect format (non-dict). .. py:method:: load(path) :classmethod: .. py:attribute:: model_config :type: ClassVar[pydantic.config.ConfigDict] Configuration for the model, should be a dictionary conforming to [`ConfigDict`][pydantic.config.ConfigDict]. .. py:attribute:: model_fields :type: ClassVar[Dict[str, pydantic.fields.FieldInfo]] Metadata about the fields defined on the model, mapping of field names to [`FieldInfo`][pydantic.fields.FieldInfo] objects. This replaces `Model.__fields__` from Pydantic V1. .. py:attribute:: model_computed_fields :type: ClassVar[Dict[str, pydantic.fields.ComputedFieldInfo]] A dictionary of computed field names and their corresponding `ComputedFieldInfo` objects. .. py:attribute:: __class_vars__ :type: ClassVar[set[str]] The names of the class variables defined on the model. .. py:attribute:: __private_attributes__ :type: ClassVar[Dict[str, pydantic.fields.ModelPrivateAttr]] Metadata about the private attributes of the model. .. py:attribute:: __signature__ :type: ClassVar[inspect.Signature] The synthesized `__init__` [`Signature`][inspect.Signature] of the model. .. py:attribute:: __pydantic_complete__ :type: ClassVar[bool] :value: False Whether model building is completed, or if there are still undefined fields. .. py:attribute:: __pydantic_core_schema__ :type: ClassVar[pydantic_core.CoreSchema] The core schema of the model. .. py:attribute:: __pydantic_custom_init__ :type: ClassVar[bool] Whether the model has a custom `__init__` method. .. py:attribute:: __pydantic_decorators__ :type: ClassVar[pydantic._internal._decorators.DecoratorInfos] Metadata containing the decorators defined on the model. This replaces `Model.__validators__` and `Model.__root_validators__` from Pydantic V1. .. py:attribute:: __pydantic_generic_metadata__ :type: ClassVar[pydantic._internal._generics.PydanticGenericMetadata] Metadata for generic models; contains data used for a similar purpose to __args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these. .. py:attribute:: __pydantic_parent_namespace__ :type: ClassVar[Dict[str, Any] | None] :value: None Parent namespace of the model, used for automatic rebuilding of models. .. py:attribute:: __pydantic_post_init__ :type: ClassVar[None | Literal['model_post_init']] The name of the post-init method for the model, if defined. .. py:attribute:: __pydantic_root_model__ :type: ClassVar[bool] :value: False Whether the model is a [`RootModel`][pydantic.root_model.RootModel]. .. py:attribute:: __pydantic_serializer__ :type: ClassVar[pydantic_core.SchemaSerializer] The `pydantic-core` `SchemaSerializer` used to dump instances of the model. .. py:attribute:: __pydantic_validator__ :type: ClassVar[pydantic_core.SchemaValidator | pydantic.plugin._schema_validator.PluggableSchemaValidator] The `pydantic-core` `SchemaValidator` used to validate instances of the model. .. py:attribute:: __pydantic_extra__ :type: dict[str, Any] | None A dictionary containing extra values, if [`extra`][pydantic.config.ConfigDict.extra] is set to `'allow'`. .. py:attribute:: __pydantic_fields_set__ :type: set[str] The names of fields explicitly set during instantiation. .. py:attribute:: __pydantic_private__ :type: dict[str, Any] | None Values of private attributes set on the model instance. .. py:attribute:: __slots__ :value: ('__dict__', '__pydantic_fields_set__', '__pydantic_extra__', '__pydantic_private__') .. py:method:: __init__(/, **data) Create a new model by parsing and validating input data from keyword arguments. Raises [`ValidationError`][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model. `self` is explicitly positional-only to allow `self` as a field name. .. py:property:: model_extra :type: dict[str, Any] | None Get extra fields set during validation. :Returns: **A dictionary of extra fields, or `None` if `config.extra` is not set to `"allow"`.** .. py:property:: model_fields_set :type: set[str] Returns the set of fields that have been explicitly set on this model instance. :Returns: **A set of strings representing the fields that have been set,** -- i.e. that were not filled from defaults. .. py:method:: model_construct(_fields_set = None, **values) :classmethod: Creates a new instance of the `Model` class with validated data. Creates a new model setting `__dict__` and `__pydantic_fields_set__` from trusted or pre-validated data. Default values are respected, but no other validation is performed. !!! note `model_construct()` generally respects the `model_config.extra` setting on the provided model. That is, if `model_config.extra == 'allow'`, then all extra passed values are added to the model instance's `__dict__` and `__pydantic_extra__` fields. If `model_config.extra == 'ignore'` (the default), then all extra passed values are ignored. Because no validation is performed with a call to `model_construct()`, having `model_config.extra == 'forbid'` does not result in an error if extra values are passed, but they will be ignored. :param _fields_set: A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the [`model_fields_set`][pydantic.BaseModel.model_fields_set] attribute. Otherwise, the field names from the `values` argument will be used. :param values: Trusted or pre-validated data dictionary. :Returns: **A new instance of the `Model` class with validated data.** .. py:method:: model_copy(*, update = None, deep = False) Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#model_copy Returns a copy of the model. :param update: Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data. :param deep: Set to `True` to make a deep copy of the model. :Returns: **New model instance.** .. py:method:: model_dump(*, mode = 'python', include = None, exclude = None, context = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False, round_trip = False, warnings = True, serialize_as_any = False) Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#modelmodel_dump Generate a dictionary representation of the model, optionally specifying which fields to include or exclude. :param mode: The mode in which `to_python` should run. If mode is 'json', the output will only contain JSON serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects. :param include: A set of fields to include in the output. :param exclude: A set of fields to exclude from the output. :param context: Additional context to pass to the serializer. :param by_alias: Whether to use the field's alias in the dictionary key if defined. :param exclude_unset: Whether to exclude fields that have not been explicitly set. :param exclude_defaults: Whether to exclude fields that are set to their default value. :param exclude_none: Whether to exclude fields that have a value of `None`. :param round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T]. :param warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a [`PydanticSerializationError`][pydantic_core.PydanticSerializationError]. :param serialize_as_any: Whether to serialize fields with duck-typing serialization behavior. :Returns: **A dictionary representation of the model.** .. py:method:: model_dump_json(*, indent = None, include = None, exclude = None, context = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False, round_trip = False, warnings = True, serialize_as_any = False) Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#modelmodel_dump_json Generates a JSON representation of the model using Pydantic's `to_json` method. :param indent: Indentation to use in the JSON output. If None is passed, the output will be compact. :param include: Field(s) to include in the JSON output. :param exclude: Field(s) to exclude from the JSON output. :param context: Additional context to pass to the serializer. :param by_alias: Whether to serialize using field aliases. :param exclude_unset: Whether to exclude fields that have not been explicitly set. :param exclude_defaults: Whether to exclude fields that are set to their default value. :param exclude_none: Whether to exclude fields that have a value of `None`. :param round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T]. :param warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a [`PydanticSerializationError`][pydantic_core.PydanticSerializationError]. :param serialize_as_any: Whether to serialize fields with duck-typing serialization behavior. :Returns: **A JSON string representation of the model.** .. py:method:: model_json_schema(by_alias = True, ref_template = DEFAULT_REF_TEMPLATE, schema_generator = GenerateJsonSchema, mode = 'validation') :classmethod: Generates a JSON schema for a model class. :param by_alias: Whether to use attribute aliases or not. :param ref_template: The reference template. :param schema_generator: To override the logic used to generate the JSON schema, as a subclass of `GenerateJsonSchema` with your desired modifications :param mode: The mode in which to generate the schema. :Returns: **The JSON schema for the given model class.** .. py:method:: model_parametrized_name(params) :classmethod: Compute the class name for parametrizations of generic classes. This method can be overridden to achieve a custom naming scheme for generic BaseModels. :param params: Tuple of types of the class. Given a generic class `Model` with 2 type variables and a concrete model `Model[str, int]`, the value `(str, int)` would be passed to `params`. :Returns: **String representing the new class where `params` are passed to `cls` as type variables.** :raises TypeError: Raised when trying to generate concrete names for non-generic models. .. py:method:: model_post_init(__context) Override this method to perform additional initialization after `__init__` and `model_construct`. This is useful if you want to do some validation that requires the entire model to be initialized. .. py:method:: model_rebuild(*, force = False, raise_errors = True, _parent_namespace_depth = 2, _types_namespace = None) :classmethod: Try to rebuild the pydantic-core schema for the model. This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails. :param force: Whether to force the rebuilding of the model schema, defaults to `False`. :param raise_errors: Whether to raise errors, defaults to `True`. :param _parent_namespace_depth: The depth level of the parent namespace, defaults to 2. :param _types_namespace: The types namespace, defaults to `None`. :Returns: * **Returns `None` if the schema is already "complete" and rebuilding was not required.** * **If rebuilding _was_ required, returns `True` if rebuilding was successful, otherwise `False`.** .. py:method:: model_validate(obj, *, strict = None, from_attributes = None, context = None) :classmethod: Validate a pydantic model instance. :param obj: The object to validate. :param strict: Whether to enforce types strictly. :param from_attributes: Whether to extract data from object attributes. :param context: Additional context to pass to the validator. :raises ValidationError: If the object could not be validated. :Returns: **The validated model instance.** .. py:method:: model_validate_json(json_data, *, strict = None, context = None) :classmethod: Usage docs: https://docs.pydantic.dev/2.9/concepts/json/#json-parsing Validate the given JSON data against the Pydantic model. :param json_data: The JSON data to validate. :param strict: Whether to enforce types strictly. :param context: Extra variables to pass to the validator. :Returns: **The validated Pydantic model.** :raises ValidationError: If `json_data` is not a JSON string or the object could not be validated. .. py:method:: model_validate_strings(obj, *, strict = None, context = None) :classmethod: Validate the given object with string data against the Pydantic model. :param obj: The object containing string data to validate. :param strict: Whether to enforce types strictly. :param context: Extra variables to pass to the validator. :Returns: **The validated Pydantic model.** .. py:method:: __get_pydantic_core_schema__(source, handler, /) :classmethod: Hook into generating the model's CoreSchema. :param source: The class we are generating a schema for. This will generally be the same as the `cls` argument if this is a classmethod. :param handler: A callable that calls into Pydantic's internal CoreSchema generation logic. :Returns: **A `pydantic-core` `CoreSchema`.** .. py:method:: __get_pydantic_json_schema__(core_schema, handler, /) :classmethod: Hook into generating the model's JSON schema. :param core_schema: A `pydantic-core` CoreSchema. You can ignore this argument and call the handler with a new CoreSchema, wrap this CoreSchema (`{'type': 'nullable', 'schema': current_schema}`), or just call the handler with the original schema. :param handler: Call into Pydantic's internal JSON schema generation. This will raise a `pydantic.errors.PydanticInvalidForJsonSchema` if JSON schema generation fails. Since this gets called by `BaseModel.model_json_schema` you can override the `schema_generator` argument to that function to change JSON schema generation globally for a type. :Returns: **A JSON schema, as a Python object.** .. py:method:: __pydantic_init_subclass__(**kwargs) :classmethod: This is intended to behave just like `__init_subclass__`, but is called by `ModelMetaclass` only after the class is actually fully initialized. In particular, attributes like `model_fields` will be present when this is called. This is necessary because `__init_subclass__` will always be called by `type.__new__`, and it would require a prohibitively large refactor to the `ModelMetaclass` to ensure that `type.__new__` was called in such a manner that the class would already be sufficiently initialized. This will receive the same `kwargs` that would be passed to the standard `__init_subclass__`, namely, any kwargs passed to the class definition that aren't used internally by pydantic. :param \*\*kwargs: Any keyword arguments passed to the class definition that aren't used internally by pydantic. .. py:method:: __class_getitem__(typevar_values) :classmethod: .. py:method:: __copy__() Returns a shallow copy of the model. .. py:method:: __deepcopy__(memo = None) Returns a deep copy of the model. .. py:method:: __getattr__(item) .. py:method:: _check_frozen(name, value) .. py:method:: __getstate__() .. py:method:: __setstate__(state) .. py:method:: __eq__(other) .. py:method:: __init_subclass__(**kwargs) :classmethod: This signature is included purely to help type-checkers check arguments to class declaration, which provides a way to conveniently set model_config key/value pairs. ```py from pydantic import BaseModel class MyModel(BaseModel, extra='allow'): ... ``` However, this may be deceiving, since the _actual_ calls to `__init_subclass__` will not receive any of the config arguments, and will only receive any keyword arguments passed during class initialization that are _not_ expected keys in ConfigDict. (This is due to the way `ModelMetaclass.__new__` works.) :param \*\*kwargs: Keyword arguments passed to the class definition, which set model_config .. note:: You may want to override `__pydantic_init_subclass__` instead, which behaves similarly but is called *after* the class is fully initialized. .. py:method:: __iter__() So `dict(model)` works. .. py:method:: __repr__() .. py:method:: __repr_args__() .. py:attribute:: __repr_name__ .. py:attribute:: __repr_str__ .. py:attribute:: __pretty__ .. py:attribute:: __rich_repr__ .. py:method:: __str__() .. py:property:: __fields__ :type: dict[str, pydantic.fields.FieldInfo] .. py:property:: __fields_set__ :type: set[str] .. py:method:: dict(*, include = None, exclude = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False) .. py:method:: json(*, include = None, exclude = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False, encoder = PydanticUndefined, models_as_dict = PydanticUndefined, **dumps_kwargs) .. py:method:: parse_obj(obj) :classmethod: .. py:method:: parse_raw(b, *, content_type = None, encoding = 'utf8', proto = None, allow_pickle = False) :classmethod: .. py:method:: parse_file(path, *, content_type = None, encoding = 'utf8', proto = None, allow_pickle = False) :classmethod: .. py:method:: from_orm(obj) :classmethod: .. py:method:: construct(_fields_set = None, **values) :classmethod: .. py:method:: copy(*, include = None, exclude = None, update = None, deep = False) Returns a copy of the model. !!! warning "Deprecated" This method is now deprecated; use `model_copy` instead. If you need `include` or `exclude`, use: ```py data = self.model_dump(include=include, exclude=exclude, round_trip=True) data = {**data, **(update or {})} copied = self.model_validate(data) ``` :param include: Optional set or mapping specifying which fields to include in the copied model. :param exclude: Optional set or mapping specifying which fields to exclude in the copied model. :param update: Optional dictionary of field-value pairs to override field values in the copied model. :param deep: If True, the values of fields that are Pydantic models will be deep-copied. :Returns: **A copy of the model with included, excluded and updated fields as specified.** .. py:method:: schema(by_alias = True, ref_template = DEFAULT_REF_TEMPLATE) :classmethod: .. py:method:: schema_json(*, by_alias = True, ref_template = DEFAULT_REF_TEMPLATE, **dumps_kwargs) :classmethod: .. py:method:: validate(value) :classmethod: .. py:method:: update_forward_refs(**localns) :classmethod: .. py:method:: _iter(*args, **kwargs) .. py:method:: _copy_and_set_values(*args, **kwargs) .. py:method:: _get_value(*args, **kwargs) :classmethod: .. py:method:: _calculate_keys(*args, **kwargs) .. py:data:: FAKE_TOKENIZER_PATH :value: Multiline-String .. raw:: html
Show Value .. code-block:: python """# /fake-path-not-exist#/""" .. raw:: html
.. py:class:: TokenizerWrapperBase(hf_tokenizer = None) Bases: :py:obj:`abc.ABC` Helper class that provides a standard way to create an ABC using inheritance. .. py:attribute:: name :type: str .. py:method:: __init__(hf_tokenizer = None) .. py:attribute:: hf_tokenizers :value: None .. py:method:: __call__(text: str) -> dict __call__(text: list[str]) -> list[dict] .. py:method:: save(dir_path) :abstractmethod: .. py:method:: load(dir_path, model_variant = '', **kwargs) :classmethod: :abstractmethod: .. py:method:: get_size() :abstractmethod: .. py:method:: token_to_id(token) :abstractmethod: .. py:method:: get_pad_id() :abstractmethod: .. py:method:: ensure_tokenizer() .. py:attribute:: __slots__ :value: () .. py:function:: init_tokenizer(cnf) .. py:function:: load_tokenizer(config, tokenizer_folder)