medcat.components.addons.meta_cat.mctokenizers.bpe_tokenizer
============================================================

.. py:module:: medcat.components.addons.meta_cat.mctokenizers.bpe_tokenizer


Attributes
----------

.. autoapisummary::

   medcat.components.addons.meta_cat.mctokenizers.bpe_tokenizer.FAKE_TOKENIZER_PATH


Classes
-------

.. autoapisummary::

   medcat.components.addons.meta_cat.mctokenizers.bpe_tokenizer.TokenizerWrapperBase
   medcat.components.addons.meta_cat.mctokenizers.bpe_tokenizer.TokenizerWrapperBPE


Module Contents
---------------

.. py:class:: TokenizerWrapperBase(hf_tokenizer = None)

   Bases: :py:obj:`abc.ABC`

   Helper class that provides a standard way to create an ABC using
   inheritance.

   .. py:attribute:: name
      :type:  str

   .. py:method:: __init__(hf_tokenizer = None)

   .. py:attribute:: hf_tokenizers
      :value: None

   .. py:method:: __call__(text: str) -> dict
                  __call__(text: list[str]) -> list[dict]

   .. py:method:: save(dir_path)
      :abstractmethod:

   .. py:method:: load(dir_path, model_variant = '', **kwargs)
      :classmethod:
      :abstractmethod:

   .. py:method:: get_size()
      :abstractmethod:

   .. py:method:: token_to_id(token)
      :abstractmethod:

   .. py:method:: get_pad_id()
      :abstractmethod:

   .. py:method:: ensure_tokenizer()

   .. py:attribute:: __slots__
      :value: ()


.. py:data:: FAKE_TOKENIZER_PATH
   :value: Multiline-String

   .. code-block:: python

      """# /fake-path-not-exist#/"""

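
The abstract members listed for ``TokenizerWrapperBase`` follow the usual ``abc`` pattern: a subclass cannot be instantiated until every abstract method, including the abstract classmethod ``load``, has an implementation. The sketch below illustrates this with a hypothetical stand-in base (``SketchTokenizerBase``) and toy subclasses. None of these names or method bodies come from MedCAT; they are assumptions made only to keep the example self-contained.

.. code-block:: python

   from abc import ABC, abstractmethod


   class SketchTokenizerBase(ABC):
       """Hypothetical stand-in that mirrors the documented abstract members."""

       name: str

       def __init__(self, hf_tokenizer=None):
           self.hf_tokenizers = hf_tokenizer

       @abstractmethod
       def save(self, dir_path): ...

       @classmethod
       @abstractmethod
       def load(cls, dir_path, model_variant='', **kwargs): ...

       @abstractmethod
       def get_size(self): ...

       @abstractmethod
       def token_to_id(self, token): ...

       @abstractmethod
       def get_pad_id(self): ...


   class IncompleteWrapper(SketchTokenizerBase):
       """Implements only get_size, so the class is still abstract."""

       name = 'incomplete'

       def get_size(self):
           return 0


   class ToyWrapper(SketchTokenizerBase):
       """Implements every abstract member with a tiny in-memory vocabulary."""

       name = 'toy'

       def __init__(self, hf_tokenizer=None):
           super().__init__(hf_tokenizer)
           self.vocab = {'<pad>': 0, 'hello': 1}  # assumed toy vocabulary

       def save(self, dir_path):
           pass  # nothing to persist in this sketch

       @classmethod
       def load(cls, dir_path, model_variant='', **kwargs):
           return cls()  # ignores dir_path; a real wrapper would read files

       def get_size(self):
           return len(self.vocab)

       def token_to_id(self, token):
           return self.vocab[token]

       def get_pad_id(self):
           return self.vocab['<pad>']


   # Instantiating a subclass with missing abstract members raises TypeError.
   try:
       IncompleteWrapper()
       instantiated = True
   except TypeError:
       instantiated = False

   toy = ToyWrapper.load('unused-dir')

Here ``instantiated`` ends up ``False``, while ``ToyWrapper.load`` returns a usable instance because all abstract members are defined.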
.. py:class:: TokenizerWrapperBPE(hf_tokenizers = None)

   Bases: :py:obj:`medcat.components.addons.meta_cat.mctokenizers.tokenizers.TokenizerWrapperBase`

   Wrapper around a huggingface tokenizer so that it works with the
   MetaCAT models.

   :param hf_tokenizers: A huggingface BBPE tokenizer.
   :type hf_tokenizers: tokenizers.ByteLevelBPETokenizer

   .. py:attribute:: name
      :value: 'bbpe'

   .. py:method:: __init__(hf_tokenizers = None)

   .. py:method:: __call__(text: str) -> dict
                  __call__(text: list[str]) -> list[dict]

      Tokenize a single text or a list of texts.

      :param text: Text or texts to be tokenized.
      :type text: Union[str, list[str]]

      :returns: A dictionary (for a single text) or a list of dictionaries
                (for a list of texts), each containing `offset_mapping`,
                `input_ids` and `tokens` for the corresponding input text.
      :rtype: Union[dict, list[dict]]

      :raises Exception: If the input is something other than text or a list of text.

   .. py:method:: save(dir_path)

   .. py:method:: load(dir_path, model_variant = '', **kwargs)
      :classmethod:

   .. py:method:: create_new()
      :classmethod:

   .. py:method:: get_size()

   .. py:method:: token_to_id(token)

   .. py:method:: get_pad_id()

   .. py:attribute:: hf_tokenizers
      :value: None

   .. py:method:: ensure_tokenizer()

   .. py:attribute:: __slots__
      :value: ()
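
The ``__call__`` contract documented above (a ``str`` yields one dictionary, a ``list[str]`` yields a list of dictionaries, anything else raises) can be sketched without MedCAT or a trained BPE model. The class below, ``ToyCallWrapper``, is a hypothetical stand-in that splits on whitespace instead of applying byte-level BPE. Only the three dictionary keys and the input/output shapes are taken from the documentation; the vocabulary handling and the error message are assumptions.

.. code-block:: python

   import re


   class ToyCallWrapper:
       """Hypothetical stand-in; NOT the MedCAT implementation."""

       def __init__(self):
           # Assumed toy vocabulary that grows as new tokens are seen.
           self.vocab = {'<unk>': 0}

       def _encode(self, text):
           tokens, offsets, ids = [], [], []
           # Whitespace splitting stands in for real byte-level BPE.
           for match in re.finditer(r'\S+', text):
               token = match.group()
               tokens.append(token)
               offsets.append((match.start(), match.end()))
               ids.append(self.vocab.setdefault(token, len(self.vocab)))
           return {
               'offset_mapping': offsets,
               'input_ids': ids,
               'tokens': tokens,
           }

       def __call__(self, text):
           if isinstance(text, str):
               return self._encode(text)
           if isinstance(text, list) and all(isinstance(t, str) for t in text):
               return [self._encode(t) for t in text]
           # The documentation only says "raises Exception"; message is assumed.
           raise Exception('Expected a string or a list of strings')


   wrapper = ToyCallWrapper()
   single = wrapper('chest pain')
   batch = wrapper(['chest pain', 'no pain'])

   try:
       wrapper(42)
       rejected = False
   except Exception:
       rejected = True

With this stand-in, ``single`` is one dictionary, ``batch`` is a list with one dictionary per input text, and the integer input is rejected. The real wrapper delegates tokenization to the wrapped huggingface tokenizer, so its tokens and ids will differ.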