medcat.components.addons.meta_cat.mctokenizers.bpe_tokenizer
Attributes
FAKE_TOKENIZER_PATH
Classes
TokenizerWrapperBase | Helper class that provides a standard way to create an ABC using inheritance.
TokenizerWrapperBPE | Wrapper around a huggingface tokenizer so that it works with the MetaCAT models.
Module Contents
- class medcat.components.addons.meta_cat.mctokenizers.bpe_tokenizer.TokenizerWrapperBase(hf_tokenizer=None)
Bases:
abc.ABC
Helper class that provides a standard way to create an ABC using inheritance.
- Parameters:
hf_tokenizer (Optional[tokenizers.Tokenizer])
- name: str
- __init__(hf_tokenizer=None)
- Parameters:
hf_tokenizer (Optional[tokenizers.Tokenizer])
- Return type:
None
- hf_tokenizers = None
- __call__(text: str) dict
- __call__(text: list[str]) list[dict]
- abstract save(dir_path)
- Parameters:
dir_path (str)
- Return type:
None
- abstract classmethod load(dir_path, model_variant='', **kwargs)
- Parameters:
dir_path (str)
model_variant (Optional[str])
- Return type:
tokenizers.Tokenizer
- abstract get_size()
- Return type:
int
- abstract token_to_id(token)
- Parameters:
token (str)
- Return type:
Union[int, list[int]]
- abstract get_pad_id()
- Return type:
Union[Optional[int], list[int]]
- ensure_tokenizer()
- Return type:
tokenizers.Tokenizer
- __slots__ = ()
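The abstract interface listed above can be illustrated with a minimal, self-contained sketch. It does not import MedCAT: `WrapperBaseSketch` is a local stand-in that mirrors the documented signatures, `ToyVocabWrapper` is a hypothetical subclass that uses a plain dict in place of a real tokenizer, and the `ValueError` raised by `ensure_tokenizer()` is an assumption about its behaviour, not something this page confirms.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional, Union


class WrapperBaseSketch(ABC):
    """Local stand-in mirroring the documented interface (not MedCAT code)."""

    name: str

    def __init__(self, hf_tokenizer: Optional[Any] = None) -> None:
        # The parameter is singular, the documented attribute is plural.
        self.hf_tokenizers = hf_tokenizer

    @abstractmethod
    def save(self, dir_path: str) -> None: ...

    @classmethod
    @abstractmethod
    def load(cls, dir_path: str, model_variant: Optional[str] = "", **kwargs) -> Any: ...

    @abstractmethod
    def get_size(self) -> int: ...

    @abstractmethod
    def token_to_id(self, token: str) -> Union[int, list[int]]: ...

    @abstractmethod
    def get_pad_id(self) -> Union[Optional[int], list[int]]: ...

    def ensure_tokenizer(self) -> Any:
        # Assumed behaviour: fail loudly when no tokenizer has been set.
        if self.hf_tokenizers is None:
            raise ValueError("The tokenizer is not loaded yet")
        return self.hf_tokenizers


class ToyVocabWrapper(WrapperBaseSketch):
    """Hypothetical subclass; a dict stands in for the wrapped tokenizer."""

    name = "toy"

    def save(self, dir_path: str) -> None:
        pass  # persistence is out of scope for this sketch

    @classmethod
    def load(cls, dir_path: str, model_variant: Optional[str] = "", **kwargs) -> Any:
        raise NotImplementedError("not needed for this sketch")

    def get_size(self) -> int:
        return len(self.ensure_tokenizer())

    def token_to_id(self, token: str) -> Union[int, list[int]]:
        return self.ensure_tokenizer()[token]

    def get_pad_id(self) -> Union[Optional[int], list[int]]:
        return self.ensure_tokenizer().get("<PAD>")


wrapper = ToyVocabWrapper({"<PAD>": 0, "fever": 1, "cough": 2})
print(wrapper.get_size(), wrapper.token_to_id("cough"), wrapper.get_pad_id())
```

Because every abstract method must be overridden, instantiating the base class directly raises `TypeError`; only a subclass that implements all of them can be created.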
- medcat.components.addons.meta_cat.mctokenizers.bpe_tokenizer.FAKE_TOKENIZER_PATH = Multiline-String
"""# /fake-path-not-exist#/"""
- class medcat.components.addons.meta_cat.mctokenizers.bpe_tokenizer.TokenizerWrapperBPE(hf_tokenizers=None)
Bases:
medcat.components.addons.meta_cat.mctokenizers.tokenizers.TokenizerWrapperBase
Wrapper around a huggingface tokenizer so that it works with the MetaCAT models.
- Parameters:
hf_tokenizers (Optional[tokenizers.ByteLevelBPETokenizer]) – A huggingface BBPE tokenizer.
- name = 'bbpe'
- __init__(hf_tokenizers=None)
- Parameters:
hf_tokenizers (Optional[tokenizers.ByteLevelBPETokenizer])
- Return type:
None
- __call__(text: str) dict
- __call__(text: list[str]) list[dict]
Tokenize a single text or a list of texts.
- Parameters:
text (Union[str, list[str]]) – Text/texts to be tokenized.
- Returns:
Union[dict, list[dict]] – A dictionary for a single input string, or a list of dictionaries for a list of strings. Each dictionary contains offset_mapping, input_ids and tokens for the corresponding input text.
- Raises:
Exception – If the input is neither a string nor a list of strings.
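The overloaded call contract described above (a string yields one dict, a list of strings yields a list of dicts, anything else raises) can be sketched without MedCAT or the `tokenizers` package. `ToyCallable` is a hypothetical class that splits on whitespace, so its tokens are whole words and not real byte-level BPE pieces; only the shape of the return value follows the documentation.

```python
import re
from typing import Union, overload


class ToyCallable:
    """Hypothetical whitespace tokenizer mimicking the documented return shape."""

    def __init__(self) -> None:
        self.vocab: dict[str, int] = {}

    def _encode_one(self, text: str) -> dict:
        tokens, offsets, ids = [], [], []
        for match in re.finditer(r"\S+", text):
            tokens.append(match.group())
            # Character span (start, end) of the token in the input text.
            offsets.append((match.start(), match.end()))
            # Assign ids in order of first appearance (toy vocabulary).
            ids.append(self.vocab.setdefault(match.group(), len(self.vocab)))
        return {"offset_mapping": offsets, "input_ids": ids, "tokens": tokens}

    @overload
    def __call__(self, text: str) -> dict: ...

    @overload
    def __call__(self, text: list[str]) -> list[dict]: ...

    def __call__(self, text: Union[str, list[str]]) -> Union[dict, list[dict]]:
        if isinstance(text, str):
            return self._encode_one(text)
        if isinstance(text, list):
            return [self._encode_one(item) for item in text]
        raise Exception("Unsupported input type, expected str or list[str]")


toy = ToyCallable()
single = toy("no fever")
batch = toy(["no fever", "fever"])
print(single["tokens"], single["offset_mapping"], single["input_ids"])
```

The `offset_mapping` entries let a caller map each token back to its character span in the original text.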
- save(dir_path)
- Parameters:
dir_path (str)
- Return type:
None
- classmethod load(dir_path, model_variant='', **kwargs)
- Parameters:
dir_path (str)
model_variant (Optional[str])
- Return type:
- classmethod create_new()
- get_size()
- Return type:
int
- token_to_id(token)
- Parameters:
token (str)
- Return type:
Union[int, list[int]]
- get_pad_id()
- Return type:
Union[int, list[int]]
- hf_tokenizers = None
- ensure_tokenizer()
- Return type:
tokenizers.Tokenizer
- __slots__ = ()
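Since `token_to_id()` and `get_pad_id()` are documented as returning either a single id or a list of ids (and `None` is possible for the base-class pad id), calling code may want to normalize the result before using it. The helper below is a hypothetical convenience, not part of MedCAT; it only relies on the documented return types.

```python
from typing import Optional, Union


def as_id_list(value: Union[Optional[int], list[int]]) -> list[int]:
    """Normalize a documented id return value to a list of ints."""
    if value is None:
        return []  # no id available, e.g. no pad token defined
    if isinstance(value, int):
        return [value]  # single id
    return list(value)  # already a list of ids; copy it


print(as_id_list(3), as_id_list([1, 2]), as_id_list(None))
```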