medcat2.components.addons.meta_cat.meta_cat_tokenizers
Attributes
Classes
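TokenizerWrapperBase – Helper class that provides a standard way to create an ABC using inheritance.
TokenizerWrapperBPE – Wrapper around a Hugging Face tokenizer so that it works with the MetaCAT models.
TokenizerWrapperBERT – Wrapper around a Hugging Face BERT tokenizer so that it works with the MetaCAT models.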
Module Contents
- medcat2.components.addons.meta_cat.meta_cat_tokenizers.FAKE_TOKENIZER_PATH = Multiline-String
"""# /fake-path-not-exist#/"""
- class medcat2.components.addons.meta_cat.meta_cat_tokenizers.TokenizerWrapperBase(hf_tokenizer=None)
Bases:
abc.ABC
Helper class that provides a standard way to create an ABC using inheritance.
- Parameters:
hf_tokenizer (Optional[tokenizers.Tokenizer])
- name: str
- __init__(hf_tokenizer=None)
- Parameters:
hf_tokenizer (Optional[tokenizers.Tokenizer])
- Return type:
None
- hf_tokenizers = None
- __call__(text: str) dict
- __call__(text: list[str]) list[dict]
- abstract save(dir_path)
- Parameters:
dir_path (str)
- Return type:
None
- abstract classmethod load(dir_path, model_variant='', **kwargs)
- Parameters:
dir_path (str)
model_variant (Optional[str])
- Return type:
tokenizers.Tokenizer
- abstract get_size()
- Return type:
int
- abstract token_to_id(token)
- Parameters:
token (str)
- Return type:
Union[int, list[int]]
- abstract get_pad_id()
- Return type:
Union[Optional[int], list[int]]
- ensure_tokenizer()
- Return type:
tokenizers.Tokenizer
- __slots__ = ()
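The abstract interface listed above can be illustrated with a minimal sketch. It deliberately does not import medcat2: `SketchTokenizerBase` and `ToyVocabTokenizer` are hypothetical stand-ins that only mirror the documented method names (`save`, `load`, `get_size`, `token_to_id`, `get_pad_id`), and the `vocab.json` file name and `<PAD>` token are assumptions made for the example, not details of the real wrappers.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Optional


class SketchTokenizerBase(ABC):
    """Hypothetical stand-in mirroring the documented abstract methods."""

    name: str

    @abstractmethod
    def save(self, dir_path: str) -> None: ...

    @classmethod
    @abstractmethod
    def load(cls, dir_path: str, model_variant: Optional[str] = "", **kwargs): ...

    @abstractmethod
    def get_size(self) -> int: ...

    @abstractmethod
    def token_to_id(self, token: str) -> int: ...

    @abstractmethod
    def get_pad_id(self) -> Optional[int]: ...


class ToyVocabTokenizer(SketchTokenizerBase):
    """Toy concrete subclass backed by a plain token -> id dictionary."""

    name = "toy-vocab"

    def __init__(self, vocab: dict[str, int]) -> None:
        self.vocab = dict(vocab)

    def save(self, dir_path: str) -> None:
        # Assumption: the vocabulary is stored as a single JSON file.
        os.makedirs(dir_path, exist_ok=True)
        with open(os.path.join(dir_path, "vocab.json"), "w") as f:
            json.dump(self.vocab, f)

    @classmethod
    def load(cls, dir_path: str, model_variant: Optional[str] = "", **kwargs):
        # model_variant is accepted for signature parity but unused here.
        with open(os.path.join(dir_path, "vocab.json")) as f:
            return cls(json.load(f))

    def get_size(self) -> int:
        return len(self.vocab)

    def token_to_id(self, token: str) -> int:
        return self.vocab[token]

    def get_pad_id(self) -> Optional[int]:
        # Returns None when the vocabulary has no padding token.
        return self.vocab.get("<PAD>")


# Round-trip the toy tokenizer through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    ToyVocabTokenizer({"<PAD>": 0, "fever": 1, "cough": 2}).save(tmp)
    restored = ToyVocabTokenizer.load(tmp)

print(restored.get_size(), restored.token_to_id("cough"), restored.get_pad_id())
```

Because every abstract method has to be overridden, instantiating the stand-in base class directly raises `TypeError`, which is the behaviour an `abc.ABC`-based design relies on.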
- class medcat2.components.addons.meta_cat.meta_cat_tokenizers.TokenizerWrapperBPE(hf_tokenizers=None)
Bases:
TokenizerWrapperBase
Wrapper around a Hugging Face tokenizer so that it works with the MetaCAT models.
- Parameters:
hf_tokenizers (Optional[tokenizers.ByteLevelBPETokenizer]) – A Hugging Face BBPE tokenizer.
- name = 'bbpe'
- __init__(hf_tokenizers=None)
- Parameters:
hf_tokenizers (Optional[tokenizers.ByteLevelBPETokenizer])
- Return type:
None
- __call__(text: str) dict
- __call__(text: list[str]) list[dict]
Tokenize a single text or a list of texts.
- Parameters:
text (Union[str, list[str]]) – Text/texts to be tokenized.
- Returns:
Union[dict, list[dict]] – A dictionary (for a single text) or a list of dictionaries (for a list of texts), each containing offset_mapping, input_ids and tokens for the corresponding input text.
- Raises:
Exception – If the input is something other than text or a list of text.
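The two `__call__` signatures and the documented return keys can be sketched as follows. This is a toy illustration only: `ToyCallTokenizer` is a hypothetical whitespace tokenizer, not the BBPE wrapper, and its on-the-fly id assignment and error message are assumptions. It shows the dispatch on `str` versus `list[str]` and the shape of the returned dictionaries.

```python
import re
from typing import Union, overload


class ToyCallTokenizer:
    """Hypothetical whitespace tokenizer; ids are assigned on first sight."""

    def __init__(self) -> None:
        self.vocab: dict[str, int] = {}

    @overload
    def __call__(self, text: str) -> dict: ...
    @overload
    def __call__(self, text: list[str]) -> list[dict]: ...

    def __call__(self, text: Union[str, list[str]]) -> Union[dict, list[dict]]:
        if isinstance(text, str):
            return self._encode(text)
        if isinstance(text, list):
            return [self._encode(t) for t in text]
        # Mirrors the documented behaviour of raising on other input types.
        raise Exception(f"Expected str or list[str], got {type(text).__name__}")

    def _encode(self, text: str) -> dict:
        # Each run of non-whitespace characters is one token.
        matches = list(re.finditer(r"\S+", text))
        tokens = [m.group() for m in matches]
        return {
            # (start, end) character offsets of every token in the input.
            "offset_mapping": [(m.start(), m.end()) for m in matches],
            # Unseen tokens get the next free id; seen tokens reuse theirs.
            "input_ids": [self.vocab.setdefault(t, len(self.vocab)) for t in tokens],
            "tokens": tokens,
        }


tok = ToyCallTokenizer()
single = tok("no fever")             # one dict
batch = tok(["no fever", "fever"])   # one dict per input text
print(single)
print(len(batch))
```

The offsets let a caller map each token back to its character span in the original text.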
- save(dir_path)
- Parameters:
dir_path (str)
- Return type:
None
- classmethod load(dir_path, model_variant='', **kwargs)
- Parameters:
dir_path (str)
model_variant (Optional[str])
- Return type:
- get_size()
- Return type:
int
- token_to_id(token)
- Parameters:
token (str)
- Return type:
Union[int, list[int]]
- get_pad_id()
- Return type:
Union[int, list[int]]
- hf_tokenizers = None
- ensure_tokenizer()
- Return type:
tokenizers.Tokenizer
- __slots__ = ()
- class medcat2.components.addons.meta_cat.meta_cat_tokenizers.TokenizerWrapperBERT(hf_tokenizers=None)
Bases:
TokenizerWrapperBase
Wrapper around a Hugging Face BERT tokenizer so that it works with the MetaCAT models.
- Parameters:
hf_tokenizers (Optional[transformers.models.bert.tokenization_bert_fast.BertTokenizerFast]) – A Hugging Face fast BERT tokenizer.
- name = 'bert-tokenizer'
- __init__(hf_tokenizers=None)
- Parameters:
hf_tokenizers (Optional[transformers.models.bert.tokenization_bert_fast.BertTokenizerFast])
- Return type:
None
- __call__(text: str) dict
- __call__(text: list[str]) list[dict]
- save(dir_path)
- Parameters:
dir_path (str)
- Return type:
None
- classmethod load(dir_path, model_variant='', **kwargs)
- Parameters:
dir_path (str)
model_variant (Optional[str])
- Return type:
- get_size()
- Return type:
int
- token_to_id(token)
- Parameters:
token (str)
- Return type:
Union[int, list[int]]
- get_pad_id()
- Return type:
Optional[int]
- hf_tokenizers = None
- ensure_tokenizer()
- Return type:
tokenizers.Tokenizer
- __slots__ = ()
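Since `get_pad_id()` is documented as returning `Optional[int]` for this wrapper, calling code may need to handle a missing pad id. The following is a minimal sketch of one way to do that when padding a batch of `input_ids`; `pad_batch` and its `fallback_pad_id` parameter are hypothetical helpers introduced for illustration and are not part of medcat2.

```python
from typing import Optional


def pad_batch(
    batch: list[list[int]],
    pad_id: Optional[int],
    fallback_pad_id: int = 0,
) -> list[list[int]]:
    """Right-pad every id sequence to the longest length in the batch."""
    if pad_id is None:
        # Assumption: the caller chooses a fallback when no pad id exists.
        pad_id = fallback_pad_id
    max_len = max((len(ids) for ids in batch), default=0)
    return [ids + [pad_id] * (max_len - len(ids)) for ids in batch]


padded = pad_batch([[5, 6, 7], [8]], pad_id=None)
print(padded)
```

Whether a silent fallback or an explicit error is more appropriate depends on the model being fed; the fallback here is only one possible choice.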