medcat.components.addons.meta_cat.mctokenizers.bert_tokenizer

Attributes

FAKE_TOKENIZER_PATH

Classes

TokenizerWrapperBase

Helper class that provides a standard way to create an ABC using inheritance.

TokenizerWrapperBERT

Wrapper around a huggingface BERT tokenizer so that it works with the MetaCAT models.

Module Contents

class medcat.components.addons.meta_cat.mctokenizers.bert_tokenizer.TokenizerWrapperBase(hf_tokenizer=None)

Bases: abc.ABC

Helper class that provides a standard way to create an ABC using inheritance.

Parameters:

hf_tokenizer (Optional[tokenizers.Tokenizer])
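Because the class is built on abc.ABC, a subclass that leaves any abstract method unimplemented cannot be instantiated. Below is a minimal sketch of that mechanism. It uses a hypothetical, cut-down stand-in (MiniWrapperBase, with only two of the documented abstract methods) rather than importing MedCAT, so it shows the pattern and not the real class:

```python
import abc
from typing import Optional


class MiniWrapperBase(abc.ABC):
    """Hypothetical stand-in for TokenizerWrapperBase (two abstract methods only)."""

    @abc.abstractmethod
    def get_size(self) -> int:
        ...

    @abc.abstractmethod
    def get_pad_id(self) -> Optional[int]:
        ...


class Incomplete(MiniWrapperBase):
    # Implements get_size but forgets get_pad_id, so it stays abstract.
    def get_size(self) -> int:
        return 0


class Complete(MiniWrapperBase):
    # Implements every abstract method, so it can be instantiated.
    def get_size(self) -> int:
        return 3

    def get_pad_id(self) -> Optional[int]:
        return 0


def can_instantiate(cls: type) -> bool:
    """Return True if cls() succeeds, False if abc blocks it with a TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True


print(can_instantiate(Incomplete))  # False
print(can_instantiate(Complete))    # True
```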

name: str
__init__(hf_tokenizer=None)
Parameters:

hf_tokenizer (Optional[tokenizers.Tokenizer])

Return type:

None

hf_tokenizers = None
__call__(text: str) → dict
__call__(text: list[str]) → list[dict]
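The two __call__ signatures are typing overloads: a single string yields one dict, and a list of strings yields one dict per string. The toy whitespace tokenizer below is hypothetical and is not MedCAT code. It shows that dispatch. The dict keys input_ids, offset_mapping and tokens are an assumption modelled on what a BERT-style wrapper typically returns, so check the real output before relying on them:

```python
import re
from typing import Union, overload


class WhitespaceTokenizer:
    """Hypothetical toy tokenizer illustrating the str -> dict / list -> list[dict] overloads."""

    def __init__(self) -> None:
        self.vocab: dict[str, int] = {"[PAD]": 0}

    def _encode(self, text: str) -> dict:
        tokens, offsets, ids = [], [], []
        for match in re.finditer(r"\S+", text):
            token = match.group()
            # Grow the vocabulary on the fly (toy behaviour only).
            ids.append(self.vocab.setdefault(token, len(self.vocab)))
            tokens.append(token)
            # Character span of the token in the original text.
            offsets.append((match.start(), match.end()))
        return {"input_ids": ids, "offset_mapping": offsets, "tokens": tokens}

    @overload
    def __call__(self, text: str) -> dict: ...
    @overload
    def __call__(self, text: list[str]) -> list[dict]: ...

    def __call__(self, text: Union[str, list[str]]) -> Union[dict, list[dict]]:
        # Single string: one encoding. List of strings: one encoding per item.
        if isinstance(text, str):
            return self._encode(text)
        return [self._encode(t) for t in text]


tok = WhitespaceTokenizer()
print(tok("no chest pain")["tokens"])         # ['no', 'chest', 'pain']
print(len(tok(["no chest pain", "fever"])))   # 2
```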
abstract save(dir_path)
Parameters:

dir_path (str)

Return type:

None

abstract classmethod load(dir_path, model_variant='', **kwargs)

Parameters:
  • dir_path (str)

  • model_variant (Optional[str])

Return type:

tokenizers.Tokenizer
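The save/load pair is meant to round-trip a tokenizer through a directory. Below is a minimal sketch of that contract using a hypothetical JSON-vocabulary tokenizer. The sub-directory named after name and the vocab.json file are illustrative assumptions, not MedCAT's on-disk layout:

```python
import json
import os
import tempfile
from typing import Optional


class JsonVocabTokenizer:
    """Hypothetical toy wrapper showing a save(dir_path) / load(dir_path) round trip."""

    name = "json-vocab"  # hypothetical; used as the sub-directory name

    def __init__(self, vocab: Optional[dict[str, int]] = None) -> None:
        self.vocab = vocab or {}

    def save(self, dir_path: str) -> None:
        # Write the vocabulary to <dir_path>/<name>/vocab.json.
        target = os.path.join(dir_path, self.name)
        os.makedirs(target, exist_ok=True)
        with open(os.path.join(target, "vocab.json"), "w", encoding="utf-8") as fh:
            json.dump(self.vocab, fh)

    @classmethod
    def load(cls, dir_path: str, model_variant: Optional[str] = "", **kwargs) -> "JsonVocabTokenizer":
        # model_variant and kwargs only mirror the documented signature;
        # this toy version ignores them.
        with open(os.path.join(dir_path, cls.name, "vocab.json"), encoding="utf-8") as fh:
            return cls(json.load(fh))


with tempfile.TemporaryDirectory() as tmp:
    original = JsonVocabTokenizer({"[PAD]": 0, "fever": 1})
    original.save(tmp)
    restored = JsonVocabTokenizer.load(tmp)
    print(restored.vocab == original.vocab)  # True
```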

abstract get_size()
Return type:

int

abstract token_to_id(token)
Parameters:

token (str)

Return type:

Union[int, list[int]]

abstract get_pad_id()
Return type:

Union[Optional[int], list[int]]

ensure_tokenizer()
Return type:

tokenizers.Tokenizer
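hf_tokenizers defaults to None, but ensure_tokenizer() is typed to return a tokenizer rather than an Optional one. That typing suggests a guard that narrows the type. Below is a hypothetical sketch of that pattern. Holder is an illustrative name, and the ValueError is an assumption, since the real method may raise something else:

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class Holder(Generic[T]):
    """Hypothetical sketch of an ensure_tokenizer-style guard around an Optional attribute."""

    def __init__(self, hf_tokenizers: Optional[T] = None) -> None:
        self.hf_tokenizers = hf_tokenizers

    def ensure_tokenizer(self) -> T:
        # Narrow Optional[T] to T, failing loudly instead of returning None.
        # The exception type is an assumption made for this sketch.
        if self.hf_tokenizers is None:
            raise ValueError("No tokenizer has been set on this wrapper")
        return self.hf_tokenizers


print(Holder("my-tokenizer").ensure_tokenizer())  # my-tokenizer
```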

__slots__ = ()
medcat.components.addons.meta_cat.mctokenizers.bert_tokenizer.FAKE_TOKENIZER_PATH
Value (multiline string):
"""#
/fake-path-not-exist#/"""
class medcat.components.addons.meta_cat.mctokenizers.bert_tokenizer.TokenizerWrapperBERT(hf_tokenizers=None)

Bases: medcat.components.addons.meta_cat.mctokenizers.tokenizers.TokenizerWrapperBase

Wrapper around a huggingface BERT tokenizer so that it works with the MetaCAT models.

Parameters:
  • hf_tokenizers (Optional[transformers.models.bert.tokenization_bert_fast.BertTokenizerFast]) – A huggingface fast BERT tokenizer.
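Much of the wrapper's job is delegation: size, id and padding queries are answered by the wrapped fast tokenizer. Below is a minimal sketch that assumes this kind of forwarding. StubFastTokenizer and DelegatingWrapper are hypothetical names. The stub only mimics the vocab, pad_token_id and convert_tokens_to_ids members that huggingface fast tokenizers also expose, so the example runs without downloading a model:

```python
from typing import Optional


class StubFastTokenizer:
    """Hypothetical stand-in for a huggingface fast tokenizer (no download needed)."""

    def __init__(self, vocab: dict[str, int], pad_token: str = "[PAD]", unk_token: str = "[UNK]") -> None:
        self.vocab = vocab
        self.unk_token = unk_token
        # None when the vocabulary has no padding token.
        self.pad_token_id: Optional[int] = vocab.get(pad_token)

    def convert_tokens_to_ids(self, token: str) -> int:
        # Unknown tokens map to the [UNK] id.
        return self.vocab.get(token, self.vocab[self.unk_token])


class DelegatingWrapper:
    """Sketch of a wrapper that forwards vocabulary queries to the tokenizer it holds."""

    name = "bert-tokenizer"

    def __init__(self, hf_tokenizers: StubFastTokenizer) -> None:
        self.hf_tokenizers = hf_tokenizers

    def get_size(self) -> int:
        return len(self.hf_tokenizers.vocab)

    def token_to_id(self, token: str) -> int:
        return self.hf_tokenizers.convert_tokens_to_ids(token)

    def get_pad_id(self) -> Optional[int]:
        return self.hf_tokenizers.pad_token_id


wrapper = DelegatingWrapper(StubFastTokenizer({"[PAD]": 0, "[UNK]": 100, "fever": 2}))
print(wrapper.get_size(), wrapper.token_to_id("fever"), wrapper.get_pad_id())  # 3 2 0
```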

name = 'bert-tokenizer'
__init__(hf_tokenizers=None)
Parameters:

hf_tokenizers (Optional[transformers.models.bert.tokenization_bert_fast.BertTokenizerFast])

Return type:

None

__call__(text: str) → dict
__call__(text: list[str]) → list[dict]
save(dir_path)
Parameters:

dir_path (str)

Return type:

None

classmethod load(dir_path, model_variant='', **kwargs)
Parameters:
  • dir_path (str)

  • model_variant (Optional[str])

Return type:

TokenizerWrapperBERT

classmethod create_new(model_variant)
Parameters:

model_variant (Optional[str])

Return type:

TokenizerWrapperBERT
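load() takes both a directory and a model_variant, and create_new() builds a tokenizer from a model_variant alone. One plausible way to combine them is to load from disk when a saved tokenizer exists and otherwise build a fresh one from the variant. Below is a hypothetical sketch of that fallback. FallbackLoader is an illustrative name, and the check for a sub-directory named after name is an assumption, not MedCAT's confirmed behaviour:

```python
import os
import tempfile
from typing import Optional


class FallbackLoader:
    """Hypothetical sketch of load() falling back to create_new(model_variant)."""

    name = "bert-tokenizer"  # sub-directory a saved tokenizer is assumed to live in

    def __init__(self, source: str) -> None:
        # Records where this (stub) tokenizer came from, for illustration only.
        self.source = source

    @classmethod
    def create_new(cls, model_variant: Optional[str]) -> "FallbackLoader":
        # Stand-in for building a fresh tokenizer for a named pretrained variant.
        return cls(source=f"variant:{model_variant}")

    @classmethod
    def load(cls, dir_path: str, model_variant: Optional[str] = "", **kwargs) -> "FallbackLoader":
        saved_dir = os.path.join(dir_path, cls.name)
        if os.path.isdir(saved_dir):
            return cls(source=f"disk:{saved_dir}")
        # Nothing saved under dir_path: build from the pretrained variant instead.
        return cls.create_new(model_variant)


with tempfile.TemporaryDirectory() as tmp:
    print(FallbackLoader.load(tmp, model_variant="bert-base-uncased").source)  # variant:bert-base-uncased
    os.makedirs(os.path.join(tmp, FallbackLoader.name))
    print(FallbackLoader.load(tmp).source.startswith("disk:"))  # True
```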

get_size()
Return type:

int

token_to_id(token)
Parameters:

token (str)

Return type:

Union[int, list[int]]

get_pad_id()
Return type:

Optional[int]
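Because get_pad_id() is typed Optional[int], a caller that pads batches has to decide what to do when it returns None. Below is a minimal sketch of right-padding a batch of input ids. pad_batch is a hypothetical helper, and raising on a missing pad id is an assumption rather than documented MedCAT behaviour:

```python
from typing import Optional


def pad_batch(batch: list[list[int]], pad_id: Optional[int]) -> list[list[int]]:
    """Right-pad every id sequence to the length of the longest one in the batch.

    pad_id mirrors get_pad_id()'s Optional[int]; treating None as an error is
    an assumption made for this sketch.
    """
    if pad_id is None:
        raise ValueError("tokenizer defines no padding token")
    width = max((len(seq) for seq in batch), default=0)
    return [seq + [pad_id] * (width - len(seq)) for seq in batch]


print(pad_batch([[101, 7], [101]], pad_id=0))  # [[101, 7], [101, 0]]
```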

hf_tokenizers = None
ensure_tokenizer()
Return type:

tokenizers.Tokenizer

__slots__ = ()