medcat.tokenizing.spacy_impl.tokens
===================================

.. py:module:: medcat.tokenizing.spacy_impl.tokens


Attributes
----------

.. autoapisummary::

   medcat.tokenizing.spacy_impl.tokens.logger


Exceptions
----------

.. autoapisummary::

   medcat.tokenizing.spacy_impl.tokens.UnregisteredDataPathException


Classes
-------

.. autoapisummary::

   medcat.tokenizing.spacy_impl.tokens.BaseToken
   medcat.tokenizing.spacy_impl.tokens.MutableToken
   medcat.tokenizing.spacy_impl.tokens.BaseEntity
   medcat.tokenizing.spacy_impl.tokens.MutableEntity
   medcat.tokenizing.spacy_impl.tokens.BaseDocument
   medcat.tokenizing.spacy_impl.tokens.Token
   medcat.tokenizing.spacy_impl.tokens.Entity
   medcat.tokenizing.spacy_impl.tokens.Document


Module Contents
---------------

.. py:class:: BaseToken

   Bases: :py:obj:`Protocol`

   Base token protocol.

   This represents the static (unchangeable) parts of a token.

   .. py:property:: text
      :type: str

      The text represented by this token.

   .. py:property:: lower
      :type: str

      The lower-case text representation.

   .. py:property:: text_versions
      :type: list[str]

      The different versions of the text (e.g. normalised and lower case).

   .. py:property:: is_upper
      :type: bool

      Whether the text is upper case.

   .. py:property:: is_stop
      :type: bool

      Whether the token is a stop word.

   .. py:property:: char_index
      :type: int

      The character index of the start of this token.

   .. py:property:: index
      :type: int

      The index (in terms of tokens) of this token in the document.

   .. py:property:: text_with_ws
      :type: str

      The text with trailing whitespace (where applicable).

   .. py:property:: is_digit
      :type: bool

      Whether the token represents a digit.

   .. py:attribute:: __slots__
      :value: ()

   .. py:attribute:: _is_protocol
      :value: True

   .. py:attribute:: _is_runtime_protocol
      :value: False

   .. py:method:: __init_subclass__(*args, **kwargs)
      :classmethod:

   .. py:method:: __class_getitem__(params)
      :classmethod:

.. py:class:: MutableToken

   Bases: :py:obj:`Protocol`

   The mutable part of a token.

   This protocol describes all the parts of a token that could be expected to change.

   .. py:property:: base
      :type: BaseToken

      The base portion of the token.

   .. py:property:: is_punctuation
      :type: bool

      Whether the token represents punctuation.

   .. py:property:: to_skip
      :type: bool

      Whether the token should be skipped.

   .. py:property:: lemma
      :type: str

      The lemmatised version of the text.

   .. py:property:: tag
      :type: Optional[str]

      An optional tag used for normalisation.

   .. py:property:: norm
      :type: str

      The normalised text.

   .. py:attribute:: __slots__
      :value: ()

   .. py:attribute:: _is_protocol
      :value: True

   .. py:attribute:: _is_runtime_protocol
      :value: False

   .. py:method:: __init_subclass__(*args, **kwargs)
      :classmethod:

   .. py:method:: __class_getitem__(params)
      :classmethod:

.. py:class:: BaseEntity

   Bases: :py:obj:`Protocol`

   Base entity protocol.

   This describes the static (unchangeable) parts of an entity or sequence of tokens.

   .. py:property:: start_index
      :type: int

      The index of the first token in the entity.

   .. py:property:: end_index
      :type: int

      The index of the last token in the entity.

   .. py:property:: start_char_index
      :type: int

      The character index of the first token.

   .. py:property:: end_char_index
      :type: int

      The character index of the last token.

   .. py:property:: label
      :type: int

      The label of the entity (NOTE: seems unused).

   .. py:property:: text
      :type: str

      The text of the entire entity.

   .. py:method:: __iter__()

   .. py:method:: __len__()

   .. py:attribute:: __slots__
      :value: ()

   .. py:attribute:: _is_protocol
      :value: True

   .. py:attribute:: _is_runtime_protocol
      :value: False

   .. py:method:: __init_subclass__(*args, **kwargs)
      :classmethod:

   .. py:method:: __class_getitem__(params)
      :classmethod:

.. py:class:: MutableEntity

   Bases: :py:obj:`Protocol`

   The mutable part of an entity.

   This represents the changeable parts of an entity, that is, the parts that should be set or changed by the various components.

   .. py:property:: base
      :type: BaseEntity

      The base / static entity part.

   .. py:property:: detected_name
      :type: str

      The detected name (if any) for this entity.

      This should be set by the NER component.

   .. py:method:: set_addon_data(path, val)

      Used to add arbitrary data to the entity.

      This is generally used by addons to keep track of their data.

      NB! The path used needs to be registered using the `register_addon_path` class method.

      :param path: The data ID / path.
      :type path: str
      :param val: The value to be added.
      :type val: Any

   .. py:method:: has_addon_data(path)

      Checks whether the addon data for a specific path has been set.

      :param path: The path to check.
      :type path: str

      :Returns: **bool** -- Whether the addon data has been set.

   .. py:method:: get_addon_data(path)

      Get data added to the entity.

      See `set_addon_data` for details.

      :param path: The data ID / path.
      :type path: str

      :Returns: **Any** -- The stored value.

   .. py:method:: get_available_addon_paths()

      Gets the available addon data paths for this entity.

      This will only include paths that have values set.

      :Returns: **list[str]** -- List of available addon data paths.

   .. py:property:: link_candidates
      :type: list[str]

      The candidates for the detected name (if any) for this entity.

      This should be set by the NER component.

   .. py:property:: context_similarity
      :type: float

      The context similarity of the linked entity.

      This should be set by the linker component.

   .. py:property:: confidence
      :type: float

      The confidence for the linked entity.

      NOTE: This seems to be unused!

   .. py:property:: cui
      :type: str

      The CUI of the linked entity.

      This should be set by the linker component.

   .. py:property:: id
      :type: int

      The ID of the entity within the document.

      This counts all the entities recognised, not just the ones that were successfully linked.

      This should be set by the NER component.

   .. py:method:: register_addon_path(path, def_val = None, force = True)
      :classmethod:

      Register a custom/arbitrary data path.

      This can be used to store arbitrary data along with the entity for use in an addon (e.g. MetaCAT).
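
      The set/has/get contract described for the addon-data methods above can be sketched with a small self-contained stand-in. This is a hypothetical illustration, not MedCAT's implementation: ``ToyEntity``, its dictionaries and the ``_check_registered`` helper are invented here, and two behaviours are assumptions not stated in this reference, namely that ``get_addon_data`` returns the registered default for a path that was never set, and that ``force=False`` keeps an existing registration unchanged.

      .. code-block:: python

         from typing import Any, ClassVar


         class UnregisteredDataPathException(ValueError):
             """Stand-in mirroring the documented (cls, path) signature."""

             def __init__(self, cls: type, path: str) -> None:
                 super().__init__(f"Path {path!r} is not registered for {cls.__name__}")
                 self.cls = cls
                 self.path = path


         class ToyEntity:
             """Hypothetical entity showing the addon-data contract (not MedCAT code)."""

             # Registered paths and their defaults, shared by all instances.
             _addon_defaults: ClassVar[dict[str, Any]] = {}

             def __init__(self) -> None:
                 # Values actually set on this particular entity.
                 self._addon_values: dict[str, Any] = {}

             @classmethod
             def register_addon_path(cls, path: str, def_val: Any = None,
                                     force: bool = True) -> None:
                 # Assumption: force=False keeps an existing registration as-is.
                 if path in cls._addon_defaults and not force:
                     return
                 cls._addon_defaults[path] = def_val

             def _check_registered(self, path: str) -> None:
                 # Hypothetical helper: reject paths that were never registered.
                 if path not in self._addon_defaults:
                     raise UnregisteredDataPathException(type(self), path)

             def set_addon_data(self, path: str, val: Any) -> None:
                 self._check_registered(path)
                 self._addon_values[path] = val

             def has_addon_data(self, path: str) -> bool:
                 return path in self._addon_values

             def get_addon_data(self, path: str) -> Any:
                 self._check_registered(path)
                 # Assumption: a registered but unset path yields its default.
                 return self._addon_values.get(path, self._addon_defaults[path])

             def get_available_addon_paths(self) -> list[str]:
                 # Only paths with values set on this entity, as documented.
                 return list(self._addon_values)


         # Namespace the path by the owning addon, as this method's docs recommend.
         ToyEntity.register_addon_path("meta_cat_id", def_val=-1)
         ent = ToyEntity()
         print(ent.has_addon_data("meta_cat_id"))  # False
         print(ent.get_addon_data("meta_cat_id"))  # -1
         ent.set_addon_data("meta_cat_id", 3)
         print(ent.get_available_addon_paths())    # ['meta_cat_id']
         try:
             ent.set_addon_data("unregistered_path", 1)
         except UnregisteredDataPathException as err:
             print(err.path)                       # unregistered_path

      Keeping registrations on the class while values live on each instance mirrors the documented split between the ``register_addon_path`` class method and the per-entity setters and getters.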

      PS: If using this, it is important to namespace the path to the component you are writing in order to avoid conflicts between addons.

      :param path: The path to be used. Should be prefixed by the component name (e.g. `meta_cat_id` for an ID tied to the `meta_cat` addon).
      :type path: str
      :param def_val: Default value. Defaults to `None`.
      :type def_val: Any
      :param force: Whether to forcefully add the value. Defaults to True.
      :type force: bool

   .. py:method:: __iter__()

   .. py:method:: __len__()

   .. py:attribute:: __slots__
      :value: ()

   .. py:attribute:: _is_protocol
      :value: True

   .. py:attribute:: _is_runtime_protocol
      :value: False

   .. py:method:: __init_subclass__(*args, **kwargs)
      :classmethod:

   .. py:method:: __class_getitem__(params)
      :classmethod:

.. py:class:: BaseDocument

   Bases: :py:obj:`Protocol`

   The base document protocol.

   Represents the unchangeable parts of the whole document.

   .. py:property:: text
      :type: str

      The raw text of the document.

   .. py:method:: __getitem__(index: int) -> BaseToken
                  __getitem__(index: slice) -> BaseEntity

   .. py:method:: __iter__()

   .. py:method:: isupper()

      Whether the entire document is upper case.

   .. py:attribute:: __slots__
      :value: ()

   .. py:attribute:: _is_protocol
      :value: True

   .. py:attribute:: _is_runtime_protocol
      :value: False

   .. py:method:: __init_subclass__(*args, **kwargs)
      :classmethod:

   .. py:method:: __class_getitem__(params)
      :classmethod:

.. py:exception:: UnregisteredDataPathException(cls, path)

   Bases: :py:obj:`ValueError`

   Inappropriate argument value (of correct type).

   .. py:method:: __init__(cls, path)

      Initialize self. See help(type(self)) for accurate signature.

   .. py:attribute:: cls

   .. py:attribute:: path

   .. py:class:: __cause__

      exception cause

   .. py:class:: __context__

      exception context

   .. py:method:: __delattr__()

      Implement delattr(self, name).

   .. py:method:: __dir__()

      Default dir() implementation.

   .. py:method:: __eq__()

      Return self==value.

   .. py:method:: __format__()

      Default object formatter.

   .. py:method:: __ge__()

      Return self>=value.

   .. py:method:: __getattribute__()

      Return getattr(self, name).

   .. py:method:: __gt__()

      Return self>value.

   .. py:method:: __hash__()

      Return hash(self).

   .. py:method:: __le__()

      Return self<=value.

   .. py:method:: __lt__()

      Return self<value.

   .. py:method:: __getitem__(index: int) -> medcat.tokenizing.tokens.MutableToken
                  __getitem__(index: slice) -> medcat.tokenizing.tokens.MutableEntity

   .. py:method:: __len__()

   .. py:method:: get_tokens(start_index, end_index)

   .. py:method:: set_addon_data(path, val)

   .. py:method:: has_addon_data(path)

   .. py:method:: get_addon_data(path)

   .. py:method:: get_available_addon_paths()

   .. py:method:: register_addon_path(path, def_val = None, force = True)
      :classmethod:

   .. py:method:: __iter__()

   .. py:method:: isupper()

   .. py:method:: __str__()

   .. py:method:: __repr__()
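
The ``Base*`` and ``Mutable*`` classes above are ``typing.Protocol`` types, so an implementation conforms structurally, by providing the right members, rather than by inheriting from them. The hypothetical sketch below illustrates that idea together with the ``__getitem__`` overloads documented for ``BaseDocument``, where an ``int`` yields a single token and a ``slice`` yields an entity. ``TokenLike``, ``ToyToken``, ``ToySpan``, ``ToyDocument`` and ``token_text`` are invented for illustration, cover only a few of the documented members, and do not reflect the spaCy-backed implementation.

.. code-block:: python

   from typing import Iterator, Protocol, Union, overload


   class TokenLike(Protocol):
       """Hypothetical subset of BaseToken: just text and index."""

       @property
       def text(self) -> str: ...

       @property
       def index(self) -> int: ...


   class ToyToken:
       """Plain class; matches TokenLike structurally, without inheriting it."""

       def __init__(self, text: str, index: int) -> None:
           self._text = text
           self._index = index

       @property
       def text(self) -> str:
           return self._text

       @property
       def index(self) -> int:
           return self._index


   class ToySpan:
       """Entity-like view over a contiguous run of tokens."""

       def __init__(self, tokens: list[ToyToken]) -> None:
           self._tokens = tokens

       @property
       def start_index(self) -> int:
           return self._tokens[0].index

       @property
       def end_index(self) -> int:
           # Index of the last token, as in the BaseEntity docstring.
           return self._tokens[-1].index

       def __iter__(self) -> Iterator[ToyToken]:
           return iter(self._tokens)

       def __len__(self) -> int:
           return len(self._tokens)


   class ToyDocument:
       """Document-like container indexed by int (token) or slice (span)."""

       def __init__(self, words: list[str]) -> None:
           self._tokens = [ToyToken(w, i) for i, w in enumerate(words)]

       @overload
       def __getitem__(self, index: int) -> ToyToken: ...
       @overload
       def __getitem__(self, index: slice) -> ToySpan: ...

       def __getitem__(self, index: Union[int, slice]) -> Union[ToyToken, ToySpan]:
           if isinstance(index, slice):
               # Assumes a non-empty slice; an empty span has no first/last token.
               return ToySpan(self._tokens[index])
           return self._tokens[index]

       def __iter__(self) -> Iterator[ToyToken]:
           return iter(self._tokens)


   def token_text(tok: TokenLike) -> str:
       # A type checker accepts any object exposing text/index, e.g. ToyToken.
       return tok.text


   doc = ToyDocument(["Patient", "has", "type", "2", "diabetes"])
   span = doc[2:5]
   print(token_text(doc[0]))                # Patient
   print(span.start_index, span.end_index)  # 2 4
   print(len(span))                         # 3

Since ``_is_runtime_protocol`` is ``False`` for these protocols (they are not marked ``@runtime_checkable``), conformance is intended to be verified by a static type checker rather than with ``isinstance`` tests.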