medcat.cdb ========== .. py:module:: medcat.cdb Submodules ---------- .. toctree:: :maxdepth: 1 /autoapi/medcat/cdb/cdb/index /autoapi/medcat/cdb/concepts/index Attributes ---------- .. autoapisummary:: medcat.cdb.__all__ Classes ------- .. autoapisummary:: medcat.cdb.CDB Package Contents ---------------- .. py:class:: CDB(config) Bases: :py:obj:`medcat.storage.serialisables.AbstractSerialisable` The abstract serialisable base class. This defines some common defaults. .. py:method:: __init__(config) .. py:attribute:: config .. py:attribute:: cui2info :type: dict[str, medcat.cdb.concepts.CUIInfo] .. py:attribute:: name2info :type: dict[str, medcat.cdb.concepts.NameInfo] .. py:attribute:: type_id2info :type: dict[str, medcat.cdb.concepts.TypeInfo] .. py:attribute:: token_counts :type: dict[str, int] .. py:attribute:: addl_info :type: dict[str, Any] .. py:attribute:: _subnames :type: set[str] .. py:attribute:: is_dirty :value: False .. py:attribute:: has_changed_names :value: False .. py:method:: get_init_attrs() :classmethod: .. py:method:: _reset_subnames() .. py:method:: has_subname(name) Whether the CDB has the specified subname. :param name: The subname to check. :type name: str :Returns: **bool** -- Whether the subname is present in this CDB. .. py:method:: get_name(cui) Returns preferred name if it exists, otherwise it will return the longest name assigned to the concept. :param cui: Concept ID or unique identifier in this database. :type cui: str :Returns: **str** -- The name of the concept. .. py:method:: weighted_average_function(step) Get the weighted average for steop. :param step: The steop. :type step: int :Returns: **float** -- The weighted average. .. py:method:: add_types(types) Add type info to CDB. :param types: The raw type info. :type types: Iterable[tuple[str, str]] .. py:method:: add_names(cui, names, name_status = ST.AUTOMATIC, full_build = False) Adds a name to an existing concept. :param cui: Concept ID or unique identifier in this database, all concepts that have the same CUI will be merged internally. :type cui: str :param names: Names for this concept, or the value that if found in free text can be linked to this concept. Names is an dict like: `{name: {'tokens': tokens, 'snames': snames, 'raw_name': raw_name}, ...}` Names should be generated by helper function 'medcat.preprocessing.cleaners.prepare_name' :type names: dict[str, NameDescriptor] :param name_status: One of `P`, `N`, `A`. Defaults to 'A'. :type name_status: str :param full_build: If True the dictionary self.addl_info will also be populated, contains a lot of extra information about concepts, but can be very memory consuming. This is not necessary for normal functioning of MedCAT (Default value `False`). :type full_build: bool .. py:method:: _add_concept_names(cui, names, name_status) .. py:method:: _add_full_build(cui, names, ontologies, description, type_ids) .. py:method:: _add_concept(cui, names, ontologies, name_status, type_ids, description, full_build = False) Add a concept to internal Concept Database (CDB). Depending on what you are providing this will add a large number of properties for each concept. :param cui: Concept ID or unique identifier in this database, all concepts that have the same CUI will be merged internally. :type cui: str :param names: Names for this concept, or the value that if found in free text can be linked to this concept. Names is a dict like: `{name: {'tokens': tokens, 'snames': snames, 'raw_name': raw_name}, ...}` Names should be generated by helper function 'medcat.preprocessing.cleaners.prepare_name' :type names: dict[str, NameDescriptor] :param ontologies: ontologies in which the concept exists (e.g. SNOMEDCT, HPO) :type ontologies: set[str] :param name_status: One of `P`, `N`, `A` :type name_status: str :param type_ids: Semantic type identifier (have a look at TUIs in UMLS or SNOMED-CT) :type type_ids: set[str] :param description: Description of this concept. :type description: str :param full_build: If True the dictionary self.addl_info will also be populated, contains a lot of extra information about concepts, but can be very memory consuming. This is not necessary for normal functioning of MedCAT (Default Value `False`). :type full_build: bool .. py:method:: reset_training() Will remove all training efforts - in other words all embeddings that are learnt for concepts in the current CDB. Please note that this does not remove synonyms (names) that were potentially added during supervised/online learning. .. py:method:: filter_by_cui(cuis_to_keep) Subset the core CDB fields (dictionaries/maps). Note that this will potenitally keep a bit more CUIs then in cuis_to_keep. It will first find all names that link to the cuis_to_keep and then find all CUIs that link to those names and keep all of them. This also will not remove any data from cdb.addl_info - as this field can contain data of unknown structure. :param cuis_to_keep: CUIs that will be kept, the rest will be removed (not completely, look above). :type cuis_to_keep: Collection[str] :raises Exception: If no snames and subsetting is not possible. .. py:method:: remove_cui(cui) This function takes a CUI and removes it the CDB. It also removes the CUI from name specific per_cui_status maps as well as well as removes all the names that do not correspond to any CUIs after the removal of this one. :param cui: The CUI to remove. :type cui: str .. py:method:: _remove_names(cui, names) Remove names from an existing concept - effect is this name will never again be used to link to this concept. This will only remove the name from the linker (namely name2cuis and name2cuis2status), the name will still be present everywhere else. Why? Because it is bothersome to remove it from everywhere, but could also be useful to keep the removed names in e.g. cui2names. :param cui: Concept ID or unique identifier in this database. :type cui: str :param names: Names to be removed (e.g list, set, or even a dict (in which case keys will be used)). :type names: Iterable[str] .. py:method:: __eq__(other) .. py:method:: get_cui2count_train() .. py:method:: get_name2count_train() .. py:method:: get_hash() .. py:method:: get_basic_info() .. py:method:: save(save_path, serialiser = AvailableSerialisers.dill, overwrite = False) Save CDB at path. :param save_path: The path to save at. :type save_path: str :param serialiser: The serialiser. Defaults to AvailableSerialisers.dill. :type serialiser: Union[ str, AvailableSerialisers], optional :param overwrite: Whether to allow overwriting existing files. Defaults to False. :type overwrite: bool, optional .. py:method:: load(path) :classmethod: .. py:method:: get_strategy() .. py:method:: ignore_attrs() :classmethod: .. py:method:: include_properties() :classmethod: .. py:data:: __all__ :value: ['CDB']