medcat2.utils.cdb_state

Attributes

logger

CDBState

CDB State.

Classes

NameInfo

TypedDict with the fields name, per_cui_status, is_upper and count_train.

CUIInfo

TypedDict with per-concept fields such as cui, preferred_name, names, type_ids and context_vectors.

Functions

copy_cdb_state(cdb)

Creates a (deep) copy of the CDB state.

save_cdb_state(cdb, file_path)

Saves CDB state in a file.

apply_cdb_state(cdb, state)

Apply the specified state to the specified CDB.

load_and_apply_cdb_state(cdb, file_path)

Delete current CDB state and apply CDB state from file.

captured_state_cdb(cdb[, save_state_to_disk])

A context manager that captures and re-applies the initial CDB state.

in_memory_state_capture(cdb)

Capture the CDB state in memory.

on_disk_memory_capture(cdb)

Capture the CDB state in a temporary file.

Module Contents

class medcat2.utils.cdb_state.NameInfo

Bases: TypedDict

dict() -> new empty dictionary

dict(mapping) -> new dictionary initialized from a mapping object’s (key, value) pairs

dict(iterable) -> new dictionary initialized as if via:

    d = {}
    for k, v in iterable:
        d[k] = v

dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)

name: str
per_cui_status: dict[str, str]
is_upper: bool
count_train: int
__contains__()

True if the dictionary has the specified key, else False.

__delattr__()

Implement delattr(self, name).

__delitem__()

Delete self[key].

__dir__()

Default dir() implementation.

__eq__()

Return self==value.

__format__()

Default object formatter.

__ge__()

Return self>=value.

__getattribute__()

Return getattr(self, name).

__getitem__()

x.__getitem__(y) <==> x[y]

__gt__()

Return self>value.

__init__()

Initialize self. See help(type(self)) for accurate signature.

__ior__()

Return self|=value.

__iter__()

Implement iter(self).

__le__()

Return self<=value.

__len__()

Return len(self).

__lt__()

Return self<value.

__ne__()

Return self!=value.

__new__()

Create and return a new object. See help(type) for accurate signature.

__or__()

Return self|value.

__reduce__()

Helper for pickle.

__reduce_ex__()

Helper for pickle.

__repr__()

Return repr(self).

__reversed__()

Return a reverse iterator over the dict keys.

__ror__()

Return value|self.

__setattr__()

Implement setattr(self, name, value).

__setitem__()

Set self[key] to value.

__sizeof__()

D.__sizeof__() -> size of D in memory, in bytes

__str__()

Return str(self).

__subclasshook__()

Abstract classes can override this to customize issubclass().

This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).

clear()

D.clear() -> None. Remove all items from D.

copy()

D.copy() -> a shallow copy of D

get()

Return the value for key if key is in the dictionary, else default.

items()

D.items() -> a set-like object providing a view on D’s items

keys()

D.keys() -> a set-like object providing a view on D’s keys

pop()

D.pop(k[,d]) -> v, remove specified key and return the corresponding value.

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault()

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update()

D.update([E, ]**F) -> None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]

If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v

In either case, this is followed by: for k in F: D[k] = F[k]

values()

D.values() -> an object providing a view on D’s values
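Because a TypedDict only adds static typing, a NameInfo value behaves like an ordinary dict at runtime. The sketch below redefines the class locally with the fields listed above so that it runs without medcat2 installed; the example values (the name string, the CUI and the status code) are hypothetical.

```python
from typing import TypedDict


class NameInfo(TypedDict):
    # Local stand-in mirroring the documented fields; the real class
    # lives in medcat2.utils.cdb_state.
    name: str
    per_cui_status: dict[str, str]
    is_upper: bool
    count_train: int


# Hypothetical values, for illustration only.
info: NameInfo = {
    "name": "kidney~failure",
    "per_cui_status": {"C0035078": "P"},
    "is_upper": False,
    "count_train": 0,
}

# At runtime this is a plain dict, so the inherited dict methods apply.
info["count_train"] += 1
print(type(info) is dict, info.get("count_train"))
```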

class medcat2.utils.cdb_state.CUIInfo

Bases: TypedDict

dict() -> new empty dictionary

dict(mapping) -> new dictionary initialized from a mapping object’s (key, value) pairs

dict(iterable) -> new dictionary initialized as if via:

    d = {}
    for k, v in iterable:
        d[k] = v

dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)

cui: str
preferred_name: str
names: set[str]
subnames: set[str]
type_ids: set[str]
description: str | None
original_names: set[str] | None
tags: list[str] | None
group: str | None
in_other_ontology: set[str] | None
count_train: int
context_vectors: dict[str, numpy.ndarray] | None
average_confidence: float
__contains__()

True if the dictionary has the specified key, else False.

__delattr__()

Implement delattr(self, name).

__delitem__()

Delete self[key].

__dir__()

Default dir() implementation.

__eq__()

Return self==value.

__format__()

Default object formatter.

__ge__()

Return self>=value.

__getattribute__()

Return getattr(self, name).

__getitem__()

x.__getitem__(y) <==> x[y]

__gt__()

Return self>value.

__init__()

Initialize self. See help(type(self)) for accurate signature.

__ior__()

Return self|=value.

__iter__()

Implement iter(self).

__le__()

Return self<=value.

__len__()

Return len(self).

__lt__()

Return self<value.

__ne__()

Return self!=value.

__new__()

Create and return a new object. See help(type) for accurate signature.

__or__()

Return self|value.

__reduce__()

Helper for pickle.

__reduce_ex__()

Helper for pickle.

__repr__()

Return repr(self).

__reversed__()

Return a reverse iterator over the dict keys.

__ror__()

Return value|self.

__setattr__()

Implement setattr(self, name, value).

__setitem__()

Set self[key] to value.

__sizeof__()

D.__sizeof__() -> size of D in memory, in bytes

__str__()

Return str(self).

__subclasshook__()

Abstract classes can override this to customize issubclass().

This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).

clear()

D.clear() -> None. Remove all items from D.

copy()

D.copy() -> a shallow copy of D

get()

Return the value for key if key is in the dictionary, else default.

items()

D.items() -> a set-like object providing a view on D’s items

keys()

D.keys() -> a set-like object providing a view on D’s keys

pop()

D.pop(k[,d]) -> v, remove specified key and return the corresponding value.

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault()

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update()

D.update([E, ]**F) -> None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]

If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v

In either case, this is followed by: for k in F: D[k] = F[k]

values()

D.values() -> an object providing a view on D’s values

medcat2.utils.cdb_state.logger
medcat2.utils.cdb_state.CDBState

CDB State.

This is a dictionary of the parts of the CDB that change during (supervised) training. It can be used to store and restore the state of a CDB after modifying it.

Currently, the following fields are saved:
  • name2info

  • cui2info

  • token_counts
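As a rough illustration of the shape described above, the following sketch declares a local TypedDict with the three listed fields. The class name CDBStateSketch and the value types are assumptions; the source only names the fields.

```python
from typing import Any, TypedDict


class CDBStateSketch(TypedDict):
    # Assumed value types; only the field names come from the documentation.
    name2info: dict[str, Any]
    cui2info: dict[str, Any]
    token_counts: dict[str, int]


# An empty state with the three documented keys.
state: CDBStateSketch = {
    "name2info": {},
    "cui2info": {},
    "token_counts": {},
}
print(sorted(state))
```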

medcat2.utils.cdb_state.copy_cdb_state(cdb)

Creates a (deep) copy of the CDB state.

Grabs the fields that correspond to the state, creates deep copies, and returns the copies.

Parameters:

cdb – The CDB from which to grab the state.

Returns:

CDBState – The copied state.

Return type:

CDBState
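Why a deep copy matters can be shown with a small stand-alone sketch. It does not call medcat2: copy_state_sketch, STATE_FIELDS and the SimpleNamespace stand-in for a CDB are hypothetical names, and the sketch only assumes that the state fields are attributes on the CDB object.

```python
import copy
from types import SimpleNamespace

STATE_FIELDS = ("name2info", "cui2info", "token_counts")


def copy_state_sketch(cdb):
    """Deep-copy the state fields of ``cdb`` into a plain dict."""
    return {field: copy.deepcopy(getattr(cdb, field)) for field in STATE_FIELDS}


# Stand-in for a CDB; a real CDB carries much more than these attributes.
cdb = SimpleNamespace(
    name2info={"fever": {"count_train": 0}},
    cui2info={},
    token_counts={"fever": 3},
)

snapshot = copy_state_sketch(cdb)
cdb.name2info["fever"]["count_train"] = 5  # simulate training

# The snapshot is unaffected because nested dicts were copied too.
print(snapshot["name2info"]["fever"]["count_train"])
```

A shallow copy would share the nested dictionaries with the live CDB, so later training would silently change the snapshot as well.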

medcat2.utils.cdb_state.save_cdb_state(cdb, file_path)

Saves CDB state in a file.

Currently uses dill.dump to save the relevant fields/values.

Parameters:
  • cdb – The CDB from which to grab the state.

  • file_path (str) – The file to dump the state to.

Return type:

None
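The save step can be sketched without medcat2 as follows. This sketch swaps in the standard-library pickle module for dill so that it runs without extra dependencies; save_state_sketch, STATE_FIELDS and the stand-in CDB are hypothetical names, not the library's implementation.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace

STATE_FIELDS = ("name2info", "cui2info", "token_counts")


def save_state_sketch(cdb, file_path):
    """Write the state fields of ``cdb`` to ``file_path`` (pickle stands in for dill)."""
    state = {field: getattr(cdb, field) for field in STATE_FIELDS}
    with open(file_path, "wb") as f:
        pickle.dump(state, f)


# Stand-in for a CDB.
cdb = SimpleNamespace(name2info={}, cui2info={}, token_counts={"fever": 3})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cdb_state.pkl")
    save_state_sketch(cdb, path)
    # Read the file back to confirm what was written.
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded["token_counts"])
```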

medcat2.utils.cdb_state.apply_cdb_state(cdb, state)

Apply the specified state to the specified CDB.

This overwrites the current state of the CDB with the one provided.

Parameters:
  • cdb – The CDB to apply the state to.

  • state (CDBState) – The state to use.

Return type:

None
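A minimal sketch of the overwrite semantics, assuming the state is a dict keyed by attribute name. apply_state_sketch and the SimpleNamespace stand-in for a CDB are hypothetical and do not reproduce the library's code.

```python
from types import SimpleNamespace


def apply_state_sketch(cdb, state):
    """Overwrite each state field on ``cdb`` with the value from ``state``."""
    for field, value in state.items():
        setattr(cdb, field, value)


# Stand-in CDB holding a "trained" state, and a previously saved state.
cdb = SimpleNamespace(name2info={"cough": {}}, cui2info={}, token_counts={"cough": 7})
saved = {"name2info": {}, "cui2info": {}, "token_counts": {"fever": 3}}

apply_state_sketch(cdb, saved)
print(cdb.token_counts)
```

Note that in this sketch the CDB ends up referencing the same objects as the state dict, so the state should not be modified afterwards if it is meant to be reused.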

medcat2.utils.cdb_state.load_and_apply_cdb_state(cdb, file_path)

Delete current CDB state and apply CDB state from file.

This first deletes the current state of the CDB in order to save memory. Saving the state on disk is meant to reduce RAM usage, which would not work well if two instances of the state were held in memory while loading.

Parameters:
  • cdb – The CDB to apply the state to.

  • file_path (str) – The file where the state has been saved to.

Return type:

None
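The delete-then-load ordering described above can be sketched as follows. The sketch uses pickle in place of dill and a SimpleNamespace in place of a real CDB; load_and_apply_sketch and STATE_FIELDS are hypothetical names.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace

STATE_FIELDS = ("name2info", "cui2info", "token_counts")


def load_and_apply_sketch(cdb, file_path):
    # Drop the references to the current state first, so the old state
    # can be freed before the saved one is read into memory.
    for field in STATE_FIELDS:
        setattr(cdb, field, None)
    with open(file_path, "rb") as f:
        state = pickle.load(f)
    for field, value in state.items():
        setattr(cdb, field, value)


cdb = SimpleNamespace(name2info={}, cui2info={}, token_counts={"fever": 3})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cdb_state.pkl")
    with open(path, "wb") as f:
        pickle.dump({field: getattr(cdb, field) for field in STATE_FIELDS}, f)

    cdb.token_counts["fever"] = 99  # simulate training
    load_and_apply_sketch(cdb, path)

print(cdb.token_counts)
```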

medcat2.utils.cdb_state.captured_state_cdb(cdb, save_state_to_disk=False)

A context manager that captures and re-applies the initial CDB state.

The context manager captures (copies) the initial state of the CDB on entry. The user can then modify the state (e.g. by training). On exit, the initial CDB state is re-applied.

If RAM is an issue, it is recommended to set save_state_to_disk to True; otherwise the copy of the original state is held in memory. If the state is saved on disk, a temporary file is used and removed afterwards.

Parameters:
  • cdb – The CDB to use.

  • save_state_to_disk (bool) – Whether to save state on disk or hold in memory. Defaults to False.

Yields:

None
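The capture-and-restore pattern can be sketched with contextlib. This covers only the in-memory case and is not the library's implementation: captured_state_sketch, STATE_FIELDS and the stand-in CDB are hypothetical, and restoring inside a finally block (so the state is also restored when the body raises) is an assumption of this sketch.

```python
import contextlib
import copy
from types import SimpleNamespace

STATE_FIELDS = ("name2info", "cui2info", "token_counts")


@contextlib.contextmanager
def captured_state_sketch(cdb):
    # Capture a deep copy of the state on entry.
    snapshot = {field: copy.deepcopy(getattr(cdb, field)) for field in STATE_FIELDS}
    try:
        yield
    finally:
        # Re-apply the captured state on exit.
        for field, value in snapshot.items():
            setattr(cdb, field, value)


cdb = SimpleNamespace(name2info={}, cui2info={}, token_counts={"fever": 3})

with captured_state_sketch(cdb):
    cdb.token_counts["fever"] = 99  # simulate training inside the block
    inside = dict(cdb.token_counts)

print(inside, cdb.token_counts)
```

With the real function the usage has the same shape: training happens inside the with block, and the CDB is back in its initial state once the block exits.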

medcat2.utils.cdb_state.in_memory_state_capture(cdb)

Capture the CDB state in memory.

Parameters:

cdb – The CDB to use.

Yields:

None

medcat2.utils.cdb_state.on_disk_memory_capture(cdb)

Capture the CDB state in a temporary file.

Parameters:

cdb – The CDB to use.

Yields:

None
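The temporary-file lifecycle can be sketched as below. The sketch uses pickle and tempfile from the standard library and a SimpleNamespace in place of a real CDB; on_disk_capture_sketch, STATE_FIELDS and used_paths are hypothetical names, and the exact way the library creates and removes its temporary file may differ.

```python
import contextlib
import os
import pickle
import tempfile
from types import SimpleNamespace

STATE_FIELDS = ("name2info", "cui2info", "token_counts")
used_paths = []  # recorded only so the demonstration can check the cleanup


@contextlib.contextmanager
def on_disk_capture_sketch(cdb):
    # delete=False keeps the file after it is closed; it is removed manually below.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        pickle.dump({field: getattr(cdb, field) for field in STATE_FIELDS}, tmp)
    used_paths.append(tmp.name)
    try:
        yield
    finally:
        # Restore the saved state, then remove the temporary file.
        with open(tmp.name, "rb") as f:
            state = pickle.load(f)
        for field, value in state.items():
            setattr(cdb, field, value)
        os.remove(tmp.name)


cdb = SimpleNamespace(name2info={}, cui2info={}, token_counts={"fever": 3})

with on_disk_capture_sketch(cdb):
    cdb.token_counts["fever"] = 99  # simulate training

print(cdb.token_counts, os.path.exists(used_paths[0]))
```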