medcat.storage.serialisers
Attributes
DEFAULT_SCHEMA_FILE
logger
SER_TYPE_FILE
MANUAL_SERIALISED_TAG
MANUAL_SERIALISED_RE
_DEF_SER
Exceptions
IllegalSchemaException | Inappropriate argument value (of correct type).
Classes
Serialisable | The base serialisable protocol.
ManualSerialisable | The base serialisable protocol.
RemappingUnpickler | Python's Unpickler extended to interpreter sessions and more types.
Serialiser | The abstract serialiser base class.
AvailableSerialisers | Describes the available serialisers.
DillSerialiser | The dill based serialiser.
Functions
get_all_serialisable_members | Gets all serialisable members of an object.
load_schema | Loads the schema for a folder of deserialisable files from the file.
save_schema | Saves the schema of a class to the specified file.
fix_module_and_cls_name
get_serialiser | Get the serialiser based on the type specified.
get_serialiser_type_from_folder | Get the serialiser type that was used to serialise data in the folder.
get_serialiser_from_folder | Get the serialiser that was used to serialise the data in the folder.
serialise | Serialise an object based on the specified serialiser type.
deserialise | Deserialise contents of a folder.
Module Contents
- class medcat.storage.serialisers.Serialisable
Bases:
Protocol
The base serialisable protocol.
- get_strategy()
Get the serialisation strategy.
- Returns:
SerialisingStrategy – The strategy.
- Return type:
SerialisingStrategy
- classmethod get_init_attrs()
Get the names of the arguments needed for init upon deserialisation.
- Returns:
list[str] – The list of init arguments’ names.
- Return type:
list[str]
- classmethod ignore_attrs()
Get the names of attributes not to serialise.
- Returns:
list[str] – The attribute names that should not be serialised.
- Return type:
list[str]
- classmethod include_properties()
- Return type:
list[str]
- __slots__ = ()
- _is_protocol = True
- _is_runtime_protocol = False
- classmethod __init_subclass__(*args, **kwargs)
- classmethod __class_getitem__(params)
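As a rough illustration of the protocol above, the sketch below defines a local stand-in protocol and a class that satisfies it structurally. `SerialisableLike`, `Concept`, `describe` and the `"auto"` strategy string are hypothetical names used only for illustration; the real `SerialisingStrategy` type is not reproduced here, and nothing is imported from MedCAT.

```python
from typing import Protocol


class SerialisableLike(Protocol):
    """Hypothetical stand-in mirroring the documented method names."""

    def get_strategy(self) -> str: ...

    @classmethod
    def get_init_attrs(cls) -> list[str]: ...

    @classmethod
    def ignore_attrs(cls) -> list[str]: ...

    @classmethod
    def include_properties(cls) -> list[str]: ...


class Concept:
    """Example class; it satisfies the protocol without inheriting from it."""

    def __init__(self, cui: str) -> None:
        self.cui = cui
        self._cache: dict[str, int] = {}  # transient state, not worth saving

    def get_strategy(self) -> str:
        return "auto"  # assumption: the real method returns a SerialisingStrategy

    @classmethod
    def get_init_attrs(cls) -> list[str]:
        return ["cui"]  # arguments __init__ needs upon deserialisation

    @classmethod
    def ignore_attrs(cls) -> list[str]:
        return ["_cache"]  # attributes to leave out when serialising

    @classmethod
    def include_properties(cls) -> list[str]:
        return []


def describe(obj: SerialisableLike) -> dict[str, list[str]]:
    """Collect the serialisation hints an object exposes."""
    return {"init": obj.get_init_attrs(), "ignore": obj.ignore_attrs()}


hints = describe(Concept("C0001"))
```

Because the protocol is structural, `Concept` does not need to subclass anything; a type checker accepts it wherever `SerialisableLike` is expected.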
- class medcat.storage.serialisers.ManualSerialisable
Bases:
Serialisable, Protocol
The base serialisable protocol.
- serialise_to(folder_path)
Serialise to a folder.
- Parameters:
folder_path (str) – The folder to serialise to.
- Return type:
None
- classmethod deserialise_from(folder_path, **init_kwargs)
Deserialise from a specific path.
The init keyword arguments are generally:
  - cnf: The config relevant to the components
  - tokenizer (BaseTokenizer): The base tokenizer for the model
  - cdb (CDB): The CDB for the model
  - vocab (Vocab): The Vocab for the model
  - model_load_path (Optional[str]): The model load path, but not the component load path
- Parameters:
folder_path (str) – The path to deserialise from.
- Returns:
ManualSerialisable – The deserialised object.
- Return type:
ManualSerialisable
- get_strategy()
Get the serialisation strategy.
- Returns:
SerialisingStrategy – The strategy.
- Return type:
SerialisingStrategy
- classmethod get_init_attrs()
Get the names of the arguments needed for init upon deserialisation.
- Returns:
list[str] – The list of init arguments’ names.
- Return type:
list[str]
- classmethod ignore_attrs()
Get the names of attributes not to serialise.
- Returns:
list[str] – The attribute names that should not be serialised.
- Return type:
list[str]
- classmethod include_properties()
- Return type:
list[str]
- __slots__ = ()
- _is_protocol = True
- _is_runtime_protocol = False
- classmethod __init_subclass__(*args, **kwargs)
- classmethod __class_getitem__(params)
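The `serialise_to` / `deserialise_from` pair described above can be pictured with a class that fully controls its own on-disk layout. This is a minimal sketch with only the standard library: `ManualThing`, its `FILE_NAME` and the rule that extra keyword arguments override saved values are all assumptions made for the example, not MedCAT behaviour.

```python
import json
import os
import tempfile


class ManualThing:
    """Hypothetical class that decides its own on-disk layout."""

    FILE_NAME = "thing.json"  # illustrative file name, not MedCAT's

    def __init__(self, name: str, weights: list[float]) -> None:
        self.name = name
        self.weights = weights

    def serialise_to(self, folder_path: str) -> None:
        """Write this object's state into the given folder."""
        os.makedirs(folder_path, exist_ok=True)
        with open(os.path.join(folder_path, self.FILE_NAME), "w") as f:
            json.dump({"name": self.name, "weights": self.weights}, f)

    @classmethod
    def deserialise_from(cls, folder_path: str, **init_kwargs) -> "ManualThing":
        """Rebuild the object; extra keyword arguments override saved values."""
        with open(os.path.join(folder_path, cls.FILE_NAME)) as f:
            state = json.load(f)
        state.update(init_kwargs)
        return cls(**state)


with tempfile.TemporaryDirectory() as tmp:
    ManualThing("ner", [0.1, 0.9]).serialise_to(tmp)
    restored = ManualThing.deserialise_from(tmp)
```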
- medcat.storage.serialisers.get_all_serialisable_members(object)
Gets all serialisable members of an object.
This looks for public and protected members, but not private ones. It should also be able to return parts of lists and tuples. It also provides the name of each serialisable object.
- Parameters:
object (Any) – The target object.
- Returns:
tuple[list[tuple[Serialisable, str]], dict[str, Any]] – list of serialisable objects along with their names
- Return type:
tuple[list[tuple[Serialisable, str]], dict[str, Any]]
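To make the shape of the return value concrete, here is a simplified sketch of splitting an object's attributes into serialisable parts and everything else. `Part`, `split_members` and `Model` are hypothetical; the sketch only handles direct attributes (not elements of lists or tuples), and treating the second tuple element as the remaining raw attributes is an assumption based on the type signature.

```python
from typing import Any


class Part:
    """Hypothetical marker for 'serialisable' objects in this sketch."""


def split_members(obj: Any) -> tuple[list[tuple[Part, str]], dict[str, Any]]:
    """Split an object's attributes into serialisable parts and raw values."""
    parts: list[tuple[Part, str]] = []
    raw: dict[str, Any] = {}
    mangled_prefix = f"_{type(obj).__name__}__"
    for name, value in vars(obj).items():
        if name.startswith("__") or name.startswith(mangled_prefix):
            continue  # skip private (name-mangled) and dunder attributes
        if isinstance(value, Part):
            parts.append((value, name))
        else:
            raw[name] = value
    return parts, raw


class Model:
    def __init__(self) -> None:
        self.vocab = Part()     # public, serialisable
        self._cdb = Part()      # protected, serialisable
        self.__secret = Part()  # private, skipped
        self.threshold = 0.5    # raw attribute


parts, raw = split_members(Model())
names = sorted(name for _, name in parts)
```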
- medcat.storage.serialisers.load_schema(file_name)
Loads the schema for a folder of deserialisable files from the file.
- Parameters:
file_name (str) – The schema file
- Returns:
tuple[str, list[str]] – The class package/name along with the parts needed for initialising.
- Return type:
tuple[str, list[str]]
- medcat.storage.serialisers.save_schema(file_name, cls, init_parts)
Saves the schema of a class to the specified file.
- Parameters:
file_name (str) – The file to save to.
cls (Type) – The class in question.
init_parts (list[str]) – The parts needed for initialising.
- Return type:
None
- medcat.storage.serialisers.DEFAULT_SCHEMA_FILE = '.schema.json'
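A schema round trip in the spirit of `save_schema` and `load_schema` might look like the sketch below. The JSON layout (the `"class"` and `"init_parts"` keys) is an assumption for illustration; the actual file format is not documented on this page. The `_sketch` function names and `Vocab` class are hypothetical.

```python
import json
import os
import tempfile


def save_schema_sketch(file_name: str, cls: type, init_parts: list[str]) -> None:
    """Store the class path and init part names as JSON (assumed layout)."""
    payload = {
        "class": f"{cls.__module__}.{cls.__qualname__}",
        "init_parts": init_parts,
    }
    with open(file_name, "w") as f:
        json.dump(payload, f)


def load_schema_sketch(file_name: str) -> tuple[str, list[str]]:
    """Read back the class path and the init part names."""
    with open(file_name) as f:
        payload = json.load(f)
    return payload["class"], payload["init_parts"]


class Vocab:
    """Placeholder class whose schema gets saved."""


with tempfile.TemporaryDirectory() as tmp:
    schema_file = os.path.join(tmp, ".schema.json")
    save_schema_sketch(schema_file, Vocab, ["config"])
    cls_path, init_parts = load_schema_sketch(schema_file)
```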
- exception medcat.storage.serialisers.IllegalSchemaException(*args)
Bases:
ValueError
Inappropriate argument value (of correct type).
- __init__(*args)
Initialize self. See help(type(self)) for accurate signature.
- class __cause__
exception cause
- class __context__
exception context
- __delattr__()
Implement delattr(self, name).
- __dir__()
Default dir() implementation.
- __eq__()
Return self==value.
- __format__()
Default object formatter.
- __ge__()
Return self>=value.
- __getattribute__()
Return getattr(self, name).
- __gt__()
Return self>value.
- __hash__()
Return hash(self).
- __le__()
Return self<=value.
- __lt__()
Return self<value.
- __ne__()
Return self!=value.
- __new__()
Create and return a new object. See help(type) for accurate signature.
- __reduce__()
- __reduce_ex__()
Helper for pickle.
- __repr__()
Return repr(self).
- __setattr__()
Implement setattr(self, name, value).
- __setstate__()
- __sizeof__()
Size of object in memory, in bytes.
- __str__()
Return str(self).
- __subclasshook__()
Abstract classes can override this to customize issubclass().
This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).
- class __suppress_context__
- class __traceback__
- class args
- with_traceback()
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- medcat.storage.serialisers.fix_module_and_cls_name(module_name, cls_name)
- Parameters:
module_name (str)
cls_name (str)
- Return type:
tuple[str, str]
- class medcat.storage.serialisers.RemappingUnpickler(*args, **kwds)
Bases:
dill.Unpickler
Python’s Unpickler extended to interpreter sessions and more types.
- find_class(module, name)
Return an object from a specified module.
If necessary, the module will be imported. Subclasses may override this method (e.g. to restrict unpickling of arbitrary classes and functions).
This method is called whenever a class or a function object is needed. Both arguments passed are str objects.
- Parameters:
module (str)
name (str)
- _session = False
- __init__(*args, **kwds)
Initialize self. See help(type(self)) for accurate signature.
- _main
- _ignore = False
- load()
Load a pickle.
Read a pickled object representation from the open file object given in the constructor, and return the reconstituted object hierarchy specified therein.
- __delattr__()
Implement delattr(self, name).
- __dir__()
Default dir() implementation.
- __eq__()
Return self==value.
- __format__()
Default object formatter.
- __ge__()
Return self>=value.
- __getattribute__()
Return getattr(self, name).
- __gt__()
Return self>value.
- __hash__()
Return hash(self).
- __le__()
Return self<=value.
- __lt__()
Return self<value.
- __ne__()
Return self!=value.
- __reduce__()
Helper for pickle.
- __reduce_ex__()
Helper for pickle.
- __repr__()
Return repr(self).
- __setattr__()
Implement setattr(self, name, value).
- __sizeof__()
Returns size in memory, in bytes.
- __str__()
Return str(self).
- class memo
- class persistent_load
- _buffers
- _file_readline
- _file_read
- encoding = 'ASCII'
- errors = 'strict'
- proto = 0
- fix_imports = True
- pop_mark()
- dispatch
- load_proto()
- load_frame()
- load_persid()
- load_binpersid()
- load_none()
- load_false()
- load_true()
- load_int()
- load_binint()
- load_binint1()
- load_binint2()
- load_long()
- load_long1()
- load_long4()
- load_float()
- load_binfloat()
- _decode_string(value)
- load_string()
- load_binstring()
- load_binbytes()
- load_unicode()
- load_binunicode()
- load_binunicode8()
- load_binbytes8()
- load_bytearray8()
- load_next_buffer()
- load_readonly_buffer()
- load_short_binstring()
- load_short_binbytes()
- load_short_binunicode()
- load_tuple()
- load_empty_tuple()
- load_tuple1()
- load_tuple2()
- load_tuple3()
- load_empty_list()
- load_empty_dictionary()
- load_empty_set()
- load_frozenset()
- load_list()
- load_dict()
- _instantiate(klass, args)
- load_inst()
- load_obj()
- load_newobj()
- load_newobj_ex()
- load_global()
- load_stack_global()
- load_ext1()
- load_ext2()
- load_ext4()
- get_extension(code)
- load_reduce()
- load_pop()
- load_pop_mark()
- load_dup()
- load_get()
- load_binget()
- load_long_binget()
- load_put()
- load_binput()
- load_long_binput()
- load_memoize()
- load_append()
- load_appends()
- load_setitem()
- load_setitems()
- load_additems()
- load_build()
- load_mark()
- load_stop()
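The `find_class` override on `RemappingUnpickler` suggests a common pattern: redirecting classes that have moved or been renamed since the data was pickled. The sketch below shows that pattern on the standard library `pickle.Unpickler` rather than `dill.Unpickler`. The `REMAP` table, the `legacy_pkg.structs` module name and the `RemappingSketch` class are hypothetical, and the actual remapping rules MedCAT applies are not shown on this page.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical remapping table: (old module, old name) -> (new module, new name).
REMAP = {("legacy_pkg.structs", "Ordered"): ("collections", "OrderedDict")}


class RemappingSketch(pickle.Unpickler):
    """Redirects moved or renamed classes while unpickling."""

    def find_class(self, module: str, name: str):
        # Swap in the new location if this (module, name) pair is known.
        module, name = REMAP.get((module, name), (module, name))
        return super().find_class(module, name)


# A hand-written protocol-0 pickle that refers to the old class location
# and calls it with no arguments.
legacy_bytes = b"clegacy_pkg.structs\nOrdered\n)R."
obj = RemappingSketch(io.BytesIO(legacy_bytes)).load()
```

Without the override, loading `legacy_bytes` would fail because `legacy_pkg.structs` cannot be imported.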
- medcat.storage.serialisers.logger
- medcat.storage.serialisers.SER_TYPE_FILE = '.serialised_by'
- medcat.storage.serialisers.MANUAL_SERIALISED_TAG = 'MANUALLY_SERIALISED:'
- medcat.storage.serialisers.MANUAL_SERIALISED_RE
- class medcat.storage.serialisers.Serialiser
Bases:
abc.ABC
The abstract serialiser base class.
This class is responsible for both serialising and deserialising.
- RAW_FILE = 'raw_dict.dat'
- abstract property ser_type: AvailableSerialisers
The serialiser type.
- Return type:
AvailableSerialisers
- abstract serialise(raw_parts, target_file)
Serialise the raw attributes / objects.
- Parameters:
raw_parts (dict[str, Any]) – The raw objects to serialise.
target_file (str) – The file name to write to.
- Return type:
None
- abstract deserialise(target_file)
Deserialise data written to the specified file.
- Parameters:
target_file (str) – The file to read from.
- Returns:
dict[str, Any] – The deserialised raw attributes / objects.
- Return type:
dict[str, Any]
- classmethod get_ser_type_file(folder)
- Parameters:
folder (str)
- Return type:
str
- save_ser_type_file(folder)
Save the serialiser type into the specified folder.
- Parameters:
folder (str) – The folder to use.
- Return type:
None
- classmethod get_manually_serialised_path(folder)
- Parameters:
folder (str)
- Return type:
Optional[str]
- check_ser_type(folder)
Check that the folder contains data serialised by this serialiser.
- Parameters:
folder (str) – Target folder.
- Raises:
TypeError – If the folder was not serialised by this serialiser.
- Return type:
None
- serialise_all(obj, target_folder, overwrite=False)
Serialise the entire object into the target folder.
This finds the serialisable parts (attributes) of the object and calls the same method on them recursively. It also finds the raw attributes (if any) and serialises them.
- Parameters:
obj (Serialisable) – The object to serialise.
target_folder (str) – The target folder.
overwrite (bool) – Whether to allow overwriting. Defaults to False.
- Raises:
IllegalSchemaException – If there are multiple parts with the same name or a file already exists.
- Return type:
None
- classmethod deserialise_manually(folder_path, man_cls_path, **init_kwargs)
- Parameters:
folder_path (str)
man_cls_path (str)
- Return type:
- deserialise_all(folder_path, ignore_folders_prefix=set(), ignore_folders_suffix=set(), **kwargs)
Deserialise contents of a folder.
Additional initialisation keyword arguments can be provided if needed.
This loads both the raw attributes for this object as well as the serialisable parts / attributes recursively.
- Parameters:
folder_path (str) – The folder path.
ignore_folders_prefix (set[str]) – The prefixes of folders to ignore.
ignore_folders_suffix (set[str]) – The suffixes of folders to ignore.
- Returns:
Serialisable – The resulting object.
- Return type:
Serialisable
- __slots__ = ()
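The division of labour in the `Serialiser` class above (abstract `serialise` / `deserialise`, plus concrete helpers that record and check which serialiser wrote a folder) can be sketched as follows. `SerialiserSketch` and `PickleSerialiser` are hypothetical; the sketch uses `pickle` instead of `dill` and a plain string instead of the `AvailableSerialisers` enum. The file names are taken from `RAW_FILE` and `SER_TYPE_FILE` on this page.

```python
import os
import pickle
import tempfile
from abc import ABC, abstractmethod
from typing import Any


class SerialiserSketch(ABC):
    """Hypothetical, trimmed-down analogue of the abstract base class."""

    RAW_FILE = "raw_dict.dat"
    TYPE_FILE = ".serialised_by"

    @property
    @abstractmethod
    def ser_type(self) -> str:
        """Name of the serialiser (the real property returns an enum member)."""

    @abstractmethod
    def serialise(self, raw_parts: dict[str, Any], target_file: str) -> None: ...

    @abstractmethod
    def deserialise(self, target_file: str) -> dict[str, Any]: ...

    def save_ser_type_file(self, folder: str) -> None:
        with open(os.path.join(folder, self.TYPE_FILE), "w") as f:
            f.write(self.ser_type)

    def check_ser_type(self, folder: str) -> None:
        with open(os.path.join(folder, self.TYPE_FILE)) as f:
            found = f.read().strip()
        if found != self.ser_type:
            raise TypeError(f"Folder was written by {found!r}, not {self.ser_type!r}")


class PickleSerialiser(SerialiserSketch):
    @property
    def ser_type(self) -> str:
        return "pickle"

    def serialise(self, raw_parts: dict[str, Any], target_file: str) -> None:
        with open(target_file, "wb") as f:
            pickle.dump(raw_parts, f)

    def deserialise(self, target_file: str) -> dict[str, Any]:
        with open(target_file, "rb") as f:
            return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    ser = PickleSerialiser()
    ser.save_ser_type_file(tmp)
    raw_path = os.path.join(tmp, ser.RAW_FILE)
    ser.serialise({"threshold": 0.5}, raw_path)
    ser.check_ser_type(tmp)  # passes: the marker matches this serialiser
    loaded = ser.deserialise(raw_path)
```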
- class medcat.storage.serialisers.AvailableSerialisers
Bases:
enum.Enum
Describes the available serialisers.
- dill
- json
- write_to(file_path)
- Parameters:
file_path (str)
- Return type:
None
- classmethod from_file(file_path)
- Parameters:
file_path (str)
- Return type:
AvailableSerialisers
- __new__(value)
- _generate_next_value_(start, count, last_values)
Generate the next value when not given.
  - name: the name of the member
  - start: the initial start value or None
  - count: the number of existing members
  - last_value: the last value assigned or None
- classmethod _missing_(value)
- __repr__()
- __str__()
- __dir__()
Returns all members and all public methods
- __format__(format_spec)
Returns format using actual value type unless __str__ has been overridden.
- __hash__()
- __reduce_ex__(proto)
- name()
The name of the Enum member.
- value()
The value of the Enum member.
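An enum that can write itself to a marker file and be read back, as `write_to` and `from_file` above suggest, could be sketched like this. `SerialiserKind` is a hypothetical stand-in, and storing the member name as plain text is an assumption about the file contents.

```python
import os
import tempfile
from enum import Enum, auto


class SerialiserKind(Enum):
    """Hypothetical analogue of the enum with the two documented members."""

    dill = auto()
    json = auto()

    def write_to(self, file_path: str) -> None:
        """Record this member's name in a small text file."""
        with open(file_path, "w") as f:
            f.write(self.name)

    @classmethod
    def from_file(cls, file_path: str) -> "SerialiserKind":
        """Look the member up by the name stored in the file."""
        with open(file_path) as f:
            return cls[f.read().strip()]


with tempfile.TemporaryDirectory() as tmp:
    marker = os.path.join(tmp, ".serialised_by")
    SerialiserKind.json.write_to(marker)
    kind = SerialiserKind.from_file(marker)
```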
- class medcat.storage.serialisers.DillSerialiser
Bases:
Serialiser
The dill based serialiser.
- ser_type
The serialiser type.
- serialise(raw_parts, target_file)
Serialise the raw attributes / objects.
- Parameters:
raw_parts (dict[str, Any]) – The raw objects to serialise.
target_file (str) – The file name to write to.
- Return type:
None
- deserialise(target_file)
Deserialise data written to the specified file.
- Parameters:
target_file (str) – The file to read from.
- Returns:
dict[str, Any] – The deserialised raw attributes / objects.
- Return type:
dict[str, Any]
- RAW_FILE = 'raw_dict.dat'
- classmethod get_ser_type_file(folder)
- Parameters:
folder (str)
- Return type:
str
- save_ser_type_file(folder)
Save the serialiser type into the specified folder.
- Parameters:
folder (str) – The folder to use.
- Return type:
None
- classmethod get_manually_serialised_path(folder)
- Parameters:
folder (str)
- Return type:
Optional[str]
- check_ser_type(folder)
Check that the folder contains data serialised by this serialiser.
- Parameters:
folder (str) – Target folder.
- Raises:
TypeError – If the folder was not serialised by this serialiser.
- Return type:
None
- serialise_all(obj, target_folder, overwrite=False)
Serialise the entire object into the target folder.
This finds the serialisable parts (attributes) of the object and calls the same method on them recursively. It also finds the raw attributes (if any) and serialises them.
- Parameters:
obj (Serialisable) – The object to serialise.
target_folder (str) – The target folder.
overwrite (bool) – Whether to allow overwriting. Defaults to False.
- Raises:
IllegalSchemaException – If there are multiple parts with the same name or a file already exists.
- Return type:
None
- classmethod deserialise_manually(folder_path, man_cls_path, **init_kwargs)
- Parameters:
folder_path (str)
man_cls_path (str)
- Return type:
- deserialise_all(folder_path, ignore_folders_prefix=set(), ignore_folders_suffix=set(), **kwargs)
Deserialise contents of a folder.
Additional initialisation keyword arguments can be provided if needed.
This loads both the raw attributes for this object as well as the serialisable parts / attributes recursively.
- Parameters:
folder_path (str) – The folder path.
ignore_folders_prefix (set[str]) – The prefixes of folders to ignore.
ignore_folders_suffix (set[str]) – The suffixes of folders to ignore.
- Returns:
Serialisable – The resulting object.
- Return type:
Serialisable
- __slots__ = ()
- medcat.storage.serialisers._DEF_SER
- medcat.storage.serialisers.get_serialiser(serialiser_type=_DEF_SER)
Get the serialiser based on the type specified.
- Parameters:
serialiser_type (Union[str, AvailableSerialisers], optional) – The required type. Defaults to ‘dill’.
- Raises:
ValueError – If no serialiser is found.
- Returns:
Serialiser – The appropriate serialiser.
- Return type:
Serialiser
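Accepting either a string or an enum member and failing with `ValueError` when nothing matches, as `get_serialiser` is documented to do, is often implemented as a small registry lookup. The sketch below is one such arrangement; `Kind`, `DillLike`, `_REGISTRY` and `get_serialiser_sketch` are hypothetical names, and the real lookup logic may differ.

```python
from enum import Enum
from typing import Union


class Kind(Enum):
    dill = "dill"
    json = "json"


class DillLike:
    """Placeholder for a concrete serialiser."""


# Hypothetical registry; only one kind is registered in this sketch.
_REGISTRY = {Kind.dill: DillLike}


def get_serialiser_sketch(serialiser_type: Union[str, Kind] = "dill") -> DillLike:
    """Accept either the enum member or its name and return a serialiser."""
    if isinstance(serialiser_type, str):
        try:
            serialiser_type = Kind[serialiser_type.lower()]
        except KeyError as err:
            raise ValueError(f"Unknown serialiser: {serialiser_type!r}") from err
    try:
        return _REGISTRY[serialiser_type]()
    except KeyError as err:
        raise ValueError(f"No serialiser for {serialiser_type}") from err


default = get_serialiser_sketch()
try:
    get_serialiser_sketch("yaml")
    failed = False
except ValueError:
    failed = True
```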
- medcat.storage.serialisers.get_serialiser_type_from_folder(folder_path)
Get the serialiser type that was used to serialise data in the folder.
- Parameters:
folder_path (str) – The folder in question.
- Returns:
AvailableSerialisers – The serialiser type.
- Return type:
AvailableSerialisers
- medcat.storage.serialisers.get_serialiser_from_folder(folder_path)
Get the serialiser that was used to serialise the data in the folder.
- Parameters:
folder_path (str) – The folder in question.
- Returns:
Serialiser – The appropriate serialiser.
- Return type:
Serialiser
- medcat.storage.serialisers.serialise(serialiser_type, obj, target_folder, overwrite=False)
Serialise an object based on the specified serialiser type.
- Parameters:
serialiser_type (Union[str, AvailableSerialisers]) – The serialiser type.
obj (Serialisable) – The object to serialise.
target_folder (str) – The folder to serialise into.
overwrite (bool) – Whether to allow overwriting. Defaults to False.
- Return type:
None
- medcat.storage.serialisers.deserialise(folder_path, ignore_folders_prefix=set(), ignore_folders_suffix=set(), **init_kwargs)
Deserialise contents of a folder.
Extra init keyword arguments can be provided if needed. These are generally:
  - cnf: The config relevant to the components
  - tokenizer (BaseTokenizer): The base tokenizer for the model
  - cdb (CDB): The CDB for the model
  - vocab (Vocab): The Vocab for the model
  - model_load_path (Optional[str]): The model load path, but not the component load path
This method finds the serialiser to be used based on the files on disk.
- Parameters:
folder_path (str) – The folder to deserialise.
ignore_folders_prefix (set[str]) – The prefixes of folders to ignore.
ignore_folders_suffix (set[str]) – The suffixes of folders to ignore.
- Returns:
Serialisable – The deserialised object.
- Return type:
Serialisable
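The recursive, folder-based round trip that `serialise` and `deserialise` describe (raw attributes in one file, each serialisable part in its own subfolder, extra init keyword arguments supplied at load time) can be pictured with the standard-library sketch below. `Node`, `serialise_sketch`, `deserialise_sketch` and the `raw.json` file name are hypothetical; the real functions pick a serialiser from the marker file on disk and raise `IllegalSchemaException` rather than a plain `ValueError`.

```python
import json
import os
import tempfile
from typing import Any


class Node:
    """Hypothetical serialisable object: nested Nodes plus raw attributes."""

    def __init__(self, **attrs: Any) -> None:
        self.__dict__.update(attrs)


def serialise_sketch(obj: Node, folder: str, overwrite: bool = False) -> None:
    """Write raw attributes to one file and each nested Node to a subfolder."""
    raw_file = os.path.join(folder, "raw.json")
    if os.path.exists(raw_file) and not overwrite:
        raise ValueError(f"Refusing to overwrite {raw_file}")
    os.makedirs(folder, exist_ok=True)
    raw = {}
    for name, value in vars(obj).items():
        if isinstance(value, Node):
            serialise_sketch(value, os.path.join(folder, name), overwrite)
        else:
            raw[name] = value
    with open(raw_file, "w") as f:
        json.dump(raw, f)


def deserialise_sketch(folder: str, **init_kwargs: Any) -> Node:
    """Rebuild a Node from its raw file and subfolders, recursively."""
    with open(os.path.join(folder, "raw.json")) as f:
        attrs = json.load(f)
    for entry in os.listdir(folder):
        path = os.path.join(folder, entry)
        if os.path.isdir(path):
            attrs[entry] = deserialise_sketch(path)
    attrs.update(init_kwargs)  # extras supplied by the caller at load time
    return Node(**attrs)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "model")
    model = Node(name="demo", vocab=Node(size=3))
    serialise_sketch(model, target)
    restored = deserialise_sketch(target, tokenizer="whitespace")
```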