medcat2.storage.serialisers

Attributes

DEFAULT_SCHEMA_FILE

logger

SER_TYPE_FILE

MANUAL_SERIALISED_TAG

MANUAL_SERIALISED_RE

_DEF_SER

Exceptions

IllegalSchemaException

Raised when a schema is invalid, for example when multiple parts share a name or a file already exists.

Classes

Serialisable

The base serialisable protocol.

ManualSerialisable

The protocol for serialisables that serialise themselves manually.

Serialiser

The abstract serialiser base class.

AvailableSerialisers

Describes the available serialisers.

DillSerialiser

The dill based serialiser.

Functions

get_all_serialisable_members(object)

Gets all serialisable members of an object.

load_schema(file_name)

Loads the schema for a folder of deserialisable files from the file.

save_schema(file_name, cls, init_parts)

Saves the schema of a class to the specified file.

get_serialiser([serialiser_type])

Get the serialiser based on the type specified.

get_serialiser_type_from_folder(folder_path)

Get the serialiser type that was used to serialise data in the folder.

get_serialiser_from_folder(folder_path)

Get the serialiser that was used to serialise the data in the folder.

serialise(serialiser_type, obj, target_folder[, overwrite])

Serialise an object based on the specified serialiser type.

deserialise(folder_path[, ignore_folders_prefix, ...])

Deserialise the contents of a folder.

Module Contents

class medcat2.storage.serialisers.Serialisable

Bases: Protocol

The base serialisable protocol.

get_strategy()

Get the serialisation strategy.

Returns:

SerialisingStrategy – The strategy.

Return type:

SerialisingStrategy

classmethod get_init_attrs()

Get the names of the arguments needed for init upon deserialisation.

Returns:

list[str] – The list of init arguments’ names.

Return type:

list[str]

classmethod ignore_attrs()

Get the names of attributes not to serialise.

Returns:

list[str] – The attribute names that should not be serialised.

Return type:

list[str]

classmethod include_properties()
Return type:

list[str]

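A minimal sketch of a user class that fits the shape of the Serialisable protocol. `SerialisableLike`, `Vocab` and their attributes are hypothetical. The stand-in protocol covers only three of the documented classmethods (`get_strategy()` is left out because `SerialisingStrategy` is not described here), and it is marked `runtime_checkable` only so that `isinstance` can be used in the demonstration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SerialisableLike(Protocol):
    """Hypothetical stand-in mirroring three documented classmethods."""

    @classmethod
    def get_init_attrs(cls) -> list[str]: ...

    @classmethod
    def ignore_attrs(cls) -> list[str]: ...

    @classmethod
    def include_properties(cls) -> list[str]: ...


class Vocab:
    """Hypothetical user class: 'name' goes back to __init__ on load,
    and '_cache' is skipped when serialising."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.words: dict[str, int] = {}
        self._cache: dict[str, int] = {}

    @classmethod
    def get_init_attrs(cls) -> list[str]:
        return ["name"]

    @classmethod
    def ignore_attrs(cls) -> list[str]:
        return ["_cache"]

    @classmethod
    def include_properties(cls) -> list[str]:
        return []


# Structural typing: Vocab never inherits from SerialisableLike, yet it
# matches because it defines every member the protocol lists.
print(isinstance(Vocab("medical"), SerialisableLike))  # True
```

Because protocols are structural, a class does not have to subclass the protocol to satisfy it; it only has to provide the listed members.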
class medcat2.storage.serialisers.ManualSerialisable

Bases: Serialisable, Protocol

The protocol for serialisables that serialise themselves manually.

serialise_to(folder_path)
Parameters:

folder_path (str)

Return type:

None

classmethod deserialise_from(folder_path, **init_kwargs)
Parameters:

folder_path (str)

Return type:

ManualSerialisable

Inherited from Serialisable: get_strategy(), get_init_attrs(), ignore_attrs() and include_properties().

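A minimal sketch of a class with the `serialise_to(folder_path)` / `deserialise_from(folder_path, **init_kwargs)` pair described above. `StopWords`, the `stop_words.json` file name and the JSON format are hypothetical choices for illustration; `get_strategy()` is omitted because its return type is not described on this page.

```python
import json
import os
import tempfile


class StopWords:
    """Hypothetical class that writes and reads its own files
    instead of relying on a Serialiser."""

    def __init__(self, words: set[str], lang: str = "en") -> None:
        self.words = words
        self.lang = lang

    @classmethod
    def get_init_attrs(cls) -> list[str]:
        return ["lang"]

    @classmethod
    def ignore_attrs(cls) -> list[str]:
        return []

    @classmethod
    def include_properties(cls) -> list[str]:
        return []

    def serialise_to(self, folder_path: str) -> None:
        os.makedirs(folder_path, exist_ok=True)
        # File name and JSON format are assumptions for this sketch.
        with open(os.path.join(folder_path, "stop_words.json"), "w") as f:
            json.dump(sorted(self.words), f)

    @classmethod
    def deserialise_from(cls, folder_path: str, **init_kwargs) -> "StopWords":
        with open(os.path.join(folder_path, "stop_words.json")) as f:
            words = set(json.load(f))
        # Extra keyword arguments are forwarded to __init__.
        return cls(words, **init_kwargs)


with tempfile.TemporaryDirectory() as tmp:
    StopWords({"the", "a"}).serialise_to(tmp)
    loaded = StopWords.deserialise_from(tmp, lang="en")

print(sorted(loaded.words), loaded.lang)  # ['a', 'the'] en
```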
medcat2.storage.serialisers.get_all_serialisable_members(object)

Gets all serialisable members of an object.

This looks for public and protected members, but not private ones. It should also be able to return parts of lists and tuples. It also provides the name of each serialisable object.

Parameters:

object (Any) – The target object.

Returns:

tuple[list[tuple[Serialisable, str]], dict[str, Any]] – The serialisable members paired with their names, and a dict of the remaining attributes.

Return type:

tuple[list[tuple[Serialisable, str]], dict[str, Any]]
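The member-discovery rules above (public and protected members included, private ones skipped, items of lists and tuples reported individually) can be illustrated with a simplified re-implementation. This is a sketch, not medcat2's code: `Part`, `find_members` and the index-suffixed naming of list items are hypothetical, and treating everything else as a plain attribute is an assumption.

```python
from typing import Any


class Part:
    """Hypothetical marker type standing in for a Serialisable member."""


def find_members(obj: Any) -> tuple[list[tuple[Any, str]], dict[str, Any]]:
    """Split obj's attributes into (part, name) pairs and raw values."""
    parts: list[tuple[Any, str]] = []
    raw: dict[str, Any] = {}
    # Private attributes are name-mangled to _ClassName__attr.
    mangled_prefix = f"_{type(obj).__name__}__"
    for name, value in vars(obj).items():
        if name.startswith(mangled_prefix):
            continue  # private: skipped entirely
        if isinstance(value, Part):
            parts.append((value, name))
        elif (isinstance(value, (list, tuple)) and value
              and all(isinstance(item, Part) for item in value)):
            # Items of a list/tuple of parts get an index-suffixed name.
            for i, item in enumerate(value):
                parts.append((item, f"{name}_{i}"))
        else:
            raw[name] = value
    return parts, raw


class Owner:
    def __init__(self) -> None:
        self.vocab = Part()
        self._cdb = Part()             # protected: included
        self.__secret = Part()         # private: stored as _Owner__secret, skipped
        self.addons = [Part(), Part()]
        self.version = "1.0"


parts, raw = find_members(Owner())
print([name for _, name in parts])  # ['vocab', '_cdb', 'addons_0', 'addons_1']
print(raw)                          # {'version': '1.0'}
```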

medcat2.storage.serialisers.load_schema(file_name)

Loads the schema for a folder of deserialisable files from the file.

Parameters:

file_name (str) – The schema file.

Returns:

tuple[str, list[str]] – The class package/name along with the parts needed for initialising.

Return type:

tuple[str, list[str]]

medcat2.storage.serialisers.save_schema(file_name, cls, init_parts)

Saves the schema of a class to the specified file.

Parameters:
  • file_name (str) – The file to save to.

  • cls (Type) – The class in question.

  • init_parts (list[str]) – The names of the parts needed for initialising the class.

Return type:

None
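A sketch of the round trip between `save_schema` and `load_schema`: the class is recorded as a "package/name" string together with its init parts, and the string is enough to re-import the class later. The JSON keys and the helper names `save_schema_sketch` / `load_schema_sketch` are hypothetical; medcat2's actual schema layout is not described on this page. Only the `.schema.json` file name comes from `DEFAULT_SCHEMA_FILE`.

```python
import importlib
import json
import os
import tempfile
from argparse import Namespace

DEFAULT_SCHEMA_FILE = ".schema.json"  # documented module constant


def save_schema_sketch(file_name: str, cls: type, init_parts: list[str]) -> None:
    # The JSON keys are assumptions, not medcat2's actual on-disk format.
    schema = {"class": f"{cls.__module__}.{cls.__qualname__}",
              "init_parts": list(init_parts)}
    with open(file_name, "w") as f:
        json.dump(schema, f)


def load_schema_sketch(file_name: str) -> tuple[str, list[str]]:
    with open(file_name) as f:
        schema = json.load(f)
    return schema["class"], schema["init_parts"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, DEFAULT_SCHEMA_FILE)
    save_schema_sketch(path, Namespace, ["vocab", "config"])
    cls_path, init_parts = load_schema_sketch(path)

# Resolve the "package.name" string back into the class object.
module_name, _, cls_name = cls_path.rpartition(".")
resolved = getattr(importlib.import_module(module_name), cls_name)
print(cls_path, init_parts, resolved is Namespace)
# argparse.Namespace ['vocab', 'config'] True
```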

medcat2.storage.serialisers.DEFAULT_SCHEMA_FILE = '.schema.json'
exception medcat2.storage.serialisers.IllegalSchemaException(*args)

Bases: ValueError

Raised when a schema is invalid, for example when multiple parts share a name or a file already exists.

medcat2.storage.serialisers.logger
medcat2.storage.serialisers.SER_TYPE_FILE = '.serialised_by'
medcat2.storage.serialisers.MANUAL_SERIALISED_TAG = 'MANUALLY_SERIALISED:'
medcat2.storage.serialisers.MANUAL_SERIALISED_RE
class medcat2.storage.serialisers.Serialiser

Bases: abc.ABC

The abstract serialiser base class.

This class is responsible for both serialising and deserialising.

RAW_FILE = 'raw_dict.dat'
abstract property ser_type: AvailableSerialisers

The serialiser type.

Return type:

AvailableSerialisers

abstract serialise(raw_parts, target_file)

Serialise the raw attributes / objects.

Parameters:
  • raw_parts (dict[str, Any]) – The raw objects to serialise.

  • target_file (str) – The file name to write to.

Return type:

None

abstract deserialise(target_file)

Deserialise data written to the specified file.

Parameters:

target_file (str) – The file to read from.

Returns:

dict[str, Any] – The deserialised raw attributes / objects.

Return type:

dict[str, Any]
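A minimal sketch of the abstract contract above: a subclass supplies `ser_type`, `serialise` and `deserialise`, and the base class cannot be instantiated on its own. The documented concrete class, DillSerialiser, uses dill; this sketch swaps in the standard-library `pickle` so it runs without extra dependencies. `SerialiserSketch`, `PickleSerialiser` and `SerType` are hypothetical.

```python
import abc
import enum
import os
import pickle
import tempfile
from typing import Any


class SerType(enum.Enum):
    """Hypothetical stand-in for AvailableSerialisers ('pickle' is not a documented member)."""
    pickle = "pickle"


class SerialiserSketch(abc.ABC):
    RAW_FILE = "raw_dict.dat"  # same file name as the documented RAW_FILE

    @property
    @abc.abstractmethod
    def ser_type(self) -> SerType: ...

    @abc.abstractmethod
    def serialise(self, raw_parts: dict[str, Any], target_file: str) -> None: ...

    @abc.abstractmethod
    def deserialise(self, target_file: str) -> dict[str, Any]: ...


class PickleSerialiser(SerialiserSketch):
    """Hypothetical concrete subclass backed by pickle."""

    @property
    def ser_type(self) -> SerType:
        return SerType.pickle

    def serialise(self, raw_parts: dict[str, Any], target_file: str) -> None:
        with open(target_file, "wb") as f:
            pickle.dump(raw_parts, f)

    def deserialise(self, target_file: str) -> dict[str, Any]:
        with open(target_file, "rb") as f:
            return pickle.load(f)


ser = PickleSerialiser()
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, ser.RAW_FILE)
    ser.serialise({"version": 2, "names": ["a", "b"]}, target)
    restored = ser.deserialise(target)

print(restored)  # {'version': 2, 'names': ['a', 'b']}
```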

classmethod get_ser_type_file(folder)
Parameters:

folder (str)

Return type:

str

save_ser_type_file(folder)

Save the serialiser type into the specified folder.

Parameters:

folder (str) – The folder to use.

Return type:

None

classmethod get_manually_serialised_path(folder)
Parameters:

folder (str)

Return type:

Optional[str]

check_ser_type(folder)

Check that the folder contains data serialised by this serialiser.

Parameters:

folder (str) – Target folder.

Raises:

TypeError – If the folder was not serialised by this serialiser.

Return type:

None
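A sketch of the idea behind `save_ser_type_file` and `check_ser_type`: a marker file named after `SER_TYPE_FILE` records which serialiser wrote a folder, and a mismatch raises `TypeError`. The file contents (just the serialiser's name as plain text) and the helper names are assumptions for illustration.

```python
import os
import tempfile

SER_TYPE_FILE = ".serialised_by"  # documented module constant


def save_ser_type(folder: str, ser_name: str) -> None:
    # Assumption: the marker file holds only the serialiser's name.
    with open(os.path.join(folder, SER_TYPE_FILE), "w") as f:
        f.write(ser_name)


def check_ser_type(folder: str, expected: str) -> None:
    with open(os.path.join(folder, SER_TYPE_FILE)) as f:
        found = f.read().strip()
    if found != expected:
        raise TypeError(f"{folder!r} was serialised by {found!r}, not {expected!r}")


with tempfile.TemporaryDirectory() as tmp:
    save_ser_type(tmp, "dill")
    check_ser_type(tmp, "dill")  # same serialiser: returns None
    try:
        check_ser_type(tmp, "json")
        mismatch_raised = False
    except TypeError:
        mismatch_raised = True

print(mismatch_raised)  # True
```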

serialise_all(obj, target_folder, overwrite=False)

Serialise the entire object into the target folder.

This finds the serialisable parts (attributes) of the object and calls the same method on them recursively. It also finds the raw attributes (if any) and serialises them.

Parameters:
  • obj (Serialisable) – The object to serialise.

  • target_folder (str) – The target folder.

  • overwrite (bool) – Whether to allow overwriting. Defaults to False.

Raises:

IllegalSchemaException – If there are multiple parts with the same name or a file already exists.

Return type:

None
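The recursion described above can be sketched as follows: serialisable parts go into sub-folders named after the attribute, the remaining raw attributes are written to one file per folder, and writing over an existing file without `overwrite` fails. `Node`, `serialise_all_sketch` and the `raw_dict.json` file are hypothetical; JSON replaces dill only to keep the sketch dependency-free, and the sketch raises a plain `ValueError` where the documented method raises IllegalSchemaException (itself a `ValueError` subclass).

```python
import json
import os
import tempfile
from typing import Any


class Node:
    """Hypothetical serialisable: attributes holding a Node become sub-folders."""

    def __init__(self, **attrs: Any) -> None:
        self.__dict__.update(attrs)


def serialise_all_sketch(obj: Node, target_folder: str, overwrite: bool = False) -> None:
    os.makedirs(target_folder, exist_ok=True)
    raw: dict[str, Any] = {}
    for name, value in vars(obj).items():
        if isinstance(value, Node):
            # Recurse: each serialisable part is written to its own sub-folder.
            serialise_all_sketch(value, os.path.join(target_folder, name), overwrite)
        else:
            raw[name] = value
    raw_file = os.path.join(target_folder, "raw_dict.json")
    if os.path.exists(raw_file) and not overwrite:
        raise ValueError(f"{raw_file} already exists")
    with open(raw_file, "w") as f:
        json.dump(raw, f)


with tempfile.TemporaryDirectory() as tmp:
    model = Node(version="1.0", vocab=Node(size=3), cdb=Node(names=["x"]))
    serialise_all_sketch(model, tmp)
    layout = sorted(
        os.path.relpath(os.path.join(root, f), tmp).replace(os.sep, "/")
        for root, _, files in os.walk(tmp) for f in files
    )
    try:
        serialise_all_sketch(model, tmp)  # second write without overwrite
        clash = False
    except ValueError:
        clash = True

print(layout)  # ['cdb/raw_dict.json', 'raw_dict.json', 'vocab/raw_dict.json']
print(clash)   # True
```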

classmethod deserialise_manually(folder_path, man_cls_path, **init_kwargs)
Parameters:
  • folder_path (str)

  • man_cls_path (str)

Return type:

medcat2.storage.serialisables.Serialisable

deserialise_all(folder_path, ignore_folders_prefix=set(), ignore_folders_suffix=set(), **kwargs)

Deserialise the contents of the folder.

Additional initialisation keyword arguments can be provided if needed.

This loads both the raw attributes for this object as well as the serialisable parts / attributes recursively.

Parameters:
  • folder_path (str) – The folder path.

  • ignore_folders_prefix (set[str]) – The prefixes of folders to ignore.

  • ignore_folders_suffix (set[str]) – The suffixes of folders to ignore.

Returns:

Serialisable – The resulting object.

Return type:

medcat2.storage.serialisables.Serialisable

__slots__ = ()
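A sketch of how `ignore_folders_prefix` and `ignore_folders_suffix` might filter the sub-folders that `deserialise_all` recurses into, assuming the prefixes and suffixes are matched against sub-folder names. The helper `folders_to_load` and the example folder names are hypothetical.

```python
import os
import tempfile
from typing import Optional


def folders_to_load(folder_path: str,
                    ignore_prefix: Optional[set[str]] = None,
                    ignore_suffix: Optional[set[str]] = None) -> list[str]:
    """Return the sub-folder names that would be deserialised recursively."""
    ignore_prefix = ignore_prefix or set()
    ignore_suffix = ignore_suffix or set()
    kept = []
    for entry in sorted(os.listdir(folder_path)):
        if not os.path.isdir(os.path.join(folder_path, entry)):
            continue  # plain files such as raw_dict.dat are not parts
        if any(entry.startswith(p) for p in ignore_prefix):
            continue
        if any(entry.endswith(s) for s in ignore_suffix):
            continue
        kept.append(entry)
    return kept


with tempfile.TemporaryDirectory() as tmp:
    for sub in ("vocab", "cdb", ".cache", "meta_backup"):
        os.mkdir(os.path.join(tmp, sub))
    open(os.path.join(tmp, "raw_dict.dat"), "wb").close()
    kept = folders_to_load(tmp, ignore_prefix={"."}, ignore_suffix={"_backup"})

print(kept)  # ['cdb', 'vocab']
```

The sketch uses `None` defaults rather than `set()`: a `set()` default is created once and shared between calls, which is harmless only as long as the function never mutates it.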
class medcat2.storage.serialisers.AvailableSerialisers

Bases: enum.Enum

Describes the available serialisers.

dill
json
write_to(file_path)
Parameters:

file_path (str)

Return type:

None

classmethod from_file(file_path)
Parameters:

file_path (str)

Return type:

AvailableSerialisers
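A sketch of what a `write_to` / `from_file` pair on an enum like this can look like. `SerialiserKind` is hypothetical: it reuses the documented member names `dill` and `json`, but its values, and the choice to store the member name as plain text, are assumptions.

```python
import enum
import os
import tempfile


class SerialiserKind(enum.Enum):
    """Hypothetical mirror of AvailableSerialisers; the values are assumptions."""
    dill = 1
    json = 2

    def write_to(self, file_path: str) -> None:
        # Store the member name rather than its value, so the file stays
        # human-readable and independent of the numeric values.
        with open(file_path, "w") as f:
            f.write(self.name)

    @classmethod
    def from_file(cls, file_path: str) -> "SerialiserKind":
        with open(file_path) as f:
            return cls[f.read().strip()]  # lookup by member name


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, ".serialised_by")
    SerialiserKind.json.write_to(path)
    with open(path) as f:
        stored = f.read()
    kind = SerialiserKind.from_file(path)

print(stored, kind)  # json SerialiserKind.json
```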

name()

The name of the Enum member.

value()

The value of the Enum member.

class medcat2.storage.serialisers.DillSerialiser

Bases: Serialiser

The dill based serialiser.

ser_type

The serialiser type.

serialise(raw_parts, target_file)

Serialise the raw attributes / objects.

Parameters:
  • raw_parts (dict[str, Any]) – The raw objects to serialise.

  • target_file (str) – The file name to write to.

Return type:

None

deserialise(target_file)

Deserialise data written to the specified file.

Parameters:

target_file (str) – The file to read from.

Returns:

dict[str, Any] – The deserialised raw attributes / objects.

Return type:

dict[str, Any]

Inherited from Serialiser: RAW_FILE, get_ser_type_file(), save_ser_type_file(), get_manually_serialised_path(), check_ser_type(), serialise_all(), deserialise_manually() and deserialise_all().

medcat2.storage.serialisers._DEF_SER
medcat2.storage.serialisers.get_serialiser(serialiser_type=_DEF_SER)

Get the serialiser based on the type specified.

Parameters:

serialiser_type (Union[str, AvailableSerialisers], optional) – The required type. Defaults to ‘dill’.

Raises:

ValueError – If no serialiser is found.

Returns:

Serialiser – The appropriate serialiser.

Return type:

Serialiser
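A sketch of the lookup `get_serialiser` performs: accept either a string or an enum member, default to dill, and raise `ValueError` when nothing matches. `Kind`, `DillLike`, the registry and `get_serialiser_sketch` are hypothetical; this sketch registers only a dill-like class, so asking for `json` fails here by construction.

```python
import enum
from typing import Union


class Kind(enum.Enum):
    """Hypothetical stand-in for AvailableSerialisers."""
    dill = "dill"
    json = "json"


class DillLike:
    """Hypothetical placeholder for a concrete serialiser class."""
    ser_type = Kind.dill


# Only a dill-like serialiser is registered in this sketch.
_REGISTRY = {Kind.dill: DillLike}


def get_serialiser_sketch(serialiser_type: Union[str, Kind] = Kind.dill) -> DillLike:
    if isinstance(serialiser_type, str):
        try:
            serialiser_type = Kind[serialiser_type]  # accept names as strings
        except KeyError:
            raise ValueError(f"Unknown serialiser type: {serialiser_type!r}") from None
    if serialiser_type not in _REGISTRY:
        raise ValueError(f"No serialiser available for {serialiser_type}")
    return _REGISTRY[serialiser_type]()


print(type(get_serialiser_sketch()).__name__)  # DillLike
errors = []
for bad in ("json", "yaml"):
    try:
        get_serialiser_sketch(bad)
    except ValueError as err:
        errors.append(str(err))
print(errors)
# ['No serialiser available for Kind.json', "Unknown serialiser type: 'yaml'"]
```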

medcat2.storage.serialisers.get_serialiser_type_from_folder(folder_path)

Get the serialiser type that was used to serialise data in the folder.

Parameters:

folder_path (str) – The folder in question.

Returns:

AvailableSerialisers – The serialiser type.

Return type:

AvailableSerialisers

medcat2.storage.serialisers.get_serialiser_from_folder(folder_path)

Get the serialiser that was used to serialise the data in the folder.

Parameters:

folder_path (str) – The folder in question.

Returns:

Serialiser – The appropriate serialiser.

Return type:

Serialiser

medcat2.storage.serialisers.serialise(serialiser_type, obj, target_folder, overwrite=False)

Serialise an object based on the specified serialiser type.

Parameters:
  • serialiser_type (Union[str, AvailableSerialisers]) – The serialiser type.

  • obj (Serialisable) – The object to serialise.

  • target_folder (str) – The folder to serialise into.

  • overwrite (bool) – Whether to allow overwriting. Defaults to False.

Return type:

None

medcat2.storage.serialisers.deserialise(folder_path, ignore_folders_prefix=set(), ignore_folders_suffix=set(), **init_kwargs)

Deserialise the contents of a folder.

Extra init keyword arguments can be provided if needed.

This function determines which serialiser to use from the files on disk.

Parameters:
  • folder_path (str) – The folder to deserialise.

  • ignore_folders_prefix (set[str]) – The prefixes of folders to ignore.

  • ignore_folders_suffix (set[str]) – The suffixes of folders to ignore.

Returns:

Serialisable – The deserialised object.

Return type:

medcat2.storage.serialisables.Serialisable
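The split between `serialise(serialiser_type, ...)`, where the caller names the serialiser, and `deserialise(folder_path, ...)`, where it is discovered from disk, can be sketched with two interchangeable backends. Everything here except the `SER_TYPE_FILE` and `RAW_FILE` names is hypothetical: the `pickle` backend stands in for dill, the `json` backend is not medcat2's JSON format, and the helper names are illustrative.

```python
import json
import os
import pickle
import tempfile

SER_TYPE_FILE = ".serialised_by"  # documented module constant
RAW_FILE = "raw_dict.dat"         # documented Serialiser.RAW_FILE


def _dump_pickle(data, path):
    with open(path, "wb") as f:
        pickle.dump(data, f)


def _load_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def _dump_json(data, path):
    with open(path, "w") as f:
        json.dump(data, f)


def _load_json(path):
    with open(path) as f:
        return json.load(f)


# Hypothetical backends: name -> (dump, load).
BACKENDS = {"pickle": (_dump_pickle, _load_pickle), "json": (_dump_json, _load_json)}


def serialise_sketch(ser_name, data, folder):
    dump, _ = BACKENDS[ser_name]
    os.makedirs(folder, exist_ok=True)
    # Record which backend wrote the folder, next to the data itself.
    with open(os.path.join(folder, SER_TYPE_FILE), "w") as f:
        f.write(ser_name)
    dump(data, os.path.join(folder, RAW_FILE))


def deserialise_sketch(folder):
    # The caller never names a serialiser: it is read back from disk.
    with open(os.path.join(folder, SER_TYPE_FILE)) as f:
        ser_name = f.read().strip()
    _, load = BACKENDS[ser_name]
    return load(os.path.join(folder, RAW_FILE))


with tempfile.TemporaryDirectory() as tmp:
    results = {}
    for name in BACKENDS:
        folder = os.path.join(tmp, name)
        serialise_sketch(name, {"version": "1.0"}, folder)
        results[name] = deserialise_sketch(folder)

print(results)  # {'pickle': {'version': '1.0'}, 'json': {'version': '1.0'}}
```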