medcat.model_creation.preprocess_snomed

Attributes

_IGNORE_TAG

SNOMED_FOLDER_NAME_PATTERN

PER_FILE_TYPE_PATHS

Exceptions

UnkownSnomedReleaseException

Inappropriate argument value (of correct type).

Classes

RefSetFileType

Generic enumeration.

FileFormatDescriptor

ExtensionDescription

SupportedExtension

Generic enumeration.

BundleDescriptor

SupportedBundles

Generic enumeration.

Snomed

Pre-process SNOMED CT release files.

Functions

parse_file(filename[, first_row_header, columns])

get_all_children(sctid, pt2ch)

Retrieves all the children of a given SNOMED CT ID (SCTID) from a given parent-to-child mapping (pt2ch).

get_direct_refset_mapping(in_dict)

This method uses the output from Snomed.map_snomed2icd10 or Snomed.map_snomed2opcs4 and maps each SNOMED CUI to the prioritised list of target ontology CUIs.

match_partials_with_folders(exp_names, folder_names[, ...])

Module Contents

medcat.model_creation.preprocess_snomed.parse_file(filename, first_row_header=True, columns=None)
medcat.model_creation.preprocess_snomed.get_all_children(sctid, pt2ch)

Retrieves all the children of a given SNOMED CT ID (SCTID) from a given parent-to-child mapping (pt2ch) via the "IS A" relationship. The pt2ch mapping is stored in the additional info of a MedCAT model and can be accessed with cat.cdb.addl_info['pt2ch'].

Parameters:
  • sctid (int) – The SCTID whose children need to be retrieved.

  • pt2ch (dict) – A dictionary containing the parent-to-child relationships in the form {parent_sctid: [list of child sctids]}.

Returns:

list – A list of unique SCTIDs that are children of the given SCTID.
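The traversal described above can be illustrated with a small standalone sketch. It does not reproduce the library's implementation: collect_descendants is a hypothetical helper, the identifiers in toy_pt2ch are made up, and the choice to exclude the starting SCTID and to sort the result is an assumption made only to keep the example deterministic.

```python
from collections import deque


def collect_descendants(sctid, pt2ch):
    """Hypothetical helper: every SCTID reachable from `sctid` through `pt2ch`."""
    seen = set()
    queue = deque(pt2ch.get(sctid, []))
    while queue:
        child = queue.popleft()
        if child in seen:
            continue  # a concept may be reachable through several parents
        seen.add(child)
        queue.extend(pt2ch.get(child, []))
    # Sorting is an assumption of this sketch, used only for a stable order.
    return sorted(seen)


# Toy hierarchy with made-up identifiers: 1 -> {2, 3}, 2 -> {4}, 3 -> {4}
toy_pt2ch = {1: [2, 3], 2: [4], 3: [4]}
descendants = collect_descendants(1, toy_pt2ch)
```

With a real model, the same kind of dictionary would come from cat.cdb.addl_info['pt2ch'] and would be passed to get_all_children directly.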

medcat.model_creation.preprocess_snomed.get_direct_refset_mapping(in_dict)

This method uses the output from Snomed.map_snomed2icd10 or Snomed.map_snomed2opcs4 and removes the metadata and maps each SNOMED CUI to the prioritised list of the target ontology CUIs.

The input dict is expected to be in the following format:

  • Keys are SNOMED CT CUIs.

  • Values are lists of dictionaries. Each list item has (at least) a key 'code' that specifies the target ontology CUI and a key 'mapPriority' that specifies the priority.

Parameters:

in_dict (dict) – The input dict.

Returns:

dict – The map from each SNOMED CUI to the prioritised list of target ontology CUIs.

Return type:

dict
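The input and output shapes described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: to_direct_mapping is a hypothetical name, the CUIs are placeholders, and the sketch assumes that 'mapPriority' can be cast to an integer and that a lower number means a higher priority.

```python
def to_direct_mapping(in_dict):
    """Hypothetical helper: drop metadata, keep target codes in priority order."""
    direct = {}
    for cui, entries in in_dict.items():
        # Assumption: 'mapPriority' is int-castable and 1 is the top priority.
        ordered = sorted(entries, key=lambda entry: int(entry["mapPriority"]))
        direct[cui] = [entry["code"] for entry in ordered]
    return direct


# Placeholder CUIs and codes, in the documented input format.
example = {
    "S-1": [
        {"code": "T-B", "mapPriority": "2", "mapRule": "TRUE"},
        {"code": "T-A", "mapPriority": "1", "mapRule": "TRUE"},
    ],
    "S-2": [{"code": "T-C", "mapPriority": "1"}],
}
direct = to_direct_mapping(example)
```

Extra keys such as 'mapRule' are ignored, which mirrors the documented removal of metadata.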

medcat.model_creation.preprocess_snomed._IGNORE_TAG = '##IGNORE-THIS##'
class medcat.model_creation.preprocess_snomed.RefSetFileType

Bases: enum.Enum

Generic enumeration.

Derive from this class to define new enumerations.

concept
description
relationship
refset
__new__(value)
_generate_next_value_(start, count, last_values)

Generate the next value when not given.

  • name – the name of the member

  • start – the initial start value or None

  • count – the number of existing members

  • last_value – the last value assigned or None

classmethod _missing_(value)
__repr__()
__str__()
__dir__()

Returns all members and all public methods

__format__(format_spec)

Returns format using actual value type unless __str__ has been overridden.

__hash__()
__reduce_ex__(proto)
name()

The name of the Enum member.

value()

The value of the Enum member.

class medcat.model_creation.preprocess_snomed.FileFormatDescriptor
concept: str
description: str
relationship: str
refset: str
common_prefix: str = 'sct2_'
classmethod ignore_all()
Return type:

FileFormatDescriptor

get_file_per_type(file_type)
Parameters:

file_type (RefSetFileType)

Return type:

str

_get_raw(file_type)
Parameters:

file_type (RefSetFileType)

Return type:

str

get_concept()
Return type:

str

get_description()
Return type:

str

get_relationship()
Return type:

str

get_refset()
Return type:

str

class medcat.model_creation.preprocess_snomed.ExtensionDescription
exp_name_in_folder: str
exp_files: FileFormatDescriptor
exp_2nd_part_in_folder: str | None = None
medcat.model_creation.preprocess_snomed.SNOMED_FOLDER_NAME_PATTERN
medcat.model_creation.preprocess_snomed.PER_FILE_TYPE_PATHS
class medcat.model_creation.preprocess_snomed.SupportedExtension

Bases: enum.Enum

Generic enumeration.

Derive from this class to define new enumerations.

INTERNATIONAL
UK_CLINICAL
UK_CLINICAL_REFSET
UK_EDITION
UK_DRUG
AU
__new__(value)
_generate_next_value_(start, count, last_values)

Generate the next value when not given.

  • name – the name of the member

  • start – the initial start value or None

  • count – the number of existing members

  • last_value – the last value assigned or None

classmethod _missing_(value)
__repr__()
__str__()
__dir__()

Returns all members and all public methods

__format__(format_spec)

Returns format using actual value type unless __str__ has been overridden.

__hash__()
__reduce_ex__(proto)
name()

The name of the Enum member.

value()

The value of the Enum member.

class medcat.model_creation.preprocess_snomed.BundleDescriptor
extensions: List[SupportedExtension]
ignores: Dict[RefSetFileType, List[SupportedExtension]]
has_invalid(ext, file_types)
Parameters:
Return type:

bool

class medcat.model_creation.preprocess_snomed.SupportedBundles

Bases: enum.Enum

Generic enumeration.

Derive from this class to define new enumerations.

UK_CLIN
UK_DRUG_EXT
__new__(value)
_generate_next_value_(start, count, last_values)

Generate the next value when not given.

  • name – the name of the member

  • start – the initial start value or None

  • count – the number of existing members

  • last_value – the last value assigned or None

classmethod _missing_(value)
__repr__()
__str__()
__dir__()

Returns all members and all public methods

__format__(format_spec)

Returns format using actual value type unless __str__ has been overridden.

__hash__()
__reduce_ex__(proto)
name()

The name of the Enum member.

value()

The value of the Enum member.

medcat.model_creation.preprocess_snomed.match_partials_with_folders(exp_names, folder_names, _group_nr1=1, _group_nr2=2)
Parameters:
  • exp_names (List[Tuple[str, Optional[str]]])

  • folder_names (List[str])

  • _group_nr1 (int)

  • _group_nr2 (int)

Return type:

bool

class medcat.model_creation.preprocess_snomed.Snomed(data_path)

Pre-process SNOMED CT release files.

This class is used to create a SNOMED CT concept DataFrame ready for MedCAT CDB creation.

data_path

Path to the unzipped SNOMED CT folder.

Type:

str

release

Release of SNOMED CT folder.

Type:

str

uk_ext

Specifies whether the version is a SNOMED UK extension released after 2021. Defaults to False.

Type:

bool, optional

uk_drug_ext

Specifies whether the version is a SNOMED UK drug extension. Defaults to False.

Type:

bool, optional

au_ext

Specifies whether the version is an AU release. Defaults to False.

Type:

bool, optional

NO_VERSION_DETECTED = 'N/A'
__init__(data_path)
data_path
bundle = None
classmethod _determine_bundle(data_path)
Return type:

Optional[SupportedBundles]

_set_extension(release, extension)
Parameters:
Return type:

None

classmethod _determine_extension(folder_path, _group_nr1=1, _group_nr2=2)
Parameters:
  • folder_path (str)

  • _group_nr1 (int)

  • _group_nr2 (int)

Return type:

SupportedExtension

classmethod _determine_release(folder_path, strict=True, _group_nr=3, _keep_chars=8)
Parameters:
  • folder_path (str)

  • strict (bool)

  • _group_nr (int)

  • _keep_chars (int)

Return type:

str

to_concept_df()

Create a SNOMED CT concept DataFrame.

Creates a SNOMED CT concept DataFrame ready for MedCAT CDB creation. Checks if the version is a UK extension release and sets the correct file names for the concept and description snapshots accordingly. Additionally, handles the divergent release format of the UK Drug Extension (releases after 2021) through the uk_drug_ext variable.

Returns:

pandas.DataFrame – SNOMED CT concept DataFrame.

list_all_relationships()

List all SNOMED CT relationships.

SNOMED CT provides a rich set of inter-relationships between concepts.

Returns:

list – List of all SNOMED CT relationships.

relationship2json(relationshipcode, output_jsonfile)

Convert a single relationship map structure to JSON file.

Parameters:
  • relationshipcode (str) – A single SCTID or unique concept identifier of the relationship type.

  • output_jsonfile (str) – Name of JSON file output.

Returns:

file – JSON file of relationship mapping.

map_snomed2icd10()

This function maps SNOMED CT concepts to ICD-10 codes using the refset mappings provided in the SNOMED CT release package.

Returns:

dict – A dictionary containing the SNOMED CT to ICD-10 mappings including metadata.

map_snomed2opcs4()

This function maps SNOMED CT concepts to OPCS-4 codes using the refset mappings provided in the SNOMED CT release package.

It then calls the internal function _map_snomed2refset() to get the DataFrame containing the OPCS-4 mappings and converts that DataFrame to a dictionary using the internal function _refset_df2dict().

Raises:

AttributeError – If OPCS-4 mappings aren’t available.

Returns:

dict – A dictionary containing the SNOMED CT to OPCS-4 mappings including metadata.

Return type:

dict

_check_path_and_release()

This function checks the path and release of the SNOMED CT data provided.

It looks for the “Snapshot” folder within the data path, and if it’s not found, it looks for any folder containing the name “SnomedCT”. It then stores the path and release in separate lists. If no valid paths are found, it raises a FileNotFoundError.

Returns:

tuple – A tuple containing two lists: the first holds the paths where the data is located and the second holds the releases of the data.

Raises:

FileNotFoundError – If the path to the SNOMED CT directory is incorrect.
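The folder lookup described above can be sketched with the standard library alone. This is not the library's implementation: find_snapshot_roots is a hypothetical helper, the folder names are invented for the demonstration, and release detection is left out because it depends on the folder-name pattern, which is not shown in this reference.

```python
import os
import tempfile


def find_snapshot_roots(data_path):
    """Hypothetical helper: locate folders that hold a SNOMED CT release."""
    # First choice: a "Snapshot" folder directly inside the data path.
    if os.path.isdir(os.path.join(data_path, "Snapshot")):
        return [data_path]
    # Fallback: any sub-folder whose name contains "SnomedCT".
    paths = [
        os.path.join(data_path, name)
        for name in sorted(os.listdir(data_path))
        if "SnomedCT" in name and os.path.isdir(os.path.join(data_path, name))
    ]
    if not paths:
        raise FileNotFoundError(f"No SNOMED CT release found under {data_path}")
    return paths


# Demonstration on a throwaway directory tree with invented folder names.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "SnomedCT_Example_Release", "Snapshot"))
    os.makedirs(os.path.join(tmp, "unrelated"))
    found = [os.path.basename(path) for path in find_snapshot_roots(tmp)]
```

An empty directory makes the helper raise FileNotFoundError, matching the documented failure mode.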

_refset_df2dict(refset_df)

This function takes a SNOMED refset DataFrame as an input and converts it into a dictionary.

The DataFrame should contain the columns 'referencedComponentId', 'mapTarget', 'mapGroup', 'mapPriority', 'mapRule' and 'mapAdvice'.

Parameters:

refset_df (pd.DataFrame) – DataFrame containing the refset data

Returns:

dict – A mapping with SNOMED CT codes as keys and lists of refset metadata dictionaries as values.

Return type:

dict
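The conversion can be sketched without pandas by treating each DataFrame row as a plain dictionary, which is the shape that DataFrame.to_dict("records") produces. The helper name rows_to_refset_dict and the sample values are hypothetical. Storing 'mapTarget' under the key 'code' is an assumption inferred from the input format documented for get_direct_refset_mapping, not something this reference states for _refset_df2dict.

```python
from collections import defaultdict


def rows_to_refset_dict(rows):
    """Hypothetical helper: group refset rows by the SNOMED CT code they map."""
    mapping = defaultdict(list)
    for row in rows:
        mapping[row["referencedComponentId"]].append(
            {
                "code": row["mapTarget"],  # assumed key name, see lead-in
                "mapGroup": row["mapGroup"],
                "mapPriority": row["mapPriority"],
                "mapRule": row["mapRule"],
                "mapAdvice": row["mapAdvice"],
            }
        )
    return dict(mapping)


# Placeholder rows using the documented column names.
sample_rows = [
    {"referencedComponentId": "S-1", "mapTarget": "T-A", "mapGroup": "1",
     "mapPriority": "1", "mapRule": "TRUE", "mapAdvice": "ALWAYS T-A"},
    {"referencedComponentId": "S-1", "mapTarget": "T-B", "mapGroup": "1",
     "mapPriority": "2", "mapRule": "OTHERWISE TRUE", "mapAdvice": "ALWAYS T-B"},
]
refset_dict = rows_to_refset_dict(sample_rows)
```

Each key ends up with one metadata dictionary per refset row, in the order the rows were read.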

_map_snomed2refset()

Maps SNOMED CT concepts to refset mappings provided in the SNOMED CT release package.

This function maps SNOMED CT concepts using the refset mappings in the Snapshot/Refset/Map directory. The refset mappings are either ICD-10 codes in international releases or, if available, OPCS-4 codes in the SNOMED UK extension.

Returns:
  • pd.DataFrame – DataFrame containing SNOMED CT to refset mappings and metadata; or

  • tuple – Tuple of DataFrames containing SNOMED CT to refset mappings and metadata (ICD-10, OPCS-4), if uk_ext is True.

exception medcat.model_creation.preprocess_snomed.UnkownSnomedReleaseException(*args)

Bases: ValueError

Inappropriate argument value (of correct type).

__init__(*args)

Initialize self. See help(type(self)) for accurate signature.

Return type:

None

class __cause__

exception cause

class __context__

exception context

__delattr__()

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__()

Return self==value.

__format__()

Default object formatter.

__ge__()

Return self>=value.

__getattribute__()

Return getattr(self, name).

__gt__()

Return self>value.

__hash__()

Return hash(self).

__le__()

Return self<=value.

__lt__()

Return self<value.

__ne__()

Return self!=value.

__new__()

Create and return a new object. See help(type) for accurate signature.

__reduce__()
__reduce_ex__()

Helper for pickle.

__repr__()

Return repr(self).

__setattr__()

Implement setattr(self, name, value).

__setstate__()
__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__()

Abstract classes can override this to customize issubclass().

This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).

class __suppress_context__
class __traceback__
class args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.