medcat.model_creation.preprocess_snomed
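Attributes
- _IGNORE_TAG
- SNOMED_FOLDER_NAME_PATTERN
- PER_FILE_TYPE_PATHS
Exceptions
- UnkownSnomedReleaseException – Inappropriate argument value (of correct type).
Classes
- RefSetFileType – Generic enumeration.
- FileFormatDescriptor
- ExtensionDescription
- SupportedExtension – Generic enumeration.
- BundleDescriptor
- SupportedBundles – Generic enumeration.
- Snomed – Pre-process SNOMED CT release files.
Functions
- parse_file(filename, first_row_header=True, columns=None)
- get_all_children(sctid, pt2ch) – Retrieves all the children of a given SNOMED CT ID (SCTID) from a given parent-to-child mapping.
- get_direct_refset_mapping(in_dict) – Maps each SNOMED CUI to the prioritised list of target ontology CUIs, using the output from Snomed.map_snomed2icd10 or Snomed.map_snomed2opcs4.
- match_partials_with_folders(exp_names, folder_names, _group_nr1=1, _group_nr2=2)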
Module Contents
- medcat.model_creation.preprocess_snomed.parse_file(filename, first_row_header=True, columns=None)
- medcat.model_creation.preprocess_snomed.get_all_children(sctid, pt2ch)
Retrieves all the children of a given SNOMED CT ID (SCTID) from a given parent-to-child mapping (pt2ch) via the “IS A” relationship. The pt2ch mapping is stored in a MedCAT model's additional info and can be accessed with cat.cdb.addl_info['pt2ch'].
- Parameters:
sctid (int) – The SCTID whose children need to be retrieved.
pt2ch (dict) – A dictionary containing the parent-to-child relationships in the form {parent_sctid: [list of child sctids]}.
- Returns:
list – A list of unique SCTIDs that are children of the given SCTID.
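The traversal described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name collect_descendants and the toy identifiers are hypothetical, and the sketch leaves the starting SCTID out of the result, which may differ from what get_all_children returns.

```python
from typing import Dict, List


def collect_descendants(sctid: int, pt2ch: Dict[int, List[int]]) -> List[int]:
    """Hypothetical helper: gather every descendant of `sctid` in `pt2ch`."""
    seen = set()
    stack = list(pt2ch.get(sctid, []))  # start from the direct children
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a concept can be reached through several parents
        seen.add(current)
        stack.extend(pt2ch.get(current, []))
    return sorted(seen)


# Toy hierarchy with made-up identifiers (not real SCTIDs).
toy_pt2ch = {1: [2, 3], 2: [4], 3: [4, 5]}
descendants = collect_descendants(1, toy_pt2ch)
print(descendants)  # [2, 3, 4, 5]
```

The `seen` set is what keeps the result unique when a concept has more than one parent, as concept 4 does here.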
- medcat.model_creation.preprocess_snomed.get_direct_refset_mapping(in_dict)
This method takes the output from Snomed.map_snomed2icd10 or Snomed.map_snomed2opcs4, removes the metadata, and maps each SNOMED CUI to the prioritised list of target ontology CUIs.
The input dict is expected to be in the following format:
- Keys are SNOMED CT CUIs.
- Values are lists of dictionaries, where each list item has (at least):
  - a key 'code' that specifies the target ontology CUI
  - a key 'mapPriority' that specifies the priority
- Parameters:
in_dict (dict) – The input dict.
- Returns:
dict – The map from each SNOMED CUI to the prioritised list of target ontology CUIs.
- Return type:
dict
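To make the expected input and output shapes concrete, here is a small sketch. The function prioritised_codes is a hypothetical stand-in rather than the library code, the identifiers and codes are made up, and the sketch assumes that a lower 'mapPriority' value means a higher priority and that the value can be converted with int().

```python
from typing import Dict, List


def prioritised_codes(in_dict: Dict[str, List[dict]]) -> Dict[str, List[str]]:
    """Hypothetical stand-in: keep only target codes, ordered by priority."""
    out = {}
    for cui, entries in in_dict.items():
        # Assumption: a lower 'mapPriority' means a higher priority, and the
        # value can be converted with int().
        ordered = sorted(entries, key=lambda entry: int(entry["mapPriority"]))
        out[cui] = [entry["code"] for entry in ordered]
    return out


# Made-up identifiers and codes, for illustration only.
example = {
    "1001": [
        {"code": "B20", "mapPriority": "2", "mapGroup": "1"},
        {"code": "A10", "mapPriority": "1", "mapGroup": "1"},
    ],
    "1002": [{"code": "C30", "mapPriority": "1"}],
}
direct = prioritised_codes(example)
print(direct)  # {'1001': ['A10', 'B20'], '1002': ['C30']}
```

Extra keys such as 'mapGroup' are simply dropped, which is the "removes the metadata" part of the description.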
- medcat.model_creation.preprocess_snomed._IGNORE_TAG = '##IGNORE-THIS##'
- class medcat.model_creation.preprocess_snomed.RefSetFileType
Bases:
enum.Enum
Generic enumeration.
Derive from this class to define new enumerations.
- concept
- description
- relationship
- refset
- __new__(value)
- _generate_next_value_(start, count, last_values)
Generate the next value when not given.
name: the name of the member
start: the initial start value or None
count: the number of existing members
last_value: the last value assigned or None
- classmethod _missing_(value)
- __repr__()
- __str__()
- __dir__()
Returns all members and all public methods
- __format__(format_spec)
Returns format using actual value type unless __str__ has been overridden.
- __hash__()
- __reduce_ex__(proto)
- name()
The name of the Enum member.
- value()
The value of the Enum member.
- class medcat.model_creation.preprocess_snomed.FileFormatDescriptor
- concept: str
- description: str
- relationship: str
- refset: str
- common_prefix: str = 'sct2_'
- classmethod ignore_all()
- Return type:
- get_file_per_type(file_type)
- Parameters:
file_type (RefSetFileType)
- Return type:
str
- _get_raw(file_type)
- Parameters:
file_type (RefSetFileType)
- Return type:
str
- get_concept()
- Return type:
str
- get_description()
- Return type:
str
- get_relationship()
- Return type:
str
- get_refset()
- Return type:
str
- class medcat.model_creation.preprocess_snomed.ExtensionDescription
- exp_name_in_folder: str
- exp_files: FileFormatDescriptor
- exp_2nd_part_in_folder: str | None = None
- medcat.model_creation.preprocess_snomed.SNOMED_FOLDER_NAME_PATTERN
- medcat.model_creation.preprocess_snomed.PER_FILE_TYPE_PATHS
- class medcat.model_creation.preprocess_snomed.SupportedExtension
Bases:
enum.Enum
Generic enumeration.
Derive from this class to define new enumerations.
- INTERNATIONAL
- UK_CLINICAL
- UK_CLINICAL_REFSET
- UK_EDITION
- UK_DRUG
- AU
- __new__(value)
- _generate_next_value_(start, count, last_values)
Generate the next value when not given.
name: the name of the member
start: the initial start value or None
count: the number of existing members
last_value: the last value assigned or None
- classmethod _missing_(value)
- __repr__()
- __str__()
- __dir__()
Returns all members and all public methods
- __format__(format_spec)
Returns format using actual value type unless __str__ has been overridden.
- __hash__()
- __reduce_ex__(proto)
- name()
The name of the Enum member.
- value()
The value of the Enum member.
- class medcat.model_creation.preprocess_snomed.BundleDescriptor
- extensions: List[SupportedExtension]
- ignores: Dict[RefSetFileType, List[SupportedExtension]]
- has_invalid(ext, file_types)
- Parameters:
ext (SupportedExtension)
file_types (Tuple[RefSetFileType])
- Return type:
bool
- class medcat.model_creation.preprocess_snomed.SupportedBundles
Bases:
enum.Enum
Generic enumeration.
Derive from this class to define new enumerations.
- UK_CLIN
- UK_DRUG_EXT
- __new__(value)
- _generate_next_value_(start, count, last_values)
Generate the next value when not given.
name: the name of the member
start: the initial start value or None
count: the number of existing members
last_value: the last value assigned or None
- classmethod _missing_(value)
- __repr__()
- __str__()
- __dir__()
Returns all members and all public methods
- __format__(format_spec)
Returns format using actual value type unless __str__ has been overridden.
- __hash__()
- __reduce_ex__(proto)
- name()
The name of the Enum member.
- value()
The value of the Enum member.
- medcat.model_creation.preprocess_snomed.match_partials_with_folders(exp_names, folder_names, _group_nr1=1, _group_nr2=2)
- Parameters:
exp_names (List[Tuple[str, Optional[str]]])
folder_names (List[str])
_group_nr1 (int)
_group_nr2 (int)
- Return type:
bool
- class medcat.model_creation.preprocess_snomed.Snomed(data_path)
Pre-process SNOMED CT release files.
This class is used to create a SNOMED CT concept DataFrame ready for MedCAT CDB creation.
- data_path
Path to the unzipped SNOMED CT folder.
- Type:
str
- release
Release of SNOMED CT folder.
- Type:
str
- uk_ext
Specifies whether the version is a SNOMED UK extension released after 2021. Defaults to False.
- Type:
bool, optional
- uk_drug_ext
Specifies whether the version is a SNOMED UK drug extension. Defaults to False.
- Type:
bool, optional
- au_ext
Specifies whether the version is an AU release. Defaults to False.
- Type:
bool, optional
- NO_VERSION_DETECTED = 'N/A'
- __init__(data_path)
- data_path
- bundle = None
- classmethod _determine_bundle(data_path)
- Return type:
Optional[SupportedBundles]
- _set_extension(release, extension)
- Parameters:
release (str)
extension (SupportedExtension)
- Return type:
None
- classmethod _determine_extension(folder_path, _group_nr1=1, _group_nr2=2)
- Parameters:
folder_path (str)
_group_nr1 (int)
_group_nr2 (int)
- Return type:
- classmethod _determine_release(folder_path, strict=True, _group_nr=3, _keep_chars=8)
- Parameters:
folder_path (str)
strict (bool)
_group_nr (int)
_keep_chars (int)
- Return type:
str
- to_concept_df()
Create a SNOMED CT concept DataFrame.
Creates a SNOMED CT concept DataFrame ready for MedCAT CDB creation. Checks whether the version is a UK extension release and sets the correct file names for the concept and description snapshots accordingly. Additionally, the uk_drug_ext variable is used to handle the divergent release format of UK Drug Extension releases after v2021.
- Returns:
pandas.DataFrame – SNOMED CT concept DataFrame.
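A typical call sequence, using only the constructor and method documented here, might look like the sketch below. The wrapper name build_concept_table is hypothetical, and the import is done lazily inside the function only so that the snippet can be loaded without MedCAT installed.

```python
def build_concept_table(data_path: str):
    """Build the concept DataFrame from an unzipped SNOMED CT release folder."""
    # Imported lazily so this sketch can be loaded without MedCAT installed.
    from medcat.model_creation.preprocess_snomed import Snomed

    snomed = Snomed(data_path)  # data_path: path to the unzipped release
    return snomed.to_concept_df()
```

Calling build_concept_table with the path to an unzipped release folder should return the concept DataFrame described above; the column layout is not listed in this reference, so inspect the result before building a CDB from it.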
- list_all_relationships()
List all SNOMED CT relationships.
SNOMED CT provides a rich set of inter-relationships between concepts.
- Returns:
list – List of all SNOMED CT relationships.
- relationship2json(relationshipcode, output_jsonfile)
Convert a single relationship map structure to JSON file.
- Parameters:
relationshipcode (str) – A single SCTID or unique concept identifier of the relationship type.
output_jsonfile (str) – Name of JSON file output.
- Returns:
file – JSON file of relationship mapping.
- map_snomed2icd10()
This function maps SNOMED CT concepts to ICD-10 codes using the refset mappings provided in the SNOMED CT release package.
- Returns:
dict – A dictionary containing the SNOMED CT to ICD-10 mappings including metadata.
- map_snomed2opcs4()
This function maps SNOMED CT concepts to OPCS-4 codes using the refset mappings provided in the SNOMED CT release package.
- Then it calls the internal function _map_snomed2refset() to get the DataFrame containing the OPCS-4 mappings.
- The function then converts the DataFrame to a dictionary using the internal function _refset_df2dict().
- Raises:
AttributeError – If OPCS-4 mappings aren’t available.
- Returns:
dict – A dictionary containing the SNOMED CT to OPCS-4 mappings including metadata.
- Return type:
dict
- _check_path_and_release()
This function checks the path and release of the SNOMED CT data provided.
It looks for the “Snapshot” folder within the data path, and if it’s not found, it looks for any folder containing the name “SnomedCT”. It then stores the path and release in separate lists. If no valid paths are found, it raises a FileNotFoundError.
- Returns:
tuple – A tuple of two lists: the first holds the paths where the data is located, and the second holds the releases of the data.
- Raises:
FileNotFoundError – If the path to the SNOMED CT directory is incorrect.
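The lookup order described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the method's actual code: find_release_folders is a hypothetical helper, it searches only one level below the data path, it returns paths only (release detection is omitted), and the folder name in the demonstration is made up.

```python
import os
import tempfile
from typing import List


def find_release_folders(data_path: str) -> List[str]:
    """Hypothetical helper mirroring the lookup order described above."""
    if not os.path.isdir(data_path):
        raise FileNotFoundError(f"Not a directory: {data_path}")
    entries = os.listdir(data_path)
    if "Snapshot" in entries:
        # The given path is itself a release folder.
        return [data_path]
    # Otherwise look one level down for folders named like a release.
    paths = sorted(
        os.path.join(data_path, name)
        for name in entries
        if "SnomedCT" in name and os.path.isdir(os.path.join(data_path, name))
    )
    if not paths:
        raise FileNotFoundError(f"No SNOMED CT release found under {data_path}")
    return paths


# Demonstration on a throwaway directory tree with a made-up folder name.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "SnomedCT_Example_20230101", "Snapshot"))
    found = find_release_folders(root)
    found_names = [os.path.basename(p) for p in found]
print(found_names)  # ['SnomedCT_Example_20230101']
```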
- _refset_df2dict(refset_df)
This function takes a SNOMED refset DataFrame as an input and converts it into a dictionary.
The DataFrame should contain the columns: 'referencedComponentId', 'mapTarget', 'mapGroup', 'mapPriority', 'mapRule', 'mapAdvice'.
- Parameters:
refset_df (pd.DataFrame) – DataFrame containing the refset data
- Returns:
dict – A mapping with SNOMED CT codes as keys and lists of refset metadata dictionaries as values.
- Return type:
dict
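The grouping step can be illustrated without pandas by treating each DataFrame row as a plain dictionary. The helper refset_rows_to_dict is hypothetical and the rows are made up. Storing the 'mapTarget' value under the key 'code' is an assumption based on the input format that get_direct_refset_mapping expects.

```python
from collections import defaultdict
from typing import Dict, List


def refset_rows_to_dict(rows: List[dict]) -> Dict[str, List[dict]]:
    """Hypothetical helper: group refset rows by the SNOMED CT code they map."""
    mapping = defaultdict(list)
    for row in rows:
        mapping[row["referencedComponentId"]].append(
            {
                # Assumption: the target code is exposed under the key 'code'.
                "code": row["mapTarget"],
                "mapGroup": row["mapGroup"],
                "mapPriority": row["mapPriority"],
                "mapRule": row["mapRule"],
                "mapAdvice": row["mapAdvice"],
            }
        )
    return dict(mapping)


# Made-up rows standing in for DataFrame records.
rows = [
    {"referencedComponentId": "1001", "mapTarget": "A10", "mapGroup": "1",
     "mapPriority": "1", "mapRule": "TRUE", "mapAdvice": "ALWAYS A10"},
    {"referencedComponentId": "1001", "mapTarget": "B20", "mapGroup": "1",
     "mapPriority": "2", "mapRule": "OTHERWISE TRUE", "mapAdvice": "ALWAYS B20"},
]
refset_dict = refset_rows_to_dict(rows)
print([entry["code"] for entry in refset_dict["1001"]])  # ['A10', 'B20']
```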
- _map_snomed2refset()
Maps SNOMED CT concepts to refset mappings provided in the SNOMED CT release package.
This function maps SNOMED CT concepts using the refset mappings in the Snapshot/Refset/Map directory. The refset mappings can be either ICD-10 codes in international releases or OPCS-4 codes in the SNOMED UK extension, if available.
- Returns:
pd.DataFrame – Dataframe containing SNOMED CT to refset mappings and metadata.
OR
tuple – Tuple of dataframes containing SNOMED CT to refset mappings and metadata (ICD-10, OPCS4), if uk_ext is True.
- exception medcat.model_creation.preprocess_snomed.UnkownSnomedReleaseException(*args)
Bases:
ValueError
Inappropriate argument value (of correct type).
- __init__(*args)
Initialize self. See help(type(self)) for accurate signature.
- Return type:
None
- class __cause__
exception cause
- class __context__
exception context
- __delattr__()
Implement delattr(self, name).
- __dir__()
Default dir() implementation.
- __eq__()
Return self==value.
- __format__()
Default object formatter.
- __ge__()
Return self>=value.
- __getattribute__()
Return getattr(self, name).
- __gt__()
Return self>value.
- __hash__()
Return hash(self).
- __le__()
Return self<=value.
- __lt__()
Return self<value.
- __ne__()
Return self!=value.
- __new__()
Create and return a new object. See help(type) for accurate signature.
- __reduce__()
- __reduce_ex__()
Helper for pickle.
- __repr__()
Return repr(self).
- __setattr__()
Implement setattr(self, name, value).
- __setstate__()
- __sizeof__()
Size of object in memory, in bytes.
- __str__()
Return str(self).
- __subclasshook__()
Abstract classes can override this to customize issubclass().
This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).
- class __suppress_context__
- class __traceback__
- class args
- with_traceback()
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.