triplets.rdf_parser module

triplets.rdf_parser.clean_ID(ID)[source]

Remove common CIM ID prefixes from a string.

Parameters:

ID (str) – The input ID string to clean.

Returns:

The ID with prefixes (‘urn:uuid:’, ‘#_’, ‘_’) removed from the start.

Return type:

str

Notes

  • Sequentially removes ‘urn:uuid:’, ‘#_’, and ‘_’ prefixes using removeprefix.

  • TODO: Verify if these characters are absent in UUIDs to ensure safe removal.

Examples

>>> clean_ID("urn:uuid:1234")
'1234'
>>> clean_ID("#_abc")
'abc'
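
The sequential removal described in the notes can be sketched as follows. This is a minimal illustration, assuming Python 3.9+ for `str.removeprefix`; the name `clean_id_sketch` is hypothetical and the library's actual implementation may differ.

```python
def clean_id_sketch(raw_id: str) -> str:
    """Strip the common CIM ID prefixes one after another (hypothetical helper)."""
    for prefix in ("urn:uuid:", "#_", "_"):
        # removeprefix returns the string unchanged when the prefix is absent
        raw_id = raw_id.removeprefix(prefix)
    return raw_id


print(clean_id_sketch("urn:uuid:1234"))   # 1234
print(clean_id_sketch("#_abc"))           # abc
print(clean_id_sketch("urn:uuid:_abc"))   # abc (two prefixes removed in sequence)
```

Because the prefixes are removed in sequence, an ID that legitimately starts with an underscore after ‘urn:uuid:’ would also lose that underscore, which is the concern raised in the TODO above.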
triplets.rdf_parser.export_to_cimxml(data, rdf_map=None, namespace_map={'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}, class_KEY='Type', export_undefined=True, export_type='xml_per_instance_zip_per_xml', global_zip_filename='Export.zip', debug=False, export_to_memory=False, export_base_path='', comment=None, max_workers=None)[source]

Export a triplet dataset to CIM RDF XML or ZIP files.

Parameters:
  • data (pandas.DataFrame) – Triplet dataset containing RDF data.

  • rdf_map (dict, optional) – Dictionary mapping classes and keys to RDF namespaces and attributes.

  • namespace_map (dict, optional) – Dictionary of namespace prefixes and URIs (default includes RDF namespace).

  • class_KEY (str, optional) – Key used to identify object types (default is ‘Type’).

  • export_undefined (bool, optional) – If True, export undefined classes and tags with default settings (default is True).

  • export_type (str, optional) – Export format: ‘xml_per_instance’, ‘xml_per_instance_zip_per_all’, or ‘xml_per_instance_zip_per_xml’ (default is ‘xml_per_instance_zip_per_xml’).

  • global_zip_filename (str, optional) – Filename for the global ZIP archive (default is ‘Export.zip’).

  • debug (bool, optional) – If True, log timing information for debugging (default is False).

  • export_to_memory (bool, optional) – If True, return file objects in memory; otherwise, save to disk (default is False).

  • export_base_path (str, optional) – Directory path to save exported files (default is empty, uses current directory).

  • comment (str, optional) – Comment to include in the XML output (default is None).

  • max_workers (int, optional) – Number of worker processes for parallel processing (default is None).

Returns:

List of file-like objects (if export_to_memory=True) or filenames (if export_to_memory=False).

Return type:

list

Examples

>>> files = data.export_to_cimxml(rdf_map, export_type="xml_per_instance")
triplets.rdf_parser.export_to_excel(data, path=None)[source]

Export triplet data to an Excel file, with each type on a separate sheet.

Parameters:
  • data (pandas.DataFrame) – Triplet dataset containing RDF data.

  • path (str, optional) – Directory path to save the Excel file (default is current working directory).

Notes

  • Uses ‘label’ key to determine the filename for each INSTANCE_ID.

  • Each object type is exported to a separate sheet.

  • TODO: Add support for XlsxWriter properties for better formatting.

Examples

>>> data.export_to_excel("output_dir")
triplets.rdf_parser.export_to_networkx(data)[source]

Convert a triplet dataset to a NetworkX graph.

Parameters:

data (pandas.DataFrame) – Triplet dataset containing RDF data.

Returns:

A NetworkX graph with nodes (IDs with Type attributes) and edges (references).

Return type:

networkx.Graph

Notes

  • TODO: Add all node data and support additional graph export formats.

Examples

>>> graph = data.to_networkx()
triplets.rdf_parser.filter_triplet_by_type(triplet, type)[source]

Filter triplet dataset by objects of a specific type.

Parameters:
  • triplet (pandas.DataFrame) – Triplet dataset containing RDF data.

  • type (str) – Object type to filter by (e.g., ‘ACLineSegment’).

Returns:

Filtered triplet dataset containing only objects of the specified type.

Return type:

pandas.DataFrame

Examples

>>> filtered = filter_triplet_by_type(data, "ACLineSegment")
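
To make the filtering idea concrete, here is a plain-Python sketch that works on a list of (ID, KEY, VALUE) tuples instead of a DataFrame. The helper name and the sample data are hypothetical, and it assumes the filter keeps every triplet of a matching object rather than only its ‘Type’ row.

```python
def filter_by_type_sketch(triplets, type_name, type_key="Type"):
    """Keep every triplet belonging to objects whose type equals type_name."""
    # First pass: collect the IDs of objects of the requested type
    wanted_ids = {obj_id for obj_id, key, value in triplets
                  if key == type_key and value == type_name}
    # Second pass: keep all triplets (not only the Type rows) for those IDs
    return [row for row in triplets if row[0] in wanted_ids]


triplets = [
    ("line1", "Type", "ACLineSegment"),
    ("line1", "IdentifiedObject.name", "Line 1"),
    ("tr1", "Type", "PowerTransformer"),
    ("tr1", "IdentifiedObject.name", "Transformer 1"),
]
filtered = filter_by_type_sketch(triplets, "ACLineSegment")
print(filtered)
```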
triplets.rdf_parser.find_all_xml(list_of_paths_to_zip_globalzip_xml, debug=False)[source]

Extract XML files from a list of paths or ZIP archives.

Parameters:
  • list_of_paths_to_zip_globalzip_xml (list) – List of paths to XML files, ZIP archives, or file-like objects.

  • debug (bool, optional) – If True, log file processing details for debugging (default is False).

Returns:

List of file-like objects for XML files found in the input paths or ZIPs.

Return type:

list

Notes

  • Supports XML, RDF, and ZIP files; other file types are logged as unsupported.

  • TODO: Add support for random folders.

Examples

>>> xml_files = find_all_xml(["data.zip", "file.xml"])
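
The ZIP handling can be illustrated with the standard library alone. The sketch below is an assumption about the general approach, not the library's code: `find_xml_in_zip_sketch` is a hypothetical helper that returns in-memory file objects for the XML and RDF members of one archive and reports everything else as unsupported.

```python
import io
import zipfile


def find_xml_in_zip_sketch(zip_file):
    """Return in-memory file objects for the .xml/.rdf members of a ZIP archive."""
    found = []
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xml", ".rdf")):
                member = io.BytesIO(archive.read(name))
                member.name = name  # keep the original filename on the object
                found.append(member)
            else:
                print(f"Unsupported file type: {name}")
    return found


# Build a small archive in memory so the example needs no files on disk
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("model.xml", "<root/>")
    archive.writestr("notes.txt", "not XML")
buffer.seek(0)

xml_files = find_xml_in_zip_sketch(buffer)
print([f.name for f in xml_files])
```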
triplets.rdf_parser.generate_xml(instance_data, rdf_map=None, namespace_map={'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}, class_KEY='Type', export_undefined=True, comment=None, debug=False)[source]

Generate an RDF XML file from a triplet dataset instance.

Parameters:
  • instance_data (pandas.DataFrame) – Triplet dataset for a single instance.

  • rdf_map (dict, optional) – Dictionary mapping classes and keys to RDF namespaces and attributes.

  • namespace_map (dict, optional) – Dictionary of namespace prefixes and URIs (default includes RDF namespace).

  • class_KEY (str, optional) – Key used to identify object types (default is ‘Type’).

  • export_undefined (bool, optional) – If True, export undefined classes and tags with default settings (default is True).

  • comment (str, optional) – Comment to include in the XML output (default is None).

  • debug (bool, optional) – If True, log timing information for debugging (default is False).

Returns:

Dictionary with ‘filename’ (str) and ‘file’ (bytes) containing the XML output.

Return type:

dict

Examples

>>> xml_data = generate_xml(instance_data, rdf_map, namespace_map)
triplets.rdf_parser.get_object_data(data, object_UUID)[source]

Retrieve data for a specific object by its UUID.

Parameters:
  • data (pandas.DataFrame) – Triplet dataset containing RDF data.

  • object_UUID (str) – UUID of the object to retrieve.

Returns:

Series with keys as index and values for the specified object.

Return type:

pandas.Series

Examples

>>> obj_data = data.get_object_data("uuid1")
triplets.rdf_parser.get_qname(namespace, tag=None)[source]

Generate a QName for a given namespace and tag, with caching.

Parameters:
  • namespace (str) – The namespace URI.

  • tag (str, optional) – The tag name (default is None).

Returns:

The qualified name object for the namespace and tag.

Return type:

lxml.etree.QName

Examples

>>> qname = get_qname("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "RDF")
triplets.rdf_parser.key_tableview(data, key_name, string_to_number=True)[source]

Create a table view of all objects with a specified key.

Parameters:
  • data (pandas.DataFrame) – Triplet dataset containing RDF data.

  • key_name (str) – The key to filter objects by (e.g., ‘GeneratingUnit.maxOperatingP’).

  • string_to_number (bool, optional) – If True, convert columns containing numbers to numeric types (default is True).

Returns:

Pivoted DataFrame with IDs as index and keys as columns, or None if no data is found.

Return type:

pandas.DataFrame or None

Examples

>>> table = data.key_tableview("GeneratingUnit.maxOperatingP")
triplets.rdf_parser.load_RDF_objects_from_XML(path_or_fileobject, debug=False)[source]

Parse an XML file and return an iterator of RDF objects together with the instance ID and the namespace map.

Parameters:
  • path_or_fileobject (str or file-like object) – Path to the XML file or a file-like object containing RDF XML data.

  • debug (bool, optional) – If True, log timing information for debugging (default is False).

Returns:

A tuple containing:

  • RDF_objects (iterator) – Iterator over RDF objects in the XML.

  • instance_id (str) – Unique UUID for the loaded instance.

  • namespace_map (dict) – Dictionary of namespace prefixes and URIs.

Return type:

tuple

Examples

>>> rdf_objects, instance_id, ns_map = load_RDF_objects_from_XML("file.xml")
triplets.rdf_parser.load_RDF_to_dataframe(path_or_fileobject, debug=False, data_type='string')[source]

Parse a single RDF XML file into a Pandas DataFrame.

Parameters:
  • path_or_fileobject (str or file-like object) – Path to the XML file or a file-like object containing RDF XML data.

  • debug (bool, optional) – If True, log timing information for debugging (default is False).

  • data_type (str, optional) – Data type for DataFrame columns (default is ‘string’).

Returns:

DataFrame with columns [‘ID’, ‘KEY’, ‘VALUE’, ‘INSTANCE_ID’] representing the triplestore.

Return type:

pandas.DataFrame

Examples

>>> df = load_RDF_to_dataframe("file.xml")
triplets.rdf_parser.load_RDF_to_list(path_or_fileobject, debug=False, keep_ns=False)[source]

Parse a single RDF XML file into a triplestore list.

Parameters:
  • path_or_fileobject (str or file-like object) – Path to the XML file or a file-like object containing RDF XML data.

  • debug (bool, optional) – If True, log timing information for debugging (default is False).

  • keep_ns (bool, optional) – If True, retain namespace information in the output (default is False; currently unused).

Returns:

List of tuples in the format (ID, KEY, VALUE, INSTANCE_ID) representing the triplestore.

Return type:

list

Examples

>>> triples = load_RDF_to_list("file.xml")
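
The (ID, KEY, VALUE, INSTANCE_ID) layout is easier to picture with a small worked example. The sketch below uses `xml.etree.ElementTree` from the standard library and makes several assumptions that the documentation does not confirm: the object type is stored under the key ‘Type’, keys are tag names without namespace, references keep the cleaned target ID, and the instance ID is a freshly generated UUID. All helper names, the sample XML and the `http://example.com/cim#` namespace are hypothetical.

```python
import io
import uuid
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ID_PREFIXES = ("urn:uuid:", "#_", "_")


def strip_prefixes(value):
    """Remove the common CIM ID prefixes from the start of a string."""
    for prefix in ID_PREFIXES:
        value = value.removeprefix(prefix)
    return value


def local_name(tag):
    """Drop the '{namespace}' part that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def rdf_to_triplets_sketch(fileobject):
    """Return a list of (ID, KEY, VALUE, INSTANCE_ID) tuples for one RDF XML file."""
    instance_id = str(uuid.uuid4())  # one generated ID per loaded file
    triplets = []
    for obj in ET.parse(fileobject).getroot():
        raw_id = obj.get(f"{{{RDF_NS}}}about") or obj.get(f"{{{RDF_NS}}}ID") or ""
        object_id = strip_prefixes(raw_id)
        triplets.append((object_id, "Type", local_name(obj.tag), instance_id))
        for child in obj:
            resource = child.get(f"{{{RDF_NS}}}resource")
            # A reference keeps the target ID; a literal keeps the element text
            value = strip_prefixes(resource) if resource is not None else (child.text or "")
            triplets.append((object_id, local_name(child.tag), value, instance_id))
    return triplets


sample = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cim="http://example.com/cim#">
  <cim:ACLineSegment rdf:ID="_line1">
    <cim:IdentifiedObject.name>Line 1</cim:IdentifiedObject.name>
    <cim:ConductingEquipment.BaseVoltage rdf:resource="#_bv1"/>
  </cim:ACLineSegment>
</rdf:RDF>"""

for row in rdf_to_triplets_sketch(io.BytesIO(sample)):
    print(row[:3])
```

Every row from the same file shares one INSTANCE_ID, which is what allows several files to be combined into a single dataset and separated again later.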
triplets.rdf_parser.load_all_to_dataframe(list_of_paths_to_zip_globalzip_xml, debug=False, data_type='string', max_workers=None)[source]

Parse multiple RDF XML files or ZIP archives into a single Pandas DataFrame.

Parameters:
  • list_of_paths_to_zip_globalzip_xml (list or str) – List of paths to XML files, ZIP archives, or a single path.

  • debug (bool, optional) – If True, log timing information for debugging (default is False).

  • data_type (str, optional) – Data type for DataFrame columns (default is ‘string’).

  • max_workers (int, optional) – Number of worker threads for parallel processing (default is None).

Returns:

DataFrame with columns [‘ID’, ‘KEY’, ‘VALUE’, ‘INSTANCE_ID’] containing all parsed data.

Return type:

pandas.DataFrame

Examples

>>> df = load_all_to_dataframe(["data.zip", "file.xml"], max_workers=4)
triplets.rdf_parser.print_duration(text, start_time)[source]

Print the time elapsed between start_time and now.

Parameters:
  • text (str) – Description of the timed operation to include in the log message.

  • start_time (datetime.datetime) – Start time of the operation.

Returns:

A tuple containing:

  • duration (datetime.timedelta) – Time elapsed since start_time.

  • end_time (datetime.datetime) – Current time when the function is called.

Return type:

tuple

Examples

>>> start = datetime.datetime.now()
>>> duration, end = print_duration("Operation completed", start)
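
Returning the end time alongside the duration makes it convenient to chain measurements. The following is a minimal sketch of that pattern under the documented signature; `print_duration_sketch` is a hypothetical stand-in, and the exact message format of the real function is not specified here.

```python
import datetime


def print_duration_sketch(text, start_time):
    """Print the elapsed time and return it together with the current time."""
    end_time = datetime.datetime.now()
    duration = end_time - start_time
    print(f"{text}: {duration}")
    return duration, end_time


start = datetime.datetime.now()
_, checkpoint = print_duration_sketch("Step 1 finished", start)
# Reuse the returned end time as the start of the next measurement
duration, _ = print_duration_sketch("Step 2 finished", checkpoint)
```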
triplets.rdf_parser.print_triplet_diff(old_data, new_data, file_id_object='Distribution', file_id_key='label', exclude_objects=None)[source]

Print a human-readable diff of two triplet datasets.

Parameters:
  • old_data (pandas.DataFrame) – Original triplet dataset.

  • new_data (pandas.DataFrame) – New triplet dataset to compare against.

  • file_id_object (str, optional) – Object type containing file identifiers (default is ‘Distribution’).

  • file_id_key (str, optional) – Key containing file identifiers (default is ‘label’).

  • exclude_objects (list, optional) – List of object types to exclude from the diff (default is None).

Notes

  • Outputs a diff format showing removed, added, and changed objects.

  • Suggested diff viewer for the output: https://diffy.org/

  • TODO: Add name field for better reporting with Type.

Examples

>>> print_triplet_diff(old_data, new_data, exclude_objects=["NamespaceMap"])
triplets.rdf_parser.references(data, ID, levels=1)[source]

Retrieve all references (to and from) a specified object.

Parameters:
  • data (pandas.DataFrame) – Triplet dataset containing RDF data.

  • ID (str) – ID of the object to find references for.

  • levels (int, optional) – Number of reference levels to traverse (default is 1).

Returns:

DataFrame containing triplets of all references to and from the object.

Return type:

pandas.DataFrame

Examples

>>> refs = data.references("99722373_VL_TN1", levels=2)
triplets.rdf_parser.references_all(data)[source]

Find all unique references (links) in the dataset.

Parameters:

data (pandas.DataFrame) – Triplet dataset containing RDF data.

Returns:

DataFrame with columns [‘ID_FROM’, ‘KEY’, ‘ID_TO’] representing all references.

Return type:

pandas.DataFrame

Notes

  • Does not consider INSTANCE_ID in reference matching.

Examples

>>> refs = data.references_all()
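
One way to think about a reference is as a triplet whose VALUE is the ID of another object in the dataset. The sketch below applies that definition to plain (ID, KEY, VALUE) tuples; the definition itself, the helper name and the sample data are assumptions made for illustration.

```python
def references_all_sketch(triplets):
    """Return (ID_FROM, KEY, ID_TO) for every VALUE that is also an object ID."""
    known_ids = {obj_id for obj_id, _key, _value in triplets}
    # A set removes duplicates; sorting makes the output order deterministic
    return sorted({(obj_id, key, value)
                   for obj_id, key, value in triplets
                   if value in known_ids})


triplets = [
    ("term1", "Type", "Terminal"),
    ("term1", "Terminal.ConductingEquipment", "line1"),
    ("line1", "Type", "ACLineSegment"),
    ("line1", "ConductingEquipment.BaseVoltage", "bv1"),
    ("bv1", "Type", "BaseVoltage"),
]
links = references_all_sketch(triplets)
for link in links:
    print(link)
```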
triplets.rdf_parser.references_from(data, reference, levels=1)[source]

Retrieve all objects a specified object points to.

Parameters:
  • data (pandas.DataFrame) – Triplet dataset containing RDF data.

  • reference (str) – ID of the reference object.

  • levels (int, optional) – Number of reference levels to traverse (default is 1).

Returns:

DataFrame containing triplets of objects referenced by the input, with a ‘level’ column.

Return type:

pandas.DataFrame

Notes

  • TODO: Add the key on which the connection was made.

Examples

>>> refs = data.references_from("99722373_VL_TN1", levels=2)
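
The effect of the levels parameter can be pictured as a breadth-first walk along outgoing references. The sketch below is a plain-Python approximation on (ID, KEY, VALUE) tuples; numbering the first hop as level 1 and skipping already visited objects are assumptions, and `references_from_sketch` is a hypothetical name.

```python
def references_from_sketch(triplets, start_id, levels=1):
    """Follow outgoing references from start_id for up to `levels` hops."""
    known_ids = {obj_id for obj_id, _key, _value in triplets}
    result = []
    frontier = {start_id}
    visited = {start_id}
    for level in range(1, levels + 1):
        # IDs that objects in the current frontier point to
        targets = {value for obj_id, _key, value in triplets
                   if obj_id in frontier and value in known_ids} - visited
        if not targets:
            break
        # Collect all triplets of newly reached objects, tagged with the level
        result.extend((obj_id, key, value, level)
                      for obj_id, key, value in triplets if obj_id in targets)
        visited |= targets
        frontier = targets
    return result


triplets = [
    ("term1", "Type", "Terminal"),
    ("term1", "Terminal.ConductingEquipment", "line1"),
    ("line1", "Type", "ACLineSegment"),
    ("line1", "ConductingEquipment.BaseVoltage", "bv1"),
    ("bv1", "Type", "BaseVoltage"),
]
one_hop = references_from_sketch(triplets, "term1", levels=1)
two_hops = references_from_sketch(triplets, "term1", levels=2)
print(len(one_hop), len(two_hops))
```

With levels=1 only the line segment is reached from the terminal; with levels=2 the base voltage referenced by the line segment is included as well.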
triplets.rdf_parser.references_from_simple(data, reference, columns=['Type'])[source]

Create a simplified table view of objects a specified object refers to.

Parameters:
  • data (pandas.DataFrame) – Triplet dataset containing RDF data.

  • reference (str) – ID of the object to find references from.

  • columns (list, optional) – Columns to include in the output table (default is [‘Type’]).

Returns:

Pivoted DataFrame with IDs of referenced objects and specified columns.

Return type:

pandas.DataFrame

Examples

>>> table = data.references_from_simple("99722373_VL_TN1")
triplets.rdf_parser.references_simple(data, reference, columns=None, levels=1)[source]

Create a simplified table view of all references to and from a specified object.

Parameters:
  • data (pandas.DataFrame) – Triplet dataset containing RDF data.

  • reference (str) – ID of the object to find references for.

  • columns (list, optional) – Columns to include in the output table (default is [‘Type’, ‘IdentifiedObject.name’] if available).

  • levels (int, optional) – Number of reference levels to traverse (default is 1).

Returns:

Pivoted DataFrame with IDs, specified columns, and reference levels.

Return type:

pandas.DataFrame

Examples

>>> table = data.references_simple("99722373_VL_TN1", columns=["Type"])
triplets.rdf_parser.references_to(data, reference, levels=1)[source]

Retrieve all objects pointing to a specified reference object.

Parameters:
  • data (pandas.DataFrame) – Triplet dataset containing RDF data.

  • reference (str) – ID of the reference object.

  • levels (int, optional) – Number of reference levels to traverse (default is 1).

Returns:

DataFrame containing triplets of objects pointing to the reference, with a ‘level’ column.

Return type:

pandas.DataFrame

Notes

  • TODO: Add the key on which the connection was made.

Examples

>>> refs = data.references_to("99722373_VL_TN1", levels=2)
triplets.rdf_parser.references_to_simple(data, reference, columns=['Type'])[source]

Create a simplified table view of objects referencing a specified object.

Parameters:
  • data (pandas.DataFrame) – Triplet dataset containing RDF data.

  • reference (str) – ID of the object to find references to.

  • columns (list, optional) – Columns to include in the output table (default is [‘Type’]).

Returns:

Pivoted DataFrame with IDs of referencing objects and specified columns.

Return type:

pandas.DataFrame

Examples

>>> table = data.references_to_simple("99722373_VL_TN1")
triplets.rdf_parser.remove_prefix(original_string, prefix_string)[source]

Remove a specified prefix from a string.

Parameters:
  • original_string (str) – The input string to process.

  • prefix_string (str) – The prefix to remove from the input string.

Returns:

The input string with the prefix removed if present; otherwise, the original string.

Return type:

str

Examples

>>> remove_prefix("urn:uuid:1234", "urn:uuid:")
'1234'
>>> remove_prefix("abc", "xyz")
'abc'
triplets.rdf_parser.remove_triplet_from_triplet(from_triplet, what_triplet, columns=['ID', 'KEY', 'VALUE'])[source]

Remove triplets from one dataset that match another.

Parameters:
  • from_triplet (pandas.DataFrame) – Original triplet dataset.

  • what_triplet (pandas.DataFrame) – Triplet dataset to remove from the original.

  • columns (list, optional) – Columns to match for removal (default is [‘ID’, ‘KEY’, ‘VALUE’]).

Returns:

Dataset with matching triplets removed.

Return type:

pandas.DataFrame

Examples

>>> result = remove_triplet_from_triplet(data, to_remove)
triplets.rdf_parser.set_VALUE_at_KEY(data, key, value)[source]

Set the value for all instances of a specified key.

Parameters:
  • data (pandas.DataFrame) – Triplet dataset containing RDF data.

  • key (str) – The key to update.

  • value (str) – The new value to set for the specified key.

Notes

  • TODO: Add debug logging for key, initial value, and new value.

  • TODO: Store changes in a changes DataFrame.

Examples

>>> data.set_VALUE_at_KEY("label", "new_label")
triplets.rdf_parser.set_VALUE_at_KEY_and_ID(data, key, value, id)[source]

Set the value for a specific key and ID.

Parameters:
  • data (pandas.DataFrame) – Triplet dataset containing RDF data.

  • key (str) – The key to update.

  • value (str) – The new value to set.

  • id (str) – The ID of the object to update.

Examples

>>> data.set_VALUE_at_KEY_and_ID("label", "new_label", "uuid1")
triplets.rdf_parser.tableview_to_triplet(data)[source]

Convert a table view back to a triplet format.

Parameters:

data (pandas.DataFrame) – Pivoted DataFrame (table view) to convert.

Returns:

Triplet DataFrame with columns [‘ID’, ‘KEY’, ‘VALUE’].

Return type:

pandas.DataFrame

Notes

  • TODO: Ensure this is only used on valid table views.

Examples

>>> triplet = tableview_to_triplet(table_view)
triplets.rdf_parser.triplet_diff(old_data, new_data)[source]

Compute the difference between two triplet datasets.

Parameters:
  • old_data (pandas.DataFrame) – Original triplet dataset.

  • new_data (pandas.DataFrame) – New triplet dataset to compare against.

Returns:

DataFrame containing triplets unique to old_data or new_data, with an ‘_merge’ column indicating ‘left_only’ (in old_data) or ‘right_only’ (in new_data).

Return type:

pandas.DataFrame

Examples

>>> diff = triplet_diff(old_data, new_data)
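
The meaning of ‘left_only’ and ‘right_only’ can be shown with set operations on plain tuples. This sketch only mirrors the documented result, not the library's DataFrame-based implementation; the helper name and the sample data are hypothetical.

```python
def triplet_diff_sketch(old_triplets, new_triplets):
    """Return triplets found in only one dataset, labelled by their origin."""
    old_set, new_set = set(old_triplets), set(new_triplets)
    removed = [row + ("left_only",) for row in sorted(old_set - new_set)]
    added = [row + ("right_only",) for row in sorted(new_set - old_set)]
    return removed + added


old_data = [
    ("line1", "Type", "ACLineSegment"),
    ("line1", "IdentifiedObject.name", "Line 1"),
]
new_data = [
    ("line1", "Type", "ACLineSegment"),
    ("line1", "IdentifiedObject.name", "Line 1A"),
]
diff = triplet_diff_sketch(old_data, new_data)
for row in diff:
    print(row)
```

A changed value therefore appears as a pair of rows: the old triplet marked ‘left_only’ and the new one marked ‘right_only’.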
triplets.rdf_parser.type_tableview(data, type_name, string_to_number=True, type_key='Type')[source]

Create a table view of all objects of a specified type.

Parameters:
  • data (pandas.DataFrame) – Triplet dataset containing RDF data.

  • type_name (str) – The type of objects to filter (e.g., ‘ACLineSegment’).

  • string_to_number (bool, optional) – If True, convert columns containing numbers to numeric types (default is True).

  • type_key (str, optional) – Key used to identify object types in the dataset (default is ‘Type’).

Returns:

Pivoted DataFrame with IDs as index and keys as columns, or None if no data is found.

Return type:

pandas.DataFrame or None

Examples

>>> table = data.type_tableview("ACLineSegment")
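
A table view is essentially a pivot of the triplets: one row per ID and one column per KEY. The plain-Python sketch below builds that shape as nested dictionaries; it is an illustration only, the helper name is hypothetical, and the numeric conversion controlled by string_to_number is left out.

```python
def type_tableview_sketch(triplets, type_name, type_key="Type"):
    """Pivot the triplets of one object type into {ID: {KEY: VALUE}}."""
    ids = {obj_id for obj_id, key, value in triplets
           if key == type_key and value == type_name}
    if not ids:
        return None  # mirrors the documented "None if no data is found"
    table = {}
    for obj_id, key, value in triplets:
        if obj_id in ids:
            table.setdefault(obj_id, {})[key] = value
    return table


triplets = [
    ("line1", "Type", "ACLineSegment"),
    ("line1", "Conductor.length", "12.5"),
    ("line2", "Type", "ACLineSegment"),
    ("line2", "Conductor.length", "7.0"),
    ("tr1", "Type", "PowerTransformer"),
]
table = type_tableview_sketch(triplets, "ACLineSegment")
for obj_id, row in table.items():
    print(obj_id, row)
```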
triplets.rdf_parser.types_dict(data)[source]

Return a dictionary of object types and their occurrence counts.

Parameters:

data (pandas.DataFrame) – Triplet dataset containing RDF data.

Returns:

Dictionary with object types as keys and their counts as values.

Return type:

dict

Examples

>>> types = data.types_dict()
>>> print(types)
{'ACLineSegment': 10, 'PowerTransformer': 5, ...}
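
The counting itself amounts to tallying the VALUE of every ‘Type’ triplet. A minimal standard-library sketch, with a hypothetical helper name and sample data:

```python
from collections import Counter


def types_dict_sketch(triplets, type_key="Type"):
    """Count how many objects of each type appear in the triplets."""
    return dict(Counter(value for _obj_id, key, value in triplets
                        if key == type_key))


triplets = [
    ("line1", "Type", "ACLineSegment"),
    ("line1", "IdentifiedObject.name", "Line 1"),
    ("line2", "Type", "ACLineSegment"),
    ("tr1", "Type", "PowerTransformer"),
]
counts = types_dict_sketch(triplets)
print(counts)
```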
triplets.rdf_parser.update_triplet_from_tableview(data, tableview, update=True, add=True, instance_id=None)[source]

Update or add triplets from a table view.

Parameters:
  • data (pandas.DataFrame) – Original triplet dataset to update.

  • tableview (pandas.DataFrame) – Table view containing updates or new data.

  • update (bool, optional) – If True, update existing ID-KEY pairs (default is True).

  • add (bool, optional) – If True, add new ID-KEY pairs (default is True).

  • instance_id (str, optional) – Instance ID to assign to new triplets (default is None).

Returns:

Updated triplet dataset.

Return type:

pandas.DataFrame

Examples

>>> updated_data = data.update_triplet_from_tableview(table_view, instance_id="uuid1")
triplets.rdf_parser.update_triplet_from_triplet(data, update_data, update=True, add=True)[source]

Update or add triplets from another triplet dataset.

Parameters:
  • data (pandas.DataFrame) – Original triplet dataset to update.

  • update_data (pandas.DataFrame) – Triplet dataset containing updates or new data.

  • update (bool, optional) – If True, update existing ID-KEY pairs (default is True).

  • add (bool, optional) – If True, add new ID-KEY pairs (default is True).

Returns:

Updated triplet dataset.

Return type:

pandas.DataFrame

Notes

  • TODO: Add a changes DataFrame to track modifications.

  • TODO: Support updating ID and KEY fields.

Examples

>>> updated_data = data.update_triplet_from_triplet(update_data)
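
How the update and add flags interact can be sketched with a dictionary keyed by the ID-KEY pair. This plain-Python version assumes at most one VALUE per ID-KEY pair, which may not hold for real datasets; the helper name and data are hypothetical and the library's implementation may differ.

```python
def update_triplets_sketch(triplets, updates, update=True, add=True):
    """Apply updates to (ID, KEY, VALUE) triplets, honouring both flags."""
    # Index current values by their (ID, KEY) pair; dicts keep insertion order
    merged = {(obj_id, key): value for obj_id, key, value in triplets}
    for obj_id, key, value in updates:
        exists = (obj_id, key) in merged
        if (exists and update) or (not exists and add):
            merged[(obj_id, key)] = value
    return [(obj_id, key, value) for (obj_id, key), value in merged.items()]


data = [("line1", "IdentifiedObject.name", "Line 1")]
changes = [
    ("line1", "IdentifiedObject.name", "Line 1A"),  # existing ID-KEY pair
    ("line2", "IdentifiedObject.name", "Line 2"),   # new ID-KEY pair
]
print(update_triplets_sketch(data, changes))
print(update_triplets_sketch(data, changes, add=False))
print(update_triplets_sketch(data, changes, update=False))
```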