triplets.rdf_parser module

triplets.rdf_parser.clean_ID(ID)

Remove common CIM ID prefixes from a string.

Parameters: ID (str) – The input ID string to clean.
Returns: The ID with prefixes (‘urn:uuid:’, ‘#_’, ‘_’) removed from the start.
Return type: str

Notes

Sequentially removes ‘urn:uuid:’, ‘#_’, and ‘_’ prefixes using removeprefix.
TODO: Verify if these characters are absent in UUIDs to ensure safe removal.

Examples

>>> clean_ID("urn:uuid:1234")
'1234'
>>> clean_ID("#_abc")
'abc'
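
The sequential removal described in the Notes can be sketched as follows. This is a minimal illustration, not the library's implementation: the name clean_id_sketch is hypothetical, and str.removeprefix requires Python 3.9 or later.

```python
def clean_id_sketch(raw_id: str) -> str:
    """Strip the CIM ID prefixes one after another (hypothetical helper)."""
    for prefix in ("urn:uuid:", "#_", "_"):
        # str.removeprefix returns the string unchanged when the prefix is absent
        raw_id = raw_id.removeprefix(prefix)
    return raw_id


print(clean_id_sketch("urn:uuid:1234"))  # 1234
print(clean_id_sketch("#_abc"))          # abc
print(clean_id_sketch("urn:uuid:_abc"))  # abc (both prefixes are removed in sequence)
```

Because the prefixes are removed in sequence, an ID that carries more than one of them loses all of them, which is the case the TODO above asks to verify.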

triplets.rdf_parser.export_to_cimxml(data, rdf_map=None, namespace_map={'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}, class_KEY='Type', export_undefined=True, export_type='xml_per_instance_zip_per_xml', global_zip_filename='Export.zip', debug=False, export_to_memory=False, export_base_path='', comment=None, max_workers=None)

Export a triplet dataset to CIM RDF XML or ZIP files.

Parameters:

data (pandas.DataFrame) – Triplet dataset containing RDF data.
rdf_map (dict, optional) – Dictionary mapping classes and keys to RDF namespaces and attributes.
namespace_map (dict, optional) – Dictionary of namespace prefixes and URIs (default includes RDF namespace).
class_KEY (str, optional) – Key used to identify object types (default is ‘Type’).
export_undefined (bool, optional) – If True, export undefined classes and tags with default settings (default is True).
export_type (str, optional) – Export format: ‘xml_per_instance’, ‘xml_per_instance_zip_per_all’, or ‘xml_per_instance_zip_per_xml’ (default is ‘xml_per_instance_zip_per_xml’).
global_zip_filename (str, optional) – Filename for the global ZIP archive (default is ‘Export.zip’).
debug (bool, optional) – If True, log timing information for debugging (default is False).
export_to_memory (bool, optional) – If True, return file objects in memory; otherwise, save to disk (default is False).
export_base_path (str, optional) – Directory path to save exported files (default is empty, uses current directory).
comment (str, optional) – Comment to include in the XML output (default is None).
max_workers (int, optional) – Number of worker processes for parallel processing (default is None).

Returns:

List of file-like objects (if export_to_memory=True) or filenames (if export_to_memory=False).

Return type:

list

Examples

>>> files = data.export_to_cimxml(rdf_map, export_type="xml_per_instance")

triplets.rdf_parser.export_to_excel(data, path=None)

Export triplet data to an Excel file, with each type on a separate sheet.

Parameters:

data (pandas.DataFrame) – Triplet dataset containing RDF data.
path (str, optional) – Directory path to save the Excel file (default is current working directory).

Notes

Uses the ‘label’ key to determine the filename for each INSTANCE_ID.
Each object type is exported to a separate sheet.
TODO: Add support for XlsxWriter properties for better formatting.

Examples

>>> data.export_to_excel("output_dir")

triplets.rdf_parser.export_to_networkx(data)

Convert a triplet dataset to a NetworkX graph.

Parameters: data (pandas.DataFrame) – Triplet dataset containing RDF data.
Returns: A NetworkX graph with nodes (IDs with Type attributes) and edges (references).
Return type: networkx.Graph

Notes

TODO: Add all node data and support additional graph export formats.

Examples

>>> graph = data.to_networkx()

triplets.rdf_parser.filter_triplet_by_type(triplet, type)

Filter triplet dataset by objects of a specific type.

Parameters:

triplet (pandas.DataFrame) – Triplet dataset containing RDF data.
type (str) – Object type to filter by (e.g., ‘ACLineSegment’).

Returns:

Filtered triplet dataset containing only objects of the specified type.

Return type:

pandas.DataFrame

Examples

>>> filtered = filter_triplet_by_type(data, "ACLineSegment")
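
To make the filtering concrete, here is a minimal pandas sketch of the idea. It assumes the [‘ID’, ‘KEY’, ‘VALUE’, ‘INSTANCE_ID’] layout described elsewhere in this module; filter_by_type_sketch and the sample rows are hypothetical and do not reproduce the library's code.

```python
import pandas as pd


def filter_by_type_sketch(triplet: pd.DataFrame, type_name: str) -> pd.DataFrame:
    # IDs of objects whose 'Type' row carries the requested type
    is_type_row = (triplet["KEY"] == "Type") & (triplet["VALUE"] == type_name)
    ids = triplet.loc[is_type_row, "ID"]
    # Keep every row that belongs to one of those objects
    return triplet[triplet["ID"].isin(ids)]


# Hypothetical sample data
data = pd.DataFrame(
    [
        ("line1", "Type", "ACLineSegment", "inst1"),
        ("line1", "ACLineSegment.r", "0.5", "inst1"),
        ("tr1", "Type", "PowerTransformer", "inst1"),
    ],
    columns=["ID", "KEY", "VALUE", "INSTANCE_ID"],
)
filtered = filter_by_type_sketch(data, "ACLineSegment")
print(filtered)
```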

triplets.rdf_parser.find_all_xml(list_of_paths_to_zip_globalzip_xml, debug=False)

Extract XML files from a list of paths or ZIP archives.

Parameters:

list_of_paths_to_zip_globalzip_xml (list) – List of paths to XML files, ZIP archives, or file-like objects.
debug (bool, optional) – If True, log file processing details for debugging (default is False).

Returns:

List of file-like objects for XML files found in the input paths or ZIPs.

Return type:

list

Notes

Supports XML, RDF, and ZIP files; other file types are logged as unsupported.
TODO: Add support for arbitrary folders.

Examples

>>> xml_files = find_all_xml(["data.zip", "file.xml"])
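
The ZIP handling can be illustrated with the standard library alone. The sketch below covers only the archive case and is an assumption about the general approach, not the library's code; find_xml_in_zip_sketch is a hypothetical name.

```python
import io
import zipfile


def find_xml_in_zip_sketch(zip_file) -> list:
    """Return in-memory file objects for the .xml/.rdf members of a ZIP archive."""
    found = []
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xml", ".rdf")):
                member = io.BytesIO(archive.read(name))
                member.name = name  # keep the original filename for later reporting
                found.append(member)
    return found


# Build a small archive in memory to exercise the sketch
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("model.xml", "<rdf:RDF/>")
    archive.writestr("notes.txt", "not a model")
buffer.seek(0)

xml_files = find_xml_in_zip_sketch(buffer)
print([f.name for f in xml_files])  # ['model.xml']
```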

triplets.rdf_parser.generate_xml(instance_data, rdf_map=None, namespace_map={'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}, class_KEY='Type', export_undefined=True, comment=None, debug=False)

Generate an RDF XML file from a triplet dataset instance.

Parameters:

instance_data (pandas.DataFrame) – Triplet dataset for a single instance.
rdf_map (dict, optional) – Dictionary mapping classes and keys to RDF namespaces and attributes.
namespace_map (dict, optional) – Dictionary of namespace prefixes and URIs (default includes RDF namespace).
class_KEY (str, optional) – Key used to identify object types (default is ‘Type’).
export_undefined (bool, optional) – If True, export undefined classes and tags with default settings (default is True).
comment (str, optional) – Comment to include in the XML output (default is None).
debug (bool, optional) – If True, log timing information for debugging (default is False).

Returns:

Dictionary with ‘filename’ (str) and ‘file’ (bytes) containing the XML output.

Return type:

dict

Examples

>>> xml_data = generate_xml(instance_data, rdf_map, namespace_map)

triplets.rdf_parser.get_object_data(data, object_UUID)

Retrieve data for a specific object by its UUID.

Parameters:

data (pandas.DataFrame) – Triplet dataset containing RDF data.
object_UUID (str) – UUID of the object to retrieve.

Returns:

Series with keys as index and values for the specified object.

Return type:

pandas.Series

Examples

>>> obj_data = data.get_object_data("uuid1")

triplets.rdf_parser.get_qname(namespace, tag=None)

Generate a QName for a given namespace and tag, with caching.

Parameters:

namespace (str) – The namespace URI.
tag (str, optional) – The tag name (default is None).

Returns:

The qualified name object for the namespace and tag.

Return type:

lxml.etree.QName

Examples

>>> qname = get_qname("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "RDF")
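
The caching idea can be sketched with functools.lru_cache. To stay dependency-free, this illustration uses the standard library's xml.etree.ElementTree.QName instead of lxml.etree.QName; that is a deliberate substitution, and get_qname_sketch is a hypothetical name.

```python
from functools import lru_cache
from typing import Optional
from xml.etree.ElementTree import QName


@lru_cache(maxsize=None)
def get_qname_sketch(namespace: str, tag: Optional[str] = None) -> QName:
    # QName(uri, tag) renders as "{uri}tag"
    return QName(namespace, tag)


RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
first = get_qname_sketch(RDF_NS, "RDF")
second = get_qname_sketch(RDF_NS, "RDF")  # served from the cache

print(first.text)
print(first is second)  # True: the cached object is reused
```

Caching pays off when the same namespace and tag pair is requested for every element of a large export.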

triplets.rdf_parser.key_tableview(data, key_name, string_to_number=True)

Create a table view of all objects with a specified key.

Parameters:

data (pandas.DataFrame) – Triplet dataset containing RDF data.
key_name (str) – The key to filter objects by (e.g., ‘GeneratingUnit.maxOperatingP’).
string_to_number (bool, optional) – If True, convert columns containing numbers to numeric types (default is True).

Returns:

Pivoted DataFrame with IDs as index and keys as columns, or None if no data is found.

Return type:

pandas.DataFrame or None

Examples

>>> table = data.key_tableview("GeneratingUnit.maxOperatingP")

triplets.rdf_parser.load_RDF_objects_from_XML(path_or_fileobject, debug=False)

Parse an XML file and return an iterator of RDF objects with instance ID and namespace map.

Parameters:

path_or_fileobject (str or file-like object) – Path to the XML file or a file-like object containing RDF XML data.
debug (bool, optional) – If True, log timing information for debugging (default is False).

Returns:

A tuple containing:

- RDF_objects (iterator): Iterator over RDF objects in the XML.
- instance_id (str): Unique UUID for the loaded instance.
- namespace_map (dict): Dictionary of namespace prefixes and URIs.

Return type:

tuple

Examples

>>> rdf_objects, instance_id, ns_map = load_RDF_objects_from_XML("file.xml")

triplets.rdf_parser.load_RDF_to_dataframe(path_or_fileobject, debug=False, data_type='string')

Parse a single RDF XML file into a Pandas DataFrame.

Parameters:

path_or_fileobject (str or file-like object) – Path to the XML file or a file-like object containing RDF XML data.
debug (bool, optional) – If True, log timing information for debugging (default is False).
data_type (str, optional) – Data type for DataFrame columns (default is ‘string’).

Returns:

DataFrame with columns [‘ID’, ‘KEY’, ‘VALUE’, ‘INSTANCE_ID’] representing the triplestore.

Return type:

pandas.DataFrame

Examples

>>> df = load_RDF_to_dataframe("file.xml")

triplets.rdf_parser.load_RDF_to_list(path_or_fileobject, debug=False, keep_ns=False)

Parse a single RDF XML file into a triplestore list.

Parameters:

path_or_fileobject (str or file-like object) – Path to the XML file or a file-like object containing RDF XML data.
debug (bool, optional) – If True, log timing information for debugging (default is False).
keep_ns (bool, optional) – If True, retain namespace information in the output (default is False; currently unused).

Returns:

List of tuples in the format (ID, KEY, VALUE, INSTANCE_ID) representing the triplestore.

Return type:

list

Examples

>>> triples = load_RDF_to_list("file.xml")
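
The (ID, KEY, VALUE, INSTANCE_ID) layout can be illustrated with a small standard-library parser. This is a simplified sketch under stated assumptions: it uses xml.etree.ElementTree rather than the library's parser, it does not strip ID prefixes, and the cim namespace URI, the sample XML, and the name rdf_to_list_sketch are hypothetical.

```python
import io
import uuid
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def rdf_to_list_sketch(file_object) -> list:
    instance_id = str(uuid.uuid4())  # one generated ID per parsed file
    triples = []
    for obj in ET.parse(file_object).getroot():
        obj_id = obj.get(RDF + "ID") or obj.get(RDF + "about") or ""
        # The element name (without namespace) becomes the object's Type
        triples.append((obj_id, "Type", obj.tag.split("}")[-1], instance_id))
        for prop in obj:
            # A reference is carried in rdf:resource, a literal in the element text
            value = prop.get(RDF + "resource") or prop.text
            triples.append((obj_id, prop.tag.split("}")[-1], value, instance_id))
    return triples


# Hypothetical sample document
sample = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cim="http://example.org/cim#">
  <cim:ACLineSegment rdf:ID="_line1">
    <cim:IdentifiedObject.name>Line 1</cim:IdentifiedObject.name>
    <cim:Equipment.EquipmentContainer rdf:resource="#_vl1"/>
  </cim:ACLineSegment>
</rdf:RDF>"""

triples = rdf_to_list_sketch(io.BytesIO(sample))
for triple in triples:
    print(triple[:3])
```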

triplets.rdf_parser.load_all_to_dataframe(list_of_paths_to_zip_globalzip_xml, debug=False, data_type='string', max_workers=None)

Parse multiple RDF XML files or ZIP archives into a single Pandas DataFrame.

Parameters:

list_of_paths_to_zip_globalzip_xml (list or str) – List of paths to XML files, ZIP archives, or a single path.
debug (bool, optional) – If True, log timing information for debugging (default is False).
data_type (str, optional) – Data type for DataFrame columns (default is ‘string’).
max_workers (int, optional) – Number of worker threads for parallel processing (default is None).

Returns:

DataFrame with columns [‘ID’, ‘KEY’, ‘VALUE’, ‘INSTANCE_ID’] containing all parsed data.

Return type:

pandas.DataFrame

Examples

>>> df = load_all_to_dataframe(["data.zip", "file.xml"], max_workers=4)

triplets.rdf_parser.print_duration(text, start_time)

Print the duration between start_time and now.

Parameters:

text (str) – Description of the timed operation to include in the log message.
start_time (datetime.datetime) – Start time of the operation.

Returns:

A tuple containing:

- duration (timedelta): Time elapsed since start_time.
- end_time (datetime.datetime): Current time when the function is called.

Return type:

tuple

Examples

>>> import datetime
>>> start = datetime.datetime.now()
>>> duration, end = print_duration("Operation completed", start)

triplets.rdf_parser.print_triplet_diff(old_data, new_data, file_id_object='Distribution', file_id_key='label', exclude_objects=None)

Print a human-readable diff of two triplet datasets.

Parameters:

old_data (pandas.DataFrame) – Original triplet dataset.
new_data (pandas.DataFrame) – New triplet dataset to compare against.
file_id_object (str, optional) – Object type containing file identifiers (default is ‘Distribution’).
file_id_key (str, optional) – Key containing file identifiers (default is ‘label’).
exclude_objects (list, optional) – List of object types to exclude from the diff (default is None).

Notes

Outputs a diff format showing removed, added, and changed objects.
The output can be viewed conveniently in a diff viewer such as https://diffy.org/
TODO: Add name field for better reporting with Type.

Examples

>>> print_triplet_diff(old_data, new_data, exclude_objects=["NamespaceMap"])

triplets.rdf_parser.references(data, ID, levels=1)

Retrieve all references (to and from) a specified object.

Parameters:

data (pandas.DataFrame) – Triplet dataset containing RDF data.
ID (str) – ID of the object to find references for.
levels (int, optional) – Number of reference levels to traverse (default is 1).

Returns:

DataFrame containing triplets of all references to and from the object.

Return type:

pandas.DataFrame

Examples

>>> refs = data.references("99722373_VL_TN1", levels=2)

triplets.rdf_parser.references_all(data)

Find all unique references (links) in the dataset.

Parameters: data (pandas.DataFrame) – Triplet dataset containing RDF data.
Returns: DataFrame with columns [‘ID_FROM’, ‘KEY’, ‘ID_TO’] representing all references.
Return type: pandas.DataFrame

Notes

Does not consider INSTANCE_ID in reference matching.

Examples

>>> refs = data.references_all()
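
One way to picture the reference search is a self-join: a row is a reference when its VALUE equals the ID of some object in the dataset. The pandas sketch below shows that idea only; references_all_sketch and the sample rows are hypothetical. Like the function documented above, it ignores INSTANCE_ID.

```python
import pandas as pd


def references_all_sketch(data: pd.DataFrame) -> pd.DataFrame:
    # Every known object ID is a potential reference target
    targets = data[["ID"]].drop_duplicates().rename(columns={"ID": "ID_TO"})
    # A row is a reference when its VALUE matches a known ID
    links = data.merge(targets, left_on="VALUE", right_on="ID_TO")
    links = links.rename(columns={"ID": "ID_FROM"})
    return links[["ID_FROM", "KEY", "ID_TO"]].drop_duplicates().reset_index(drop=True)


# Hypothetical sample data
data = pd.DataFrame(
    [
        ("line1", "Type", "ACLineSegment", "inst1"),
        ("line1", "Equipment.EquipmentContainer", "vl1", "inst1"),
        ("vl1", "Type", "VoltageLevel", "inst1"),
    ],
    columns=["ID", "KEY", "VALUE", "INSTANCE_ID"],
)
refs = references_all_sketch(data)
print(refs)
```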

triplets.rdf_parser.references_from(data, reference, levels=1)

Retrieve all objects a specified object points to.

Parameters:

data (pandas.DataFrame) – Triplet dataset containing RDF data.
reference (str) – ID of the reference object.
levels (int, optional) – Number of reference levels to traverse (default is 1).

Returns:

DataFrame containing triplets of objects referenced by the input, with a ‘level’ column.

Return type:

pandas.DataFrame

Notes

TODO: Add the key on which the connection was made.

Examples

>>> refs = data.references_from("99722373_VL_TN1", levels=2)
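
The levels parameter can be read as the depth of a breadth-first walk along outgoing references. The following is a sketch of that interpretation, which is an assumption; references_from_sketch and the sample rows are hypothetical.

```python
import pandas as pd


def references_from_sketch(data: pd.DataFrame, reference: str, levels: int = 1) -> pd.DataFrame:
    known_ids = set(data["ID"])
    frontier, seen, parts = {reference}, {reference}, []
    for level in range(1, levels + 1):
        rows = data[data["ID"].isin(frontier)]
        # Values that are IDs of other, not yet visited objects are the next targets
        targets = (set(rows["VALUE"]) & known_ids) - seen
        if not targets:
            break
        parts.append(data[data["ID"].isin(targets)].assign(level=level))
        seen |= targets
        frontier = targets
    if not parts:
        return pd.DataFrame(columns=[*data.columns, "level"])
    return pd.concat(parts, ignore_index=True)


# Hypothetical sample data: term1 -> line1 -> vl1
data = pd.DataFrame(
    [
        ("term1", "Type", "Terminal", "inst1"),
        ("term1", "Terminal.ConductingEquipment", "line1", "inst1"),
        ("line1", "Type", "ACLineSegment", "inst1"),
        ("line1", "Equipment.EquipmentContainer", "vl1", "inst1"),
        ("vl1", "Type", "VoltageLevel", "inst1"),
    ],
    columns=["ID", "KEY", "VALUE", "INSTANCE_ID"],
)
one_level = references_from_sketch(data, "term1", levels=1)
two_levels = references_from_sketch(data, "term1", levels=2)
print(two_levels)
```

With levels=1 only the rows of line1 are returned; with levels=2 the rows of vl1 follow at level 2.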

triplets.rdf_parser.references_from_simple(data, reference, columns=['Type'])

Create a simplified table view of objects a specified object refers to.

Parameters:

data (pandas.DataFrame) – Triplet dataset containing RDF data.
reference (str) – ID of the object to find references from.
columns (list, optional) – Columns to include in the output table (default is [‘Type’]).

Returns:

Pivoted DataFrame with IDs of referenced objects and specified columns.

Return type:

pandas.DataFrame

Examples

>>> table = data.references_from_simple("99722373_VL_TN1")

triplets.rdf_parser.references_simple(data, reference, columns=None, levels=1)

Create a simplified table view of all references to and from a specified object.

Parameters:

data (pandas.DataFrame) – Triplet dataset containing RDF data.
reference (str) – ID of the object to find references for.
columns (list, optional) – Columns to include in the output table (default is [‘Type’, ‘IdentifiedObject.name’] if available).
levels (int, optional) – Number of reference levels to traverse (default is 1).

Returns:

Pivoted DataFrame with IDs, specified columns, and reference levels.

Return type:

pandas.DataFrame

Examples

>>> table = data.references_simple("99722373_VL_TN1", columns=["Type"])

triplets.rdf_parser.references_to(data, reference, levels=1)

Retrieve all objects pointing to a specified reference object.

Parameters:

data (pandas.DataFrame) – Triplet dataset containing RDF data.
reference (str) – ID of the reference object.
levels (int, optional) – Number of reference levels to traverse (default is 1).

Returns:

DataFrame containing triplets of objects pointing to the reference, with a ‘level’ column.

Return type:

pandas.DataFrame

Notes

TODO: Add the key on which the connection was made.

Examples

>>> refs = data.references_to("99722373_VL_TN1", levels=2)

triplets.rdf_parser.references_to_simple(data, reference, columns=['Type'])

Create a simplified table view of objects referencing a specified object.

Parameters:

data (pandas.DataFrame) – Triplet dataset containing RDF data.
reference (str) – ID of the object to find references to.
columns (list, optional) – Columns to include in the output table (default is [‘Type’]).

Returns:

Pivoted DataFrame with IDs of referencing objects and specified columns.

Return type:

pandas.DataFrame

Examples

>>> table = data.references_to_simple("99722373_VL_TN1")

triplets.rdf_parser.remove_prefix(original_string, prefix_string)

Remove a specified prefix from a string.

Parameters:

original_string (str) – The input string to process.
prefix_string (str) – The prefix to remove from the input string.

Returns:

The input string with the prefix removed if present; otherwise, the original string.

Return type:

str

Examples

>>> remove_prefix("urn:uuid:1234", "urn:uuid:")
'1234'
>>> remove_prefix("abc", "xyz")
'abc'

triplets.rdf_parser.remove_triplet_from_triplet(from_triplet, what_triplet, columns=['ID', 'KEY', 'VALUE'])

Remove triplets from one dataset that match another.

Parameters:

from_triplet (pandas.DataFrame) – Original triplet dataset.
what_triplet (pandas.DataFrame) – Triplet dataset to remove from the original.
columns (list, optional) – Columns to match for removal (default is [‘ID’, ‘KEY’, ‘VALUE’]).

Returns:

Dataset with matching triplets removed.

Return type:

pandas.DataFrame

Examples

>>> result = remove_triplet_from_triplet(data, to_remove)
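
Removing matching rows is an anti-join. A minimal pandas sketch follows, using merge with indicator=True; it is one possible approach and not necessarily the library's. remove_triplets_sketch and the sample rows are hypothetical.

```python
import pandas as pd


def remove_triplets_sketch(from_triplet, what_triplet, columns=("ID", "KEY", "VALUE")):
    columns = list(columns)
    # Mark each original row as 'both' (also present in what_triplet) or 'left_only'
    marked = from_triplet.merge(
        what_triplet[columns].drop_duplicates(), on=columns, how="left", indicator=True
    )
    return marked[marked["_merge"] == "left_only"].drop(columns="_merge")


# Hypothetical sample data
data = pd.DataFrame(
    [
        ("line1", "Type", "ACLineSegment", "inst1"),
        ("line1", "IdentifiedObject.name", "Line 1", "inst1"),
        ("vl1", "Type", "VoltageLevel", "inst1"),
    ],
    columns=["ID", "KEY", "VALUE", "INSTANCE_ID"],
)
to_remove = pd.DataFrame(
    [("line1", "IdentifiedObject.name", "Line 1")], columns=["ID", "KEY", "VALUE"]
)
result = remove_triplets_sketch(data, to_remove)
print(result)
```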

triplets.rdf_parser.set_VALUE_at_KEY(data, key, value)

Set the value for all instances of a specified key.

Parameters:

data (pandas.DataFrame) – Triplet dataset containing RDF data.
key (str) – The key to update.
value (str) – The new value to set for the specified key.

Notes

TODO: Add debug logging for key, initial value, and new value.
TODO: Store changes in a changes DataFrame.

Examples

>>> data.set_VALUE_at_KEY("label", "new_label")

triplets.rdf_parser.set_VALUE_at_KEY_and_ID(data, key, value, id)

Set the value for a specific key and ID.

Parameters:

data (pandas.DataFrame) – Triplet dataset containing RDF data.
key (str) – The key to update.
value (str) – The new value to set.
id (str) – The ID of the object to update.

Examples

>>> data.set_VALUE_at_KEY_and_ID("label", "new_label", "uuid1")
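
Both setters can be pictured as boolean-mask assignments on the VALUE column. The sketch below assumes the update happens in place, which the absence of a documented return value suggests but does not confirm. The function names and sample rows are hypothetical.

```python
import pandas as pd


def set_value_at_key_sketch(data, key, value):
    # Every row with the given KEY receives the new VALUE
    data.loc[data["KEY"] == key, "VALUE"] = value


def set_value_at_key_and_id_sketch(data, key, value, object_id):
    # Only the row matching both KEY and ID is changed
    data.loc[(data["KEY"] == key) & (data["ID"] == object_id), "VALUE"] = value


# Hypothetical sample data
data = pd.DataFrame(
    [
        ("uuid1", "label", "a.xml", "inst1"),
        ("uuid2", "label", "b.xml", "inst2"),
        ("uuid1", "Type", "Distribution", "inst1"),
    ],
    columns=["ID", "KEY", "VALUE", "INSTANCE_ID"],
)
set_value_at_key_and_id_sketch(data, "label", "new_label", "uuid1")
print(data)
```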

triplets.rdf_parser.tableview_to_triplet(data)

Convert a table view back to a triplet format.

Parameters: data (pandas.DataFrame) – Pivoted DataFrame (table view) to convert.
Returns: Triplet DataFrame with columns [‘ID’, ‘KEY’, ‘VALUE’].
Return type: pandas.DataFrame

Notes

TODO: Ensure this is only used on valid table views.

Examples

>>> triplet = tableview_to_triplet(table_view)
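
Converting a table view back to triplets is an unpivot. The pandas sketch below uses melt and assumes the table's index is named ‘ID’. Dropping empty cells is also an assumption, and tableview_to_triplet_sketch is a hypothetical name.

```python
import pandas as pd


def tableview_to_triplet_sketch(table: pd.DataFrame) -> pd.DataFrame:
    # Turn each (ID, column) cell into one (ID, KEY, VALUE) row
    triplet = table.reset_index().melt(id_vars="ID", var_name="KEY", value_name="VALUE")
    # Cells that were empty in the table view carry no triplet
    return triplet.dropna(subset=["VALUE"]).reset_index(drop=True)


# Hypothetical table view with one empty cell
table_view = pd.DataFrame(
    {"Type": ["ACLineSegment", "VoltageLevel"], "ACLineSegment.r": ["0.5", None]},
    index=pd.Index(["line1", "vl1"], name="ID"),
)
triplet = tableview_to_triplet_sketch(table_view)
print(triplet)
```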

triplets.rdf_parser.triplet_diff(old_data, new_data)

Compute the difference between two triplet datasets.

Parameters:

old_data (pandas.DataFrame) – Original triplet dataset.
new_data (pandas.DataFrame) – New triplet dataset to compare against.

Returns:

DataFrame containing triplets unique to old_data or new_data, with an ‘_merge’ column indicating ‘left_only’ (in old_data) or ‘right_only’ (in new_data).

Return type:

pandas.DataFrame

Examples

>>> diff = triplet_diff(old_data, new_data)
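
The ‘_merge’ column described above is what pandas produces for an outer merge with indicator=True. The sketch below shows that mechanism on the [‘ID’, ‘KEY’, ‘VALUE’] columns. Restricting the comparison to those columns is an assumption, and triplet_diff_sketch is a hypothetical name.

```python
import pandas as pd


def triplet_diff_sketch(old_data: pd.DataFrame, new_data: pd.DataFrame) -> pd.DataFrame:
    cols = ["ID", "KEY", "VALUE"]
    # indicator=True adds '_merge' with 'left_only', 'right_only' or 'both'
    merged = old_data[cols].merge(new_data[cols], on=cols, how="outer", indicator=True)
    return merged[merged["_merge"] != "both"]


# Hypothetical sample data: only the name changes between the two datasets
old_data = pd.DataFrame(
    [("a", "Type", "ACLineSegment"), ("a", "IdentifiedObject.name", "Old")],
    columns=["ID", "KEY", "VALUE"],
)
new_data = pd.DataFrame(
    [("a", "Type", "ACLineSegment"), ("a", "IdentifiedObject.name", "New")],
    columns=["ID", "KEY", "VALUE"],
)
diff = triplet_diff_sketch(old_data, new_data)
print(diff)
```

A changed value therefore appears as two rows: the old triplet marked ‘left_only’ and the new one marked ‘right_only’.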

triplets.rdf_parser.type_tableview(data, type_name, string_to_number=True, type_key='Type')

Create a table view of all objects of a specified type.

Parameters:

data (pandas.DataFrame) – Triplet dataset containing RDF data.
type_name (str) – The type of objects to filter (e.g., ‘ACLineSegment’).
string_to_number (bool, optional) – If True, convert columns containing numbers to numeric types (default is True).
type_key (str, optional) – Key used to identify object types in the dataset (default is ‘Type’).

Returns:

Pivoted DataFrame with IDs as index and keys as columns, or None if no data is found.

Return type:

pandas.DataFrame or None

Examples

>>> table = data.type_tableview("ACLineSegment")
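
A table view is a pivot of the triplets belonging to objects of one type. The following pandas sketch illustrates the idea, including the optional numeric conversion. It is not the library's implementation, and type_tableview_sketch and the sample rows are hypothetical.

```python
import pandas as pd


def type_tableview_sketch(data, type_name, string_to_number=True, type_key="Type"):
    is_type_row = (data["KEY"] == type_key) & (data["VALUE"] == type_name)
    rows = data[data["ID"].isin(data.loc[is_type_row, "ID"])]
    if rows.empty:
        return None
    # One row per object ID, one column per KEY
    table = rows.drop_duplicates(["ID", "KEY"]).pivot(index="ID", columns="KEY", values="VALUE")
    if string_to_number:
        for column in table.columns:
            try:
                table[column] = pd.to_numeric(table[column])
            except (ValueError, TypeError):
                pass  # leave non-numeric columns as they are
    return table


# Hypothetical sample data
data = pd.DataFrame(
    [
        ("line1", "Type", "ACLineSegment", "inst1"),
        ("line1", "ACLineSegment.r", "0.5", "inst1"),
        ("line2", "Type", "ACLineSegment", "inst1"),
        ("line2", "ACLineSegment.r", "1.5", "inst1"),
        ("tr1", "Type", "PowerTransformer", "inst1"),
    ],
    columns=["ID", "KEY", "VALUE", "INSTANCE_ID"],
)
table = type_tableview_sketch(data, "ACLineSegment")
print(table)
```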

triplets.rdf_parser.types_dict(data)

Return a dictionary of object types and their occurrence counts.

Parameters: data (pandas.DataFrame) – Triplet dataset containing RDF data.
Returns: Dictionary with object types as keys and their counts as values.
Return type: dict

Examples

>>> types = data.types_dict()
>>> print(types)
{'ACLineSegment': 10, 'PowerTransformer': 5, ...}
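
Counting types amounts to counting the VALUE entries of all ‘Type’ rows. A minimal pandas sketch follows; types_dict_sketch and the sample rows are hypothetical.

```python
import pandas as pd


def types_dict_sketch(data: pd.DataFrame) -> dict:
    # Each object contributes one 'Type' row, so counting values counts objects
    return data.loc[data["KEY"] == "Type", "VALUE"].value_counts().to_dict()


# Hypothetical sample data
data = pd.DataFrame(
    [
        ("line1", "Type", "ACLineSegment", "inst1"),
        ("line2", "Type", "ACLineSegment", "inst1"),
        ("tr1", "Type", "PowerTransformer", "inst1"),
        ("tr1", "IdentifiedObject.name", "T1", "inst1"),
    ],
    columns=["ID", "KEY", "VALUE", "INSTANCE_ID"],
)
types = types_dict_sketch(data)
print(types)
```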

triplets.rdf_parser.update_triplet_from_tableview(data, tableview, update=True, add=True, instance_id=None)

Update or add triplets from a table view.

Parameters:

data (pandas.DataFrame) – Original triplet dataset to update.
tableview (pandas.DataFrame) – Table view containing updates or new data.
update (bool, optional) – If True, update existing ID-KEY pairs (default is True).
add (bool, optional) – If True, add new ID-KEY pairs (default is True).
instance_id (str, optional) – Instance ID to assign to new triplets (default is None).

Returns:

Updated triplet dataset.

Return type:

pandas.DataFrame

Examples

>>> updated_data = data.update_triplet_from_tableview(table_view, instance_id="uuid1")

triplets.rdf_parser.update_triplet_from_triplet(data, update_data, update=True, add=True)

Update or add triplets from another triplet dataset.

Parameters:

data (pandas.DataFrame) – Original triplet dataset to update.
update_data (pandas.DataFrame) – Triplet dataset containing updates or new data.
update (bool, optional) – If True, update existing ID-KEY pairs (default is True).
add (bool, optional) – If True, add new ID-KEY pairs (default is True).

Returns:

Updated triplet dataset.

Return type:

pandas.DataFrame

Notes

TODO: Add a changes DataFrame to track modifications.
TODO: Support updating ID and KEY fields.

Examples

>>> updated_data = data.update_triplet_from_triplet(update_data)
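
The update and add flags can be illustrated by treating the ID-KEY pair as a key. The sketch below assumes each ID-KEY pair occurs at most once in both datasets, which may not hold for multi-valued keys. update_triplet_sketch and the sample rows are hypothetical.

```python
import pandas as pd


def update_triplet_sketch(data, update_data, update=True, add=True):
    keys = ["ID", "KEY"]
    current = data.set_index(keys)
    incoming = update_data.set_index(keys)
    existing = incoming.index.isin(current.index)
    if update:
        # Overwrite VALUE for ID-KEY pairs already present in the dataset
        current.update(incoming.loc[existing, ["VALUE"]])
    if add:
        # Append ID-KEY pairs that are new to the dataset
        current = pd.concat([current, incoming[~existing]])
    return current.reset_index()


# Hypothetical sample data
data = pd.DataFrame(
    [
        ("line1", "IdentifiedObject.name", "Old", "inst1"),
        ("line1", "Type", "ACLineSegment", "inst1"),
    ],
    columns=["ID", "KEY", "VALUE", "INSTANCE_ID"],
)
update_data = pd.DataFrame(
    [
        ("line1", "IdentifiedObject.name", "New"),
        ("line2", "Type", "ACLineSegment"),
    ],
    columns=["ID", "KEY", "VALUE"],
)
updated = update_triplet_sketch(data, update_data)
print(updated)
```

In this sketch, rows added from update_data have no INSTANCE_ID, because the sample update carries none.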