triplets.rdf_parser module
- triplets.rdf_parser.clean_ID(ID)[source]
Remove common CIM ID prefixes from a string.
- Parameters:
ID (str) – The input ID string to clean.
- Returns:
The ID with prefixes (‘urn:uuid:’, ‘#_’, ‘_’) removed from the start.
- Return type:
str
Notes
Sequentially removes ‘urn:uuid:’, ‘#_’, and ‘_’ prefixes using removeprefix.
TODO: Verify if these characters are absent in UUIDs to ensure safe removal.
Examples
>>> clean_ID("urn:uuid:1234")
'1234'
>>> clean_ID("#_abc")
'abc'
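The sequential removal described in the Notes can be sketched as below. This is a minimal illustration, not the library source: clean_id_sketch is a hypothetical name, and str.removeprefix assumes Python 3.9 or newer.

```python
# Hypothetical stand-in for clean_ID; the real implementation may differ.
PREFIXES = ("urn:uuid:", "#_", "_")


def clean_id_sketch(identifier: str) -> str:
    """Strip each known prefix in order, at most once per prefix."""
    for prefix in PREFIXES:
        # removeprefix returns the string unchanged when the prefix is absent
        identifier = identifier.removeprefix(prefix)
    return identifier


cleaned = [clean_id_sketch(raw) for raw in ("urn:uuid:1234", "#_abc", "_abc", "urn:uuid:_abc")]
```

Because the prefixes are applied one after another, an input such as "urn:uuid:_abc" loses both the ‘urn:uuid:’ and the ‘_’ prefix, which is the situation the TODO above asks to verify.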
- triplets.rdf_parser.export_to_cimxml(data, rdf_map=None, namespace_map={'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}, class_KEY='Type', export_undefined=True, export_type='xml_per_instance_zip_per_xml', global_zip_filename='Export.zip', debug=False, export_to_memory=False, export_base_path='', comment=None, max_workers=None)[source]
Export a triplet dataset to CIM RDF XML or ZIP files.
- Parameters:
data (pandas.DataFrame) – Triplet dataset containing RDF data.
rdf_map (dict, optional) – Dictionary mapping classes and keys to RDF namespaces and attributes.
namespace_map (dict, optional) – Dictionary of namespace prefixes and URIs (default includes RDF namespace).
class_KEY (str, optional) – Key used to identify object types (default is ‘Type’).
export_undefined (bool, optional) – If True, export undefined classes and tags with default settings (default is True).
export_type (str, optional) – Export format: ‘xml_per_instance’, ‘xml_per_instance_zip_per_all’, or ‘xml_per_instance_zip_per_xml’ (default is ‘xml_per_instance_zip_per_xml’).
global_zip_filename (str, optional) – Filename for the global ZIP archive (default is ‘Export.zip’).
debug (bool, optional) – If True, log timing information for debugging (default is False).
export_to_memory (bool, optional) – If True, return file objects in memory; otherwise, save to disk (default is False).
export_base_path (str, optional) – Directory path to save exported files (default is empty, uses current directory).
comment (str, optional) – Comment to include in the XML output (default is None).
max_workers (int, optional) – Number of worker processes for parallel processing (default is None).
- Returns:
List of file-like objects (if export_to_memory=True) or filenames (if export_to_memory=False).
- Return type:
list
Examples
>>> files = data.export_to_cimxml(rdf_map, export_type="xml_per_instance")
- triplets.rdf_parser.export_to_excel(data, path=None)[source]
Export triplet data to an Excel file, with each type on a separate sheet.
- Parameters:
data (pandas.DataFrame) – Triplet dataset containing RDF data.
path (str, optional) – Directory path to save the Excel file (default is current working directory).
Notes
Uses the ‘label’ key to determine the filename for each INSTANCE_ID.
Each object type is exported to a separate sheet.
TODO: Add support for XlsxWriter properties for better formatting.
Examples
>>> data.export_to_excel("output_dir")
- triplets.rdf_parser.export_to_networkx(data)[source]
Convert a triplet dataset to a NetworkX graph.
- Parameters:
data (pandas.DataFrame) – Triplet dataset containing RDF data.
- Returns:
A NetworkX graph with nodes (IDs with Type attributes) and edges (references).
- Return type:
networkx.Graph
Notes
TODO: Add all node data and support additional graph export formats.
Examples
>>> graph = data.to_networkx()
- triplets.rdf_parser.filter_triplet_by_type(triplet, type)[source]
Filter triplet dataset by objects of a specific type.
- Parameters:
triplet (pandas.DataFrame) – Triplet dataset containing RDF data.
type (str) – Object type to filter by (e.g., ‘ACLineSegment’).
- Returns:
Filtered triplet dataset containing only objects of the specified type.
- Return type:
pandas.DataFrame
Examples
>>> filtered = filter_triplet_by_type(data, "ACLineSegment")
- triplets.rdf_parser.find_all_xml(list_of_paths_to_zip_globalzip_xml, debug=False)[source]
Extract XML files from a list of paths or ZIP archives.
- Parameters:
list_of_paths_to_zip_globalzip_xml (list) – List of paths to XML files, ZIP archives, or file-like objects.
debug (bool, optional) – If True, log file processing details for debugging (default is False).
- Returns:
List of file-like objects for XML files found in the input paths or ZIPs.
- Return type:
list
Notes
Supports XML, RDF, and ZIP files; other file types are logged as unsupported.
TODO: Add support for arbitrary folders.
Examples
>>> xml_files = find_all_xml(["data.zip", "file.xml"])
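As an illustration of the ZIP handling described in the Notes, the sketch below pulls XML and RDF members out of an in-memory archive and returns them as file-like objects. It covers only the ZIP case, uses only the standard library, and the names build_sample_zip and xml_members are hypothetical; the actual function may behave differently.

```python
import io
import zipfile


def build_sample_zip() -> io.BytesIO:
    """Create a small in-memory ZIP with one XML file and one unsupported file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("model_EQ.xml", "<rdf:RDF/>")
        archive.writestr("readme.txt", "not a model file")
    buffer.seek(0)
    return buffer


def xml_members(zip_fileobject):
    """Return file-like objects for every .xml or .rdf member of a ZIP."""
    found = []
    with zipfile.ZipFile(zip_fileobject) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xml", ".rdf")):
                member = io.BytesIO(archive.read(name))
                member.name = name  # keep the original filename for later use
                found.append(member)
    return found


files = xml_members(build_sample_zip())
```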
- triplets.rdf_parser.generate_xml(instance_data, rdf_map=None, namespace_map={'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}, class_KEY='Type', export_undefined=True, comment=None, debug=False)[source]
Generate an RDF XML file from a triplet dataset instance.
- Parameters:
instance_data (pandas.DataFrame) – Triplet dataset for a single instance.
rdf_map (dict, optional) – Dictionary mapping classes and keys to RDF namespaces and attributes.
namespace_map (dict, optional) – Dictionary of namespace prefixes and URIs (default includes RDF namespace).
class_KEY (str, optional) – Key used to identify object types (default is ‘Type’).
export_undefined (bool, optional) – If True, export undefined classes and tags with default settings (default is True).
comment (str, optional) – Comment to include in the XML output (default is None).
debug (bool, optional) – If True, log timing information for debugging (default is False).
- Returns:
Dictionary with ‘filename’ (str) and ‘file’ (bytes) containing the XML output.
- Return type:
dict
Examples
>>> xml_data = generate_xml(instance_data, rdf_map, namespace_map)
- triplets.rdf_parser.get_object_data(data, object_UUID)[source]
Retrieve data for a specific object by its UUID.
- Parameters:
data (pandas.DataFrame) – Triplet dataset containing RDF data.
object_UUID (str) – UUID of the object to retrieve.
- Returns:
Series with keys as index and values for the specified object.
- Return type:
pandas.Series
Examples
>>> obj_data = data.get_object_data("uuid1")
- triplets.rdf_parser.get_qname(namespace, tag=None)[source]
Generate a QName for a given namespace and tag, with caching.
- Parameters:
namespace (str) – The namespace URI.
tag (str, optional) – The tag name (default is None).
- Returns:
The qualified name object for the namespace and tag.
- Return type:
lxml.etree.QName
Examples
>>> qname = get_qname("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "RDF")
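The caching idea can be sketched with functools.lru_cache. Note the substitution: this example uses the standard library's xml.etree.ElementTree.QName rather than lxml.etree.QName so that it runs without extra dependencies, and get_qname_sketch is a hypothetical name.

```python
from functools import lru_cache
from xml.etree.ElementTree import QName

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


@lru_cache(maxsize=None)
def get_qname_sketch(namespace, tag=None):
    """Build a QName once per (namespace, tag) pair and reuse it afterwards."""
    return QName(namespace, tag)


first = get_qname_sketch(RDF_NS, "RDF")
second = get_qname_sketch(RDF_NS, "RDF")  # served from the cache
```

Caching pays off when the same few namespace and tag pairs are requested for every element of a large export.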
- triplets.rdf_parser.key_tableview(data, key_name, string_to_number=True)[source]
Create a table view of all objects with a specified key.
- Parameters:
data (pandas.DataFrame) – Triplet dataset containing RDF data.
key_name (str) – The key to filter objects by (e.g., ‘GeneratingUnit.maxOperatingP’).
string_to_number (bool, optional) – If True, convert columns containing numbers to numeric types (default is True).
- Returns:
Pivoted DataFrame with IDs as index and keys as columns, or None if no data is found.
- Return type:
pandas.DataFrame or None
Examples
>>> table = data.key_tableview("GeneratingUnit.maxOperatingP")
- triplets.rdf_parser.load_RDF_objects_from_XML(path_or_fileobject, debug=False)[source]
Parse an XML file and return an iterator of RDF objects with instance ID and namespace map.
- Parameters:
path_or_fileobject (str or file-like object) – Path to the XML file or a file-like object containing RDF XML data.
debug (bool, optional) – If True, log timing information for debugging (default is False).
- Returns:
A tuple containing:
- RDF_objects (iterator): Iterator over RDF objects in the XML.
- instance_id (str): Unique UUID for the loaded instance.
- namespace_map (dict): Dictionary of namespace prefixes and URIs.
- Return type:
tuple
Examples
>>> rdf_objects, instance_id, ns_map = load_RDF_objects_from_XML("file.xml")
- triplets.rdf_parser.load_RDF_to_dataframe(path_or_fileobject, debug=False, data_type='string')[source]
Parse a single RDF XML file into a Pandas DataFrame.
- Parameters:
path_or_fileobject (str or file-like object) – Path to the XML file or a file-like object containing RDF XML data.
debug (bool, optional) – If True, log timing information for debugging (default is False).
data_type (str, optional) – Data type for DataFrame columns (default is ‘string’).
- Returns:
DataFrame with columns [‘ID’, ‘KEY’, ‘VALUE’, ‘INSTANCE_ID’] representing the triplestore.
- Return type:
pandas.DataFrame
Examples
>>> df = load_RDF_to_dataframe("file.xml")
- triplets.rdf_parser.load_RDF_to_list(path_or_fileobject, debug=False, keep_ns=False)[source]
Parse a single RDF XML file into a triplestore list.
- Parameters:
path_or_fileobject (str or file-like object) – Path to the XML file or a file-like object containing RDF XML data.
debug (bool, optional) – If True, log timing information for debugging (default is False).
keep_ns (bool, optional) – If True, retain namespace information in the output (default is False; currently unused).
- Returns:
List of tuples in the format (ID, KEY, VALUE, INSTANCE_ID) representing the triplestore.
- Return type:
list
Examples
>>> triples = load_RDF_to_list("file.xml")
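To make the (ID, KEY, VALUE, INSTANCE_ID) layout concrete, here is a rough sketch that flattens a tiny RDF XML document with the standard library. It is not the library's parser: rdf_to_triples, the sample document, and the http://example.org/cim# namespace are made up for illustration, and emitting a ‘Type’ row per object is an assumption based on the class_KEY default documented above.

```python
import io
import uuid
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Illustrative input only; the cim namespace URI is a placeholder.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cim="http://example.org/cim#">
  <cim:ACLineSegment rdf:ID="_line1">
    <cim:IdentifiedObject.name>Line 1</cim:IdentifiedObject.name>
    <cim:Equipment.EquipmentContainer rdf:resource="#_vl1"/>
  </cim:ACLineSegment>
</rdf:RDF>"""


def local_name(tag):
    """Drop the '{namespace}' part that ElementTree puts in front of tags."""
    return tag.split("}", 1)[-1]


def strip_id(raw):
    """Remove the common CIM ID prefixes (requires Python 3.9+)."""
    for prefix in ("urn:uuid:", "#_", "_"):
        raw = raw.removeprefix(prefix)
    return raw


def rdf_to_triples(fileobject):
    """Flatten each RDF object into (ID, KEY, VALUE, INSTANCE_ID) tuples."""
    instance_id = str(uuid.uuid4())  # one identifier per loaded file
    triples = []
    for obj in ET.parse(fileobject).getroot():
        raw_id = obj.get(f"{{{RDF_NS}}}ID") or obj.get(f"{{{RDF_NS}}}about") or ""
        object_id = strip_id(raw_id)
        triples.append((object_id, "Type", local_name(obj.tag), instance_id))
        for child in obj:
            resource = child.get(f"{{{RDF_NS}}}resource")
            value = strip_id(resource) if resource is not None else (child.text or "")
            triples.append((object_id, local_name(child.tag), value, instance_id))
    return triples


triples = rdf_to_triples(io.BytesIO(SAMPLE))
```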
- triplets.rdf_parser.load_all_to_dataframe(list_of_paths_to_zip_globalzip_xml, debug=False, data_type='string', max_workers=None)[source]
Parse multiple RDF XML files or ZIP archives into a single Pandas DataFrame.
- Parameters:
list_of_paths_to_zip_globalzip_xml (list or str) – List of paths to XML files, ZIP archives, or a single path.
debug (bool, optional) – If True, log timing information for debugging (default is False).
data_type (str, optional) – Data type for DataFrame columns (default is ‘string’).
max_workers (int, optional) – Number of worker threads for parallel processing (default is None).
- Returns:
DataFrame with columns [‘ID’, ‘KEY’, ‘VALUE’, ‘INSTANCE_ID’] containing all parsed data.
- Return type:
pandas.DataFrame
Examples
>>> df = load_all_to_dataframe(["data.zip", "file.xml"], max_workers=4)
- triplets.rdf_parser.print_duration(text, start_time)[source]
Print the time elapsed between start_time and now.
- Parameters:
text (str) – Description of the timed operation to include in the log message.
start_time (datetime.datetime) – Start time of the operation.
- Returns:
A tuple containing:
- duration (timedelta): Time elapsed since start_time.
- end_time (datetime.datetime): Current time when the function is called.
- Return type:
tuple
Examples
>>> import datetime
>>> start = datetime.datetime.now()
>>> duration, end = print_duration("Operation completed", start)
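The returned end time makes it easy to chain timings across consecutive steps. A minimal sketch of that pattern, assuming the function simply reports the elapsed time and returns both values (print_duration_sketch is a hypothetical name, and the real function may log rather than print):

```python
import datetime


def print_duration_sketch(text, start_time):
    """Report the elapsed time and return (duration, end_time)."""
    end_time = datetime.datetime.now()
    duration = end_time - start_time
    print(f"{text}: {duration}")
    return duration, end_time


start = datetime.datetime.now()
# The end time of one step becomes the start time of the next.
load_duration, checkpoint = print_duration_sketch("Loading finished", start)
parse_duration, checkpoint = print_duration_sketch("Parsing finished", checkpoint)
```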
- triplets.rdf_parser.print_triplet_diff(old_data, new_data, file_id_object='Distribution', file_id_key='label', exclude_objects=None)[source]
Print a human-readable diff of two triplet datasets.
- Parameters:
old_data (pandas.DataFrame) – Original triplet dataset.
new_data (pandas.DataFrame) – New triplet dataset to compare against.
file_id_object (str, optional) – Object type containing file identifiers (default is ‘Distribution’).
file_id_key (str, optional) – Key containing file identifiers (default is ‘label’).
exclude_objects (list, optional) – List of object types to exclude from the diff (default is None).
Notes
Outputs a diff format showing removed, added, and changed objects.
The output can be pasted into a diff viewer such as https://diffy.org/.
TODO: Add name field for better reporting with Type.
Examples
>>> print_triplet_diff(old_data, new_data, exclude_objects=["NamespaceMap"])
- triplets.rdf_parser.references(data, ID, levels=1)[source]
Retrieve all references (to and from) a specified object.
- Parameters:
data (pandas.DataFrame) – Triplet dataset containing RDF data.
ID (str) – ID of the object to find references for.
levels (int, optional) – Number of reference levels to traverse (default is 1).
- Returns:
DataFrame containing triplets of all references to and from the object.
- Return type:
pandas.DataFrame
Examples
>>> refs = data.references("99722373_VL_TN1", levels=2)
- triplets.rdf_parser.references_all(data)[source]
Find all unique references (links) in the dataset.
- Parameters:
data (pandas.DataFrame) – Triplet dataset containing RDF data.
- Returns:
DataFrame with columns [‘ID_FROM’, ‘KEY’, ‘ID_TO’] representing all references.
- Return type:
pandas.DataFrame
Notes
Does not consider INSTANCE_ID in reference matching.
Examples
>>> refs = data.references_all()
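One way to picture the result is to treat a triplet as a reference whenever its VALUE equals the ID of some object in the dataset. The sketch below works on plain tuples and rests on that assumption; references_all_sketch and the sample data are hypothetical, and the library's matching rules may differ.

```python
# Sample (ID, KEY, VALUE) triplets; INSTANCE_ID is left out because it is
# not used for matching.
triples = [
    ("line1", "Type", "ACLineSegment"),
    ("line1", "Equipment.EquipmentContainer", "vl1"),
    ("vl1", "Type", "VoltageLevel"),
    ("vl1", "IdentifiedObject.name", "VL 1"),
    ("term1", "Type", "Terminal"),
    ("term1", "Terminal.ConductingEquipment", "line1"),
]


def references_all_sketch(triples):
    """Return unique (ID_FROM, KEY, ID_TO) links where VALUE is a known ID."""
    known_ids = {object_id for object_id, _, _ in triples}
    links = {(i, k, v) for i, k, v in triples if v in known_ids}
    return sorted(links)


links = references_all_sketch(triples)
```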
- triplets.rdf_parser.references_from(data, reference, levels=1)[source]
Retrieve all objects a specified object points to.
- Parameters:
data (pandas.DataFrame) – Triplet dataset containing RDF data.
reference (str) – ID of the reference object.
levels (int, optional) – Number of reference levels to traverse (default is 1).
- Returns:
DataFrame containing triplets of objects referenced by the input, with a ‘level’ column.
- Return type:
pandas.DataFrame
Notes
TODO: Add the key on which the connection was made.
Examples
>>> refs = data.references_from("99722373_VL_TN1", levels=2)
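The levels parameter can be read as the depth of a breadth-first walk along outgoing references. The following sketch shows that idea on plain tuples and returns a mapping of object ID to level instead of a DataFrame; references_from_sketch and the sample data are hypothetical, and the real traversal may differ in detail.

```python
triples = [
    ("term1", "Type", "Terminal"),
    ("term1", "Terminal.ConductingEquipment", "line1"),
    ("line1", "Type", "ACLineSegment"),
    ("line1", "Equipment.EquipmentContainer", "vl1"),
    ("vl1", "Type", "VoltageLevel"),
]


def references_from_sketch(triples, reference, levels=1):
    """Map each object reachable from `reference` to the level it was found at."""
    known_ids = {object_id for object_id, _, _ in triples}
    found = {}
    frontier = {reference}
    for level in range(1, levels + 1):
        next_frontier = set()
        for object_id, _, value in triples:
            is_link = object_id in frontier and value in known_ids
            if is_link and value not in found and value != reference:
                found[value] = level
                next_frontier.add(value)
        frontier = next_frontier  # continue from the newly discovered objects
    return found


one_level = references_from_sketch(triples, "term1", levels=1)
two_levels = references_from_sketch(triples, "term1", levels=2)
```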
- triplets.rdf_parser.references_from_simple(data, reference, columns=['Type'])[source]
Create a simplified table view of objects a specified object refers to.
- Parameters:
data (pandas.DataFrame) – Triplet dataset containing RDF data.
reference (str) – ID of the object to find references from.
columns (list, optional) – Columns to include in the output table (default is [‘Type’]).
- Returns:
Pivoted DataFrame with IDs of referenced objects and specified columns.
- Return type:
pandas.DataFrame
Examples
>>> table = data.references_from_simple("99722373_VL_TN1")
- triplets.rdf_parser.references_simple(data, reference, columns=None, levels=1)[source]
Create a simplified table view of all references to and from a specified object.
- Parameters:
data (pandas.DataFrame) – Triplet dataset containing RDF data.
reference (str) – ID of the object to find references for.
columns (list, optional) – Columns to include in the output table (default is [‘Type’, ‘IdentifiedObject.name’] if available).
levels (int, optional) – Number of reference levels to traverse (default is 1).
- Returns:
Pivoted DataFrame with IDs, specified columns, and reference levels.
- Return type:
pandas.DataFrame
Examples
>>> table = data.references_simple("99722373_VL_TN1", columns=["Type"])
- triplets.rdf_parser.references_to(data, reference, levels=1)[source]
Retrieve all objects pointing to a specified reference object.
- Parameters:
data (pandas.DataFrame) – Triplet dataset containing RDF data.
reference (str) – ID of the reference object.
levels (int, optional) – Number of reference levels to traverse (default is 1).
- Returns:
DataFrame containing triplets of objects pointing to the reference, with a ‘level’ column.
- Return type:
pandas.DataFrame
Notes
TODO: Add the key on which the connection was made.
Examples
>>> refs = data.references_to("99722373_VL_TN1", levels=2)
- triplets.rdf_parser.references_to_simple(data, reference, columns=['Type'])[source]
Create a simplified table view of objects referencing a specified object.
- Parameters:
data (pandas.DataFrame) – Triplet dataset containing RDF data.
reference (str) – ID of the object to find references to.
columns (list, optional) – Columns to include in the output table (default is [‘Type’]).
- Returns:
Pivoted DataFrame with IDs of referencing objects and specified columns.
- Return type:
pandas.DataFrame
Examples
>>> table = data.references_to_simple("99722373_VL_TN1")
- triplets.rdf_parser.remove_prefix(original_string, prefix_string)[source]
Remove a specified prefix from a string.
- Parameters:
original_string (str) – The input string to process.
prefix_string (str) – The prefix to remove from the input string.
- Returns:
The input string with the prefix removed if present; otherwise, the original string.
- Return type:
str
Examples
>>> remove_prefix("urn:uuid:1234", "urn:uuid:")
'1234'
>>> remove_prefix("abc", "xyz")
'abc'
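The documented behaviour matches a simple startswith check followed by slicing. A minimal sketch, with remove_prefix_sketch as a hypothetical stand-in for the library function:

```python
def remove_prefix_sketch(original_string, prefix_string):
    """Return the string without the prefix, or unchanged if it is absent."""
    if prefix_string and original_string.startswith(prefix_string):
        return original_string[len(prefix_string):]
    return original_string


results = [
    remove_prefix_sketch("urn:uuid:1234", "urn:uuid:"),
    remove_prefix_sketch("abc", "xyz"),
]
```

On Python 3.9 or newer, str.removeprefix gives the same result directly.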
- triplets.rdf_parser.remove_triplet_from_triplet(from_triplet, what_triplet, columns=['ID', 'KEY', 'VALUE'])[source]
Remove triplets from one dataset that match another.
- Parameters:
from_triplet (pandas.DataFrame) – Original triplet dataset.
what_triplet (pandas.DataFrame) – Triplet dataset to remove from the original.
columns (list, optional) – Columns to match for removal (default is [‘ID’, ‘KEY’, ‘VALUE’]).
- Returns:
Dataset with matching triplets removed.
- Return type:
pandas.DataFrame
Examples
>>> result = remove_triplet_from_triplet(data, to_remove)
- triplets.rdf_parser.set_VALUE_at_KEY(data, key, value)[source]
Set the value for all instances of a specified key.
- Parameters:
data (pandas.DataFrame) – Triplet dataset containing RDF data.
key (str) – The key to update.
value (str) – The new value to set for the specified key.
Notes
TODO: Add debug logging for key, initial value, and new value.
TODO: Store changes in a changes DataFrame.
Examples
>>> data.set_VALUE_at_KEY("label", "new_label")
- triplets.rdf_parser.set_VALUE_at_KEY_and_ID(data, key, value, id)[source]
Set the value for a specific key and ID.
- Parameters:
data (pandas.DataFrame) – Triplet dataset containing RDF data.
key (str) – The key to update.
value (str) – The new value to set.
id (str) – The ID of the object to update.
Examples
>>> data.set_VALUE_at_KEY_and_ID("label", "new_label", "uuid1")
- triplets.rdf_parser.tableview_to_triplet(data)[source]
Convert a table view back to a triplet format.
- Parameters:
data (pandas.DataFrame) – Pivoted DataFrame (table view) to convert.
- Returns:
Triplet DataFrame with columns [‘ID’, ‘KEY’, ‘VALUE’].
- Return type:
pandas.DataFrame
Notes
TODO: Ensure this is only used on valid table views.
Examples
>>> triplet = tableview_to_triplet(table_view)
- triplets.rdf_parser.triplet_diff(old_data, new_data)[source]
Compute the difference between two triplet datasets.
- Parameters:
old_data (pandas.DataFrame) – Original triplet dataset.
new_data (pandas.DataFrame) – New triplet dataset to compare against.
- Returns:
DataFrame containing triplets unique to old_data or new_data, with an ‘_merge’ column indicating ‘left_only’ (in old_data) or ‘right_only’ (in new_data).
- Return type:
pandas.DataFrame
Examples
>>> diff = triplet_diff(old_data, new_data)
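The ‘_merge’ column described above is what pandas produces for an outer merge with indicator=True. The sketch below reproduces that pattern on two small datasets, assuming rows are compared on ID, KEY, and VALUE; triplet_diff_sketch and the sample data are hypothetical, and the library may match on other columns.

```python
import pandas as pd

COLUMNS = ["ID", "KEY", "VALUE"]

old_data = pd.DataFrame(
    [("line1", "Type", "ACLineSegment"), ("line1", "ACLineSegment.r", "0.5")],
    columns=COLUMNS,
)
new_data = pd.DataFrame(
    [("line1", "Type", "ACLineSegment"), ("line1", "ACLineSegment.r", "0.7")],
    columns=COLUMNS,
)


def triplet_diff_sketch(old_data, new_data):
    """Keep only rows that appear in exactly one of the two datasets."""
    merged = old_data.merge(new_data, on=COLUMNS, how="outer", indicator=True)
    return merged[merged["_merge"] != "both"]


diff = triplet_diff_sketch(old_data, new_data)
```

A changed value therefore shows up as two rows: the old value marked ‘left_only’ and the new value marked ‘right_only’.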
- triplets.rdf_parser.type_tableview(data, type_name, string_to_number=True, type_key='Type')[source]
Create a table view of all objects of a specified type.
- Parameters:
data (pandas.DataFrame) – Triplet dataset containing RDF data.
type_name (str) – The type of objects to filter (e.g., ‘ACLineSegment’).
string_to_number (bool, optional) – If True, convert columns containing numbers to numeric types (default is True).
type_key (str, optional) – Key used to identify object types in the dataset (default is ‘Type’).
- Returns:
Pivoted DataFrame with IDs as index and keys as columns, or None if no data is found.
- Return type:
pandas.DataFrame or None
Examples
>>> table = data.type_tableview("ACLineSegment")
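A table view of this kind can be approximated with a pandas pivot: select the IDs whose type key has the requested value, then spread their keys into columns. This is a sketch under that assumption, with numeric conversion left out; type_tableview_sketch and the sample data are hypothetical.

```python
import pandas as pd

data = pd.DataFrame(
    [
        ("line1", "Type", "ACLineSegment"),
        ("line1", "ACLineSegment.r", "0.5"),
        ("line2", "Type", "ACLineSegment"),
        ("line2", "ACLineSegment.r", "0.7"),
        ("tr1", "Type", "PowerTransformer"),
    ],
    columns=["ID", "KEY", "VALUE"],
)


def type_tableview_sketch(data, type_name, type_key="Type"):
    """Pivot all triplets of objects with the given type into one row per ID."""
    is_type_row = (data["KEY"] == type_key) & (data["VALUE"] == type_name)
    ids = data.loc[is_type_row, "ID"]
    if ids.empty:
        return None  # mirrors the documented "None if no data is found"
    subset = data[data["ID"].isin(ids)]
    return subset.pivot(index="ID", columns="KEY", values="VALUE")


table = type_tableview_sketch(data, "ACLineSegment")
```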
- triplets.rdf_parser.types_dict(data)[source]
Return a dictionary of object types and their occurrence counts.
- Parameters:
data (pandas.DataFrame) – Triplet dataset containing RDF data.
- Returns:
Dictionary with object types as keys and their counts as values.
- Return type:
dict
Examples
>>> types = data.types_dict()
>>> print(types)
{'ACLineSegment': 10, 'PowerTransformer': 5, ...}
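Conceptually, the counts come from tallying the VALUE of every triplet whose KEY is the type key. A small standard-library sketch of that tally, assuming ‘Type’ as the key (types_dict_sketch and the sample data are hypothetical):

```python
from collections import Counter

triples = [
    ("line1", "Type", "ACLineSegment"),
    ("line1", "ACLineSegment.r", "0.5"),
    ("line2", "Type", "ACLineSegment"),
    ("tr1", "Type", "PowerTransformer"),
]


def types_dict_sketch(triples, type_key="Type"):
    """Count how many objects of each type appear in the triplets."""
    return dict(Counter(value for _, key, value in triples if key == type_key))


types = types_dict_sketch(triples)
```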
- triplets.rdf_parser.update_triplet_from_tableview(data, tableview, update=True, add=True, instance_id=None)[source]
Update or add triplets from a table view.
- Parameters:
data (pandas.DataFrame) – Original triplet dataset to update.
tableview (pandas.DataFrame) – Table view containing updates or new data.
update (bool, optional) – If True, update existing ID-KEY pairs (default is True).
add (bool, optional) – If True, add new ID-KEY pairs (default is True).
instance_id (str, optional) – Instance ID to assign to new triplets (default is None).
- Returns:
Updated triplet dataset.
- Return type:
pandas.DataFrame
Examples
>>> updated_data = data.update_triplet_from_tableview(table_view, instance_id="uuid1")
- triplets.rdf_parser.update_triplet_from_triplet(data, update_data, update=True, add=True)[source]
Update or add triplets from another triplet dataset.
- Parameters:
data (pandas.DataFrame) – Original triplet dataset to update.
update_data (pandas.DataFrame) – Triplet dataset containing updates or new data.
update (bool, optional) – If True, update existing ID-KEY pairs (default is True).
add (bool, optional) – If True, add new ID-KEY pairs (default is True).
- Returns:
Updated triplet dataset.
- Return type:
pandas.DataFrame
Notes
TODO: Add a changes DataFrame to track modifications.
TODO: Support updating ID and KEY fields.
Examples
>>> updated_data = data.update_triplet_from_triplet(update_data)
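The interaction of the update and add flags can be sketched on plain tuples: an existing ID-KEY pair is overwritten only when update is True, and an unseen pair is appended only when add is True. This illustration assumes each ID-KEY pair occurs once; update_triplets_sketch and the sample data are hypothetical.

```python
def update_triplets_sketch(triples, updates, update=True, add=True):
    """Apply updates keyed on (ID, KEY), honouring the update and add flags."""
    current = {(object_id, key): value for object_id, key, value in triples}
    for object_id, key, value in updates:
        exists = (object_id, key) in current
        if (exists and update) or (not exists and add):
            current[(object_id, key)] = value
    return [(object_id, key, value) for (object_id, key), value in current.items()]


data = [("a", "name", "old"), ("a", "Type", "X")]
changes = [("a", "name", "new"), ("b", "Type", "Y")]

both = update_triplets_sketch(data, changes)
update_only = update_triplets_sketch(data, changes, add=False)
add_only = update_triplets_sketch(data, changes, update=False)
```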