
adding graph export: csv, tsv and N-Triples #82

Merged: arun1729 merged 1 commit into master from v3.7.4 on Mar 9, 2026


Conversation

@arun1729 (Owner) commented Mar 9, 2026

No description provided.

@gemini-code-assist

Summary of Changes


This pull request adds data export to the CogDB graph library. Users can extract all triples from a graph and write them to a file in N-Triples, CSV, or TSV format, which makes the data easier to load into other tools for analysis or further processing.

Highlights

  • New Export Module: A new module, cog/export.py, has been introduced to centralize graph triple extraction and export logic.
  • Graph Export Functionality: The Graph class in cog/torque.py now includes triples() and export() methods, providing direct access to graph data and export capabilities.
  • Multiple Export Formats: Graphs can now be exported to N-Triples (with optional W3C strict formatting), CSV, and TSV formats.
  • Comprehensive Testing: A new test file, test/test_export.py, has been added to ensure the reliability and correctness of all new export features.
  • Version Update: The project version in setup.py has been incremented to 3.7.4.
Changelog
  • cog/export.py
    • Added _is_iri and _is_blank_node helper functions for term identification.
    • Implemented _to_nt_term for formatting terms according to W3C N-Triples strict rules, including escaping special characters.
    • Introduced get_triples function to extract all (subject, predicate, object) tuples from a given graph.
    • Provided export_triples function to write extracted triples to a specified file path in N-Triples, CSV, or TSV format.
  • cog/torque.py
    • Added a new triples() method to the Graph class, which returns all triples as a list of tuples by calling cog.export.get_triples.
    • Added a new export() method to the Graph class, allowing users to export graph data to a file in various formats by calling cog.export.export_triples.
  • setup.py
    • Updated the project version from 3.7.3 to 3.7.4.
  • test/test_export.py
    • Added a new test suite ExportTest to validate the graph export functionalities.
    • Included tests for the Graph.triples() method, verifying its return type, content, and behavior with empty graphs.
    • Implemented tests for N-Triples export, covering default behavior, W3C strict mode, and correct handling of IRIs, blank nodes, and escaped literals.
    • Added tests for CSV and TSV export formats, checking header presence and data integrity.
    • Verified the standalone get_triples and export_triples functions from the cog.export module.
    • Included a test case for error handling when an unsupported export format is provided.
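The changelog lists _is_iri, _is_blank_node and _to_nt_term without spelling out their rules. The sketch below illustrates what strict N-Triples term formatting involves; it is not CogDB's code. It assumes a term counts as an IRI when it is wrapped in <...> or starts with a URI scheme such as http:, and as a blank node when it starts with _:. The names is_iri, is_blank_node, to_nt_term and to_nt_line are hypothetical stand-ins.

```python
import re

# Assumption: a bare term is an IRI if it starts with a URI scheme ("http:", "urn:", ...)
# and contains no whitespace. CogDB's real _is_iri rule may differ.
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:\S+$")
# Characters the N-Triples grammar forbids unescaped inside <...>.
_IRI_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')
# Characters that must be escaped inside a quoted literal.
_LITERAL_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r"}


def is_blank_node(term: str) -> bool:
    return term.startswith("_:") and len(term) > 2


def is_iri(term: str) -> bool:
    if term.startswith("<") and term.endswith(">"):
        return True
    return bool(_SCHEME.match(term))


def to_nt_term(term: str) -> str:
    """Render one term as an N-Triples blank node, IRI, or escaped literal."""
    if is_blank_node(term):
        return term
    if is_iri(term):
        iri = term[1:-1] if term.startswith("<") else term
        if _IRI_FORBIDDEN.search(iri):
            raise ValueError(f"IRI contains characters N-Triples forbids: {iri!r}")
        return f"<{iri}>"
    escaped = "".join(_LITERAL_ESCAPES.get(ch, ch) for ch in term)
    return f'"{escaped}"'


def to_nt_line(s: str, p: str, o: str) -> str:
    """Format one triple; subjects must be IRIs or blank nodes, predicates IRIs."""
    s_nt, p_nt, o_nt = to_nt_term(s), to_nt_term(p), to_nt_term(o)
    if s_nt.startswith('"') or not p_nt.startswith("<"):
        raise ValueError(f"not a valid N-Triples statement: {(s, p, o)!r}")
    return f"{s_nt} {p_nt} {o_nt} .\n"


print(to_nt_line("http://example.org/alice", "http://example.org/says", 'hi "there"\n'), end="")
# <http://example.org/alice> <http://example.org/says> "hi \"there\"\n" .
```

The sketch raises on a literal subject so that strict output stays valid N-Triples; other policies, such as minting an IRI for plain names like alice, are equally possible.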


@gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable feature for exporting graph data into several formats. However, the N-Triples export does not escape terms sufficiently, which allows injection into the output, and the export file path is used directly without validation, which poses a risk of arbitrary file writes. Beyond these security concerns, the way triples are extracted has critical performance and scalability issues, and API compatibility and test portability also need attention.
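One way to address the arbitrary-file-write concern, for applications that pass user-supplied paths through to export, is to resolve the path and confine it to a known directory before opening it. A minimal sketch; resolve_export_path and the base-directory policy are hypothetical, not part of CogDB:

```python
import tempfile
from pathlib import Path


def resolve_export_path(path: str, base_dir: str) -> Path:
    """Resolve `path` under `base_dir`, refusing targets that land outside it.

    Hypothetical helper, not part of CogDB. resolve() collapses ".." segments and
    follows existing symlinks, so both are caught by the containment check.
    """
    base = Path(base_dir).resolve()
    # Joining an absolute `path` discards `base`, so absolute paths are checked too.
    target = (base / path).resolve()
    if not target.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"export path escapes {base}: {target}")
    return target


with tempfile.TemporaryDirectory() as exports:
    print(resolve_export_path("graph.nt", exports).name)  # graph.nt
    for bad in ("../graph.nt", "/etc/passwd"):
        try:
            resolve_export_path(bad, exports)
        except ValueError:
            print("rejected:", bad)
```

Whether a library should enforce such a policy or leave it to callers is a design choice; the check costs one resolve() per export.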

Comment thread cog/export.py Outdated
Comment thread cog/torque.py Outdated
```python
def triples(self):
    """...
    # [("alice", "follows", "bob"), ("bob", "follows", "charlie")]
    """
    from cog.export import get_triples
    return get_triples(self)
```

Priority: high

The docstring for triples() states that it returns a list. If cog.export.get_triples is changed to a generator (as suggested for performance and memory improvements), this method will return a generator, breaking the API contract and likely causing issues for existing callers. To maintain backward compatibility, you should convert the generator to a list before returning.

Suggested change
```diff
-    return get_triples(self)
+    return list(get_triples(self))
```
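The concern behind this suggestion is that a generator can be iterated only once and has no len(), so callers that treat triples() as a list would break. A minimal sketch of keeping a lazy path for exporters while preserving the list contract, using a hypothetical in-memory TinyGraph rather than CogDB's Graph:

```python
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]


class TinyGraph:
    """Hypothetical in-memory stand-in for cog.torque.Graph."""

    def __init__(self) -> None:
        self._edges: List[Triple] = []

    def put(self, s: str, p: str, o: str) -> "TinyGraph":
        self._edges.append((s, p, o))
        return self

    def iter_triples(self) -> Iterator[Triple]:
        # Lazy path for exporters: yields one triple at a time.
        yield from self._edges

    def triples(self) -> List[Triple]:
        # Public method keeps its documented return type: a list.
        return list(self.iter_triples())


g = TinyGraph().put("alice", "follows", "bob").put("bob", "follows", "charlie")
print(g.triples())  # [('alice', 'follows', 'bob'), ('bob', 'follows', 'charlie')]

gen = g.iter_triples()
print(len(list(gen)), len(list(gen)))  # 2 0  (a generator is exhausted after one pass)
```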

Comment thread cog/export.py Outdated
Comment on lines +125 to +126
```python
for s, p, o in all_triples:
    f.write('{} {} {} .\n'.format(s, p, o))
```

Priority: medium (security)

The export_triples function is vulnerable to N-Triples injection because it neither escapes nor validates terms when strict=False (the default). N-Triples is line-oriented, so a value containing a newline, a double quote, or a backslash is written verbatim and can corrupt the file or inject additional triples. Beyond this, the function should be refactored to stream triples from a generator for better memory efficiency, and the csv and tsv branches can be consolidated to improve clarity and maintainability.
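Both points in this comment can be sketched independently of CogDB. Below, naive_nt_line reproduces the quoted .format pattern to show how one unescaped object becomes two statements, and write_delimited (a hypothetical helper, not export_triples) shows CSV and TSV sharing one csv.writer code path that consumes triples lazily. The subject/predicate/object header names are an assumption.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple

Triple = Tuple[str, str, str]

# csv and tsv differ only in the delimiter, so one writer covers both.
_DELIMITERS = {"csv": ",", "tsv": "\t"}


def write_delimited(triples: Iterable[Triple], out: TextIO, fmt: str) -> int:
    """Stream triples to `out` as CSV or TSV and return the number of data rows.

    Hypothetical helper, not CogDB's export_triples; header names are assumed.
    """
    if fmt not in _DELIMITERS:
        raise ValueError(f"unsupported format: {fmt!r}")
    writer = csv.writer(out, delimiter=_DELIMITERS[fmt], lineterminator="\n")
    writer.writerow(("subject", "predicate", "object"))
    rows = 0
    for triple in triples:  # consumed one at a time, so a generator works here
        writer.writerow(triple)
        rows += 1
    return rows


def naive_nt_line(s: str, p: str, o: str) -> str:
    # The pattern quoted in the review: values are written without escaping.
    return "{} {} {} .\n".format(s, p, o)


buf = io.StringIO()
data = [("alice", "follows", "bob"), ("bob", "says", 'hi, "there"')]
write_delimited((t for t in data), buf, "csv")
print(buf.getvalue(), end="")
# subject,predicate,object
# alice,follows,bob
# bob,says,"hi, ""there"""

bad = naive_nt_line("<http://ex.org/a>", "<http://ex.org/p>",
                    '"x" .\n<http://ex.org/evil> <http://ex.org/p> "y"')
print(len(bad.splitlines()))  # 2: one triple was written as two statements
```

Because csv.writer quotes fields that contain the delimiter or a quote character, commas and tabs inside values survive the round trip; the N-Triples branch still needs its own escaping.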

Comment thread cog/export.py Outdated
Comment thread cog/export.py
Comment thread test/test_export.py

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 73c01dcb1a


Comment thread cog/export.py Outdated
@codecov bot commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 98.21429% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 87.37%. Comparing base (197ce68) to head (e1ff2cf).
⚠️ Report is 1 commit behind head on master.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| cog/export.py | 98.00% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master      #82      +/-   ##
==========================================
+ Coverage   87.12%   87.37%   +0.25%
==========================================
  Files          12       13       +1
  Lines        2376     2432      +56
==========================================
+ Hits         2070     2125      +55
- Misses        306      307       +1
```
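The report's figures are internally consistent, as a quick recomputation shows: hits plus misses give the line totals, 55 of the 56 added lines are covered, and the project percentages match flooring to two decimals rather than rounding (2125/2432 is 87.3766...%, shown as 87.37%). The flooring is an inference from these numbers, not something the report states; floor_pct is a hypothetical helper.

```python
def floor_pct(hits: int, lines: int) -> str:
    """Percentage of `lines` covered by `hits`, floored to two decimals with integer math."""
    hundredths = hits * 10000 // lines
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


base_hits, base_lines = 2070, 2376  # master (197ce68)
head_hits, head_lines = 2125, 2432  # PR head (e1ff2cf)

print(floor_pct(base_hits, base_lines))  # 87.12%
print(floor_pct(head_hits, head_lines))  # 87.37%
print(head_lines - head_hits)            # 307, the "Misses" row
print(f"{55 / 56 * 100:.5f}%")           # 98.21429%, the patch coverage
```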


@arun1729 merged commit d5bbab6 into master on Mar 9, 2026 (6 checks passed)
@arun1729 deleted the v3.7.4 branch on March 9, 2026 at 20:22