
CodeRefineAI

Title: Can LLMs Identify and Remove Software Inefficiencies?

  1. Research Question 1: How accurately can LLMs detect and reason about inefficiencies in code?
  2. Research Question 2: How can we use different prompt engineering techniques on LLM models to produce efficient code?
  3. Research Question 3: Which LLM model and model settings are suitable to generate the most efficient code?

Dataset

Dataset Description

The balanced_samples dataset contains 200 programming problems from LeetCode, with multiple reference implementations for each problem including:

  • Runtime efficient solutions
  • Runtime inefficient solutions
  • Memory efficient solutions
  • Memory inefficient solutions

Each problem is categorized by difficulty level (Easy, Medium, Hard) and annotated with relevant programming topics (e.g., Dynamic Programming, Arrays, Graphs).

This balanced collection ensures comprehensive testing across different algorithmic concepts, complexity levels, and inefficiency patterns, providing a robust benchmark for evaluating LLM capabilities in code optimization.

The dataset has been augmented to categorize each solution into the categories listed above. Here is the link to the dataset and the script used for augmentation.
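The augmentation step can be sketched as follows. This is a hypothetical illustration, not the actual augmentation script: the field names (`runtime_percentile`, `memory_percentile`) and the 50th-percentile threshold are assumptions about how solutions might be labeled.

```python
# Hypothetical sketch of the augmentation step: tag each solution as
# runtime/memory efficient or inefficient by comparing an assumed
# LeetCode percentile field against a threshold.

def categorize(solution, threshold=50.0):
    """Return the efficiency category tags for one solution record."""
    tags = []
    tags.append("runtime_efficient" if solution["runtime_percentile"] >= threshold
                else "runtime_inefficient")
    tags.append("memory_efficient" if solution["memory_percentile"] >= threshold
                else "memory_inefficient")
    return tags

sample = {"runtime_percentile": 92.5, "memory_percentile": 31.0}
print(categorize(sample))  # ['runtime_efficient', 'memory_inefficient']
```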

Tools Used

Code Executor Utility

Code Submission Utility

Overview

This utility script provides a command-line interface for submitting code solutions to a Judge0 execution environment and retrieving the results. It is designed to work with both reference solutions and LLM-generated code (Gemini or Llama).

Installation

Usage

The script supports two primary operations:

  1. Submitting code solutions for execution
  2. Retrieving results of previously submitted solutions
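The two operations map onto Judge0's submission API. The sketch below shows the general shape under stated assumptions: the instance URL, the helper names, and the use of Python 3 (Judge0 language id 71) are all illustrative, not the script's actual implementation.

```python
import json
import urllib.request

# Assumed self-hosted Judge0 instance; adjust to your deployment.
JUDGE0_URL = "http://localhost:2358"
PYTHON3 = 71  # Judge0's language id for Python 3

def build_submission(source_code, stdin=""):
    """Build the JSON body Judge0 expects for POST /submissions."""
    return {"source_code": source_code, "language_id": PYTHON3, "stdin": stdin}

def submit(source_code, stdin=""):
    """Submit code; Judge0 replies with a token for later retrieval."""
    body = json.dumps(build_submission(source_code, stdin)).encode()
    req = urllib.request.Request(
        f"{JUDGE0_URL}/submissions?base64_encoded=false&wait=false",
        data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["token"]

def get_result(token):
    """Fetch the stored result (status, stdout, time, memory) for a token."""
    with urllib.request.urlopen(f"{JUDGE0_URL}/submissions/{token}") as resp:
        return json.load(resp)

payload = build_submission("print(1 + 1)")
print(payload["language_id"])  # 71
```

The fire-and-forget `wait=false` pattern is why each submission is tracked by a token: results are polled later with `get_result`.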

Command Format

Parameters

| Parameter | Description | Required | Default |
|---|---|---|---|
| action | Either "submit" or "get" | Yes | - |
| model | Either "gemini" or "llama" | Yes | - |
| --file | Source dataset filename (without extension) | No | "dataset_preview" |
| --dir | Base directory for files | No | Current P2 directory |
| --solution_file | File containing LLM solutions | No | None |
| --solution_metric | Metric to evaluate (runtime/memory) | No | "runtime" |
| --solution_type | Solution type (efficient/inefficient/moderate) | No | "efficient" |
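With these parameters, the argument parsing could look roughly like the sketch below. This is a hypothetical `argparse` reconstruction, not the script's actual code; in particular, treating `model` as positional is an assumption based on the example commands.

```python
import argparse

# Illustrative CLI matching the parameter table above; defaults are the
# documented ones, the structure itself is an assumption.
parser = argparse.ArgumentParser(prog="main.py")
parser.add_argument("--action", choices=["submit", "get"], required=True)
parser.add_argument("model", choices=["gemini", "llama"])
parser.add_argument("--file", default="dataset_preview")
parser.add_argument("--dir", default=".")
parser.add_argument("--solution_file", default=None)
parser.add_argument("--solution_metric", choices=["runtime", "memory"],
                    default="runtime")
parser.add_argument("--solution_type",
                    choices=["efficient", "inefficient", "moderate"],
                    default="efficient")

args = parser.parse_args(["--action", "submit", "gemini", "--file", "my_dataset"])
print(args.action, args.model, args.solution_metric)  # submit gemini runtime
```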

Examples

Help Command

python src/main.py --help

Submit Reference Solutions

python src/main.py --action submit gemini --file my_dataset --solution_type efficient --solution_metric runtime

Submit LLM Generated Solutions

python src/main.py --action submit llama --file my_dataset --solution_file ref_solutions.py

Retrieve Results

python src/main.py --action get gemini --file my_dataset

Output Files

The script generates JSON output files with the following naming conventions:

  • For LLM solutions: {file_name}_{model_name}_codegen_submissions.json
  • For reference solutions: {file_name}_{model_name}_reference_{metric_type}_submissions.json

Where metric_type will be one of:

  • rt_eff
  • rt_ineff
  • mem_eff
  • mem_ineff
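The naming conventions can be captured in a small helper; the function and table names here are illustrative, only the resulting filenames follow the documented scheme.

```python
# Map (metric, solution type) to the documented suffixes.
METRIC_SUFFIX = {
    ("runtime", "efficient"): "rt_eff",
    ("runtime", "inefficient"): "rt_ineff",
    ("memory", "efficient"): "mem_eff",
    ("memory", "inefficient"): "mem_ineff",
}

def output_filename(file_name, model_name, metric=None, solution_type=None):
    """Build the JSON output filename per the conventions above."""
    if metric is None:  # LLM-generated solutions
        return f"{file_name}_{model_name}_codegen_submissions.json"
    suffix = METRIC_SUFFIX[(metric, solution_type)]  # reference solutions
    return f"{file_name}_{model_name}_reference_{suffix}_submissions.json"

print(output_filename("my_dataset", "gemini"))
# my_dataset_gemini_codegen_submissions.json
print(output_filename("my_dataset", "llama", "memory", "inefficient"))
# my_dataset_llama_reference_mem_ineff_submissions.json
```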

Requirements

  • Python 3.7+
  • Access to a Judge0 instance (self-hosted or cloud)
  • The coderefineai_executor package

Notes

  • The script requires properly formatted input files with question IDs.
  • Each submission gets a token for tracking and later result retrieval.
  • Configure the Judge0 instance in the settings section of the script.

Research Question 1

Dataset

  • Dataset for Prompting:
    • Script to sample one solution per efficiency type: RQ1_data
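The sampling step can be sketched as below. This is an assumption-laden illustration, not the RQ1_data script itself: the record layout (one list of solutions per efficiency type) and the helper name are hypothetical.

```python
import random

# Hypothetical record layout: each problem maps an efficiency type to a
# list of candidate solutions; we draw one solution from each non-empty bucket.
EFFICIENCY_TYPES = ["rt_eff", "rt_ineff", "mem_eff", "mem_ineff"]

def sample_one_per_type(problem, rng=None):
    """Return {efficiency_type: one randomly chosen solution}."""
    rng = rng or random.Random(0)  # seeded for reproducible sampling
    return {t: rng.choice(problem[t]) for t in EFFICIENCY_TYPES if problem.get(t)}

problem = {
    "rt_eff": ["sol_a"], "rt_ineff": ["sol_b", "sol_c"],
    "mem_eff": ["sol_d"], "mem_ineff": [],
}
picked = sample_one_per_type(problem)
print(sorted(picked))  # ['mem_eff', 'rt_eff', 'rt_ineff']
```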

Methodology

Research Question 2

Dataset

Methodology

  • Vanilla Prompting

    • Description: Direct single-shot prompting that provides the LLM with only the problem statement and asks for an optimized solution.
    • Implementation: The model receives the problem description and is asked to generate a solution with a focus on efficiency, without additional guidance or context.
    • Purpose: Establishes a baseline for how well LLMs can generate efficient code with minimal intervention.
    • Gemini Vanilla Prompting Script - Gemini_Vanilla_Prompting
    • Llama Vanilla Prompting Script - Llama_Vanilla_Prompting
  • Reasoning Prompting :

    • Description: A multi-stage prompting approach that leverages explicit reasoning about algorithmic efficiency before code generation.
    • Implementation:
      • For Gemini: Implements a self-feedback loop where the model first reasons about optimal algorithmic approaches, time/space complexity considerations, and potential inefficiencies before generating the final solution.
      • For Llama: Uses Gemini as a reasoning engine to generate efficiency insights, which are then distilled and provided to Llama as context for its code generation.
    • Purpose: Tests whether explicit reasoning about algorithmic efficiency improves the quality of generated solutions compared to vanilla prompting.
    • Technical Details: This approach simulates a human developer's thought process by explicitly considering algorithmic trade-offs before implementation.
    • Gemini Reasoning Prompting Script - Gemini_Vanilla_Prompting
    • Llama Gemini Reasoning Script - Llama_Vanilla_Prompting
  • Analysis:
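The multi-stage reasoning flow described above can be sketched generically. This is a simplified illustration of the data flow, not the actual prompting scripts: the prompt wording is invented, and `ask_llm` is a placeholder for any chat-completion call (Gemini in the self-feedback case, or Gemini-then-Llama in the distillation case).

```python
# Stage 1 asks for an efficiency analysis; stage 2 feeds that analysis
# back as context for code generation. Prompt text is illustrative.
REASONING_PROMPT = (
    "Before writing any code, analyze the optimal algorithmic approach for "
    "the problem below: discuss time/space complexity and likely "
    "inefficiencies.\n\nProblem:\n{problem}"
)
CODEGEN_PROMPT = (
    "Using the efficiency analysis below, write the most efficient Python "
    "solution.\n\nAnalysis:\n{analysis}\n\nProblem:\n{problem}"
)

def reasoning_prompting(problem, ask_llm):
    """Two-stage flow: reason about efficiency first, then generate code."""
    analysis = ask_llm(REASONING_PROMPT.format(problem=problem))
    return ask_llm(CODEGEN_PROMPT.format(analysis=analysis, problem=problem))

# Stubbed model call, just to show the data flow end to end.
fake_llm = lambda prompt: f"<response to {len(prompt)} chars>"
print(reasoning_prompting("Two Sum", fake_llm))
```

For the Llama variant, the same structure applies with the first `ask_llm` bound to Gemini and the second to Llama.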

Research Question 3

Contact

For any questions or inquiries, please contact harish876.
