The Smart Diff Engine detects refactorings that span multiple files, going beyond simple line-by-line comparison. This document describes the advanced features for detecting code movements, renames, splits, and merges across multiple files.
The FileRefactoringDetector identifies structural changes at the file level:
Detects when a file is renamed but maintains similar content:
- Content Similarity: Compares file content using fingerprinting
- Path Similarity: Analyzes file path and name similarity
- Symbol Migration: Tracks how symbols move between files
Detects when a single file is split into multiple files:
- Identifies when content from one source file appears in multiple target files
- Tracks which symbols migrated to which files
- Provides confidence scores for each split detection
Detects when multiple files are merged into one:
- Identifies when content from multiple source files appears in a single target file
- Tracks symbol consolidation
- Provides confidence scores for merge detection
Detects when files are moved to different directories:
- Distinguishes pure moves (same name, different directory) from combined move+rename operations
- Tracks directory structure changes
The SymbolMigrationTracker provides detailed tracking of how symbols move between files:
- Tracks individual functions, classes, and variables
- Detects symbol renames during migration
- Provides confidence scores for each migration
- Groups symbol migrations by file pairs
- Calculates migration percentages
- Identifies patterns in refactoring
- Tracks how references change when symbols move
- Identifies broken references
- Detects reference updates that follow moved symbols
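The file-pair grouping and migration percentages listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the engine's implementation: the `Migration` struct and `migration_percentages` function are hypothetical, and the percentage is assumed to be the number of symbols moved to a target divided by the source file's total symbol count.

```rust
use std::collections::BTreeMap;

/// Hypothetical record of one symbol moving between files.
#[derive(Debug, Clone)]
pub struct Migration {
    pub symbol_name: String,
    pub source_file: String,
    pub target_file: String,
}

/// Groups migrations by (source, target) file pair and returns, for each pair,
/// the fraction of the source file's symbols that moved to that target.
/// `symbols_per_source` maps a source file to its total symbol count.
/// Pairs whose fraction is below `min_migration_threshold` are dropped.
pub fn migration_percentages(
    migrations: &[Migration],
    symbols_per_source: &BTreeMap<String, usize>,
    min_migration_threshold: f64,
) -> BTreeMap<(String, String), f64> {
    // Count migrated symbols per file pair.
    let mut counts: BTreeMap<(String, String), usize> = BTreeMap::new();
    for m in migrations {
        *counts
            .entry((m.source_file.clone(), m.target_file.clone()))
            .or_insert(0) += 1;
    }

    // Convert counts to fractions and apply the threshold.
    counts
        .into_iter()
        .filter_map(|((src, dst), moved)| {
            let total = *symbols_per_source.get(&src)?;
            if total == 0 {
                return None;
            }
            let fraction = moved as f64 / total as f64;
            (fraction >= min_migration_threshold).then(|| ((src, dst), fraction))
        })
        .collect()
}
```

With a threshold of 0.3, a source file of four symbols that sends two symbols to one target and one symbol to another keeps only the first pair (0.5), since the second (0.25) falls below the threshold.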
Enhanced integration with the semantic analysis layer:
- Uses `SymbolResolver` to track symbols across files
- Resolves cross-file references
- Builds import dependency graphs
- Tracks all symbol references across the codebase
- Identifies which files reference which symbols
- Detects when references need to be updated
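One way to picture the reference tracking above is an inverted index from symbols to the files that reference them. The sketch below is illustrative only: `build_reference_index` and `files_needing_update` are hypothetical helpers, not part of `SymbolResolver`, and the input map is assumed to come from an earlier parsing step.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Inverts a "file -> referenced symbols" map into "symbol -> referencing files".
pub fn build_reference_index(
    references: &BTreeMap<String, BTreeSet<String>>,
) -> BTreeMap<String, BTreeSet<String>> {
    let mut index: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for (file, symbols) in references {
        for symbol in symbols {
            index.entry(symbol.clone()).or_default().insert(file.clone());
        }
    }
    index
}

/// Files that reference `symbol` and therefore may need an updated reference
/// after the symbol moves to `new_home`. The new home itself is excluded.
pub fn files_needing_update(
    index: &BTreeMap<String, BTreeSet<String>>,
    symbol: &str,
    new_home: &str,
) -> Vec<String> {
    index
        .get(symbol)
        .map(|files| {
            files
                .iter()
                .filter(|f| f.as_str() != new_home)
                .cloned()
                .collect()
        })
        .unwrap_or_default()
}
```

Ordered collections are used so that the result order is deterministic, which makes reports and tests stable.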
use smart_diff_engine::{FileRefactoringDetector, FileRefactoringDetectorConfig};
use std::collections::HashMap;
// Create detector with default configuration
let detector = FileRefactoringDetector::with_defaults();
// Prepare source and target file contents
let mut source_files = HashMap::new();
source_files.insert("Calculator.java".to_string(), source_content);
let mut target_files = HashMap::new();
target_files.insert("MathCalculator.java".to_string(), target_content);
// Detect refactorings
let result = detector.detect_file_refactorings(&source_files, &target_files)?;
// Analyze results
for rename in &result.file_renames {
    println!("Renamed: {} -> {}", rename.source_path, rename.target_path);
    println!("Confidence: {:.2}%", rename.confidence * 100.0);
}
for split in &result.file_splits {
    println!("Split: {} into {} files", split.source_path, split.target_files.len());
}
for merge in &result.file_merges {
    println!("Merged: {} files into {}", merge.source_files.len(), merge.target_path);
}

use smart_diff_engine::{SymbolMigrationTracker, SymbolMigrationTrackerConfig};
use smart_diff_semantic::SymbolResolver;
// Create symbol resolvers for source and target
let mut source_resolver = SymbolResolver::with_defaults();
let mut target_resolver = SymbolResolver::with_defaults();
// Process files
source_resolver.process_file("Calculator.java", &source_parse_result)?;
target_resolver.process_file("MathCalculator.java", &target_parse_result)?;
// Track migrations
let tracker = SymbolMigrationTracker::with_defaults();
let migration_result = tracker.track_migrations(&source_resolver, &target_resolver)?;
// Analyze migrations
for migration in &migration_result.symbol_migrations {
    println!(
        "Symbol {} migrated from {} to {}",
        migration.symbol_name, migration.source_file, migration.target_file
    );
}

use smart_diff_engine::{FileRefactoringDetector, FileRefactoringDetectorConfig};
let config = FileRefactoringDetectorConfig {
    min_rename_similarity: 0.8,    // stricter than the 0.7 default
    min_split_similarity: 0.6,     // stricter than the 0.5 default
    min_merge_similarity: 0.6,     // stricter than the 0.5 default
    use_path_similarity: true,
    use_content_fingerprinting: true,
    use_symbol_migration: true,
    max_split_merge_candidates: 5, // default is 10
};
let detector = FileRefactoringDetector::new(config);

The system builds each content fingerprint from several components:
- Content Hash: Full content hash for exact matching
- Normalized Hash: Hash of content with whitespace removed
- Identifier Set: Set of unique identifiers (classes, functions, variables)
- Line Counts: Total lines and non-empty lines
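The four fingerprint components above could be computed along these lines. This is a sketch under assumptions: `ContentFingerprint` and `fingerprint` are hypothetical names, the hash is the standard library's `DefaultHasher` rather than whatever the engine uses, and identifiers are extracted by crude tokenization (so language keywords are included) instead of from the parser's symbol table.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Hypothetical fingerprint holding the four components listed above.
#[derive(Debug, Clone, PartialEq)]
pub struct ContentFingerprint {
    pub content_hash: u64,
    pub normalized_hash: u64,
    pub identifiers: BTreeSet<String>,
    pub total_lines: usize,
    pub non_empty_lines: usize,
}

fn hash_str(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
}

pub fn fingerprint(content: &str) -> ContentFingerprint {
    // Normalized form: all whitespace removed, so formatting-only changes match.
    let normalized: String = content.chars().filter(|c| !c.is_whitespace()).collect();

    // Crude identifier extraction: maximal runs of [A-Za-z0-9_] that do not
    // start with a digit. Keywords such as `class` are collected too.
    let identifiers = content
        .split(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .filter(|tok| !tok.is_empty() && !tok.chars().next().unwrap().is_ascii_digit())
        .map(str::to_string)
        .collect();

    ContentFingerprint {
        content_hash: hash_str(content),
        normalized_hash: hash_str(&normalized),
        identifiers,
        total_lines: content.lines().count(),
        non_empty_lines: content.lines().filter(|l| !l.trim().is_empty()).count(),
    }
}
```

Two files that differ only in spacing and blank lines get different content hashes but the same normalized hash and identifier set, which is what lets the fingerprint match reformatted files.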
File similarity is calculated as a weighted combination:
similarity = (identifier_similarity * 0.7) + (line_similarity * 0.3)
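The weighted formula above can be written out as a small function. The 0.7 and 0.3 weights come from the formula; everything else is an assumption for illustration: identifier similarity is taken to be Jaccard similarity of the identifier sets, and line similarity is taken to be the ratio of the smaller line count to the larger one, which the document does not define.

```rust
use std::collections::BTreeSet;

/// Jaccard similarity: intersection size divided by union size.
/// Two empty sets are treated as identical.
pub fn jaccard(a: &BTreeSet<String>, b: &BTreeSet<String>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 1.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Assumed definition: ratio of the smaller line count to the larger one.
pub fn line_similarity(a_lines: usize, b_lines: usize) -> f64 {
    let max = a_lines.max(b_lines);
    if max == 0 {
        1.0
    } else {
        a_lines.min(b_lines) as f64 / max as f64
    }
}

/// similarity = identifier_similarity * 0.7 + line_similarity * 0.3
pub fn file_similarity(
    ids_a: &BTreeSet<String>,
    ids_b: &BTreeSet<String>,
    lines_a: usize,
    lines_b: usize,
) -> f64 {
    0.7 * jaccard(ids_a, ids_b) + 0.3 * line_similarity(lines_a, lines_b)
}
```

For example, two files sharing 2 of 4 distinct identifiers (0.5) with 100 and 80 lines (0.8) score 0.7 * 0.5 + 0.3 * 0.8 = 0.59 under these assumptions.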
For rename detection:
combined_score = (content_sim * 0.6) + (path_sim * 0.2) + (symbol_migration * 0.2)
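As a worked instance of the rename formula, the sketch below combines the three signals and compares the result with a threshold. `RenameSignals`, `combined_rename_score`, and `is_rename` are hypothetical names, and comparing the combined score against `min_rename_similarity` is an assumption about how the threshold is applied.

```rust
/// Inputs to the rename score, each expected to lie in [0.0, 1.0].
pub struct RenameSignals {
    pub content_sim: f64,
    pub path_sim: f64,
    pub symbol_migration: f64,
}

/// combined_score = content_sim * 0.6 + path_sim * 0.2 + symbol_migration * 0.2
pub fn combined_rename_score(s: &RenameSignals) -> f64 {
    0.6 * s.content_sim + 0.2 * s.path_sim + 0.2 * s.symbol_migration
}

/// Assumed decision rule: accept the rename when the combined score
/// reaches the configured minimum.
pub fn is_rename(s: &RenameSignals, min_rename_similarity: f64) -> bool {
    combined_rename_score(s) >= min_rename_similarity
}
```

With content similarity 0.9, path similarity 0.5, and symbol migration 0.8, the score is 0.54 + 0.10 + 0.16 = 0.80, which passes the 0.7 default but would be rejected at a threshold of 0.85.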
- Exact Match Phase: Find files with identical content
- Fingerprint Match Phase: Find files with matching normalized content
- Identifier Match Phase: Find files with high identifier overlap
- Path Analysis Phase: Determine if it's a move, rename, or both
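The path analysis phase can be illustrated by comparing directory and file name separately. This is a minimal sketch using `std::path::Path`; the `PathChange` enum and `classify_path_change` function are hypothetical and only show the classification idea, not the engine's actual types.

```rust
use std::path::Path;

/// Hypothetical classification of how a matched file's path changed.
#[derive(Debug, PartialEq, Eq)]
pub enum PathChange {
    Unchanged,
    Move,          // same name, different directory
    Rename,        // same directory, different name
    MoveAndRename, // both changed
}

pub fn classify_path_change(source: &str, target: &str) -> PathChange {
    let (src, dst) = (Path::new(source), Path::new(target));
    let same_dir = src.parent() == dst.parent();
    let same_name = src.file_name() == dst.file_name();
    match (same_dir, same_name) {
        (true, true) => PathChange::Unchanged,
        (false, true) => PathChange::Move,
        (true, false) => PathChange::Rename,
        (false, false) => PathChange::MoveAndRename,
    }
}
```

Under this scheme `src/Calculator.java` to `src/math/Calculator.java` is a pure move, while `Calculator.java` to `MathCalculator.java` is a rename.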
- Identifier Overlap: Calculate Jaccard similarity of identifier sets
- Candidate Selection: Find files with significant overlap
- Grouping: Group candidates by similarity scores
- Confidence Scoring: Calculate confidence based on overlap and patterns
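The split-detection steps above can be sketched as a candidate search over identifier sets. The names `SplitCandidate`, `split_candidates`, and `split_confidence` are hypothetical, and the confidence measure (share of source identifiers found in at least one candidate) is an assumption; the document only says confidence is based on overlap and patterns.

```rust
use std::collections::BTreeSet;

type Ids = BTreeSet<String>;

/// Jaccard similarity: intersection size divided by union size.
fn jaccard(a: &Ids, b: &Ids) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 1.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

#[derive(Debug)]
pub struct SplitCandidate {
    pub target_path: String,
    pub similarity: f64,
}

/// Steps 1-3: score every target against the source, keep those at or above
/// the threshold, order by similarity, and cap the number of candidates.
pub fn split_candidates(
    source_ids: &Ids,
    targets: &[(String, Ids)],
    min_split_similarity: f64,
    max_split_merge_candidates: usize,
) -> Vec<SplitCandidate> {
    let mut candidates: Vec<SplitCandidate> = targets
        .iter()
        .map(|(path, ids)| SplitCandidate {
            target_path: path.clone(),
            similarity: jaccard(source_ids, ids),
        })
        .filter(|c| c.similarity >= min_split_similarity)
        .collect();
    candidates.sort_by(|a, b| b.similarity.total_cmp(&a.similarity));
    candidates.truncate(max_split_merge_candidates);
    candidates
}

/// Step 4 (assumed measure): share of source identifiers that appear in at
/// least one candidate's identifier set.
pub fn split_confidence(source_ids: &Ids, candidate_ids: &[&Ids]) -> f64 {
    if source_ids.is_empty() {
        return 0.0;
    }
    let covered = source_ids
        .iter()
        .filter(|id| candidate_ids.iter().any(|ids| ids.contains(*id)))
        .count();
    covered as f64 / source_ids.len() as f64
}
```

Merge detection would be the mirror image: one target scored against many sources. Note that the Jaccard score of a source against any single part of a split is naturally low, which is consistent with the split and merge defaults being lower than the rename default.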
| Option | Default | Description |
|---|---|---|
| `min_rename_similarity` | 0.7 | Minimum similarity for rename detection |
| `min_split_similarity` | 0.5 | Minimum similarity for split detection |
| `min_merge_similarity` | 0.5 | Minimum similarity for merge detection |
| `use_path_similarity` | true | Enable path similarity analysis |
| `use_content_fingerprinting` | true | Enable content fingerprinting |
| `use_symbol_migration` | true | Enable symbol migration tracking |
| `max_split_merge_candidates` | 10 | Maximum candidates for split/merge |

| Option | Default | Description |
|---|---|---|
| `min_migration_threshold` | 0.3 | Minimum migration percentage |
| `track_functions` | true | Track function migrations |
| `track_classes` | true | Track class migrations |
| `track_variables` | false | Track variable migrations |
| `analyze_cross_file_references` | true | Analyze reference changes |
- File Count: Optimized for up to 50 files per comparison
- File Size: Efficient fingerprinting for files up to 10,000 lines
- Symbol Count: Can handle thousands of symbols per file
- Early Termination: Stop searching when high-confidence match found
- Fingerprint Caching: Cache fingerprints to avoid recomputation
- Parallel Processing: Use rayon for parallel file analysis
- Threshold Filtering: Skip low-similarity candidates early
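Early termination and threshold filtering can be combined in a single candidate loop, sketched below. `best_match` and its `early_exit` parameter are hypothetical; the engine's actual cutoff for a "high-confidence match" is not stated in this document. Fingerprint caching and rayon-based parallelism are not shown.

```rust
/// Returns the index and score of the best candidate seen so far.
/// Candidates scoring below `min_similarity` are skipped, and the search
/// stops as soon as one candidate reaches `early_exit`.
pub fn best_match<F>(
    candidate_count: usize,
    mut score: F,
    min_similarity: f64,
    early_exit: f64,
) -> Option<(usize, f64)>
where
    F: FnMut(usize) -> f64,
{
    let mut best: Option<(usize, f64)> = None;
    for i in 0..candidate_count {
        let s = score(i);
        // Threshold filtering: ignore low-similarity candidates.
        if s < min_similarity {
            continue;
        }
        if best.map_or(true, |(_, b)| s > b) {
            best = Some((i, s));
        }
        // Early termination: a high-confidence match ends the search.
        if s >= early_exit {
            break;
        }
    }
    best
}
```

The trade-off is that early termination may return a very good match rather than the best one: a later candidate with a higher score is never evaluated once the cutoff is reached.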
Different languages may require different similarity thresholds:
- Verbose languages (Java, C#): Higher thresholds (0.8+)
- Concise languages (Python, Ruby): Lower thresholds (0.6+)
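A project could encode this guidance as a small lookup when building its configuration. The helper below is hypothetical: `suggested_rename_threshold` is not part of the engine, the extension mapping is an assumption, and the values are simply the lower bounds quoted above, with the documented 0.7 default as the fallback.

```rust
/// Suggests a starting value for `min_rename_similarity` from a file extension.
pub fn suggested_rename_threshold(extension: &str) -> f64 {
    match extension.to_ascii_lowercase().as_str() {
        "java" | "cs" => 0.8, // verbose languages
        "py" | "rb" => 0.6,   // concise languages
        _ => 0.7,             // documented default
    }
}
```

These are starting points to tune against a sample of known refactorings in the codebase, not fixed rules.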
Symbol migration tracking significantly improves detection accuracy:
config.use_symbol_migration = true;

If your codebase has consistent naming conventions:
config.use_path_similarity = true;

For large codebases, limit split/merge candidates:
config.max_split_merge_candidates = 5;

See the following examples for detailed usage:
- `examples/enhanced_cross_file_detection_demo.rs` - Comprehensive demo
- `examples/cross_file_tracking_demo.rs` - Function-level tracking
- `examples/symbol_resolution_demo.rs` - Symbol resolution
use smart_diff_engine::{CrossFileTracker, FileRefactoringDetector};
// Detect file-level refactorings first
let file_detector = FileRefactoringDetector::with_defaults();
let file_result = file_detector.detect_file_refactorings(&source_files, &target_files)?;
// Then detect function-level moves
let mut tracker = CrossFileTracker::with_defaults(Language::Java);
let function_result = tracker.track_cross_file_changes(&source_functions, &target_functions)?;
// Combine results for comprehensive analysis

// Build symbol tables
let mut source_resolver = SymbolResolver::with_defaults();
let mut target_resolver = SymbolResolver::with_defaults();
// Process all files
for (path, parse_result) in source_files {
    source_resolver.process_file(&path, &parse_result)?;
}
// Use with cross-file tracker
let mut tracker = CrossFileTracker::with_defaults(Language::Java);
tracker.set_symbol_resolver(source_resolver);

If detection accuracy is low:
- Lower similarity thresholds
- Enable all detection features
- Check if files have sufficient identifiers
- Verify language-specific parsing is working
If getting too many false positives:
- Increase similarity thresholds
- Enable path similarity filtering
- Increase `min_migration_threshold`
- Reduce `max_split_merge_candidates`
If performance is slow:
- Reduce `max_split_merge_candidates`
- Disable symbol migration for initial pass
- Filter files by extension before processing
- Use parallel processing
Planned improvements:
- Machine learning-based similarity scoring
- Language-specific refactoring patterns
- IDE integration for real-time detection
- Visualization of refactoring flows
- Automatic refactoring suggestions