Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a new Volcano_Plot() function to generate static + interactive volcano plots (via EnhancedVolcano + plotly) and adds a testthat unit test suite using synthetic DEG data.
Changes:
- Added
R/Volcano_Plot.Rimplementing volcano plot generation for one or multiple comparisons, returning plots plus ranked output data. - Added
tests/testthat/test-Volcano_Plot.Rcovering basic execution, return structure, multiple comparisons, and a few edge cases.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
R/Volcano_Plot.R |
New exported plotting function using EnhancedVolcano with optional labeling and interactive plotly output. |
tests/testthat/test-Volcano_Plot.R |
New tests validating Volcano_Plot() output structure and a few edge cases. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (nchar(change_lfc_name) == 0){lfc_name = lfc.col[i]} | ||
| if (nchar(change_sig_name) == 0){sig_name = sig.col[i]} | ||
| colnames(df) <- c(label.col, change_lfc_name, sig_name) | ||
| } else { | ||
| lfc_name = lfc.col[i] | ||
| sig_name = sig.col[i] |
There was a problem hiding this comment.
In the use_custom_lab branch, lfc_name/sig_name can be undefined when change_lfc_name/change_sig_name are non-empty (the typical case), which will error at colnames(df) <- ... and later lookups. Set lfc_name/sig_name deterministically (e.g., to the chosen label names) and ensure they always correspond to actual column names in df.
| if (nchar(change_lfc_name) == 0){lfc_name = lfc.col[i]} | |
| if (nchar(change_sig_name) == 0){sig_name = sig.col[i]} | |
| colnames(df) <- c(label.col, change_lfc_name, sig_name) | |
| } else { | |
| lfc_name = lfc.col[i] | |
| sig_name = sig.col[i] | |
| lfc_name <- if (nchar(change_lfc_name) == 0) lfc.col[i] else change_lfc_name | |
| sig_name <- if (nchar(change_sig_name) == 0) sig.col[i] else change_sig_name | |
| colnames(df) <- c(label.col, lfc_name, sig_name) | |
| } else { | |
| lfc_name <- lfc.col[i] | |
| sig_name <- sig.col[i] |
|
|
||
| # By default, nothing will be greater than maxy. User can set this value lower | ||
| keep <- -log10(df[[sig_name]]) <= maxy | ||
| df[[sig_name]][!keep] <- maxy |
There was a problem hiding this comment.
The y-limit capping logic overwrites p-values with maxy (a -log10(p) value): df[[sig_name]][!keep] <- maxy. That produces invalid p-values (>1) and will move highly-significant points to the bottom of the plot. If you want to cap the plotted y, either cap using the corresponding p-value (10^-maxy) or (preferably) leave the data unchanged and apply a plotting limit (e.g., via ggplot scales/coord_cartesian).
| df[[sig_name]][!keep] <- maxy | |
| df[[sig_name]][!keep] <- 10^(-maxy) |
| df_sub <- df | ||
| } | ||
|
|
||
| genes_to_label <- as.character(df_sub[1:no_genes_to_label, label.col]) |
There was a problem hiding this comment.
genes_to_label <- df_sub[1:no_genes_to_label, ...] will introduce NA labels (or error) when df_sub has fewer than no_genes_to_label rows, including when the is_red filter yields 0 rows. Consider using head()/seq_len(min(nrow(df_sub), no_genes_to_label)) and dropping NA from genes_to_label before passing to EnhancedVolcano.
| genes_to_label <- as.character(df_sub[1:no_genes_to_label, label.col]) | |
| n_to_label <- min(nrow(df_sub), no_genes_to_label) | |
| genes_to_label <- as.character(df_sub[seq_len(n_to_label), label.col]) | |
| genes_to_label <- genes_to_label[!is.na(genes_to_label)] |
| rank <- list() | ||
| plot_list <- list() | ||
|
|
||
| for(i in 1:length(lfc.col)){ |
There was a problem hiding this comment.
The loop for (i in 1:length(lfc.col)) behaves incorrectly when lfc.col is length 0 (1:0), and there’s no validation that sig.col and lfc.col have the same length before indexing both. Prefer seq_along(lfc.col) and add an early argument check that length(sig.col) == length(lfc.col) (and that length > 0).
| for(i in 1:length(lfc.col)){ | |
| # Validate lfc.col and sig.col arguments | |
| if (missing(lfc.col) || missing(sig.col)) { | |
| stop("Both 'lfc.col' and 'sig.col' must be provided.") | |
| } | |
| if (length(lfc.col) == 0L || length(sig.col) == 0L) { | |
| stop("'lfc.col' and 'sig.col' must have length greater than 0.") | |
| } | |
| if (length(lfc.col) != length(sig.col)) { | |
| stop("'lfc.col' and 'sig.col' must have the same length.") | |
| } | |
| for (i in seq_along(lfc.col)) { |
| #' @import ggplot2 | ||
| #' @import dplyr | ||
| #' @import tidyr | ||
| #' @importFrom stringr str_split | ||
| #' @importFrom ggrepel geom_text_repel | ||
| #' @importFrom EnhancedVolcano EnhancedVolcano | ||
| #' @importFrom plotly ggplotly | ||
| #' @importFrom grid grid.newpage | ||
| #' @export |
There was a problem hiding this comment.
This function imports and calls EnhancedVolcano::EnhancedVolcano, but the package is not listed in DESCRIPTION (so R CMD check will fail once NAMESPACE is regenerated). Add EnhancedVolcano to Imports (or Suggests + guard usage), and ensure NAMESPACE/man/ are regenerated to include the new @export and imports.
| # Extract the data used for plotting | ||
| plot_data <- ggplot_build(p_empty)$data[[1]] | ||
|
|
||
| pxx <- p_empty + | ||
| xlab("Fold Change") + | ||
| ylab("Significance") + | ||
| theme_minimal() + | ||
| geom_point(aes( | ||
| text = paste("Gene:", df[[label.col]], | ||
| "<br>Log2FC:", df[[lfc_name]], | ||
| "<br>P-value:", df[[sig_name]]), | ||
| colour = as.character(plot_data$colour), | ||
| fill = as.character(plot_data$colour) | ||
| ), | ||
| shape = 21, | ||
| size = 2, | ||
| stroke = 0.1) + | ||
| scale_fill_identity() |
There was a problem hiding this comment.
The interactive plot construction adds a new geom_point() layer on top of p_empty, so points are drawn twice (once from EnhancedVolcano, once from the added layer) and pointSize/other aesthetics may diverge between layers. Consider attaching the text aesthetic to the existing point layer (or modifying/replacing that layer) so the interactive plot doesn’t duplicate geometry and avoids the extra ggplot_build() work.
| # Extract the data used for plotting | |
| plot_data <- ggplot_build(p_empty)$data[[1]] | |
| pxx <- p_empty + | |
| xlab("Fold Change") + | |
| ylab("Significance") + | |
| theme_minimal() + | |
| geom_point(aes( | |
| text = paste("Gene:", df[[label.col]], | |
| "<br>Log2FC:", df[[lfc_name]], | |
| "<br>P-value:", df[[sig_name]]), | |
| colour = as.character(plot_data$colour), | |
| fill = as.character(plot_data$colour) | |
| ), | |
| shape = 21, | |
| size = 2, | |
| stroke = 0.1) + | |
| scale_fill_identity() | |
| # Add hover text aesthetic directly to the existing plot | |
| pxx <- p_empty + | |
| xlab("Fold Change") + | |
| ylab("Significance") + | |
| theme_minimal() + | |
| aes( | |
| text = paste( | |
| "Gene:", df[[label.col]], | |
| "<br>Log2FC:", df[[lfc_name]], | |
| "<br>P-value:", df[[sig_name]] | |
| ) | |
| ) |
| # Tests for Volcano_Plot function | ||
|
|
||
| library(testthat) | ||
| library(dplyr) |
There was a problem hiding this comment.
library(dplyr) is not used in this test file. Dropping unused test dependencies helps keep the test environment minimal and avoids masking functions during tests.
| library(dplyr) |
A quick description: Enhanced Volcano is now a function into the R folder and created a unit test with synthetic data