Bioinformatics Guide

Setting Up R & RStudio for Bioinformatics Analysis

A complete walkthrough for installing R, RStudio, and essential Bioconductor packages on macOS, from zero to a fully configured bioinformatics workstation.

📅 March 2026 ⏱ 10 min read 🧬 ChIP-seq & Genomics

Prerequisites

What you need before getting started

This guide is written for macOS (both Apple Silicon M1/M2/M3/M4 and Intel Macs). The same packages work on Linux and Windows, but the installation commands will differ slightly.

Before we begin, make sure you have:

Apple Silicon users: R has shipped native ARM64 builds for macOS since version 4.1, so R 4.5.x runs natively on M-series chips. No Rosetta needed.

Removing Old R & RStudio Installations

Start with a clean slate to avoid version conflicts

If you have previous versions of R or RStudio installed, it's best to remove them completely before installing fresh. This avoids package conflicts and ensures a clean environment.

Remove RStudio

Terminal
# Remove RStudio application
sudo rm -rf /Applications/RStudio.app

Remove R

Terminal
# Remove R framework and binaries
sudo rm -rf /Library/Frameworks/R.framework
sudo rm -f /usr/local/bin/R /usr/local/bin/Rscript

Remove Configuration & Cache Files

Terminal
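# Remove R packages, config, and history
# (this also deletes any personal ~/.Rprofile and ~/.Renviron settings)
rm -rf ~/Library/R
rm -rf ~/.R ~/.RData ~/.Rhistory ~/.Rprofile ~/.Renviron
rm -rf ~/Library/Application\ Support/RStudio
rm -rf ~/Library/Caches/org.R-project.R

# Remove RStudio state and preferences (RStudio 1.4 and later)
rm -rf ~/.local/share/rstudio ~/.config/rstudio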

Verify Clean Removal

Terminal
# Each should report that nothing was found (zsh prints "R not found"; ls prints "No such file or directory")
which R
which Rscript
ls /Applications/RStudio.app

Installing the Latest R

R 4.5.3 "Reassured Reassurer" (March 2026)

Step 1: Download & Install R

Use the official CRAN installer for your Mac architecture. For Apple Silicon (M1/M2/M3/M4):

Terminal โ€” Apple Silicon
# Download R 4.5.3 for ARM64
curl -L -O https://cran.r-project.org/bin/macosx/big-sur-arm64/base/R-4.5.3-arm64.pkg

# Install
sudo installer -pkg R-4.5.3-arm64.pkg -target /

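For Intel Macs, replace arm64 with x86_64 everywhere it appears: in the URL path, in the .pkg file name (e.g., big-sur-x86_64/base/R-4.5.3-x86_64.pkg), and in the installer command.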

Step 2: Verify Installation

Terminal
R --version | head -1
# Expected: R version 4.5.3 (2026-03-11) -- "Reassured Reassurer"
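Once R starts, you can run an equivalent check from inside R. This minimal sketch uses only base R. The 4.5.0 threshold is an assumption that mirrors the version this guide installs. The native Apple Silicon build reports its architecture as aarch64, and the Intel build reports x86_64.

```r
# Base-R check to run in the R console once R is installed.
# Assumption: 4.5.0 is the minimum because this guide installs R 4.5.x.
min_version <- "4.5.0"
r_ok <- getRversion() >= min_version

# R.version$arch is fixed when R is built: "aarch64" for the native
# Apple Silicon build, "x86_64" for the Intel build.
r_arch <- R.version$arch

cat("R version:", as.character(getRversion()),
    if (r_ok) "(meets the 4.5 minimum)" else "(older than this guide assumes)", "\n")
cat("Architecture:", r_arch, "\n")
```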

Installing the Latest RStudio

RStudio 2026.01.1 "Apple Blossom"

Step 1: Download RStudio

Terminal
# Download RStudio Desktop
curl -L -o RStudio.dmg "https://download1.rstudio.org/electron/macos/RStudio-2026.01.1-403.dmg"

# Open the DMG
open RStudio.dmg

Step 2: Install & Launch

Drag RStudio into the Applications folder in the window that opens. Then launch it:

Terminal
# Open RStudio
open /Applications/RStudio.app

# Clean up installer files (run this from the folder you downloaded them into)
rm R-4.5.3-*.pkg RStudio.dmg

Pro tip: If you prefer not to use the terminal, you can download the same RStudio installer from posit.co/downloads in your browser.
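After launching RStudio, confirm that it picked up the freshly installed R and not a leftover copy. This base-R sketch makes two assumptions. First, it assumes RStudio sets the RSTUDIO environment variable to "1" in its R sessions. Second, it assumes the CRAN macOS installer places R under /Library/Frameworks/R.framework, which matches the removal paths used earlier.

```r
# Run in the RStudio console.
# Assumption: RStudio sets RSTUDIO="1" in the R sessions it starts.
in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")

# R.home() is the root of the R installation this session is using;
# for the CRAN macOS installer it lives under /Library/Frameworks/R.framework.
r_home <- R.home()
from_cran_framework <- startsWith(r_home, "/Library/Frameworks/R.framework")

cat("Inside RStudio:", in_rstudio, "\n")
cat("R in use:", R.version.string, "\n")
cat("R home:", r_home, if (from_cran_framework) "(CRAN framework build)" else "", "\n")
```

If "R in use" does not report 4.5.3, check that the old R was fully removed and restart RStudio.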

Installing Bioconductor Packages

Essential packages for ChIP-seq and genomics analysis

Bioconductor is the primary repository for bioinformatics packages in R. First, install BiocManager, then use it to install the packages you need.

Run the following in the RStudio Console:

R Console
# Install BiocManager (the package manager for Bioconductor)
install.packages("BiocManager")

# Install core Bioconductor packages for bioinformatics
BiocManager::install(c(
  "DESeq2",                                # Differential expression analysis
  "DiffBind",                              # Differential binding analysis
  "ChIPseeker",                            # ChIP-seq peak annotation
  "ChIPQC",                                # ChIP-seq quality control
  "clusterProfiler",                       # GO & pathway enrichment
  "GenomicRanges",                         # Genomic interval operations
  "GenomicFeatures",                       # Gene model manipulation
  "rtracklayer",                           # Import/export genomic files
  "org.Hs.eg.db",                          # Human gene annotation
  "TxDb.Hsapiens.UCSC.hg38.knownGene"     # hg38 transcript database
))

This will take 15–30 minutes depending on your internet speed and machine. When prompted with "Update all/some/none?", type a to update all. When asked whether to install from sources the packages that need compilation, type no to use pre-built binaries, which install much faster.
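You may not want to answer these prompts by hand, or you may want to re-run the installation later without reinstalling everything. The helper below is one way to handle both. It is a sketch: install_bioc_quietly() is a hypothetical name, not part of BiocManager. It uses the update and ask arguments of BiocManager::install(), plus base R's install.packages.compile.from.source option to skip the compile-from-source question.

```r
# Sketch: prompt-free, re-runnable Bioconductor install.
# install_bioc_quietly() is a hypothetical helper, not a BiocManager function.

# Never offer to compile packages from source; use pre-built binaries instead
options(install.packages.compile.from.source = "never")

bioc_pkgs <- c(
  "DESeq2", "DiffBind", "ChIPseeker", "ChIPQC", "clusterProfiler",
  "GenomicRanges", "GenomicFeatures", "rtracklayer",
  "org.Hs.eg.db", "TxDb.Hsapiens.UCSC.hg38.knownGene"
)

install_bioc_quietly <- function(pkgs) {
  # Skip anything already installed so re-runs are quick
  to_install <- setdiff(pkgs, rownames(installed.packages()))
  if (length(to_install) > 0) {
    # update = TRUE, ask = FALSE updates outdated packages without the
    # "Update all/some/none?" prompt
    BiocManager::install(to_install, update = TRUE, ask = FALSE)
  }
  invisible(to_install)
}
```

Running install_bioc_quietly(bioc_pkgs) then installs everything in one go. A second call does nothing if no packages are missing. Note that update = TRUE also updates any outdated packages already on your system, which matches answering a at the update prompt.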

What Each Package Does

Package | Source | Purpose
DESeq2 | Bioconductor | Differential expression/binding analysis using negative binomial models
DiffBind | Bioconductor | Differential binding analysis for ChIP-seq peak data
ChIPseeker | Bioconductor | Annotate peaks to nearest genes and genomic features
ChIPQC | Bioconductor | Quality metrics and reporting for ChIP-seq experiments
clusterProfiler | Bioconductor | Gene Ontology (GO) and KEGG pathway enrichment analysis
GenomicRanges | Bioconductor | Represent and manipulate genomic intervals in R
GenomicFeatures | Bioconductor | Work with gene models and transcript annotations
rtracklayer | Bioconductor | Import/export BED, BigWig, GFF, and other genomic formats
org.Hs.eg.db | Bioconductor | Human gene annotation database (Entrez IDs, symbols, GO terms)
TxDb.Hsapiens.UCSC.hg38.knownGene | Bioconductor | Pre-built transcript database for the hg38 human genome

Installing CRAN Packages for Data Analysis

Essential tools for data wrangling and visualization

R Console
# Data analysis & visualization packages from CRAN
install.packages(c(
  "tidyverse",        # Data wrangling and plotting (dplyr, ggplot2, tidyr, etc.)
  "ggplot2",          # Publication-quality plots (also installed by tidyverse)
  "pheatmap",         # Clustered heatmaps
  "RColorBrewer",     # Color palettes for plots
  "ggrepel",          # Non-overlapping text labels in plots
  "VennDiagram",      # Venn diagram visualizations
  "openxlsx"          # Read/write Excel files
))

Package | Source | Purpose
tidyverse | CRAN | Collection of packages for data science (includes ggplot2, dplyr, tidyr)
pheatmap | CRAN | Create clustered heatmaps with dendrograms
RColorBrewer | CRAN | ColorBrewer palettes for scientific visualization
ggrepel | CRAN | Smart label placement in ggplot2 (avoids overlapping text)
VennDiagram | CRAN | Create publication-quality Venn diagrams
openxlsx | CRAN | Read and write Excel files without Java dependency

Verifying Your Setup

Make sure everything is working correctly

Run this verification script in RStudio to confirm all packages load successfully:

R Console
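# Verification script: load all key packages
pkgs <- c("DESeq2", "DiffBind", "ChIPseeker", "clusterProfiler",
          "GenomicRanges", "rtracklayer", "tidyverse", "pheatmap")

# require() returns FALSE instead of stopping, so a failed package is reported
# below even when these lines are pasted into the console one after another
loaded <- vapply(pkgs, require, logical(1), character.only = TRUE)

# Print versions and the overall result
cat("✅ R version:", R.version.string, "\n")
cat("✅ Bioconductor:", as.character(BiocManager::version()), "\n")
if (all(loaded)) {
  cat("✅ All packages loaded successfully!\n")
} else {
  cat("❌ Failed to load:", paste(pkgs[!loaded], collapse = ", "), "\n")
}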

Seeing "masked" warnings? That's completely normal. It just means some packages export functions with the same name. By default, R uses the version from the most recently attached package. You can always call a specific version explicitly, e.g., dplyr::filter() vs stats::filter().
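To see masking for yourself, base R's find() lists every attached package that defines a given name, in search order. A bare call uses the first one listed. The snippet below is a small base-R illustration with stats::filter(), which applies a linear filter (here a 3-point moving average). After library(tidyverse), find("filter") would list "package:dplyr" ahead of "package:stats".

```r
# Which attached packages define `filter`? The first one listed is used
# when you call filter() without a package prefix.
where_filter <- find("filter")
print(where_filter)

# The `::` prefix always picks a specific package's version, regardless of
# load order. With three weights of 1/3, stats::filter() computes a centred
# 3-point moving average, leaving NA at both ends.
smoothed <- stats::filter(1:5, rep(1 / 3, 3))
print(as.numeric(smoothed))

# Every name currently defined in more than one attached package
print(conflicts())
```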

Missing dependency errors? If a package fails to load because of a missing dependency (e.g., mvtnorm for DiffBind), install the missing package with install.packages("mvtnorm") and try again. BiocManager::install() works here too and handles both CRAN and Bioconductor dependencies.
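For a status report that checks every package instead of stopping at the first failure, consider the sketch below. It tries to load each package without attaching it and reports its version, or NA if it fails to load. check_packages() is a hypothetical helper built on base R's requireNamespace() and packageVersion().

```r
# Sketch: report which of this guide's packages load, plus their versions.
# check_packages() is a hypothetical helper built on base R only.
check_packages <- function(pkgs) {
  # requireNamespace() loads a package without attaching it and returns
  # FALSE (instead of stopping) if it or one of its dependencies is missing
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  version <- rep(NA_character_, length(pkgs))
  version[ok] <- vapply(pkgs[ok], function(p) as.character(packageVersion(p)),
                        character(1))
  data.frame(package = pkgs, loads = unname(ok), version = version)
}

guide_pkgs <- c(
  "DESeq2", "DiffBind", "ChIPseeker", "ChIPQC", "clusterProfiler",
  "GenomicRanges", "GenomicFeatures", "rtracklayer", "org.Hs.eg.db",
  "TxDb.Hsapiens.UCSC.hg38.knownGene", "tidyverse", "pheatmap",
  "RColorBrewer", "ggrepel", "VennDiagram", "openxlsx"
)

status <- check_packages(guide_pkgs)
# Rows printed here still need (re)installing
print(status[!status$loads, ])
```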

Recommended Bioinformatics Workflow

How it all fits together for ChIP-seq analysis

A typical ChIP-seq analysis spans both command-line tools and R. Here's how the pieces connect:

Stage | Where | Tools
Quality Control | HPC / Terminal | FastQC, MultiQC
Read Trimming | HPC / Terminal | fastp, Trimmomatic
Alignment | HPC / Terminal | Bowtie2
BAM Processing | HPC / Terminal | samtools, Picard
Peak Calling | HPC / Terminal | MACS3
Visualization | HPC / Terminal | deepTools
Peak Annotation | RStudio | ChIPseeker
Differential Binding | RStudio | DiffBind, DESeq2
Pathway Analysis | RStudio | clusterProfiler
Publication Figures | RStudio | ggplot2, pheatmap

Best practice: Run computationally heavy steps (alignment, peak calling) on your HPC cluster. Download the results (peak files, count matrices) to your local machine and perform downstream analysis in RStudio.
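Here is a concrete example of that hand-off. MACS3 writes its peaks in the narrowPeak format: a tab-separated, BED-style file with ten columns and 0-based start coordinates. The base-R sketch below reads one into a data frame for a first look. read_narrowpeak() is a hypothetical helper, the column names are labels chosen here for the ten standard fields, and the two-peak file is made-up toy data. For annotation, ChIPseeker::readPeakFile() reads the same kind of file directly into a GRanges object.

```r
# Sketch: load a MACS3 narrowPeak file (BED6+4) into a data frame.
# read_narrowpeak() is a hypothetical helper, not part of any package.
read_narrowpeak <- function(path) {
  cols <- c("chrom", "start", "end", "name", "score", "strand",
            "signalValue", "pValue", "qValue", "peak")
  peaks <- read.table(path, sep = "\t", header = FALSE, col.names = cols,
                      colClasses = c("character", "integer", "integer",
                                     "character", "integer", "character",
                                     "numeric", "numeric", "numeric", "integer"),
                      quote = "", comment.char = "")
  # Starts are 0-based and ends exclusive, so end - start is the width in bp
  peaks$width <- peaks$end - peaks$start
  # The "peak" column is the summit offset relative to the start
  peaks$summit <- peaks$start + peaks$peak
  peaks
}

# Toy example: a made-up two-peak file written to a temporary location
tmp <- tempfile(fileext = ".narrowPeak")
writeLines(c(
  "chr1\t1000\t1500\tpeak_1\t120\t.\t5.2\t14.1\t12.0\t240",
  "chr2\t5000\t5300\tpeak_2\t85\t.\t3.9\t10.3\t8.5\t150"
), tmp)
peaks <- read_narrowpeak(tmp)
print(peaks[, c("chrom", "start", "end", "width", "summit")])
```

In a real project, you would point read_narrowpeak() at a *_peaks.narrowPeak file copied from your cluster instead of the temporary toy file.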

That's it: your R environment is now fully configured for bioinformatics analysis.
Happy analysing! 🧬