Changelog
All notable changes to edge-gwas will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Version 0.1.1 (2025-12-25) - Current Release
Status: ⚠️ Public Testing Phase
Added
Population Structure Control:
load_grm_gcta() - Load GRM calculated by GCTA
calculate_grm_gcta() - Calculate GRM using GCTA
GRM integration in calculate_alpha() and apply_alpha() for mixed model analysis
identify_related_samples() - Identify related sample pairs from GRM
filter_related_samples() - Remove related samples
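As an illustration of what GRM-based relatedness filtering involves, the sketch below flags sample pairs whose off-diagonal GRM entry exceeds a cutoff and then greedily drops one member of each pair. It is not the edge-gwas implementation: the function names, the 0.05 cutoff, and the rule for choosing which sample to drop are all assumptions made for the example.

```python
from itertools import combinations


def find_related_pairs(grm, sample_ids, threshold=0.05):
    """Return (id_i, id_j, relatedness) for off-diagonal GRM entries above threshold.

    Hypothetical helper; the threshold of 0.05 is an illustrative assumption.
    """
    pairs = []
    for i, j in combinations(range(len(sample_ids)), 2):
        if grm[i][j] > threshold:
            pairs.append((sample_ids[i], sample_ids[j], grm[i][j]))
    return pairs


def drop_related(sample_ids, pairs):
    """Greedily remove the second member of each pair whose members are both still kept."""
    removed = set()
    for first, second, _ in pairs:
        if first not in removed and second not in removed:
            removed.add(second)
    return [s for s in sample_ids if s not in removed]


# Toy 3x3 GRM: S1 and S2 look like close relatives, S3 is unrelated to both.
grm = [
    [1.00, 0.48, 0.01],
    [0.48, 1.02, 0.00],
    [0.01, 0.00, 0.99],
]
ids = ["S1", "S2", "S3"]
related_pairs = find_related_pairs(grm, ids, threshold=0.05)
unrelated_ids = drop_related(ids, related_pairs)
```

A greedy rule is the simplest choice; it does not guarantee the largest possible unrelated subset.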
Principal Component Analysis:
calculate_pca_plink() - PCA using PLINK2 with LD pruning
calculate_pca_pcair() - PC-AiR for related samples
calculate_pca_sklearn() - Basic PCA using scikit-learn
attach_pcs_to_phenotype() - Merge PCs with phenotype data
get_pc_covariate_list() - Helper for generating PC covariate names
Outcome Transformations:
outcome_transform parameter in EDGEAnalysis for continuous outcomes
Support for 'log', 'log10', 'inverse_normal', and 'rank_inverse_normal' transformations
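For readers unfamiliar with rank-based inverse normal transformation, here is a minimal standard-library sketch. It assumes the common Blom offset (c = 3/8) and average ranks for ties; the changelog does not say which offset or tie rule edge-gwas uses, so treat both as assumptions. The function name is hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Map values to standard-normal quantiles via their ranks.

    Assumes the Blom offset c = 3/8; tied values share their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2.0 + 1.0  # 1-based average rank of the run
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


# A skewed trait with one extreme value; the outlier is pulled in to a normal quantile.
transformed = rank_inverse_normal([2.3, 150.0, 4.1, 3.7, 9.9])
```

The transform depends only on ranks, so the extreme value 150.0 ends up no further from the centre than the smallest value is on the other side.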
Additional Data Formats:
load_pgen_data() - Load PLINK2 .pgen/.pvar/.psam files
load_bgen_data() - Load BGEN format with dosages
load_vcf_data() - Load VCF/VCF.gz files
Tool Installation:
edge-gwas-install-tools - Interactive installer for PLINK2, GCTA, and R packages
edge-gwas-check-tools - Verify external tool installations
Support for Linux and macOS (including ARM64)
Quality Control:
filter_samples_by_call_rate() - Filter samples by genotype call rate
calculate_hwe_pvalues() - Calculate Hardy-Weinberg Equilibrium p-values
filter_variants_by_hwe() - Filter variants by HWE p-value
check_case_control_balance() - Check case/control ratio
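To show the kind of calculation behind an HWE filter, the sketch below computes a Hardy-Weinberg p-value from genotype counts with a 1-df chi-square goodness-of-fit test. This is one common approach; whether edge-gwas uses a chi-square or an exact test is not stated here, and the function name is hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return float("nan")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic variant: no departure can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


p_in_equilibrium = hwe_chi2_pvalue(25, 50, 25)   # counts match HWE exactly
p_no_heterozygotes = hwe_chi2_pvalue(50, 0, 50)  # strong heterozygote deficit
```

Variants whose p-value falls below a chosen cutoff would then be excluded; the cutoff itself is a study-level decision.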
Analysis Methods:
cross_validated_edge_analysis() - K-fold cross-validation for EDGE
additive_gwas() - Standard additive model for comparison
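The K-fold idea can be sketched independently of the package: samples are shuffled once, split into k folds, and each fold serves as the held-out set exactly once. How cross_validated_edge_analysis() uses the folds internally is not described here, so the helper below (a hypothetical name) shows only the index splitting.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment to folds
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        yield train, test


splits = list(kfold_indices(10, k=5, seed=0))
```

In a two-stage design, each training part would presumably play the role of the alpha-estimation set and the held-out part the role of the test set, but that mapping is an assumption.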
Changed
Data Handling:
Breaking Change: Replaced Koalas DataFrames with pandas DataFrames throughout codebase
Updated load_participant_data() to use pandas instead of participant.retrieve_fields().to_koalas()
All PCA functions now return DataFrames with 'IID' column and IID as index
Consistent sample ID handling (all IDs converted to strings for matching)
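Why string conversion matters for matching can be seen in a small standalone example: an integer ID parsed from one file never equals the same ID read as text from another. The helper name and the sample values below are hypothetical.

```python
def align_sample_ids(genotype_ids, grm_ids):
    """Return IDs present in both inputs, as strings, in genotype order."""
    grm_lookup = {str(sample_id).strip() for sample_id in grm_ids}
    return [str(s).strip() for s in genotype_ids if str(s).strip() in grm_lookup]


genotype_ids = [1001, 1002, 1003]    # e.g. parsed as integers from a table
grm_ids = ["1003", "1001", "2001"]   # e.g. read as text from an ID file

naive_overlap = set(genotype_ids) & set(grm_ids)     # empty: 1001 != "1001"
shared_ids = align_sample_ids(genotype_ids, grm_ids)  # matches after str()
```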
Core Analysis:
EDGEAnalysis class now supports GRM as optional input in all methods
calculate_alpha() and apply_alpha() accept grm_matrix and grm_sample_ids parameters
Linear mixed model implementation for continuous outcomes with GRM
Logistic mixed model for binary outcomes with GRM
Dependencies:
Removed dependency on databricks.koalas
All operations now use native pandas (>= 1.2.0)
Added optional dependencies: pgenlib, bgen-reader, cyvcf2
External tool requirements: PLINK2, GCTA, R (all optional)
Migration from v0.1.0
Breaking Changes:
Code using .to_koalas() must be updated to use pandas DataFrames
Remove any import databricks.koalas statements
PCA functions now return DataFrames indexed by IID
Migration Example:
# Old code (v0.1.0)
import databricks.koalas as ks
participant_df = participant.retrieve_fields(fields=fields, engine=dxdata.connect()).to_koalas()
# New code (v0.1.1)
import pandas as pd
participant_df = participant.retrieve_fields(fields=fields, engine=dxdata.connect())
# New features in v0.1.1
from edge_gwas.utils import calculate_grm_gcta, load_grm_gcta, calculate_pca_plink
# Calculate GRM for population structure control
grm_prefix = calculate_grm_gcta('genotypes', maf_threshold=0.01)
grm_matrix, grm_ids = load_grm_gcta(grm_prefix)
# Calculate PCA
pca_df = calculate_pca_plink('genotypes', n_pcs=10)
# Use outcome transformation and GRM in analysis
edge = EDGEAnalysis(outcome_type='continuous', outcome_transform='rank_inverse_normal')
alpha_df, gwas_df = edge.run_full_analysis(
    train_g, train_p, test_g, test_p,
    outcome='trait', covariates=['age', 'sex', 'PC1', 'PC2', 'PC3'],
    grm_matrix=grm_matrix, grm_sample_ids=grm_ids
)
Bug Fixes
Resolved compatibility issues with recent pandas versions
Improved memory efficiency in data loading operations
Fixed GRM sample alignment when samples differ between genotype and GRM
Fixed ID type inconsistencies (all IDs now converted to strings)
Improved convergence in mixed model fitting
Version 0.1.0 (2025-12-24)
Status: Superseded by v0.1.1
This is the first packaged release of EDGE-GWAS, transitioning from standalone scripts (v0.0.0) to a proper Python package.
Added
Core Functionality:
EDGEAnalysis class for two-stage GWAS analysis
calculate_alpha() - Calculate encoding parameters from training data
apply_alpha() - Apply encoding to test data for GWAS
run_full_analysis() - Complete end-to-end workflow
Support for binary outcomes (logistic regression)
Support for quantitative outcomes (linear regression)
Data Handling:
load_plink_data() - Load PLINK binary format (.bed/.bim/.fam)
prepare_phenotype_data() - Load and prepare phenotype files
stratified_train_test_split() - Train/test data splitting with stratification
filter_variants_by_maf() - Minor allele frequency filtering
filter_variants_by_missing() - Missingness filtering
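A minor allele frequency filter reduces to a short calculation on 0/1/2 genotype codes. The sketch below is a standalone illustration with hypothetical function and parameter names, not the signature of filter_variants_by_maf(); missing calls are represented as None and excluded from the denominator.

```python
def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 alternate-allele counts; None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return float("nan")
    alt_freq = sum(called) / (2 * len(called))  # two alleles per called sample
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(variants, min_maf=0.01):
    """Keep variant IDs whose MAF is at least min_maf (NaN never passes)."""
    return [vid for vid, g in variants.items() if minor_allele_frequency(g) >= min_maf]


variants = {
    "rs1": [0, 1, 2, 1, None],  # alt frequency 4/8 = 0.5
    "rs2": [0, 0, 0, 0, 0],     # monomorphic, MAF 0
    "rs3": [2, 2, 2, 1, 2],     # alt frequency 0.9, so MAF is about 0.1
}
common_variants = filter_by_maf(variants, min_maf=0.05)
```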
Statistical Functions:
calculate_genomic_inflation() - Calculate genomic inflation factor (λ)
Two-stage analysis framework (calculate alpha → apply alpha)
Parallel processing support with configurable CPU cores
Convergence monitoring and skipped SNP tracking
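For reference, the genomic inflation factor λ is conventionally the median of the observed 1-df chi-square statistics divided by the median of the null chi-square distribution (about 0.4549). The standard-library sketch below follows that textbook definition from a list of p-values; it is not taken from calculate_genomic_inflation(), and the function name is hypothetical.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364231195724  # median of a chi-square with 1 df


def genomic_inflation(p_values):
    """Lambda = median observed 1-df chi-square / expected median under the null."""
    std_normal = NormalDist()
    # Two-sided p-value -> chi-square: z = Phi^{-1}(p / 2), chi2 = z ** 2.
    # p = 0 is skipped because its quantile is infinite.
    chi2 = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values if 0.0 < p <= 1.0]
    return median(chi2) / CHI2_1DF_MEDIAN


lambda_null = genomic_inflation([i / 10 for i in range(1, 10)])  # evenly spread p-values
lambda_inflated = genomic_inflation([0.01, 0.02, 0.03])          # excess of small p-values
```

Values near 1 suggest well-calibrated test statistics; values well above 1 point to inflation, for example from uncorrected population structure.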
Visualization:
manhattan_plot() - Create Manhattan plots with customization
qq_plot() - Create QQ plots with lambda calculation
plot_alpha_distribution() - Visualize alpha value distribution
Documentation:
Complete Sphinx documentation with Read the Docs integration
Installation guide, quick start guide, and API reference
Dependencies
numpy >= 1.19.0
pandas >= 1.2.0
scipy >= 1.6.0
statsmodels >= 0.12.0
scikit-learn >= 0.24.0
matplotlib >= 3.3.0
pandas-plink >= 2.0.0
databricks-koalas (removed in v0.1.1)
—
Version 0.0.0 (2024-04-02) - Legacy
Status: 🔒 Deprecated - No Maintenance
Original EDGE implementation as standalone Python scripts.
Warning
Version 0.0.0 is deprecated and no longer maintained. Users should migrate to v0.1.1 or later.
—
Version History Summary
| Version | Date | Status | Key Features |
|---|---|---|---|
| 0.1.1 | 2025-12-25 | Current | GRM support, PCA methods, outcome transformations, pandas migration |
| 0.1.0 | 2025-12-24 | Superseded | First packaged release, core EDGE functionality |
| 0.0.0 | 2024-04-02 | Deprecated | Original standalone scripts |
—
See Also
Documentation:
Documentation Home - Home
Installation Guide - Installation instructions and requirements
Quick Start Guide - Getting started guide with simple examples
User Guide - Comprehensive user guide and tutorials
API Reference - Complete API documentation
Example Workflows - Example analyses and case studies
Visualization Guide - Plotting and visualization guide
Statistical Model - Statistical methods and mathematical background
Troubleshooting Guide - Troubleshooting guide and common issues
Frequently Asked Questions (FAQ) - Frequently asked questions
Citation - How to cite EDGE in publications
Changelog - Version history and release notes
Advanced Topics for Further Updates - Planned features and roadmap
—
Last updated: 2025-12-25 for edge-gwas v0.1.1
For questions or issues, visit: https://github.com/nicenzhou/edge-gwas/issues