Welcome to edge-gwas Documentation
EDGE-GWAS (Elastic Data-Driven Encoding GWAS) identifies nonadditive SNP effects using flexible genetic encoding, rather than assuming additive inheritance.
Warning
⚠️ Current version: 0.1.1 (under public testing)
v0.1.1 is the recommended release: it is more stable and provides more functions than earlier releases.
Note
The original EDGE implementation (v0.0.0) is available at https://github.com/nicenzhou/EDGE. Version 0.0.0 is no longer maintained; users are encouraged to migrate to v0.1.0+.
Key Features
Two-stage analysis: Calculate alpha on training data, apply to test data
Flexible encoding: Detects under-recessive, recessive, additive, dominant, and over-dominant effects
Multiple outcomes: Binary and quantitative traits
Multiple genotype formats: PLINK binary (.bed/.bim/.fam), PLINK2 (.pgen/.pvar/.psam), BGEN, and VCF/VCF.GZ files
Visualization: Manhattan, QQ, and alpha distribution plots
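The two-stage idea and the flexible encoding listed above can be illustrated with a small, self-contained sketch. It is not the edge-gwas implementation: the function names `estimate_alpha` and `edge_encode` are hypothetical, the data are simulated, and the sketch assumes a quantitative trait, genotypes coded 0/1/2, and alpha defined as the ratio of the heterozygous effect to the homozygous-alternate effect from a codominant regression.

```python
import numpy as np


def estimate_alpha(genotypes, phenotype):
    """Stage 1 (hypothetical helper): fit y ~ b0 + b_het*[g==1] + b_hom*[g==2]
    by ordinary least squares and return alpha = b_het / b_hom."""
    g = np.asarray(genotypes)
    y = np.asarray(phenotype, dtype=float)
    design = np.column_stack([np.ones(len(g)), g == 1, g == 2]).astype(float)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1] / coef[2]


def edge_encode(genotypes, alpha):
    """Stage 2 (hypothetical helper): recode genotypes 0/1/2 as 0/alpha/1."""
    g = np.asarray(genotypes)
    return np.where(g == 1, alpha, (g == 2).astype(float))


# Simulated training data with a dominant effect: carrying one or two
# alternate alleles shifts the trait mean by the same amount.
rng = np.random.default_rng(0)
g_train = rng.integers(0, 3, size=2000)
y_train = 1.0 * (g_train >= 1) + rng.normal(0.0, 0.5, size=2000)

alpha = estimate_alpha(g_train, y_train)

# Apply the training-set alpha to genotypes from a separate test set.
g_test = np.array([0, 1, 2, 1, 0])
encoded_test = edge_encode(g_test, alpha)
print(f"alpha = {alpha:.2f}")
print(encoded_test)
```

Under this simulated dominant effect the estimated alpha should land near 1, while an additive effect would give a value near 0.5. The recoded test genotypes would then enter an ordinary single-variant association test.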
Quick Start
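from sklearn.model_selection import train_test_split
from edge_gwas import EDGEAnalysis
from edge_gwas.utils import load_plink_data, prepare_phenotype_data
# Load data
geno, info = load_plink_data('data.bed', 'data.bim', 'data.fam')
pheno = prepare_phenotype_data('pheno.txt', 'disease', ['age', 'sex'])
# Split samples into training and test halves. This split is illustrative and
# assumes geno and pheno are pandas DataFrames indexed by sample ID.
samples = list(geno.index.intersection(pheno.index))
train_ids, test_ids = train_test_split(samples, test_size=0.5, random_state=42)
train_geno, test_geno = geno.loc[train_ids], geno.loc[test_ids]
train_pheno, test_pheno = pheno.loc[train_ids], pheno.loc[test_ids]
# Run analysis: alpha is calculated on the training set and applied to the test set
edge = EDGEAnalysis(outcome_type='binary')
alpha_df, gwas_df = edge.run_full_analysis(
    train_geno, train_pheno, test_geno, test_pheno,
    outcome='disease', covariates=['age', 'sex']
)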
Support
Issues: GitHub Issues
Code Questions: jyzhou@stanford.edu
Research Questions: molly.hall@pennmedicine.upenn.edu
Contents:
- Installation Guide
  - System Requirements
  - Dependencies
  - Quick Installation
  - Installation from GitHub
  - Installation from Source
  - Virtual Environment Setup
  - External Tools Installation (NEW in v0.1.1)
  - Verify Installation
  - Supported File Formats
  - Upgrading from v0.1.0
  - Troubleshooting
  - Platform-Specific Notes
  - Development Installation
  - Docker Installation (Optional)
  - Singularity/Apptainer (for HPC)
  - Testing Your Installation
  - Installation Checklist
  - Configuration
  - Uninstallation
  - Getting Help
  - Next Steps
  - See Also
- Quick Start Guide
- User Guide
- API Reference
  - Core Module
    - EDGEAnalysis Class
  - Utilities Module
    - load_plink_data()
    - load_pgen_data()
    - load_vcf_data()
    - load_bgen_data()
    - prepare_phenotype_data()
    - filter_genotype_data()
    - validate_and_align_data()
    - validate_genotype_df()
    - validate_and_fix_encoding()
  - Visualization Module
  - I/O Handlers Module
  - Data Structures
  - Constants and Defaults
  - Exceptions
  - Version Information
  - Command-Line Tools (NEW in v0.1.1)
  - Function Reference Summary
  - Complete Workflow Example
  - Migration Guide (v0.1.0 → v0.1.1)
  - Function Index
  - Quick Function Finder
  - See Also
- Statistical Model
  - Overview
  - Regression Model
  - Encoding Parameter
  - Interpretation of Alpha
  - Two-Stage Analysis
  - Outcome Types
  - Optimization Methods
  - Outcome Transformations
  - Statistical Testing
  - Quality Control Considerations
  - Sample Size Considerations
  - Comparison with Other Methods
  - Practical Considerations
  - Implementation Details
  - Extensions and Future Directions
  - Limitations
  - Validation Studies
  - Mathematical Proofs
  - Glossary of Terms
  - Best Practices Summary
  - See Also
- Example Workflows
  - Example 1: Basic Binary Outcome Analysis with PCA
  - Example 2: Quantitative Trait with Outcome Transformation
  - Example 3: Analysis with GRM for Related Samples
  - Example 4: Multi-Format Data Loading
  - Example 5: Comprehensive Quality Control Pipeline
  - Example 6: Comparing EDGE with Additive GWAS
  - Example 7: Cross-Validation for Model Stability
  - Example 8: Complete Analysis with All v0.1.1 Features
  - Example 9: Multi-Chromosome Genome-Wide Analysis
  - Example 10: Batch Processing for Very Large Datasets
  - Example 11: Using Pre-Calculated Alpha Values
  - Example 12: Detailed Alpha Interpretation and Reporting
  - Tips and Best Practices
  - See Also
- Visualization Guide
- Changelog
- Citation
- Troubleshooting Guide
  - Common Issues and Solutions
    - Error 1: “ValueError: No common samples found between analysis data and GRM”
    - Error 2: “LinAlgError: Matrix is singular”
    - Error 3: “ConvergenceWarning: Maximum iterations reached”
    - Error 4: “ValueError: Log transformation requires all positive values”
    - Error 5: “KeyError: ‘variant_id’ not found”
    - Error 6: “MemoryError: Unable to allocate array”
    - Error 7: “RuntimeWarning: invalid value encountered in divide”
  - Speed Optimization
  - Memory Optimization
  - Accuracy Optimization
  - Setting Random Seeds
  - Version Control
  - Parameter Documentation
  - Appendix A: Troubleshooting Decision Tree
- Frequently Asked Questions (FAQ)
- Advanced Topics for Further Updates
See Also
Documentation:
Documentation Home - Main landing page
Installation Guide - Installation instructions and requirements
Quick Start Guide - Simple examples for getting started
User Guide - Comprehensive usage instructions and tutorials
API Reference - Complete API documentation
Example Workflows - Example analyses and case studies
Visualization Guide - Plotting and visualization guide
Statistical Model - Statistical methods and mathematical background
Troubleshooting Guide - Common issues and their solutions
Frequently Asked Questions (FAQ) - Answers to common questions
Citation - How to cite EDGE in publications
Changelog - Version history and release notes
Advanced Topics for Further Updates - Planned features and roadmap
Last updated: 2025-12-25 for edge-gwas v0.1.1
For questions or issues, visit: https://github.com/nicenzhou/edge-gwas/issues