Releases: broadinstitute/gnomad_methods
v0.8.1
What's Changed
Bug fixes
- Fix annotate_with_ht to only use a semi-join when filter_missing is True by @jkgoodrich in #709
- Fix bug in process_consequences that was introduced when adding support for VEP without polyphen by @jkgoodrich in #710
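The semi-join behaviour mentioned in the first fix can be illustrated without Hail. The sketch below is not the gnomad_methods implementation: annotate_rows, its parameters, and the "annotation" field are hypothetical names used only to show the difference between filtering unmatched rows and keeping them with a missing value.

```python
from typing import Any, Dict, Hashable, List


def annotate_rows(
    rows: List[Dict[str, Any]],
    lookup: Dict[Hashable, Any],
    key: str,
    filter_missing: bool = False,
) -> List[Dict[str, Any]]:
    """Annotate each row with the lookup value for its key.

    Hypothetical helper for illustration only.
    """
    if filter_missing:
        # Semi-join: keep only the rows whose key is present in the lookup.
        rows = [row for row in rows if row[key] in lookup]
    # Left join: rows without a match get a missing (None) annotation.
    return [{**row, "annotation": lookup.get(row[key])} for row in rows]


rows = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
lookup = {"a": 1, "c": 3}

kept_all = annotate_rows(rows, lookup, key="id")
kept_matching = annotate_rows(rows, lookup, key="id", filter_missing=True)
```

With filter_missing left at False, all three rows are returned and the unmatched row carries a None annotation; with filter_missing=True, only the matched rows remain.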
New Features
- Add explode_downsamplings function by @klaricch in #694
- Update VEP csqs in impact categories to match VEP by @mike-w-wilson in #703
- Add get_summary_stats_variant_filter_expr and get_summary_stats_csq_filter_expr to build filtering expressions for summary stats by @jkgoodrich in #701
- Add filter_vep_transcript_csqs_expr, a version of filter_vep_transcript_csqs that takes and returns an ArrayExpression by @jkgoodrich in #713
- Add create_vds function that only supports creating from gvcfs by @mike-w-wilson in #716
- Add functions fill_missing_key_combinations and missing_struct_expr by @jkgoodrich in #718
Other Changes
- Add a space in joint filter info dict by @KoalaQin in #698
- Change the number of values for stat_union_gen_ancs to unknown by @KoalaQin in #699
- Bump idna from 3.4 to 3.7 in /docs by @dependabot in #692
- Bump jinja2 from 3.1.3 to 3.1.4 in /docs by @dependabot in #700
- Bump requests from 2.31.0 to 2.32.2 in /docs by @dependabot in #708
- Update setup.py for v0.8.1 by @mike-w-wilson in #720
Full Changelog: v0.8.0...v0.8.1
v0.8.0
What's Changed
Breaking Changes
- Add mid to FAF and grpmax calcs by @mike-w-wilson in #658
- Update POPS constant to contain a dictionary of both exomes and genomes by @klaricch in #690
Bug fixes
- Account for missingness in int64 to int32 VCF type conversion by @mike-w-wilson in #668
- Fix generic_field_check in validity_checks.py print of failed checks by @jkgoodrich in #693
New Features
- Add RSEM summary function by @jkgoodrich in #647
- Function to get expression proportion by @KoalaQin in #649
- Add GTEx import resources by @KoalaQin in #646
- Add function agg_by_strata, which is a generalized version of the compute_freq_by_strata by @jkgoodrich in #659
- Clean up compute_coverage_stats, change it to use agg_by_strata and have an optional group_membership_ht parameter by @jkgoodrich in #660
- Add densify_all_reference_sites to perform a densify at all sites in a reference HT by @jkgoodrich in #661
- Add compute_stats_per_ref_site to generalize computation of aggregate stats at all sites in a reference Table by @jkgoodrich in #662
- Functions to process, filter, annotate and aggregate variants by transcript expression (get the pext scores per variant) by @KoalaQin in #651
- Add gnomAD all sites allele number resource by @jkgoodrich in #669
- Add read_args parameter to the read functions of Resource Classes by @jkgoodrich in #672
- Add get_is_haploid_expr, get_dp_gq_adj_expr, get_adj_het_ab_expr, and some helpful parameters to agg_by_strata and compute_stats_per_ref_site by @jkgoodrich in #673
- Add sex_karyotype_field as an argument to compute_stats_per_ref_site to include sex ploidy adjustment after densify by @jkgoodrich in #677
- Add function for adding gencode annotation by @klaricch in #681
- Update vcf.py to work on joint freq release Table by @KoalaQin in #688
- Change get_downsampling_freq_indices and downsampling_counts_expr to support both 'pop' and 'gen_anc' keys in metadata by @jkgoodrich in #633
Other Changes
- Suggestions to get_expression_proportion PR by @jkgoodrich in #653
- Suggestions to tx_annotate_mt PR by @jkgoodrich in #654
- Suggestions to tx_annotate_mt by @jkgoodrich in #655
- Rearrange and enforce adj_group and group_membership being on the sam… by @mike-w-wilson in #666
- Bump jinja2 from 3.1.2 to 3.1.3 in /docs by @dependabot in #665
- Add v4 to genome release constants by @klaricch in #671
- Pull ploidy optimization into a function by @mike-w-wilson in #676
- Fix sex ploidy adjustment so XX samples still get set to missing on chrY by @jkgoodrich in #678
- Minor GKS formatting changes and addition of gnomAD flags to annotation by @theferrit32 in #617
- Add option to exclude polyphen from process consequences by @KoalaQin in #685
- Bump black from 23.7.0 to 24.3.0 by @dependabot in #686
- Add Stat Union to the info dict by @KoalaQin in #695
Full Changelog: v0.7.1...v0.8.0
v0.7.1
This release uses Hail 0.2.122
What's Changed
Bug fixes
- Drop async file exists function by @mike-w-wilson in #643
Full Changelog: v0.7.0...v0.7.1
v0.7.0
This release contained a function that required Hail >= 0.2.126. Please use a newer release.
What's Changed
Breaking Changes
- Update some gnomAD resources from lists to version dictionaries by @mike-w-wilson in #522
- Modifications to annotate_freq to improve memory use by @jkgoodrich in #577
Bug fixes
- Add get_slope_int_relationship_expr to get relationship between a pair of samples given slope and intercepts of lines to use as cutoffs. by @jkgoodrich in #511
- Fix access to version's SUBSETS and POPS within repo by @mike-w-wilson in #529
- Small changes to bokeh module imports in utils.plotting that were failing with Hail update by @jkgoodrich in #540
- Fix filter_x_nonpar and filter_y_nonpar to use reference genome by @jkgoodrich in #553
- Fix callstats order in merge_freq_arrays by @jkgoodrich in #574
- Avoid DeprecationWarnings from superseded hail function and import [minor] by @jmarshall in #576
- Fix merge_freq_arrays for cases with more than two arrays by @jkgoodrich in #587
- Fix negative values issue with 'diff' by @KoalaQin in #590
- Fix ValueError for count_arrays in merge_freq_arrays function by @KoalaQin in #591
- Modify apply_rf_model to use vector_to_array from pyspark.ml.functions instead of udf by @matren395 in #592
- Fix to drop 'AS_SB' after converting to 'AS_SB_TABLE' in get_as_info_expr by @jkgoodrich in #602
- Fix to GKS Seqloc new_temp_file by @matren395 in #612
- Move ga4gh imports to their functions by @mike-w-wilson in #626
New Features
- Add generic constraint function annotate_constraint_groupings() by @averywpx in #497
- Add an option for samples that must be kept to compute_related_samples_to_drop by @jkgoodrich in #506
- Add determine_nearest_neighbors to find nearest neighbors for each sample. Modify compute_stratified_metrics_filter to work with a comparison_sample_expr that specifies what samples to compare to for filtering, this works well with the output of determine_nearest_neighbor. by @jkgoodrich in #509
- Add utility function to repartition HTs prior to join by @ch-kr in #512
- Add VEP 105 init script and its docker image by @KoalaQin in #516
- Add VEP 105 GRCh38 context HT resource by @jkgoodrich in #524
- Add additional groupings to optional stratified allele frequencies by @KoalaQin in #523
- Add 'strata' and 'qc_metrics' as globals on the table returned by compute_stratified_metrics_filter by @jkgoodrich in #521
- Modify annotate_mutation_type to take optional context length as a parameter. by @jkgoodrich in #530
- Add generic constraint functions: oe_aggregation_expr(), compute_pli(), oe_confidence_interval(), calculate_raw_z_score(), calculate_raw_z_score_sd() by @averywpx in #505
- Add dbSNP b156 to resources for v4 by @KoalaQin in #525
- Add pab_max_expr function and modify default_compute_info to add 'AS_pab_max' annotation by @jkgoodrich in #531
- Add generic constraint functions: get_downsamplings(), remove_coverage_outliers(), and filter_for_mu() by @averywpx in #507
- Add ac_filter_groups to default_compute_info allowing additional allele count groupings by @jkgoodrich in #534
- Add global annotations for 'vep_version', 'vep_help', and 'vep_config' to the returned Table in vep_or_lookup_vep by @jkgoodrich in #536
- Add annotate_allele_info function to utils.annotations by @jkgoodrich in #535
- Add validity check code of VEP annotations in protein-coding genes by @KoalaQin in #548
- Merge freq array function and new frequency dictionary builder by @mike-w-wilson in #551
- Add GRCh38 methylation sites resource by @jkgoodrich in #552
- Modify comparison_sample_expr parameter of compute_stratified_metrics_filter to also accept a BooleanExpression by @jkgoodrich in #557
- Add parameters apply_model_func and convert_model_func to assign_population_pcs so it has the ability to work with other models types by @jkgoodrich in #558
- Add sample_list_stratification option to create_fake_pedigree function by @jkgoodrich in #564
- Modify default_compute_info with the option to use the AS_ annotations in gvcf_info for allele specific aggregations by @jkgoodrich in #560
- Modify annotate_adj to support LGT and LAD by @jkgoodrich in #567
- Function to annotate downsamplings onto HT/MT by @mike-w-wilson in #570
- Add function to merge histograms with the same bin_edges by @mike-w-wilson in #572
- Add option to also merge an array of counts/ints in the freq array merge by @mike-w-wilson in #565
- Update annotate_freq and qual_hists, add split_vds and compute_freq_by_strata by @mike-w-wilson in #571
- Add function update_structured_annotations to update structured annotations on a Table by @KoalaQin in #580
- Make naive_coalesce optional in default_compute_info by @jkgoodrich in #584
- Add function to remove items from freq and freq_meta by @KoalaQin in #582
- Add a select_fields option to compute_freq_by_strata by @jkgoodrich in #595
- Modify split_info_annotation to allow for splitting an info expression that doesn't include AS_SB_TABLE by @jkgoodrich in #594
- Update to allow for grouping and filtering by MANE transcripts by @klaricch in #605
- Add gnomad_gks() and get_gks() for extracting gks information for a specified variant by @matren395 in #596
- Add aggregations to variant QC evaluation for additional plots by @jkgoodrich in #609
- Add function to get max FAF from faf_expr by @KoalaQin in #608
- Add optional stratification parameter to coverage by @jkgoodrich in #615
- Add methylation resource for chrX by @klaricch in #622
- Add pop_label option to pop_max_expr, faf_expr, and gen_anc_faf_max_expr by @jkgoodrich in #623
- Add apply_keep_to_only_items_in_filter option to filter_arrays_by_meta by @jkgoodrich in #624
- Add pprint globals and a global/row length comparison, updates monoallelic expr in validity checks by @mike-w-wilson in #630
- Add MANE Select filtering option to get_summary_counts by @jkgoodrich in #634
- Add optional parameters to set_female_y_metrics_to_na_expr to use other frequency fields by @jkgoodrich in #635
- Update resource paths by @klaricch in #642
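One of the features above (#572) merges histograms that share the same bin_edges. As a rough plain-Python illustration of that idea, and not the library's implementation, the sketch below sums per-bin counts across histograms and refuses to merge when the edges differ. The function name merge_histograms and the dictionary layout (bin_edges, bin_freq, n_smaller, n_larger) are assumptions made for this example.

```python
from typing import Any, Dict, Sequence


def merge_histograms(hists: Sequence[Dict[str, Any]]) -> Dict[str, Any]:
    """Sum histograms that share identical bin edges.

    Hypothetical helper for illustration only.
    """
    if not hists:
        raise ValueError("At least one histogram is required")
    edges = hists[0]["bin_edges"]
    # Counts are only comparable bin by bin when the edges match exactly.
    if any(h["bin_edges"] != edges for h in hists):
        raise ValueError("All histograms must share the same bin_edges")
    return {
        "bin_edges": list(edges),
        # Element-wise sum of the per-bin counts.
        "bin_freq": [sum(col) for col in zip(*(h["bin_freq"] for h in hists))],
        "n_smaller": sum(h["n_smaller"] for h in hists),
        "n_larger": sum(h["n_larger"] for h in hists),
    }


hist_a = {"bin_edges": [0, 10, 20], "bin_freq": [2, 5], "n_smaller": 0, "n_larger": 1}
hist_b = {"bin_edges": [0, 10, 20], "bin_freq": [1, 4], "n_smaller": 2, "n_larger": 0}
merged = merge_histograms([hist_a, hist_b])
```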
Other Changes
- Update doc requirements.doc.txt by @jkgoodrich in #520
- Bump requests from 2.28.2 to 2.31.0 in /docs by @dependabot in #543
- Add VEP 105 CSQ FIELDs by @KoalaQin in #546
- Update python 3.8 -> 3.11 by @jkgoodrich in #578
- Add ability ...
v0.6.4
What's Changed
This release uses Hail 0.2.105
Bug fixes
- Fix assign_population_pcs error when parameter pc_cols is a Hail ArrayExpression by @jkgoodrich in #503
Other Changes
- Modifying assign_population_pcs to be more flexible by accepting an array expression in 'pc_cols' and adding a 'pc_expr' parameter instead of always using 'scores' by @jkgoodrich in #500
- add .he to file extensions list in file_exists() by @averywpx in #501
- add generic constraint functions: build_models(), build_plateau_models_pop(), build_plateau_models_total(), build_coverage_model(), get_all_pop_lengths() by @averywpx in #485
Full Changelog: v0.6.3...v0.6.4
v0.6.3
What's Changed
This release uses Hail 0.2.104
Breaking Changes
- Change type of "pc_cols" param in ancestry function from hl.expr.ArrayExpression to List[int] to help track PCs that were used in RF model by @klaricch in #448
- Add additional_samples_to_drop option to run_pca_with_relateds by @klaricch in #489
Bug fixes
- Fix to only add the error_rate annotation if fit is not supplied to assign_population_pcs by @klaricch in #453
- Modify merge_sample_qc_expr to work with the additional VDS sample QC metrics: n_singleton_ti, n_singleton_tv, and r_ti_tv_singleton by @jkgoodrich in #454
- Fix vep_or_lookup_vep to drop vep_proc_id if it exists by @konradjk in #439
- Fix to paths for VEP 101 resources in init script by @jkgoodrich in #488
- Changed tqdm to SimpleRichProgressBar in file_utils by @ch-kr in #495
New Features
- Add an n_pcs option to run_platform_pca by @jkgoodrich in #468
- Add n_partitions option to get_qc_mt before LD pruning by @klaricch in #472
- Add block_size option to get_qc_mt for LD pruning by @klaricch in #473
- Add gaussian_mixture_model_karyotype_assignment function to assign sex karyotype using Gaussian mixture models by @jkgoodrich in #478
- Add variants_filter_lcr, variants_filter_segdup and variants_snv_only options to annotate_sex to filter variants prior to variant only ploidy imputation by @jkgoodrich in #479
- Add an option compute_x_frac_variants_hom_alt to annotate_sex that computes the fraction of variants on chromosome X that are homozygous alternate per sample by @jkgoodrich in #480
- Add generic constraint functions - annotate_mutation_type(), trimer_from_heptamer(), collapse_strand(), add_most_severe_csq_to_tc_within_vep_root() by @averywpx in #474
- Add more file types to file_exists for checking '_SUCCESS' by @jkgoodrich in #486
- Add coverage_mt option to annotate_sex which takes an optional precomputed coverage MT to use for ploidy imputation instead of remaking it. by @jkgoodrich in #484
- Add function get_chr_x_hom_alt_cutoffs, add arguments to infer_sex_karyotype and get_sex_expr to use the new function and its output. by @jkgoodrich in #492
- Add bi_allelic_only and snv_only options to get_qc_mt by @jkgoodrich in #471
- Add generic constraint functions: annotate_with_mu(), count_variants(), downsampling_counts_expr(), filter_vep_transcript_csqs(), combine_functions(), filter_x_nonpar(), and filter_y_nonpar() by @averywpx in #481
Other Changes
- Handle tags created through GitHub in publish release workflow by @nawatts in #451
- Change branch name in CI workflow configuration by @nawatts in #452
Full Changelog: v0.6.2...v0.6.3
v0.6.2
What's Changed
New Features
- Use Google Cloud Public Datasets as default source for public resources by @nawatts in #431
- Add options for reading public resources from Registry of Open Data on AWS and Azure Open Datasets by @nawatts in #430
- Allow setting the default source for public resources with an environment variable by @nawatts in #435
- Use hl.utils.guess_cloud_spark_provider to set default resources source by @nawatts in #436
- add checkpoint option to get_qc_mt by @klaricch in #437
- Modification to the annotate_sex pipeline to allow sex ploidy estimation using only variants instead of ref blocks by @jkgoodrich in #445
Other Changes
- Document selecting resource source by @nawatts in #408
- Add VEP 101 init by @jkgoodrich in #411
- Small fix to docstrings for make_freq_index_dict() by @gtiao in #412
- Tiny fix to assign_population_pcs use of known label by @jkgoodrich in #413
- Added option to get file stats for requester-pays files by @ch-kr in #414
- fix to faf description text by @jkgoodrich in #415
- Update current gnomAD GRCh38 genome release v3.1.2 by @jkgoodrich in #416
- Update to new RouterAsyncFS interface in Hail 0.2.79 by @nawatts in #425
- add vds resource by @klaricch in #423
- Modified subset_samples_and_variants() by @wlu04 in #421
- Modified compute_stratified_sample_qc() by @wlu04 in #420
- Modified annotate_sex() by @wlu04 in #427
Full Changelog: v0.6.0...v0.6.2
v0.6.1
v0.6.0
Released September 3rd, 2021
All resources have been moved to a requester pays bucket.
Fixed
- Fix annotation_type_is_numeric and annotation_type_in_vcf_info (#379)
Changed
- VersionedResource objects are no longer subclasses of BaseResource (#359)
- gnomAD resources can now be imported from different sources (#373)
- Replaced ht_to_vcf_mt with adjust_vcf_incompatible_types which maintains all functionality except turning the ht into a mt because it is no longer needed for use of the Hail module export_vcf (#365)
- Modified SEXES in utils/vcf to be 'XX' and 'XY' instead of 'female' and 'male' (#381)
- Changed module sanity_checks to validity_checks, modified functions generic_field_check, make_filters_expr_dict (previously make_filters_sanity_check_expr), and make_group_sum_expr_dict (previously sample_sum_check) (#395)
Added
- Added function region_flag_expr to flag problematic regions (#349)
- Added function missing_callstats_expr to create a Hail Struct with missing values that is inserted into frequency annotation arrays when data is missing (#349)
- Added function set_female_y_metrics_to_na_expr to set Y-variant frequency callstats for female-specific metrics to missing (#349)
- Added function make_faf_index_dict to create a look-up Dictionary for entries contained in the filter allele frequency annotation array (#349)
- Added function make_freq_index_dict to create a look-up Dictionary for entries contained in the frequency annotation array (#349)
- Added function remove_fields_from_constant to remove fields from a list and notify which requested fields to remove were missing (#381)
- Added function create_label_groups to generate a list of label group dictionaries needed to populate the info dictionary for vcf export (#381)
- Added function build_vcf_export_reference to create a subset reference based on an existing reference genome (#381)
- Added function rekey_new_reference to re-key a Table or MatrixTable with a new reference genome (#381)
- Added function parallel_file_exists to check whether a large number of files exist (#394)
- Added functions summarize_variant_filters, generic_field_check_loop, compare_subset_freqs, sum_group_callstats, summarize_variants, check_raw_and_adj_callstats, check_sex_chr_metrics, compute_missingness, vcf_field_check, and validate_release_t (#395)
v0.5.0
Released April 22nd, 2021
Fixed
- Fix for error in generate_trio_stats_expr that led to an incorrect untransmitted count. (#238)
- Fix for error in compute_quantile_bin that caused incorrect binning when a single score overlapped multiple bins (#238)
- Fixed create_binned_ht because it produced a "Cannot combine expressions from different source objects error" (#238)
- Fixed handling of missing entries (not within a ref block / alt site) when computing coverage_stats in sparse_mt.py [#242]
- Fix for error in compute_stratified_sample_qc where gt_expr caused error (#259)
- Fix for error in default_lift_data caused by missing results field in new_locus (#270)
- Fix to dbSNP b154 resource (resources.grch38.reference_data) import to allow for multiple rsIDs per variant (#345)
- Fix to set_female_metrics_to_na to correctly update chrY metrics to be missing (#347)
- Fixed available versions for gnomAD v2 coverage and liftover resources (#352)
- Removed side effect of accessing gnomAD v2 coverage and liftover exome resources that would edit available versions for other resources (#352)
- Use overwrite argument for importing a BlockMatrixResource (#342)
Changed
- Removed assumption of snv annotation from compute_quantile_bin. (#238)
- Modified compute_binned_truth_sample_concordance to handle additional binning for subsets of variants. (#240)
- Updated liftover functions to be more generic (#246)
- Changed quality histograms to label histograms calculated on raw and not adj data (#247)
- Updated some VCF export constants (#249)
- Changed default DP threshold to 5 for hemi genotype calls in annotate_adj and get_adj_expr (#252)
- Updated coverage resources to version 3.0.1 [#242]
- Update to compute_last_ref_block_end, removing assumption that sparse MatrixTables are keyed only by locus by default (#279)
- Update generic_field_check to have option to show percentage of sites that fail checks. (#284)
- Modified vep_or_lookup_vep to support the use of different VEP versions (#282)
- Modified create_truth_sample_ht to add adj annotation information in the returned Table if present in the supplied MatrixTables (#300)
Added
- Added constants and functions relevant to VCF export (#241)
- Add reference genome to call of has_liftover in get_liftover_genome (#259)
- Added fix for MQ calculation in _get_info_agg_expr, switched RAW_MQ and MQ_DP in calculation (#262)
- Add importable method for filtering clinvar to pathogenic sites (#257)
- Added common variant QC functions get_rf_runs and get_run_data to random_forest.py (#278)
- Add calculation for the strand odds ratio (SOR) to get_site_info_expr and get_as_info_expr (#281)
- Added VEPed context HT to resource files and included support for versioning (#282)
- Added code to generate summary statistics (total number of variants, number of LoF variants, LOFTEE summaries) (#285)
- Added additional counts to summary statistics (added autosome/sex chromosome counts, allele counts, counts for missense and synonymous variants) (#289)
- Added function, default_generate_gene_lof_matrix, to generate gene matrix (#290)
- Added function default_generate_gene_lof_summary to summarize gene matrix results (#292)
- Add resource for v3.1.1 release (#364)
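The MQ entry above (#262) concerns the order of RAW_MQ and MQ_DP in the calculation. As a minimal sketch of why the order matters, assuming the GATK-style convention in which RAW_MQ is the sum of squared mapping qualities and MQ_DP is the number of reads contributing to it, the root-mean-square mapping quality can be written as below. The function name mapping_quality is hypothetical and this is not the code in _get_info_agg_expr.

```python
import math
from typing import Optional


def mapping_quality(raw_mq: float, mq_dp: int) -> Optional[float]:
    """Return the RMS mapping quality, or None when no reads contribute.

    Assumes raw_mq is a sum of squared mapping qualities and mq_dp is the
    read depth behind that sum (hypothetical helper for illustration).
    """
    if mq_dp <= 0:
        return None
    # Numerator and denominator are not interchangeable:
    # sqrt(raw_mq / mq_dp) is the RMS, sqrt(mq_dp / raw_mq) is not.
    return math.sqrt(raw_mq / mq_dp)


# Three reads, each with mapping quality 60.
mq = mapping_quality(raw_mq=3 * 60.0**2, mq_dp=3)
```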
Removed
- Removed rep_on_read; this function is no longer necessary, as MatrixTables/Tables can be repartitioned on read with _n_partitions added by this hail update (#283)
- Removed compute_quantile_bin and added compute_ranked_bin as an alternative that provides more even binning. This is now used by create_binned_ht instead. (#288)
- Removed prefix parameter from make_combo_header_text, as this was only used to check if samples were from gnomAD (#348)