Releases: broadinstitute/gnomad_methods

v0.8.1

29 Jul 17:52
6411b7e

What's Changed

Bug fixes

  • Fix annotate_with_ht to only use a semi-join when filter_missing is True by @jkgoodrich in #709
  • Fix bug in process_consequences that was introduced when adding support for VEP without polyphen by @jkgoodrich in #710
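
The join behavior touched by #709 can be illustrated without Hail. The sketch below uses plain dictionaries as stand-ins for keyed tables; the function name annotate_rows and its arguments are hypothetical and are not the library's API. It shows why a semi-join (keeping only rows whose key exists in the annotation table) is only appropriate when rows with missing annotations are meant to be dropped.

```python
from typing import Any, Dict, Optional


def annotate_rows(
    rows: Dict[str, Dict[str, Any]],
    annotations: Dict[str, Dict[str, Any]],
    filter_missing: bool = False,
) -> Dict[str, Dict[str, Optional[Any]]]:
    """Annotate keyed rows from a second keyed table (illustrative only)."""
    fields = sorted({f for ann in annotations.values() for f in ann})
    if filter_missing:
        # Semi-join: keep only rows whose key is present in `annotations`.
        rows = {k: v for k, v in rows.items() if k in annotations}
    out = {}
    for key, row in rows.items():
        ann = annotations.get(key, {})
        # Left join: unmatched rows are kept with missing (None) annotations.
        out[key] = {**row, **{f: ann.get(f) for f in fields}}
    return out


rows = {"1:100:A:T": {"ac": 3}, "1:200:G:C": {"ac": 1}}
annotations = {"1:100:A:T": {"vep": "missense_variant"}}

kept_all = annotate_rows(rows, annotations)
kept_matched = annotate_rows(rows, annotations, filter_missing=True)
```

With filter_missing left at False, both rows survive and the unmatched one carries a missing annotation; with filter_missing=True, only the matched row remains.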

New Features

  • Add explode_downsamplings function by @klaricch in #694
  • Update VEP csqs in impact categories to match VEP by @mike-w-wilson in #703
  • Add get_summary_stats_variant_filter_expr and get_summary_stats_csq_filter_expr to build filtering expressions for summary stats by @jkgoodrich in #701
  • Add filter_vep_transcript_csqs_expr, a version of filter_vep_transcript_csqs that takes and returns an ArrayExpression by @jkgoodrich in #713
  • Add create_vds function that only supports creating from gvcfs by @mike-w-wilson in #716
  • Add functions fill_missing_key_combinations and missing_struct_expr by @jkgoodrich in #718
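
As a rough illustration of what filling missing key combinations means, the sketch below works on a plain dictionary keyed by tuples. The function name fill_missing_combinations, its parameters, and the placeholder row are hypothetical; the real fill_missing_key_combinations and missing_struct_expr operate on Hail Tables and their signatures are not shown in these notes.

```python
from itertools import product
from typing import Any, Dict, Tuple


def fill_missing_combinations(
    table: Dict[Tuple[Any, ...], Dict[str, Any]],
    missing_row: Dict[str, Any],
) -> Dict[Tuple[Any, ...], Dict[str, Any]]:
    """Add a placeholder row for every key combination absent from `table`."""
    n_keys = len(next(iter(table)))
    # Collect the distinct values observed for each key field.
    values = [sorted({key[i] for key in table}) for i in range(n_keys)]
    # Walk the full cross product; reuse existing rows, fill the gaps.
    return {
        combo: table.get(combo, dict(missing_row)) for combo in product(*values)
    }


counts = {
    ("gene1", "lof"): {"n": 4},
    ("gene1", "missense"): {"n": 9},
    ("gene2", "lof"): {"n": 2},
}
filled = fill_missing_combinations(counts, missing_row={"n": None})
```

Here the combination ("gene2", "missense") is absent from the input and is added with a missing value, so downstream code can assume every combination has a row.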

Full Changelog: v0.8.0...v0.8.1

v0.8.0

19 Apr 14:23

What's Changed

Bug fixes

  • Account for missingness in int64 to int32 VCF type conversion by @mike-w-wilson in #668
  • Fix generic_field_check in validity_checks.py print of failed checks by @jkgoodrich in #693
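
The missingness issue in #668 can be sketched in plain Python. The helper to_int32 below is hypothetical, and clamping out-of-range values to the int32 bounds is an assumption about the conversion strategy rather than a description of the library's code. The point it illustrates is that a missing value has to pass through as missing instead of raising or being coerced.

```python
from typing import Optional

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def to_int32(value: Optional[int]) -> Optional[int]:
    """Clamp an integer into the int32 range, passing missing values through."""
    if value is None:
        # A missing value must stay missing rather than raise or become 0.
        return None
    return max(INT32_MIN, min(INT32_MAX, value))


converted = [to_int32(v) for v in (7, None, 2**40)]
```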

New Features

  • Add RSEM summary function by @jkgoodrich in #647
  • Function to get expression proportion by @KoalaQin in #649
  • Add GTEx import resources by @KoalaQin in #646
  • Add function agg_by_strata, which is a generalized version of the compute_freq_by_strata by @jkgoodrich in #659
  • Clean up compute_coverage_stats, change it to use agg_by_strata and have an optional group_membership_ht parameter by @jkgoodrich in #660
  • Add densify_all_reference_sites to perform a densify at all sites in a reference HT by @jkgoodrich in #661
  • Add compute_stats_per_ref_site to generalize computation of aggregate stats at all sites in a reference Table by @jkgoodrich in #662
  • Functions to process, filter, annotate and aggregate variants by transcript expression (get the pext scores per variant) by @KoalaQin in #651
  • Add gnomAD all sites allele number resource by @jkgoodrich in #669
  • Add read_args parameter to the read functions of Resource Classes by @jkgoodrich in #672
  • Add get_is_haploid_expr, get_dp_gq_adj_expr, get_adj_het_ab_expr, and some helpful parameters to agg_by_strata and compute_stats_per_ref_site by @jkgoodrich in #673
  • Add sex_karyotype_field as an argument to compute_stats_per_ref_site to include sex ploidy adjustment after densify by @jkgoodrich in #677
  • Add function for adding gencode annotation by @klaricch in #681
  • Update vcf.py to work on joint freq release Table by @KoalaQin in #688
  • Change get_downsampling_freq_indices and downsampling_counts_expr to support both 'pop' and 'gen_anc' keys in metadata by @jkgoodrich in #633
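
The idea behind a generalized stratified aggregation such as agg_by_strata (#659) can be sketched with ordinary Python collections. Everything below is hypothetical: the function aggregate_by_strata, the sample fields, and the aggregation names are invented for illustration, and the real function works on Hail expressions with a different interface.

```python
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, object]
StrataKey = Tuple[Tuple[str, object], ...]


def aggregate_by_strata(
    samples: List[Sample],
    strata: List[List[str]],
    agg_funcs: Dict[str, Callable[[List[Sample]], object]],
) -> Dict[StrataKey, Dict[str, object]]:
    """Apply every aggregation to every group defined by each stratification."""
    results: Dict[StrataKey, Dict[str, object]] = {}
    for fields in strata:
        groups: Dict[StrataKey, List[Sample]] = {}
        for sample in samples:
            # The group key records both the field names and their values.
            key = tuple((f, sample[f]) for f in fields)
            groups.setdefault(key, []).append(sample)
        for key, members in groups.items():
            results[key] = {name: fn(members) for name, fn in agg_funcs.items()}
    return results


samples = [
    {"gen_anc": "afr", "sex": "XX", "dp": 30},
    {"gen_anc": "afr", "sex": "XY", "dp": 20},
    {"gen_anc": "nfe", "sex": "XX", "dp": 40},
]
stats = aggregate_by_strata(
    samples,
    strata=[["gen_anc"], ["gen_anc", "sex"]],
    agg_funcs={
        "n": len,
        "mean_dp": lambda group: sum(s["dp"] for s in group) / len(group),
    },
)
```

Separating the grouping definition from the aggregation functions is what lets one routine serve frequency, coverage, and other per-group statistics.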

Full Changelog: v0.7.1...v0.8.0

v0.7.1

31 Oct 18:53

This release uses Hail 0.2.122

What's Changed

Bug fixes

Full Changelog: v0.7.0...v0.7.1

v0.7.0

31 Oct 16:27

This release contained a function that required Hail >= 0.2.126. Please use a newer release.

What's Changed

Breaking Changes

  • Update some gnomAD resources from lists to version dictionaries by @mike-w-wilson in #522
  • Modifications to annotate_freq to improve memory use by @jkgoodrich in #577

Bug fixes

  • Add get_slope_int_relationship_expr to get relationship between a pair of samples given slope and intercepts of lines to use as cutoffs by @jkgoodrich in #511
  • Fix access to version's SUBSETS and POPS within repo by @mike-w-wilson in #529
  • Small changes to bokeh module imports in utils.plotting that were failing with Hail update by @jkgoodrich in #540
  • Fix filter_x_nonpar and filter_y_nonpar to use reference genome by @jkgoodrich in #553
  • Fix callstats order in merge_freq_arrays by @jkgoodrich in #574
  • Avoid DeprecationWarnings from superseded hail function and import [minor] by @jmarshall in #576
  • Fix merge_freq_arrays for cases with more than two arrays by @jkgoodrich in #587
  • Fix negative values issue with 'diff' by @KoalaQin in #590
  • Fix ValueError for count_arrays in merge_freq_arrays function by @KoalaQin in #591
  • Modify apply_rf_model to use vector_to_array from pyspark.ml.functions instead of udf by @matren395 in #592
  • Fix to drop 'AS_SB' after converting to 'AS_SB_TABLE' in get_as_info_expr by @jkgoodrich in #602
  • Fix to GKS Seqloc new_temp_file by @matren395 in #612
  • Move ga4gh imports to their functions by @mike-w-wilson in #626
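
Several of the fixes above concern merge_freq_arrays (#574, #587, #591). A minimal sketch of merging call statistics for one frequency group across any number of inputs is given below. The function merge_callstats is hypothetical, the field names follow Hail's call_stats struct (AC, AF, AN, homozygote_count), and treating an absent group as None is an assumption made for the example.

```python
from typing import Dict, List, Optional

Callstats = Dict[str, Optional[float]]


def merge_callstats(entries: List[Optional[Callstats]]) -> Callstats:
    """Sum AC, AN, and homozygote counts across inputs, then recompute AF."""
    # Inputs that lack this frequency group are skipped.
    present = [e for e in entries if e is not None]
    ac = sum(e["AC"] for e in present)
    an = sum(e["AN"] for e in present)
    hom = sum(e["homozygote_count"] for e in present)
    # AF is derived from the merged counts, never averaged across inputs.
    af = ac / an if an > 0 else None
    return {"AC": ac, "AF": af, "AN": an, "homozygote_count": hom}


exomes = {"AC": 2, "AF": 0.02, "AN": 100, "homozygote_count": 0}
genomes = {"AC": 1, "AF": 0.005, "AN": 200, "homozygote_count": 0}
extra = {"AC": 3, "AF": 0.03, "AN": 100, "homozygote_count": 1}
merged = merge_callstats([exomes, genomes, extra, None])
```

Because the merge is a sum over a list, nothing in it is specific to two inputs, which is the property a fix for "more than two arrays" needs to preserve.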

New Features

  • Add generic constraint function annotate_constraint_groupings() by @averywpx in #497
  • Add an option for samples that must be kept to compute_related_samples_to_drop by @jkgoodrich in #506
  • Add determine_nearest_neighbors to find nearest neighbors for each sample. Modify compute_stratified_metrics_filter to work with a comparison_sample_expr that specifies which samples to compare to for filtering; this works well with the output of determine_nearest_neighbors by @jkgoodrich in #509
  • Add utility function to repartition HTs prior to join by @ch-kr in #512
  • Add VEP 105 init script and its docker image by @KoalaQin in #516
  • Add VEP 105 GRCh38 context HT resource by @jkgoodrich in #524
  • Add additional groupings to optional stratified allele frequencies by @KoalaQin in #523
  • Add 'strata' and 'qc_metrics' as globals on the table returned by compute_stratified_metrics_filter by @jkgoodrich in #521
  • Modify annotate_mutation_type to take optional context length as a parameter by @jkgoodrich in #530
  • Add generic constraint functions: oe_aggregation_expr(), compute_pli(), oe_confidence_interval(), calculate_raw_z_score(), calculate_raw_z_score_sd() by @averywpx in #505
  • Add dbSNP b156 to resources for v4 by @KoalaQin in #525
  • Add pab_max_expr function and modify default_compute_info to add 'AS_pab_max' annotation by @jkgoodrich in #531
  • Add generic constraint functions: get_downsamplings(), remove_coverage_outliers(), and filter_for_mu() by @averywpx in #507
  • Add ac_filter_groups to default_compute_info allowing additional allele count groupings by @jkgoodrich in #534
  • Add global annotations for 'vep_version', 'vep_help', and 'vep_config' to the returned Table in vep_or_lookup_vep by @jkgoodrich in #536
  • Add annotate_allele_info function to utils.annotations by @jkgoodrich in #535
  • Add validity check code of VEP annotations in protein-coding genes by @KoalaQin in #548
  • Merge freq array function and new frequency dictionary builder by @mike-w-wilson in #551
  • Add GRCh38 methylation sites resource by @jkgoodrich in #552
  • Modify comparison_sample_expr parameter of compute_stratified_metrics_filter to also accept a BooleanExpression by @jkgoodrich in #557
  • Add parameters apply_model_func and convert_model_func to assign_population_pcs so it has the ability to work with other models types by @jkgoodrich in #558
  • Add sample_list_stratification option to create_fake_pedigree function by @jkgoodrich in #564
  • Modify default_compute_info with the option to use the AS_ annotations in gvcf_info for allele specific aggregations by @jkgoodrich in #560
  • Modify annotate_adj to support LGT and LAD by @jkgoodrich in #567
  • Function to annotate downsamplings onto HT/MT by @mike-w-wilson in #570
  • Add function to merge histograms with the same bin_edges by @mike-w-wilson in #572
  • Add option to also merge an array of counts/ints in the freq array merge by @mike-w-wilson in #565
  • Update annotate_freq and qual_hists, add split_vds and compute_freq_by_strata by @mike-w-wilson in #571
  • Add function update_structured_annotations to update structured annotations on a Table by @KoalaQin in #580
  • Make naive_coalesce optional in default_compute_info by @jkgoodrich in #584
  • Add function to remove items from freq and freq_meta by @KoalaQin in #582
  • Add a select_fields option to compute_freq_by_strata by @jkgoodrich in #595
  • Modify split_info_annotation to allow for splitting an info expression that doesn't include AS_SB_TABLE by @jkgoodrich in #594
  • Update to allow for grouping and filtering by MANE transcripts by @klaricch in #605
  • Add gnomad_gks() and get_gks() for extracting gks information for a specified variant by @matren395 in #596
  • Add aggregations to variant QC evaluation for additional plots by @jkgoodrich in #609
  • Add function to get max FAF from faf_expr by @KoalaQin in #608
  • Add optional stratification parameter to coverage by @jkgoodrich in #615
  • Add methylation resource for chrX by @klaricch in #622
  • Add pop_label option to pop_max_expr, faf_expr, and gen_anc_faf_max_expr by @jkgoodrich in #623
  • Add apply_keep_to_only_items_in_filter option to filter_arrays_by_meta by @jkgoodrich in #624
  • Add pprint globals and a global/row length comparison, updates monoallelic expr in validity checks by @mike-w-wilson in #630
  • Add MANE Select filtering option to get_summary_counts by @jkgoodrich in #634
  • Add optional parameters to set_female_y_metrics_to_na_expr to use other frequency fields by @jkgoodrich in #635
  • Update resource paths by @klaricch in #642
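
Merging histograms that share the same bin_edges (#572) amounts to an element-wise sum. The sketch below assumes each histogram is a dictionary with the fields produced by Hail's hist aggregator (bin_edges, bin_freq, n_smaller, n_larger); the function name merge_histograms is hypothetical and is not the library's API.

```python
from typing import Dict, List


def merge_histograms(hists: List[Dict[str, object]]) -> Dict[str, object]:
    """Sum bin frequencies and overflow counts of histograms with equal bins."""
    first = hists[0]
    for hist in hists[1:]:
        # Summing bin by bin is only meaningful when the bins line up exactly.
        if hist["bin_edges"] != first["bin_edges"]:
            raise ValueError("Histograms must share the same bin_edges")
    return {
        "bin_edges": list(first["bin_edges"]),
        "bin_freq": [sum(col) for col in zip(*(h["bin_freq"] for h in hists))],
        "n_smaller": sum(h["n_smaller"] for h in hists),
        "n_larger": sum(h["n_larger"] for h in hists),
    }


hist_a = {"bin_edges": [0, 10, 20], "bin_freq": [5, 7], "n_smaller": 0, "n_larger": 1}
hist_b = {"bin_edges": [0, 10, 20], "bin_freq": [2, 3], "n_smaller": 1, "n_larger": 0}
merged_hist = merge_histograms([hist_a, hist_b])
```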

v0.6.4

08 Nov 15:00
608aed2

What's Changed

This release uses Hail 0.2.105

Bug fixes

  • Fix assign_population_pcs error when parameter pc_cols is a Hail ArrayExpression by @jkgoodrich in #503

Other Changes

  • Modifying assign_population_pcs to be more flexible by accepting an array expression in 'pc_cols' and adding a 'pc_expr' parameter instead of always using 'scores' by @jkgoodrich in #500
  • Add .he to file extensions list in file_exists() by @averywpx in #501
  • Add generic constraint functions: build_models(), build_plateau_models_pop(), build_plateau_models_total(), build_coverage_model(), get_all_pop_lengths() by @averywpx in #485

Full Changelog: v0.6.3...v0.6.4

v0.6.3

27 Oct 20:02
f87db40

What's Changed

This release uses Hail 0.2.104

Breaking Changes

  • Change type of "pc_cols" param in ancestry function from hl.expr.ArrayExpression to List[int] to help track PCs that were used in RF model by @klaricch in #448
  • Add additional_samples_to_drop option to run_pca_with_relateds by @klaricch in #489

Bug fixes

  • Fix to only add the error_rate annotation if fit is not supplied to assign_population_pcs by @klaricch in #453
  • Modify merge_sample_qc_expr to work with the additional VDS sample QC metrics: n_singleton_ti, n_singleton_tv, and r_ti_tv_singleton by @jkgoodrich in #454
  • Fix vep_or_lookup_vep to drop vep_proc_id if it exists by @konradjk in #439
  • Fix to paths for VEP 101 resources in init script by @jkgoodrich in #488
  • Changed tqdm to SimpleRichProgressBar in file_utils by @ch-kr in #495

New Features

  • Add an n_pcs option to run_platform_pca by @jkgoodrich in #468
  • Add n_partitions option to get_qc_mt before LD pruning by @klaricch in #472
  • Add block_size option to get_qc_mt for LD pruning by @klaricch in #473
  • Add gaussian_mixture_model_karyotype_assignment function to assign sex karyotype using Gaussian mixture models by @jkgoodrich in #478
  • Add variants_filter_lcr, variants_filter_segdup and variants_snv_only options to annotate_sex to filter variants prior to variant only ploidy imputation by @jkgoodrich in #479
  • Add an option compute_x_frac_variants_hom_alt to annotate_sex that computes the fraction of variants on chromosome X that are homozygous alternate per sample by @jkgoodrich in #480
  • Add generic constraint functions - annotate_mutation_type(), trimer_from_heptamer(), collapse_strand(), add_most_severe_csq_to_tc_within_vep_root() by @averywpx in #474
  • Add more file types to file_exists for checking '_SUCCESS' by @jkgoodrich in #486
  • Add coverage_mt option to annotate_sex which takes an optional precomputed coverage MT to use for ploidy imputation instead of remaking it by @jkgoodrich in #484
  • Add function get_chr_x_hom_alt_cutoffs, add arguments to infer_sex_karyotype and get_sex_expr to use the new function and its output by @jkgoodrich in #492
  • Add bi_allelic_only and snv_only options to get_qc_mt by @jkgoodrich in #471
  • Add generic constraint functions: annotate_with_mu(), count_variants(), downsampling_counts_expr(), filter_vep_transcript_csqs(), combine_functions(), filter_x_nonpar(), and filter_y_nonpar() by @averywpx in #481
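
Two of the constraint helpers named above, trimer_from_heptamer() and collapse_strand(), describe simple sequence operations that can be sketched on plain strings. The versions below are illustrative only: the library's functions operate on Hail Tables, and the convention assumed here (report each substitution on the strand whose reference base is A or C) is an assumption, not something these notes state.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def trimer_from_heptamer(context: str) -> str:
    """Return the central 3 bases of a 7-base context; other lengths pass through."""
    return context[2:5] if len(context) == 7 else context


def collapse_strand(ref: str, alt: str, context: str) -> Tuple[str, str, str]:
    """Report the substitution on the strand where the reference base is A or C."""
    if ref in ("G", "T"):
        return (
            reverse_complement(ref),
            reverse_complement(alt),
            reverse_complement(context),
        )
    return ref, alt, context


trimer = trimer_from_heptamer("ACGTACG")
collapsed = collapse_strand("G", "A", "CGT")
```

Collapsing strands halves the number of distinct mutation types that have to be modeled, since a G>A change in one context is the same event as C>T in the reverse-complemented context.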

Other Changes

  • Handle tags created through GitHub in publish release workflow by @nawatts in #451
  • Change branch name in CI workflow configuration by @nawatts in #452

Full Changelog: v0.6.2...v0.6.3

v0.6.2

10 May 18:38
ae139ce

What's Changed

New Features

  • Use Google Cloud Public Datasets as default source for public resources by @nawatts in #431
  • Add options for reading public resources from Registry of Open Data on AWS and Azure Open Datasets by @nawatts in #430
  • Allow setting the default source for public resources with an environment variable by @nawatts in #435
  • Use hl.utils.guess_cloud_spark_provider to set default resources source by @nawatts in #436
  • Add checkpoint option to get_qc_mt by @klaricch in #437
  • Modification to the annotate_sex pipeline to allow sex ploidy estimation using only variants instead of ref blocks by @jkgoodrich in #445
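
The resource-source changes in this release (#431, #430, #435, #436) describe a precedence order: an explicit environment setting, then a source matching the detected cloud, then the Google Cloud default. The sketch below illustrates that order only. The environment variable name, the cloud identifiers, and the function resolve_default_source are all hypothetical; the notes do not give the real variable name or the values returned by hl.utils.guess_cloud_spark_provider, and mapping each cloud to its own source is an assumption.

```python
import os
from typing import Mapping, Optional

# Source names as listed in the release notes; the keys are hypothetical.
SOURCES = {
    "gcp": "Google Cloud Public Datasets",
    "aws": "Registry of Open Data on AWS",
    "azure": "Azure Open Datasets",
}
# Hypothetical variable name, used only for this illustration.
ENV_VAR = "EXAMPLE_DEFAULT_RESOURCE_SOURCE"


def resolve_default_source(
    detected_cloud: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a default source: env override, then detected cloud, then GCP."""
    env = os.environ if env is None else env
    override = env.get(ENV_VAR)
    if override:
        # An explicit setting always wins over auto-detection.
        return override
    # Fall back to the detected cloud, then to the Google Cloud default.
    return SOURCES.get(detected_cloud, SOURCES["gcp"])


on_aws = resolve_default_source("aws", env={})
overridden = resolve_default_source("aws", env={ENV_VAR: "Azure Open Datasets"})
```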

Full Changelog: v0.6.0...v0.6.2

v0.6.1

06 Jan 16:52
  • Update for new RouterAsyncFS import/interface in recent Hail versions (55214e8)
  • Fix assign_population_pcs's use of known population label (9c8f089)

v0.6.0

06 Jan 14:58

Released September 3rd, 2021

All resources have been moved to a requester pays bucket.

Fixed

  • Fix annotation_type_is_numeric and annotation_type_in_vcf_info (#379)

Changed

  • VersionedResource objects are no longer subclasses of BaseResource (#359)
  • gnomAD resources can now be imported from different sources (#373)
  • Replaced ht_to_vcf_mt with adjust_vcf_incompatible_types, which keeps all functionality except converting the HT into an MT; that conversion is no longer needed to use Hail's export_vcf (#365)
  • Modified SEXES in utils/vcf to be 'XX' and 'XY' instead of 'female' and 'male' (#381)
  • Changed module sanity_checks to validity_checks, modified functions generic_field_check, make_filters_expr_dict (previously make_filters_sanity_check_expr), and make_group_sum_expr_dict (previously sample_sum_check) (#395)

Added

  • Added function region_flag_expr to flag problematic regions (#349)
  • Added function missing_callstats_expr to create a Hail Struct with missing values that is inserted into frequency annotation arrays when data is missing (#349)
  • Added function set_female_y_metrics_to_na_expr to set Y-variant frequency callstats for female-specific metrics to missing (#349)
  • Added function make_faf_index_dict to create a look-up Dictionary for entries contained in the filter allele frequency annotation array (#349)
  • Added function make_freq_index_dict to create a look-up Dictionary for entries contained in the frequency annotation array (#349)
  • Added function remove_fields_from_constant to remove fields from a list and notify which requested fields to remove were missing (#381)
  • Added function create_label_groups to generate a list of label group dictionaries needed to populate the info dictionary for vcf export (#381)
  • Added function build_vcf_export_reference to create a subset reference based on an existing reference genome (#381)
  • Added function rekey_new_reference to re-key a Table or MatrixTable with a new reference genome (#381)
  • Added function parallel_file_exists to check whether a large number of files exist (#394)
  • Added functions summarize_variant_filters, generic_field_check_loop, compare_subset_freqs, sum_group_callstats, summarize_variants, check_raw_and_adj_callstats, check_sex_chr_metrics, compute_missingness, vcf_field_check, and validate_release_t (#395)
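
Checking whether a large number of files exist, as parallel_file_exists (#394) does, is a natural fit for concurrent I/O. The sketch below swaps in a thread pool and the local filesystem so that it runs anywhere; it is not the library's implementation, and the function name parallel_exists and the max_workers parameter are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def parallel_exists(paths: List[str], max_workers: int = 8) -> Dict[str, bool]:
    """Check many local paths concurrently and map each path to its result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `paths`.
        return dict(zip(paths, pool.map(os.path.exists, paths)))


with tempfile.TemporaryDirectory() as tmp_dir:
    present = os.path.join(tmp_dir, "present.txt")
    absent = os.path.join(tmp_dir, "absent.txt")
    with open(present, "w") as handle:
        handle.write("ok")
    existence = parallel_exists([present, absent])
```

Existence checks are dominated by waiting on storage rather than by computation, so running them concurrently pays off most against remote object stores.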

v0.5.0

06 Jan 14:58
0ab8baf

Released April 22nd, 2021

Fixed

  • Fix for error in generate_trio_stats_expr that led to an incorrect untransmitted count. (#238)
  • Fix for error in compute_quantile_bin that caused incorrect binning when a single score overlapped multiple bins (#238)
  • Fixed create_binned_ht, which produced a "Cannot combine expressions from different source objects" error (#238)
  • Fixed handling of missing entries (not within a ref block / alt site) when computing coverage_stats in sparse_mt.py (#242)
  • Fix for error in compute_stratified_sample_qc where gt_expr caused error (#259)
  • Fix for error in default_lift_data caused by missing results field in new_locus (#270)
  • Fix to dbSNP b154 resource (resources.grch38.reference_data) import to allow for multiple rsIDs per variant (#345)
  • Fix to set_female_metrics_to_na to correctly update chrY metrics to be missing (#347)
  • Fixed available versions for gnomAD v2 coverage and liftover resources (#352)
  • Removed side effect of accessing gnomAD v2 coverage and liftover exome resources that would edit available versions for other resources (#352)
  • Use overwrite argument for importing a BlockMatrixResource (#342)

Changed

  • Removed assumption of snv annotation from compute_quantile_bin. (#238)
  • Modified compute_binned_truth_sample_concordance to handle additional binning for subsets of variants. (#240)
  • Updated liftover functions to be more generic (#246)
  • Changed quality histograms to label histograms calculated on raw and not adj data (#247)
  • Updated some VCF export constants (#249)
  • Changed default DP threshold to 5 for hemi genotype calls in annotate_adj and get_adj_expr (#252)
  • Updated coverage resources to version 3.0.1 (#242)
  • Update to compute_last_ref_block_end, removing assumption that sparse MatrixTables are keyed only by locus by default (#279)
  • Update generic_field_check to have option to show percentage of sites that fail checks. (#284)
  • Modified vep_or_lookup_vep to support the use of different VEP versions (#282)
  • Modified create_truth_sample_ht to add adj annotation information in the returned Table if present in the supplied MatrixTables (#300)
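
The lower depth threshold for hemizygous calls (#252) can be sketched as a plain predicate. Only the haploid DP threshold of 5 comes from these notes; the other defaults below (GQ >= 20, DP >= 10, heterozygote allele balance >= 0.2) are assumed values for illustration, and the function passes_adj and its parameters are hypothetical rather than the signature of annotate_adj or get_adj_expr.

```python
from typing import Optional


def passes_adj(
    gq: int,
    dp: int,
    is_haploid: bool = False,
    is_het: bool = False,
    alt_ab: Optional[float] = None,
    adj_gq: int = 20,
    adj_dp: int = 10,
    adj_ab: float = 0.2,
    haploid_adj_dp: int = 5,
) -> bool:
    """Return True when a genotype meets the (assumed) adj quality thresholds."""
    # Haploid calls draw reads from one chromosome copy, so less depth is required.
    min_dp = haploid_adj_dp if is_haploid else adj_dp
    if gq < adj_gq or dp < min_dp:
        return False
    # Heterozygous calls additionally need enough reads supporting the alt allele.
    if is_het and alt_ab is not None and alt_ab < adj_ab:
        return False
    return True


hemi_ok = passes_adj(gq=30, dp=7, is_haploid=True)
diploid_ok = passes_adj(gq=30, dp=7)
```

A call with depth 7 passes when it is haploid but fails when it is diploid, which is the behavior the changed default is meant to produce.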

Added

  • Added constants and functions relevant to VCF export (#241)
  • Add reference genome to call of has_liftover in get_liftover_genome (#259)
  • Added fix for MQ calculation in _get_info_agg_expr, switched RAW_MQ and MQ_DP in calculation (#262)
  • Add importable method for filtering clinvar to pathogenic sites (#257)
  • Added common variant QC functions get_rf_runs and get_run_data to random_forest.py (#278)
  • Add calculation for the strand odds ratio (SOR) to get_site_info_expr and get_as_info_expr (#281)
  • Added VEPed context HT to resource files and included support for versioning (#282)
  • Added code to generate summary statistics (total number of variants, number of LoF variants, LOFTEE summaries) (#285)
  • Added additional counts to summary statistics (added autosome/sex chromosome counts, allele counts, counts for missense and synonymous variants) (#289)
  • Added function, default_generate_gene_lof_matrix, to generate gene matrix (#290)
  • Added function default_generate_gene_lof_summary to summarize gene matrix results (#292)
  • Add resource for v3.1.1 release (#364)
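
For readers unfamiliar with the strand odds ratio added in #281, the sketch below computes it from a 2x2 table of forward and reverse read counts for the reference and alternate alleles. The formula is assumed to follow the GATK StrandOddsRatio definition (a pseudocount of 1 per cell, a symmetrical ratio, and a correction for the ref and alt strand ratios); whether get_site_info_expr matches it term for term is not confirmed by these notes, and the function name is hypothetical.

```python
import math


def strand_odds_ratio(ref_fw: int, ref_rv: int, alt_fw: int, alt_rv: int) -> float:
    """Compute SOR from strand-specific read counts (assumed GATK definition)."""
    # Add a pseudocount of 1 to every cell so that no ratio divides by zero.
    r_f, r_r, a_f, a_r = (x + 1.0 for x in (ref_fw, ref_rv, alt_fw, alt_rv))
    # The odds ratio plus its inverse is symmetric under swapping strands.
    symmetric_ratio = (r_f * a_r) / (r_r * a_f) + (r_r * a_f) / (r_f * a_r)
    ref_ratio = min(r_f, r_r) / max(r_f, r_r)
    alt_ratio = min(a_f, a_r) / max(a_f, a_r)
    return math.log(symmetric_ratio) + math.log(ref_ratio) - math.log(alt_ratio)


balanced = strand_odds_ratio(10, 10, 10, 10)
biased = strand_odds_ratio(20, 20, 20, 0)
```

A perfectly balanced table gives ln(2), the minimum of the symmetrical ratio, while alt reads seen on only one strand push the value up sharply.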

Removed

  • Removed rep_on_read; this function is no longer necessary, as MatrixTables/Tables can be repartitioned on read with _n_partitions added by this hail update (#283)
  • Removed compute_quantile_bin and added compute_ranked_bin as an alternative that provides more even binning. This is now used by create_binned_ht instead. (#288)
  • Removed prefix parameter from make_combo_header_text, as this was only used to check if samples were from gnomAD (#348)
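
The move from quantile bins to ranked bins (#288) can be illustrated with a short sketch. Ranking first and binning by rank guarantees bins of near-equal size even when many variants share a score. The function ranked_bins, the ascending sort, and the exact bin formula are assumptions made for this example and are not taken from compute_ranked_bin.

```python
from typing import List, Sequence


def ranked_bins(scores: Sequence[float], n_bins: int = 4) -> List[int]:
    """Assign each score a 1-based bin from its rank, giving near-equal bin sizes."""
    # sorted() is stable, so tied scores keep their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, idx in enumerate(order):
        # Integer arithmetic spreads ranks 0..n-1 evenly over bins 1..n_bins.
        bins[idx] = rank * n_bins // len(scores) + 1
    return bins


scores = [0.1, 0.9, 0.5, 0.5, 0.5, 0.5, 0.2, 0.8]
bins = ranked_bins(scores, n_bins=4)
```

In this input, four variants share the score 0.5; rank-based assignment splits them across two bins so that every bin still holds exactly two variants.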