unified_metric_calculations
calculate_heterozygosity(maf)
Calculate heterozygosity for each locus in the genotype matrix.
Source code in fpg_observational_model/unified_metric_calculations.py
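As an illustration of per-locus heterozygosity, the following is a minimal sketch that assumes biallelic loci and that `maf` is a sequence of minor-allele frequencies, one per locus. The helper name `expected_heterozygosity_sketch` is hypothetical, and the packaged function may compute this differently.

```python
def expected_heterozygosity_sketch(maf):
    """Expected heterozygosity per biallelic locus: H = 1 - p^2 - (1 - p)^2.

    Assumption: `maf` holds one minor-allele frequency p per locus.
    """
    return [1.0 - p ** 2 - (1.0 - p) ** 2 for p in maf]


# A fixed locus (p = 0) has no heterozygosity; p = 0.5 gives the maximum of 0.5.
het = expected_heterozygosity_sketch([0.0, 0.1, 0.5])
```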
calculate_individual_rh(barcode_heterozygosity, bootstrap_list)
Calculate individual R_h value based on infection heterozygosity and sampled H_Mono distribution.
Source code in fpg_observational_model/unified_metric_calculations.py
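The individual R_h calculation can be sketched as follows. This assumes that `bootstrap_list` holds sampled H_Mono values and that their mean serves as the baseline; both are assumptions for illustration, and `individual_rh_sketch` is a hypothetical name rather than the packaged implementation.

```python
from statistics import mean


def individual_rh_sketch(h_poly, h_mono_samples):
    """R_h = (H_Mono - H_Poly) / H_Mono, with H_Mono taken as the bootstrap mean.

    Assumption: averaging the bootstrap draws is one plausible way to obtain
    H_Mono; the packaged function may combine the draws differently.
    """
    h_mono = mean(h_mono_samples)
    if h_mono == 0:
        return float("nan")  # R_h is undefined when the baseline is zero
    return (h_mono - h_poly) / h_mono


# Infection heterozygosity of 0.10 against a baseline near 0.20 gives R_h near 0.5.
rh = individual_rh_sketch(0.10, [0.18, 0.20, 0.22])
```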
calculate_population_rh(df, monogenomic_dict, n_mono_boostraps=200)
Calculate the R_h statistic for the given sampling dataframe and IBS matrix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | | DataFrame containing infection data; must include effective COI information. | required |
| `monogenomic_dict` | | Dictionary containing the IBS distribution data. | required |
| `n_mono_boostraps` | | Number of bootstrap samples to draw for the population H_Mono estimation. | 200 |
Logic from the published inferential model (Wong et al. 2022, https://doi.org/10.1093/pnasnexus/pgac187):
- Identify the baseline co-transmission relatedness of unique monogenomic barcode pairs (H_Mono, 200 pairwise draws).
- Identify per-sample polygenomic heterozygosity from the number of Ns (H_Poly).
- Calculate individual R_h = (H_Mono - H_Poly) / H_Mono.
- Calculate summary statistics (mean, median, std) from the individual polygenomic R_h values per site per year.
Adapting for the model:
- (On hold: tested with too few infections to determine interpretation.) For superinfection samples with an effective COI=2, H_Mono (measured) can be measured directly, defined as the expectation of IBS values between the two unrelated genotypes in a mixed infection.
- Replicate the bootstrap sampling of H_Mono by drawing from the IBS distribution of monogenomic samples, excluding IBS=1 values (i.e. identical barcodes), to calculate an H_Mono (inferred) value.
- Polygenomic sample heterozygosity is calculated as the proportion of Ns in the barcode, assuming all alleles in an infection are detectable. Updates to make this more or less sensitive to minor alleles can be made in the generate_het_barcode function.
Source code in fpg_observational_model/unified_metric_calculations.py
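The last two steps of the published logic (H_Poly from the proportion of Ns, then summary statistics over individual R_h values) can be sketched as below. The barcode strings, the fixed `h_mono` value, and the helper names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, median, stdev


def h_poly_from_barcode(barcode):
    """Proportion of heterozygous ('N') calls in a barcode string."""
    return barcode.count("N") / len(barcode)


def summarize_rh_sketch(barcodes, h_mono):
    """Individual R_h per polygenomic barcode, then mean, median, and std."""
    rh_values = [(h_mono - h_poly_from_barcode(b)) / h_mono for b in barcodes]
    return {
        "mean": mean(rh_values),
        "median": median(rh_values),
        # Sample standard deviation (ddof = 1) is an assumption here.
        "std": stdev(rh_values) if len(rh_values) > 1 else float("nan"),
    }


# H_Poly is 0.25, 0.5, and 0.75, so R_h is 0.5, 0.0, and -0.5 for h_mono = 0.5.
summary = summarize_rh_sketch(["0N10", "NN10", "0NNN"], h_mono=0.5)
```

Note that R_h can be negative when an infection is more heterozygous than the monogenomic baseline.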
comprehensive_group_summary(group)
Calculate comprehensive summary statistics for infection data.
Returns mean, median, std, min, max for continuous variables.
Source code in fpg_observational_model/unified_metric_calculations.py
generate_het_barcode(matrix, indices)
Checks for unique alleles at each locus for a specified set of genotypes identified by indices. If all alleles are the same at a locus, returns '0' or '1' for that locus. If there is a mix of alleles at a locus, returns 'N' for that locus. If no indices are provided, returns an empty list.
Note: To update for multi-allelic loci, modify the conditions within the list comprehension.
TODO: Add option to account for densities to potentially mask polygenomic samples due to low density.
Source code in fpg_observational_model/unified_metric_calculations.py
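The behaviour described above can be sketched as follows, assuming the genotype matrix is a row-per-genotype sequence of biallelic 0/1 calls. The name `het_barcode_sketch` is hypothetical and the packaged function may differ in its input types.

```python
def het_barcode_sketch(matrix, indices):
    """Return one call per locus: the shared allele as a string, or 'N' if mixed."""
    if len(indices) == 0:
        return []
    rows = [matrix[i] for i in indices]
    barcode = []
    # zip(*rows) walks the selected genotypes locus by locus.
    for locus_alleles in zip(*rows):
        if len(set(locus_alleles)) == 1:
            barcode.append(str(locus_alleles[0]))
        else:
            barcode.append("N")
    return barcode


genotypes = [
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
pair_barcode = het_barcode_sketch(genotypes, [0, 1])
```

Here genotypes 0 and 1 differ only at the third locus, so only that locus is reported as 'N'.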
get_matrix(name)
Get a registered matrix
Source code in fpg_observational_model/unified_metric_calculations.py
ibx_distribution(indices, hash_ibx)
Returns the counts of pairwise IBx values per group as a dictionary, with each IBx value as the key and its count as the value.
Source code in fpg_observational_model/unified_metric_calculations.py
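A minimal sketch of building such a value-to-count dictionary is shown below. The structure of `hash_ibx` is not documented here, so the `pairwise_lookup` dictionary keyed by sorted index pairs is purely an assumption, as is the helper name.

```python
from collections import Counter
from itertools import combinations


def pairwise_value_counts_sketch(indices, pairwise_lookup):
    """Count how often each pairwise value occurs among all pairs of indices.

    Assumption: `pairwise_lookup` maps a sorted (i, j) tuple to an IBx value.
    """
    counts = Counter()
    for i, j in combinations(sorted(indices), 2):
        counts[pairwise_lookup[(i, j)]] += 1
    return dict(counts)


lookup = {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 0.5}
dist = pairwise_value_counts_sketch([0, 1, 2], lookup)
```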
identify_nested_comparisons(df, sampling_column_name, config=None)
Generate a list of infections within sampling schemes for looping through nested comparisons.
Source code in fpg_observational_model/unified_metric_calculations.py
inf_ibx_summary(ibx_matrix, ibx_indices)
Run the IBx summary for a list of genome indices for each polygenomic infection.
Source code in fpg_observational_model/unified_metric_calculations.py
process_nested_fws(nested_indices, sampling_df, ibs_matrix='ibs_matrix')
Calculate Fws for a single sample based on population heterozygosity.
Mirrors the logic of the R package moimix for calculating F_ws, specifically its getFws() function.
Source code in fpg_observational_model/unified_metric_calculations.py
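For orientation, F_ws is conventionally defined as 1 - H_w / H_s, where H_w is within-sample heterozygosity and H_s is population heterozygosity. The sketch below applies that ratio to mean heterozygosity across loci; this is a simplification for illustration and is not claimed to match the estimation procedure used by moimix or by this package. The function name and inputs are hypothetical.

```python
def fws_sketch(within_sample_het, population_het):
    """F_ws = 1 - H_w / H_s using mean heterozygosity across loci.

    Assumption: both inputs are per-locus heterozygosity values of equal length.
    """
    h_w = sum(within_sample_het) / len(within_sample_het)
    h_s = sum(population_het) / len(population_het)
    if h_s == 0:
        return float("nan")  # undefined when the population has no diversity
    return 1.0 - h_w / h_s


# Mean H_w of 0.1 against mean H_s of 0.4 gives F_ws of about 0.75.
fws = fws_sketch([0.0, 0.1, 0.2], [0.2, 0.4, 0.6])
```

Values near 1 indicate a sample with little within-host diversity relative to the population.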
process_nested_ibx(df, gt_matrix, nested_indices, ibx_prefix, individual_ibx_calculation=True, save_ibx_distributions=True, save_pairwise_ibx=False)
Calculate IBx for nested comparison groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | | DataFrame linking infection information to genotype indices. | required |
| `gt_matrix` | | Matrix of genotypes, roots or alleles. | required |
| `nested_indices` | | Dictionary with comparison types as keys and nested group data as values. | required |
| `save_ibx_distributions` | | Option to save the dictionary of values with counts for the full pairwise distribution. | True |
| `individual_ibx_calculation` | | Whether to calculate individual IBx values. | True |
Returns:
| Type | Description |
|---|---|
| | DataFrame with summary statistics for each group/subgroup. |
Source code in fpg_observational_model/unified_metric_calculations.py
register_matrix(name, matrix)
sample_from_distribution(dist_dict, n_bootstraps=200, exclude_keys=[1])
Unpacks the pairwise IBS distribution dictionary to sample from the distribution n times. Excluded keys are values that do not represent the distribution of interest - e.g. IBS=1 for identical barcodes since these would not be detected as a mixed infection.
Source code in fpg_observational_model/unified_metric_calculations.py
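The unpack-and-sample step can be sketched as below, assuming `dist_dict` maps each IBS value to its pairwise count. The helper name and the `seed` parameter are hypothetical additions for reproducibility, and weighted sampling with replacement is an assumption about how the draws are made.

```python
import random


def sample_distribution_sketch(dist_dict, n_bootstraps=200, exclude_keys=(1,), seed=None):
    """Draw n_bootstraps values with replacement, weighted by their counts."""
    # Drop excluded values (e.g. IBS = 1 for identical barcodes) and empty bins.
    kept = {k: c for k, c in dist_dict.items() if k not in exclude_keys and c > 0}
    if not kept:
        return []
    rng = random.Random(seed)
    return rng.choices(list(kept), weights=list(kept.values()), k=n_bootstraps)


draws = sample_distribution_sketch({1: 50, 0.5: 30, 0.25: 20}, n_bootstraps=10, seed=0)
```

The sketch uses a tuple for the `exclude_keys` default rather than a list, which avoids sharing a mutable default argument between calls.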
update_ibx_index(filter_df)
For year-specific IBx calculations, update recursive_nid to a global order based on its unique values.
Source code in fpg_observational_model/unified_metric_calculations.py
weighted_describe_scipy(summary_dict, ibx_prefix)
Calculate stats by expanding the weighted dictionary
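Expanding a weighted dictionary before summarising can be sketched as follows. This version uses only the standard library rather than SciPy, and the function name, returned keys, and choice of sample standard deviation are assumptions for illustration.

```python
from statistics import mean, median, stdev


def weighted_summary_sketch(value_counts):
    """Expand {value: count} into a flat list, then compute summary statistics."""
    expanded = [value for value, count in value_counts.items() for _ in range(count)]
    if not expanded:
        return {}
    return {
        "n": len(expanded),
        "mean": mean(expanded),
        "median": median(expanded),
        "std": stdev(expanded) if len(expanded) > 1 else float("nan"),
        "min": min(expanded),
        "max": max(expanded),
    }


# Expands to [0.0, 0.0, 0.5, 1.0] before summarising.
stats = weighted_summary_sketch({0.0: 2, 0.5: 1, 1.0: 1})
```

Expanding is simple but uses memory proportional to the total count; for very large distributions a weighted formula would avoid materialising the list.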