Linfeng Wang: Mixed infections in genotypic drug-resistant Mycobacterium tuberculosis

Tuberculosis (TB), caused by Mycobacterium tuberculosis, is a major global health problem, responsible for 10.6 million cases and 1.6 million associated deaths in 2021 alone.

Whilst TB is a treatable disease, resistance to anti-TB drugs, especially to the first-line drugs rifampicin (RR-TB) and isoniazid (HR-TB), which together define multi-drug resistance (MDR-TB), is making infection control more difficult. To acquire resistance to anti-TB drugs, M. tuberculosis drug targets or activating proteins are often mutated2,3, including by single nucleotide polymorphisms (SNPs) and insertions and deletions (indels); a process involving vertical, but not horizontal, gene transfer. It is increasingly recognised that within-host mixed strain infections (MSIs) contribute to TB drug resistance, with heteroresistance involving the co-existence of susceptible and resistant strains.

MSIs can arise due to the reinfection of an infected host with a new strain of M. tuberculosis, which is often observed in relapse patients, as well as emerge where there is distinct clonal evolution within the infected host4. MSIs may be driven by inadequate treatment schemes where a diagnosed TB patient will receive combination therapies of sometimes toxic drugs for a minimum of 6 months, and non-compliance or treatment failure can arise. Heteroresistance has been responsible for higher rates of treatment failure, thereby limiting treatment options in TB patients5. Often without proper strain and drug resistance profiling, the treatment of MSI patients may involve second/third-line drugs with less efficacy, more serious adverse drug reactions, and a prolonged treatment period. Therefore, identifying the complete pathogen diversity within the host is useful for achieving favourable clinical treatment outcomes.

The phylogeny of M. tuberculosis consists of 4 major lineages (L1–L4), which consist of different strain types that may vary in their propensity to transmit and cause severe disease6. MSIs of M. tuberculosis can be identified in high-depth whole genome sequencing (WGS) data through the presence of heterozygous genotypes. Strains and SNPs with high numbers of heterozygous sites are typically removed from the analysis, because such sites are often thought to reflect contamination or sequencing errors. The different lineages within an MSI can nonetheless be deconvoluted from such data by estimating the ratios of allele coverage at different lineage-specific SNPs6.
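The idea of estimating strain proportions from allele coverage at lineage-specific SNPs can be illustrated with a minimal sketch. It is not the published method: the function name `lineage_fractions`, the input layout, the 5% presence threshold and the read counts are all hypothetical, and the lineages are assumed to be mutually exclusive top-level lineages rather than nested sublineages.

```python
from statistics import median

def lineage_fractions(snp_counts, min_fraction=0.05):
    """Estimate the fraction of each lineage in a mixed sample.

    snp_counts maps a lineage name to a list of (alt_reads, total_reads)
    pairs, one per SNP specific to that lineage (hypothetical layout).
    Lineages whose median allele fraction falls below min_fraction are
    treated as absent (an illustrative threshold, not a published one).
    """
    raw = {}
    for lineage, counts in snp_counts.items():
        # Allele fraction at each informative site with coverage
        site_fractions = [alt / total for alt, total in counts if total > 0]
        if not site_fractions:
            continue
        # The median is robust to a few noisy sites
        estimate = median(site_fractions)
        if estimate >= min_fraction:
            raw[lineage] = estimate
    total = sum(raw.values())
    if total == 0:
        return {}
    # Normalise so the retained lineages sum to 1
    return {lineage: value / total for lineage, value in raw.items()}

# Made-up read counts for one sample
example_counts = {
    "lineage1": [(0, 100), (1, 100)],
    "lineage2": [(70, 100), (72, 100), (68, 100)],
    "lineage4": [(30, 100), (29, 100), (31, 100)],
}
estimated = lineage_fractions(example_counts)
print(estimated)
```

In this toy sample, lineage 1 is dropped as noise and the remaining two lineages are reported at roughly 70% and 30%.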

However, for heteroresistance the challenge lies in determining which lineage each resistance-linked SNP belongs to, and thereby obtaining lineage-specific drug resistance profiles in an MSI. To infer this, we often rely on any overlap between the lineage-specific and drug resistance SNPs, which is not straightforward using short-read sequencing data, and often leads to many orphan drug resistance SNPs that are unassigned to strains. This problem can, however, be resolved using data from long-read sequencing platforms.

It is possible to profile drug resistance and lineages from WGS data to inform clinical management and infection control, for example, using the TB-Profiler tool2. However, whilst it is possible to call mixed genotypes, such software typically lacks the means of assigning individual SNPs to specific strains within an MSI, which could enhance profiling. Previous work4,7,8 on mixed infections in TB has provided a means of identifying the specific lineages involved in MSI samples and the drug resistance of the sample as a whole.

Nonetheless, the connection between the identified lineage and sample drug resistance is still undetermined. Here we built a statistical tool based on Gaussian mixture models (GMMs) to estimate the fractions of the different strain lineages in an MSI, and to assign drug resistance to each lineage, without needing to detect drug resistance and lineage-specific SNPs on the same sequencing read. In general, a GMM is a probabilistic model that represents a population as a mixture of multiple Gaussian distributions, and the algorithm determines their number and mixing proportions. Amongst many applications, GMMs have been used successfully to identify protein families9, cell types from omics data10, and to classify cancers11. Here, we apply a GMM to deconvolute 531 MSI samples detected in a large M. tuberculosis WGS "50k" dataset (n = 50,723)6,12 by TB-Profiler software2 (Fig. 1A).

We test the accuracy of the GMM algorithm in a simulation study and estimate the number of MSIs and heteroresistance across different lineages. Ultimately, the disentanglement of strains and drug resistance involved in MSIs could assist in the optimisation of treatment decisions and potentially prevent the emergence of further resistance.
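As a rough illustration of how a GMM can link a resistance SNP to a strain, the sketch below fits a two-component, one-dimensional mixture to allele fractions at lineage-specific sites using expectation-maximisation, then assigns a resistance SNP to whichever component best explains its allele fraction. It is a simplified stand-in rather than the published tool: the number of components is fixed instead of being inferred, and the function names, starting means, variance floor and data are all hypothetical.

```python
import math

def normal_pdf(x, mean, variance):
    """Density of a univariate Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def fit_gmm_1d(values, start_means, n_iter=100, min_var=1e-4):
    """Fit a 1-D Gaussian mixture by EM from the given starting means."""
    k = len(start_means)
    means = list(start_means)
    variances = [0.01] * k          # illustrative starting variance
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value
        resp = []
        for x in values:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300   # guard against division by zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue                  # leave an empty component unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(min_var, var)   # floor avoids collapse
            weights[j] = nj / len(values)
    return weights, means, variances

def assign_component(x, weights, means, variances):
    """Index of the component with the highest weighted density at x."""
    dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return max(range(len(dens)), key=dens.__getitem__)

# Made-up allele fractions at lineage-specific SNPs in one mixed sample:
# a minor strain near 0.3 and a major strain near 0.7.
allele_fractions = [0.27, 0.29, 0.30, 0.31, 0.33, 0.68, 0.69, 0.70, 0.72, 0.71]
weights, means, variances = fit_gmm_1d(allele_fractions, start_means=[0.25, 0.75])

# A resistance SNP observed at 28% of reads is assigned to the minor strain.
resistance_snp_fraction = 0.28
strain_index = assign_component(resistance_snp_fraction, weights, means, variances)
print(means, strain_index)
```

Because the assignment uses only allele fractions, the resistance SNP and the lineage-specific SNPs do not need to appear on the same read. The trade-off is that strains present at similar proportions (close to 50:50) produce overlapping components and cannot be separated reliably this way.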

Read full publication here