Of these HERV-K (HML) subclades, we are most interested in HERV-K (HML-2), hereafter abbreviated as HML-2. As stated above, HML-2 began colonizing the primate genome approximately 35 MYA and continued to integrate until as recently as 100,000 years ago, making it the subclade that contains the youngest HERVs (21, 22, 34, 36, 38, Figure 1.4.4). As a result, the most recently integrated primate proviruses, including those specific to humans, are within this subclade. Due to their youth, some HML-2 proviruses have not been devastated by time: they possess nearly identical LTRs and ORFs potentially capable of encoding Gag, Pro, Pol, and Env proteins (20, 34, 36, 38). This indicates that some of these proviruses could produce HML-2-specific proteins that participate in pathology or protect the host from infection by similar viruses, or could produce infectious virus that would facilitate their further spread in the population. This last possibility is further supported by the polymorphic nature of HML-2 proviruses in humans and by the presence of human-specific proviruses (22, 35, 36, 42).

Recently, our lab and others have looked for the presence of infectious HML-2s (34-37). There are at least 95 known HML-2 proviruses and approximately 950 solo LTRs, with 91 of these known HML-2 proviruses present in the hg19 human genome build (34-37). Each of these proviruses has several different names, but we will refer to them by their chromosomal location (34-36). Though ERVs from other species can produce infectious virions, this does not appear to be the case for HERV-K (HML-2), as we have previously shown, or for any other HERV (36).

1.4 A Brief Overview of Endogenous Retroviruses (ERVs) and an Introduction to HERV-K (HML-2)

Transposable elements make up 45% of the human genome, a surprisingly large share compared with the 1.5% occupied by protein-coding genes (20, 21, 29, 38, 39, Figure 1.4.1). The most abundant transposable elements are retroelements, which constitute 42.2% of the human genome (38, 39, Figure 1.4.1). Retroelements are further divided into LTR and non-LTR elements: non-LTR elements such as LINEs and SINEs account for 34% of the human genome, and LTR elements such as retrotransposons and endogenous retroviruses (ERVs) account for 8% (19, 22, 29, 30, 38, 39, 41, Figure 1.4.1).

As mentioned above, the creation of a provirus by insertion of the retroviral genome into the host cell genome is a critical step in retroviral replication (1, 2, 21, 30, 31, 38, 40, Figure 1.3.1). Once created, a provirus is a permanent fixture within the host cell. The one exception is deletion of the internal ORFs and one LTR through homologous recombination, which leaves a solo LTR as a reminder of the provirus that once existed (21). While retroviruses typically infect somatic cells, they can occasionally infect gametes (19 – 22, 30, 31, 38, 40, Figure 1.4.2). If an infected gamete results in viable offspring, the provirus, now an ERV, will be present at the same insertion site in every cell of the offspring. This ERV is then treated like any other gene with respect to activation and Mendelian inheritance.

ERVs have been found in all vertebrate species analyzed, including humans, mice, cats, sheep, chickens, and, more recently, koalas (19, 20, 23 – 28, 30 – 33, 39, 41). Depending on the species, anywhere from a few to a few thousand ERVs and retroviral elements may be present. These copies arise through reinfection by a replication-competent ERV (42, 43), retrotransposition (44, 45), or infection of the host by an exogenous retrovirus, which produces a new ERV either independently of or together with a complementing endogenous retrovirus (46, 47) (20, 22, Figure 1.4.3). Their number provides an opportunity to study their evolution, the evolution of the host, and the evolution of the host-virus relationship (40). Owing to these mechanisms and their abundance in some species, ERVs have colonized vertebrate genomes over prolonged periods, ranging from decades in koalas (48) to millions of years in primates (36, 49-52) (20).

ERVs can be roughly grouped by integration time into “ancient” and “modern” proviruses (20, 30, 31, 40). Ancient proviruses are regarded as retroviruses that integrated into the germline prior to speciation, while modern proviruses are regarded as retroviruses that integrated into the germline after speciation (20, 30, 31, 40). This is determined by checking whether a provirus is present at the same location in a species and in related species, and by comparing the approximate integration time of the provirus with the relevant speciation dates (20, 30, 31). Due to their advanced age, ancient proviruses are degraded by deletions, frameshifts, and premature stop codons that prevent the production of infectious virus (19, 20, 30, 31). In contrast, modern proviruses are typically more intact and retain the ability to produce protein products and virions, which are potentially infectious (20, 30, 40). Because of their youth, these proviruses are typically polymorphic within a species (20, 30, 40).
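The ancient/modern distinction described above can be written out as a small decision rule. The following Python sketch is illustrative only and is not a published method: the `Provirus` record, the `classify_provirus` function, and all example values (including the 6.0 MYA speciation date) are hypothetical placeholders chosen for the illustration. It assumes that two pieces of evidence are available, namely whether the provirus is found at the same locus in a related species and an estimated integration time, and it reports "ambiguous" when the two disagree.

```python
from dataclasses import dataclass


@dataclass
class Provirus:
    """Hypothetical record for one provirus (field names are illustrative)."""
    name: str
    integration_mya: float           # estimated integration time, in millions of years ago
    shared_with_related_species: bool  # found at the same locus in a related species?


def classify_provirus(provirus: Provirus, speciation_mya: float) -> str:
    """Classify a provirus as 'ancient', 'modern', or 'ambiguous'.

    'ancient'   : integrated before speciation (older than the split and
                  shared at the same locus with the related species).
    'modern'    : integrated after speciation (younger than the split and
                  absent from the related species).
    'ambiguous' : the two lines of evidence disagree.
    """
    older_than_split = provirus.integration_mya >= speciation_mya
    if older_than_split and provirus.shared_with_related_species:
        return "ancient"
    if not older_than_split and not provirus.shared_with_related_species:
        return "modern"
    return "ambiguous"


# Placeholder values for illustration only; they are not measurements.
EXAMPLE_SPECIATION_MYA = 6.0
old_shared = Provirus("example_old", 30.0, True)
young_unique = Provirus("example_young", 0.1, False)
conflicting = Provirus("example_conflict", 30.0, False)

for p in (old_shared, young_unique, conflicting):
    print(p.name, classify_provirus(p, EXAMPLE_SPECIATION_MYA))
```

Requiring both lines of evidence to agree is a design choice made for this sketch. In practice either estimate can be uncertain, which is why conflicting cases are flagged rather than forced into one group.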

There are several examples of ERVs that are capable of expression in their host species (20, 22). The most well-studied avian ERVs are found in domestic chickens and are closely related to ASLV (54). One such ERV, Rous-associated virus 0 (RAV-0), is capable of releasing virions and is found exclusively in domestic chickens and red jungle fowl, indicating a relatively recent integration (20, 53). In sheep, endogenous Jaagsiekte sheep retrovirus (enJSRV) is likewise closely related to exogenous JSRV, indicating that enJSRV integrated recently (20, 21, 55). Unlike its exogenous counterpart, enJSRV does not correlate with ovine pulmonary carcinoma and is currently thought to protect sheep from pulmonary carcinoma by blocking cellular entry of exogenous JSRV (19, 20, 30, 31, 56, 57). Mice, which have several ERVs, contain endogenous proviruses originating from ecotropic (mouse only) and xenotropic (non-mouse) MLVs, which can release infectious virions (20). Finally, a more recent example of endogenization is occurring within the koala population, where the spread of endogenous koala retroviruses (KoRVs) allows virologists to track the process of endogenization in real time (21 – 24, 26, 27, 48).

In some instances, ERVs are responsible for cancer onset. Bittner observed that female mouse pups from a high-incidence background still developed mammary adenocarcinoma when fostered on mice of a low-incidence background. This suggested the presence of an inherited, infectious virus capable of causing murine breast cancer, later identified as MMTV, which Peter Bentvelzen would later confirm (1, 19, 20, 58 – 60). Feline ERVs such as the endogenous feline leukemia virus (FeLV), which is also very similar to its exogenous counterpart, are not infectious. However, recombination between an endogenous FeLV and the exogenous FeLV subgroup A (FeLV-A) generates FeLV-B de novo, which causes neoplastic disease in infected animals (19, 20, 25, 31, 33, 61, 62).

The story of human ERV (HERV) detection begins with the identification of these other ERVs (20, 38, 63, 64). The identification of replication-competent ERVs associated with cancers in mammals such as mice drove researchers to look for HERVs associated with human cancer (22, 38). The first HERVs cloned were identified by Southern blot analysis under relaxed conditions, using hybridization probes designed from conserved pol regions specific for MMTV (63, 64) and for the hamster endogenous retroelement intracisternal type A particle (IAP) (65) (20, 22, 38). Since then, more sensitive methods, such as PCR and deep sequencing, have been used to detect HERVs in non-human primates and humans. These methods have shown the presence of several distinct HERV clades across the primate lineage, with many HERV sequences integrating tens of millions of years ago (22, 34 – 38, 41, Figure 1.4.4).

Several HERV subclades are very old and heavily mutated and therefore have a low likelihood of producing infectious virus (22, Figure 1.4.4). In this respect, the subclade known as HERV-K (Human MMTV-Like (HML)) is unique. HERV-K (HML) received its name because a lysine (K) tRNA binds the PBS during reverse transcription and because these sequences were detected with the above-mentioned MMTV pol probes (20, 22, 38, 41, 63-65). HERV-K (HML) is divided into 11 subclades, numbered 1-11, which entered the primate genome at different times through many separate integration events over the course of primate evolution (20, 22, 34, 36, Figure 1.4.4). For example, the subclade HERV-K (HML-2) has undergone several integration events, starting approximately 35 million years ago (MYA) and continuing to as recently as 100,000 years ago (21, 22, 34, 36, 41, Figure 1.4.4).