Pregnancy
Department of Animal Sciences,3 Center for Reproductive Biology,4 and Program in Statistics,5 Washington State University, Pullman, Washington 99164
| ABSTRACT |
|---|
embryo, female reproductive tract, gene regulation, prolactin, testis
| INTRODUCTION |
|---|
The draft sequence of the human genome was released in 2001; its publication was considered a watershed event for modern biology. The initial sequence annotations of the human genome uncovered only about 32 000 genes [7] (or 26 000–39 000 genes [8]). The Human Genome Resources section of the National Center for Biotechnology Information (NCBI) documented the identification of approximately 34 000 genes that span a total of 3030 Mb of sequence in the human genome (data accessed on 1 November 2002). Because these human gene count estimates are widely acknowledged to be very conservative, the human gene sequences will serve as the ultimate references for comparative mapping of mammalian genomes.
Here, we report the use of human gene information to census genes and define their expression patterns by mining the ESTs expressed in embryos and reproductive tissues in pigs.
| MATERIALS AND METHODS |
|---|
The cDNA sequences for the human genes were collected from the Human Genome Resources section of the NCBI. Of approximately 34 000 genes identified in the human genome (data accessed on 1 November 2002), 33 308 are coding genes, including 195 on the Y chromosome, 1315 on the X chromosome, and 31 798 on Homo sapiens autosomes (HSAs): 3000 on HSA1, 2461 on HSA2, 1872 on HSA3, 1582 on HSA4, 1766 on HSA5, 1847 on HSA6, 1766 on HSA7, 1395 on HSA8, 1368 on HSA9, 1425 on HSA10, 1904 on HSA11, 1557 on HSA12, 777 on HSA13, 1057 on HSA14, 1152 on HSA15, 1258 on HSA16, 1439 on HSA17, 640 on HSA18, 1644 on HSA19, 828 on HSA20, 386 on HSA21, and 675 on HSA22 (Table 1). The local stand-alone BLAST search engine requires all sequences to be in FASTA format. To collect them, a web robot was developed to walk through all human chromosome information web pages and gather the mRNA accession numbers. A lightweight query program was then developed to query the NCBI search engine (http://www.ncbi.nlm.nih.gov/entrez/) automatically for the FASTA sequences, using the mRNA accession numbers as search fields. The FASTA-format data were then collected from the search result pages. Both programs were developed in Java (http://www.ansci.wsu.edu/people/jiang/faculty.asp).
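The collection step described above can be illustrated with a minimal Python sketch. It is not the authors' Java robot: it assumes NCBI's present-day E-utilities `efetch` endpoint as a stand-in for the Entrez search page used in the study, and the function names `build_fasta_url` and `parse_fasta` are hypothetical. The sketch only builds a request URL and parses FASTA text; it does not contact the network.

```python
from urllib.parse import urlencode

# Assumption: the E-utilities efetch endpoint is used here as a modern
# substitute for the Entrez search page queried in the original study.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_fasta_url(accessions):
    """Build a URL requesting FASTA text for a list of mRNA accession numbers."""
    params = {
        "db": "nuccore",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def parse_fasta(text):
    """Parse FASTA text into a dict mapping the first header token to its sequence."""
    records = {}
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            fields = line[1:].split()
            header = fields[0] if fields else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

Keying each record by the first token of its header line keeps the accession number available for matching sequences back to the human gene list.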
Porcine EST Resources
Two major research groups, one led by Dr. T.P.L. Smith (USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE) and one led by Dr. C.K. Tuggle (Iowa State University, Ames, IA) (see GenBank porcine EST submissions), have contributed to EST sequencing in pigs using embryos and reproductive tissues. The individual or pooled cDNA libraries were made using embryos at 11, 12, 13, 14, 15, 20, 30, and 45 days of gestation. The reproductive tissues used to construct cDNA libraries included testis, ovary, endometrium, hypothalamus, anterior pituitary, uterus, and placenta. Recently, ESTs derived from a cDNA library using porcine ovarian follicles were released to the public domain [6]; these data also were included in this study. A total of 98 898 porcine EST sequence entries are listed in GenBank, including 36 927 ESTs derived from pig embryos and 61 971 derived from pig reproductive tissues. All of these porcine EST entries can be found in the "est_others" database at NCBI.
BLAST Search and EST Annotation
A stand-alone BLAST program was installed locally to annotate porcine ESTs expressed in embryos and reproductive tissues, using human gene information as references. The major steps of BLAST searching and EST annotation were BLAST searches against the GenBank "est_others" database using human cDNA sequences as queries, identification of the porcine orthologous ESTs in the database using a "BLASTFilter" program (http://www.ansci.wsu.edu/people/jiang/faculty.asp), classification of porcine ESTs by resource, and compilation of data for ESTs specifically expressed in pig embryos and reproductive tissues. The "BLASTFilter" program was developed to filter out BLAST matches that do not represent porcine ESTs or that do not meet the requirement of >80% sequence identity within a continuous alignment longer than 100 base pairs. Plain-text BLAST search result pages were processed by a parser/screen program developed in Java. The program first splits a search result page into multiple hits and then parses each hit to obtain multiple data fields. Hits were screened by alignment match length, sequence identity (percentage), and keywords in the title, based on the criteria described above. All computer programs used in this study are available upon request.
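The screening criteria above can be sketched as follows. This is an illustrative Python version, not the authors' Java "BLASTFilter": the `Hit` record, the function names, and the use of "sus scrofa" as the title keyword for porcine entries are all assumptions; only the two thresholds (>80% identity, alignment >100 bp) come from the text.

```python
from typing import NamedTuple

# Thresholds stated in the text: identity > 80% over an alignment > 100 bp.
MIN_IDENTITY_PCT = 80.0
MIN_ALIGN_LEN = 100
# Assumption: porcine entries are recognized by this keyword in the hit title.
PORCINE_KEYWORD = "sus scrofa"


class Hit(NamedTuple):
    """One parsed BLAST hit (hypothetical field layout)."""
    query_gene: str      # human gene symbol used as the query
    title: str           # title line of the matched EST entry
    identity_pct: float  # percent identity of the alignment
    align_len: int       # alignment length in base pairs


def passes_filter(hit):
    """Keep a hit only if it is porcine and exceeds both thresholds."""
    return (
        PORCINE_KEYWORD in hit.title.lower()
        and hit.identity_pct > MIN_IDENTITY_PCT
        and hit.align_len > MIN_ALIGN_LEN
    )


def count_hits_per_gene(hits):
    """Count the retained porcine EST hits for each human query gene."""
    counts = {}
    for hit in hits:
        if passes_filter(hit):
            counts[hit.query_gene] = counts.get(hit.query_gene, 0) + 1
    return counts
```

Both comparisons are strict, so a hit at exactly 80% identity or exactly 100 bp is rejected, which follows the ">80%" and "longer than 100 base pairs" wording.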
Statistical Analysis
The number of porcine EST hit(s) per human gene was used as a count for the analysis. All human genes that hit porcine ESTs expressed in either embryos or reproductive tissues are tabulated by human gene symbol, human chromosome number, and number of porcine EST hit(s) from embryos or reproductive tissues (see supplemental material for details). We divided the genome into three parts: autosomes, X chromosome, and Y chromosome. An analysis was performed to determine which genome parts harbor more or fewer genes derived from embryos or from reproductive tissues. The standard Z-test was used, based on genome-wide averages. We also used the number of EST hit(s) per human gene as an indicator of in silico gene expression patterns for pig embryos and reproductive tissues.
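The text does not give the exact form of the Z-test. One plausible reading, sketched below in Python, is a one-sample test of a genome part's proportion of expressed genes against the genome-wide average; the function names are hypothetical, and the choice of a two-sided normal approximation is an assumption.

```python
import math


def proportion_z(expressed, total, reference):
    """z statistic for an observed proportion against a reference proportion.

    Assumption: the standard error is computed from the reference
    (genome-wide) proportion, as in a one-sample test of a proportion.
    """
    observed = expressed / total
    std_error = math.sqrt(reference * (1.0 - reference) / total)
    return (observed - reference) / std_error


def two_sided_p(z):
    """Two-sided P value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

As a usage example with counts reported in the Results, `proportion_z(9, 195, 9410 / 33308)` compares the 9 of 195 Y-chromosome genes expressed in embryos with the genome-wide average; under these assumptions the statistic is strongly negative and `two_sided_p` falls well below 0.01.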
| RESULTS |
|---|
A total of 33 308 human genes were used to search for the orthologous EST sequences derived from the pig embryos and reproductive tissues. Among them, 13 962 human genes were found to have porcine ESTs from these two sources (Table 1). A total of 70 986 porcine EST entries were retrieved by BLAST searching against the "est_others" database, with an average of 5.08 ESTs per gene. Of these 13 962 genes expressed in these two types of materials, 2167 are expressed specifically in embryos, 4552 are expressed specifically in reproductive tissues, and the remaining 7243 genes are expressed in both sources (Table 1), for a total of 9410 (2167 + 7243) genes expressed in pig embryos and 11 795 (4552 + 7243) genes expressed in reproductive tissues (Table 1). Thus, the porcine genome, if its size is similar to that of the human genome, has at least 28.3% (9410/33 308) of its coding genes transcribed in embryos and 35.4% (11 795/33 308) transcribed in reproductive tissues.
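The totals and percentages in this paragraph follow from the three disjoint gene categories. A short Python check of the arithmetic, using only the counts quoted above (the variable names are illustrative):

```python
# Counts quoted in the paragraph above.
TOTAL_HUMAN_GENES = 33308
EMBRYO_ONLY = 2167
REPRO_ONLY = 4552
BOTH_SOURCES = 7243
TOTAL_EST_HITS = 70986

# The three categories are disjoint, so totals are simple sums.
genes_with_hits = EMBRYO_ONLY + REPRO_ONLY + BOTH_SOURCES
embryo_genes = EMBRYO_ONLY + BOTH_SOURCES
repro_genes = REPRO_ONLY + BOTH_SOURCES

# Percentages of all human coding genes, rounded as in the text.
embryo_pct = round(100 * embryo_genes / TOTAL_HUMAN_GENES, 1)
repro_pct = round(100 * repro_genes / TOTAL_HUMAN_GENES, 1)
ests_per_gene = round(TOTAL_EST_HITS / genes_with_hits, 2)

print(genes_with_hits, embryo_genes, repro_genes)
print(embryo_pct, repro_pct, ests_per_gene)
```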
Autosomes and Sex Chromosomes
A mammalian genome can be divided simply into three parts (autosomes, X chromosome, and Y chromosome) because sex chromosomes are conserved in mammals. The number and percentage of genes transcribed in embryos and reproductive tissues from these three chromosome types are listed in Table 1. Chromosome Y is an especially gene-poor chromosome, with the lowest gene density and the fewest genes expressed in the human genome. Among 195 genes on the human Y chromosome, only 4.6% (9/195) were expressed in pig embryos and 8.2% (16/195) were expressed in pig reproductive tissues. These percentages are significantly lower (P < 0.01) than those for autosomes (embryos: 28.4% [9028/31 798]; reproductive tissues: 35.6% [11 329/31 798]) and the X chromosome (embryos: 28.4% [373/1315]; reproductive tissues: 34.2% [450/1315]) (Table 1). Of the 18 genes on the Y chromosome that were expressed in one or both materials (Table 2), 11.1% (2/18) are embryo specific and 50% (9/18) are reproductive tissue specific. The percentage of embryo-specific genes was significantly lower (P < 0.01) but the percentage of reproductive tissue-specific genes was significantly higher (P < 0.01) than those percentages for autosomes (embryos: 15.5% [2072/13 401]; reproductive tissues: 32.6% [4373/13 401]) and for the X chromosome (embryos: 17.1% [93/543]; reproductive tissues: 31.3% [170/543]), respectively (Table 1). These figures do not show any significant differences between autosomes and the X chromosome (P > 0.05). In addition, these Y-linked genes may be located in the two pseudoautosomal regions of the Y chromosome.
Putative In Silico Gene Expression Patterns
The number of EST hits per gene was used as an indicator of gene expression to study expression patterns in both embryos and reproductive tissues. A total of 28 219 EST entries were hit from embryos and 42 767 from reproductive tissues; the cutoff criteria for choosing the basic active genes (BAGs) were therefore set at 14 EST hits per gene for embryos and 22 for reproductive tissues (both about 0.0005 of their total hits). Of 228 BAGs, including 110 selected from embryos and 118 from reproductive tissues (Table 3), only 16 are coexpressed abundantly in both materials: 4 genes of known function (GSTP1, HBZ, HSPA8, and NUMA1) and 12 genes of unknown function. The top 10 BAGs in embryos are IFNG, HBD, HBB, AHSG, AFP, SERPINA1, HBE1, TMSB10, SERPINA2, and LOC253251, and the top 10 BAGs in reproductive tissues are PRL, SLC25A3, IGFBP7, LOC129932, PLP1, LOC133940, CSH2, SCG2, CHGB, and LOC220159. These results suggest that expression patterns and gene activities in pigs differ substantially between embryos and reproductive tissues.
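The BAG selection described above amounts to thresholding the per-gene hit counts and intersecting the two resulting lists. A minimal Python sketch follows; the function names are hypothetical, and treating the cutoff as inclusive (at least 14 or 22 hits) is an assumption, since the text does not say whether the threshold itself qualifies.

```python
def cutoff_fraction(cutoff, total_hits):
    """Express a per-gene cutoff as a fraction of all EST hits in that source."""
    return cutoff / total_hits


def select_bags(hits_per_gene, cutoff):
    """Return (gene, count) pairs at or above the cutoff, most abundant first.

    Assumption: the cutoff is inclusive.
    """
    selected = [(gene, n) for gene, n in hits_per_gene.items() if n >= cutoff]
    return sorted(selected, key=lambda item: (-item[1], item[0]))


def coexpressed(embryo_bags, repro_bags):
    """Gene symbols that appear in both BAG lists, in alphabetical order."""
    embryo_genes = {gene for gene, _ in embryo_bags}
    repro_genes = {gene for gene, _ in repro_bags}
    return sorted(embryo_genes & repro_genes)
```

With the totals given in the text, `cutoff_fraction(14, 28219)` and `cutoff_fraction(22, 42767)` both round to 0.0005, which matches the "about 0.0005" description of the two cutoffs.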
| DISCUSSION |
|---|
A vast number of EST sequences from different organisms have been generated and released to the public domain. Additional EST/gene sequences probably will be released from the many sequencing projects in different species that are still under way. The rapid expansion of EST information has required new computational approaches for discovering new genes in different genomes. There have been a number of attempts to identify unique gene products represented by EST data. UniGene [9] uses pairwise sequence comparisons at various levels of stringency to group related sequences, placing closely related and alternatively spliced transcripts into clusters. The TIGR Gene Indices use assembly algorithms to produce tentative consensus (TC) sequences that represent the underlying mRNA transcripts [10]. The procedures for annotation of ESTs used by these groups could be described as an EST-oriented approach, i.e., from ESTs to the genes. In contrast, the strategy used in the present study was to catalogue and characterize each gene by BLAST-based retrieval of all ESTs that are orthologous to an identified gene in the human genome. Therefore, this process could be described as a gene-oriented approach to annotating the ESTs derived from different sources. Because EST segments are collected within an ortholog, the process itself is an exact annotation procedure. By using this gene-oriented approach to mine the current porcine EST resources, we discovered at least 9410 genes expressed in pig embryos and at least 11 795 genes in reproductive tissues (Table 1). These genes can be used as references in choosing candidate genes to unravel the complexity of quantitative traits for reproduction detected in pigs and in other species.
GenBank EST Entry-Based Gene Expression Profiling
The mRNAs of a typical somatic cell are distributed in three frequency classes: superprevalent (or abundant), intermediate, and rare [11]. Five to 10 species of superprevalent cDNA represent at least 20% of the mass of mRNA, 500–2000 species of intermediately expressed mRNA represent 40%–60% of the mRNA mass, and 10 000–20 000 rare messages may account for <20% to 40% of the mRNA mass. Therefore, sequencing of cDNAs from standard cDNA libraries will be favorable toward finding the intermediately and highly expressed cDNAs and unfavorable toward finding the rarely expressed genes. To reach the ultimate goal of the EST sequencing projects, which is to collect at least one cDNA for every expressed gene, techniques have been developed to normalize the frequencies of cDNAs from mRNAs that belong to the three different classes of expression [12]. However, research has indicated that even library normalization is only partially effective. For example, Sonstegard et al. [13] produced 23 202 bovine ESTs derived from a normalized cDNA library with pooled bovine mammary tissues, and these ESTs helped to form 5751 TC sequences. The authors found that the majority (87%) of these 5751 assemblies contained only one to three mammary gland-derived ESTs. In contrast, 18% of the mammary gland ESTs assembled with TC sequences corresponded to 12 genes. These findings suggested that the number of GenBank EST entries could be used as an indicator of the expression level of a given gene in a tissue or species. This study could be the first to evaluate in silico gene expression patterns based on the number of EST entries per gene in GenBank. Results of this study indicated that gene expression patterns are quite different between embryos and reproductive tissues. Of 228 BAGs expressed in these two types of materials, only 16 are coexpressed abundantly or actively.
However, these in silico gene expression patterns need to be further verified or confirmed by gene expression profiling with microarray or semiquantitative polymerase chain reaction assays.
Top 10 BAGs in Embryos
By exploiting the porcine EST database, the top 10 BAGs in embryos were identified as IFNG, HBD, HBB, AHSG, AFP, SERPINA1, HBE1, TMSB10, SERPINA2, and LOC253251. The antiviral activity of interferon γ (IFNγ, gene IFNG) has been described extensively. However, the porcine trophectoderm secretes large amounts of IFNγ around the time of implantation [14]. The effect of IFNγ on reproduction has not been completely elucidated, but Petroff and coworkers [15] suggested that IFNγ may play a role in suppression of activated maternal leukocytes. Hemoglobin δ (HBD), hemoglobin β (HBB), and hemoglobin ε (HBE1) are involved in the ontogeny of fetal erythropoiesis [16]. Alpha-2-HS glycoprotein (AHSG) is an inhibitor of insulin receptor signaling and may negatively regulate embryonic skeletal development [17]. Alpha fetoprotein (AFP) is a major component of embryonic plasma and is thought to play a major role in embryonic development. This protein interacts with estrogens and is required for the development of female fertility [18]. In addition, alpha fetoprotein has immunomodulatory functions and may suppress the maternal immune system [19]. SERPINA1 and SERPINA2, members of the serine proteinase inhibitor superfamily, have a primary role in controlling neutrophil elastase activity within the mammalian circulation [20]. They may regulate the maternal inflammatory response, which in turn allows proper implantation and growth of the embryo [21]. Thymosin β10 (TMSB10) aids in the control of actin polymerization and is required for cell migration, angiogenesis, and development of the nervous system [22]. The function of LOC253251 protein remains unknown.
Top 10 BAGs in Reproductive Tissues
The analysis revealed the top 10 BAGs in reproductive tissues as PRL, SLC25A3, IGFBP7, LOC129932, PLP1, LOC133940, CSH2, SCG2, CHGB, and LOC220159. The genes for both prolactin (PRL) and chorionic somatomammotropin hormone 2 (CSH2) have been well studied for their functions in reproduction. Prolactin has historically been known as the pituitary hormone of lactation; however, >300 other functions for the hormone have been described [23]. CSH2 is a placental lactogen and is related to prolactin and growth hormone. This hormone stimulates mammogenesis and fetal development [24]. Insulin-like growth factor binding protein 7 (gene IGFBP7), or mac25, is a known tumor inhibitor, but recent evidence suggests that mac25 may promote terminal differentiation of granulosa cells in preovulatory follicles [25]. Secretogranin II (SCG2) and chromogranin B (CHGB) belong to a class of proteins known as chromogranins [26]. Chromogranins appear to influence the storage and secretion of LH and FSH at different stages of the estrous cycle, but their precise functions may be species specific [27]. SLC25A3 is a mitochondrial phosphate carrier required in the terminal steps of oxidative phosphorylation; it catalyzes the uptake of phosphate, a substrate of oxidative phosphorylation, across the mitochondrial inner membrane in symport (cotransport) with a proton [28]. Proteolipid protein 1 (PLP1) is the predominant protein in myelin. PLP1 has been associated with Pelizaeus-Merzbacher disease [29], but its abundance here may simply reflect the type of tissue collected. The functions of proteins LOC129932, LOC133940, and LOC220159 are unknown.
Gene Expression on the Y Chromosome
The Y chromosome has only 4.6% and 8.2% of its harbored genes expressed in embryos and reproductive tissues, respectively, significantly below the genome averages of 28.3% and 35.4%, respectively (P < 0.01). Why does the Y chromosome contribute so little to embryo development and reproduction compared with other chromosomes? In placental mammals, one of the female's X chromosomes is inactivated. However, humans with the 45, XO karyotype exhibit Turner syndrome. These females have the same number of active X chromosomes (one) as normal XX females, which suggests that a small number of genes on both of the X chromosomes in normal 46, XX females remain active [30]. At least some of these special X-linked genes have been thought to be present on the Y chromosome, which could explain why XY individuals do not show any phenotypic abnormalities, even with just one X chromosome. The results presented in this study provide data in support of the hypothesis that a small number of genes on the Y chromosome are needed for proper growth and development. If the Y chromosome contains pseudoautosomal regions similar to those of the X chromosome, genes expressed from the Y chromosome might serve as references to study which genes located on the inactivated X chromosome remain active. Of the 18 genes from the Y chromosome, 9 (50%) are specifically expressed in the reproductive tissues, indicating that the Y chromosome could play an important role in reproduction.
| FOOTNOTES |
|---|
2 Correspondence: Zhihua Jiang, Department of Animal Sciences, Washington State University, Pullman, WA 99164-6351. FAX: 509 335 4246; jiangz{at}wsu.edu
Received: 9 April 2003.
First decision: 1 May 2003.
Accepted: 29 May 2003.
| REFERENCES |
|---|
, secreted by a polarized epithelium, has specific structural and biochemical properties. Eur J Biochem 2002 269:2772-2781