Genomics: In recent years, genomics has emerged as a sub-specialization of biology. It deals with the study of the genomes of biological entities. The genome of a eukaryotic organism consists of the basic, i.e. monoploid, chromosome set of the species concerned. Genomics thus encompasses the study of the structure and function of the hereditary material contained in the chromosomes that make up a genome.
Beginning with the discovery of chromatin, the study of the genome moved from the genetic mapping of a few genes and mutations by the group led by T. H. Morgan, and the famous genome analysis of H. Kihara based on chromosome pairing behaviour, to complete sequencing and functional analysis of the whole genome, thereby bringing “genomics” to the forefront of biology.
The first foray into genomics was a proposal to use DNA technology to extend Sturtevant’s original concept of genetic mapping to human beings. Instead of tracing visible mutations, as had been done in fruit flies, David Botstein and colleagues, working at the Whitehead Institute of Biomedical Research, USA, proposed in 1980 that one could construct a genome map, i.e. a complete genetic map of all human chromosomes, by following the inheritance of DNA sequence variations, termed DNA polymorphisms.
A more expansive vision, aiming to study the genome at a very fine scale and give a nucleotide-by-nucleotide account, was expounded in 1985. It was proposed that the entire human genome be sequenced, providing a complete catalogue of every human gene.
Although, on its face, the proposal seemed a logistical impossibility, decades of hard work by large numbers of researchers at different laboratories around the world turned it into reality. The first bacterial genome to be completely sequenced was that of Haemophilus influenzae, in 1995. In 1998, the genome of the first multicellular organism, the 97 megabase (Mb) DNA sequence of the roundworm Caenorhabditis elegans, was published.
The first plant to be completely sequenced was the weed Arabidopsis thaliana, a wild relative of mustard, in 2000. Sequencing, sequence assembly and annotation of the whole genome of an organism constitute structural genomics, while assigning biological function(s) to the DNA sequences using a variety of tools and techniques is functional genomics.
Whole genome sequencing
As exemplified by the human genome, two distinct approaches are employed for the complete sequencing of eukaryotic genomes:
(i) ordered clone-by-clone approach and
(ii) whole genome shotgun approach.
The publicly funded Human Genome Project employed the first approach, while the private company Celera Genomics followed the second strategy for the human genome. In plants, the Arabidopsis Genome Initiative followed the same method as the Human Genome Project. The sequencing of the rice genome by the International Rice Genome Sequencing Project (IRGSP) was also based on the first approach.
The clone-by-clone approach essentially requires a complete genome map based on a large number of DNA markers, and a large-fragment (>100 kb) genomic library with at least 10-fold coverage of the genome of the given species. In rice, for instance, a genetic map containing more than 3,000 genetic markers had been constructed, and it was used to develop a physical map of the genome with the help of large-fragment DNA libraries.
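The library-size requirement can be illustrated with simple arithmetic: the number of clones needed is roughly the fold coverage times the genome size, divided by the average insert size. The sketch below is only an illustration. The genome size (430 Mb) and insert size (150 kb) are assumed values that do not come from the text, and the function names are hypothetical. The second function uses the standard Clarke-Carbon estimate of the chance that any given locus is present in the library.

```python
import math

def clones_for_coverage(genome_size_bp, insert_size_bp, fold_coverage):
    """Clones needed so that total insert length is fold_coverage x genome size."""
    return math.ceil(fold_coverage * genome_size_bp / insert_size_bp)

def probability_locus_present(genome_size_bp, insert_size_bp, n_clones):
    """Clarke-Carbon estimate: chance that a given locus lies in at least one clone."""
    return 1.0 - (1.0 - insert_size_bp / genome_size_bp) ** n_clones

# Assumed, illustrative values (not taken from the text).
GENOME_SIZE_BP = 430_000_000   # hypothetical genome size
INSERT_SIZE_BP = 150_000       # hypothetical average insert, above the 100 kb minimum

n_clones = clones_for_coverage(GENOME_SIZE_BP, INSERT_SIZE_BP, fold_coverage=10)
p_present = probability_locus_present(GENOME_SIZE_BP, INSERT_SIZE_BP, n_clones)
print(n_clones)            # 28667
print(p_present > 0.9999)  # True
```

Under these assumptions, a 10-fold library makes it very unlikely that any particular region is missing, which is why high coverage is specified before physical mapping begins.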
Three different large-fragment libraries, using yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs) and P1-derived artificial chromosomes (PACs), were generated. They carried different rice genomic DNA fragments obtained by partial restriction digestion with the enzymes MboI, Sau3A, EcoRI and HindIII.
The BAC and PAC clones have mostly been used for sequencing, while the YACs were used for gap-filling and for anchoring expressed sequence tags (ESTs). Before they are used in sequencing, however, the BAC/PAC clones are anchored onto the genetic map to construct physical maps of the genome. For this, the genetic markers are used as probes to screen the BAC/PAC libraries. The clones that hybridize are then placed against the corresponding marker(s) along the chromosome to generate a physical map of the marker region.
In this way, using markers from all genomic regions, a complete physical map is created. A contiguous set of clones in a genomic region (called a contig) is further characterized, for instance by fingerprinting, to identify clones with minimum overlap at the ends. These clones constitute the minimum tiling path and thus become the substrate for sequencing. This method, although laborious, provides the opportunity to share out chromosomes among collaborating groups and helps in precisely identifying gaps, if any.
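The idea of a minimum tiling path can be sketched as an interval-cover problem. The example below assumes that each clone's start and end coordinates within the contig are already known, which is a simplification: in practice overlaps are inferred from fingerprints rather than given as coordinates. The clone names, coordinates and function name are hypothetical. At each step the greedy rule picks, among the clones that start inside the region already covered, the one that reaches furthest.

```python
def minimum_tiling_path(clones):
    """Pick a smallest set of clones that covers the whole contig.

    clones: list of (name, start, end) tuples in contig coordinates.
    Returns the chosen clone names in order along the contig.
    Raises ValueError if the clones leave a gap.
    """
    if not clones:
        return []
    ordered = sorted(clones, key=lambda c: c[1])   # sort by start position
    target_end = max(c[2] for c in ordered)
    covered_to = ordered[0][1]
    path = []
    i = 0
    while covered_to < target_end:
        best = None
        # Consider every clone that starts within the region covered so far.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path

# Hypothetical contig of five overlapping clones (coordinates in kb).
contig = [
    ("BAC1", 0, 120),
    ("BAC2", 50, 180),
    ("BAC3", 100, 260),
    ("BAC4", 170, 300),
    ("BAC5", 250, 400),
]
print(minimum_tiling_path(contig))  # ['BAC1', 'BAC3', 'BAC5']
```

Here three of the five clones are enough to span the contig, so only those three would need to be sequenced. A gap in coverage is reported as an error, which mirrors the way the clone-by-clone method exposes gaps.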
Whole genome shotgun approach
In this approach, genomic DNA is sheared to obtain fragments of about 2 kb and 10 kb, which are then cloned and sequenced from both ends. The sequences of a large number of such clones are then assembled into small contigs, which are ordered and oriented into scaffolds. The scaffolds are then mapped to chromosomal locations using known markers.
The assembly process consists of five major stages, viz. screener, overlapper, unitigger, scaffolder and repeat masker. The screener marks microsatellite sequences and screens out all known long interspersed repeats. The overlapper compares every sequence read against every other in search of complete end-to-end overlaps of at least 40 bp. The unitigger identifies uniquely assembled contigs, which are then joined by the scaffolder.
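The overlapper stage can be illustrated with a minimal sketch that compares every read against every other and records exact suffix-to-prefix overlaps. This is not the assembler's actual implementation. It assumes error-free reads and exact matching, whereas a real overlapper tolerates sequencing errors and uses indexing to avoid comparing all pairs naively. The function names and the toy reads are hypothetical, and the demonstration uses a 5 bp minimum only because the toy reads are short; the default follows the 40 bp figure given above.

```python
def suffix_prefix_overlap(a, b, min_len=40):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    max_len = min(len(a), len(b))
    for length in range(max_len, min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def all_overlaps(reads, min_len=40):
    """Compare every read against every other; return {(i, j): overlap_length}."""
    overlaps = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                n = suffix_prefix_overlap(a, b, min_len)
                if n:
                    overlaps[(i, j)] = n
    return overlaps

# Toy reads, far shorter than real ones, so a small minimum is used here.
reads = ["ATGCGTACGTTAG", "CGTTAGGCTAAC", "GCTAACTTGA"]
print(all_overlaps(reads, min_len=5))  # {(0, 1): 6, (1, 2): 6}
```

The result says that read 0 runs into read 1 and read 1 runs into read 2, each by 6 bases, which is the kind of information later stages use to chain reads into contigs.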
The repeat masker is used to resolve repeats and thus eliminates errors in the scaffolding process. This method is more straightforward and less time-consuming than the clone-by-clone approach, but it requires a high level of computing expertise and facilities. To resolve some conflicts, it also requires information from the genetic and physical maps.
The sequence of the whole genome provides a wealth of information on the genome structure, its evolution pattern and its relationship with other genomes. It also enables analysis of biological processes/pathways at the whole genome level.
Once the whole genome is sequenced, it becomes essential to derive meaning from it so that new genes can be isolated and utilized in crop improvement. Different approaches are followed for gene discovery. The first task after obtaining the sequence is to employ a set of bioinformatic tools and techniques to make sense of the data.
A variety of gene prediction packages are available that determine whether a segment of DNA contains a gene. This method takes sequence structure into account and depends on computation. Actual experiments, however, are required to establish the correctness of the predicted genes.
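As a very simplified illustration of computational gene prediction, the sketch below scans both strands of a sequence for open reading frames (ORFs), i.e. stretches that begin with ATG and end at the first in-frame stop codon. This is an assumption-laden toy, not how the prediction packages mentioned above work: real gene finders also model introns, splice sites, codon usage and similarity to known genes. The function names, the minimum length and the example sequence are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=3):
    """Find ATG...stop open reading frames in all six reading frames.

    Returns (strand, start, end, orf) tuples. start and end are 0-based,
    end-exclusive positions on the strand that was scanned, and the ORF
    length in codons (stop included) must be at least min_codons.
    """
    orfs = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos                      # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (pos + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, pos + 3, s[start:pos + 3]))
                    start = None                     # close it at the stop codon
    return orfs

# Hypothetical example sequence containing one short ORF on the forward strand.
print(find_orfs("CCATGAAATTTGGGTAGCC"))
# [('+', 2, 17, 'ATGAAATTTGGGTAG')]
```

An ORF found this way is only a candidate. As the text notes, experiments are still needed to confirm that a predicted gene is real.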
Another method that promises far greater precision in establishing gene function employs a large set of mutants. By employing transposon, retrotransposon and T-DNA insertions, a variety of mutants are created which enable the process of gene identification.
A third method, called transcriptomics, requires the availability of microarrays, which are also known as DNA chips. Short oligonucleotides, designed on the basis of the sequences of cDNAs or of genes predicted from the genome sequence, are placed (arrayed) on a glass slide as microscopic dots by a robot called an arrayer. The thousands of spots so created on a single slide are then hybridized with fluorescent-labelled cDNAs synthesized from RNA from different tissues or treatments.
Different fluorescent labels are used for different tissues or treatments. After hybridization, differentially expressed genes are easily identified from the colour pattern of the spots. Transcriptomics thus refers to the analysis of the transcriptome, i.e. the total set of transcripts in a given organism, or of a specific subset of transcripts in a particular cell type. Ultimately, the genes identified by the different approaches are validated functionally by using them in genetic transformation experiments to see whether the mutant phenotypes are complemented.
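The way colour patterns translate into differential expression can be sketched numerically. The example below assumes a two-label experiment in which each spot yields one intensity per label (called "red" and "green" here purely for illustration) and flags a gene when the log2 ratio of the two intensities corresponds to at least a two-fold change. The two-fold cutoff, the gene names, the intensities and the function name are all assumptions. Real analyses also include background correction, normalization and statistical testing.

```python
import math

def differentially_expressed(spots, min_fold=2.0):
    """Flag spots whose two channel intensities differ by at least min_fold.

    spots: dict mapping gene name -> (red_intensity, green_intensity).
    Returns a dict of gene name -> log2(red / green) for the flagged genes.
    Positive values mean higher signal in the red-labelled sample.
    """
    threshold = math.log2(min_fold)
    flagged = {}
    for gene, (red, green) in spots.items():
        if red <= 0 or green <= 0:
            continue  # skip spots without usable signal in both channels
        log_ratio = math.log2(red / green)
        if abs(log_ratio) >= threshold:
            flagged[gene] = log_ratio
    return flagged

# Hypothetical spot intensities for three genes.
spots = {
    "geneA": (8000, 1000),   # much stronger in the red-labelled sample
    "geneB": (1200, 1000),   # roughly equal in both samples
    "geneC": (500, 2000),    # stronger in the green-labelled sample
}
print(differentially_expressed(spots))  # {'geneA': 3.0, 'geneC': -2.0}
```

A log ratio is used because it treats up- and down-regulation symmetrically: an eight-fold increase gives +3 and a four-fold decrease gives -2.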
Genome function is also studied at the protein level. The entire complement of proteins produced by an organism constitutes the proteome, and its study is called proteomics. A variety of approaches employing two-dimensional electrophoresis, mass spectrometry, protein chips and yeast two-hybrid systems are used to characterize the proteome of an organism. When genome function is studied by analyzing all the metabolites in an organism, the approach is called metabolomics.
The metabolite profiling techniques used are gas chromatography, high-performance liquid chromatography, capillary electrophoresis, mass spectrometry and nuclear magnetic resonance spectroscopy. Metabolite profiling makes it possible to assess the unique chemical fingerprints that specific cellular processes leave behind. It can give an instantaneous snapshot of the physiology of a cell and thus provides additional information required for understanding functional genomics.