Gene Identification


Understanding:

•  Bioinformatics plays a role in identifying target genes

•  An open reading frame is a significant length of DNA from a start codon to a stop codon

    
Target genes can be identified by searching online databases for long stretches of DNA that could potentially code for protein

  • These sequences, called open reading frames (ORFs), begin with a start codon and run without any in-frame stop codon until the stop codon that ends them
  • Open reading frames will typically consist of at least 100 codons (300 nucleotides)
  • Searches can be refined by looking at regions downstream of known promoter sequences and upstream of termination sites
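As a rough illustration of why a minimum length such as 100 codons is useful, assume a random sequence in which all four bases are equally likely and occur independently (a simplifying assumption; real genomes have biased base composition). Three of the 64 codons are stop codons, so:

```latex
\begin{aligned}
P(\text{a random codon is not a stop codon}) &= \frac{61}{64} \approx 0.953 \\
P(\text{100 consecutive codons contain no stop codon}) &= \left(\frac{61}{64}\right)^{100} \approx 0.0082
\end{aligned}
```

Under these assumptions a stop codon turns up on average about once every 64/3 ≈ 21 codons, so an uninterrupted run of 100 or more codons has less than a 1% chance of arising at random in any one frame.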


While open reading frames may predict potential coding regions, they do not automatically guarantee the presence of a gene

  • Some long, uninterrupted DNA sequences may never be translated, whilst some short sequences may code for protein



Skill:

•  Identification of an open reading frame

    
Any particular stretch of DNA will have six reading frames that could potentially code for a functional protein

  • mRNA is translated in codons (triplets of bases), so each DNA strand can be read in three possible reading frames, depending on which of the first three bases reading starts from
  • DNA is double stranded and either strand could include a gene, meaning there are six reading frames in total (2 × 3)
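The 2 × 3 count can be made concrete with a minimal Python sketch: each strand is split into codons starting from three different offsets. The function names are illustrative only (not taken from any particular tool), and the sketch assumes the input contains only the bases A, T, G and C.

```python
# Complementary base pairs in DNA
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def reading_frames(dna: str) -> list[list[str]]:
    """Split a sequence into its six reading frames.

    Frames 1-3 start at offsets 0, 1 and 2 of the given strand;
    frames 4-6 do the same on the reverse complement.
    Incomplete codons at the end of a frame are dropped.
    """
    frames = []
    for strand in (dna.upper(), reverse_complement(dna)):
        for offset in range(3):
            # i < len - 2 guarantees a full triplet strand[i:i + 3]
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            frames.append(codons)
    return frames


if __name__ == "__main__":
    for number, frame in enumerate(reading_frames("ATGAAACCCTGA"), start=1):
        print(number, " ".join(frame))
```

Only frame 1 of this short example reads ATG ... TGA; the other five frames give entirely different codons from the same twelve base pairs.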


To identify an open reading frame:

  • Locate a start codon to establish the reading frame; on the sense (coding) strand this will be ATG, corresponding to AUG in the mRNA
  • Read on from the start codon in base triplets until an in-frame stop codon is reached (TGA, TAG or TAA)
  • The longer the uninterrupted stretch, the more likely it is to be a genuine protein-coding region rather than one that arose by chance
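The three steps above can be sketched in Python for one strand. This is an illustrative assumption-laden version, not how ORF Finder or any other program is implemented: it treats ATG as the only start codon, uses the standard stop codons, scans the three frames of the given strand only, and reports a candidate from its first in-frame ATG. `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Find ORFs on one strand of DNA, read 5' to 3'.

    Returns (start, end) index pairs, where start is the A of ATG and
    end is just past the stop codon. Only ORFs whose length, counting
    both the start and stop codons, is at least min_codons are kept.
    """
    dna = dna.upper()
    orfs = []
    for offset in range(3):  # the three reading frames of this strand
        start = None
        for i in range(offset, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens a candidate
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # the stop codon closes the candidate
    return sorted(orfs)


if __name__ == "__main__":
    example = "CCATGTTTTAGGG"
    for start, end in find_orfs(example, min_codons=3):
        print(start, end, example[start:end])
```

A candidate with no stop codon before the end of the sequence is not reported. Running the same function on the reverse complement of the sequence would cover the remaining three reading frames.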


Certain bioinformatic programs can automatically identify potential ORFs when provided with a candidate sequence

  • Gene sequences tend to be conserved between species, so an ORF that is present in multiple genomes is likely to represent a real gene


Identification of an Open Reading Frame



Link: ORF Finder



Understanding:

•  The target gene is linked to other sequences that control its expression

    
The expression of a gene is controlled by additional sequences that regulate its transcription

  • A core promoter sequence functions as the initiation site, where a complex of transcription factors and RNA polymerase is assembled
  • Control elements may regulate the rate of transcription, either increasing it (enhancers) or decreasing it (silencers)


When a target gene is selected, it is linked to these other sequences to form a recombinant construct capable of expression

  • Including a promoter sequence in the construct ensures that the target gene can be transcribed without relying on a promoter from the host genome
  • Control elements may be included within the construct to allow scientists to determine the rate and timing of expression
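The idea of linking a target gene to its regulatory sequences can be sketched as simple string assembly in Python. Everything here is hypothetical: the function name `assemble_construct`, the fixed part order, and the placeholder sequences, which are not real promoters or enhancers. In real genes, enhancers and silencers can act from various positions relative to the gene, so placing them directly upstream is only a simplification.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def assemble_construct(promoter: str, target_gene: str,
                       control_elements: tuple[str, ...] = ()) -> str:
    """Join hypothetical construct parts 5' to 3' on the sense strand.

    Placing the control elements directly upstream of the promoter is a
    simplification for illustration only.
    """
    gene = target_gene.upper()
    # Basic checks on the ends of the target gene (internal stops are not checked)
    if not gene.startswith("ATG"):
        raise ValueError("target gene should begin with a start codon (ATG)")
    if len(gene) % 3 != 0 or gene[-3:] not in STOP_CODONS:
        raise ValueError("target gene should end with an in-frame stop codon")
    # Control elements first, then the promoter immediately 5' of the gene
    upstream = "".join(element.upper() for element in control_elements)
    return upstream + promoter.upper() + gene


if __name__ == "__main__":
    # Placeholder strings, not real promoter or enhancer sequences
    print(assemble_construct("CCCC", "ATGAAATAA", control_elements=("GG",)))
```

The point of the sketch is the ordering: the promoter must lie upstream (5') of the target gene for the gene to be transcribed from it.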


Controlling Expression of a Target Gene
