Understanding:
• Bioinformatics plays a role in identifying target genes
• An open reading frame is a significant length of DNA from a start codon to a stop codon
Target genes can be identified by searching online databases for long stretches of DNA that could potentially code for protein
- These sequences – called open reading frames (ORFs) – begin with a start codon and are uninterrupted by stop codons until the terminating stop codon
- Open reading frames will typically consist of at least 100 codons (300 nucleotides)
- Searches can be refined by looking at regions downstream of known promoter sequences and upstream of termination sites
While open reading frames may predict potential coding regions, they do not automatically guarantee the presence of a gene
- Some long and uninterrupted sequences of DNA may not actually be translated, whilst other short sequences may code for protein
Skill:
• Identification of an open reading frame
Any particular stretch of DNA will have six reading frames that could potentially code for a functional protein
- mRNA is translated in codons (triplets of bases), meaning there are three potential reading frames for a given DNA strand
- DNA is double stranded and either strand could include a gene, meaning there are six reading frames in total (2 × 3)
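The six reading frames can be illustrated with a short Python sketch. This is a minimal example, not a standard tool: the function names (reverse_complement, six_reading_frames) and the frame labels (+1 to +3, -1 to -3) are assumptions made for illustration, and the input is assumed to contain only the bases A, C, G and T.

```python
# Translation table that swaps each base for its complement
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3' (hypothetical helper)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def six_reading_frames(seq):
    """Split a DNA sequence into codons for all six reading frames.

    Frames +1 to +3 come from the given strand and frames -1 to -3
    from its reverse complement (2 strands x 3 offsets = 6 frames).
    """
    frames = {}
    strands = (("+", seq.upper()), ("-", reverse_complement(seq)))
    for label, strand in strands:
        for offset in range(3):
            # Keep complete triplets only; leftover bases are ignored
            codons = [strand[i:i + 3]
                      for i in range(offset, len(strand) - 2, 3)]
            frames[f"{label}{offset + 1}"] = codons
    return frames


frames = six_reading_frames("ATGAAATAG")
```

For the sequence ATGAAATAG, frame +1 reads ATG AAA TAG, while frame -1 reads CTA TTT CAT from the complementary strand.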
To identify an open reading frame:
- Locate a sequence corresponding to a start codon in order to determine the reading frame – this will be ATG (sense strand)
- Read this sequence in base triplets until a stop codon is reached (TGA, TAG or TAA)
- The longer the uninterrupted sequence between the start and stop codons, the greater the likelihood that it represents a genuine coding region
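The steps above can be sketched in Python. This is a simplified illustration rather than the method used by any particular ORF-finding program: the function name find_orfs and its min_codons parameter are assumptions, the 100-codon default follows the typical minimum length given earlier, and the codon count here excludes the stop codon. Only the strand supplied is scanned.

```python
START = "ATG"
STOPS = {"TGA", "TAG", "TAA"}


def find_orfs(seq, min_codons=100):
    """Scan the three reading frames of one strand for ORFs.

    Returns a list of (start, end, n_codons) tuples, where seq[start:end]
    runs from the start codon through the stop codon and n_codons counts
    the codons before the stop (an assumption of this sketch).
    """
    seq = seq.upper()
    orfs = []
    for offset in range(3):
        start = None  # index of the first ATG not yet closed by a stop
        for i in range(offset, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None  # resume searching after this stop codon
    return orfs


example = "CCATGAAACCCTAGGG"
short_orfs = find_orfs(example, min_codons=1)
long_orfs = find_orfs(example)
```

With the threshold lowered to one codon, the example yields a single ORF (ATGAAACCCTAG); with the default threshold it yields none, showing how a length cut-off filters out short chance matches. To cover all six reading frames, the same function would also be applied to the reverse complement of the sequence.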
Certain bioinformatic programs can automatically identify potential ORFs when provided with a candidate sequence
- Gene sequences are largely conserved – so if an ORF sequence is present in multiple genomes, it likely represents a gene
Identification of an Open Reading Frame
Link: ORF Finder
Understanding:
• The target gene is linked to other sequences that control its expression
The expression of a gene is controlled by other additional sequences that regulate transcriptional activity
- A core promoter sequence functions as an initiation site where a complex of transcription factors is assembled
- Control elements may serve to regulate the rate of transcription – either increasing (enhancers) or decreasing (silencers)
When a target gene is selected, it is linked to these other sequences to form a recombinant construct capable of expression
- Including a promoter sequence as part of the construct will ensure the autonomous expression of the target gene
- Control elements may be included within the construct to allow scientists to determine the rate and timing of expression
Controlling Expression of a Target Gene