Curriculum 2006

 

Lecture Title | Date | Time | Materials
Program Overview | Mon 6/19 | 9:00am-12:00pm | html, .ppt
Molecular Life Science Review | Mon 6/19 | 1:00-4:00pm | html, .ppt
Python I | Mon 6/19 | 1:00-4:00pm | html, .ppt
Literature Databases | Tues 6/20 | 9:00am-12:00pm | html, .ppt, Workshop, References
Sequence Comparisons | Tues 6/20 | 1:00-4:00pm | html, .ppt, Workshop, References
Python II | Wed 6/21 | 9:00am-12:00pm | html, .ppt
Python III | Wed 6/21 | 1:00-4:00pm | html, .ppt
Database Searching | Thurs 6/22 | 9:00am-12:00pm | html, .ppt
Sequence Databases | Thurs 6/22 | 1:00-4:00pm | html, .ppt, Workshop, References
Professional Development | Fri 6/23 | 9:00am-12:00pm |
Research Site Visit | Fri 6/23 | 1:00-4:00pm |
Statistics I | Mon 6/26 | 9:00am-12:00pm | PDF
Statistics II | Mon 6/26 | 1:00-4:00pm | PDF
Statistics III | Tues 6/27 | 9:00am-12:00pm | PDF
Global and Local Alignment | Tues 6/27 | 1:00-4:00pm | html, .ppt, Workshop
Longest Common Substring Algorithm (LCS) | Wed 6/28 | 9:00am-12:00pm | html, .ppt
Python IV | Wed 6/28 | 1:00-4:00pm | html, .ppt
Space Efficient Alignment Algorithms | Thurs 6/29 | 9:00am-12:00pm | html, .ppt
Local and Global Alignment | Thurs 6/29 | 1:00-4:00pm | html, .ppt
Multiple Sequence Alignment | Fri 6/30 | 9:00am-12:00pm | html, .ppt, Workshop, References
Research Seminar | Fri 6/30 | 1:00-5:00pm |
Protein Structure Prediction | Mon 7/3 | 9:00am-12:00pm | html, .ppt, Workshop
Ethics of the Human Genome | Mon 7/3 | 1:00-4:00pm | html, .ppt
Holiday | Tues 7/4 | Entire day |
Protein Structure | Wed 7/5 | 9:00am-12:00pm | html, .ppt
Signal Transduction Pathways | Wed 7/5 | 1:00-4:00pm | html, .ppt
Microarrays-Cluster Analysis I | Thurs 7/6 | 9:00am-12:00pm | html, .ppt
Microarrays-Cluster Analysis II | Thurs 7/6 | 1:00-4:00pm |
Microarrays-Cluster Analysis III | Fri 7/7 | 9:00am-12:00pm | html, .ppt
Sequence Alignment Project Programming | Fri 7/7 | 1:00-5:00pm |
Evaluation | Mon 7/10 | 9:00pm |
Assessment | Mon 8/21 | 9:00am |
Assessment | Thurs 8/24 | 3:30pm |

Workshop

Literature Databases - Workshop

Workshop A:
Set up an NCBI Cubby account on a biological topic of interest and show the instructor the account you set up. Subscribe to NCBI News.

Workshop B:
Go to the OMIM website and search for "breast cancer". Follow the link to MIM #113705. What does the light bulb represent? What do the links with the numbers lead to?

Obtain the following information on BRCA1:


Sequence Comparisons - Workshop

Consider the sequence GAACTCATACGAATTCACGTCAGCCCATCG

Use a window of 3 nucleotides and slide the window 1 nucleotide at a time. Calculate the %GC as a function of nucleotide number and draw a graph. Then change the window to 2 nucleotides and overlay the two plots. You may use Excel. Print out the spreadsheet and the graph.
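The sliding-window calculation can be checked with a short script. This is a minimal Python sketch; it assumes each %GC value is plotted at the window's first nucleotide (plotting at the window centre is an equally common convention), and `gc_profile` is an illustrative name.

```python
def gc_profile(seq, window):
    """Return (position, %GC) pairs for a window slid one nucleotide at a time.

    Positions are 1-based and refer to the first nucleotide of each window.
    """
    seq = seq.upper()
    profile = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        profile.append((start + 1, 100.0 * gc / window))
    return profile


SEQ = "GAACTCATACGAATTCACGTCAGCCCATCG"

# Tab-separated output can be pasted into a spreadsheet for plotting.
for window in (3, 2):
    print(f"# window = {window} nt")
    for position, percent_gc in gc_profile(SEQ, window):
        print(f"{position}\t{percent_gc:.1f}")
```

The 30-nt sequence gives 28 values with a 3-nt window and 29 with a 2-nt window, so the two overlaid series differ by one point.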

Use the Kyte-Doolittle hydropathy scale and a sliding window of 7 amino acids to construct a hydropathy plot for the following sequence: PLSQETFSDLWKLLPENNVLSP.
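To check the hand calculation, the same sliding-window average can be computed in Python. This sketch hard-codes the Kyte-Doolittle values and reports each average at the window's centre residue (a common convention, assumed here); `hydropathy_profile` is an illustrative name, not a library function.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Average Kyte-Doolittle score for each window, slid one residue at a time.

    Each value is reported at the 1-based position of the window's central
    residue, so an odd window size is assumed.
    """
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        average = sum(KYTE_DOOLITTLE[aa] for aa in chunk) / window
        profile.append((start + half + 1, average))
    return profile


for position, score in hydropathy_profile("PLSQETFSDLWKLLPENNVLSP"):
    print(f"{position}\t{score:+.2f}")
```

With 22 residues and a 7-residue window there are 16 points, centred on residues 4 through 19.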

Find the full-length protein sequence of bacteriorhodopsin. Find the Kyte-Doolittle hydropathy program (TGREASE) on the ExPASy Tools website and perform a Kyte-Doolittle analysis of bacteriorhodopsin. Compare your plot with the one shown in today's lecture. Are there differences between the two plots? If so, why?


Sequence Databases - Workshop

Workshop A:

  1. Use the following accession number to find a sequence in a nucleotide database: Z68198
  2. Print out the first 1000 nucleotides of the sequence and decipher the open reading frame(s) in this segment from the annotations listed in the flat file.
  3. Underline the open reading frame(s) and give the direction of the coding strand (5' to 3'). (Remember to distinguish between the template strand and the coding strand.) Show the instructor.
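The CDS annotations in a flat file give locations such as `join(...)` and `complement(...)`. A minimal Python sketch of reading such a location into a strand and a list of spans follows; it handles only simple same-strand locations, and the example location string is hypothetical rather than taken from the Z68198 record.

```python
import re


def parse_location(location):
    """Parse a simple GenBank feature location into (strand, spans).

    Handles forms such as '120..950', 'join(120..300,410..950)' and
    'complement(join(120..300,410..950))'. Spans are 1-based and inclusive.
    Sketch only: partial-end markers ('<', '>') are dropped, and mixed-strand
    or remote (other-accession) locations are not supported.
    """
    strand = "-" if "complement(" in location else "+"
    spans = [(int(start), int(end))
             for start, end in re.findall(r"<?(\d+)\.\.>?(\d+)", location)]
    return strand, spans


def clip_to_window(spans, window_end=1000):
    """Keep only the parts of each span that fall within nucleotides 1..window_end."""
    return [(start, min(end, window_end)) for start, end in spans if start <= window_end]


# Hypothetical CDS location, not from the Z68198 record.
strand, spans = parse_location("complement(join(850..990,1100..1400))")
print(strand, clip_to_window(spans))
```

A '-' strand means the coding strand is the complement of the displayed sequence, so its 5'-to-3' direction runs from the higher coordinate to the lower one.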

Workshop B:

Genes in eukaryotes are often organized into exons and introns, which require post-transcriptional splicing to produce a mature mRNA with a contiguous open reading frame for translation. This split organization can make gene identification difficult in eukaryotes, particularly in higher eukaryotes with complex gene organization. Many genes and their organization have been predicted from similarity searches comparing genomic sequence with known protein sequences, or with the corresponding full-length cDNAs or even ESTs.

Below is a small portion (~1,500 bp) of the C. elegans genome:

ATTTTTAAAAATGTACAAAATCAAACGCCCTACAAATCATGTGTGTGAAGAAGAATAATAACTAACATAT
CTATTTATATTTACCGAATAAATATATATTCATCAATTAACCTGAAGAACAAACGAATTCGGCTACAGGC
GTCGATCAGTCTCGAATCTAGTAACAACAAGAGAGCAATACGAAAACCGGTAAATCAATAGGGGGAAGCG
AAACAGTAGGTACAAATTGGAGGGGAAGCACCAATACATTAGGTGGGGGGTACGACTTGAAAAATGAGCT
GATTTTCGAATAGTTAAAGCGATGATCGTGTCCGAAAAACAGTTCATTTTTCAAGACAACATTGAGACTG
GGAGTACGGGGAAGCTCATTTACGGTGAGAGGAATTGGTGAGATCTTTAGAATATGCTTAAGGAGTTGGG
GTGGCTGGAGAAGTTCCTGTAGCCTCCGTGCCGGGATTCGATGGAGAAGTCGTTGCGGCTGGTCCCTTTT
CCTTCACTGGTGCTGGATCCTTGGCTGGAAGACATATGCGTGGCTTGACAGTCGATGAGGTGCGAGCCGA
CGAGTCCTTGTGAACTTCGTATCTGGAAATATTTTACTTAGATAGCAAATACTAAAATTGTAAAATTACC
TCAAAATCTCAGTATCCGGAATGCTCAATTTCTGCTTCAAAACCTGTCCGATGCGAAGATTGACATCATC
GCGAGTAGCATCACGAGTCCACAAGGAAACCTTGTCACCCTTTTGACGAACATTCACGACAGCTCCGCAG
ATGTAGTCTCCGTACTCGTCGAATTGCTCTCCAACAATAGCCATCAACAGCTCCAACCAGTAGTGATCGA
GCAATTGCGTTCTTCTCTGAAGCTTCTATGATTCATTGAATAAAATATATTTCTCAAAACGTACTTGCTT
ATCGACAACAACCAACCAACGTCCACCTTGAACGTTGTTGACGTCCTCCCACATTGGCTTGATTCCTTCC
TTGAACAAGTAATAATCGGATCCCCAGTTCAATCCTCCGGCAGACTGAATGTGATTGTACAGCGACCAGA
AGTCCTCGACAGTGTCGAAAAGTGAAACCATCTGGAAAAAATCGATAAAAGACGTATTTAAAAATCTTCT
ACCTTCAGACAATCCTCCCATTCCTTGTTACGGTCAGCTTTCAAGTACCAGAGAGCCCAGCGATTCTGGA
GGGGGTGTCTGGTGAGAAGCTCTGGAGGAACTGAAGCATCGGACGCATTCACATCGCCGGAAGCTGACAA
TGCTTTGTTTTCCGCTACGGATGTGCTCATTTAGCTGAAAATAGGTAATATTATATACGATTAGAGCTCG
GAAAACGATAAAATAGAGAAGAGTATGAATTTGGTTCAAATAACTCGGATTTTATAGGAAATTTTGTTTT
ACTGCACATTTTCGGCTAGTTTCCAAGCTTTTTAGATTTTTCAAGTGTAATTGGTAACATCGGGCACAAT
AAATTGATATTAAAGCTTGGAAAACAATAAA

Use this sequence to carry out the following:

Conduct a blastx (BLAST) search of the Swiss-Prot database to identify regions of this sequence that encode amino acids similar to known proteins in the database. You can reach the blastx program through ExPASy: http://www.expasy.org/sprot/.
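For comparison with the blastx report, the genomic segment can also be translated locally in all six reading frames. This Python sketch assumes the standard genetic code (used by C. elegans nuclear genes); it reproduces only conceptual translation, not BLAST's scoring or search, and the function names are illustrative.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return conceptual translations of frames +1..+3 and -1..-3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# Paste the ~1,500 bp C. elegans segment here; a short toy sequence is used instead.
for name, protein in six_frames("ATGAAATAG").items():
    print(name, protein)
```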

Write down the answers to the following questions.

B1. What does the blastx algorithm do with your nucleotide sequence before searching Swiss-Prot for matches?

B2. From the blastx output, to what protein does this region of genomic DNA show significant similarity?

B3. How can you tell that the coding regions for the amino acids within the matched protein are not located within a single contiguous region of the genomic DNA? (There is more than one way to tell.)

B4. How many separate regions of the genomic DNA align with the highest-scoring match in the output?

B5. What essential feature of the gene's organization does this information reveal?

B6. Note the numbering of the sequences in the alignments. Does the query genomic sequence progress in the same direction as the database (subject) amino acid sequence in the alignments? In other words, is it the same orientation (below):

1.................................114 = query

61...............................98 = subject

or opposite orientation (below):

1.................................114 = query

98...............................61 = subject

B7. What does the orientation of the sequences in the alignment relative to each other tell you about the orientation of the gene relative to the query sequence?


Global and Local Alignment - Workshop

By hand, perform local alignment on the following two sequences:

Use the BLOSUM45 matrix for scoring, with the default gap penalty of -5. Determine the highest path score and the percent similarity of the local alignment with that score.
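The hand calculation follows the Smith-Waterman local alignment recurrence. The Python sketch below takes substitution scores from a caller-supplied function so that no matrix values are hard-coded; the demonstration uses a toy identity score, not BLOSUM45. Percent similarity is assumed here to mean positive-scoring residue pairs divided by alignment length (gaps included), which is one convention among several; `smith_waterman` and `percent_similarity` are illustrative names.

```python
def smith_waterman(a, b, score, gap=-5):
    """Local alignment with a linear gap penalty.

    score(x, y) returns the substitution score for residues x and y.
    Returns (best_score, aligned_a, aligned_b) for one highest-scoring path.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_cell = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    top, bottom = [], []
    i, j = best_cell
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


def percent_similarity(top, bottom, score):
    """Share of alignment columns whose residue pair has a positive score."""
    if not top:
        return 0.0
    similar = sum(1 for x, y in zip(top, bottom)
                  if x != "-" and y != "-" and score(x, y) > 0)
    return 100.0 * similar / len(top)


def toy_score(x, y):
    """Toy scoring for illustration only: +5 for identity, -4 otherwise."""
    return 5 if x == y else -4


best, top, bottom = smith_waterman("ACGTACGT", "ACGTCGT", toy_score)
print(best, top, bottom, percent_similarity(top, bottom, toy_score))
```

For the workshop, BLOSUM45 scores can be supplied through `score`; for example, with Biopython installed, `Bio.Align.substitution_matrices.load("BLOSUM45")` returns a matrix that can be indexed as `matrix[x, y]`.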


Multiple Sequence Alignment - Workshop

Workshop A
Obtain the following protein sequences from any public database you wish and align them using the CLUSTALW program: human MDM2, hamster MDM2, murine MDM2, Xenopus MDM2 and zebrafish MDM2. The human MDM2 sequence is approximately 490 amino acids long. Name three regions that are structurally conserved among these orthologs.

According to the Guide Tree, which two sequences have the highest similarity?

Run CLUSTALW using only the human MDM2 and zebrafish MDM2 sequences. Does the human/zebrafish alignment from this run differ from the human/zebrafish alignment obtained in the first run?

Explain why.
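Percent identity between aligned sequences is one quick way to sanity-check the guide tree. This Python sketch computes it from a gapped alignment, skipping columns where either sequence has a gap (an assumed convention), and reports the most identical pair. ClustalW builds its guide tree with neighbor-joining from distances computed on separate pairwise alignments, so these numbers need not match the guide tree exactly. The toy alignment is hypothetical, not real MDM2 sequence.

```python
from itertools import combinations


def percent_identity(s1, s2):
    """Identity over aligned columns, skipping columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def most_similar_pair(alignment):
    """Return (name1, name2, identity) for the most identical pair in a gapped alignment."""
    name1, name2 = max(combinations(alignment, 2),
                       key=lambda pair: percent_identity(alignment[pair[0]], alignment[pair[1]]))
    return name1, name2, percent_identity(alignment[name1], alignment[name2])


# Hypothetical toy alignment, not real MDM2 sequence.
toy = {
    "seqA": "MCNTNMSVPT",
    "seqB": "MCNTNMSIPT",
    "seqC": "MC-SNLSLPT",
}
print(most_similar_pair(toy))
```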

Workshop B
MDM2 has a paralog, MDMX. Obtain the sequences of human MDMX and mouse MDMX and repeat the multiple sequence alignment together with the original five MDM2 sequences. Give the domains (as amino acid number ranges) that are highly conserved across this entire family, using human MDM2 amino acid numbering as the reference for the ranges.
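Reporting conserved blocks in human MDM2 numbering means converting alignment columns into residue numbers of the ungapped reference sequence. This Python sketch assumes "highly conserved" means identical in every sequence with no gaps (the columns ClustalW marks with '*'), and the minimum run length of 5 is an arbitrary choice; the toy alignment and the key "ref" are hypothetical.

```python
def conserved_ranges(alignment, reference, min_length=5):
    """Return (start, end) ranges, numbered along the ungapped reference sequence,
    where every sequence has the same residue and no gaps for at least
    min_length consecutive alignment columns."""
    ranges = []
    ref_pos = 0  # residue number in the ungapped reference sequence
    run = []     # reference residue numbers in the current conserved run
    for ref_char, column in zip(alignment[reference], zip(*alignment.values())):
        if ref_char != "-":
            ref_pos += 1
        if column[0] != "-" and len(set(column)) == 1:
            run.append(ref_pos)
        else:
            if len(run) >= min_length:
                ranges.append((run[0], run[-1]))
            run = []
    if len(run) >= min_length:
        ranges.append((run[0], run[-1]))
    return ranges


# Hypothetical toy alignment; "ref" stands in for human MDM2.
toy = {
    "ref": "MCNTN--PTDGAVT",
    "x":   "MCNTNKLPTDGAVT",
    "y":   "MCNTNKLPTDGAVT",
}
print(conserved_ranges(toy, "ref"))
```

The two gap columns in the reference do not advance its numbering, so the second block starts at reference residue 6 even though it begins at alignment column 8.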

Workshop C
The quagga was an African animal that is now extinct. It looked partly like a horse and partly like a zebra. In 1872, the last living quagga was photographed. More recently, mitochondrial DNA was obtained from a museum quagga specimen and sequenced. Perform a multiple sequence alignment of quagga (Equus quagga boehmi), horse (Equus caballus), and zebra (Equus burchelli) mitochondrial DNA. To which animal was the quagga more closely related?


Protein Structure Prediction - Workshop

Workshop A-Check whether the BLIMPS program in the BLOCK Searcher can predict the function of PTEN (NP_000305). PTEN stands for phosphatase and tensin homolog. Obtain the protein sequence from the protein database at NCBI and convert it to FASTA format. Paste the sequence into the BLOCK Searcher window (http://blocks.fhcrc.org/blocks_search.html) and determine the major function based on the BLOCK Searcher output. Then find out the actual function of PTEN with a text search for PTEN in the OMIM database. Did the BLOCK Searcher help assess the function of PTEN?
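Converting a flat-file sequence to FASTA means adding a '>' header line and removing the position numbers and spacing. A minimal Python sketch, assuming 60 residues per line (a common convention) and using a hypothetical placeholder rather than the real PTEN sequence; `to_fasta` is an illustrative name.

```python
def to_fasta(header, raw_sequence, width=60):
    """Format a sequence as FASTA, dropping digits, spaces and line breaks
    such as those in the ORIGIN block of an NCBI flat file."""
    residues = "".join(ch for ch in raw_sequence if ch.isalpha()).upper()
    lines = [">" + header]
    lines += [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join(lines) + "\n"


# Hypothetical placeholder text in flat-file style, not the real PTEN sequence.
raw = """
        1 mkvlatsrgp lweqakhdnt
       21 avsl
"""
print(to_fasta("NP_000305 PTEN (placeholder)", raw, width=10))
```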

Workshop B-Find the complete amino acid sequence of human p53 and perform a secondary structure prediction with Psi-PRED, GOR, Chou-Fasman, or another secondary structure prediction algorithm.

Workshop C-Calculation of the Q3 value of a secondary structure prediction program. Go to the Protein Data Bank and obtain the record for the p53 crystal structure (1TSR). The record contains three identical p53 polypeptides, named A, B and C; choose one of them for this exercise. The secondary structure section of the record (the HELIX and SHEET records) assigns many of the amino acids to either "helix" or "sheet". Assume that amino acids not assigned to the "helix" or "sheet" class adopt a "coil" structure. Create a line graph with the amino acid sequence in one row and, in the next row, the known secondary structure of each amino acid from the PDB record. Next, take the predicted structure from Workshop B and add it as a third row. The 1TSR file contains only the DNA-binding domain of p53, so it covers only about half of the protein. If you can, obtain other portions of p53 whose structures have been solved from other Protein Data Bank records and use them to fill in the regions of the second row that 1TSR does not cover. Show the instructor the line figure and calculate the percent accuracy of the Psi-PRED prediction. A hypothetical example is shown below.

Percent accuracy: 14/15 × 100 = 93.3%
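The observed-structure row can be filled from the HELIX and SHEET records of the PDB file, and the percent accuracy (Q3) is then the fraction of positions where the predicted state matches. This Python sketch assumes the standard fixed-column PDB format, ignores insertion codes, and treats residues absent from both record types as coil, as the workshop instructs; the function names and the 15-residue example are hypothetical.

```python
def pdb_secondary_structure(pdb_lines, chain):
    """Map residue number -> 'H' or 'E' from HELIX/SHEET records for one chain.

    Uses fixed PDB columns; insertion codes are ignored in this sketch.
    Residues not listed are coil ('C') by the workshop's convention.
    """
    assignment = {}
    for line in pdb_lines:
        if line.startswith("HELIX ") and line[19] == chain:
            start, end, state = int(line[21:25]), int(line[33:37]), "H"
        elif line.startswith("SHEET ") and line[21] == chain:
            start, end, state = int(line[22:26]), int(line[33:37]), "E"
        else:
            continue
        for residue in range(start, end + 1):
            assignment[residue] = state
    return assignment


def q3(observed, predicted):
    """Percentage of residues whose predicted state (H, E or C) matches the observed one."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must be the same length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


# Hypothetical 15-residue example with one mismatch (14/15 correct).
observed = "CCHHHHHHCCEEEEC"
predicted = "CCHHHHHHCCEEEEE"
print(f"Q3 = {q3(observed, predicted):.1f}%")
```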

 


References

Literature Databases

http://adonis.creighton.edu/hsl/Searching/Medline-Fields.html
http://cmgm.stanford.edu/classes/csuh/literature/
http://hml.org/WWW/class/help/medcite.html
http://www.nlm.nih.gov/mesh/meshhome.html


Sequence Comparisons

Baxevanis and Ouellette, Bioinformatics, 2nd ed., Wiley-Interscience, New York, 2001
http://www.infobiogen.fr/doc/dotter.html
Segurado et al., EMBO Reports 4, 1048-1053, 200


Sequence Databases

http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
Pevsner, J., Bioinformatics and Functional Genomics, Wiley-Liss, Hoboken, NJ, 2003
Baxevanis and Ouellette, Bioinformatics, 2nd ed., Wiley-Interscience, New York, 2001
Misener and Krawetz, Bioinformatics Methods and Protocols, Humana Press, Totowa, NJ, 2000


Multiple Sequence Alignment

Pevsner, Bioinformatics and Functional Genomics, Wiley-Liss, Hoboken, NJ, 2003.
Baxevanis and Ouellette, Bioinformatics, Wiley-Interscience, New York, 1998.
Feng and Doolittle, J. Mol. Evol. 25, 351-360, 1987.
Thompson et al., Nucleic Acids Res. 22, 4673-4690, 1994.

 
