Curriculum 2006
| Lecture Title | Date & Time | html | .ppt | Workshop | References |
| Program Overview | Mon 6/19 9:00-12:00pm | view | download | | |
| Molecular Life Science Review | Mon 6/19 1:00-4:00pm | view | download | | |
| Python I | Mon 6/19 1:00-4:00pm | view | download | | |
| Literature Databases | Tues 6/20 9:00-12:00pm | view | download | workshop | reference |
| Sequence Comparisons | Tues 6/20 1:00-4:00pm | view | download | workshop | reference |
| Python II | Wed 6/21 9:00-12:00pm | view | download | | |
| Python III | Wed 6/21 1:00-4:00pm | view | download | | |
| Database Searching | Thurs 6/22 9:00-12:00pm | view | download | | |
| Sequence Databases | Thurs 6/22 1:00-4:00pm | view | download | workshop | reference |
| Professional Development | Fri 6/23 9:00-12:00pm | | | | |
| Research Site Visit | Fri 6/23 1:00-4:00pm | | | | |
| Statistics I | Mon 6/26 9:00-12:00pm | | | | |
| Statistics II | Mon 6/26 1:00-4:00pm | | | | |
| Statistics III | Tues 6/27 9:00-12:00pm | | | | |
| Global and Local Alignment | Tues 6/27 1:00-4:00pm | view | download | workshop | |
| Longest Common Substring Algorithm (LCS) | Wed 6/28 9:00-12:00pm | view | download | | |
| Python IV | Wed 6/28 1:00-4:00pm | view | download | | |
| Space Efficient Alignment Algorithms | Thurs 6/29 9:00-12:00pm | view | download | | |
| Local and Global Alignment | Thurs 6/29 1:00-4:00pm | view | download | | |
| Multiple Sequence Alignment | Fri 6/30 9:00-12:00pm | view | download | workshop | references |
| Research Seminar | Fri 6/30 1:00-5:00pm | | | | |
| Protein Structure Prediction | Mon 7/3 9:00-12:00pm | view | download | workshop | |
| Ethics of the Human Genome | Mon 7/3 1:00-4:00pm | view | download | | |
| Holiday | Tues 7/4 Entire Day | | | | |
| Protein Structure | Wed 7/5 9:00-12:00pm | view | download | | |
| Signal Transduction Pathways | Wed 7/5 1:00-4:00pm | view | download | | |
| Microarrays-Cluster Analysis I | Thurs 7/6 9:00-12:00pm | view | download | | |
| Microarrays-Cluster Analysis II | Thurs 7/6 1:00-4:00pm | | | | |
| Microarrays-Cluster Analysis III | Fri 7/7 9:00-12:00pm | view | download | | |
| Sequence Alignment Project Programming | Fri 7/7 1:00-5:00pm | | | | |
| Evaluation | Mon 7/10 9:00pm | | | | |
| Assessment | Mon 8/21 9:00am | | | | |
| Assessment | Thurs 8/24 3:30pm | | | | |
Workshop
Literature Databases - Workshop
Workshop A:
Set up a cubby account on a biological topic of interest. Show the instructor the cubby account you set up. Subscribe to NCBI News.
Workshop B:
Go to the OMIM website and search for "Breast cancer". Follow the link to MIM#113705. What does the light bulb represent? What do the links with the numbers lead to?
Obtain the following information on BRCA1:
Sequence Comparisons - Workshop
Consider the sequence GAACTCATACGAATTCACGTCAGCCCATCG
Use a window of 3 nucleotides and slide the window 1 nucleotide at a time. Calculate the %GC as a function of nucleotide number. Draw a graph. Change the window to 2 nucleotides. Then overlay the two plots. You may use Excel. Print out the spreadsheet and the graph.
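The sliding-window calculation can also be sketched in Python rather than Excel. This is a minimal sketch, not the course's own solution: the function name `gc_percent_windows` is hypothetical, and reporting each window by the 1-based position of its first nucleotide is an assumption, since the exercise does not say which position a window should be plotted at.

```python
def gc_percent_windows(seq, window, step=1):
    """Return (start_position, %GC) for each window along seq.

    start_position is 1-based and refers to the first nucleotide
    of the window (an assumed plotting convention).
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc_count = sum(1 for base in chunk if base in "GC")
        results.append((start + 1, 100.0 * gc_count / window))
    return results


sequence = "GAACTCATACGAATTCACGTCAGCCCATCG"
w3 = gc_percent_windows(sequence, 3)  # window of 3, step of 1
w2 = gc_percent_windows(sequence, 2)  # window of 2, step of 1

for position, gc in w3:
    print(position, round(gc, 1))
```

The two lists `w3` and `w2` can be pasted into a spreadsheet or passed to any plotting tool to overlay the two curves.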
Given the following sequence: PLSQETFSDLWKLLPENNVLSP, use the Kyte-Doolittle hydropathy scale and a sliding window of 7 amino acids to construct a hydropathy plot.
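A hydropathy profile for this kind of exercise can be sketched in Python as follows. The scale values are the published Kyte-Doolittle hydropathy indices; the function name `hydropathy_profile`, the use of the window mean (rather than the window sum), and the choice to report each window at its center residue are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Return (center_position, mean hydropathy) for each window.

    center_position is 1-based; averaging over the window and
    plotting at the center residue are assumed conventions.
    """
    protein = protein.upper()
    profile = []
    for start in range(len(protein) - window + 1):
        chunk = protein[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in chunk) / window
        profile.append((start + 1 + window // 2, mean))
    return profile


peptide = "PLSQETFSDLWKLLPENNVLSP"
profile = hydropathy_profile(peptide, window=7)

for position, score in profile:
    print(position, round(score, 2))
```

Plotting `score` against `position` gives the hydropathy plot; positive values indicate hydrophobic stretches on this scale.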
Find the protein sequence for bacteriorhodopsin. Make sure you obtain the full-length sequence. Find the Kyte-Doolittle hydropathy program (TGREASE) at the Expasy Tools website. Perform a Kyte-Doolittle analysis of bacteriorhodopsin. Compare the plot to the one displayed in lecture today. Are there differences between the two plots? If so, why?
Workshop A:
Workshop B:
Genes in eukaryotes are often organized into exons and introns, which require post-transcriptional splicing to produce a mature mRNA with a contiguous open reading frame for translation. This broken organization can make gene identification difficult in eukaryotes and particularly in higher eukaryotes with complex gene organization. Prediction of many genes and their organization has been based on similarity searches between genomic sequence and known protein amino acid sequences and/or genomic sequence and the corresponding full-length cDNAs or even ESTs.
Below is a small portion (~1,500 bp) of the C. elegans genome:
ATTTTTAAAAATGTACAAAATCAAACGCCCTACAAATCATGTGTGTGAAGAAGAATAATAACTAACATAT
Use this sequence to carry out the following:
Conduct a blastx search (BLAST) of the Swiss-Prot database to identify regions in this sequence that encode amino acids with similarity to known proteins in the database. You can get to the blastx software through Expasy: http://www.expasy.org/sprot/.
Write down the answers to the following questions.
B1. What does the blastx algorithm do with your nucleotide sequence before searching SwissProt for matches?
B2. From the blastx output, to what protein does this region of genomic DNA have significant similarity?
B3. How can you tell that the coding regions for the amino acids within the matched protein are not located within a single contiguous region of the genomic DNA? (There is more than one way to tell.)
B4. How many separate regions of the genomic DNA align with the highest scoring match in the output?
B5. What essential feature of the organization of the gene does the above information provide?
B6. Note the numbering of the sequences in the alignments. Does the genomic (query) sequence progress in the same direction as the database amino acid (subject) sequence in the alignments? In other words, is it in the same orientation (below):
1.................................114 = query
61...............................98 = subject
or opposite orientation (below):
1.................................114 = query
98...............................61 = subject
B7. What does the orientation of the sequences in the alignment relative to each other tell you about the gene orientation relative to the sequence that was used as the query sequence?
Global and Local Alignment - Workshop
By hand, perform local alignment on the following two sequences:
Use the BLOSUM45 matrix for scoring with the default gap penalty value of -5. Determine the highest path score and the percent similarity for the local alignment with the highest path score.
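To check a hand-worked local alignment, the dynamic-programming procedure (Smith-Waterman with a linear gap penalty) can be sketched in Python. This is a sketch under stated assumptions: the function names are hypothetical, the example sequences are made up because the workshop sequences are not reproduced here, and `toy_score` is a simple match/mismatch stand-in, not the BLOSUM45 matrix. To follow the exercise, replace `toy_score` with a lookup into BLOSUM45.

```python
def smith_waterman(a, b, score, gap=-5):
    """Local alignment with a linear gap penalty.

    score(x, y) returns the substitution score for residues x and y.
    Returns (best_score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; cells never drop below zero in local alignment.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            up = H[i - 1][j] + gap
            left = H[i][j - 1] + gap
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


def toy_score(x, y):
    # Hypothetical stand-in for a BLOSUM45 lookup: +5 match, -4 mismatch.
    return 5 if x == y else -4


# Made-up example sequences, not the ones from the workshop.
best_score, aligned_a, aligned_b = smith_waterman("TTHEAGTT", "CCHEAGCC", toy_score)
print(best_score)
print(aligned_a)
print(aligned_b)
```

The highest value in the matrix is the highest path score; the traceback from that cell gives the aligned region from which percent similarity can be counted.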
Multiple Sequence Alignment - Workshop
Workshop A
Obtain the following protein sequences from any public database you wish and align them using the CLUSTALW program. The protein sequences are: Human MDM2, Hamster MDM2, Murine MDM2, Xenopus MDM2 and Zebrafish MDM2. The human MDM2 sequence is approximately 490 amino acids in length. Name three areas that are structurally conserved amongst these orthologs.
According to the Guide Tree, which two sequences have the highest similarity?
Perform CLUSTALW just using Human MDM2 and Zebrafish MDM2 sequences. Does the human/zebrafish alignment in this run differ from the human/zebrafish alignment obtained in the first run?
Explain why.
Workshop B
There exists a paralog of MDM2. Obtain the sequences of the human paralog MDMX and the mouse paralog MDMX and perform multiple sequence alignment again together with the original five MDM2 sequences. Give the domains (in amino acid number ranges) that are highly conserved within sequences of this entire family. Use the human MDM2 amino acid numbers as the reference when explaining the ranges that are conserved.
Workshop C
The quagga was an African animal that is now extinct. It looked partly like a horse and partly like a zebra. In 1872, the last living quagga was photographed. More recently, mitochondrial DNA was obtained from a museum quagga specimen and sequenced. Perform a multiple sequence alignment of quagga (Equus quagga boehmi), horse (Equus caballus), and zebra (Equus burchelli) mitochondrial DNA. To which animal was the quagga more closely related?
Protein Structure Prediction - Workshop
Workshop A: Check whether the BLIMPS program in the BLOCK Searcher can predict the function of PTEN (NP_000305). PTEN is an abbreviation for a protein called the phosphatase and tensin homolog. Obtain the protein sequence from the protein database at NCBI. Convert the sequence to FASTA format. Paste the sequence into the window in the BLOCK Searcher (http://blocks.fhcrc.org/blocks_search.html). Determine the major function based on the BLOCK Searcher output. Find out the actual function of PTEN by performing a text search for PTEN in the OMIM database. Did the BLOCK Searcher help assess the function of PTEN?
Workshop B: Find the complete amino acid sequence of human p53 and perform a secondary structure prediction with Psi-PRED, GOR, Chou-Fasman, or another secondary structure prediction algorithm.
Workshop C: Calculation of the Q3 value of a secondary structure prediction program. Go to the Protein Data Bank and obtain the record for the p53 crystal structure (1TSR). There are three identical p53 polypeptides in the record, named A, B and C. Choose one of the polypeptides for this exercise. In the remarks section of the record you will find an assignment of secondary structure for many of the amino acids, labeled either "helix" or "sheet". For amino acids in the structure that were not assigned to the "helix" or "sheet" class, assume that they adopt a "coil" structure. Create a line graph that places the amino acid sequence in one row and the known secondary structure from the PDB record for each amino acid in the next row. Next, use the predicted structure from Workshop B to create a third row that shows the predicted structure. The 1TSR file only contains the DNA binding domain of p53, so you will only be able to cover about half of the protein. If you can, obtain other portions of p53 whose structure has been solved from the Protein Data Bank (in different records) and fill in those regions of the second row that were not covered by the 1TSR record. Show the instructor the line figure and calculate the percent accuracy of the Psi-PRED prediction. A hypothetical example is shown below.
Percent accuracy: 14/15 × 100 = 93.3%
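The percent accuracy (Q3) is the fraction of residues whose predicted state (helix, sheet, or coil) matches the observed state, multiplied by 100. A minimal Python sketch follows; the function name `q3_accuracy` and the one-letter codes H, E, and C are assumptions, and both strings are made-up 15-residue examples chosen so that 14 of 15 positions agree, not real p53 data.

```python
def q3_accuracy(observed, predicted):
    """Percent of positions where the predicted state equals the observed state."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(1 for o, p in zip(observed, predicted) if o == p)
    return 100.0 * correct / len(observed)


# Hypothetical states: H = helix, E = sheet, C = coil.
observed = "CCHHHHHHCCEEEEC"   # from the structure record (made up here)
predicted = "CCHHHHHHCCEEECC"  # from a prediction program (made up here)

q3 = q3_accuracy(observed, predicted)
print(round(q3, 1))
```

Only residues present in both the structure record and the prediction should be compared, so the two strings need to be trimmed to the same residue range first.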
References
http://adonis.creighton.edu/hsl/Searching/Medline-Fields.html
http://cmgm.stanford.edu/classes/csuh/literature/
http://hml.org/WWW/class/help/medcite.html
http://www.nlm.nih.gov/mesh/meshhome.html
Baxevanis and Ouellette, Bioinformatics, Wiley-Interscience, New York, 2001
http://www.infobiogen.fr/doc/dotter.html
Segurado et al., EMBO Reports, 4 1048-1053, 200
http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
Pevsner, J., Bioinformatics and Functional Genomics, Wiley-Liss, Hoboken, NJ, 2003
Baxevanis and Ouellette, Bioinformatics 2nd Ed, Wiley-Interscience, New York, 2001
Misener and Krawetz, Bioinformatics Methods and Protocols, Humana Press, Totowa, NJ, 2000
Pevsner, Bioinformatics and Functional Genomics, Wiley-Liss, Hoboken, 2003.
Baxevanis and Ouellette, Bioinformatics, Wiley-Interscience, New York, 1998.
Feng and Doolittle, J. Mol. Evol. 25, 351-360, 1987.
Thompson et al., Nuc. Acids Res. 22, 4673-4690, 1994.