Curriculum 2008
| Lecture Title | Date & Time | html | .ppt | Workshop | References |
| Program Overview | Mon, 6/16 9:00 AM | view | download | | |
| Molecular Life Science Review | Mon, 6/16 1:00 PM | | | | |
| Python I | Mon, 6/16 1:00 PM | view | download | | |
| Literature Databases | Tues, 6/17 9:00 AM | view | download | | |
| Sequence Comparisons | Tues, 6/17 1:00 PM | view | download | workshop | |
| Python II | Wed, 6/18 9:00 AM | view | download | | |
| Python III | Wed, 6/18 1:00 PM | view | download | | |
| Database Searching (Scoring Matrices) | Thurs, 6/19 9:00 AM | view | download | workshop | reference |
| Professional Development | Thurs, 6/19 1:00 PM | | | | |
| Sequence Databases | Fri, 6/20 1:00 PM | view | download | workshop | |
| Statistics I | Mon, 6/23 9:00 AM | link | | | |
| Statistics II | Mon, 6/23 1:00 PM | link | | | |
| Statistics III | Tues, 6/24 9:00 AM | link | | | |
| Applications of Statistics | Tues, 6/24 1:00 PM | view | download | workshop | |
| Longest Common Substring Algorithm (LCS) | Wed, 6/25 9:00 AM | view | download | | |
| Python IV | Wed, 6/25 1:00 PM | view | download | | |
| Global and Local Alignment (Sequence Alignment Algorithms) | Thurs, 6/26 9:00 AM | view | download | workshop | |
| Space Efficient Alignment Algorithms | Thurs, 6/26 1:00 PM | view | download | | |
| BLAST and MSA | Fri, 6/27 9:00 AM | view | download | workshop | references |
| Protein Structure Prediction | Mon, 6/30 9:00 AM | view | download | workshop | references |
| Ethics of the Human Genome | Mon, 6/30 1:00 PM | | | | |
| Protein Structure Manipulation | Tues, 7/1 9:00 AM | view | download | workshop | references |
| Proteome Analysis | Tues, 7/1 1:00 PM | view | download | workshop | references |
| Microarrays Session 1 | Wed, 7/2 | view | download | workshop | |
| Microarrays Gene Ontology | Wed, 7/2 | view | download | | |
| Solexa Sequencing | Wed, 7/2 | view | download | | |
| Microarrays Clustering | Wed, 7/2 | view | download | | |
| Sequence Alignment Project Programming | Thurs, 7/3 9:00 AM | | | | |
Workshop
Sequence Comparisons - Workshop
1. Consider the sequence
GAACTCATACGAATTCACGTCAGCCCATCGTGCCACGT
Use a window of 3 nucleotides and slide the window 1 nucleotide at a time. Calculate the %G+C as a function of nucleotide number. You may use an Excel spreadsheet to create a plot. Change the window to 5 nucleotides and create a second plot. Overlay the two plots. Show your instructor the spreadsheet and the graph.
2. Given the following sequence: PLSQETFSDLWKLLPENNVLSP use the Kyte/Doolittle Hydropathy scale and a sliding window of 7 amino acids to construct a hydropathy plot.
3. Find the protein sequence for bacteriorhodopsin. Make sure you obtain the full-length sequence. Find the Kyte-Doolittle hydropathy program (TGREASE) at the ExPASy Tools website. Perform a Kyte-Doolittle analysis of bacteriorhodopsin. Compare the plot to the one displayed in lecture today. Are there differences between the two plots? If so, why?
4. Import the human p53 (Accession number AAH03596) and squid p53 (Accession number AAA98563) sequences from the protein databases at NCBI onto your hard drive in FASTA format. This can be accomplished by changing the display format on the ENTREZ screen to FASTA. Highlight the entry and copy it to the clipboard. Open NotePad on your local hard drive. Paste each sequence into a separate document and save them in a folder named "sequence" on the C drive. Name the documents p53_human and p53_squid.
Type dotter c:\sequence\p53_human.txt c:\sequence\p53_human.txt RETURN
Do you detect some parallel lines? Why? What does the greyramp tool do?
Capture the image and save it.
Type dotter c:\sequence\p53_human.txt c:\sequence\p53_squid.txt RETURN
What are the differences between the human vs. human comparison and the human vs. squid comparison?
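The sliding-window calculations in exercises 1 and 2 can also be checked with a short script instead of a spreadsheet. The following is a minimal sketch, not part of the original handout: the function names are made up for illustration, the %G+C value is reported at the first nucleotide of each window, and the hydropathy value is reported at the centre residue of each window (both are plotting conventions assumed here). The hydropathy values are the standard Kyte-Doolittle scale.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_windows(seq, size, step=1):
    """Yield (start_position, window) pairs; positions are 1-based."""
    for i in range(0, len(seq) - size + 1, step):
        yield i + 1, seq[i:i + size]


def gc_percent_profile(dna, size):
    """Percent G+C per window, reported at the window's first nucleotide."""
    return [
        (pos, 100.0 * sum(base in "GC" for base in window) / size)
        for pos, window in sliding_windows(dna.upper(), size)
    ]


def hydropathy_profile(protein, size=7):
    """Mean Kyte-Doolittle score per window, reported at the centre residue."""
    return [
        (pos + size // 2, sum(KYTE_DOOLITTLE[aa] for aa in window) / size)
        for pos, window in sliding_windows(protein.upper(), size)
    ]


# Sequences taken from exercises 1 and 2.
dna = "GAACTCATACGAATTCACGTCAGCCCATCGTGCCACGT"
protein = "PLSQETFSDLWKLLPENNVLSP"

gc3 = gc_percent_profile(dna, 3)
gc5 = gc_percent_profile(dna, 5)
hydro7 = hydropathy_profile(protein, 7)

for pos, value in gc3:
    print(f"{pos}\t{value:.1f}")
```

The printed columns are tab-separated, so they can be pasted into Excel and plotted; printing `gc5` and `hydro7` the same way gives the data for the second %G+C plot and for the hydropathy plot.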
Database Searching - Scoring Matrices - Workshop
A. Download PAM250 and PAM40 from the internet and print them out. What are the differences between the two matrices? Why do you see these differences?
B. Download BLOSUM80 and BLOSUM30. What are the differences between the two matrices? Why do you see these differences?
C. Obtain the mouse p53 sequence and compare it to human p53 with the Dotter program. According to your analysis, do you detect more similarity with human vs. mouse or human vs. squid? Are some subregions within the human p53 protein more conserved than others?
D. The p53 protein is known to have a certain number of conserved domains. Use the Dotter program and a series of p53 proteins from different species (at least 5 proteins) to determine the number of conserved domains and the boundaries of the conserved domains. In reporting the boundaries, use the human sequence number as the standard.
E. Create your own scoring matrix that shows fairly good results with simple sequences using the Dotter program. For example, you may choose to create a scoring matrix that gives high marks only for similarities between charged residues and low marks for other amino acid similarities. An easy way to do this is to use one of the BLOSUM or PAM matrices as a template and change the numbers slightly. Show that your scoring matrix works in the Dotter program for some simple polypeptides that are 50 amino acids in length (you may choose your own sequences). Explain the purpose of your scoring matrix and justify the numbers you choose.
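Before loading a custom matrix into Dotter for exercise E, it may help to prototype the idea in a few lines of Python. The sketch below is only an illustration under stated assumptions: it does not reproduce Dotter's matrix file format or its scoring algorithm, the score values (5, 3, 1, -1) are arbitrary placeholders, and the function names and test peptides are made up. It builds a matrix that favours charged residues and then scores every ungapped window pair, which is roughly what a dot plot displays along its diagonals.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ACIDIC = set("DE")
BASIC = set("KR")


def build_charge_matrix(identical_charged=5, same_charge=3,
                        other_identity=1, mismatch=-1):
    """Toy matrix rewarding charged residues; all values are placeholders."""
    matrix = {}
    for a in AMINO_ACIDS:
        for b in AMINO_ACIDS:
            if a == b:
                charged = a in ACIDIC or a in BASIC
                score = identical_charged if charged else other_identity
            elif {a, b} <= ACIDIC or {a, b} <= BASIC:
                score = same_charge      # D/E or K/R substitution
            else:
                score = mismatch
            matrix[a, b] = score
    return matrix


def window_scores(seq_a, seq_b, matrix, window=5):
    """Sum matrix scores over every ungapped window pair (0-based starts)."""
    scores = {}
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            scores[i, j] = sum(
                matrix[seq_a[i + k], seq_b[j + k]] for k in range(window)
            )
    return scores


matrix = build_charge_matrix()

# Made-up test peptides (much shorter than the 50 residues the exercise asks for).
peptide_a = "GADEKRAG"
peptide_b = "GAEDRKAG"

scores = window_scores(peptide_a, peptide_b, matrix, window=5)
best = max(scores, key=scores.get)
print("best window pair:", best, "score:", scores[best])
```

Changing the four score parameters and rerunning shows quickly whether the matrix separates charged from uncharged regions the way you intend.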
Applications of Statistics - Workshop
Download Files:
Aspirin Study Analysis (xls)
Aspirin Study Analysis 2 (xls)
Aspirin Study Analysis 2 (xlsx)
Aspirin Study Data (csv)
Chitest (csv)
Clinical Trials Analysis (pdf)
Clinical Trials Analysis 2 (pdf)
t_Test (csv)
http://homepage.mac.com/rmjohnston/FileSharing6.html
Note:
The .xlsx Excel file requires Office 2007 (PC) or Office 2008 (Mac).
If you have earlier versions of Excel you only need to download the .xls files.
BLAST and MSA - Workshop
Workshop A
Use Entrez to find the C-terminal region (approximately 215 residues) of human BRCA1 (SWISS-PROT accession number P38398). Search the NR protein database with this sequence using PSI-BLAST. Save your search results. Now perform a second iteration. Compare your new search results. Some sequence alignments from the second search have higher scores than the same sequence alignments obtained from the first search. Why? Alternatively, some sequence alignments from the second search have lower scores than the same sequence alignments obtained from the first search. Why?
Workshop B
Obtain the following protein sequences from any public database you wish and align them using the CLUSTALW program. The protein sequences are: Human MDM2, Hamster MDM2, Murine MDM2, Xenopus MDM2 and Zebrafish MDM2. The human MDM2 sequence is approximately 490 amino acids in length. Name three areas that are structurally conserved amongst these orthologs.
According to the Guide Tree, which two sequences have the highest similarity?
Perform CLUSTALW just using Human MDM2 and Zebrafish MDM2 sequences. Does the human/zebrafish alignment in this run differ from the human/zebrafish alignment obtained in the first run?
Explain why.
Workshop C
There exists a paralog of MDM2. Obtain the sequences of the human paralog MDMX and the mouse paralog MDMX and perform multiple sequence alignment again together with the original five MDM2 sequences. Give the domains (in amino acid number ranges) that are highly conserved within sequences of this entire family. Use the human MDM2 amino acid numbers as the reference when explaining the ranges that are conserved.
Workshop D
The quagga was an African animal that is now extinct. It looked partly like a horse and partly like a zebra. In 1872, the last living quagga was photographed. More recently, mitochondrial DNA was obtained from a museum quagga specimen and sequenced. Perform a multiple sequence alignment of quagga (Equus quagga boehmi), horse (Equus caballus), and zebra (Equus burchelli) mitochondrial DNA. To which animal was the quagga more closely related?
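Once a multiple sequence alignment has been produced, pairwise percent identity is one simple way to put a number on "more closely related". The sketch below assumes the aligned sequences have been copied out of the alignment output as equal-length strings with "-" for gaps. The three short sequences and their labels are invented placeholders, not real mitochondrial DNA, and identity is counted only over columns where neither sequence has a gap (one of several possible conventions).

```python
from itertools import combinations


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical columns, ignoring columns with a gap in either row."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Invented aligned fragments for illustration only; replace them with the
# rows of your own alignment.
alignment = {
    "seq_1": "ACGT-ACGTA",
    "seq_2": "ACGTTACGTA",
    "seq_3": "ACCT-AGGTA",
}

for name_a, name_b in combinations(alignment, 2):
    identity = percent_identity(alignment[name_a], alignment[name_b])
    print(f"{name_a} vs {name_b}: {identity:.1f}% identity")
```

The pair with the highest identity should agree with the closest pair in the guide tree; if it does not, check how gaps are being handled.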
Protein Structure Prediction - Workshop
Workshop A
Find the complete amino acid sequence of human p53 and perform a secondary structure prediction with PSIPRED or another secondary structure prediction algorithm. Have the results emailed to you or displayed on your computer. Explain the results to the instructor.
Workshop B
Check to see if the BLIMPS program in the BLOCK Searcher can predict the function of PTEN (protein sequence accession number NP_000305). PTEN is an abbreviation for phosphatase and tensin homolog. Obtain the sequence from the protein database at NCBI. Convert it to FASTA format. Paste the sequence into the window in the BLOCK Searcher (http://blocks.fhcrc.org/blocks/blocks_search.html). Determine the major function based on the BLOCK Searcher output. Determine the actual function of PTEN by performing a text search for PTEN in the OMIM database. Did the BLOCK Searcher correctly predict the function of PTEN?
Workshop C
Calculation of the Q3 value of a secondary structure prediction program. The Q3 value is the percentage of residues whose three-state assignment (helix, sheet or coil) is predicted correctly. Go to the Protein Data Bank and obtain the record for the p53 crystal structure (1TSR). There are three identical p53 polypeptides in the record, named A, B and C. Choose one of the polypeptides for this exercise. In the remarks section of the record you will find an assignment of secondary structure for many of the amino acids, labeled either "helix" or "sheet". For amino acids in the structure that were not assigned to the "helix" or "sheet" class, assume that they adopt a "coil" structure. Create a line graph that places the amino acid sequence in one row and the known secondary structure of each amino acid, taken from the PDB record, in the next row. Next, use the predicted structure from Workshop A. Create a third row on the line graph that shows the predicted structure. The 1TSR file only contains the DNA binding domain of p53, so you will only be able to cover about half of the protein. If you can, obtain other portions of p53 whose structure has been solved from the Protein Data Bank (in different records) and fill in those regions of the second row that were not covered by the 1TSR record. Show the instructor the line figure and calculate the percent accuracy of the PSIPRED prediction. A hypothetical example is shown below.
Sequence: MEETHAPYRGVCNNM
Actual Structure: CCCCCHHHHHHEEEE
PSIPRED Predict.: CCCCCHHHHHHEEEH
Percent accuracy: 14/15 X 100
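The percent accuracy in the hypothetical example can be computed with a few lines of Python once the actual and predicted structures are written as strings of H, E and C. This is a minimal sketch; the function name is made up, and it assumes the two strings are already aligned residue for residue.

```python
def q3_accuracy(actual, predicted):
    """Percent of positions where the predicted state equals the actual state."""
    if len(actual) != len(predicted):
        raise ValueError("structure strings must have the same length")
    if not actual:
        raise ValueError("structure strings must not be empty")
    matches = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * matches / len(actual)


# Strings from the hypothetical example above.
actual = "CCCCCHHHHHHEEEE"
predicted = "CCCCCHHHHHHEEEH"

print(f"Q3 = {q3_accuracy(actual, predicted):.1f}%")
```

For the real exercise, replace the two strings with the PDB-derived assignment and the PSIPRED output for the same stretch of p53.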
Protein Structure Manipulation - Workshop
Download the structure coordinates for the 1HEW protein from the PDB onto your hard drive.
Follow the tutorial for viewing protein structures at:
http://www.usm.maine.edu/~rhodes/SPVTut/text/SPdbVTut.html
You can start the tutorial at Section 2--Windows and help.
If you are already advanced in the manipulation of protein structures you may attempt to predict a 3D model of the protein given a primary sequence.
http://expasy.org/spdbv/text/modeling.htm
A more up-to-date SWISS-MODEL site is located at:
http://swissmodel.expasy.org/workspace/index.php?func=modelling_overview&userid=USERID&token=TOKEN
Proteome Analysis - Workshop
Choose a disease-related protein that you studied. Obtain its amino acid sequence. Estimate its location on a 2-D-gel with the following tool:
http://www.expasy.ch/cgi-bin/2dregion-for-seq.pl
Can you justify its position on the gel? Show the instructor. Perform an in silico trypsin-mediated digest of your protein. Obtain the monoisotopic masses of the peptides and submit the data to the Mascot server. Determine the minimum number of peptide masses necessary to give a correct identification of your protein. Is there a relationship between the number of peptide masses used and the accuracy of the prediction? How many significant figures are needed to get a positive identification? Is there an optimal mass range for positive identification?
Repeat this exercise using Protein Prospector. Which software program is better at retrieving your results?
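The in silico trypsin digest described above can be sketched in Python to see where the peptide masses come from. The sketch makes several simplifying assumptions that are not in the handout: trypsin cleaves after K or R except when the next residue is P, there are no missed cleavages and no modifications, and masses are reported for the neutral peptide (add 1.007276 for a singly protonated ion). The function names and the test sequence are made up for illustration.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # one H2O is added per free peptide


def trypsin_digest(protein):
    """Cleave after K or R unless followed by P; no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONOISOTOPIC[residue] for residue in peptide) + WATER


# Made-up test sequence; replace it with your disease-related protein.
protein = "MAKRPGKLRSTEK"

for peptide in trypsin_digest(protein):
    print(f"{peptide}\t{monoisotopic_mass(peptide):.4f}")
```

Rounding the printed masses to fewer decimal places before submitting them is one way to explore the question about significant figures.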
References
Database Searching - Scoring Matrices - Reference
CSULA online manual chapter 4
http://www.calstatela.edu/faculty/jmomand/Bioinformatics%20Manual.pdf
Pevsner, Bioinformatics and Functional Genomics, Wiley-Liss, Hoboken, 2003.
Baxevanis and Ouellette, Bioinformatics, Wiley-Interscience, New York, 1998.
Feng and Doolittle, J. Mol. Evol. 25, 351-360, 1987.
Thompson et al., Nuc. Acids Res. 22, 4673-4690, 1994.
Protein Structure Prediction - References
Moult, J., Predicting Protein Three-Dimensional Structure, Current Opinion in Biotechnology 10, 583-588, 1999.
Chou, P.Y. and Fasman, G.D. Prediction of protein conformation. Annu. Rev. Biochem. 47, 251-276, 1978.
White, F.H. and Anfinsen, C.B. Some relationships of structure to function in ribonuclease. Ann. N.Y. Acad. Sci. 81: 515-523, 1959.
Jones, D.T., Protein secondary structure prediction based on position-specific scoring matrices, J. Mol. Biol. 292: 195-202, 1999.
Protein Structure Manipulation - References
http://swissmodel.expasy.org//course/text/chapter6.htm
Wang et al., Nucleic Acids Research 28, 243-245, 2000.
http://www.iucr.org/iucr-top/comm/ccom/School96/pdf/sb.pdf
http://www.usm.maine.edu/~rhodes/SPVTut/index.htm
http://swissmodel.expasy.org/workspace/index.php?func=modelling_overview&userid=USERID&token=TOKEN
Proteome Analysis - References
Alberts et al. Molecular Biology of the Cell, 3rd edn.
http://www.davidson.edu/academic/biology/courses/Molbio/SDSPAGE/SDSPAGE.html
http://www.matrixscience.com/cgi/search_form.pl?FORMVER=2&SEARCH=MIS
http://prospector.ucsf.edu/