|
1
|
|
|
2
|
- References for Days 1, 2, and 3 and the optional lecture are found at
the end of the optional PowerPoint.
They are not in alphabetical order.
|
|
3
|
- To become familiar with the wet-bench basics of 2-color DNA microarrays
for expression analysis
- In the context of 2-color expression arrays, to use GeneSpring for
selection of differentially expressed genes and for cluster analysis
- To identify points at which uncertainties and inaccuracies may arise in
2-color expression analysis
- In the context of 2-color expression arrays, to use EXCEL for processing
of primary data
- Optional
- To work with annotation of microarray data (optional)
- To work with a major microarray database (optional)
- To become aware of several additional applications for microarray
technology (optional)
- SNP analysis
- Transcription factor binding site analysis
- Protein analysis
- Cell and Tissue analysis
|
|
4
|
- Day 1
- Lecture
- Principles of expression microarray
- Workshop
- Introduction to GeneSpring
- Day 2
- Lecture
- Principles of major clustering algorithms
- Workshop
- Using GeneSpring for cluster analysis
- Recognizing and dealing with variability in acquired data
- Day 3
- Lecture
- Sources of uncertainty and error in microarray analysis
- Workshop
- Using EXCEL to process primary data and minimize error
- Optional
- Lecture
- Additional applications of microarray
- Workshop
- Annotation, Stanford Microarray
Database, Motif Searching (optional)
|
|
5
|
- Post-Genomic Era
- Analysis of gene expression then and now
- Principles of microarray (in the context of 2-color arrays)
- Overview
- Printing the array
- Data acquisition
- Data analysis (filtering and clustering)
- Making hypotheses about biological function
|
|
6
|
- Post-genomic = after the sequencing of the genome
- Functional genomics = the study of the genome in order to determine
- coordinated expression of mRNAs for all the genes in the genome
- the control regions of the genes
- the as yet unknown function of additional expressed but non-coding DNA
regions (e.g., template for micro RNAs?)
|
|
7
|
- Proteomics = the identification of all protein products of each gene and
the interactions of all proteins in the cell (leading to systems
biology)
|
|
8
|
- Expression microarrays are used to determine the identity and relative
quantities of each of the different kinds of (m)RNAs expressed by a cell
type of interest in a “single experiment”.
- Examples from your workshops
- What is expressed in tumor as opposed to normal cells?
- What is expressed over the period of development of a given tissue or
cell type (myogenesis)?
- AND WHY MIGHT WE CARE?
|
|
9
|
- THEN AND NOW
- Hypothesis
- My favorite mRNA is expressed in tumor tissue, but not in normal
tissue.
- What you need to know
- The nucleotide sequence of your favorite mRNA
- How to isolate total (m)RNA from normal and tumor tissues
- How to perform an mRNA blot hybridization analysis
- NOW
- Discovery
- Which genes are expressed in tumor tissue but not in normal tissue?
- What you need to know
- The nucleotide sequence of every possible mRNA
- How to isolate total mRNA from normal and tumor tissues
- How to perform and analyze an expression microarray experiment
|
|
10
|
|
|
11
|
|
|
12
|
- Microarrays (CHIPs) allow you to study the expression of thousands of
genes at once.
- Each chip has 10,000 - 100,000 distinct spots.
- Each spot has multiple copies of a sequence that represents a specific
gene and thereby its mRNA.
- Some sequences are known to represent mRNAs from databases of ESTs or
from predicted genes, but as yet have no identified function.
- Some chips “tile” across the entire length of a genome or chromosome,
thereby identifying transcripts coded for by template lacking the
classical structure of a gene.
|
|
13
|
|
|
14
|
- The Wet Experiment
- Requires printed slide(s) = CHIPs
- Requires specialized equipment and software
- Requires fluorescently labeled sample(s) representing total mRNAs from
the sources of interest
- Requires a hybridization or annealing period
- Data Acquisition – Where has hybridization occurred?
- Requires a scanner and associated software
- Data Processing (Extraction, Storage, and Normalization)
- Requires scanner software and storage in a tab delimited file
- Requires EXCEL or similar software
- Data Analysis and Storage
- Requires software for cluster and other types of analysis
- Assigning or hypothesizing regarding function
- Utilizes cluster analysis together with databases that have annotated
genes with known function
|
|
15
|
|
|
16
|
|
|
17
|
|
|
18
|
|
|
19
|
- http://www.bio.davidson.edu/courses/genomics/chip/chip.html
|
|
20
|
- Print it onto the slide – PCR and long oligos
- Synthesize it on the slide by photolithography – somewhat shorter oligos
|
|
21
|
|
|
22
|
- What information must be built into a program in order to automate
printing?
|
|
23
|
|
|
24
|
- Make an image of the slide by
- Exciting the fluorescent dyes
- Collecting the emitted light
- Generating digital images of the fluorescent signal and
- Quantifying the intensity of the signal
- Two major types
- Laser excitation with a photomultiplier tube (PMT) detector - faster
- Filtered white-light excitation with a charge-coupled device (CCD)
detector
|
|
25
|
|
|
26
|
|
|
27
|
|
|
28
|
|
|
29
|
- Tab Delimited File Output from Scanner
- Save as EXCEL file
- Notice the LogRatio column (log base 2 of the red/green ratio)
- 1 = 2X more mRNA than standard of comparison (red = 2X green)
- 2 = 4X more
- -1 = ½ (or 2X less; red = ½ green)
- -2 = ¼
- Log plot (GeneSpring) or a plot of logs
- More on Thursday – Lecture and EXCEL workshop
- (eliminate bad data; correct for background; correct for technical
variation within a slide; correct for technical variation between
replicate slides; determine the ratio of R to G for each gene)
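The LogRatio values listed above behave like base-2 logarithms of the red-to-green intensity ratio. The following is a minimal sketch in Python, assuming hypothetical spot intensities and a simple background subtraction; the function name `log_ratio`, the gene names, and all numbers are illustrative and do not come from the scanner output.

```python
import math

def log_ratio(red, green, red_bg=0.0, green_bg=0.0):
    """Return log2 of the background-corrected red/green ratio.

    Returns None when either corrected signal is not positive,
    because the log ratio is undefined for such spots.
    """
    r = red - red_bg
    g = green - green_bg
    if r <= 0 or g <= 0:
        return None
    return math.log2(r / g)

# Hypothetical (red, green) intensities for four spots, illustrative only.
spots = {
    "geneA": (2000, 1000),  # red = 2X green
    "geneB": (4000, 1000),  # red = 4X green
    "geneC": (500, 1000),   # red = 1/2 green
    "geneD": (250, 1000),   # red = 1/4 green
}

ratios = {name: log_ratio(r, g) for name, (r, g) in spots.items()}
for name, value in ratios.items():
    print(name, value)
```

Working in logs makes 2X up and 2X down symmetric around zero (+1 and -1), which is one reason plots of logs are easier to read than plots of raw ratios.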
|
|
30
|
- Selecting differentially expressed genes
|
|
31
|
|
|
32
|
- Places genes with similar expression patterns in groups.
- Sometimes genes of unknown function will be grouped with genes of known
function.
- The functions that are known allow the investigator to hypothesize
regarding the functions of genes not yet characterized.
- Examples:
- Identify genes expressed in a specific tumor type
- Identify genes coordinately involved in a developmental pathway
- Identify genes involved in a disease response
- Identify genes important in cell cycle regulation
- Identify genes that participate in a biosynthetic pathway
- Identify genes involved in a drug response
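The idea of grouping genes by similar expression pattern can be sketched with a simple similarity measure. The example below is an assumption-laden illustration, not GeneSpring's method: it uses Pearson correlation between hypothetical log-ratio profiles to find which gene of known function most resembles an uncharacterized gene. All gene names and values are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical log-ratio profiles across four time points.
known = {
    "cell_cycle_gene": [1.0, 2.0, 1.0, -1.0],
    "biosynthesis_gene": [-1.0, -2.0, 0.0, 2.0],
}
unknown_profile = [0.4, 1.6, 0.6, -1.4]

# Pick the known gene whose profile correlates best with the unknown one.
best_match = max(known, key=lambda name: pearson(known[name], unknown_profile))
print(best_match, round(pearson(known[best_match], unknown_profile), 3))
```

A high correlation only suggests a hypothesis about function; it does not establish it.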
|
|
33
|
|
|
34
|
|
|
35
|
- Identify over-represented functional categories in the clusters (i.e.,
cluster contains many more genes known to be involved in a specific
biological process than would be expected by chance)
- Requirements for systematic analysis:
- Controlled vocabulary for describing biological processes (protein
biosynthesis/translation, apoptosis/programmed cell death)
- Standard assignment of genes into functional categories
- Gene Ontology (GO) project of the Gene Ontology Consortium
|
|
36
|
- Purpose:
- 1) Define controlled terms
(ontologies) for description of gene products from 3 aspects:
- Biological process (DNA repair, mitosis)
- Molecular function (protein serine/threonine kinase activity,
transcription factor activity)
- Cellular component (nucleus, ribosome)
- 2) Establish a unified framework for organism-independent gene
annotation
- Characteristics:
- 1) A gene can have multiple
associations in each ontology
- 2) GO terms are organized in
hierarchical structures called directed acyclic graphs (DAGs)
- The most general classifications are at top levels of the graph
- More specialized classifications are at lower levels
|
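The directed acyclic graph structure can be illustrated with a small sketch. The fragment below is hypothetical and heavily simplified: the term names and parent links are chosen for illustration and are not exact GO identifiers or relationships. It shows how a term can have more than one parent and how all more general terms above it can be collected.

```python
# Hypothetical, simplified fragment of a GO-like graph.
# Each term maps to its parent terms; a term may have several parents,
# which is what distinguishes a DAG from a simple tree.
parents = {
    "biological_process": [],
    "cellular process": ["biological_process"],
    "metabolic process": ["biological_process"],
    "DNA metabolic process": ["metabolic process"],
    "response to DNA damage": ["cellular process"],
    "DNA repair": ["DNA metabolic process", "response to DNA damage"],
}

def ancestors(term, graph):
    """Return the set of all more general terms reachable from `term`."""
    seen = set()
    stack = list(graph[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen

dna_repair_ancestors = ancestors("DNA repair", parents)
print(sorted(dna_repair_ancestors))
```

When counting genes per category, a gene annotated to a specialized term is commonly also counted toward each of that term's ancestors, which is why the traversal matters for enrichment analysis.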
|
37
|
|
|
38
|
- Human
- Entrez http://www.ncbi.nih.gov/entrez/query.fcgi?db=gene
- GOA http://www.ebi.ac.uk/GOA/
- Mouse – Mouse Genome Informatics
(MGI)
- http://www.informatics.jax.org/
- Rat – Rat Genome Database
- Fly – FlyBase
- http://flybase.bio.indiana.edu/
- Arabidopsis – TAIR
- http://www.arabidopsis.org/
- Yeast – Saccharomyces Genome Database
- http://www.yeastgenome.org/
- Affymetrix chips – Netaffx
- http://www.affymetrix.com
|
|
39
|
|
|
40
|
- In the previous example:
- Total number of chip’s genes with annotation = 5000
- Total number of chip’s genes associated with metabolism GO category =
3,600 (72%)
- Number of annotated genes in cluster 3 = 73
- Number of metabolic genes in cluster 3 = 50 (68%)
- Is it reasonable to assume that genes in Cluster 3 are enriched for
metabolic function?
- Statistical tests are essential to determine whether enrichment of a
certain class of proteins is significant
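One common way to ask this question is a hypergeometric (one-sided Fisher-type) test; the slides do not name a specific test, so this choice is an assumption. The sketch below uses the numbers from this slide and computes the probability of seeing 50 or more metabolic genes in a cluster of 73 drawn at random from the chip's 5,000 annotated genes. The function name `hypergeom_upper_tail` is illustrative.

```python
from math import comb

def hypergeom_upper_tail(total_genes, category_genes, cluster_size, observed):
    """P(X >= observed) when `cluster_size` genes are drawn without
    replacement from `total_genes`, of which `category_genes` belong
    to the category of interest."""
    denominator = comb(total_genes, cluster_size)
    upper = min(cluster_size, category_genes)
    numerator = sum(
        comb(category_genes, i) * comb(total_genes - category_genes, cluster_size - i)
        for i in range(observed, upper + 1)
    )
    return numerator / denominator

# Numbers taken from the slide above.
total_annotated = 5000
metabolic_on_chip = 3600
cluster_annotated = 73
metabolic_in_cluster = 50

# Number of metabolic genes expected in the cluster by chance alone.
expected = cluster_annotated * metabolic_on_chip / total_annotated
p_value = hypergeom_upper_tail(
    total_annotated, metabolic_on_chip, cluster_annotated, metabolic_in_cluster
)
print(f"expected by chance: {expected:.2f}, observed: {metabolic_in_cluster}")
print(f"P(X >= {metabolic_in_cluster}) = {p_value:.3f}")
```

Because 72% of all annotated genes on the chip are metabolic, about 52.6 metabolic genes are expected in a 73-gene cluster by chance. The observed 50 is slightly below that, so under this test the cluster gives no evidence of enrichment even though 68% looks like a large fraction.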
|
|
41
|
- (Day 1) Today
- How to look at data in GeneSpring
- Selecting differentially expressed genes
- GeneSpring demo – yeast cell cycle
- You are expected to complete the GeneSpring Demo before you leave
- Day 2
- Lecture on clustering – Sagar
- Workshop - Hands-on with 2 data sets from mammalian cells
- Cluster analysis – unsupervised methods
- PCA – principal component analysis
- k-means
- hierarchical clustering
- SOMs – self-organizing maps
- Discrimination – a supervised method
- ANOVA – analysis of variance
- Day 3
- Lecture – Recognizing shortcomings in microarray data
- Workshop – Data Refinement
|
|
42
|
|
|
43
|
|
|
44
|
|
|
45
|
|