1
Microarrays
(aka CHIPs)
In Vitro and In Silico

SoCalBSI
July 5, 6, and 7, 2005

Sandra B. Sharp
Professor, Biological Sciences, CSULA
Sagar Damle
Ph.D. Candidate, Division of Biology, Caltech
2
References
  • References for Days 1, 2, and 3 and the optional lecture are found at the end of the optional PowerPoint.  They are not in alphabetical order.
3
Objectives for 3 Days
  • To become familiar with the wet-bench basics of 2-color DNA microarrays for expression analysis
  • In the context of 2-color expression arrays, to use GeneSpring for selection of differentially expressed genes and for cluster analysis
  • To identify points at which uncertainties and inaccuracies may arise in 2-color expression analysis
  • In the context of 2-color expression arrays, to use EXCEL for processing of primary data
  • Optional
    • To work with annotation of microarray data (optional)
    • To work with a major microarray data base (optional)
    • To become aware of several additional applications for microarray technology (optional)
      • SNP analysis
      • Transcription factor binding site analysis
      • Protein analysis
      • Cell and Tissue analysis
4
Outline for Each Day
  • Day 1
    • Lecture
      • Principles of expression microarray
    • Workshop
      • Introduction to GeneSpring
  • Day 2
    • Lecture
      • Principles of major clustering algorithms
    • Workshop
      • Using GeneSpring for cluster analysis
      • Recognizing and dealing with variability in acquired data
  • Day 3
    • Lecture
      • Sources of uncertainty and error in microarray analysis
    • Workshop
      • Using EXCEL to process primary data and minimize error
  • Optional
    • Lecture
      • Additional applications of microarray
    • Workshop
      •  Annotation, Stanford Microarray Database, Motif Searching (optional)



5
Lecture Outline for Day 1
  • Post-Genomic Era
  • Analysis of gene expression then and now
  • Principles of microarray (in the context of 2-color arrays)
    • Overview
    • Printing the array
    • Data acquisition
    • Data analysis (filtering and clustering)
    • Making hypotheses about biological function
6
the post-genomic era
 functional genomics and proteomics
  • Post-genomic = after the sequencing of the genome
  • Functional genomics = the study of the genome in order to determine
    • coordinated expression of mRNAs for all the genes in the genome
    • the control regions of the genes
    • the as yet unknown function of additional expressed but non-coding DNA regions (e.g., templates for microRNAs?)
7
the post-genomic era
 functional genomics and proteomics
  • Proteomics = the identification of all protein products of each gene and the interactions of all proteins in the cell (leading to systems biology)



8
Expression Microarrays -
one approach to functional genomics
  • Expression microarrays are used to determine the identity and relative quantities of each of the different kinds of (m)RNAs expressed by a cell type of interest in a “single experiment”.
    • Examples from your workshops
      • What is expressed in tumor as opposed to normal cells?
      • What is expressed over the period of development of a given tissue or cell type (myogenesis)?
    • AND WHY MIGHT WE CARE?
9
Then and Now
 Example – Normal vs. tumor tissue
  • THEN
  • Hypothesis
    • My favorite mRNA is expressed in tumor tissue, but not in normal tissue.
  • What you need to know
    • The nucleotide sequence of your favorite mRNA
    • How to isolate total (m)RNA from normal and tumor tissues
    • How to perform an mRNA blot hybridization analysis
      •  Northern
  • NOW
  • Discovery
    • Which genes are expressed in tumor tissue but not in normal tissue?
  • What you need to know
    • The nucleotide sequence of every possible mRNA
    • How to isolate total mRNA from normal and tumor tissues
    • How to perform and analyze an expression microarray experiment
10
Results - Then and Now
 Example – Normal vs. tumor tissue
11
 
12
Expression Microarrays -
one approach to functional genomics
  • Microarrays (CHIPs) allow you to study the expression of thousands of genes at once.
  • Each chip has 10,000 - 100,000 distinct spots.
    • Each spot has multiple copies of a sequence that represents a specific gene and thereby its mRNA.
    • Some sequences are known to represent mRNAs from databases of ESTs or from predicted genes, but as yet have no identified function.
  • Some chips “tile” across the entire length of a genome or chromosome, thereby identifying transcripts coded for by template lacking the classical structure of a gene.


13
How big are microarray slides? Remember, 10,000 to 100,000 genes!
14
Basics of Expression Microarray Technique
5 important steps
  • The Wet Experiment
    • Requires printed slide(s) = CHIPs
      • Requires specialized equipment and software
    • Requires fluorescently labeled sample(s) representing total mRNAs from the sources of interest
    • Requires a hybridization or annealing period


  • Data Acquisition – Where has hybridization occurred?
    • Requires a scanner and associated software


  • Data Processing (Extraction, Storage, and Normalization)
    • Requires scanner software and storage in a tab delimited file
    • Requires EXCEL or similar software


  • Data Analysis and Storage
    • Requires software for cluster and other types of analysis


  • Assigning or hypothesizing regarding function
    • Utilizes cluster analysis together with databases that have annotated genes with known function







15
 
16
 
17
 
18
 
19
Animation of Expression Dual-Color Microarray Setup
  • http://www.bio.davidson.edu/courses/genomics/chip/chip.html




20
Putting DNA on the Slides
  • Print it onto the slide – PCR and long oligos
  • Synthesize it on the slide by photolithography – somewhat shorter oligos
21
 
22
http://latin.arizona.edu/~dgalbrai/arrayer.html
  • What information must be built into a program in order to automate printing?
23
 
24
Data Acquisition - Scanners
  • Make an image of the slide by
    • Exciting the fluorescent dyes
    • Collecting the emitted light
    • Generating digital images of the fluorescent signal and
    • Quantifying the intensity of the signal
  • Two major types
    • Laser excitation with a photomultiplier tube (PMT) detector - faster
    • Filtered white-light excitation with a charge-coupled device (CCD) detector


25
 
26
 
27
 
28
 
29
Data Storage and Processing
  • Tab Delimited File Output from Scanner
  • Save as EXCEL file
  • Notice the LogRatio column
    • The values shown here behave as base-2 logs of the red/green ratio
    • 1 = 2X as much mRNA as the standard of comparison (red = 2X green)
    • 2 = 4X as much
    • -1 = ½ as much (red = ½ green)
    • -2 = ¼ as much
  • Log plot (GeneSpring) or a plot of logs
  • More on Thursday – Lecture and EXCEL workshop
      • (eliminate bad data; correct for background; correct for technical variation within a slide; correct for technical variation between replicate slides; determine the ratio of R to G for each gene)
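The LogRatio scale above can be illustrated with a minimal sketch. It assumes base-2 logs and simple background subtraction; the function name `log2_ratio` and the spot intensities are hypothetical and are not taken from any scanner output.

```python
import math

def log2_ratio(red, green, red_bg=0.0, green_bg=0.0):
    """Return log2 of the background-corrected red/green ratio.

    Returns None when either corrected signal is not positive,
    so the spot can be flagged as bad data and eliminated.
    """
    r = red - red_bg
    g = green - green_bg
    if r <= 0 or g <= 0:
        return None
    return math.log2(r / g)

# Hypothetical spot intensities: (red, green)
spots = {"geneA": (2000, 1000), "geneB": (500, 2000), "geneC": (800, 800)}
for name, (red, green) in spots.items():
    print(name, log2_ratio(red, green))
```

A value of 0 means equal red and green signal; positive values mean more red, negative values more green.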


30
Data Analysis
  • Selecting differentially expressed genes
    • GeneSpring
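One simple way to select differentially expressed genes is a fold-change cutoff on the log ratios. The sketch below assumes base-2 log ratios and a two-fold threshold; `select_differential` and the gene names are hypothetical, and this is not a description of GeneSpring's own filters.

```python
def select_differential(log_ratios, min_abs_log2=1.0):
    """Split genes into up- and down-regulated lists by a log2 fold-change cutoff."""
    up = sorted(g for g, v in log_ratios.items() if v >= min_abs_log2)
    down = sorted(g for g, v in log_ratios.items() if v <= -min_abs_log2)
    return up, down

# Hypothetical log2(red/green) values
example = {"geneA": 1.4, "geneB": -2.1, "geneC": 0.2, "geneD": -0.6}
up, down = select_differential(example)
print("up:", up)
print("down:", down)
```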



31
Data Analysis




32
Data analysis (cont’d.)
clustering – suggests function
  • Places genes with similar expression patterns in groups.
  • Sometimes genes of unknown function will be grouped with genes of known function.
  • The functions that are known allow the investigator to hypothesize regarding the functions of genes not yet characterized.
    • Examples:
      • Identify genes expressed in a specific tumor type
      • Identify genes coordinately involved in a developmental pathway
      • Identify genes involved in a disease response
      • Identify genes important in cell cycle regulation
      • Identify genes that participate in a biosynthetic pathway
      • Identify genes involved in a drug response
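The idea of grouping genes by similar expression pattern can be sketched with a Pearson correlation between expression profiles. This is only an illustration of the similarity measure that many clustering methods build on, not a full clustering algorithm; the gene names and values are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (profiles must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def best_match(query, profiles):
    """Return the other gene whose profile correlates best with the query gene."""
    others = {g: p for g, p in profiles.items() if g != query}
    return max(others, key=lambda g: pearson(profiles[query], others[g]))

# Hypothetical log ratios over five time points
profiles = {
    "known_cell_cycle_gene": [0.1, 1.2, 2.0, 1.1, 0.0],
    "known_stress_gene": [2.0, 1.0, 0.2, -0.5, -1.0],
    "unknown_gene": [0.0, 1.0, 2.1, 0.9, 0.1],
}
print(best_match("unknown_gene", profiles))
```

Here the uncharacterized gene tracks the cell cycle gene, which would suggest (not prove) a related function.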
33
 
34
 
35
Assigning or Hypothesizing about the Biological Meaning of Clusters
  • Identify over-represented functional categories in the clusters (i.e., a cluster contains many more genes known to be involved in a specific biological process than would be expected by chance)
  • Requirements for systematic analysis:
    • Controlled vocabulary for describing biological processes (e.g., protein biosynthesis/translation, apoptosis/programmed cell death)
    • Standard assignment of genes into functional categories
      • Gene Ontology (GO) project of the Gene Ontology Consortium
36
Gene Ontology (GO) project
http://www.geneontology.org/
  • Purpose:
    • 1)  Define controlled terms (ontologies) for description of gene products from 3 aspects:
      • Biological process (DNA repair, mitosis)
      • Molecular function (protein serine/threonine kinase activity, transcription factor activity)
      • Cellular component (nucleus, ribosome)
    • 2) Establish a unified framework for organism-independent gene annotation
  • Characteristics:
    • 1)  A gene can have multiple associations in each ontology
    • 2)  GO terms are organized in hierarchical structures called directed acyclic graphs (DAGs)
      • The most general classifications are at the top levels of the graph
      • More specialized classifications are at lower levels
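A directed acyclic graph differs from a simple tree in that a term may have more than one parent. The sketch below shows this with a toy hierarchy; the term names and edges are illustrative assumptions and are not copied from the actual GO files.

```python
# Toy hierarchy: each term maps to its list of parent terms (illustrative only).
parents = {
    "biological process": [],
    "cell cycle": ["biological process"],
    "mitotic cell cycle": ["cell cycle"],
    "cell cycle phase": ["cell cycle"],
    "M phase": ["cell cycle phase"],
    "M phase of mitotic cell cycle": ["M phase", "mitotic cell cycle"],
}

def ancestors(term, parents):
    """Collect every more general term reachable by following parent links."""
    seen = set()
    stack = list(parents[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen

print(sorted(ancestors("M phase of mitotic cell cycle", parents)))
```

A gene annotated to a specific term is implicitly associated with all of that term's ancestors, which is why enrichment can be examined at several levels of the graph.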

37
Hierarchical classification scheme for proteins that function in M phase of the mitotic cell cycle
38
Online Databases that annotate genes by GO
  • Human
    • Entrez http://www.ncbi.nih.gov/entrez/query.fcgi?db=gene
    • GOA http://www.ebi.ac.uk/GOA/
  • Mouse – Mouse Genome Informatics  (MGI)
    • http://www.informatics.jax.org/
  • Rat – Rat Genome Database
    • http://rgd.mcw.edu/
  • Fly – FlyBase
    • http://flybase.bio.indiana.edu/
  • Arabidopsis – TAIR
    • http://www.arabidopsis.org/
  • Yeast – Saccharomyces Genome Database (SGD)
    • http://www.yeastgenome.org/
  • Affymetrix chips – Netaffx
    • http://www.affymetrix.com
39
Example:  Cluster 3, 95 genes
40
Identifying enriched GO categories in clusters
  • In the previous example:
    • Total number of chip’s genes with annotation = 5,000
    • Total number of chip’s genes associated with metabolism GO category = 3,600 (72%)
    • Number of annotated genes in cluster 3 = 73
    • Number of metabolic genes in cluster 3 = 50 (68%)
  • Is it reasonable to assume that genes in Cluster 3 are enriched for metabolic function?
  • Statistical tests are essential to determine whether enrichment of a certain class of proteins is significant
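One commonly used test for this question is the hypergeometric test, sketched below with the numbers from this slide. The choice of test and the function name `hypergeom_upper_tail` are assumptions made for illustration; the slides do not specify which test to use.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the category of interest."""
    total = comb(N, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return favorable / total

# Numbers from the example: 5,000 annotated genes, 3,600 metabolic,
# 73 annotated genes in cluster 3, 50 of them metabolic.
expected = 73 * 3600 / 5000          # metabolic genes expected by chance
p = hypergeom_upper_tail(5000, 3600, 73, 50)
print("expected by chance:", expected)
print("P(at least 50 metabolic genes):", p)
```

Because 50 is slightly below the roughly 52.6 metabolic genes expected by chance (68% in the cluster versus 72% on the chip), the upper-tail probability is large and the cluster shows no evidence of enrichment for metabolism.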
41
Workshops and Lectures to Come
  • (Day 1) Today
    • How to look at data in GeneSpring
      • Selecting differentially expressed genes
    • GeneSpring demo – yeast cell cycle
    • You are expected to complete the GeneSpring Demo before you leave
  • Day 2
    • Lecture on clustering – Sagar
    • Workshop - Hands-on with 2 data sets from mammalian cells
      • Cluster analysis – unsupervised methods
        • PCA – principal component analysis
        • k-means
        • hierarchical clustering
        • SOMs – self-organizing maps
      • Discrimination – a supervised method
      • ANOVA – analysis of variance
  • Day 3
    • Lecture – Recognizing shortcomings in microarray data
    • Workshop – Data Refinement




42
Single-color
Affymetrix
43
 
44
 
45
Affymetrix Chip Scan