Internship Mentors 2008
LAST UPDATED: 05/28/08
Andrew Cameron (Caltech) | Ping Du (Allergan) | Eleazar Eskin (UCLA) | Alfred Fonteh (HMRI) | William A. Goddard III (Caltech) | Ian Haworth (USC) | James Kerwin (HEI) | Garry Larson (City of Hope) | Chris Mattmann (JPL) | Jamil Momand (CSULA) | Matteo Pellegrini (UCLA) | Angela Presson (UCLA) | Bruce Shapiro (Caltech) | Shantanu Sharma (CSU Pomona) | Janet Sinsheimer (UCLA) | Victor Tam (MannKind) | Barbara Wold (Caltech) | Todd Yeates (UCLA) | Nagarajan Vaidehi (COH) | Xiwei Wu (COH)
Andrew Cameron, Ph.D (Caltech)
Description:
Sea urchin molecular development and computational biology
Projects for Interns:
A problem related to data curation and presentation on the sea urchin genome database and web site.
Intern Requirements:
Some knowledge of, or at least a nodding acquaintance with, molecular biology, genomics, and scripting languages such as Perl or Python.
Ping Du (Allergan)
Description:
Allergan, Inc. is a multi-specialty health care company focused on discovering, developing, and commercializing innovative pharmaceuticals, biologics, and medical devices. The Research Informatics group supports discovery and preclinical research by providing tools for data analysis, assay protocol development, system configuration, and laboratory systems integration.
Projects for Interns:
The objective of the Data Integration project is to assist pharmaceutical discovery research by integrating multidisciplinary data into a research data warehouse. The intern will develop data extraction tools to load data from a source database into the data warehouse.
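Loading data from a source database into a warehouse usually follows the extract, transform, load (ETL) pattern. The sketch below illustrates it with Python's built-in sqlite3 module; the table names (assay_results, fact_activity), the columns and the cleaning rules are hypothetical stand-ins, not Allergan's actual schema.

```python
import sqlite3

# Hypothetical schemas used only for illustration:
#   source:    assay_results(compound_id, assay_name, ic50_nm)
#   warehouse: fact_activity(compound_id, assay, ic50_nm)

def extract(src: sqlite3.Connection) -> list:
    """Pull raw assay rows from the source database."""
    return src.execute(
        "SELECT compound_id, assay_name, ic50_nm FROM assay_results"
    ).fetchall()

def transform(rows: list) -> list:
    """Normalize identifiers and drop rows with missing measurements."""
    cleaned = []
    for compound_id, assay_name, ic50_nm in rows:
        if ic50_nm is None:
            continue  # incomplete record: skip rather than load a NULL
        cleaned.append(
            (compound_id.strip().upper(), assay_name.strip().lower(), float(ic50_nm))
        )
    return cleaned

def load(dwh: sqlite3.Connection, rows: list) -> int:
    """Append cleaned rows to the warehouse fact table; return the row count."""
    dwh.execute(
        "CREATE TABLE IF NOT EXISTS fact_activity "
        "(compound_id TEXT, assay TEXT, ic50_nm REAL)"
    )
    dwh.executemany("INSERT INTO fact_activity VALUES (?, ?, ?)", rows)
    dwh.commit()
    return len(rows)

def run_etl(src: sqlite3.Connection, dwh: sqlite3.Connection) -> int:
    return load(dwh, transform(extract(src)))

if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE assay_results (compound_id TEXT, assay_name TEXT, ic50_nm REAL)")
    src.executemany(
        "INSERT INTO assay_results VALUES (?, ?, ?)",
        [(" cpd-1", "Kinase A ", 12.5), ("cpd-2", "kinase a", None)],
    )
    dwh = sqlite3.connect(":memory:")
    print(run_etl(src, dwh))  # 1: the cpd-2 row has no value and is dropped
```

Keeping the three steps separate makes each one testable on its own; in a real load, the transform step is where unit conversions and identifier mapping would live.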
Intern Requirements:
Knowledge of relational databases and data warehousing. Experience in data modeling and in developing SQL queries and stored procedures. Familiarity with scientific research concepts such as chemical synthesis and biological activity assays.
Eleazar Eskin (UCLA)
Description:
Our research focuses on developing techniques for solving the challenging computational problems that arise in attempting to understand the genetic basis of human disease.
Projects for Interns:
Many recent studies have demonstrated that genetic variation influences gene expression, that is, which genes are active in a given cell. These gene expression changes may have implications for human disease.
Using both information on genetic variation and information on gene expression from different strains of model organisms, this project will attempt to understand how genetic variation affects gene expression.
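A common first pass at this kind of question is an expression quantitative trait locus (eQTL) scan: for every variant and every gene, test whether expression differs between strains carrying different alleles. The sketch below is a minimal version, assuming inbred strains with genotypes coded 0/1 and using Pearson correlation as the association measure. It is not the lab's method; a real analysis would add significance testing, correction for multiple testing, and control for relatedness among strains.

```python
import math

def pearson(x: list, y: list) -> float:
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic variant or a flat gene carries no signal
    return sxy / math.sqrt(sxx * syy)

def eqtl_scan(genotypes: dict, expression: dict) -> list:
    """Correlate every variant with every gene across strains.

    genotypes:  {variant_id: [0 or 1 per strain]}  (inbred strains assumed)
    expression: {gene_id: [expression value per strain]}, same strain order
    Returns (variant, gene, r) tuples, strongest |r| first.
    """
    results = [
        (variant, gene, pearson(g, e))
        for variant, g in genotypes.items()
        for gene, e in expression.items()
    ]
    results.sort(key=lambda t: abs(t[2]), reverse=True)
    return results

if __name__ == "__main__":
    genotypes = {"snp1": [0, 0, 1, 1], "snp2": [0, 1, 0, 1]}  # hypothetical
    expression = {"geneA": [1.0, 1.2, 3.0, 3.1]}              # hypothetical
    for variant, gene, r in eqtl_scan(genotypes, expression):
        print(variant, gene, round(r, 3))
```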
Intern Requirements:
Knowledge of a programming language and a basic knowledge of statistics.
Alfred Fonteh, Ph.D (Huntington Medical Research Institutes)
Description:
The goal of the Molecular Neurology Program is to determine the brain composition of healthy and diseased study participants. We have collected cerebrospinal fluid (CSF) from subjects afflicted with migraine, Alzheimer's and Parkinson's diseases, multiple sclerosis, bipolar disorder, depression, and schizophrenia. Using mass spectrometry, we have generated comprehensive databases of total proteins (The Brain Proteome Project) and lipids (The Brain Lipidome Project) found in CSF. We have neuropsychological and clinical measures of brain function and plan to link these to molecular changes in CSF. Linking our molecular databases to our clinical measures will substantially improve our understanding of brain pathophysiology and will facilitate diagnosis and the discovery of novel therapeutic targets.
Projects for Interns:
A) Quantitative Proteomics in Alzheimer's disease using Isotope Dilution Mass spectrometry.
B) Lipidomic algorithm for Electrospray ionization mass spectrometry.
C) Generation of Clinical and Molecular Databases.
Intern Requirements:
Interns with an interest in neuropathophysiology and with computer skills will benefit from the above projects. Specific computer skills include using Excel to import, export and sort data and to build pivot tables, and searching protein databases (Swiss-Prot, http://www.expasy.ch/sprot/). Interns should be familiar with the principles of mass spectrometry of proteins (peptide fragmentation, accurate mass determination). Proprietary software (XCalibur, LCQuan) from the manufacturer of our triple quadrupole mass spectrometer will be used.
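Accurate mass determination usually comes down to comparing an observed mass-to-charge ratio (m/z) with a theoretical one and expressing the difference in parts per million (ppm). The sketch below computes monoisotopic peptide masses from standard residue masses and matches observed peaks to candidate peptides within a ppm tolerance. The candidate sequences, peak values and the 10 ppm default tolerance are illustrative assumptions; the sketch ignores modifications and is not a substitute for the vendor software or database search engines used in the lab.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids
# (unmodified cysteine; no post-translational modifications).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276   # charge carrier in positive-mode electrospray

def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def theoretical_mz(seq: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(seq) + charge * PROTON) / charge

def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6

def match_peaks(peaks, candidates, charge=1, tol_ppm=10.0):
    """Return (observed m/z, sequence, ppm error) for every match within tolerance."""
    hits = []
    for obs in peaks:
        for seq in candidates:
            err = ppm_error(obs, theoretical_mz(seq, charge))
            if abs(err) <= tol_ppm:
                hits.append((obs, seq, err))
    return hits

if __name__ == "__main__":
    print(match_peaks([133.0608], ["GG", "AA"]))
```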
William A. Goddard, Ph.D (Caltech)
Description:
We predict the structures of G protein-coupled receptors (GPCRs) with bound ligands.
Projects for Interns:
Predicting the structure of a specific GPCR, such as the muscarinic acetylcholine receptor, bound to an agonist or antagonist.
Intern Requirements:
Familiarity with computers (Linux) and with bioinformatics software such as BLAST, and knowledge of biochemistry. Interns should be smart and hardworking.
Ian Haworth (USC)
Description:
Our laboratory is interested in the computational design of biomolecular interfaces, generally with a therapeutic or diagnostic goal. We also have projects in computational analysis of nucleic acid folding, including fitting of data to experimental distances derived from EPR, and analysis of DNA bending in formation of condensed DNA for non-viral gene delivery. Most of our work is based on new algorithms that have been or are being developed in the laboratory. We also use commercially available algorithms for established methods such as molecular dynamics simulations and database searching.
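Fitting a structural model to distances derived from EPR (electron paramagnetic resonance) needs, at minimum, a score for how well the model's distances agree with the measured ones. The sketch below computes the root-mean-square deviation between model and experimental distances and lists restraints violated beyond a tolerance. The label names, coordinates and the 2 Å default tolerance are hypothetical, and treating each spin label as a single point is a simplifying assumption; real EPR analyses usually model the label's conformational spread.

```python
import math

def distance_rmsd(coords: dict, restraints: list) -> float:
    """RMS deviation between model and experimental distances.

    coords:     {label: (x, y, z)} in angstroms, one point per spin label (assumed)
    restraints: [(label_i, label_j, experimental_distance), ...]
    """
    sq = [(math.dist(coords[i], coords[j]) - d_exp) ** 2 for i, j, d_exp in restraints]
    return math.sqrt(sum(sq) / len(sq))

def violations(coords: dict, restraints: list, tol: float = 2.0) -> list:
    """Restraints whose model distance differs from experiment by more than tol."""
    out = []
    for i, j, d_exp in restraints:
        d_model = math.dist(coords[i], coords[j])
        if abs(d_model - d_exp) > tol:
            out.append((i, j, round(d_model, 2), d_exp))
    return out
```

Minimizing this RMSD over the model's adjustable parameters, for example with a general-purpose optimizer, is one way to turn the score into a fit.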
Projects for Interns:
Intern Requirements:
The most important attribute for the intern will be a good appreciation of biomolecular structure, or at least, the desire to learn quickly about the structure of proteins and nucleic acids. Familiarity with any form of computer programming will be useful, but this is not absolutely essential. Attention to detail will be very important in developing input parameters for running the algorithms, and good skills in numerical and structural analysis are needed for interpretation of results.
James Kerwin, Ph.D (House Ear Institute)
Description:
HEI is a non-profit organization dedicated to research, clinical studies and treatment of hearing-related disorders. Collaborations between laboratories focusing on clinical studies and basic research provide a unique opportunity to have an immediate impact on human health while working on projects investigating the biochemical and physiological bases of normal and pathological development using a variety of experimental systems.
Projects for Interns:
A newly established center for mass spectrometry, primarily dealing with proteins and peptides, uses sophisticated database search engines to interpret mass spectral data and relate the results to specific physiological functions. In addition to exploiting the capabilities of existing software, we are interested in extending these results to characterize interactions between proteins and protein complexes, since most biological processes involve protein-protein interactions. Developing software to interpret the mass spectra of these covalently bound complexes is a daunting task.
Intern Requirements:
These projects require a basic understanding of protein (bio)chemistry, and the ability to modify existing programs in collaboration with laboratories currently involved with developing software for interpretation of complex protein mass spectra.
Garry Larson, Ph.D (City of Hope)
Description:
Since only about one percent of human DNA comprises the exonic, protein-encoding regions of the genome, it seems logical to look for mutations that influence disease susceptibility through mechanisms other than changes in protein structure or function. One mechanism worth pursuing is gene expression. Expression signatures are extremely powerful and can distinguish tumors from patients carrying BRCA1 or BRCA2 mutations. This suggests that heritable risk variants (i.e., disease polymorphisms) in cis-acting transcriptional control elements may be one explanation for the aberrant transcript levels observed in breast tumors. To identify these expression risk variants dysregulated in breast cancer, our group uses a combination of statistics, comparative phylogenetics and family-based linkage methods. We perform meta-analyses and bioinformatics-based analyses of multiple publicly available breast cancer (BrCa) microarray datasets to identify statistically “worthy” candidates. We then carry out extensive bioinformatic and comparative phylogenetic analyses of our candidates' orthologs, both to select genes worth genetic experimentation and to identify evolutionarily conserved transcriptional regulatory elements. We also employ genetic enrichment strategies in a previously acquired cohort of multiplex families with disease (sibling pairs), using allele-sharing enrichment and postulated gene-gene interactions. Putative high-risk transcriptional alleles will be characterized in biochemical assays to demonstrate abnormal interactions with elements of the transcriptional apparatus. Student participants should expect a mix of concurrent laboratory experimentation and computational work, and should be prepared to query external databases for specific targets, construct relational databases, and use data-mining approaches to integrate diverse datasets.
Projects for Interns:
Mapping genetic variation(s) to pre-specified DNA elements.
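One simple way to map variants to pre-specified DNA elements is an interval lookup: sort each chromosome's elements by start coordinate, then binary-search each variant position. The sketch below assumes 1-based, inclusive coordinates and non-overlapping elements within a chromosome (overlapping annotations would need an interval tree or a sweep instead); the element and variant identifiers are hypothetical.

```python
import bisect
from collections import defaultdict

def map_variants_to_elements(variants, elements):
    """Assign each variant to the DNA element that contains it, if any.

    variants: [(variant_id, chrom, pos)]
    elements: [(element_id, chrom, start, end)], 1-based inclusive,
              assumed non-overlapping within a chromosome
    Returns [(variant_id, element_id)] for variants that fall inside an element.
    """
    by_chrom = defaultdict(list)
    for element_id, chrom, start, end in elements:
        by_chrom[chrom].append((start, end, element_id))
    starts = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts[chrom] = [s for s, _, _ in intervals]

    hits = []
    for variant_id, chrom, pos in variants:
        intervals = by_chrom.get(chrom)
        if not intervals:
            continue  # no elements annotated on this chromosome
        # index of the last element whose start is <= pos
        i = bisect.bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos <= intervals[i][1]:
            hits.append((variant_id, intervals[i][2]))
    return hits
```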
Intern Requirements:
Strong programming skills preferred. In-depth knowledge of biology is a plus, but not required.
Chris Mattmann, Ph.D (Jet Propulsion Laboratory)
http://sunset.usc.edu/~mattmann/
References:
http://portal.acm.org/citation.cfm?id=1192631
http://csdl.computer.org/comp/proceedings/cbms/2003/1901/00/19010117abs.htm
http://dx.doi.org/10.1109/IRI.2007.4296656
http://edrn.jpl.nasa.gov
Description:
Informatics in biomedicine is becoming increasingly interconnected via distributed information services, interdisciplinary correlation, and cross-institutional collaboration. Partnering with NASA, the Early Detection Research Network (EDRN), a program managed by the National Cancer Institute, has been defining and building a knowledge environment to support the discovery of biomarkers in their earliest stages. The architecture established by EDRN serves as a blueprint for constructing a set of services focused on the capture, processing, management and distribution of information through the phases of biomarker discovery and validation.
Projects for Interns:
The student will assist in data management activities and support the integration, testing and population of the EDRN Biomarker Database. The EDRN Biomarker Database is a distributed web application component, part of the EDRN Knowledge Environment, responsible for managing metadata about cancer biomarkers. Information stored in the Biomarker Database includes EDRN study information, protocols, technologies used, associated publications, and sensitivity and specificity information. The student will curate biomarker information available from NCI and other sources such as PubMed.
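Sensitivity and specificity are derived from a marker's confusion-matrix counts in a given study. The sketch below shows one way a curated record could store the raw counts and derive both values; the class and field names are hypothetical and are not the actual EDRN Biomarker Database schema.

```python
from dataclasses import dataclass

@dataclass
class BiomarkerStudyResult:
    """Hypothetical record: one biomarker evaluated in one study."""
    biomarker: str
    study_id: str
    true_pos: int    # cases the marker flagged
    false_neg: int   # cases the marker missed
    true_neg: int    # controls the marker cleared
    false_pos: int   # controls the marker flagged

    @property
    def sensitivity(self) -> float:
        """Fraction of cancer cases detected: TP / (TP + FN)."""
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        """Fraction of controls correctly called negative: TN / (TN + FP)."""
        return self.true_neg / (self.true_neg + self.false_pos)

if __name__ == "__main__":
    r = BiomarkerStudyResult("marker-X", "study-1", true_pos=45, false_neg=5,
                             true_neg=90, false_pos=10)
    print(f"{r.biomarker}: sensitivity={r.sensitivity:.2f}, specificity={r.specificity:.2f}")
```

Storing counts rather than precomputed percentages lets curators recompute the metrics, or confidence intervals for them, whenever a source reports raw numbers.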
Intern Requirements:
Applicants should have a background in computer science, bioinformatics, databases or informatics. Experience with PHP-based web applications and Java-based object-oriented (OO) programming is desired. Experience with MySQL database management systems and SQL is a plus.
Suggested Courses: Database Systems, Web Programming, Distributed Systems, Object Oriented Software Development, Software Engineering, Software Architecture, Bioinformatics.
Jamil Momand (CSULA)
www.calstatela.edu/faculty/jmomand
Description:
Proteins are subject to chemical damage that can inhibit their function. One type of chemical damage is deamidation of asparagines, which can result in enzyme inactivation and abnormal protein conformation. A ubiquitous methyltransferase enzyme, isoaspartyl methyltransferase, has evolved to repair the damage caused by asparagine deamidation. Mice created without this methyltransferase accumulate damaged proteins. Physiologically, these animals develop fatal progressive epilepsy and die at a mean age of just 42 days. It would be useful to predict which proteins are susceptible to deamidation. Such proteins would be particularly vulnerable to methyltransferase inactivation and may point to the reasons why epilepsy occurs in knockout mice. With the large amount of structural data now available in the Protein Data Bank, we propose to create an algorithm that predicts deamidation sites based on structural properties. The algorithm will be useful for designing polypeptides that resist deamidation.
Projects for Interns:
From the literature, the student will obtain information for proteins with the following criteria:
The student will begin building a database containing structural information on asparagines that are susceptible to deamidation and asparagines that are not. The database will contain the following information: regional protein sequences, phi angles, psi angles, chi-1 angles, solvent-accessible surface area, polypeptide flexibility, and interatomic distances.
Machine learning programs from the suite of programs found at Weka (Waikato Environment for Knowledge Analysis) will be employed to analyze the collected data. A decision tree will be created that classifies asparagines into two classes: one that is susceptible to deamidation and another that is not susceptible to deamidation. An algorithm will be developed from the decision tree.
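The core step in growing such a decision tree is choosing, for a numeric feature, the threshold that best separates the two classes. The sketch below scores candidate thresholds by information gain (the criterion used by ID3; Weka's J48, an implementation of C4.5, uses the closely related gain ratio). The feature values and labels in the demo are made up for illustration and are not real deamidation data.

```python
import math
from collections import Counter

def entropy(labels) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Best binary split 'value <= threshold' on one numeric feature.

    Returns (threshold, information_gain); threshold is None if no split helps.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([label for _, label in pairs])
    best_t, best_gain = None, 0.0
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical feature values
        left = [label for _, label in pairs[:k]]
        right = [label for _, label in pairs[k:]]
        remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        gain = base - remainder
        if gain > best_gain:
            best_t, best_gain = (pairs[k - 1][0] + pairs[k][0]) / 2, gain
    return best_t, best_gain

if __name__ == "__main__":
    # Made-up solvent-accessibility values (%) for six asparagines.
    accessibility = [5, 10, 15, 40, 55, 70]
    label = ["stable", "stable", "stable", "deamidates", "deamidates", "deamidates"]
    print(best_threshold(accessibility, label))
```

A full tree repeats this search over every feature at each node and recurses on the two resulting subsets.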
Intern Requirements:
Knowledge of PubMed and familiarity with machine learning programs. Programming skills would be advantageous.
Matteo Pellegrini, Ph.D (UCLA)
Description:
Our lab is interested in the development of computational approaches to interpret genomic data. These methodologies allow us to develop large-scale models of transcriptional and epigenetic regulation as well as signal transduction. Our approach is to build models that integrate varied data that sheds light on these phenomena. This data is produced using the latest generation of high throughput sequencers, tiling and expression arrays along with mass spectrometry. Our research focuses on the development of both low and high-level analyses. For instance we are developing suites of tools for the analysis of high throughput sequencing data, as well as tools that combine multiple data types to infer transcriptional regulatory mechanisms.
Projects for Interns:
One potential project involves the analysis of high-throughput sequence data we have generated for the plant Arabidopsis. While most of the DNA we have sequenced can be mapped back to the Arabidopsis genome, a significant fraction cannot. This may be due either to sequencing errors or to sample contamination from other organisms (e.g., bacteria or plant pathogens). We would like to resolve this issue by aligning our sequences to all known genomes and determining which species, if any, are contaminating our samples.
Intern Requirements:
The student will use alignment programs such as BLAST or BLAT and so should be familiar with the UNIX programming environment.
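Once the unmapped reads have been aligned with BLAST, a first summary is to keep each read's best hit and tally the hits by species. The sketch below assumes BLAST's standard 12-column tabular output (-outfmt 6), in which column 11 is the e-value and column 12 is the bit score. The subject-to-species lookup table, the identifiers and the e-value cutoff are hypothetical inputs the user would supply.

```python
import csv
import io
from collections import Counter

def best_hits(lines, max_evalue=1e-5):
    """Keep the highest-bit-score subject for each read.

    lines: iterable of tab-separated BLAST -outfmt 6 rows.
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        read, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # too weak to call
        if read not in best or bitscore > best[read][1]:
            best[read] = (subject, bitscore)
    return {read: subject for read, (subject, _) in best.items()}

def species_counts(hits, subject_to_species):
    """Tally best hits by species; unknown subjects count as 'unassigned'."""
    return Counter(subject_to_species.get(s, "unassigned") for s in hits.values())

if __name__ == "__main__":
    blast_tab = io.StringIO(
        "read1\tsubj_A\t99.0\t36\t0\t0\t1\t36\t100\t135\t1e-10\t67.0\n"
        "read1\tsubj_B\t90.0\t36\t3\t0\t1\t36\t5\t40\t1e-6\t50.0\n"
        "read2\tsubj_B\t100.0\t36\t0\t0\t1\t36\t10\t45\t1e-12\t70.0\n"
    )
    lookup = {"subj_A": "Arabidopsis thaliana", "subj_B": "Escherichia coli"}
    print(species_counts(best_hits(blast_tab), lookup))
```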
Angela Presson (UCLA)
Description:
Statistical analysis for clinical and genetic research projects.
Projects for Interns:
Statistical analysis of pediatric cancer/lung transplant data, mouse genetics data, biomarker data, or microarray data. Possible statistical methods include multivariate regression, random effects models, survival analysis, gene co-expression network analysis, QTL analysis, and case-control association analysis.
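Of the methods listed, survival analysis is the easiest to show in a few lines. The sketch below is a plain Kaplan-Meier estimator for right-censored data; in practice R's survival package or SAS's PROC LIFETEST would be used, and the times and censoring flags in the test are made up.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve for right-censored data.

    times:  follow-up time for each subject
    events: 1 if the event (e.g. death, graft failure) was observed, 0 if censored
    Returns [(time, survival probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # group all subjects sharing this time (events and censorings together)
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # subjects censored at t count as at risk at t, then leave
    return curve
```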
Intern Requirements:
Statistical coursework and/or research experience, programming skills, and experience with statistical software such as R or SAS.
Bruce Shapiro (Caltech)
Description:
The Caltech BNMC (Biological Network Modeling Center, http://bnmc.caltech.edu) is a director's initiative created by the Beckman Institute in the fall of 2005. Its goal is to bring together Caltech biologists, bioengineers, mathematicians, and computer scientists to develop and apply state-of-the-art computational tools for modeling and analyzing complex biological systems. There are two major research areas. Students could work in either of these areas.
(1) Plant growth modeling. The shoot apical meristem (SAM) is a hemispherical dome of cells at the apex of growing plants from which all above-ground tissue ultimately derives. In Arabidopsis thaliana (thale cress), a small flowering weed of the Brassicaceae family (related to mustard and cabbage), the SAM typically contains some three to five hundred cells ranging from five to ten microns in diameter. These cells are organized into several distinct zones that maintain their topological and functional relationships throughout the life of the plant. As the plant grows, organs (primordia) form on the flanks of the SAM in a phyllotactic pattern and develop into new shoots, leaves, and flowers. The central region contains pluripotent stem cells that continue to divide and differentiate into mature tissue throughout the life of the plant. In the Computable Plant project, we observe several cell type-specific markers for growth and differentiation in live Arabidopsis plants with a dedicated confocal laser scanning microscope. These markers are attached to various gene products or promoter regions using green fluorescent protein (GFP) variants that fluoresce when illuminated by the microscope's laser. This allows us to observe meristem and floral primordium features, such as membranes and nuclei, and to track specific cell lineages over time. By fitting mathematical and computational models to these spatiotemporal expression patterns, we can infer how primordial cells are progressively specified and how organs develop. From this we develop forward simulations and visualizations of the growing SAM (see http://computableplant.org).
(2) Software platforms for Systems Biology. The Systems Biology Markup Language (SBML) is a tool-neutral, computer-readable, text-based (XML) format for representing models of biochemical reaction networks. It is especially applicable to descriptions of cell signaling pathways, metabolic networks, genomic regulatory networks, and other modeling problems in systems biology. SBML is based on XML (the eXtensible Markup Language), a standard medium for representing and transporting data that is widely supported on the Internet as well as in computational biology and bioinformatics. Models encoded in SBML can be freely interchanged between users, regardless of which software tool, hardware platform, or operating system each uses. As long as both modelers use SBML-compliant software, they can run simulations of the same model, without modification, on their own platforms and compare results. The benefits of this interoperability are enormous. Not only can users share models, but they can also use multiple simulation tools and techniques within a single research project without rewriting their models, publish reproducible models in the scientific literature, and help ensure model survivability. The BNMC participates in the international SBML Team, which is responsible for developing standards, documentation, software, and a website to support SBML. See http://sbml.org for more information.
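Because an SBML file is ordinary XML, any XML parser can read its structure. The sketch below uses Python's standard xml.etree.ElementTree to list the species and reactions of a tiny hand-written Level 2 Version 4 model. The model itself (toy_conversion, species A and B) is hypothetical and omits kinetic laws; production code would normally use a dedicated library such as libSBML, which also validates the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal model: one irreversible conversion A -> B.
SBML_EXAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_conversion">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell" initialAmount="10"/>
      <species id="B" compartment="cell" initialAmount="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="R1" reversible="false">
        <listOfReactants><speciesReference species="A"/></listOfReactants>
        <listOfProducts><speciesReference species="B"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

def summarize_sbml(text: str):
    """Return (model id, {species: initial amount}, [(reaction, reactants, products)])."""
    root = ET.fromstring(text)
    # SBML elements live in a level/version-specific namespace; read it from the root tag.
    ns = {"s": root.tag[1:].split("}")[0]}
    model = root.find("s:model", ns)
    species = {
        sp.get("id"): float(sp.get("initialAmount", "0"))
        for sp in model.iterfind("s:listOfSpecies/s:species", ns)
    }
    reactions = []
    for rxn in model.iterfind("s:listOfReactions/s:reaction", ns):
        reactants = [r.get("species") for r in rxn.iterfind("s:listOfReactants/s:speciesReference", ns)]
        products = [p.get("species") for p in rxn.iterfind("s:listOfProducts/s:speciesReference", ns)]
        reactions.append((rxn.get("id"), reactants, products))
    return model.get("id"), species, reactions

if __name__ == "__main__":
    print(summarize_sbml(SBML_EXAMPLE))
```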
Projects for Interns:
There are several potential projects, depending on the interests and background of the student.
Intern Requirements:
Essential: Some programming experience in a high level language like Mathematica, Python, or C. An interest in digging into a problem and letting it take them where it leads. Willingness to ask a question whenever they run into a problem. There is no expectation that the intern will figure everything out on their own!
Helpful: A basic understanding of chemical kinetics (what a chemical reaction is), the law of mass action, and what a differential equation is (though there is no need to know how to solve one). The ability to comfortably install the necessary software on the operating system of their choice (Linux, Mac OS X, or Windows).
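As an illustration of this kinetics background: for the irreversible reaction A → B, the law of mass action gives the rate k[A], which leads to the differential equations d[A]/dt = -k[A] and d[B]/dt = +k[A], with exact solution [A](t) = [A]0 e^(-kt). The sketch below integrates them with the forward Euler method, the simplest possible solver. The rate constant, step size and initial amount are arbitrary choices, and real simulation tools use adaptive solvers.

```python
import math

def simulate_a_to_b(a0: float, k: float, t_end: float, dt: float = 1e-4):
    """Forward-Euler integration of A -> B under mass-action kinetics.

    d[A]/dt = -k[A],  d[B]/dt = +k[A];  [B] starts at zero.
    Returns ([A], [B]) at t_end.
    """
    a, b = a0, 0.0
    for _ in range(round(t_end / dt)):
        rate = k * a          # mass action: rate proportional to [A]
        a -= rate * dt
        b += rate * dt
    return a, b

if __name__ == "__main__":
    a, b = simulate_a_to_b(a0=10.0, k=0.5, t_end=2.0)
    exact = 10.0 * math.exp(-0.5 * 2.0)
    print(f"Euler [A] = {a:.5f}, exact [A] = {exact:.5f}, [A]+[B] = {a + b:.5f}")
```

Comparing the Euler result with the exact solution, and checking that [A] + [B] stays constant, are quick sanity checks for any hand-written simulator.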
All of the projects are computational, and there is little or no chance of a wet-bench component to the internship.
Shantanu Sharma, Ph.D (Cal Poly Pomona)
http://quanta.sci.csupomona.edu
Description:
Our research group focuses on the development of quantitatively based computational methods for the determination of residue-specific interactions of antimicrobial peptides with functional membrane proteins and ion channels.
Projects for Interns:
Interns will be involved in a collaborative project to develop a web-based service for protein-protein docking using a novel protocol and existing tools. The majority of the work will entail:
Intern Requirements:
Competency with Linux/UNIX and web scripting/programming languages such as PHP, Perl or ASP.NET (C# with Mono). A working knowledge of protein structure and function is a decided advantage, but not necessary.
Janet Sinsheimer (UCLA)
www.biostat.ucla.edu/people/sinshmer.htm
Description:
Statistical Genetics
Projects for Interns:
Our current focus is on mapping genes for complex traits and understanding the interplay between genes and environment. We have ongoing projects related to breast cancer, schizophrenia, and glaucoma, for example. An intern could assist us in the statistical analysis of genotype data with the goal of finding associations between genes and traits, or in the development of new statistical methods to map trait genes.
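A basic building block for finding gene-trait associations in case-control data is the allelic chi-square test, which compares allele counts between cases and controls in a 2×2 table. The sketch below computes the Pearson statistic and its 1-degree-of-freedom p-value using only the standard library (for 1 df, p = erfc(√(χ²/2))). The counts in the test are invented, and family-based designs need other tests, such as the transmission disequilibrium test.

```python
import math

def allelic_chi_square(cases, controls):
    """Pearson chi-square for a 2x2 allele-count table.

    cases, controls: (count of allele 1, count of allele 2); each individual
    contributes two alleles. Returns (chi2, p) with 1 degree of freedom.
    """
    table = [list(cases), list(controls)]
    total = sum(map(sum, table))
    row_sums = [sum(r) for r in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p
```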
Intern Requirements:
An interest and aptitude for statistics, data analysis and possibly computer programming.
Victor Tam (MannKind)
Description:
MannKind is a biopharmaceutical company focused on the discovery, development and commercialization of therapeutic products for diseases such as diabetes and cancer. In cancer therapy, we are currently exploring novel targeted approaches through the development of small molecules for the treatment of a variety of cancer types.
Projects for Interns:
Gene expression database. The student would assist in developing a database for end users who study gene expression, become familiar with Affymetrix and Agilent microarray technologies and the GEO database, and learn to work with large gene expression datasets.
Intern Requirements:
We are looking for an individual with knowledge of SQL (Oracle), HTML/XHTML, and basic statistics.
Barbara Wold (Caltech)
Description:
Information
Projects for Interns:
Information.
Intern Requirements:
Information.
Todd Yeates (UCLA)
Description:
Protein structure and self-assembly, protein design, and comparative genomics.
Projects for Interns:
Identification of novel proteins involved in formation of bacterial microcompartments.
Intern Requirements:
Curiosity, self-motivation, and the ability to think critically.
Nagarajan Vaidehi, Ph.D (City of Hope)
Description:
My research group works on development and application of computational methods to drug design for proteins that are implicated in ovarian cancer and pancreatic cancer.
Projects for Interns:
STAT (Signal Transducer and Activator of Transcription) proteins, especially STAT3, have been implicated in the chemotherapy resistance that develops in many ovarian cancer patients. The intern will apply computational methods to virtually screen a large database of compounds for possible hits, which will then be tested experimentally at City of Hope.
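A virtual screening pipeline typically filters a compound library for drug-likeness and ranks the survivors by predicted binding. The sketch below applies Lipinski's rule of five (molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors and 10 acceptors, allowing one violation) and sorts by a docking score. The Compound fields, the convention that more negative scores are better, and the example values are assumptions; in practice the descriptors would come from cheminformatics software and the scores from a docking program.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    mol_weight: float   # Da
    logp: float         # octanol-water partition coefficient (log scale)
    h_donors: int
    h_acceptors: int
    dock_score: float   # assumed convention: more negative = stronger predicted binding

def rule_of_five_violations(c: Compound) -> int:
    return sum([c.mol_weight > 500, c.logp > 5, c.h_donors > 5, c.h_acceptors > 10])

def screen(library, top_n=10, max_violations=1):
    """Keep drug-like compounds, then return the names of the best-scoring ones."""
    drug_like = [c for c in library if rule_of_five_violations(c) <= max_violations]
    drug_like.sort(key=lambda c: c.dock_score)  # ascending: most negative first
    return [c.name for c in drug_like[:top_n]]
```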
Intern Requirements:
Knowledge of protein structure and function. Computer programming knowledge is desirable but not required.
Xiwei Wu (COH)
Description:
The rapidly evolving technology of DNA microarrays provides a powerful tool for the global investigation of cellular activity at different levels, such as gene expression, miRNA expression, protein-DNA interaction, DNA and histone methylation, copy number variation and single nucleotide polymorphism. The COH Microarray Core provides instrumentation and expertise for genome-wide gene expression profiling and other genome-scale studies using Affymetrix GeneChip (TM) technology, Agilent technology, and other microarray technologies. In collaboration with the Biomedical Informatics Core, we establish data analysis pipelines and develop novel analysis methods for microarray data. We have provided analysis solutions for all data types generated by the Microarray Core, including gene expression, miRNA expression, ChIP-on-chip, CpG methylation, and transcript mapping. We typically use open-source software and tools for data analysis and data mining; for example, R/Bioconductor is one of the major tools we use every day.
Projects for Interns:
One of the most interesting data mining techniques for microarrays is gene set enrichment analysis. In contrast to analyzing one gene at a time, this approach analyzes a set of genes together to identify coordinated changes. The gene set can be flexibly defined as genes in the same signaling or metabolic pathway, the same gene ontology category, or even the same chromosomal location. Analyzing genes as a set can identify subtle changes in gene expression that are otherwise undetectable with traditional approaches, and it generates results that are easier to interpret. In a typical analysis, the scores measuring differential expression of each gene in the gene set are averaged, and the average score is used to rank the gene sets. The gene sets with the highest scores or smallest P values are reported as significant.
There are many tools available for gene set analysis. Most of them look for common changes in the gene sets, regardless of whether the genes are up- or down-regulated. However, it is estimated that at least 30-40% of the signaling pathways in commonly cited databases include genes that are inhibitors of the pathway. Up-regulation of an inhibitor has the opposite effect of up-regulation of an activator of the same pathway. It is therefore essential to distinguish activators from inhibitors in each signaling pathway to obtain biologically meaningful results. The role of each gene in each signaling pathway will be identified using graph theory, by parsing and processing XML pathway documents. A novel gene set analysis tool will also be developed to take advantage of this additional information and improve analysis accuracy.
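The idea of sign-aware gene set scoring can be sketched in a few lines. The example below reads a hypothetical, simplified pathway XML in which each relation is labeled activation or inhibition, treats any gene that is the source of an inhibition edge as an inhibitor (a deliberately crude rule, not the graph-theoretic method the project will develop), and averages per-gene differential-expression scores after flipping the sign of inhibitors. Real pathway formats such as KEGG's KGML use a different, richer schema.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical simplified pathway format, for illustration only.
PATHWAY_XML = """<pathway name="toy_pathway">
  <relation source="G1" target="G2" type="activation"/>
  <relation source="G3" target="G2" type="inhibition"/>
  <relation source="G2" target="G4" type="activation"/>
</pathway>"""

def gene_roles(xml_text: str) -> dict:
    """Map each gene to -1 if it inhibits anything in the pathway, else +1."""
    roles = {}
    for rel in ET.fromstring(xml_text).iter("relation"):
        source, target = rel.get("source"), rel.get("target")
        roles.setdefault(target, 1)
        if rel.get("type") == "inhibition":
            roles[source] = -1  # one inhibitory edge is enough to flag the gene
        else:
            roles.setdefault(source, 1)
    return roles

def signed_set_score(roles: dict, gene_scores: dict) -> float:
    """Mean differential-expression score with inhibitor signs flipped."""
    return mean(roles[g] * gene_scores[g] for g in roles if g in gene_scores)

def unsigned_set_score(roles: dict, gene_scores: dict) -> float:
    """Conventional mean score that ignores gene roles, for comparison."""
    return mean(gene_scores[g] for g in roles if g in gene_scores)

if __name__ == "__main__":
    roles = gene_roles(PATHWAY_XML)
    scores = {"G1": 2.0, "G2": 1.5, "G3": -2.5, "G4": 1.0}  # made-up scores
    print(signed_set_score(roles, scores), unsigned_set_score(roles, scores))
```

With these made-up scores the inhibitor G3 is strongly down-regulated, which the signed score counts as consistent with pathway activation (1.75), whereas the unsigned average lets it cancel out the activators (0.5).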
Intern Requirements:
Knowledge of XML and XML parsers, and a strong background in an advanced programming language, ideally Java, is required. No biological background is required, but it will be helpful.