Internship Mentors 2006
Cecilie Boysen & Jim Breaux (ViaLogy) | Bruce Hoff / Soheil Shams (BioDiscovery) | York Marahrens (UCLA) | Eric Mjolsness (UCI) | Jeanette Papp (UCLA) | Matteo Pellegrini (UCLA) | Bruce Shapiro (Cal Tech) | Steven Smith (City of Hope) | Michael Thompson (UCLA) | Barbara Wold (Cal Tech) | Qing 'Tina' Xiao (JPL) | Todd Yeates
Cecilie Boysen, Ph.D & Jim Breaux, Ph.D (ViaLogy)
ViaLogy Description:
ViaLogy develops and markets software solutions for active signal processing using its proprietary technology, Quantum Resonance Interferometry (QRI). ViaLogy's technology can be employed to increase the sensitivity and reproducibility of signal detection, and it is applicable to a broad range of measurements, such as DNA and protein microarrays, mass spectrometers, electron microscopes, medical imaging, and optical and microwave communications. ViaLogy Microarray Analysis Service (VMAxS) provides order-of-magnitude increases in detection sensitivity, and significantly improved reproducibility and specificity, through an on-line user-driven re-analysis of raw data files from market-leading DNA microarray systems, without changing laboratory protocols and without new instrumentation.
Projects for Interns:
As ViaLogy currently has a number of new projects underway, there are a number of possibilities for interns: 1. Implementation and testing of a new DNA microarray test suite, including analysis of results from gene expression experiments. 2. Application of new image processing algorithms for analysis of fluorescently labeled cells, including morphology studies and cell counting procedures. 3. Development of tools for analysis of mass spectrometry output.
Intern Requirements:
Note that all activities at ViaLogy are computational in nature; ViaLogy does not operate a wet lab and relies on incoming data from collaborators, clients, and prospective customers. Thus, the desired intern skills and interests include familiarity with programming (C++ and/or R in particular), image processing, and data analysis. Interns should be able to work independently yet communicate very well, as most projects are carried out in close interaction with biologists, physicists, engineers, and computer scientists.
Bruce Hoff Ph.D / Soheil Shams Ph.D (BioDiscovery)
** Update **
Bruce Hoff has left BioDiscovery. Soheil Shams (sshams@biodiscovery.com) will be the new BioDiscovery contact person.
BioDiscovery Description:
Gene microarrays have become recognized as powerful tools for providing a global view of gene expression regulation for a biological condition of interest. The other side to this double-edged sword is that such studies produce large amounts of interesting numerical results, making it difficult to get an intuitive grasp on what is happening biologically. Our company, BioDiscovery, is dedicated to providing researchers with useful software tools for gleaning biological meaning from large data sets. One area of interest is the discovery of gene interactions, useful in elucidating novel biological mechanisms. The summer internship project will involve applying tools such as clustering analysis, self-organizing maps, and genomic pathways to the discovery of new biological interactions between genes.
Projects for Interns:
Interns will be involved in researching and prototyping novel analytic and statistical tools for processing microarray data and biological information.
Intern Requirements:
The ideal candidate is an adept Java programmer, knowledgeable in mathematics, statistics, biology, and publicly available (web-based) Bioinformatics resources (e.g. those of NCBI). Particularly valuable knowledge includes an understanding of gene metabolic and signaling pathways, an understanding of multi-variate and multi-factor statistics, and exposure to existing microarray data analysis software tools.
York Marahrens (UCLA)
UCLA Description:
Our research asks how genomic DNA sequence determines chromatin structure and gene expression.
Projects for Interns:
1) A project to automate the quantitation of fluorescent signal on mitotic chromosomes in which certain chromatin proteins are fluorescently labeled. This project has already been started by others.
2) A project to determine the relationship between the level of gene expression (all genes in established microarray data) and the abundance of repetitive sequences around the genes. We have already established important relationships between various repetitive sequences and gene expression in humans (where ~50% of the genome sequence consists of repetitive sequences). We would like to perform similar analyses in other organisms (e.g. mouse) whose genomes have been sequenced and for which gene expression data are available.
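As a rough illustration of the second project, the sketch below computes a Spearman rank correlation between per-gene expression and the fraction of repetitive sequence near each gene. It is only a minimal example under stated assumptions, not the lab's actual analysis: the variable names `repeat_fraction` and `expression`, the toy values, and the choice of Spearman correlation are all hypothetical.

```python
from math import sqrt

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy data: one value per gene (not real measurements).
repeat_fraction = [0.10, 0.25, 0.40, 0.55, 0.70]  # repeat content near gene
expression = [9.1, 7.4, 6.8, 3.2, 2.5]            # expression level

print(spearman(repeat_fraction, expression))
```

A rank-based measure is used here because it does not assume a linear relationship; for a real genome the same calculation would be run over all genes with both annotations available.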
Intern Requirements:
Established skills in using computers for data mining. Some knowledge of statistics is also preferred.
Eric Mjolsness (UCI)
UC Irvine Description:
Computable Plant project, www.computableplant.org, working at UCI.
Projects for Interns:
One possible assignment is in the outreach to high school science teachers (see the outreach section of the foregoing web site). Another is to try out new hypotheses with the modeling software.
Intern Requirements:
A primary skill would be the ability to learn to model in Cellerator (www.cellerator.org), which uses a computer algebra system, and to show other people how to do the same.
Jeanette Papp (UCLA)
UCLA Description:
Dr. Papp is the Director of the UCLA Genotyping and Sequencing Core Facility, and a member of the UCLA Bioinformatics Core. In addition to overseeing data generation and analysis in the laboratory, her research interests include developing novel bioinformatic solutions for the management and analysis of all types of genetic data within the Department of Human Genetics.
Projects for Interns:
Project 1. Candidate will assist in the development of web-based tools that help scientists access and visualize genetic data.
Project 2. The shift from Mendelian to complex disease genetics means that instead of searching for one genetic mutation with a large effect, we are interested in finding many genes each contributing a small effect. One approach is to co-analyze gene expression, genetic marker and trait data. In collaboration with Steve Horvath, we have developed and correlated a gene co-expression network with phenotype and SNP data to identify genetic markers for chronic fatigue syndrome. A summer intern might similarly analyze microarray and SNP data to identify complex disease markers.
Intern Requirements:
Project 1. Candidate must have a good understanding of RDBMS, preferably SQL Server, Postgres, or MySQL. Candidate must have programming experience in PHP and understanding of object oriented programming concepts. In addition, candidates are recommended to have some experience with developing interactive web applications using asynchronous requests such as AJAX. Candidate should be highly motivated and self-directed.
Project 2. Summer interns who are interested in implementing R code should apply. Students with computer programming skills and/or experience with statistical/mathematical software packages (for example MATLAB, Mathematica, R, or S) who have also had some genetics courses would be a good fit for this project.
Matteo Pellegrini (UCLA)
UCLA Description:
Our lab is interested in developing computational approaches to reverse engineer molecular networks. These network models allow us to elucidate the mechanisms of signal transduction, transcription and metabolism. Our approach is to build models that integrate varied data including measurements of gene expression, protein binding, phosphorylation and genome sequences. For example, we use genome sequence data to infer networks of co-evolving proteins, which allow us to study the function of most proteins. Currently, we are also developing methods to reconstruct dynamical networks of transcriptional regulation. Our long-term goal is to build network models that allow us to quantitatively predict the outcome of perturbations in cells.
Projects for Interns:
The student will work on projects involving either (a) the distribution of protein domains across all fully sequenced genomes in order to study protein function or organismal evolution or (b) the interpretation of expression data in terms of transcription factor activities.
Intern Requirements:
The ideal student for this project would be familiar with some form of computer programming. The actual projects will be carried out using Matlab. Knowledge of linear algebra would be a plus.
Bruce Shapiro, Ph.D (Cal Tech)
Caltech Description:
The Biological Network Modeling Center (bnmc.caltech.edu) is directly involved in a wide variety of interdisciplinary projects, putting the BNMC at an exciting intersection of talent and activities in computation, biology, and theory. An intern might choose to work on one of the following projects. A student could design a project that overlaps one or more of our research areas.
Projects for Interns #1:
1. The Computable Plant (www.computableplant.org)
Project Description: The computable plant project is developing an end-to-end research and modeling framework for the Arabidopsis SAM (shoot apical meristem), the growing tip of a plant stem. We observe several cell type specific markers for growth and differentiation in real-time in live plants with a dedicated confocal laser scanning microscope. Using a combination of computational modeling and image processing techniques we then infer specific transduction pathway data and fit mathematical models to produce two- and three-dimensional visualizations of the growing SAM, including phyllotactic and leaf vein development. Our aim is to determine the spatial and temporal relationships between different genes in an effort to understand how primordial cells are progressively specified.
What an Intern Might Do: We have developed and are developing several different signal-transduction and gene-regulatory network models of meristem development. To determine the efficacy of these models, various parameters need to be tuned, simulations run, and the resulting predictions compared with observed data. The networks may need to be modified by adding, removing, or changing some of the biochemical reactions involved.
Projects for Interns #2:
2. Cellerator (xCellerator.info)
Project Description: Cellerator is a computational environment for describing and simulating cellular models using Mathematica. It supports single and multiple compartment systems with a wide variety of interactions, including mass-action, enzymatic, allosteric and connectionist models. Reactions are translated into differential equations and can be solved numerically to generate predictive time courses or exported for use in other programs.
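To make the idea of translating reactions into differential equations concrete, here is a minimal sketch in Python. It is not Cellerator's implementation or syntax (Cellerator runs in Mathematica); the function names, the reaction encoding, the rate constants, and the use of a fixed-step Runge-Kutta integrator are all assumptions made for illustration. Each mass-action reaction contributes a rate equal to its rate constant times the product of its reactant concentrations.

```python
def mass_action_rhs(reactions, conc):
    """Return d(conc)/dt for reactions given as (reactants, products, k)."""
    deriv = {name: 0.0 for name in conc}
    for reactants, products, k in reactions:
        rate = k
        for name in reactants:
            rate *= conc[name]      # mass-action: k times reactant concentrations
        for name in reactants:
            deriv[name] -= rate     # reactants are consumed
        for name in products:
            deriv[name] += rate     # products are formed
    return deriv

def rk4_step(reactions, conc, dt):
    """Advance the concentrations by one classical Runge-Kutta step."""
    def shifted(base, slope, h):
        return {n: base[n] + h * slope[n] for n in base}
    k1 = mass_action_rhs(reactions, conc)
    k2 = mass_action_rhs(reactions, shifted(conc, k1, dt / 2))
    k3 = mass_action_rhs(reactions, shifted(conc, k2, dt / 2))
    k4 = mass_action_rhs(reactions, shifted(conc, k3, dt))
    return {n: conc[n] + dt / 6 * (k1[n] + 2 * k2[n] + 2 * k3[n] + k4[n])
            for n in conc}

def simulate(reactions, initial, dt, steps):
    """Return the time course as a list of concentration dictionaries."""
    conc = dict(initial)
    course = [conc]
    for _ in range(steps):
        conc = rk4_step(reactions, conc, dt)
        course.append(conc)
    return course

# Hypothetical reversible binding: A + B -> C (k = 1.0), C -> A + B (k = 0.5).
reactions = [(("A", "B"), ("C",), 1.0), (("C",), ("A", "B"), 0.5)]
course = simulate(reactions, {"A": 1.0, "B": 1.0, "C": 0.0}, dt=0.01, steps=1000)
print(course[-1])
```

For this toy system the amount of A plus C stays constant throughout the run, which gives a simple sanity check on any such simulation.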
What an Intern Might Do: A student could write a Cellerator plug-in that does one of the following:
(1) converts descriptions of chemical reactions to the SBGN graphical notation (www.sbgn.org);
(2) interfaces with CGAL (Computational Geometry Algorithms Library) to generate three-dimensional dynamic visualizations of growing plant tissue for the Computable Plant project;
(3) implements a specification, developed by the student, for incorporating the Systems Biology Ontology (www.ebi.ac.uk/compneur-srv/sbo/) into Cellerator.
Projects for Interns #3:
3. SBML (Systems Biology Markup Language) (sbml.org)
Project Description. The Systems Biology Markup Language (SBML) is a computer-readable format for representing models of biochemical reaction networks. SBML is applicable to metabolic networks, cell-signaling pathways, regulatory networks, and many others. It is the international de facto standard machine-readable format for exchanging computational models of biological reaction networks, and is currently supported by over 90 different software systems. Advances in biotechnology are leading to larger, more complex quantitative models. The systems biology community needs information standards if models are to be shared, evaluated and developed cooperatively. SBML's widespread adoption offers many benefits, including: (1) enabling the use of multiple tools without rewriting models for each tool, (2) enabling models to be shared and published in a form other researchers can use even in a different software environment, and (3) ensuring the survival of models (and the intellectual effort put into them) beyond the lifetime of the software used to create them.
Intern Requirements:
Students should have the following background: calculus; some understanding of differential equations (a full course is not necessary, just knowing what they are and having an interest in solving them numerically); a desire to work intensively doing computer modeling or computer programming; and enough background in biology to know what a signal transduction network is. Modeling will be done with Mathematica, but no prior knowledge of Mathematica is required. All work would be done at Caltech. For some projects students would coordinate their work with wet-bench researchers at Caltech. No wet-bench research is involved in any of these projects.
What an Intern Might Do:
(1) Write software for MathSBML to incorporate graphic diagram layouts. An SBML graphic layout definition has already been developed; it is a matter of converting it to a useful form.
(2) Identify and curate models for the new BioModels Database (biomodels.net). Models present in BioModels Database are annotated and linked to relevant data resources, such as publications, databases of compounds and pathways, controlled vocabularies, and more.
Steven Smith, Ph.D (City of Hope)
City of Hope Description:
Bioinformatics Research in our Laboratory (Under Its Broadest Definition) is of Three Types:
Bionanotechnology
Bionanotechnology is a new field that is working to create a variety of devices that will improve and augment existing approaches to medical and biological problems. This new field uses information from molecular biology, chemistry and physics to link biological and non-biological molecules into complex bioassemblies not normally found in nature. In these applications, the goal of bionanotechnology is not only to produce the specificity and telemetry that a radio beacon soft-landed on the moon might exhibit, but also the capacity for detailed and serial analyses of the landing site that a soft-landed robot might exhibit. No one expects that robotic control of a molecular device will be available anytime soon, but such a device might be preprogrammed to report several findings once its initial target is located. In developing nanoscale assemblies for these biological applications, we find that computer aided design is an essential step in the process. Devices now under development use the tools of computational chemistry: e.g. electronic structure calculation, homology modeling and molecular modeling. Available tools include Spartan, Gaussian, Insight II, Chimera, Midas, and Biograf implemented on SGI Origin desk side supercomputers or other SGI hardware.
Epigenetics of Non-CG Methylation:
The role of DNA methylation in biology is currently a topic of great interest in the research community. The genomics of cytosine methylation at CG sites in human DNA has received considerable attention, while the genomics of non-CG methylation in human DNA is less well studied, even though it may make up the majority of methylation in human cells. In our lab we have been collaborating with scientists in Sweden and Moscow on what role, if any, this sort of methylation may play in human cells. It is well known that CG methylation has played an important role in the evolution of the structure of the human genome. Our preliminary findings suggest that non-CG methylation may also have played an important role in the evolution of the structure of the human genome. To this end we are studying the distribution of non-CG methylation sites in the human genome with the tools of genomic analysis available at City of Hope. Available tools are listed at the City of Hope Sequence analysis website:
www.infosci.coh.org/corelab/sequence_analysis.asp?parent=genomics&child=sequence
Cancer Diagnosis with CG Methylation:
Changes in DNA methylation are important early markers of tumorigenesis. DNA present in a variety of non-invasively obtained specimens (e.g. urine and expressed prostatic secretion) is being studied for changes in promoter methylation state. Our hypothesis is that the synchronous determination of the promoter methylation state at multiple genes will yield patterns of methylation that correlate with the presence of prostate or bladder cancer and that these patterns can be useful in diagnosis and prognosis of the disease. Individual markers are being analyzed with standard performance tables and Receiver Operating Characteristic (ROC) analysis. Multivariate statistical models of the complete data sets are being developed using a variety of biostatistical tools.
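As a small illustration of how a single marker can be evaluated this way, the sketch below builds an ROC curve by sweeping a threshold over marker scores and then computes the area under the curve with the trapezoidal rule. It is a generic textbook calculation, not the laboratory's pipeline; the function names and the toy methylation scores and diagnoses are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Call a sample positive when its score is at or above the threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical data: promoter methylation score per specimen and diagnosis
# (1 = cancer, 0 = no cancer). These are not real measurements.
methylation = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
has_cancer = [1, 1, 0, 1, 1, 0, 0, 0]

curve = roc_points(methylation, has_cancer)
print(curve)
print(auc(curve))
```

An area of 1.0 means the marker separates the two groups perfectly, while 0.5 is what a marker with no diagnostic value would give on average.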
Projects for Interns:
1. Molecular modeling for Bionanotechnology
2. Epigenetics of Non-CG methylation and genome structure
3. Biostatistical Analysis of Cancer Diagnosis Data Sets.
Intern Requirements:
1. Computer Literacy
2. Resourcefulness
3. Interest in molecular structure, genome structure or data structure.
Michael Thompson (UCLA)
UCLA Description:
The structural basis of amyloid formation.
Projects for Interns:
Amyloid formation is associated with neurodegenerative diseases such as Alzheimer's and with other pathologies. Amyloid is an aggregate of protein in a fibrillar form. Typically this is not the natural functioning form of the protein, and there is much interest in understanding this fibrillization process. It is possible that the same process underlies the prion phenomenon (Mad Cow Disease, Kuru, etc.). We are developing a method to predict which proteins might form amyloid and which parts of those proteins are involved.
An internship on this project will involve refinement and improvement of existing algorithms, critical scholarship in the field of amyloid research, and a selective research emphasis on an amyloid-related disease of the intern's choosing (e.g. diabetes, familial Mediterranean fever, etc.).
Intern Requirements:
Competent programming skills in one or more languages. Some knowledge of basic statistics and biochemistry. Eagerness to learn.
Barbara Wold (Cal Tech)
Projects for Interns:
1. A mix of biology and computing around the software package called Cistematic, involving the definition of cis-regulatory cohorts (binding sites for proteins that work together to regulate transcription of a given gene or genes). Some introductory information regarding Cistematic is available at: http://cistematic.caltech.edu
2. Working on further development of BioHub or CompClust.
Information for BioHub can be found at: http://woldlab.caltech.edu/biohub
Information for CompClust can be found at: http://nar.oxfordjournals.org/cgi/content/full/33/8/2580
Qing 'Tina' Xiao (JPL)
JPL Description:
The JPL scientific data management group develops middleware software that allows transparent access to distributed resources, data discovery and query optimization, distributed data processing, and virtual archives.
The Object Oriented Data Technology Task (OODT) group has been performing research in distributed and object-oriented technology to improve scientific data management and interoperability among space science data systems as well as biomedical data systems. We have designed a framework and provided software components to perform archiving, search and retrieval, and data analysis for science data systems. Both NASA's Planetary Data System (PDS) and the National Cancer Institute's Early Detection Research Network (EDRN) have implemented this technology to collect and distribute data for scientific collaboration.
Projects for Interns:
Possible projects: development of databases of existing biomarkers for early cancer detection based on literature search; development of programs and user interfaces for cancer biomarker analysis using public data resources, including gene expression and SNP data.
Analysis of proteomics data is currently the most active research area for bioinformatics. No standard methodology has been defined yet. For this summer project, we would like to evaluate various data analysis platforms such as caWorkbench and S-PLUS. During this process, we should accomplish the following:
1) Identify the steps involved in analyzing proteomics data.
2) Develop a specification for a software platform supporting the processing of proteomics data.