Internship Mentors 2004

Michael Curtin (Paracel Inc) | Bruce Hoff (BioDiscovery) | Gary Larson (City of Hope) | Magnus Nordborg (USC) | Sandeep Gulati (ViaLogy) | Barbara Wold (Cal Tech) | Todd Yeates & Mike Thompson (UCLA) | Qing 'Tina' Xiao (Cal Tech/JPL)

Michael Curtin, Ph.D (Paracel Inc)

www.paracel.com

Project 1 Description

The purpose of this project is to develop a web interface to administrative functions in Paracel, Inc.'s BioView WorkBench software. Administrative functions include loading databases, monitoring and reprioritizing user jobs, and configuring user permissions. The intern should be familiar with web page design, HTML, Linux, Perl, and shell scripting.

Project 2 Description

The purpose of this project is to implement optimizations to Paracel, Inc.'s BLAST software that improve its performance on large clusters. These optimizations will include improved communication patterns, better use of local disks, and updated scheduling algorithms. The intern should be familiar with C programming on UNIX or Linux, parallel programming with MPI, Perl, shell scripting, and basic system administration.

Project 3 Description

This project seeks to demonstrate the improved sensitivity of Smith-Waterman over BLAST in the fields of oligonucleotide design and RNAi selection. The intern will work with scientists and engineers to set up and run representative benchmarks on the GeneMatcher2 and BlastMachine2 systems. The intern will also record all experimental outcomes and work with other departments to communicate the results. Other possible areas of study include running a variety of tests with the GeneWise algorithm and other hidden Markov model (HMM)-based algorithms.
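Smith-Waterman finds the optimal local alignment by exhaustive dynamic programming, whereas BLAST uses heuristic word seeding that can miss weak similarities; this is the source of the sensitivity difference the project sets out to measure. The minimal sketch below computes a Smith-Waterman local alignment score under assumed, illustrative scoring (match +2, mismatch -1, linear gap -2). It is not Paracel's implementation; production searches typically use substitution matrices and affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b.

    Scoring values are illustrative assumptions, with a linear gap penalty.
    """
    # prev[j] holds the score of the best local alignment ending at a[i-2], b[j-1]
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = cur[j - 1] + gap   # gap in a
            # Clamping at 0 lets an alignment start anywhere: this is what makes it "local"
            cur[j] = max(0, diag, up, left)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # The shared core "ACGT" scores 4 matches x 2 = 8
    print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

Because every cell of the score matrix is filled, the running time grows with the product of the sequence lengths. This cost is why dedicated hardware or large clusters matter for Smith-Waterman database searches.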




Bruce Hoff, Ph.D (BioDiscovery)

www.biodiscovery.com

What Is BioDiscovery?

Gene microarrays have become recognized as powerful tools for providing a global view of gene expression regulation for a biological condition of interest. The other side of this double-edged sword is that such studies produce large volumes of numerical results, making it difficult to get an intuitive grasp of what is happening biologically. Our company, BioDiscovery, is dedicated to providing researchers with useful software tools for gleaning biological meaning from large data sets. One area of interest is the discovery of gene interactions, which is useful in elucidating novel biological mechanisms. The summer internship project will involve applying tools such as clustering analysis, self-organizing maps, and genomic pathways to the discovery of new biological interactions between genes.

What Would An Intern Work On?

Interns will be involved in researching and prototyping novel analytic and statistical tools for processing microarray data and biological information. The ideal candidate is an adept Java programmer who is knowledgeable in mathematics, statistics, biology, and publicly available (web-based) bioinformatics resources (e.g. those of NCBI). Particularly valuable knowledge includes an understanding of metabolic and signaling pathways, an understanding of multivariate and multifactor statistics, and exposure to existing microarray data analysis software tools.




Gary Larson, Ph.D (City of Hope)

www.coh.org

What Does Dr. Larson Study?

Dr. Larson works on studies aimed at identifying disease alleles that contribute to inherited cancer risk. To identify these variants, linkage analyses are carried out in cancer families, and association testing is carried out using control patients. The ultimate goals are to identify disease variants and to decipher their mechanisms of action.

What Would An Intern Work On?

Students will participate in one of the following projects: development of a laboratory information management system that links experiments with genetic data in a pre-existing database; use of comparative genomic tools to identify phylogenetically conserved elements; or, possibly, meta-analysis of array data.




Magnus Nordborg, Ph.D (USC)

www.usc.edu

What Research Is Conducted at Dr. Nordborg's Laboratory?

We are working on the population genetics of the model plant Arabidopsis thaliana on a genomic scale. We are carrying out a genomic survey of genetic polymorphism by sequencing short reads throughout the genome in a sample that contains individuals from all over the world. We use these data to get a picture of how variation is organized in this species with respect to chromosomes, individuals, and populations. In other words, we ask which alleles tend to co-occur, which do not, and so on. The main reason for understanding how genetic variation is structured is to understand the genetics of adaptation: how these plants are adapted to their local environments. We measure important traits, such as differences in flowering behavior, and attempt to correlate them with the genetic variation. Research in the lab involves everything from working with plants, via standard molecular biology techniques, to computer programming and statistical modeling.
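Allele co-occurrence between two polymorphic sites is commonly summarized as linkage disequilibrium (LD). As a minimal sketch, assuming each sampled chromosome is recorded as a 0/1 allele code at two biallelic sites (an illustrative encoding, not the lab's data format), the standard r² statistic is D² / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA pB:

```python
from typing import Sequence


def r_squared(site1: Sequence[int], site2: Sequence[int]) -> float:
    """LD r^2 between two biallelic sites.

    Assumed encoding: site1[k] and site2[k] are the 0/1 alleles carried by
    sampled chromosome k at each site.
    """
    if len(site1) != len(site2) or len(site1) == 0:
        raise ValueError("sites must be non-empty and the same length")
    n = len(site1)
    p_a = sum(site1) / n                      # frequency of allele 1 at site 1
    p_b = sum(site2) / n                      # frequency of allele 1 at site 2
    p_ab = sum(1 for x, y in zip(site1, site2) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b                      # excess co-occurrence over independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denom
```

An r² near 1 means the two alleles almost always occur together in the sample, while an r² near 0 means they occur independently.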




Sandeep Gulati, Ph.D (ViaLogy)

www.vialogy.com

- Description Unavailable -




Barbara Wold, Ph.D (Cal Tech)

www.caltech.edu

- Description Unavailable -




Todd Yeates, Ph.D & Mike Thompson, Ph.D (UCLA)

www.ucla.edu

Summer Research Projects

One potential summer research project would involve extending the phylogenetic profile method to analyze gene duplication events and their correspondence to function. Because gene duplication is often posited to be a source of evolutionary novelty that expands the functional repertoire of organisms, it would be interesting to construct and analyze phylogenetic profiles for duplicated pairs of genes. The duplicated gene in a given pair may have gained a new function, in which case the pair is retained in some present-day genomes, or it may have failed to gain a new function and subsequently been lost, in which case the pair is not retained in some organisms. Comparing this pattern of co-occurrence for the gene pair with the profiles of single proteins may prove useful in understanding the evolution of protein function via this duplication mechanism.
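A phylogenetic profile records, for each gene, whether a homolog is present (1) or absent (0) in each genome of a fixed set of sequenced genomes. The minimal sketch below assumes those presence sets have already been computed (all genome labels and gene names are hypothetical). It defines the profile of a duplicated pair as the genomes that retain both copies, which is one possible choice among several, and ranks single-protein profiles by Hamming distance to it.

```python
from typing import Dict, List, Sequence, Set, Tuple

Profile = Tuple[int, ...]


def profile(present_in: Set[str], genomes: List[str]) -> Profile:
    """1 where the gene has a homolog in that genome, else 0."""
    return tuple(1 if g in present_in else 0 for g in genomes)


def pair_profile(copy_a: Set[str], copy_b: Set[str], genomes: List[str]) -> Profile:
    """Assumed definition: 1 only in genomes that retain BOTH duplicated copies."""
    return profile(copy_a & copy_b, genomes)


def hamming(p: Sequence[int], q: Sequence[int]) -> int:
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(p, q))


def closest_singles(target: Profile, singles: Dict[str, Profile], k: int = 3) -> List[str]:
    """Names of the k single-protein profiles most similar to target (ties broken by name)."""
    return sorted(singles, key=lambda name: (hamming(target, singles[name]), name))[:k]


if __name__ == "__main__":
    genomes = ["g1", "g2", "g3", "g4"]  # hypothetical genome labels
    dup = pair_profile({"g1", "g2", "g3"}, {"g1", "g2"}, genomes)
    singles = {"X": (1, 1, 0, 0), "Y": (0, 0, 1, 1), "Z": (1, 1, 1, 0)}
    print(dup, closest_singles(dup, singles, k=2))
```

Hamming distance is the simplest profile comparison. Other similarity measures could be substituted without changing the overall structure.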

A second potential summer research project would involve a survey and analysis of Q/N-rich proteins. These are proteins, or segments of proteins, that contain an abundance of glutamine and asparagine residues. In yeast, some of these proteins have an experimentally demonstrated capacity to become prions. In humans, as in the case of Huntington's disease, these proteins are associated with neurodegenerative disorders. While some work has aimed at identifying these Q/N-rich proteins, it would be interesting to see whether they can be classified into subtypes based on composition or on possible repeating motifs within the Q/N-rich region.
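As a minimal sketch of a composition-based survey, the scan below slides a fixed-length window along a protein sequence and reports merged regions in which glutamine (Q) plus asparagine (N) make up at least a given fraction of the residues. The window length of 40 and the threshold of 0.5 are illustrative assumptions, not published criteria for Q/N-richness.

```python
from typing import List, Tuple


def qn_rich_regions(seq: str, window: int = 40, min_fraction: float = 0.5) -> List[Tuple[int, int]]:
    """Return merged 0-based, half-open (start, end) spans of Q/N-rich windows."""
    seq = seq.upper()
    if window <= 0 or len(seq) < window:
        return []
    # Q/N count in the first window; updated incrementally as the window slides
    count = sum(1 for aa in seq[:window] if aa in "QN")
    hits = []
    for start in range(len(seq) - window + 1):
        if start > 0:
            if seq[start - 1] in "QN":           # residue leaving the window
                count -= 1
            if seq[start + window - 1] in "QN":  # residue entering the window
                count += 1
        if count / window >= min_fraction:
            hits.append((start, start + window))
    # Merge overlapping qualifying windows into contiguous regions
    regions: List[Tuple[int, int]] = []
    for s, e in hits:
        if regions and s <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], e))
        else:
            regions.append((s, e))
    return regions
```

The extracted regions could then be grouped by composition or by repeated motifs, which is the classification question the project raises.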




Qing 'Tina' Xiao (Cal Tech/JPL)

www.oodt.jpl.nasa.gov

What Research Is Conducted at JPL?

The JPL scientific data management group develops middleware that provides transparent access to distributed resources, data discovery and query optimization, distributed data processing, and virtual archives.

The Object Oriented Data Technology Task (OODT) group has been performing research in distributed and object-oriented technology to improve scientific data management and interoperability among space science data systems as well as biomedical data systems. We have designed a framework and provided software components to perform archiving, search and retrieval, and data analysis for science data systems. Both NASA's Planetary Data System (PDS) and the National Cancer Institute's Early Detection Research Network (EDRN) have implemented this technology to collect and distribute data for scientific collaboration.

What Possible Projects Will Interns Have A Chance To Participate In?

Possible projects include development of databases of existing early cancer detection biomarkers based on literature searches, and development of programs and user interfaces for cancer biomarker analysis using public data resources such as gene expression and SNP data.
