David W. Mount, PhD, Director
Ritu Pandey, PhD, Co-Director
The Informatics/Bioinformatics Shared Service at The University of Arizona provides support in the following areas:
- Analysis of genomic (e.g. gene expression, CGH, DNA methylation, RNAi screens, genome and sequence analysis), genetic (SNP analysis), proteomics, and other types of molecular data sets of cancer cells and tissues
- Analysis of cancer genomic, molecular, and genetic data collected by other University of Arizona Cancer Center shared services (especially the Genomics and Proteomics shared services), TGen, and other data sources
- Biological interpretation of the above data, including pathway and ontology analysis, systems analysis, genetic vulnerabilities for drug targeting, predictive patterns for outcome, and data modeling
- Informatics support for Cancer Center projects and other shared services in the form of tissue and molecular databases, genome databases, and data sharing tools
- Data integration of clinical, molecular, and genetic data utilizing caBIG tools
Who we are
The Informatics/Bioinformatics Shared Service was founded in 2002 to provide bioinformatics, genomics, and proteomics support to The University of Arizona Cancer Center researchers so that they can fully utilize the power of the Human Genome Project in their research. We collect, store, and make available a variety of molecular and genetic data on cancer genomes. We use established computational tools and develop new tools as needed for analysis of all genome-related data. We also provide a variety of computer-related services for informatics support of research projects, such as web-based databases for storing clinical and pathological, proteomics, and cancer genome data. Through these levels of support, the Informatics/Bioinformatics Shared Service provides an integrated approach that assists researchers in their search for new biological information about cancer cells and tissues, and thus aids in finding new drug targets and preventative methods. Some examples of the types of services we offer are given below.
The Bioinformatics group includes the following staff members:
- David W. Mount, PhD, Director of the Bioinformatics Shared Service, is Professor Emeritus of Molecular and Cellular Biology. He is an established geneticist and molecular biologist and an expert in bioinformatics and computational biology.
- Ritu Pandey, PhD, Co-Director of the Bioinformatics Shared Service and Coordinator of Biomedical Informatics at The University of Arizona Cancer Center. She was the caBIG (cancer Biomedical Informatics Grid) Deployment Lead at the Cancer Center. Dr. Pandey specializes in the collection, storage, management, and utilization of genomic and proteomics data for interpretation of large data sets.
- Ann Manziolli, BS in mathematics. Ann analyzes expression and other data sets using R programming and Bioconductor tools.
All Informatics and Bioinformatics team members are currently located on the first floor of the Levy building, in and around room 1930.
Attn: Person Name room#
The University of Arizona Cancer Center
1515 N. Campbell Ave.
PO Box 245024
Tucson, AZ 85724-5024
Some of our Informatics Initiatives at the Cancer Center
We have been working on several data management and data integration initiatives for the past few years. The tools below were either developed by the group or adopted and deployed to support Cancer Center programs, shared services, and researchers. The goal is to leverage existing tools to integrate heterogeneous data for research. These tools are currently set up in a testing environment.
- caArray: A data management system for array data. This application is accessible through the web and programmatically, and supports storage, dissemination, and exchange of annotations and data. caArray supports many leading array manufacturers and allows data to be exchanged with analysis tools and other storage systems for further analysis. Contact us for a login and password.
- caIntegrator2: A web-based software package that allows researchers to set up custom, caBIG-compatible web portals that bring together heterogeneous clinical, microarray, and medical imaging data to enrich multidisciplinary research. caIntegrator2 uses caGrid analytical services, leverages the Cancer Data Standards Registry and Repository (caDSR) to map experimental data to well-defined data types, and utilizes caGrid and Java client APIs to access data from caBIG applications for translational studies.
- AZCC PubDB: A web-based system for storing and tracking publications by University of Arizona Cancer Center members, programs, and shared services. It automatically retrieves publications from PubMed and PubMed Central and generates reports for use in grants.
- Protbase: A web-based lab data management system for the Proteomics Shared Service. Users can submit sample requests, track their experiments, download or view results, and share their experiments. Facility core staff can track experiment requests and lab usage for proteomics services, upload results, and bill researchers for the services provided.
- Genomics Core database: A web-based system for storing and tracking microarray data for the Genomics Shared Service. Users can submit service requests and download their results online.
- Pathway Miner: A web-based system for biological interpretation of results from microarray experiments. It brings together genome and proteome information from several third-party resources for data mining and data exploration.
Informatics systems that have been discontinued:
- caTissue: A caBIG biospecimen informatics system for inventory tracking and for clinical and pathology annotation. caTissue permits users to track the collection, storage, quality assurance, and distribution of specimens, as well as the derivation and aliquotting of new specimens from existing ones (e.g. for DNA analysis).
- GI Tissue DB: The GI SPORE Tissue Database was designed and built several years ago by the informatics core, in close compliance with the caBIG Tissue Banks and Pathology working group. It is a secure J2EE web application running on an Oracle 10g Application Server.
- Prostate Tissue DB: A database designed for storing information on prostate tissue biospecimens.
IBISS Project Pricing, Turnaround Time & Consulting
Interaction with Cancer Center Members:
The service we offer is our expertise in data analysis and management. If we assist a cancer researcher on a temporary consulting basis, we charge for the time spent on the project. We welcome the opportunity to participate in laboratory meetings and research discussions and to help write papers and grant applications. We can perform a free preliminary data analysis to support the feasibility of an application. We can also contribute our experience and expertise in data management and analysis, thereby helping to increase the fundability of many research grant applications. Presently, we are supported by the Cancer Center Core grant, the GI SPORE grant, and contracts with TGen. We usually have time available and are very interested in participating in more projects.
Our areas of expertise include both computational biology (biological sequence analysis, protein structure analysis, genome analysis, and advanced computational analysis of large data sets such as gene expression, single nucleotide polymorphism (SNP), and proteomics data) and basic biological studies (population and molecular genetics, molecular and cell biology, biochemistry, and evolutionary biology). Our goal is to provide assistance with data analysis that will lead to testable hypotheses and fundamentally important discoveries in cancer research. We specialize in the biological interpretation of data, leading to a new understanding of cancer biology and to the discovery of new diagnostic markers, genetic risk markers (haplotypes), patterns in data, and drug targets. Our staff is well prepared to perform all of these types of analysis.
How we perform our services
First, we are a group of experienced computer programmers, database designers, and website developers. We program in the languages most commonly used in bioinformatics, including Java, Perl, PHP, and R. We use programming tools that have been established by the bioinformatics community in the R programming language and the Bioconductor project (http://www.bioconductor.org) and that are based on sophisticated but sound statistical principles. We routinely utilize several database management systems, including Oracle, Postgres, and MySQL. Second, we keep up with the bioinformatics literature (books and peer-reviewed papers), participate in NIH study sections and NSF panels, and attend meetings and conferences to stay abreast of new technology. We utilize computational tools developed and published by others, as well as public websites of biological data, which we often query with automated scripts. When existing computer programs or commercially available software do not meet the needs of a cancer research project, the Bioinformatics Shared Service develops the needed computer tools and databases locally. We also perform many other types of analysis on high-throughput data, including data smoothing, modeling, and mining. The examples given below illustrate these types of analysis.
Interaction with other Shared Services
We assist users of the Proteomics and Genomics Shared Services by providing storage on web-based databases that we have designed, at http://www.protbase.org and http://azcc-microarray.arl.arizona.edu/index.php, respectively. We also provide biological interpretation and modeling of genomic and proteomic results. For example, the Informatics/Bioinformatics Shared Service offers assistance with experimental design and a basic analysis of gene expression experiments. This analysis includes quality control and the identification of a list of varying genes, together with information on the biological functions of these genes and their metabolic and regulatory interactions. A similar service is offered to users of the Proteomics Shared Service. We also provide advanced computational modeling of the data in order to find biological patterns that are diagnostic for disease or that point to candidate targets for new drugs. We offer current and powerful computational tools and models to help Cancer Center investigators get the most information from their experiments.
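As a rough illustration of what "obtaining a list of varying genes" can involve, the sketch below ranks genes by a Welch t-statistic between tumor and normal samples. It is a minimal example under stated assumptions, not a description of the Shared Service's actual pipeline: the gene names, expression values, and the helper names `welch_t` and `rank_varying_genes` are all hypothetical, and the values are assumed to be log2-scaled already.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_varying_genes(expression, tumor_idx, normal_idx):
    """Return (gene, mean log2 difference, t) tuples sorted by |t|, largest first."""
    rows = []
    for gene, values in expression.items():
        tumor = [values[i] for i in tumor_idx]
        normal = [values[i] for i in normal_idx]
        diff = mean(tumor) - mean(normal)
        rows.append((gene, diff, welch_t(tumor, normal)))
    return sorted(rows, key=lambda r: abs(r[2]), reverse=True)


# Hypothetical log2 expression values: three tumor samples, then three normals.
expression = {
    "GENE_A": [8.1, 8.4, 8.0, 5.9, 6.2, 6.0],
    "GENE_B": [7.0, 7.2, 6.9, 7.1, 7.0, 6.8],
    "GENE_C": [4.0, 4.3, 4.1, 6.0, 6.4, 6.1],
}

ranked = rank_varying_genes(expression, tumor_idx=[0, 1, 2], normal_idx=[3, 4, 5])
for gene, diff, t in ranked:
    print(f"{gene}\tlog2 diff={diff:+.2f}\tt={t:+.2f}")
```

A real analysis would also normalize the arrays, check sample quality, and correct for testing thousands of genes at once; this sketch covers only the ranking step.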
Methods for finding genes that are changing in cancer cells and tissues.
Pathway Analysis: In this method, a list of significantly varying genes is submitted to our local database, which collects the currently available information on the human genome and on many other genomes (this database is described further below). The Pathway Miner tool then reports which of these genes fall in well-known regulatory and metabolic pathways, for example in pancreatic cancer tissues.
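One common way to decide whether a pathway is over-represented in a list of varying genes is a hypergeometric (one-sided Fisher) test. The sketch below shows that calculation; it is an assumption about how such a test can be done in general, not a description of how Pathway Miner works internally. The pathway names, gene identifiers, and the functions `hypergeom_sf` and `pathway_enrichment` are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the pathway."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def pathway_enrichment(varying_genes, pathways, background):
    """Return (pathway, overlapping genes, p-value) tuples, smallest p first."""
    background = set(background)
    varying = set(varying_genes) & background
    results = []
    for name, members in pathways.items():
        members = set(members) & background
        overlap = varying & members
        p = hypergeom_sf(len(overlap), len(background), len(members), len(varying))
        results.append((name, sorted(overlap), p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical background of 20 measured genes and two toy pathways.
background = [f"G{i}" for i in range(1, 21)]
pathways = {
    "cell cycle": ["G1", "G2", "G3", "G4", "G5"],
    "apoptosis": ["G6", "G7", "G8", "G9", "G10"],
}
varying_genes = ["G1", "G2", "G3", "G4", "G11"]

results = pathway_enrichment(varying_genes, pathways, background)
for name, overlap, p in results:
    print(f"{name}\toverlap={overlap}\tp={p:.4g}")
```

When many pathways are tested, the resulting p-values should be adjusted for multiple comparisons before any pathway is reported as enriched.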
Methods for finding new drug targets.
We use novel methods to determine which genes are suitable drug targets or are predictive for disease. Often, targets are identified as the products of genes that are significantly over-expressed in cancer tissues and cells. We also use graph theory, a computational framework that represents genes as nodes and their relationships as edges, to determine which genes appear to interact, such as one gene regulating another. These graphs can be used very effectively to predict gene interactions by combining different data sets and types of analysis (also see http://www.cytoscape.org).
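To make the graph idea concrete, the sketch below builds a simple co-expression graph: two genes are linked when their expression profiles are strongly correlated (positively or negatively) across samples. This is one illustrative way to construct such a graph, assumed here for the example rather than taken from the Shared Service's methods; the profiles, the 0.9 threshold, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_graph(profiles, threshold=0.9):
    """Undirected graph as an adjacency dict: an edge joins two genes
    whose absolute correlation is at least `threshold`."""
    graph = {gene: set() for gene in profiles}
    for g1, g2 in combinations(profiles, 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold:
            graph[g1].add(g2)
            graph[g2].add(g1)
    return graph


# Hypothetical expression profiles across five samples.
profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with GENE_A
    "GENE_C": [5.0, 4.1, 2.9, 2.0, 1.1],  # falls as GENE_A rises
    "GENE_D": [3.0, 1.0, 4.0, 1.0, 3.0],  # no clear relationship
}

graph = coexpression_graph(profiles)
# Genes with the most neighbours (highest degree) are candidate hubs.
hubs = sorted(graph, key=lambda g: len(graph[g]), reverse=True)
for gene in hubs:
    print(gene, sorted(graph[gene]))
```

An edge in this graph indicates association only. Deciding that one gene regulates another requires additional evidence, which is why combining several data sets and analysis types matters.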
Alternatively, two genes may form a synthetic lethal combination, in that the cell needs one gene or the other for viability, or needs one gene expressed in order to compensate for over-expression of the other. Through data analysis, we can determine whether a cancer cell depends entirely on one gene, having lost or over-expressed the other, which makes the remaining gene a sensitive drug target. As an example, we are helping to discover synthetic lethal targets for over-expressed cell cycle genes in adrenocortical and pancreatic cancer. We are also able to search for genes that are likely to be strongly over-expressed because a rearrangement in the genome has fused the gene to a strong promoter, as occurs in chronic myelogenous leukemia and in some B and T cell malignancies.
Future role for the Bioinformatics Shared Service as a Resource for Cancer Genome and Population Genetic Data.
The Bioinformatics Shared Service will continue to aid The University of Arizona Cancer Center investigators with access to genome and proteome data, data analysis, and integration of large data sets with clinical and biological information for specific research projects. We will keep abreast of new data sets and analytical tools, and will generate new computational tools and methods as needed. Large public data sets on the cancer genome will become available in the next two years. The National Cancer Institute has initiated The Cancer Genome Atlas project, in which designated Genome Centers will collect high-quality gene expression, gene copy number, and DNA methylation array data on thousands of cancer tissues (http://genome.gov/19518624). A related project will identify sequence variations in the major cancer genes in a similar number of tissues. The HapMap project is collecting large data sets on genetic variability in human populations. In most cases, it will be beyond the capability of individual laboratories to utilize these data; the Bioinformatics Shared Service has the expertise to assist in this area.