Latest improvements for IPA
What’s new in the IPA Fall Release
September 30, 2017
Analysis Match* automatically discovers other IPA Core Analyses with similar (or opposite) biological results as compared to yours, to help confirm your interpretation of the results or to provide unexpected insights into underlying shared biological mechanisms. It matches your analysis against other analyses you have created (in your Project Manager) as well as thousands of other human and mouse expression analyses curated from public sources. This “analysis-to-analysis” matching is based on shared patterns of Canonical Pathways, Upstream Regulators, Causal Networks, and Diseases and Functions.
With this new capability, you can:
- Build confidence in your results by identifying shared biological signatures across disparate diseases, tissues, treatments and more.
- Develop greater insight into upstream drivers, downstream phenotypes and biological pathways by examining their potential roles in disease and other conditions.
- Easily obtain and evaluate critical hypotheses across an extensive collection of public data.
The analyses included in Analysis Match were generated in IPA from more than 6,000 highly curated and quality-controlled human and mouse disease and oncology datasets re-processed from SRA, GEO, Array Express, TCGA and more. These datasets were generated by QIAGEN’s recently acquired company, OmicSoft, and are the “comparisons” found in DiseaseLand representing various contrasts between disease and normal, treatment vs. non-treatment and much more.
Figure 1 shows the new Analysis Match tab from one of IPA’s Example Analyses based on the expression data derived from mouse lung exposed to welding fumes. The results in the figure have been filtered to show only the highest scoring results against all the analyses in the OmicSoft repository within IPA. Of the more than 6,000 analyses in the repository, 125 had an overall score of >60% or <-60%, corresponding to strongly similar or dissimilar patterns, respectively. You can further filter the results in a number of ways, for example by type of comparison, disease state, tissue, and much more. The keyword filtering is possible because each analysis has been extensively annotated by OmicSoft using a controlled vocabulary, which can be displayed in columns as shown in Figure 1. Only a few columns are shown in IPA by default due to screen space limitations.
The analyses are matched based on a set of signatures that are created for each analysis, namely one signature for the Canonical Pathways, one for Upstream Regulators, one for Causal Networks, and one for Diseases and Functions. Each signature is used independently to match against other analyses, and an overall average is computed.
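As a rough illustration of the averaging step, the sketch below takes four per-signature match scores, computes their plain arithmetic mean, and applies the ±60% cutoff used for Figure 1. The function names, the "weak" label and the example scores are hypothetical, and IPA's actual scoring algorithm may differ in detail.

```python
from statistics import mean

# Abbreviations for the four signature types: Canonical Pathways,
# Upstream Regulators, Causal Networks, Downstream Effects.
SIGNATURE_TYPES = ("CP", "UR", "CN", "DE")


def overall_score(scores):
    """Average the four per-signature match scores (percent, -100 to 100)."""
    return mean(scores[name] for name in SIGNATURE_TYPES)


def classify(score, cutoff=60.0):
    """Label a score using the +/-60% cutoff from the Figure 1 example."""
    if score > cutoff:
        return "similar"
    if score < -cutoff:
        return "dissimilar"
    return "weak"  # hypothetical label for everything in between


# Made-up scores for one matching analysis.
example = {"CP": 70.0, "UR": 80.0, "CN": 65.0, "DE": 75.0}
print(overall_score(example), classify(overall_score(example)))
```

Because each signature is scored independently, a single strongly matching signature cannot by itself push the average past the cutoff.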
*Analysis Match requires additional licensing. Please contact us
Fig 1. Analysis Match tab displaying matching analyses. By default, the analyses are ranked from most similar to least similar based on the overall similarity score (the right-most column shown above). The analyses are matched based on a set of signatures that are created for each analysis, namely for Canonical Pathways, Upstream Regulators, Causal Networks, and Diseases and Functions. Each signature is used independently to match against other analyses. In the image above, each of the first four colored columns at the right represents the percentage similarity of each type of signature to the analysis you opened. The fuchsia color indicates similarity (shown here) and cyan color indicates dissimilarity (not shown here). The first scoring column (“CP”) is the match for the Canonical Pathway signature, the second (“UR”) is for Upstream Regulators, the third (“CN”) is for Causal Networks and the last (“DE”) is for Downstream Effects (i.e. Diseases and Functions). The final column shown above is the average of those four signature matches. More detail on the signature scoring algorithm can be found here. Note that some of the columns normally shown by default in the Analysis Match tab have been hidden in this figure.
As shown in Figure 1, the analysis with the best overall match from the repository is an expression analysis from mouse lung exposed to heat-killed influenza virus (from GSE41684), which has strong similarity across all 4 signature types. The next step is to explore the signatures themselves across all or a subset of matching analyses, to understand in more detail which “entities” (the set of upstream regulators, canonical pathways, etc.) drove the similarity scoring. In this example, the matching analyses were further filtered to the repository folder called “MouseDisease”, which retained 75 of the analyses, and a heatmap was created by clicking the View as Heatmap button. Figure 2 shows this heatmap, where the rows are the entities from the four signatures and the columns are the 75 similar (and dissimilar) analyses. The z-score for each entity from each analysis is represented in the cells with an orange or blue color (for positive and negative z-scores, respectively).
Fig 2. The heatmap of the signatures vs. the matching analyses reveals similarities and differences. The “4 hr lung” analysis (highlighted in pink above) by definition has a significant z-score for every entity that is listed in the left column, because those entities represent the union of all 4 types of signatures derived for that analysis. The other selected analyses are shown for reference and may or may not have a significant z-score for each entity. The rows and columns were clustered using agglomerative clustering with Euclidean distance and average linkage (UPGMA linkage).
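To make the clustering method named in the caption concrete, here is a minimal pure-Python sketch of agglomerative clustering with Euclidean distance and average (UPGMA) linkage. It is an illustration only, not IPA's implementation. The entity labels and z-scores are made up, and the naive search over all cluster pairs is only suitable for small inputs.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def upgma(rows):
    """Cluster rows with Euclidean distance and average linkage.

    rows maps a label to a list of numbers (for example z-scores).
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of the labels it contains.
    """
    clusters = [(label,) for label in rows]
    merges = []

    def average_distance(a, b):
        # Average linkage: mean of all pairwise distances between members.
        total = sum(dist(rows[x], rows[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the two clusters that are closest on average.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(*pair))
        merges.append((a, b, average_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical z-scores for three entities across three analyses.
z_scores = {
    "entity_A": [2.0, 2.0, -1.0],
    "entity_B": [2.0, 2.0, -2.0],
    "entity_C": [-3.0, -2.0, 2.0],
}
merges = upgma(z_scores)
for a, b, d in merges:
    print(a, b, round(d, 2))
```

Entities with similar z-score profiles merge first, which is why closely related rows end up adjacent in a clustered heatmap.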
The heatmap is filterable to enable you to focus on the types of entities of interest to you. Figure 3 shows the heatmap filtered for upstream regulators which are classified as transcription regulators. The clustering of the rows reveals which transcription regulators have similar patterns across the analyses, whereas the clustering of the columns shows which analyses are most closely correlated to one another based on the underlying transcriptional regulator pattern.
Fig 3. Analysis Match Heatmap filtered to show only upstream regulators which are classified as transcription regulators. The heatmap offers several filters to enable you to explore the nature of the signatures. Clicking on a column header for an analysis in the repository displays its metadata at the right side of the window as shown.
The clustering of the entities (the rows) can reveal interesting similarities among the entities. For example, after removing the prior filter in order to show all the entities, Figure 4 shows that the drug bexarotene clusters closely with the “PPAR/RXR activation” canonical pathway in a larger cluster containing CR1L, ALDH1A2, SUMO1, and ABCB4. Bexarotene is an RXRA and RXRB agonist, which offers a rationale for why it correlates tightly with this pathway in the heatmap. SUMO1 is a regulator of PPAR activity. It is less clear why the other entities appear in this cluster, an observation which could provide interesting avenues of investigation.
Fig 4. Heatmap showing a cluster which contains both an upstream regulator and a canonical pathway. Tight clustering of entities may reveal correlations that may be of biological interest.
You can select and send entities (except Canonical pathways) to a My Pathway for further analysis, for example to connect nodes together or to discover drugs that target them.
Another valuable way to use the OmicSoft analysis repository is to start by finding analyses of interest with IPA’s Dataset and Analysis Search, entering keywords such as a disease name or tissue. Figure 5 below shows a search for human asthma analyses that excludes those involving albuterol. From search results like these, you can double-click to open an analysis, or select up to 20 to visualize in a full comparison analysis.
Fig 5. Discovering analyses of interest using Dataset and Analysis Search. The query “human AND asthma NOT albuterol” finds 136 analyses with those keywords in the OmicSoft repository in IPA. Double-click to open one or create a Comparison Analysis with up to 20. Metadata about the selected analysis (or analyses) is displayed on the right side of the search screen.
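The logic of a query such as “human AND asthma NOT albuterol” can be sketched as a filter over keyword annotations: every required term must be present and no excluded term may be present. The sketch below is only an illustration of that logic. The `matches` helper and the three annotated analyses are hypothetical, and it does not reproduce how IPA parses or executes searches.

```python
def matches(keywords, required, excluded=()):
    """Return True if all required and no excluded terms are annotated."""
    terms = {keyword.lower() for keyword in keywords}
    has_required = all(term.lower() in terms for term in required)
    has_excluded = any(term.lower() in terms for term in excluded)
    return has_required and not has_excluded


# Hypothetical keyword annotations for three analyses.
analyses = {
    "analysis_1": ["human", "asthma", "lung"],
    "analysis_2": ["human", "asthma", "albuterol"],
    "analysis_3": ["mouse", "asthma"],
}

# Equivalent of: human AND asthma NOT albuterol
hits = [
    name
    for name, keywords in analyses.items()
    if matches(keywords, required=["human", "asthma"], excluded=["albuterol"])
]
print(hits)
```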
The repository of datasets and analyses is stored in IPA’s Libraries folder in the project manager as shown in Figure 6. Note that these are read-only and cannot be exported out of IPA.
Fig 6. OmicSoft repository in IPA with over 6,000 datasets and corresponding analyses. The repository is read-only and cannot be exported out of IPA.
Analysis Match combines literature-powered causal analytics from IPA with a massive dataset collection provided by OmicSoft, creating a unique opportunity for you to make biological discoveries.
Other great updates to IPA
- Dendrograms in the Comparison Analysis heatmap.
- Export of chemical IDs from networks and pathways.
- Four new Canonical Pathways.
- Support for Clariom arrays from Affymetrix.
- New findings including 56,000 from the BioPlex 2.0 protein-protein interaction database.
- A new help portal for IPA.
What’s new in the IPA Spring Release (March 2017)
Changes in the phosphorylation states of proteins provide an important regulatory mechanism in mammalian cells. Now you can get more from your phosphoproteomics datasets in IPA with a new Phosphorylation Core Analysis*.
Discover upstream regulators and causal network master regulators that may be driving the changes in phosphorylation levels of the proteins in your phosphoproteomics dataset. These results provide testable hypotheses by identifying potential upstream signaling cascades from the phosphorylation patterns in your dataset.
To illustrate this new feature, we analyzed a phosphoproteomics experiment obtained from the literature, in which insulin was applied to starved mouse adipocytes that had been differentiated from 3T3-L1 cells in vitro (PMC3690479). Phosphorylated proteins were isolated from the cells by the authors during a time course of 15 seconds to 1 hour.
As shown below in Figure 1, after 15 seconds of insulin exposure a characteristic phosphorylation pattern is established in these adipocytes: IPA predicts that insulin (gene symbol Ins1 below) is activated and ranks it among the top upstream regulators.
Fig 1. Upstream Regulator Analysis. The pattern of differentially phosphorylated proteins in the dataset of insulin-treated cells was used to predict the responsible upstream molecules.
Fig 2. The Ins1 Upstream Regulator network in the 15 second time point. Insulin is a top upstream regulator predicted to be “activated” based on the pattern of phosphorylation of insulin targets in adipocytes treated with insulin for 15 seconds. Proteins with red fill color have increased phosphorylation relative to the untreated control, and proteins with green fill color have relative decreases in phosphorylation. Clicking on the badge next to each protein displays the differentially phosphorylated peptides that were uploaded in the dataset (as shown for the insulin receptor, INSR).
Figure 2 indicates there is a positive phosphorylation relationship (orange line) between Ins1 and GAB1. This is supported by a paper that showed that in differentiated 3T3-L1 cells, insulin can increase the phosphorylation of GAB1. For the relationship between Ins1 and STAT3, a different paper showed that insulin can increase the phosphorylation of Stat3 in RAW 264.7 cells (see Figure 3 below).
Fig 3. Examples of phosphorylation findings curated from the literature in the QIAGEN Knowledge Base. Both indicate that insulin can increase a target protein’s phosphorylation (indirectly through unspecified mediators).
Causal Network Analysis predicts regulatory networks to explain phosphorylation changes exhibited in a dataset. Causal Network Analysis enables the discovery of novel regulatory mechanisms by expanding upstream analysis to include regulators that do not yet have known “direct” connections to the targets in your dataset.
For example, stimulating adipocytes with insulin is predicted to activate the master regulator FLT1 (also known as the vascular endothelial growth factor receptor 1) after 15 seconds of exposure. In this causal hypothesis, FLT1 is predicted to drive the activity of nine other regulators which in turn drive changes in the phosphorylation of a larger number of dataset proteins as shown below in Figure 4.
Fig 4. Causal Network Analysis. FLT1 is predicted to activate or inhibit several intermediate regulators leading to the changes in phosphorylation in dataset proteins.
If you’re an existing customer, launch IPA from your desktop and check out the new features. If you need to install IPA, click here
What’s new in the IPA Winter Release (December 2016)
Enhanced phosphoproteomics data visualization
Changes in the phosphorylation states of proteins are an important regulatory mechanism in cells. Now you can get more from your phosphoproteomics datasets in IPA with improvements to phosphorylation data upload and visualization.
Last September the IPA Fall Release added a new data type to IPA to support the upload of protein or gene IDs along with corresponding phosphorylation increases or decreases represented as fold change (or log ratio). With this December release you can now upload the corresponding individual phospho sites for display on networks and pathways. These can be represented with any text you wish, such as the actual phosphorylated peptide (e.g. _FSSS(ph)QPEPR_ as shown in Figure 1 below), just a residue number (e.g. Y347), or any combination of text and numbers.
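IPA treats the phospho site column as free text, so no particular format is required. If your peptides happen to follow the convention of the example above, where “(ph)” comes directly after the modified residue, a small helper like the following could turn them into compact site labels before upload. This is a sketch under that assumption. The `phospho_sites` function is hypothetical, and the positions it reports are relative to the peptide, not to the full protein.

```python
import re


def phospho_sites(peptide):
    """Return (residue, position) pairs for residues followed by '(ph)'.

    Positions are 1-based and counted within the peptide itself.
    """
    sequence = peptide.strip("_")  # drop the flanking underscores
    sites = []
    position = 0
    # Each match is one residue, optionally followed by the '(ph)' marker.
    for match in re.finditer(r"([A-Z])(\(ph\))?", sequence):
        position += 1
        if match.group(2):
            sites.append((match.group(1), position))
    return sites


print(phospho_sites("_FSSS(ph)QPEPR_"))
```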
What’s new for the Winter Release:
1) Visualize multiple differentially phosphorylated sites (phospho peptides) on networks and pathways.
Fig 1. Display multiple phospho sites from an uploaded “phospho” dataset. Top image: The small badge at the top right of the node indicates how many phospho sites are in the dataset or that passed your cutoffs in an analysis (depending on whether a dataset or analysis is overlaid). In this example, two phospho peptides for Chk1 passed the analysis cutoff for Phospho Fold Change. Clicking the badge shows the differential phosphorylation as a heat map alongside the phosphorylated peptide sites (if uploaded in the dataset). Bottom image: Example of phosphorylation sites uploaded in the dataset (right column).
2) Easily identify the proteins on networks and pathways where IPA predicts that increases in phosphorylation inhibit their activity or where decreases in phosphorylation increase their activity. The activity of certain proteins is more likely to be inhibited by phosphorylation than activated by it. In the example below, the Molecule Activity Predictor, with overlaid phospho data, indicates this by using blue or orange halos to show the predicted activity.
Fig 2. MAP (Molecule Activity Predictor) now uses colored halos around nodes on networks and pathways to indicate the activity for proteins which are inhibited by phosphorylation. Phosphorylation fold change data has been overlaid on CFL1 and GSK3B. CFL1 has increased phosphorylation in this dataset and MAP indicates that its activity is inhibited with the blue halo. GSK3B has decreased phosphorylation in the dataset and MAP indicates that it is likely activated using the orange halo. The full list of proteins where phosphorylation is expected to be inhibitory is available here in the IPA help portal.
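The halo rule described in the caption can be summarized in a few lines. The sketch below is illustrative only. The set contains just the two proteins named in Fig 2 rather than IPA's full list, the function name is hypothetical, and it assumes a signed value in which positive means increased phosphorylation.

```python
# Only the two proteins named in Fig 2. IPA's full list is in its help portal.
INHIBITED_BY_PHOSPHORYLATION = {"CFL1", "GSK3B"}


def predicted_halo(protein, phospho_change):
    """Return the halo for a protein that is inhibited by phosphorylation.

    phospho_change is assumed to be signed: positive for increased
    phosphorylation, negative for decreased. Returns None when the
    protein is not in the set or there is no change.
    """
    if protein not in INHIBITED_BY_PHOSPHORYLATION or phospho_change == 0:
        return None
    if phospho_change > 0:
        return "blue (activity inhibited)"
    return "orange (activity increased)"


print(predicted_halo("CFL1", 2.5))    # increased phosphorylation
print(predicted_halo("GSK3B", -1.8))  # decreased phosphorylation
```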
Get more from your phosphoproteomics datasets in IPA. If you’re an existing customer, launch IPA from your desktop and check out the new features. If you need to install IPA, click here
What’s new in the IPA Fall Release (September 2016)
Discover significant isoforms in RNA sequencing data with enhanced IsoProfiler
RNA sequencing technologies can generate datasets with thousands of differentially spliced transcripts. IsoProfiler helps you determine which isoforms have interesting biological properties relevant to your research project.
- Results are now expanded to include gene-level disease and function annotations to enable you to focus on potentially biologically interesting (but not yet well-understood) isoforms
- Quickly narrow down to the transcripts of interest by searching on specific gene names or disease or function terms
- Save time by visualizing isoform schematics inside IsoProfiler to understand the basic structure of the isoforms of interest
- Focus on protein-coding transcripts with the new transcript type column for RefSeq datasets
Fig 1. Overview of IsoProfiler, with highlights indicating the new features. IsoProfiler can visualize one or more transcript-level RNA sequencing datasets in a single view and enables you to filter and sort to focus on isoforms that have biologically relevant attributes. The top right table shows each gene in your dataset with its associated transcripts and expression data. When a gene is selected, the bottom right table shows the specific isoform-level details for that gene. 1) A new column displaying diseases and functions known to be associated at the gene-level (as well as at the isoform level) has been added to the top table. This may help you identify the specific isoforms in your experiment that drive the known gene level associations. 2) New filters have been added to search for specific gene name or specific disease and function terms that are pertinent to your dataset(s). See Figure 2 for additional details. 3) New dynamically re-sizable schematics of the isoforms are now displayed in the lower table for the gene selected enabling you to see the overall splicing pattern of each transcript.
Fig 2. Gene-level Disease or Function filter in IsoProfiler. Simply start typing in the text box to focus the list down to relevant filters. In this example, “epith” has been typed, which instantly limits the list of filters to terms like “chemotaxis of epithelial cells”, etc. The same type of filter is now also provided for isoform-level diseases and functions.
IsoProfiler is available in IPA with Advanced Analytics.
Visualize phosphoproteomics data on networks and pathways
Enhance your multi-omics research approaches by uploading simplified phosphoproteomics datasets to IPA for overlay onto networks and pathways. In a first step to better support the understanding of phosphorylation state and the associated biology, a new “phospho” measurement type is being introduced with this release of IPA. Overlay phosphorylation and expression profiles on networks and pathways to identify key areas where phosphorylation is impacting the biological activity of the encoded proteins.
If you have performed both gene expression and phosphoproteomics profiling, you can visualize both of these data types simultaneously as bar charts on networks and pathways. Figure 3 below shows the upstream regulator MAPK1 which IPA predicted to be activated by alpha-toxin (hemolysin) treatment of S9 cells. This prediction was based on a Core Analysis of the gene expression data after exposure to the toxin. The expression data shows that MAPK1 is not itself differentially expressed, but overlaying the accompanying phosphoproteomics dataset on the MAPK1 network provides a possible mechanism for its activation—MAPK1’s phosphorylation level is increased which is likely to activate it and lead to the observed expression changes downstream. In Figure 3, you can see in contrast that JUN is both upregulated and exhibits higher protein phosphorylation after the treatment.
Fig 3. Upstream Regulator Network for MAPK1 with expression and phosphorylation data overlaid. MAPK1 is differentially phosphorylated, which may explain its predicted activation as a regulator of the expression of the genes connected to it in the network. In contrast, JUN is both phosphorylated and differentially expressed. The microarray and phosphoproteomics data used in this figure was obtained from http://dx.doi.org/10.1371/journal.pone.012208
What’s new in the IPA Summer Release (June 2016)
Discover significant isoforms in your RNA sequencing data with the enhanced IsoProfiler
RNA sequencing technologies can generate datasets with thousands of differentially spliced transcripts. IsoProfiler helps you determine which isoforms have interesting biological properties relevant to your research project.
Identify isoforms with significant pattern(s) of expression, such as:
- Genes where isoforms are both upregulated and downregulated in the same dataset, which may have important functional consequences
- Isoform switching: when the most highly expressed (highest RPKM) isoform for a gene differs between the experiment and the control samples
- Multiple protein-coding isoforms expressed for the same gene
Sort the most significant isoforms by:
- The range of fold changes within each gene
- Highest differential expression
- Most or fewest transcripts per gene
Filter on important attributes of the isoforms:
- Isoforms that exceed thresholds that you set such as fold change, p-value, FDR, or RPKM
- Isoforms associated with known diseases or functions
- Isoforms that encode proteins (as opposed to those that have retained introns or are pseudogenes, for example)
- Isoforms that encode a principal isoform as annotated by APPRIS (http://bioinfo.cnio.es)
Visualize isoform-level expression in one dataset or across multiple datasets:
- Visualize isoforms with moderate fold changes that are highly abundant as compared to isoforms with large fold changes that are expressed at lower levels
- See which isoforms are similarly differentially expressed across multiple datasets
- Overlay transcript-level expression on a Network, Pathway or Isoform View
Fig 1. Overview of IsoProfiler. Visualize one or more transcript-level RNA sequencing datasets; filter and sort to focus on isoforms that have biologically relevant attributes. The top table shows each gene and its associated transcripts, while the bottom table shows isoform-level details for one gene at a time (based on the row you select in the top table). Click on the plus (+) sign in the left filter panel to display filter options that can be added. In the example shown above, the dataset is filtered for isoforms with fold change less than -2 or greater than +2, and only shows genes where isoforms are both up- and downregulated in the dataset. Transcripts are represented as circles in the Expression Patterns column in the top table, with green circles indicating downregulation and pink or red circles corresponding to upregulated transcripts. The size of the circles represents the abundance of expression (for example RPKM) if you have included at least one such column in your dataset: larger circles indicate more abundant transcripts.
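Two of the expression patterns described above are easy to state as code: genes with isoforms both above +2 and below -2 fold change, and isoform switching, where the highest-RPKM isoform differs between control and experiment. The sketch below illustrates those definitions only. The field names and example values are hypothetical and do not reflect IPA's dataset format or implementation.

```python
from collections import defaultdict


def genes_with_mixed_regulation(isoforms, cutoff=2.0):
    """Genes with isoforms both above +cutoff and below -cutoff."""
    up, down = set(), set()
    for iso in isoforms:
        if iso["fold_change"] > cutoff:
            up.add(iso["gene"])
        elif iso["fold_change"] < -cutoff:
            down.add(iso["gene"])
    return up & down


def genes_with_isoform_switch(isoforms):
    """Genes whose top-RPKM isoform differs between control and experiment."""
    by_gene = defaultdict(list)
    for iso in isoforms:
        by_gene[iso["gene"]].append(iso)
    switched = set()
    for gene, group in by_gene.items():
        top_control = max(group, key=lambda i: i["rpkm_control"])
        top_experiment = max(group, key=lambda i: i["rpkm_experiment"])
        if top_control["transcript"] != top_experiment["transcript"]:
            switched.add(gene)
    return switched


# Made-up isoform records (signed fold change, RPKM per condition).
isoforms = [
    {"gene": "GENE_A", "transcript": "A-1", "fold_change": 3.5,
     "rpkm_control": 10.0, "rpkm_experiment": 35.0},
    {"gene": "GENE_A", "transcript": "A-2", "fold_change": -4.0,
     "rpkm_control": 40.0, "rpkm_experiment": 10.0},
    {"gene": "GENE_B", "transcript": "B-1", "fold_change": 2.5,
     "rpkm_control": 8.0, "rpkm_experiment": 20.0},
    {"gene": "GENE_B", "transcript": "B-2", "fold_change": 1.2,
     "rpkm_control": 5.0, "rpkm_experiment": 6.0},
]
print(genes_with_mixed_regulation(isoforms))
print(genes_with_isoform_switch(isoforms))
```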
Fig 2. Compare up to 20 transcript-level datasets in IsoProfiler. In this example, human endometrioid endometrial carcinoma (EEC) and hepatocellular carcinoma (HCC) RNA-seq datasets are compared. The results are shown after using IsoProfiler to set expression value cutoffs, filter for protein-coding isoforms, and keep only those genes with isoforms in the dataset that have known disease and function associations.
Drill down into the “IsoProfiler Findings” view to explore the details of the isoforms that have disease or biological function findings captured from the literature. To do this, select rows (or all rows) in the top table and click the IsoProfiler Findings button at the top of the table. This opens a special window as shown in Figure 3. Only isoforms with disease or function associations appear in this window. The table enables filtering on findings-level details using the funnels, or filters, above each column.
Fig 3. Explore the details of isoform-level disease and function associations. Filter and explore the findings that connect isoforms to disease and functions.
IsoProfiler is part of Advanced Analytics.
What’s new in the IPA Spring 2016 release
Quickly compare results across ‘omics datasets on networks and pathways
Identify significant trends in genes involved in a pathway or network across conditions such as time or dose and elucidate possible mechanisms driving gene expression results with both variant gain or loss of function and expression results. Visualize multiple ‘omics datasets simultaneously on IPA networks and pathways.
- Overlay multiple gene expression datasets/analyses on a canonical pathway (or on any collection of genes) simultaneously to see how genes are regulated across various conditions. Visualize multiple measurements at once—for example both Fold Change and the Intensity of the expression (e.g. RPKM in the case of RNA-seq data) as shown in Figure 1.
Fig 1. Three RNA-seq time points taken during in vitro mouse cardiomyocyte development overlaid on the Integrin Signaling Pathway (zoomed in).
As the cells differentiate from embryonic stem cells into beating cardiomyocytes in vitro, a number of genes on this pathway are progressively upregulated. Several genes in the myosin subunit regulatory light chain family are upregulated over the time course. The new bar charts can show multiple measurements and datasets at one time to give you more insight into the details of the differential expression. In this example both the RNA-seq fold change and the intensity (RPKM) across the three analyses are shown. From this visualization, one can deduce that Myl7 becomes much more highly expressed than Myl2 (RPKM ~3800 vs ~115), even though Myl7 has a lower fold change than Myl2 (~955 vs. ~19,149). The fold changes alone don’t reveal this level of detail across the time points.
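The point about fold change versus abundance can be checked with the approximate values quoted above. The short sketch below ranks the two genes by each measurement. The dictionary layout is an assumption made for illustration, and the numbers are the rounded values from the text.

```python
# Approximate values quoted in the text for the two genes.
genes = {
    "Myl7": {"fold_change": 955, "rpkm": 3800},
    "Myl2": {"fold_change": 19149, "rpkm": 115},
}

# Rank from highest to lowest by each measurement.
by_fold_change = sorted(genes, key=lambda g: genes[g]["fold_change"], reverse=True)
by_rpkm = sorted(genes, key=lambda g: genes[g]["rpkm"], reverse=True)

print("Ranked by fold change:", by_fold_change)
print("Ranked by RPKM:", by_rpkm)
```

The two rankings disagree, which is why showing both measurements side by side is more informative than fold change alone.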
IPA also presents the multi-dataset / multi-measurement results in a table view that can be exported. Figure 2 shows an example of a portion of that table.
Fig 2. Clearly identify trends across genes, conditions, and datasets with the exportable table view.
The same genes shown in Figure 1 above are shown here in the new table view within the Overlay Datasets, Analyses & Lists tool, though in this table a line is drawn to connect the bars when possible to help visualize patterns.
Elucidate possible mechanisms driving gene expression results by simultaneously overlaying both gene expression analysis and variant loss/gain datasets on a pathway or network. In this way you can see which genes are differentially expressed and harbor potentially deleterious variants.
Fig 3. Uncover possible mechanisms driving gene expression results. RNA-seq gene expression data from three hepatocellular carcinoma (HCC) patients was used to predict that the NONO protein is inhibited. Expression from the three patients was processed in Biomedical Genomics Workbench (BxWB) and then analyzed in IPA, which led to the prediction of NONO inhibition using Causal Network Analysis. Variants were also called on the transcript sequences from these patients using BxWB and analyzed using Ingenuity Variant Analysis. All three patients were found to have potentially deleterious frameshift and missense variants in the NONO gene. Data from both BxWB and Variant Analysis were exported directly to IPA. The three green bars in Figure 3 correspond to predicted loss of function variants for each of the patients, and the red bar indicates that the expression was upregulated in the patients, perhaps as a compensatory mechanism for loss of function. NONO has been found to be mutated in a number of cancer types.
What’s new in the IPA Fall Release (September 2015)
Find the biology hidden in your RNA-seq dataset with IsoProfiler
Quickly see which diseases, functions, and phenotypes are associated with differentially expressed isoforms in your RNA-seq experiment using IPA’s new IsoProfiler (BETA). Get early access to IsoProfiler as part of Advanced Analytics.
Simply filter to determine if certain isoforms (splice variants and their products) are known to drive a disease or process. For example, Figure 1 shows isoforms driving metastatic processes in a human breast cancer RNA-seq dataset.
Fig 1. IsoProfiler results. The table displays all the isoforms that have a curated relationship to a biological function, phenotype, or disease. In this example, the table has been filtered to display the isoforms known to be involved in metastasis. This isoform of ADAM12 is upregulated in the dataset, providing an avenue of experimental inquiry – perhaps this short form is responsible for the aggressiveness of these breast cancer cells.
Fig 2. ADAM12 isoform view shows that a shorter isoform, ADAM12S, is upregulated in the breast cancer cells, with a fold change of 66.3.
Understand the biological impact of prioritized variants from DNA or RNA-sequencing experiments
Import genetic gain/loss information for a set of genes and predict the variant effect on diseases, functions, phenotypes and canonical pathways. IPA now supports a new data type for gain or loss of function variants that result from genome or transcriptome sequencing data.
Overlay Gain or Loss of function variant values onto genes on networks and pathways to display their effects on genes and use MAP (Molecule Activity Predictor) to compute the impact on neighboring connected genes.
Discover mechanisms of upstream activation or inhibition by combining variant gain or loss of function results with expression data
Fig 3. Gain or Loss of function variants (green-colored nodes indicating loss of function variant) in genes on the ERK5 Signaling Pathway could lead to increased cell survival and decreased gene expression in this endometrioid endometrial carcinoma analysis.
Combining Gain or Loss of Function variant data with expression data unlocks the ability to investigate whether upstream regulator predictions based on expression data may in fact derive from variants that activate or inactivate the regulator itself.
Using Upstream Regulator Analysis, if there are cases where an upstream molecule has been predicted to be activated or inhibited, you can quickly discover if the gene for that regulator has a corresponding gain or loss of function variant.
Fig 4. Upstream regulator analysis of an endometrioid endometrial cancer patient (tumor vs. normal adjacent tissue). The result shows that the NFKBIA protein is predicted to be an inhibited upstream regulator AND has a likely loss of function (see red box above), which corresponds with and may explain the predicted loss of its activity as an upstream regulator.
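The cross-check described above amounts to a simple lookup: an “inhibited” prediction agrees with a loss of function variant, and an “activated” prediction agrees with a gain of function variant. The sketch below illustrates that idea outside of IPA. The function name and input layout are hypothetical, and REG_X is a made-up placeholder gene. Only the NFKBIA entry reflects the example in Fig 4.

```python
# Variant effect expected for each predicted activation state.
EXPECTED_VARIANT = {"activated": "gain", "inhibited": "loss"}


def concordant_regulators(predictions, variants):
    """Regulators whose predicted state agrees with their variant effect.

    predictions maps gene -> "activated" or "inhibited".
    variants maps gene -> "gain" or "loss" (genes without a variant are absent).
    """
    return sorted(
        gene
        for gene, state in predictions.items()
        if variants.get(gene) == EXPECTED_VARIANT[state]
    )


predictions = {"NFKBIA": "inhibited", "REG_X": "activated"}
variants = {"NFKBIA": "loss", "REG_X": "loss"}
print(concordant_regulators(predictions, variants))
```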