Analysis Match* automatically finds other IPA Core Analyses whose biological results are similar (or opposite) to yours. These matches can help confirm your interpretation of the results or reveal unexpected shared biological mechanisms across experimental situations. IPA matches your analysis against other analyses you have created (in your Project Manager) as well as against thousands of human and mouse expression analyses curated from public sources. This “analysis-to-analysis” matching is based on shared patterns of Canonical Pathways, Upstream Regulators, Causal Networks, and Diseases and Functions.
In this release, improvements to Analysis Match make it easier to control which Lands are used for matching, and the detailed results in the heat map are easier to interpret and to follow up on. You can also now manually add experiment metadata to your own datasets, so that they are labeled more clearly in the Analysis Match table and can be found with Project Search.
Summary of Analysis Match Improvements
Find additional matches with newly added Land comparison datasets from OmicSoft. IPA has been updated with approximately 700 additional OmicSoft analyses in this release, and OncoLand now includes a new Land called MetastaticCancer.
Control which Lands are used in matching by simply selecting them from a drop-down menu in the Analysis Match tab (Figure 1).
In the Analysis Match heat map:
Focus on the most important z-scores in the heat map by setting a threshold that visually indicates which heat map cells have insignificant z-scores or p-values (Figure 2).
Follow up on and understand how dataset molecules from your analysis or from OmicSoft analyses connect to the entity in the signature (e.g., an upstream regulator, disease, function, or canonical pathway) by opening and visualizing the network or pathway underlying each heat map cell (Figure 3).
Explore clusters of signature entities or analyses using the heat map dendrograms. Select groups of signature entities, such as upstream regulators or diseases and functions, by clicking their dendrograms, and then add the entities to a pathway or list for further analysis (Figure 4). Alternatively, use the column dendrogram to select up to 20 analyses for a full Comparison Analysis (Figure 5).
Details of Analysis Match Improvements
Fig 1. Filtering the Analysis Match results by source (Land). Use the enhanced Project menu in the Analysis Match tab to choose which Lands you would like to use for matching. Click one or more repository names to select them. You can also include your own projects by expanding the My Projects tree and clicking your project name(s). Alternatively, use the radio button to switch to a free-text search by project name (i.e., Land name). MetastaticCancer is a new Land in this release.
Fig 2. New option in the Analysis Match heat map to indicate signature entities that are NOT significant in the other analyses. The Analysis Match heat map shows all the signature entities from the analysis you opened (the analysis of interest), using color to represent each entity’s z-score in that analysis and in the other analyses you selected when you created the heat map. However, even when the square for a particular entity in another analysis is colored orange or blue, its underlying z-score may be too small to be considered significant. You can now mark such squares as insignificant, as shown above. In this example, a threshold of “2” was entered in the “Insignificance Threshold” field, so that heat map squares with a value smaller than that threshold (i.e., <2) are labeled with a dot, letting you visually ignore the insignificant z-scores.
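The insignificance marking in Fig 2 reduces to a per-cell comparison against the threshold. The Python sketch below is a hypothetical illustration, not IPA code: the function name and the cell values are made up, and it assumes the comparison uses the absolute z-score (so that a strongly negative, blue cell such as -2.6 still counts as significant) and that cells without a z-score are left unmarked.

```python
from typing import Optional


def insignificant_cells(
    z_scores: dict[tuple[str, str], Optional[float]],
    threshold: float = 2.0,
) -> set[tuple[str, str]]:
    """Return the (entity, analysis) cells that would be marked with a dot.

    Assumption: a cell is insignificant when |z| < threshold.
    Cells with no z-score (None) are not marked.
    """
    return {
        cell
        for cell, z in z_scores.items()
        if z is not None and abs(z) < threshold
    }


# Hypothetical heat map cells keyed by (signature entity, analysis name).
cells = {
    ("ACKR2", "My analysis"): 3.1,
    ("ACKR2", "Analysis B"): 1.4,
    ("TNF", "Analysis B"): -2.6,
    ("TNF", "Analysis C"): -0.9,
}
print(sorted(insignificant_cells(cells, threshold=2.0)))
# [('ACKR2', 'Analysis B'), ('TNF', 'Analysis C')]
```

Using the absolute value keeps the rule symmetric for activation (orange) and inhibition (blue) z-scores; a cell exactly at the threshold is treated as significant here.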
Fig 3. Explore a signature entity’s underlying network by clicking a heat map square. Clicking a square in the Analysis Match heat map now displays its underlying network or pathway. As shown above in part A, clicking the ACKR2 square in the first column displays its network in the right panel. The molecules from the dataset are shown in the Molecules tab (part B above), and clicking the name of an analysis in the heat map header displays that analysis’s metadata, if any, in the Metadata tab (part C above). See Figure 6 below for how to enter metadata for your own datasets.
Fig 4. Conveniently select a set of signature entities in the Analysis Match heat map for further exploration via the row or column dendrograms. To explore a set of related signature entities, select them as a group by clicking their dendrogram. In this example, the topmost cluster of entities (rows) was clicked to select a group of related signature entities. The selected group can be sent to a new pathway or a new list using the buttons along the top of the heat map. Before doing so, you can modify the selection by command-clicking (Mac) or control-clicking (Windows).
Fig 5. Select a set of analyses for further exploration in a full Comparison Analysis. Select a set of related analyses by clicking their cluster in the column dendrogram. As shown above, a cluster of analyses (columns) was selected by clicking the portion of the dendrogram above them. The analyses can then be viewed in full by clicking the View Comparison button. Because a Comparison Analysis can include at most 20 analyses, you may first need to reduce the selection to 20 or fewer by command-clicking (Mac) or control-clicking (Windows).
Annotate and tag your datasets with IPA’s new metadata editor
You can now annotate your uploaded datasets with information that helps you quickly find them (or analyses created from them) using Project Search, or that reminds you of relevant details when interpreting their analysis results. This is especially useful with Analysis Match, where dataset metadata can be displayed in columns in the Analysis Match tab.
When you upload a dataset, you can enter relevant metadata about it in the IPA user interface. For example, you could use existing OmicSoft fields such as “case.disease” or “case.tissue” and enter values such as “asthma” or “lung”, or you could create your own custom fields. For instance, you could create a field called “eNotebook record” containing a clickable hyperlink to an internal online record of the experiment that produced the dataset, or a field called “Collaborators” listing the colleagues involved with the dataset. The metadata you add to a dataset is automatically propagated to any Core Analysis created from it. Keep in mind that this metadata is for your reference only; IPA does not use it to influence analysis results. Figure 6 shows how to enter metadata for a dataset.
Figure 6. Entering metadata for a dataset. Existing OmicSoft keys can be used, or you can create a custom field as shown above. In this instance, a new field called “Hyperlink to paper” was created and a hyperlink was pasted in (control-V). Other metadata, such as tissue type and disease state, was added as well. The metadata will propagate to any Core Analysis created from this dataset.
Figure 7. Searching for datasets and analyses using the metadata you entered for the dataset. In this example, an analysis was found using the keyword “GSE11352”, which had been entered as metadata in the OmicSoft field “projectname” for the dataset. The results also include OncoGEO analyses with the same GSE number.
Metadata can be added or edited either before or after saving the dataset file. You can also insert metadata at the top of the dataset text or Excel file itself before uploading it, by following the instructions here. This is especially useful when you want to enter a large amount of metadata, or when you have many similarly derived datasets that share most of the same metadata. In this release, you can edit such uploaded metadata in the Metadata tab during upload, or after saving and reopening the dataset.
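Conceptually, metadata propagation and keyword search behave like the toy model below. This Python sketch is hypothetical: the classes, the `create_core_analysis` and `project_search` helpers, and the case-insensitive substring matching are assumptions made for illustration, not IPA's implementation. It also assumes metadata is copied into a Core Analysis at the moment the analysis is created.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    # Free-form key/value metadata: OmicSoft-style keys ("case.tissue")
    # or custom keys ("Collaborators").
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class CoreAnalysis:
    name: str
    metadata: dict[str, str]


def create_core_analysis(dataset: Dataset, name: str) -> CoreAnalysis:
    # Propagation (assumed): the new analysis starts with a copy of the
    # dataset's metadata.
    return CoreAnalysis(name=name, metadata=dict(dataset.metadata))


def project_search(items, keyword: str) -> list[str]:
    """Return names of items whose name or any metadata value contains keyword."""
    kw = keyword.lower()
    return [
        item.name
        for item in items
        if kw in item.name.lower()
        or any(kw in value.lower() for value in item.metadata.values())
    ]


# Hypothetical datasets and analyses.
ds1 = Dataset("dataset_1", {"projectname": "GSE11352"})
ds2 = Dataset("dataset_2", {"case.disease": "asthma", "case.tissue": "lung"})
items = [
    ds1,
    ds2,
    create_core_analysis(ds1, "analysis_1"),
    create_core_analysis(ds2, "analysis_2"),
]
print(project_search(items, "GSE11352"))  # ['dataset_1', 'analysis_1']
print(project_search(items, "lung"))      # ['dataset_2', 'analysis_2']
```

Because the analyses carry their source dataset's metadata, a single keyword finds both the dataset and the analyses derived from it, which mirrors the behavior described for Figure 7.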
*Analysis Match requires additional licensing. Please contact us for info.
Other Updates to IPA
New criteria to select or highlight nodes on networks and pathways
IPA now gives you more flexibility when building and modifying networks and pathways. You can globally select nodes on networks and pathways using additional criteria and then take further actions on them. Specifically, you can highlight or select nodes by their overlay values and by their connectivity. For example, if you have overlaid expression fold change values, you can select only the up-regulated genes and move them all at once to a different place on the network canvas, and then do the same for the down-regulated genes. You can also select all unconnected nodes and delete them, or highlight the most highly connected nodes in the network.
Figure 8. Highlighting or selecting nodes via their overlay. The Highlight menu in the Overlay tools has been renamed “Highlight or Select” because you can now either highlight or select the nodes that meet your criteria. Highlighting colors the node borders purple (the “Outline” option in the menu at the bottom right of the window) or fills the nodes with dark blue (the “Fill” option in the same menu). Selecting colors their borders blue (the “Select” option in the same menu) and puts them in a state where you can act on them further, for example by deleting them or moving them around the pathway canvas as a group. In the example above, nodes with no values in the overlaid dataset (i.e., white nodes) are selected as a group.
Figure 9. Highlighting or selecting nodes via their connectivity. The new Node Connectivity filter selects nodes according to how many other nodes on the network or pathway they are connected to. As shown above, nodes connected to more than 6 other nodes were selected, which picked out the 3 most highly connected nodes (“hubs”).
Figure 10. Trimming nodes via their connectivity. The Node Connectivity filter is also available in the Trim and Keep tools in the Build menu. In this example, the Node Connectivity filter is used in the Trim tool to remove all unconnected nodes.
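The overlay, connectivity, and trim criteria in Figures 8 to 10 can be illustrated on a toy network. This Python sketch is hypothetical and not how IPA implements the filters: the function names are made up, and it assumes edges are undirected, that a node's connectivity is the number of distinct other nodes it is linked to, and that a node with no overlaid value is simply absent from the overlay.

```python
def neighbor_sets(nodes, edges):
    """Map each node to the set of distinct other nodes it is linked to."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a != b:  # a self-loop does not connect a node to another node
            adj[a].add(b)
            adj[b].add(a)
    return adj


def select_by_connectivity(nodes, edges, more_than):
    """Select nodes connected to more than `more_than` other nodes."""
    adj = neighbor_sets(nodes, edges)
    return {n for n in nodes if len(adj[n]) > more_than}


def select_by_overlay(nodes, overlay, predicate):
    """Select nodes whose overlaid value (None if absent) satisfies predicate."""
    return {n for n in nodes if predicate(overlay.get(n))}


def trim_unconnected(nodes, edges):
    """Keep only nodes that have at least one connection (a Trim step)."""
    adj = neighbor_sets(nodes, edges)
    return [n for n in nodes if adj[n]]


# Toy network: hub "A" links to seven nodes, "B"-"C" share an edge, "I" is isolated.
nodes = list("ABCDEFGHI")
edges = [("A", x) for x in "BCDEFGH"] + [("B", "C")]
overlay = {"A": 2.5, "B": -1.8, "D": 1.2}  # fold changes; missing = no value

print(select_by_connectivity(nodes, edges, more_than=6))  # {'A'}
print(sorted(select_by_overlay(nodes, overlay, lambda v: v is not None and v > 0)))  # ['A', 'D']
print(trim_unconnected(nodes, edges))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
```

Passing a predicate such as `lambda v: v is None` to `select_by_overlay` picks out the nodes with no overlaid value, which corresponds to the white-node selection in Figure 8.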
Exert more control over your Core Analysis with separate up and down cutoffs
You must now enter separate up and down cutoffs (rather than a single absolute value) for directional measurement types such as fold change or log ratio. Compared with a single absolute cutoff, this gives you more control over which molecules from your dataset IPA analyzes. Figure 11 below shows an example.
Figure 11. Set separate up and down cutoffs for Core Analysis. When you set up a Core Analysis using a cutoff for a directional measurement (one with both positive and negative values, such as fold change or log ratio), you must now enter separate negative and positive cutoff values. In the example above, cutoffs of -1.5 (down) and 3 (up) are used for Expr Fold Change. This means that genes with expression fold changes greater than -1.5 and less than 3 are not used in the analysis. Notice that the counts of “down genes” and “up genes” that pass the cutoffs are displayed next to the recalculate button, indicated in the image above by red arrows.
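The cutoff rule in Figure 11 can be written as two comparisons. In this hypothetical Python sketch (the function name and fold change values are made up), values exactly at a cutoff count as passing, which follows from the caption's statement that only values greater than -1.5 and less than 3 are excluded.

```python
def apply_directional_cutoffs(values, down_cutoff, up_cutoff):
    """Split molecules into down and up sets using separate cutoffs.

    Kept as "down" if value <= down_cutoff, as "up" if value >= up_cutoff;
    values strictly between the two cutoffs are excluded from the analysis.
    """
    if down_cutoff >= up_cutoff:
        raise ValueError("down_cutoff must be below up_cutoff")
    down = sorted(m for m, v in values.items() if v <= down_cutoff)
    up = sorted(m for m, v in values.items() if v >= up_cutoff)
    return down, up


# Hypothetical Expr Fold Change values for six molecules.
fold_changes = {
    "gene_a": -4.0, "gene_b": -1.5, "gene_c": -1.2,
    "gene_d": 2.0, "gene_e": 3.0, "gene_f": 6.5,
}
down, up = apply_directional_cutoffs(fold_changes, down_cutoff=-1.5, up_cutoff=3.0)
print(len(down), "down genes;", len(up), "up genes")  # 2 down genes; 2 up genes
```

For comparison, a single absolute cutoff of 1.5 (keeping |value| >= 1.5) would also have kept gene_d (2.0) as an up gene; the separate up cutoff of 3 excludes it, which is the kind of asymmetric control the new setting allows.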