A reference data management tool is now available via the Workbench interface. It can be used for finding, downloading and managing individual reference data elements as well as sets of reference data. These sets can then be used when configuring the reference data for workflows.
Reference data can be downloaded from public repositories such as Ensembl, and sets of reference data relevant to biomedical and panel data analysis can be downloaded from a QIAGEN repository.
If the Workbench is connected to a CLC Genomics Server, and that server has been configured to allow storage of reference data, then data can be downloaded using the CLC Genomics Server or grid nodes.
In some data selection wizard steps, two tabs are now available: the original Navigation Area view, where data of appropriate type to use as input are presented, and a new Reference Data tab. Under the Reference Data tab, reference data elements obtained using the QIAGEN Sets or Custom Sets tabs of the Reference Data Manager can be selected as input.
A new concept called a Workflow Role has been introduced, allowing workflow input elements to be linked to data elements of a Reference Data Set. Workflow input elements can be configured with a Workflow Role within a workflow design. Data in QIAGEN Sets have Workflow Roles pre-assigned. Workflow Roles can be assigned to other data elements using functionality found under the Custom Sets tab of the Reference Data Manager.
Bisulfite Sequencing Analysis
Three tools for analyzing cytosine methylation data are now available from a folder under the Epigenomics Analysis folder of the Toolbox: Map Bisulfite Reads to Reference, Call Methylation Levels, and Create RRBS-fragment Track. These tools reveal methylated cytosines genome wide and at single base level resolution, support statistical comparison between samples accommodating different experimental designs, and support reduced representation sequencing. These tools were formerly available via the Bisulfite Sequencing plugin, but are now integrated into the Workbench.
The Map Bisulfite Reads to Reference tool offers the option to enable global alignments to produce read mappings with no unaligned ends, which was not formerly possible.
The default "cost of insertions and deletions" in the Map Bisulfite Reads to Reference tool is now "affine". This improves results on internal benchmarks because it breaks a symmetry in the default "linear" scoring for reads ending in homopolymers (which are abundant in bisulfite mapping due to in-silico conversion of the reads and references to a 3 letter alphabet). This symmetry meant that either a mismatch or an insertion could be introduced at the ends of some of these reads without changing the mapping score. In practice the mismatch is more plausible, and this is favored by the affine penalties.
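The tie described above can be illustrated with a minimal sketch that scores two candidate alignments of a 10-base read: one ending in a mismatch and one ending in a single-base insertion. The scoring values are hypothetical and were chosen only so that the tie is visible under linear costs; they are not the defaults of the Map Bisulfite Reads to Reference tool, and the function names are for illustration only.

```python
from itertools import groupby

# Hypothetical scoring values, chosen for illustration only.
MATCH_SCORE = 1
MISMATCH_COST = 2


def linear_gap_cost(length):
    """Linear model: every gap base costs the same (here 2 per base)."""
    return 2 * length


def affine_gap_cost(length):
    """Affine model: a fixed cost to open a gap (2) plus a cost per base (1)."""
    return 2 + 1 * length


def alignment_score(ops, gap_cost):
    """Score an alignment written as one character per aligned position.

    '=' is a match, 'X' a mismatch, 'I' an insertion and 'D' a deletion.
    gap_cost maps the length of a run of gap positions to its total cost.
    """
    total = 0
    for op, run in groupby(ops):
        length = len(list(run))
        if op == "=":
            total += MATCH_SCORE * length
        elif op == "X":
            total -= MISMATCH_COST * length
        elif op in ("I", "D"):
            total -= gap_cost(length)
        else:
            raise ValueError(f"unknown alignment operation: {op!r}")
    return total


# Two ways to explain the last base of the same 10-base read.
ends_in_mismatch = "=========X"
ends_in_insertion = "=========I"

print(alignment_score(ends_in_mismatch, linear_gap_cost))   # 7
print(alignment_score(ends_in_insertion, linear_gap_cost))  # 7 (tie)
print(alignment_score(ends_in_insertion, affine_gap_cost))  # 6 (mismatch now wins)
```

With these assumed values the two explanations are indistinguishable under linear costs, while the gap-open cost of the affine model makes the mismatch the better-scoring choice.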
Other new tools
Import Primer Pairs for importing primer pair locations from a generic text format file or from a QIAGEN gene panel primer file. This tool was formerly only available in the Biomedical Genomics Workbench.
Copy Number Variation Detection (CNVs) for detecting copy number variations (CNVs) from targeted resequencing experiments. Using read mappings and target regions as input, it produces amplification and deletion annotations. It is available under Toolbox | Resequencing. This tool was formerly only available in the Biomedical Genomics Workbench.
Remove Information from Variants for removing annotations on variants. It can be found under the folder Toolbox | Resequencing Analysis | Variant Annotation. This tool was formerly only available in the Biomedical Genomics Workbench.
Differential Expression in Two Groups to be used instead of the more general Differential Expression for RNA-seq tool for testing differential expression between a single treatment group and a control group. Both these tools take the same input, but Differential Expression in Two Groups does not require a metadata table to describe the experimental design.
The Batch Rename tool, available under the Utility folder of the Toolbox, allows sets of data elements, or members of a data element (e.g. sequences in an alignment or reads within a read mapping) to be renamed. Changes can be simple, like adding text to the start or end of names, or more complex changes, using regular expressions or custom options. This tool was formerly delivered in the Batch Rename plugin, but is now integrated into the Workbench.
An item called CLC Server Connection has been introduced under the File menu. This launches a dialog for connecting from the Workbench to a CLC Server. This functionality was formerly delivered in the CLC Workbench Client Plugin, but is now integrated into the Workbench.
The new Welcome Center is presented when the Workbench is first started up. It provides an overview of CLC Genomics Workbench functionality and news, as well as links to additional information, data sets, and tutorials, making it easy to start working with the Workbench.
The Extract Reads Based on Overlap tool has been renamed to Extract Reads. In Extract Reads, the "Overlap tracks" parameter is now optional, so all reads in a mapping can be easily extracted if desired. The Extract Reads tool can also generate either reads tracks or sequence lists as output.
The folders and locations of tools provided under the Workbench Toolbox have been updated, better reflecting the purposes of the tools. To easily find specific tools and their new locations, please run the Launch tool in the Workbench.
Tools formerly found under Toolbox | NGS Core Tools are now distributed in other folders to better reflect their purpose. Some also have a new name.
Map Reads to References, Local Realignment, Merge Read Mappings, Remove Duplicate Mapped Reads, and Extract Consensus Sequence are in the Resequencing Analysis folder.
Trim Reads and Demultiplex Reads are in the Prepare Sequencing Data folder.
Sample Reads is in the Utility folder.
Merge Overlapping Pairs is in the Legacy folder.
QC for Read Mapping (previously Create Detailed Mapping Report) is in the Resequencing Analysis folder.
QC for Sequencing Reads (previously Create Sequencing QC Report) is in the Prepare Sequencing Data folder.
Variant detection tools are now all found under the folder Toolbox | Resequencing Analysis | Variant Detection.
The Annotate and Filter Variants folder has been replaced by two folders, with some tools from other folders being included here also:
Variant Annotation, which contains Annotate from Known Variants, Remove Information from Variants, Annotate with Conservation Scores, Annotate with Exon Numbers, Annotate with Flanking Sequences
Variant Filtering, which contains Filter Variants on Custom Criteria, Filter against Known Variants, Remove Marginal Variants, Remove Reference Variants, Remove Variants Present in Control Reads (formerly called Filter against Control Reads)
A new folder called Quality Control has been introduced under Resequencing Analysis. It contains QC for Targeted Sequencing, QC for Read Mapping, Whole Genome Coverage Analysis.
The Compare Variants folder has been renamed Variant Comparisons, and contains Identify Shared Variants (formerly called Compare Variants within Group), Identify Enriched Variants in Case vs Control Groups (formerly called Fisher Exact Test), Trio Analysis.
The folder under Molecular Biology Tools called Sequencing Data Analysis is now called Sanger Sequencing Analysis.
A folder called Utility Tools has been introduced, which contains the tools Batch Rename, Extract Annotations, Sample Reads and Extract Reads (formerly called Extract Reads Based on Overlap).
A folder called Prepare Sequencing Data has been introduced, which contains QC for Sequencing Reads (formerly called Create Sequencing QC Report), Trim Reads and Demultiplex Reads.
The Toolbox | Workflows folder has been renamed Installed Workflows.
Workflows delivered by some QIAGEN plugins are placed under Toolbox | Ready-To-Use Workflows. This folder appears only when at least one Ready-To-Use-Workflow is installed.
Using the Launch tool to find a tool will work with both the new and previous name.
The side panel for a Reads Track now has a legend showing coloring information for the different read types. The legend also allows for customization of read colors.
A new location field makes it easier to navigate tracks. Track locations can be specified using range, positions, chromosome names ("MT:", "5:"), and gene/transcript names ("BRCA2", "DHFR-001").
Reads tracks now have a new coverage graph located above the reads, instead of the overflow graph that was previously placed below the reads.
Reads tracks now have a vertical scrollbar to make it easier to navigate through high-coverage regions.
When hovering the cursor over selected track types, a set of action buttons appears under the track name. These can be used to open the table view or to jump to the next or previous element.
For annotation and variant tracks, the table view is now synchronized, so that making a selection on the track view will select the corresponding rows in the table.
Variant tracks now include Forward coverage and Reverse coverage annotations.
For large variant tracks, it is now possible to limit the corresponding table view by making a selection on the track.
It is now possible to Copy, BLAST or Open in a New View a selected portion of a read from a read track.
A new option allows users to extract a selected sequence from a track list: Right click a selection made on a sequence track and select "Extract Sequence".
Overlapping variants are now always shown in the same order when a variant track is re-opened: reference variants first, followed by the other variants in lexicographic order of their alteration string (e.g. T > G). SNVs are therefore ordered (top to bottom) by A, C, G, T.
Track lists now display additional information about an annotation when hovering on it with the mouse cursor: the name and strand of the annotation, which exon is currently being hovered over and the position of the mouse cursor relative to the start of the annotation. This information is available in the ruler shown in the reference track of a track list, as well as in the lower right corner of the workbench.
Standalone read mapping improvement: Read mappings now have a new coverage graph located above the reads. The overflow graph at the bottom has been removed.
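The ordering rule for overlapping variants in a variant track (reference variants first, then lexicographic order of the alteration string) can be expressed as a simple sort key. This is a minimal sketch only: the `Variant` class and its fields are hypothetical and do not reflect how the Workbench stores variants internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record used only for this illustration."""
    reference_allele: str
    allele: str

    @property
    def is_reference(self):
        # A reference variant carries the same allele as the reference.
        return self.allele == self.reference_allele

    @property
    def alteration(self):
        # Alteration string in the "T > G" style used in the text above.
        return f"{self.reference_allele} > {self.allele}"


def display_order(variants):
    """Reference variants first, then by alteration string (lexicographic)."""
    # False sorts before True, so reference variants come first.
    return sorted(variants, key=lambda v: (not v.is_reference, v.alteration))


overlapping = [Variant("T", "G"), Variant("T", "A"), Variant("T", "T"), Variant("T", "C")]
for variant in display_order(overlapping):
    print(variant.alteration)
```

Because the sort key depends only on the variants themselves, the same order is produced each time the list is sorted, which is the property described for re-opened variant tracks.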
RNA-Seq Analysis tool improvements
The RNA-Seq Analysis tool supports the alignment and quantification of reads that wrap around the ends of circular chromosomes.
The tool caches the data structure used by the read mapper to map reads to known mRNA annotations. This reduces run time by up to 3 minutes per sample, with the greatest benefits being observed when using large numbers of mRNA annotations on systems with few CPU cores.
A new row has been added to the "Strand specificity" section of the report produced by the tool. The row contains the number of "Reads with known strand", which is used in determining the percentage of reads ignored due to being on the wrong strand.
The "Detected transcripts" column has been renamed to "Uniquely identified transcripts" for both the gene-level and transcript-level expression tracks.
For the Statistical comparison track, the Volcano plot view has an option making it possible to visualize the smallest p-values (including p-value=0).
The "Reference Sequence" section of the report now lists the number and length of all chromosomes used during read mapping. Previously it reported only the length and number of chromosomes with at least one gene or transcript.
The RNA-Seq Analysis and Map Reads to Reference tools can now share cached copies of the read mapper indexes. This means that the average run time over many samples will be reduced if both tools are frequently used.
The RNA-Seq Analysis tool is now more efficient when handling large references, particularly when batch processing samples.
Read mappings produced by the RNA-Seq Analysis tool previously ignored deletions and insertions at exon-intron boundaries. This meant that such deletions/insertions would not be detectable in downstream variant calling. The tool has been updated to keep the deletions and insertions in the mapping, implicitly favoring the hypothesis of a deletion/insertion over a novel splice junction. This change does not affect expression levels.
Amino Acid Changes tool improvements
The Amino Acid Changes tool previously used square brackets to describe coding region and amino acid changes when a single variant affected multiple transcripts or proteins, e.g., NM_207170.3:c.[140C>T]; NM_015484.4:c.[266C>T]. These brackets have now been removed (e.g., NM_207170.3:c.140C>T; NM_015484.4:c.266C>T) to comply with the HGVS standards, which reserve the brackets for the reporting of alleles. These changes are also reported by the variant callers when run on a standalone read-mapping with CDS annotations.
The tool describes replacements in the compact format preferred by HGVS (112_117delinsTG). Previously the description included the reference sequence (112_117delAGGTCAinsTG). These changes are also reported by the variant callers when run on a standalone read-mapping with CDS annotations.
The 3' rule of HGVS is now applied to c. annotations of variants. Similarly, in p. annotations (protein-level HGVS), insertions that are really duplications are annotated as duplications.
The tool uses all positions covered by a variant when describing coding region changes, in accordance with HGVS recommendations. Previously the tool restricted its change descriptions to positions within a transcript (if supplied) or CDS. This fix will therefore mainly affect the descriptions of deletions that partially overlap a transcript. These changes are also reported by the variant callers when run on a standalone read-mapping with CDS annotations.
An option can add c. annotations (HGVS DNA-level) for variants that are within a certain distance from the transcript boundaries. The distance can be configured; the defaults are 5 kb upstream and 3 kb downstream.
An option in the Amino Acid Changes tool allows users to output an HGVS-compliant variant track.
An option allows the prioritization of a single transcript when several annotations are available for one variant.
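For users who need to compare annotations produced by earlier versions with the new output, the two description changes above (removal of square brackets and the compact delins format) can be approximated with simple text substitutions. This is a minimal sketch based only on the examples given in these notes; the function names are hypothetical and it is not the code used by the Amino Acid Changes tool.

```python
import re

# Matches "c.[...]" or "p.[...]" as in the old bracketed descriptions.
_BRACKETED = re.compile(r"\b([cp])\.\[([^\]]+)\]")

# Matches "del<deleted bases>ins", the old long form of a replacement.
_LONG_DELINS = re.compile(r"del[ACGTN]+ins")


def strip_allele_brackets(description):
    """Rewrite e.g. 'c.[140C>T]' as 'c.140C>T'."""
    return _BRACKETED.sub(r"\1.\2", description)


def compact_delins(description):
    """Rewrite e.g. '112_117delAGGTCAinsTG' as '112_117delinsTG'."""
    return _LONG_DELINS.sub("delins", description)


old_style = "NM_207170.3:c.[140C>T]; NM_015484.4:c.[266C>T]"
print(strip_allele_brackets(old_style))
print(compact_delins("112_117delAGGTCAinsTG"))
```

Descriptions that are already in the new format pass through both functions unchanged, so the substitutions can be applied to mixed old and new results.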
VCF importer and exporter improvements
The VCF exporter and importer have been improved and now support VCF v4.2.
The VCF exporter's "Enforce diploid" option has been replaced with an improved and more general "Enforce ploidy" option, which is set to 2 by default. This option gives more control over the exported genotype and better compatibility with external applications such as Ingenuity Variant Analysis.
Four complex variant representations can now be handled by the VCF importer and exporter, including the common reference overlap representation.
The VCF exporter has an option to write variant annotations as INFO fields.
In the VCF importer, we fixed an issue with the import of INFO IDs that contained non-alphabetical characters.
BED importer and exporter improvements
The BED exporter now replaces spaces in feature names with underscores, since white space is not allowed in the BED feature names.
The BED file exporter now always exports to BED12 format.
The BED importer limit for name lengths has been raised from 80 to 256 characters.
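When preparing BED files outside the Workbench, the two name-related behaviors above can be mirrored with a short check. This is a minimal sketch: the function names are hypothetical, and treating the 256-character limit as inclusive is an assumption.

```python
# Importer limit for feature names mentioned above (assumed to be inclusive).
MAX_NAME_LENGTH = 256


def bed_safe_name(name):
    """Replace spaces with underscores, as the BED exporter now does."""
    return name.replace(" ", "_")


def fits_importer_limit(name):
    """Check a feature name against the raised importer length limit."""
    return len(name) <= MAX_NAME_LENGTH


name = bed_safe_name("BRCA2 exon 11")
print(name, fits_importer_limit(name))
```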
The De Novo Assembly tool has been updated to use the same version of the read mapper as the one used by the Map Reads to Contigs tool. This typically leads to more accurate mappings. For larger assemblies the run time is expected to decrease on average, but for small assemblies run time is likely to increase.
Filter Against Known Variants no longer adds duplicate annotations from known variants tracks. In addition, Overlap, Exact match and Partial MNV match annotations are now always added to the output variant track.
The Import Ion Torrent and Import PacBio tools support import of reads from SAM or BAM format files. Mapping information is discarded during this import. To import a read mapping from SAM or BAM format files, use the existing Import | SAM/BAM Mapping Files... tool.
Handling of RNA-Seq reads by the InDels and Structural Variants tool has been improved. This change affects breakpoint p-values and as a result, affects the number of breakpoints and variants reported. In addition, we have improved the calculations of the values reported for the "perfectly mapped" and "not perfectly mapped" breakpoint annotations.
When right-clicking a CDS annotation on a stand-alone sequence, the option "Translate CDS/ORF..." gives a choice between translating using a selected code translation table or by extracting the translation code from the annotation itself if this information is available.
The history information associated with results from the BLAST and BLAST at NCBI tools now includes the version of the BLAST software used for the search.
The Reverse Sequence tool now names the output sequence using the input sequence name followed by "-R".
The tool called Replace Selection With Sequence which appears in context menus for sequences in the cloning tool will now be disabled when the sequence is linear but a selection spanning the end to start position has been made. Reasons why sequences cannot be marked as circular or linear are now described more clearly in the tooltip.
For the Gene Set Test tool, the columns "Occurrences in all genes", "Genes (universe)", "Occurrences in subset", "Genes (subset)" have been renamed to "Detected Genes", "Detected Genes (Names)", "DE Genes", "DE Genes (Names)", respectively.
For the GO Enrichment Analysis tool, the columns "Occurrences in all genes", "Genes (universe)", "Occurrences in sample", "Genes (overlap)" have been renamed to "Matched Genes", "Matched Genes (Names)", "Genes with Variations", "Genes with Variations (Names)", respectively.
Data created in CLC Genomics Workbench 12.0 will be internally compressed by default. Options are available for exporting data without this compression or for disabling it entirely.
Tooltips on data elements in the Navigation Area show the following additional information: type of the element (e.g. Sequence List), file size, and compression status.
On macOS, the standard file browser is now used for browsing files. Previously, a third-party library was used.
A new option allows users to "Export table as currently shown" - including all filter settings and potential additional columns selected using the Side Panel.
A new filtering button in all tables allows users to display in the table view only pre-selected rows.
The 'is in list' table filter now supports tabs as a list separator. This makes it possible to paste rows from Excel into the search field.
The Import tool allows for entering folder paths in the File name field.
It is now easier to reorder items in the Navigation Area. It was not previously possible to change the order of adjacent folders.
When starting the workbench with a "clc://" argument, the requested element is now selected in the Navigation Area.
The Show History View no longer has a restriction on the number of elements shown.
Workbench response times after logging into a CLC Genomics Server have been improved in the situation where many server jobs submitted from the Workbench had completed since the last login.
A message now warns if a bug report fails to reach Support using Help | Contact Support.
Searching in the Navigation Area is now available when using the workbench in Viewing Mode.
The Create installer for workflows dialog has additional fields for specifying information about the workflow's author.
A "Check for updates" functionality is now available from the Help menu.
Improved error message when attempting to save a file that is not the newest copy.
Sequence annotations where the strand is not known are now drawn without an arrow to distinguish them from annotations on the plus strand.
Various minor improvements.
Fixed an issue affecting the Map Reads to Reference tool when it was included in a workflow, where if the References parameter was connected to an input, and a masking track was configured, an error was reported stating that the masking track was incompatible with the reference genome, whether or not it was compatible.
Fixed a bug in the Import Tracks tool where one nucleotide exons would be skipped during import of GTF files. As a consequence of this fix, the import of UCSC SNPs typed as exons is no longer supported.
Fixed a bug where the "Unaligned end" field provided in the Breakpoint track output of the InDels and Structural Variants tool was left blank when the value should have been "Mixed consensus" on all but one chromosome. The field is now filled for all chromosomes.
Fixed a problem introduced in CLC Genomics Workbench 11.0 where launching a tool from the Quick Launch window after sorting led to the wrong tool being started.
Fixed an issue where the Motif Search updated the history of the input file even when no changes to the input data element were made.
Fixed an issue where it was possible to create an empty alignment editor, causing the Workbench to crash.
Fixed a bug that caused import of empty text files to stall.
Fixed an issue found in the History of a result generated by the Extract Annotations tool, that would incorrectly show that a reference sequence track was used when it was not.
Fixed an issue that led to some deletions being reported as multiple, separate deletions instead of a single, larger deletion when affine gap costs were used.
Fixed a very rare bug in the read mapper, where an alignment with a leading unaligned end could get a wrong score.
On Windows 10 and Windows Server 2016, it now runs with 'below normal' as the priority. Previously, it ran with 'normal' priority.
Fixed the links to the AmiGO Gene Ontology website used for GO annotations.
Fixed the links to the HGNC (HUGO Gene Nomenclature Committee).
On Windows 10 and Windows Server 2016, the underlying program launched when running the Sample Reads tool now runs with 'below normal' as the priority. Previously, it ran with 'normal' priority.
On Windows 10 and Windows Server 2016, the default BLAST database location will now be either 'C:\Users\USERNAME\My Documents\CLCdatabases' or 'C:\Users\USERNAME\Documents\CLCdatabases'. Previously it was 'C:\Users\USERNAME\CLCdatabases'. When upgrading from earlier versions, an existing BLAST database location will not be modified and will continue to work.
On some Windows 10 and Windows Server 2016 systems, the log files, user settings file, and workflow files, normally stored in 'C:\Users\USERNAME\AppData\Roaming\CLC bio\Workbench' were installed in 'C:\Users\USERNAME\Application Data\CLC bio\Workbench'. Now we instead store these files in '%APPDATA%\CLC bio\Workbench'. User settings and workflow files will be automatically copied to the new location if they were previously stored in 'C:\Users\USERNAME\Application Data\CLC bio\Workbench'.
Various minor bug fixes.
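Scripts that need to locate the log, user settings and workflow files on Windows can resolve the '%APPDATA%\CLC bio\Workbench' location mentioned above from the environment rather than hard-coding a user path. This is a minimal sketch; the function name is hypothetical, and only the folder names come from these notes.

```python
import ntpath
import os


def workbench_settings_dir(environ=None):
    """Return '%APPDATA%\\CLC bio\\Workbench' with %APPDATA% resolved.

    environ defaults to the current process environment. A mapping can be
    passed in instead, which makes the function easy to test on any system.
    """
    env = os.environ if environ is None else environ
    appdata = env.get("APPDATA")
    if appdata is None:
        raise KeyError("APPDATA is not set; this location applies to Windows only")
    # ntpath always joins with Windows separators, regardless of the host OS.
    return ntpath.join(appdata, "CLC bio", "Workbench")


example_env = {"APPDATA": r"C:\Users\USERNAME\AppData\Roaming"}
print(workbench_settings_dir(example_env))
```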
Workflows installed on an earlier Workbench version can be installed on a new major release line by copying the installed Workflow in the earlier Workbench version, saving the copy in the Workbench Navigation Area and then opening this copy in the new Workbench version. The workflow can then be installed if desired.
The underlying read mapper and de novo binaries included in the CLC Genomics Workbench 12.0 are from CLC Assembly Cell 5.1.1.
The following tools have been moved to the Legacy folder of the Workbench Toolbox:
Download Reference Genome Data: Download of reference data from public repositories such as Ensembl is now available from within the new Reference Data Manager
The Import SOLiD tool has been retired. It was previously in Legacy Tools. As a consequence:
The tools Map Reads to Reference, Map Reads to Contigs, Trim Reads, De Novo Assembly, Extract and Count, and Annotate and Merge no longer have special handling of SOLiD colorspace data. They will continue to work as expected for SOLiD data, but will not make use of color information to correct for phase shifts.
Import | SAM/BAM Mapping Files and Standard Import | Reads from SAM/BAM files no longer allow import of data where colorspace information is provided in the form of CS flags and sequence data is omitted (SEQ = "*").
Export | SAM, Export | BAM, and Export | Fastq no longer have special handling of SOLiD colorspace data. They will continue to work as expected for SOLiD data, but will not make use of color information to correct for phase shifts.
The *.cas importer found in Import -> Standard Import no longer allows the import of read mappings where SOLiD color information has been used as part of the mapping algorithm.
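To check in advance whether a SAM file contains records of the kind that can no longer be imported (colorspace data in a CS tag with SEQ set to "*"), the alignment lines can be inspected directly. This is a minimal sketch that assumes uncompressed, tab-separated SAM text and the standard `CS:Z:` optional tag; the function name is hypothetical.

```python
def is_colorspace_only(sam_line):
    """Return True if a SAM alignment line has a CS tag but no sequence.

    Header lines (starting with '@') are never reported.
    """
    if sam_line.startswith("@"):
        return False
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has 11 mandatory fields")
    seq = fields[9]  # SEQ is the 10th mandatory field
    # Optional fields follow the 11 mandatory ones as TAG:TYPE:VALUE.
    has_cs_tag = any(tag.startswith("CS:Z:") for tag in fields[11:])
    return seq == "*" and has_cs_tag


record = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\t*\t*\tCS:Z:T0123"
print(is_colorspace_only(record))
```

Applying the check to every line of a file gives a quick count of the records that would need to be converted to base space before import.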
The Import Tracks tool no longer supports the import of files in Complete Genomics master VAR file format. To import such files, it is necessary to first convert them to VCF using the tools provided by Complete Genomics.
The column "Ignored reads (wrong strand)" has been removed from the "Strand specificity" section of the report produced by the Create Combined RNA-Seq Report tool. The column has been removed to better fit the report's purpose of only providing high-level relevant information.
The "Whole Genome shotgun-reads (wgs)" database has been removed from the BLAST at NCBI tool. Growth in the database means that specialized variants of BLAST are now required for search. More details on these can be found here.
Biomedical Genomics Analysis 1.0: Installing this plugin on a CLC Genomics Workbench provides the functionality formerly available by running a Biomedical Genomics Workbench and installing the now-retired plugin, QIAseq Targeted Panel Analysis.
Bisulfite Sequencing: The tools delivered by this plugin have been integrated into the Workbench and can be found in the Toolbox under the folder Epigenomics Analysis | Bisulfite Sequencing.
CLC Workbench Client Plugin: The CLC Server Connection item under the File menu has replaced the need for this plugin.
Batch Rename: The Batch Rename tool, formerly delivered by this plugin, is now available directly in the Workbench under the Utility Tools folder.
QIAseq Targeted Panel Analysis and QIAGEN GeneRead Panel Analysis Plugin: These plugins were formerly available for use on Biomedical Genomics Workbenches only. Their functionality is now available via the new Biomedical Genomics Analysis plugin when installed on a CLC Genomics Workbench.
The following tools will be removed in a future release of the software:
Compare Sample Variant Tracks
Merge Overlapping Pairs
Create Track from Experiment
Identify Differentially Expressed Gene Groups and Pathways
Add Fold Changes
Add Information from Overlapping Genes
Create Fold Change Track
Download Reference Genome Data (The functionality via the Reference Data Manager is unaffected by this.)
The PPfold plugin will be retired as of the next major release of the CLC Workbenches and Servers.
If you are concerned about these proposed changes, please contact our Support team by emailing [email protected].