
Latest improvements for CLC Genomics Workbench early access


CLC Genomics Workbench early access 20.0

Release date: 2019-11-19

These are the draft release notes for CLC Genomics Workbench 20.0, due for release on December 11, 2019. The draft manual is available in PDF and HTML format. Installers for this product are available as "early access" via links at the bottom of this page. These products are not supported, and we recommend they are not used in production during the early access period. To download a commercial license for this product, you must have a license covered by Maintenance, Upgrades and Support (MUS). A two-week evaluation license is available via the License Manager within the software.

New features

Workflows
  • NGS sequence data can be imported on the fly, as an initial action when a workflow is run, avoiding the need to import the data prior to launching the workflow.
  • When launching workflows, batch units can now be defined using metadata, supplied either as a CLC metadata table or by selecting an Excel format file containing information about the data.
  • Workflows with multiple inputs, where those inputs should be matched with each other, can now be launched in batch mode, making use of the ability to define batch units based on metadata. For example, a workflow where sets of reads should be mapped to different reference sequences can now be launched in batch mode.
  • Two new workflow elements have been introduced, Iterate and Collect and Distribute, which allow workflows to be designed where the execution of different parts of a workflow can be finely controlled. For example, using these elements, a single workflow can contain an RNA-Seq analysis step, typically run once per sample, as well as a Differential Expression for RNA-Seq step, typically run once for a set of samples. Similarly, a single workflow can be designed to run batches of trio analyses, producing cohort-level reports as outputs.
  • Workflows now produce a Workflow Result Metadata table, which contains one row per output, with the relevant data element associated with that row. When launched in batch mode, the batch the row relates to is clearly indicated.
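To illustrate the idea of defining batch units from metadata, here is a minimal sketch that groups data elements by the value in a chosen metadata column. The column names, the sample values, and the `batch_units` helper are hypothetical and are not part of the Workbench.

```python
from collections import defaultdict

def batch_units(rows, unit_column, name_column="Name"):
    """Group data element names into batch units keyed by a metadata column."""
    units = defaultdict(list)
    for row in rows:
        units[row[unit_column]].append(row[name_column])
    return dict(units)

# Hypothetical metadata: one row per imported data element.
metadata = [
    {"Name": "reads_A_L001", "Sample": "A", "Reference": "ref_1"},
    {"Name": "reads_A_L002", "Sample": "A", "Reference": "ref_1"},
    {"Name": "reads_B_L001", "Sample": "B", "Reference": "ref_2"},
]

units = batch_units(metadata, "Sample")
for unit, names in units.items():
    print(unit, names)
```

Grouping on a different column (for example "Reference" in this made-up table) would give a different set of batch units from the same metadata.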
Epigenomics analysis
  • Tools for detecting peaks in sequencing data are now available from a new 'Advanced Peak Shape Tools' folder found in the Epigenomics Analysis folder of the Toolbox.
  • A tool for detecting evidence for histone acetylation marks in genes or other predefined genomic regions is now in the Epigenomics Analysis folder of the Toolbox:
    • Histone ChIP-Seq: This tool was formerly available via the Histone ChIP-Seq plugin.
miRNA analysis (small RNA)
Protein structure and homology
  • Generate Biomolecule: A new tool, available from the side panel of Molecule Projects, that allows biomolecules to be generated or extracted based on symmetry information in PDB files.
  • Find and Model Structure: A new tool that finds suitable protein structures for representing a given protein sequence. From the resulting table, a structure model (homology model) of the sequence can be created with one click, using one of the found protein structures as a template.
  • Molecule structures in a Molecule Project can now be exported to a PDB format file.
  • Search for PDB Structures at NCBI is now available when running the Workbench in Viewing Mode.
Import and export
  • Reports can now be exported in JSON format and in PDF format.
  • A new option in the Illumina importer, "Join reads from different lanes", merges fastq files from the same sequencing run but from different lanes into a single sequence list when enabled.
  • It is now possible to select input files from multiple folders when importing high-throughput sequencing data.
  • Create Expression Browser can now use tables imported from CSV or Excel format files as an annotation resource. Using such tables, sorting and filtering can be done according to numeric annotation values as well as textual annotations.
  • When exporting to PDF, there is now an option to export the history of the report.
Other new tools
  • Combine Reports: Summarizes information from multiple reports and produces a single report. It can be used for combining different report types for a single sample, or combining reports for a set of samples.
  • Create Variant Track Statistics Report: Creates a summary report for different types of variants in variant tracks.
RNA-seq Analysis
  • A new option, "Library type setting", in the RNA-Seq Analysis tool offers the selection of "Bulk", for analysis of samples where reads are expected to be uniformly distributed across the full length of transcripts, or "3' sequencing", which tailors the output and report quality control for samples generated using low input 3' sequencing applications. "Bulk" is the default, and corresponds to the behavior of the tool in previous software versions.
  • The definition of "Maximum number of hits for a read" in the RNA-Seq Analysis tool has been simplified. It now refers to the number of distinct places on the reference that a read maps best to. Previously, a more complex definition was used, involving checking for matches against genes and then against intergenic regions, with rules applied to the results.
  • The report generated by RNA-Seq Analysis now includes the percentage of reads mapped to transcripts of particular length ranges, aiding the interpretation of the "Coverage along normalized transcript length" graph.
Trim Reads
  • Sequences can now be trimmed to a fixed length from either the 3' or the 5' end.
  • New options have been added to allow homopolymer trimming to be finely tuned.
  • If "Trim ambiguous nucleotides" is enabled, ambiguous characters (e.g. N) at the end of sequences are removed, even if the number of these characters is lower than the limit set. Previously, such characters were left in place if their number was lower than the limit.
  • When included in a workflow, Trim Reads now always produces an output when an output element is connected to it. This includes the following situations:
    • Where no reads have been trimmed (either because all trimming options were deselected, or because none of the trim options matched any of the reads). In this case, the "Trimmed sequences" output will contain all input reads, "Discarded sequences" will be empty, and "Percentage trimmed" will be 0% in the report.
    • Where all reads have been trimmed. In this case, the "Discarded sequences" output will contain all input reads, "Trimmed sequences" will be empty, and "Percentage trimmed" will be 100%.
BLAST
  • A new option for the BLAST tool, "Filter out redundant results", culls HSPs on a per-subject-sequence basis when enabled, removing HSPs that are completely enveloped by another HSP.
  • The NCBI blast executables have been updated to version 2.9.0.
  • The option "Choose filter to mask low complexity regions" has been renamed to "Mask low complexity regions".
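The culling behavior described for "Filter out redundant results" can be sketched as an interval-containment filter. This is an illustration only, not the BLAST or Workbench implementation: it assumes each HSP is reduced to a hypothetical `(subject_id, start, end)` tuple, and that "enveloped" means the HSP's range lies entirely within another HSP's range on the same subject.

```python
from collections import defaultdict

def cull_enveloped(hsps):
    """Drop HSPs whose range lies entirely inside another HSP on the same subject.

    Each HSP is a (subject_id, start, end) tuple with start <= end.
    Of several identical ranges, only the first is kept.
    """
    by_subject = defaultdict(list)
    for hsp in hsps:
        by_subject[hsp[0]].append(hsp)

    kept = []
    for subject_hsps in by_subject.values():
        # Sort by start ascending, then end descending, so that any range
        # that could envelop a given HSP is visited before it.
        subject_hsps.sort(key=lambda h: (h[1], -h[2]))
        max_end = None
        for hsp in subject_hsps:
            if max_end is not None and hsp[2] <= max_end:
                continue  # enveloped by an HSP that was already kept
            kept.append(hsp)
            max_end = hsp[2]
    return kept

hsps = [("s1", 10, 100), ("s1", 20, 80), ("s1", 90, 150), ("s2", 20, 80)]
culled = cull_enveloped(hsps)
print(culled)
```

Here the HSP at 20-80 on "s1" is removed because it lies inside 10-100, while the identical range on "s2" is kept because culling is done per subject sequence.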
New options in other analysis tools
  • A new option for Local Realignment called "Allow guidance insertion mismatches" allows reads to be realigned using guidance insertions that have mismatches relative to the read sequences. This option is enabled by default.
  • The creation of reads tracks (mappings) is now optional in the RNA-Seq Analysis tool.
  • A new option in Copy Number Variant Detection (CNVs) called "Merge overlapping targets" allows overlapping target regions to be merged into one larger target region. CNV calls are made on this larger region.
  • A new option called "Report unmethylated cytosines" is available for the Call Methylation Levels tool. When enabled, methylation levels are reported for all sites with read coverage, rather than only for sites with methylated cytosines.
  • Two new options in Create Mapping Graph are available for generating coverage tracks for reads that mapped best to a single location on the reference sequence: "Specific read coverage" and "Paired read specific coverage".

Improvements

Workflows
  • All installed workflows can now be updated in a single operation from the Workflow Manager using the new Update All Workflows button.
  • Placeholder-based naming of outputs in workflows can now be configured at a finer level: the {input} or {2} placeholder is now replaced by the name of the first workflow input by default. This can then be further configured to use the names of other inputs by specifying them by number after a colon in the placeholder. For example: {2:1,3} would be replaced by the names of workflow inputs number 1 and 3. Previously, a workflow output configured as {2} was replaced by a concatenation of all the workflow input names.
  • The listing of items in the "Add Element" dialog in the Workflow Editor has been improved. It now includes installed workflows in the Workbench, and no longer matches text in the tooltips of the tools.
  • When running a workflow configured to use reference data, the Reference Data Set selection step has been updated to show the list of preconfigured elements in the tooltip.
  • The "Export to PDF" tool can now be used in workflows to export reports in PDF format.
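The placeholder rules for output naming described above ({input} or {2} resolving to the first input name, and {2:1,3} selecting inputs by number) can be sketched as follows. The `expand_input_placeholder` function is hypothetical. How the Workbench joins several names is not stated in these notes, so the separator is an assumption, as is accepting the colon form on {input}.

```python
import re

# Matches {input}, {2}, {input:1,3} and {2:1,3}.
PLACEHOLDER = re.compile(r"\{(?:input|2)(?::([0-9]+(?:,[0-9]+)*))?\}")

def expand_input_placeholder(pattern, input_names, separator="-"):
    """Replace input placeholders in an output name pattern.

    input_names is ordered by workflow input number (1-based in the pattern).
    The separator used between several names is an assumption.
    """
    def replace(match):
        numbers = match.group(1)
        if numbers is None:
            return input_names[0]  # default: name of the first workflow input
        return separator.join(input_names[int(n) - 1] for n in numbers.split(","))

    return PLACEHOLDER.sub(replace, pattern)

names = ["reads", "reference", "targets"]
print(expand_input_placeholder("{2} (mapped)", names))
print(expand_input_placeholder("{2:1,3} (mapped)", names))
```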
Performance improvements
  • Mapping of NGS reads on multicore systems is now approximately 25% faster. Tools benefiting from this improvement include Map Reads to References, Map Reads to Contigs and Map Bisulfite Reads to Reference.
  • Saving analysis results to an SSD is now considerably faster.
  • The import of ZIP files has been improved: temporary objects are cleaned up during the import process, reducing the required disk space.
  • Moving and deleting many elements at once is now faster.
  • Emptying the Recycle bin now takes place in the background.
  • Messages from tools are no longer presented in the form of black bubbles in the Processes area. Messages are still written to the log.
  • Basic Variant Detection, Fixed Ploidy Variant Detection, and Low Frequency Variant Detection have been optimized to work on machines with lower memory. The changes are most noticeable in situations where coverage is high or where many variants are called.
  • Improved memory handling when working with read mappings with very high coverage.
  • There are general performance improvements in the following areas:
    • The Navigation Area
    • BLAST and Add attB Sites tools when using large sequence lists
    • Opening large protein sequences
    • Making BLAST databases where most sequences have the same name.
Demultiplex Reads
  • Sequences with a single mismatch to a barcode and that can be grouped unambiguously can now be demultiplexed.
  • Demultiplex Reads is now multithreaded for faster execution.
  • The percentage of reads in each group is now reported to one decimal place in the report.
  • The percentage of reads not grouped is no longer included in the "Reads per barcode" plot in the report.
  • Various other minor improvements
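As an illustration of grouping reads that have a single mismatch to a barcode, the sketch below assigns a read only when exactly one barcode is within one mismatch. It is not the Demultiplex Reads implementation. The barcode names and sequences are made up, and treating an exact match as always winning is an assumption.

```python
def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def assign_barcode(observed, barcodes):
    """Return the barcode name for an observed sequence, or None if ungrouped."""
    # An exact match always wins (assumption).
    for name, sequence in barcodes.items():
        if sequence == observed:
            return name
    # Otherwise accept a single mismatch, but only if exactly one barcode qualifies.
    close = [
        name for name, sequence in barcodes.items()
        if len(sequence) == len(observed) and mismatches(sequence, observed) == 1
    ]
    return close[0] if len(close) == 1 else None

barcodes = {"bc1": "ACGT", "bc2": "TGCA", "bc3": "ACGA"}
for observed in ("ACGT", "TGCC", "ACGC", "GGGG"):
    print(observed, assign_barcode(observed, barcodes))
```

In this example "ACGC" stays ungrouped because it is one mismatch away from both bc1 and bc3, so it cannot be assigned unambiguously.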
QC reports
  • Plots in the "Per-base analysis" section of the graphical report produced by QC for Sequencing Reads no longer include a value for base position 0. Values at position 0 in these plots previously were not meaningful.
  • Base position numbering now starts with 1 in the coverage table of the supplementary report produced by the QC for Sequencing Reads tool. Previously the base position numbers started at 0.
  • The reports produced by the QC for Read Mapping and QC for Targeted Sequencing tools now also include the median coverage.
  • The coverage report generated by QC for Targeted Sequencing now includes the total length of target region positions with coverage below the specified level.
  • QC for Targeted Sequencing has been updated to:
    • Count circular reads (these were previously ignored)
    • Show relevant warning messages when the target region track contains regions that overlap or that cross the origin, both when the tool is run and in the created report.
    • Report the correct number of mapped bases and specificity in Table 2.2, even when the target region track contains overlapping regions.
Tracks and track lists
  • When hovering over a position in a Reads track that is shown in non-aggregated view, a tooltip appears showing the read counts for each observed nucleotide in that position, together with the directions of the reads that contain that nucleotide.
  • When opening a track list, the first variant track is no longer opened if it is already open in an editor.
  • In a track view or track list view, the "Location" field in the side panel now accepts ranges that include spaces. For example, "X: 70,832,863 - 70,842,697". Previously, spaces were not supported.
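A small sketch of parsing a location string that tolerates spaces and thousands separators, as in the "Location" field example above. The `parse_location` function and its regular expression are hypothetical and only show one way such input could be normalized.

```python
import re

# Chromosome name, colon, start, hyphen, end; whitespace allowed around each part.
LOCATION = re.compile(r"^\s*([^:\s]+)\s*:\s*(\d[\d,]*)\s*-\s*(\d[\d,]*)\s*$")

def parse_location(text):
    """Parse 'chrom: start - end' into (chrom, start, end), ignoring spaces and commas."""
    match = LOCATION.match(text)
    if match is None:
        raise ValueError(f"Unrecognized location: {text!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))

print(parse_location("X: 70,832,863 - 70,842,697"))
print(parse_location("X:70832863-70842697"))
```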
Import and export
  • When importing BED files using the Import Tracks tool, only the first three columns (chromosome, start and end positions) are now required to match UCSC specifications for the BED format. Remaining columns that do not match these requirements will be imported as Var1, Var2, etc.
  • The CSV importer has been updated:
    • Values no longer need to be enclosed in quotation marks in the CSV file to be successfully imported.
    • Data values starting with a numeric character but also containing non-numeric characters are now interpreted as text. Such values were previously converted to numbers and then only imported up to the first non-numeric character.
  • The import of Nexus files has been updated to more closely match the format specifications.
  • When selecting files to import from an import/export directory via a CLC Workbench, right-clicking on a folder name now brings up a menu with the options: "Add the content of a directory" or "Add the full content (recursively) of a directory".
  • The "Excel 2010" and "Excel 97-2007" exporters now export NaN and +/-Infinity values to #N/A.
  • When importing multiple files using the Standard Import, the process ends with an error if at least one of the files failed to import. The details of which file failed and why can be seen in the log.
  • The GenBank exporter now replaces any spaces in annotation names with underscores.
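The CSV importer change above, where values such as "12abc" are now kept as text rather than truncated to a number, can be illustrated with a strict conversion that only treats a value as numeric when the whole value parses. The `interpret_value` function is a hypothetical sketch, not the importer's code.

```python
def interpret_value(raw):
    """Return an int or float if the whole value is numeric, otherwise the text."""
    text = raw.strip()
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    # Mixed values such as "12abc" are kept as text in full.
    return text

for raw in ("42", "3.5", "12abc", "chr1"):
    print(repr(raw), "->", repr(interpret_value(raw)))
```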
Searching
Metadata related
Create Box Plot
  • Create Box Plot now calculates the median and percentile values in the same way as the "quantile" method in R. This aligns with the way these values are calculated by other tools in the CLC Genomics Workbench.
  • Whiskers of boxplots now range from the lowest data point within 1.5 times the interquartile range (IQR) of the lower quartile and the highest data point within 1.5 IQR of the upper quartile. Previously, they extended 1.5 times the length of the box (IQR).
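The two Create Box Plot changes can be worked through in a short sketch. It assumes the R default quantile definition (type 7, linear interpolation) is the one meant, and the function names and sample data are made up for illustration.

```python
import math

def quantile_r7(sorted_values, p):
    """Quantile using linear interpolation, as in R's default quantile type 7."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])

def box_plot_stats(values):
    """Return quartiles and whisker ends, with whiskers ending at actual data points."""
    data = sorted(values)
    q1, median, q3 = (quantile_r7(data, p) for p in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    # Whiskers reach the most extreme data points still within 1.5 * IQR of the box.
    lower_whisker = min(v for v in data if v >= q1 - 1.5 * iqr)
    upper_whisker = max(v for v in data if v <= q3 + 1.5 * iqr)
    return {"q1": q1, "median": median, "q3": q3,
            "lower_whisker": lower_whisker, "upper_whisker": upper_whisker}

stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

For this data the upper whisker ends at 9, the highest point within 1.5 IQR of the upper quartile, rather than at the fence value of 14.5; the value 30 lies outside the whiskers.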
Improvements to other analysis tools
  • The algorithm used to auto-detect paired distances when mapping NGS reads has been improved. Tools benefiting from this include Map Reads to References, RNA-Seq Analysis, Map Reads to Contigs and Map Bisulfite Reads to Reference.
  • Improvements to SRA download:
    • The temporary disk space needed to download data has been reduced significantly.
    • Technical reads are now discarded.
    • Orphan reads are now put into a separate output for paired data.
  • When importing multiple files containing sequencing reads (QIAGEN GeneReader, Illumina, PacBio, Fasta Read, Ion Torrent) or when importing SAM format files, a single problematic file does not stop the import. The import process now continues with the next file if it encounters a file that could not be imported.
  • The "Chromosome coverages" section in the results report produced by the Copy Number Variant Detection (CNVs) tool is now a table.
  • The "CPM" expression option in the side panel setting of the Expression Browser has been renamed "CPM (TMM-adjusted)" to reflect how it is calculated.
  • The TMM Normalization used in the Expression Browser and in Create Heat Map for RNA-Seq, PCA for RNA-Seq, Differential Expression in Two Groups, and Differential Expression for RNA-Seq, has been changed. This change involves how a reference column is selected for TMM Normalization. It is unlikely to lead to noticeable differences in results. Changes are most likely to occur in situations where the majority of transcripts/genes have zero expression.
  • The "All group pairs" and "Across groups (ANOVA-like)" comparisons in Differential Expression for RNA-Seq now compare expressions in the same direction. Previously, the fold changes reported by these 2 tests for the same data, entered in an identical order, had opposite signs.
  • The long form of the HGVS nomenclature for DNA is now used by the Amino Acid Changes tool for annotating coding region changes: the bases of deletions and duplications not longer than 50 nucleotides are included, and repeated sequences are reported using the insertion form.
  • Exon information added by Annotate with Exon Numbers now includes a blank entry if a variant is located in an intronic region. For locations with multiple isoforms annotated, this gives a one to one relationship between the number of exon annotations and the number of isoforms.
  • InDels and Structural Variants now consistently assigns a count of 1 for a paired read, leading to improved statistics. Previously, regions where the R1 and R2 reads overlapped were assigned a count of 2.
  • Filter Variants on Custom Criteria now prints a message to the log if any columns specified in the criteria are not present in the data.
  • The QC for Targeted Sequencing tool now sets the direction of each read in a pair independently, which can lead to more accurate forward and reverse coverage values in some situations.
  • Identify Shared Variants now reports homozygous sample frequency, heterozygous sample frequency and mean allele frequency.
Other improvements
  • Outputs of tools provided by plugins now include the plugin name and version in the element history.
  • A new option when right-clicking on a table cell, Edit | Copy Cell, allows individual cells to be copied to the clipboard. Previously only whole rows could be copied.
  • Tool and workflow logs now display an "Elapsed time" column.
  • In the tree view of phylogenetic trees, the "Reset Tree Topology" button will now also uncollapse any collapsed nodes.
  • The name of a non-default workspace is now shown in the Workbench title bar.
  • The table view ("Show Table") of plots has been improved in the cases where multiple data series are shown in the plot. The table now includes all of the x values from all data series, instead of the x values from just the first data series. If a data series is missing a y value for a specific x value, then the entry in the table will be empty.
  • The maximum size of a plot in a report displayed in the Workbench has been increased to 800 pixels, and the width/height ratio has been changed from 2/3 to 1/2.
  • The ranking of search results in Quick Launch has been improved.
  • CLC URLs have been made more compact.
  • The icon for the sequence view has been changed for protein sequences, so it is possible to distinguish protein sequence views from nucleotide sequence views based on the icon.
  • In the Reference Data Management dialog, the "usable free space" is shown instead of the previous "free space".
  • In the Batch Rename tool, the option 'Replace part of the name' fields have changed from 'From' and 'To' to 'Replace' and 'With' for clarification.
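The improved table view of plots with multiple data series, described earlier in this list, amounts to taking the union of x values across series and leaving a cell empty where a series has no y value. A minimal sketch, with a hypothetical `merge_series` function and made-up data:

```python
def merge_series(series):
    """Build table rows [x, y_series1, y_series2, ...] over the union of x values.

    series maps a series name to a dict of {x: y}. Missing y values become "".
    """
    xs = sorted({x for points in series.values() for x in points})
    return [[x] + [points.get(x, "") for points in series.values()] for x in xs]

series = {"sample A": {1: 10, 2: 20}, "sample B": {2: 5, 3: 7}}
rows = merge_series(series)
for row in rows:
    print(row)
```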

Bug fixes

  • Fixed a rare issue that could cause some jobs to fail when multiple instances of Filter Variants on Custom Criteria were run simultaneously.
  • Fixed an issue causing the file chooser dialog on Windows systems to freeze when selecting bzip2 format files for import.
  • Fixed a bug where failing import of Illumina .fastq files could leave files in the temporary files directory.
  • Fixed a bug where the track views and table views of Statistical Comparison tracks did not synchronize to show the same genomic location when an annotation was selected in one of the views.
  • Fixed an issue in the "Duplicated sequences" section of the QC for Sequencing Reads graphical report, where the relative sequence count for the duplicate count of 100 was incorrectly reported in the field for the duplicate count of 99.
  • Fixed an issue affecting Mac OS X setups with accessibility settings enabled, where the "Replace Selection with Sequence" functionality available from within the Cloning editor could fail with an error.
  • Fixed a bug where workflow installer files did not include the specified icon.
  • Fixed a bug where Expression Browsers could not be displayed or exported if they contained GO annotation values that included parentheses but no database reference.
  • Fixed an issue in RNA-Seq Analysis where an error message was produced if the value entered for "Minimum read count fusion gene table" was 1 and no fusions were found.
  • Fixed an issue where multiple target tracks could be selected when running QC for Targeted Sequencing in a workflow context. If done, only the first target track selected was used. Now, only one target region can be selected.
  • Fixed an issue in Copy Number Variant Detection (CNVs) algorithm reports, where values in the "Start BIC" and "End BIC" columns in section 3.1 were truncated to a maximum of 4 digits in the integer part. The underlying calculations were not affected.
  • Fixed an issue where the Gene Set Test tool did not exclude relevant GO terms as computationally inferred if there were parentheses in the GO annotation description.
  • Fixed an issue in the Basic Variant Detection, Fixed Ploidy Variant Detection and Low Frequency Variant Detection tools that in rare cases could result in the QUAL value reported being slightly different between runs.
  • Fixed an issue in Basic Variant Detection, Fixed Ploidy Variant Detection and Low Frequency Variant Detection where the tool could continue to use the CPU and write to disk even after a job was cancelled.
  • Fixed an issue in the GFF2 importer with different representation of stop codons in the CDS regions due to differences in input formats.
  • Fixed an issue in Map Reads to Reference where the summary statistics table in the report did not include paired read statistics for mappings with paired end reads if no reads were mapped in intact pairs.
  • Various minor bugfixes

Changes

  • The Java version bundled with CLC Genomics Workbench 20.0 is Java 11, where we use the JRE from AdoptOpenJDK.
  • Using Local Search, searches for sequences with a specific length or length range now only return individual sequence elements that meet the search requirements. Previously, searches were also done within other types of elements, e.g. sequence lists, read mappings, etc.
  • The "Create index" option of the BAM exporter can no longer be used in combination with zip compression or choosing to output the results as a single file.
  • Options relating to import of paired reads have been removed from the Ion Torrent importer.
  • The tool Remove Orphan Reference Variants is now called Remove Homozygous Reference Variants.
  • Reads tracks (mappings) are no longer generated by the RNA-Seq Analysis tool by default. Enable the new "Create reads track" output option when launching the tool if reads tracks should be created.
  • Some folders in the Toolbox have been renamed and reorganized:
    • The RNA-Seq Analysis folder is now called RNA-Seq and Small RNA Analysis, and this folder contains tools for both these areas of analysis.
    • The Microarray and Small RNA Analysis folder is now called Microarray Analysis and contains tools relevant to that area of analysis.
    • The Quality Control folder is now just above the Resequencing Analysis folder. Previously it was within the Resequencing Analysis folder.
  • The Help -> Tutorials menu item has been replaced with Help -> Online Tutorials, which opens the online tutorials in a browser.
  • The naming of some outputs from some tools has been updated:
    • Demultiplex Reads
      • Grouped reads Now: <sample name> <Barcode name> Previously: <Barcode name>
      • Ungrouped reads Now: <sample name> Not grouped Previously: Not grouped
      • Report Now: <sample name> Demultiplex Reads report Previously: Demultiplex Reads report
      • Where multiple sequence lists are provided as input, the name of the first selected sample is used as the sample name.
    • Trim Reads
      • Trimmed, paired sequences Now: <sample name> (paired, trimmed pairs) Previously: <sample name> (paired) trimmed (paired)
      • Trimmed, broken pairs Now: <sample name> (paired, trimmed orphans) Previously: <sample name> (paired, trimmed orphans)
      • Discarded sequences Now: <sample name> (discarded) Previously: <sample name> (discarded)
      • Report Now: <sample name> report Previously: <sample name>(trim report)
      • Where multiple sequence lists are provided as input to Trim Reads, the name of the first selected sample is used in the output names.
    • RNA-Seq analysis tool and Map Reads to Reference
      • Output names have been shortened: the content of the last set of parentheses of the input name is replaced in the output name with a new tag denoting the specific type of output. Previously, tags were added to the input names when forming the output name.
      • The word "un-mapped" has been replaced with "unmapped" in output names.
      • When unmapped reads outputs are added to metadata tables the inputs are associated with, they are now assigned the metadata role "Unmapped reads".
  • The following have been moved to the Legacy folder of the Workbench Toolbox and have had "(legacy)" appended to their names. They will be removed in a future version of the software.
    • Create Combined RNA-Seq Report: The new Combine Reports tool includes this functionality, and should be used to combine RNA-Seq reports.
    • Create Track from Experiment
    • Reverse Sequence
    • The Small RNA Analysis folder, containing the following tools:
      • Extract and Count
      • Annotate and Merge Counts
      • Download miRBase
    • Remove Reference Variants: The functionality of this tool can be replicated using "Filter Variants on Custom Criteria" with relevant criteria. To remove reference variants where the alternate allele has already been filtered away, use the new tool Remove Homozygous Reference Variants.

Functionality retirement

The following tools have been retired:
  • Identify Differentially Expressed Gene Groups and Pathways (legacy)
  • Add Fold Changes (legacy)
  • Add Information from Overlapping Genes (legacy)
  • Create Fold Change Track (legacy)
  • Download Reference Genome Data (legacy)
The import of the following formats is no longer supported:
  • qseq
  • scarf

Plugin notes

New plugins

  • Navigation Tools: Provides the functionality formerly provided by the Bookmark Navigator and Recent Items Navigator plugins.
  • SignalP and TMHMM: Provides the functionality formerly provided by the SignalP and TMHMM plugins.

Plugin retirements

Functionality of the following plugins has been integrated into the CLC Genomics Workbench and can be found under the Epigenomics Analysis area of the Toolbox:
  • Histone ChIP-Seq
  • Advanced Peak Shape Tools
The following plugins have been retired, with their functionality being provided by a new plugin:
  • Bookmark Navigator
  • Recent Items Navigator
  • SignalP
  • TMHMM
The following plugins have been retired and their functionality is no longer available through the CLC Genomics Workbench:
  • PPfold
  • TRANSFAC

Advance notice

The following will be removed in a future release of the software:
  • Create Combined RNA-Seq Report (legacy)
  • Create Track from Experiment (legacy)
  • Remove Reference Variants (legacy)
  • Reverse Sequence (legacy)
  • Roche 454 NGS import (legacy)
  • Tools under the Small RNA Analysis (legacy) folder:
    • Extract and Count (legacy)
    • Download miRBase (legacy)
    • Annotate and Merge Counts (legacy)
The "Run in Batch Mode..." functionality for installed workflows with multiple inputs will be retired in a future release. Workflows with multiple inputs can now be launched in batch mode by checking the "Batch" checkbox when selecting the input data. If you are concerned about these proposed changes, please contact our Support team by emailing [email protected].

Early Access installers

These products are not supported, and we recommend they are not used in production during the early access period.