Latest improvements for CLC Genomics Workbench

CLC Genomics Workbench 11.0.1

Release date: 2018-03-14

Improvements

  • Implemented the 3' HGVS compliance rule for c. annotation of variants:
    - In c. annotations (DNA-level HGVS), insertions that are in fact duplications are now annotated as duplications.
    - In c. annotations, the 3' rule is now also applied to insertions, deletions and duplications.
    - When determining amino acid changes, the 3' rule is applied to the DNA change first. This may shift a variant into or out of the coding region, which affects whether it is considered an amino acid change.

    The 3' rule for p. annotations was already fulfilled and is not affected by this change.
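
As an illustration of the 3' rule described above, the following is a minimal Python sketch, not the Workbench's implementation. It assumes the reference is given in transcript orientation (so "3'" means "towards higher indices") and uses a hypothetical function name, `shift_insertion_3prime`. An insertion is moved one base at a time towards the 3' end for as long as the resulting sequence stays identical, and it is then reported as a duplication if it copies the bases immediately 5' of its final position.

```python
def shift_insertion_3prime(ref, pos, ins):
    """Shift an insertion to its most 3' equivalent position.

    ref -- reference sequence in transcript orientation (assumption)
    pos -- 0-based index; the inserted bases go immediately before ref[pos]
    ins -- inserted bases (non-empty)

    Returns (new_pos, new_ins, is_duplication).
    """
    if not ins:
        raise ValueError("inserted sequence must not be empty")
    # Moving the insertion one base to the right gives the same final
    # sequence exactly when the next reference base equals the first
    # inserted base; the inserted bases are then rotated by one.
    while pos < len(ref) and ref[pos] == ins[0]:
        ins = ins[1:] + ins[0]
        pos += 1
    # After shifting, the insertion is a duplication if it repeats the
    # reference bases directly 5' of the insertion point.
    is_dup = pos >= len(ins) and ref[pos - len(ins):pos] == ins
    return pos, ins, is_dup


# Inserting "T" inside a run of T's ends up after the last T of the run.
print(shift_insertion_3prime("ACGTTTA", 3, "T"))
# Inserting "CA" into a CA repeat is reported as a duplicated "AC".
print(shift_insertion_3prime("GACACT", 2, "CA"))
```

For transcripts on the reverse strand, the same idea applies after reverse complementing the reference and the variant, which this sketch leaves out.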

Bug fixes

  • Fixed a bug in the VCF (Variant Call Format) exporter that affected the QUAL score of variants. Previously, the variant QUAL score was set to the maximum QUAL score of all alleles, regardless of whether an allele was a reference allele or not. In some instances, e.g., when there are two alleles and one has a poor QUAL score, this choice was suboptimal. The variant QUAL score is now set to the maximum QUAL score among all non-reference variants.
  • Fixed an issue where the RNA-Seq Analysis tool would show an error if the first chromosome or contig contained no transcripts and the "Calculate expression for genes without transcripts" option was used.
  • Fixed an issue where the RNA-Seq Analysis tool would sometimes generate TE tracks that could not be used in downstream tools. The error occurred when the "Calculate expression for genes without transcripts" option was used on a gene track where two genes had the same name, one of the genes contained the other, and neither gene had a transcript.
  • Fixed an issue with the Trim Reads tool when used in a workflow with multiple Trim Adapter Lists as input. Previously, all but the first list were silently ignored; the workflow now displays a warning message.
  • Fixed an issue where a Trim Adapter List containing an adapter with the setting "Discard the read (end matches at 3')" was imported incorrectly.
  • Fixed an issue that could cause some third-party plugins to fail when trying to retrieve the fastq exporter.
  • Fixed an issue where domain annotations added by the Pfam Domain Search tool started one amino acid later than expected. The corresponding start position in the table produced by the tool was correct.
  • Fixed an issue with the advanced table filter functionality that prevented the removal of empty entries from columns expected to contain text.
  • Fixed an issue where Excel formatted files (.xls, .xlsx) could not be imported as Trim Adapter Lists.
  • Fixed a license issue that prevented workbenches from starting properly on Turkish operating systems.
  • Fixed an issue causing the license assistant dialog and EULA dialog to be too big for smaller screens.
  • Fixed an issue where weblinks to UniProt sequences led to the homepage.
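
The change to the VCF QUAL score described in the first bug fix above can be sketched as follows. This is an illustrative Python example only: the `(is_reference, qual)` tuples and both function names are hypothetical and do not reflect the exporter's internal data model.

```python
def variant_qual_previous(alleles):
    """Previous behaviour: maximum QUAL over all alleles, reference included."""
    return max(qual for _is_ref, qual in alleles)


def variant_qual_current(alleles):
    """Current behaviour: maximum QUAL over non-reference alleles only.

    Returns None when there is no non-reference allele (an assumption of
    this sketch; the release note does not cover that case).
    """
    non_ref = [qual for is_ref, qual in alleles if not is_ref]
    return max(non_ref) if non_ref else None


# Two alleles: a well-supported reference allele and a poorly supported
# alternative allele.
alleles = [(True, 200.0), (False, 35.0)]
print(variant_qual_previous(alleles))  # 200.0, driven by the reference allele
print(variant_qual_current(alleles))   # 35.0, reflects the actual variant
```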

Advanced notice

  • SOLiD colorspace data support, including import, will be retired and will not be available in the next major release of the software.
  • Complete Genomics support, including import, will be retired and will not be available in the next major release of the software.
  • Roche 454 NGS import is now a legacy tool. We plan to retain it in the next major release of the software, but it may be retired in a future release.

If you are concerned about the proposed changes, please contact our Support team ([email protected]).



CLC Genomics Workbench 11.0.0

Release date: 2017-11-21

Improvements and new features

  • Trim Reads:
    • The Trim Sequences tool under the NGS Core Tools section of the Toolbox has been renamed to Trim Reads.
    • A new option has been added to the Trim Reads tool: "Automatic read-through adapter trimming". This option automatically identifies the overlap in paired reads and trims the region that is not part of that overlap. It is turned on by default. This new default affects workflows that include Trim Reads (formerly Trim Sequences): in such workflows, the parameter will be turned on and locked by default.
  • Trimming adapter:
    • The New Trim Adapter List dialog has been updated to a new and more user-friendly interface. 
    • It is now possible to reverse complement an adapter sequence with a "Reverse Complement" button to the right of the sequence field. 
    • It is now possible to specify whether the trim should be performed on all reads, or only on the first or second read of a pair. 
    • A visual shows the adapter and the sequence being trimmed in relation to the rest of the sequence depending on the option chosen when an adapter is found.
  • Fastq Export:
    • Paired sequence lists can now be exported to 2 fastq formatted files, one file containing the first member of each pair, the other containing the second member. This is now the default for Fastq Export when exporting paired data. 
    • The option "Output as single file" is now disabled by default.
    • The new default setting "Export paired sequence lists to two files" means that existing workflows that include a fastq export step will be in a state of conflict after they are updated for use with this release. This is because the option is not compatible with "Output as single file", which was turned on by default in earlier versions. Affected workflows must be edited to deselect either "Export paired sequence lists to two files" or "Output as single file". A message about this is shown when affected workflows are upgraded.
  • RNA-Seq Analysis:
    • RPKM is now always calculated when running the RNA-Seq Analysis tool with the options "Genome annotated with genes only" and "One reference sequence per transcript".
    • The default for the reference type parameter is now "Genome annotated with genes and transcripts". 
    • In the RNA-Seq Analysis tool, the option "Calculate RPKM for genes without transcripts" has been renamed to "Calculate expression for genes without transcripts".
    • The behavior of the RNA-Seq Analysis tool has been changed when the option "Genome annotated with genes and transcripts" is used together with the option "Calculate expression for genes without transcripts":
      • The counts of genes without transcripts are calculated. Previously only the TPM and RPKM were calculated.
      • For a gene without a corresponding transcript, where that gene is overlapped by the intron of another gene, reads aligning to this region are counted towards the expression of the gene without the transcript. Previously such reads were counted as belonging to the intronic region of the overlapping gene.
      • A single-exon transcript for each gene without transcripts is now added to the output TE track.
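
The idea behind the "Automatic read-through adapter trimming" option listed under Trim Reads above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm used by the tool: it assumes forward-reverse paired reads, requires an exact match (a real implementation would need to tolerate sequencing errors), and the function names and the `min_overlap` parameter are hypothetical. When the sequenced fragment is shorter than the read length, both reads run through the fragment into the adapter, so the start of read 1 equals the end of the reverse complement of read 2. Everything outside that shared region is trimmed.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_read_through(read1, read2, min_overlap=10):
    """Trim adapter read-through from a forward-reverse read pair.

    Searches for the longest length L >= min_overlap such that the first
    L bases of read1 equal the last L bases of the reverse complement of
    read2. If found, both reads are cut down to L bases; otherwise the
    reads are returned unchanged.
    """
    rc2 = reverse_complement(read2)
    max_len = min(len(read1), len(read2))
    for length in range(max_len, min_overlap - 1, -1):
        if read1[:length] == rc2[len(rc2) - length:]:
            return read1[:length], read2[:length]
    return read1, read2


# A 14 bp fragment sequenced with 21 bp reads: each read ends in 7 adapter
# bases (the adapter sequences here are made up for the example).
fragment = "ACGTTGCAAGGCTA"
r1 = fragment + "AGATCGG"
r2 = reverse_complement(fragment) + "CTGTCTC"
trimmed1, trimmed2 = trim_read_through(r1, r2)
print(trimmed1 == fragment, trimmed2 == reverse_complement(fragment))
```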

 

  • Workbenches without a license can now be run in Viewing Mode. In this mode, data can be viewed, imported and exported. Plugins needed for viewing certain data types can be installed. Viewing mode, with its added functionalities, replaces Limited Mode.
  • A dialog is now presented on startup if there are installed workflows that need to be updated before they can be run. The information about what to do when a workflow needs to be updated has been improved.
  • The history of a data element can now be exported as a CSV format file.
  • The Extract Consensus Sequence tool can now be connected in a workflow to many more tools that take nucleotide data as inputs, including the Map Reads to Reference and Map Reads to Contigs tools. 
  • An option to include reads that partially overlap variants has been added to the Identify Known Mutations from Sample Mappings tool, enabling detection of variants that are longer than the reads.
  • The Identify Known Mutations from Sample Mappings tool has been made slightly more strict when handling insertions and replacements, requiring reads to overlap adjacent reference positions to be counted as fully covering the variant.
  • The speed of the Illumina High-Throughput Sequencing Import has been substantially improved. The largest gains are seen on paired read files compressed by gzip with speed improvements of up to 30%.
  • Changed amino acid colors to better suit users with various forms of color blindness.
  • The Download Pfam Database tool now downloads version 31. Updates can now be made independently of the release of CLC Genomics Workbench, so the version available for download could change over time from the one recorded here.
  • In table views, it is now possible to filter columns with the filters "Is in list" and "Is not in list" when the values are numbers.
  • When exporting files to SAM or BAM format files, information is now entered into the optional fields NM (edit distance) and MD (mismatch string).
  • The filter terms for the Identify Candidate Variants tool now include the numeric operators '>=', '<=', 'abs value >=' and 'abs value <='.
  • Importing a GO annotation file with the Standard Import tool, specifying the format "Generic annotation file for expression data", now fails with an informative warning if any of the GO annotations are truncated.
  • Warnings are now reported if truncated GO annotations are found when opening data created by the Create Expression Browser tool.
  • The 'Expression Browser Table' (output from the Create Expression Browser tool) now preserves sorting when the grouping is changed, provided the table is not sorted on any of the grouped columns.
  • NCBI BLAST executables have been upgraded to version 2.6.0.
  • All wizard steps are now shown in the wizard sidebar when starting a tool or workflow. 
  • Visualization of features that wrap around the origin of circular sequences has been improved for sequences and tracks.
  • Table filtering and search now interpret thousands and decimal separators in the same manner as the displayed table. Previously US punctuation was always used. This change means that if a table displays numbers in the form "123.456,7" then it is possible to find numbers less than ten by searching for "< 10,0" or "<10", but not "<10.0". If the table displays numbers in the form "123,456.7" then it is possible to find numbers less than ten by searching for "<10.0" and "<10", but not "<10,0".
  • When a tool is disabled in a right-click context menu, hovering the mouse over the tool name will now reveal why a tool was disabled in most cases.
  • The help window can now be closed by pressing the escape key.
  • The Download Reference Genome Data tool now downloads genome annotations from GFF3 files instead of GTF files, as was done previously. Genome annotations for Homo sapiens versions hg18 and hg19 are still downloaded as GTF files, as these are not available as GFF3 files.
  • HTML formatting tags are now removed during export of data to Excel .xlsx or .xls format. This change does not affect the export of hyperlinks.
  • The history information for data generated using the Identify Candidate Variants tool now includes a match criteria field, recording whether the option 'match all' or 'match any' was used.
  • For Reads tracks, the side panel option "Highlight reverse paired reads" is now enabled by default. 
  • For stand-alone read mappings, read pairs with reverse orientation are now highlighted with a lighter blue color. This is identical to the 'Highlight reverse paired reads' option for reads tracks.
  • Parameters for the Trim Sequences tool are now shown in the same order when running the tool from the Toolbox or within a workflow.
  • The column headings in the table containing statistics for each mapping, optionally produced by the Create Detailed Mapping Report tool, have been made more descriptive.
  • The Search for Reads in SRA tool now reports the number of rows being displayed in the top left corner.
  • Communication of error messages from the NCBI when running the Search for Reads in SRA tool has been improved.
  • Map Reads to Reference now outputs an empty read mapping and report when the input contains 0 reads.
  • A warning message is now presented when the Extract Sequences tool is run with the "Extract to single sequences" option selected and more than 100 sequences would result.
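
The separator-aware table filtering described above (the "123.456,7" versus "123,456.7" example) can be illustrated with a short Python sketch. It is an assumption-laden example rather than the Workbench's parser: the function name `parse_number` is hypothetical, and the strict rule that thousands groups must contain exactly three digits is chosen here only because it reproduces the behaviour given in the release note.

```python
import re


def parse_number(text, decimal_sep, group_sep):
    """Parse a number written with the given decimal and thousands separators.

    Returns a float, or None if the text is not a valid number in that
    notation (for example "10.0" when "." is the thousands separator).
    """
    pattern = (
        r"[+-]?"
        # Either digits grouped in threes by the thousands separator ...
        r"(?:\d{1,3}(?:" + re.escape(group_sep) + r"\d{3})+"
        # ... or plain digits without any grouping.
        r"|\d+)"
        # Optional fractional part after the decimal separator.
        r"(?:" + re.escape(decimal_sep) + r"\d+)?"
    )
    text = text.strip()
    if not re.fullmatch(pattern, text):
        return None
    return float(text.replace(group_sep, "").replace(decimal_sep, "."))


# Table displayed as "123.456,7": comma is the decimal separator.
print(parse_number("123.456,7", ",", "."))  # 123456.7
print(parse_number("10,0", ",", "."))       # 10.0
print(parse_number("10.0", ",", "."))       # None, not valid in this notation

# Table displayed as "123,456.7": period is the decimal separator.
print(parse_number("10.0", ".", ","))       # 10.0
print(parse_number("10,0", ".", ","))       # None
```

A filter such as "< 10,0" would then compare each cell value against the parsed threshold and reject the query when parsing returns None.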

Changes

  • The Roche 454 and SOLiD Import tools have been moved to the Legacy folder of the Workbench Toolbox.
  • The option "Search on both strands" has been removed from the Trim Reads tool (formerly named Trim Sequences) and the Extract and Count tool.
  • The Search for Sequences at NCBI tool now uses accession.version identifiers instead of GI numbers, as GI numbers are being phased out by NCBI (see https://ncbiinsights.ncbi.nlm.nih.gov/2016/07/15/ncbi-is-phasing-out-sequence-gis-heres-what-you-need-to-know/).
  • The Create Mapping Graph tool has been modified so that the coverage of overlapping paired end reads is now only counted as one in the overlapping region, instead of two as done previously. 
  • Removed the line "Total consensus length" from Detailed Mapping Report when using a Read Mapping Track as input, as these tracks do not contain consensus information.
  • Clicking "Select genes in other views" in a Volcano Plot with an empty selection no longer gives an error message.
  • The SAM and BAM Mapping Files importer now fails if there are reads with more than one primary alignment where both are marked as being the first in a pair or both are marked as being second in a pair.
  • Scrolling in a table now scrolls a fixed number of pixels, and not a fixed number of rows or columns.
  • The Extract Consensus Sequence tool can no longer process protein BLAST results.
  • The "Adapter trimming" section of the Workbench Preferences has been removed. This section supported functionality that was already retired.
  • The "Help" and "Reset" buttons in pop-up dialogs are now buttons with text labels. They were previously buttons with icons.
  • The GCG sequence exporter has been removed. The GCG alignment exporter is unaffected by these changes.
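
The change to the Create Mapping Graph tool listed above, where the overlapping region of a read pair contributes one to coverage rather than two, can be sketched in Python as follows. The function names and the half-open `(start, end)` interval representation are assumptions made for this example; they are not taken from the tool.

```python
def coverage_per_read(ref_length, pairs):
    """Previous behaviour: every read counts, so the region where the two
    reads of a pair overlap gets a coverage of two from that pair."""
    coverage = [0] * ref_length
    for pair in pairs:
        for start, end in pair:
            for pos in range(max(start, 0), min(end, ref_length)):
                coverage[pos] += 1
    return coverage


def coverage_per_pair(ref_length, pairs):
    """Current behaviour: each pair counts at most once per position."""
    coverage = [0] * ref_length
    for pair in pairs:
        covered = set()
        for start, end in pair:
            covered.update(range(max(start, 0), min(end, ref_length)))
        for pos in covered:
            coverage[pos] += 1
    return coverage


# One pair on a 10 bp reference: the reads overlap at positions 4 and 5.
pairs = [((0, 6), (4, 10))]
print(coverage_per_read(10, pairs))  # [1, 1, 1, 1, 2, 2, 1, 1, 1, 1]
print(coverage_per_pair(10, pairs))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```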

Bug fixes

  • Fixed an issue with the Create Statistics for Target Region tool where "GC %" was reported as a ratio. It is now reported as a percentage.
  • Fixed an issue where paired distances were calculated incorrectly for paired reads in Forward-Reverse orientation where there is adapter read-through. Paired distances can be seen in the report from the Map Reads to Reference tool and the RNA-Seq Analysis tool. The paired distance calculation is also used by the "auto-detect paired distances" option in these tools, although this issue is unlikely to affect the inferred distances.
  • Fixed an issue with the Amino Acid Changes tool when used with a circular sequence with a CDS annotation placed across the origin. Variants outside such a wrapped annotation could previously be incorrectly annotated with coding region changes.
  • Fixed an issue with the Amino Acid Changes tool when used with a circular sequence with an intron across the origin. Previously, nearby variants were not annotated with coding region changes. Now, variants in such introns and that are within 2 nucleotides of the nearest exon will be annotated with coding region changes, if such changes are identified.
  • Fixed a bug where the Amino Acid Changes tool would in some cases use the CDS reference instead of the RNA reference for annotating coding region changes. This would happen if the RNA and CDS annotations could not be matched, and it could cause variants in UTR regions to not be reported. The matching has now been improved by supporting the 'parent' field used by the GFF3 file format to pair CDS and RNA references.
  • Fixed a bug in the RNA-Seq Analysis tool where, when run in "Genes and transcripts" mode, and using "Total counts" as Expression value, the expression values reported for GE tracks would not include shared exon counts. Downstream analyses based on the Set Up Experiment tool could be affected by this issue. Using affected GE tracks as input to the following tools would *not* affect their results: Differential Expression for RNA-Seq, Create Heat Map for RNA-Seq and PCA for RNA-Seq.
  • Fixed an issue where the option to run the Differential Expression for RNA-Seq tool in batch mode was made available, leading to an error if it was selected.
  • Fixed an issue where it was possible to start the Create Heat Map for RNA-Seq tool with invalid parameters that would cause the tool to fail.
  • Fixed an issue where the number of input samples to the Map Reads to Reference and Map Reads to Contigs tools was silently limited to 120. Execution is now aborted with a warning message instead. Each analysis can be started with at most 120 samples.
  • Fixed an issue with the read mapper used by tools that involve a mapping stage, such as Map Reads to Reference, Map Reads to Contigs and RNA-Seq Analysis, where the length and similarity fraction cut-offs were in some cases ignored for reads longer than 500 bp.
  • Fixed an issue with the InDels and Structural Variants tool that caused it to crash if it encountered a particular set of conditions relating to reads with deletions.
  • Fixed an issue with the InDels and Structural Variants tool where duplicate breakpoints and variants were reported if reads mapping as broken pairs were included in the analysis.
  • Fixed an issue where filtering a log for a job that was still running would result in error dialogs.
  • Fixed an issue that had previously prevented configuration of the export option "Output as single file" in workflows.
  • Fixed an issue where data exported with gzip or zip compression did not have the .gz or .zip suffix appended to the filename when earlier exports had been made with the same name and export location specified.
  • Fixed an issue so that it is now possible to export in BAM format reads that contain synonyms, for instance 'X' as a synonym for 'N'.
  • Fixed a bug that caused the fasta exporter to fail when exporting read mappings where one or more reference sequences have no reads mapped to them.
  • Fixed an issue that could cause exports of reports with line graphs to fail. 
  • Fixed an issue where resetting the default parameter values when configuring the Identify Candidate Variants tool did not work.
  • Fixed an issue that would prevent the Trim Sequences tool being run with certain length filter settings.
  • Fixed an issue where the option to "Highlight reverse paired reads" in the side panel of a reads track would cause paired end reads to be colored incorrectly if the reads completely overlapped, as would happen in the case of adapter read-through.
  • Fixed a bug where a cell containing multiple hyperlinked URLs caused export to Excel 2010 or Excel 97-2007 format to fail. Such cell contents are now written in plain text.
  • Contigs with Gap annotations covering regions longer than 10 bp can now be successfully exported to AGP format. Sequences containing such gaps will be split into separate contigs on export. This issue will be particularly of interest to those using the Join Contigs tool of the CLC Genome Finishing Module.
  • Fixed an issue where the Low Frequency Variant Detection tool could return NaN for the Probability value in rare instances for small datasets.
  • Improved performance for several tools when handling genomes with many chromosomes. Examples include Annotate with Overlap Information, the BED Exporter, Filter Annotations On Name, and Motif Search.

Plugin Notes

  • Licenses for commercial modules are no longer required to install a module on a Workbench nor to view data generated by tools of a commercial module.
  • The flexibility associated with network module licenses has been improved. Workbench module licenses provided via a CLC License Server are now loaded only when a tool provided by that module is first launched. Such licenses are returned when 4 hours have elapsed since a tool from that module was last launched from that Workbench.

Advanced notice

  • SOLiD colorspace data support, including import, will be retired and will not be available in the next major release of the software.
  • Roche 454 NGS import has been moved to the Legacy Tools folder and will be removed in a future release, but will still be available in the next major release of the software.
  • If you are concerned about the proposed changes, please contact our Support team ([email protected]).



    © QIAGEN 2017. All rights reserved.