
biowdl-input-converter converts human-readable samplesheets into a format that can be easily processed by BioWDL pipelines.

For more information on BioWDL check out the documentation on


  • Create a new virtualenv
  • Run pip install biowdl-input-converter inside the virtualenv


Parse samplesheets for BioWDL pipelines.

usage: biowdl-input-converter [-h] [-o OUTPUT] [--validate] [--old]
                              [--skip-file-check] [--check-file-md5sums]
                              samplesheet

Positional Arguments

samplesheet The input samplesheet. Format will be automatically detected.

Named Arguments

-o, --output  The output file to which the JSON is written.

Default: stdout

--validate  Do not generate output; only validate the samplesheet.

Default: False

--old  Output old-style JSON as used in version 1 of the BioWDL germline-DNA and RNA-seq pipelines.

Default: False

--skip-file-check  Skip checking whether the files listed in the samplesheet are present.

Default: False

--check-file-md5sums  Perform an md5sum check on reads that have md5sums listed in the samplesheet.

Default: False
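The two file checks controlled by --skip-file-check and --check-file-md5sums can be illustrated with a minimal Python sketch using only the standard library. It is not the converter's own code: the helpers file_md5 and check_read_file are hypothetical names, and their check_presence and check_md5 parameters only loosely mirror the two flags.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in chunks so large fastq files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_read_file(path: str, expected_md5: Optional[str] = None,
                    check_presence: bool = True, check_md5: bool = False) -> None:
    """Hypothetical helper: raise if a read file is missing or its md5sum differs."""
    file_path = Path(path)
    if check_presence and not file_path.exists():
        raise FileNotFoundError(f"{path} does not exist")
    # Only reads that have an md5sum in the samplesheet are checked.
    if check_md5 and expected_md5 is not None:
        actual = file_md5(file_path)
        if actual != expected_md5:
            raise ValueError(
                f"md5sum mismatch for {path}: expected {expected_md5}, got {actual}")
```

Reading the file in fixed-size chunks keeps memory use constant, which matters for multi-gigabyte fastq files.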


A samplesheet provides information about fastq files. It contains the following fields:

  • Sample name
  • Library name (for each sample usually one library is used to prepare the sample for sequencing)
  • Readgroup name (which lane on the sequencer was used)
  • Location of the fastq file containing forward reads (R1) on the filesystem
  • Forward reads fastq (R1) md5sum
  • Location of the fastq file containing reverse reads (R2) on the filesystem
  • Reverse reads fastq (R2) md5sum
  • Additional properties (if necessary)
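The fields above describe a sample, library and readgroup hierarchy. The dataclasses below are a hypothetical Python model of that hierarchy for illustration only; they are not the converter's internal classes. Optional fields default to None, and extra sample-level properties are kept in a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReadGroup:
    id: str
    R1: str                       # path to the forward reads fastq
    R1_md5: Optional[str] = None  # optional md5sum of R1
    R2: Optional[str] = None      # optional reverse reads fastq
    R2_md5: Optional[str] = None  # optional md5sum of R2


@dataclass
class Library:
    id: str
    readgroups: List[ReadGroup] = field(default_factory=list)


@dataclass
class Sample:
    id: str
    libraries: List[Library] = field(default_factory=list)
    # Additional sample-level properties, e.g. {"HiSeq4000": "yes"}
    properties: Dict[str, Optional[str]] = field(default_factory=dict)


# Sample s1 from the examples in this document, without md5sums.
sample = Sample(
    id="s1",
    libraries=[Library(id="lib1", readgroups=[
        ReadGroup(id="rg1", R1="r1_1.fq", R2="r1_2.fq")])],
    properties={"HiSeq4000": "yes"},
)
```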

CSV/TSV Format

A samplesheet can be a comma- or tab-delimited file. An example looks like this


The md5sum fields and the R2 field are optional and can be empty:


The R1_md5, R2 and R2_md5 columns are optional and can be left out entirely.
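A minimal sketch of reading such a file with Python's standard csv module, assuming the caller supplies the delimiter (the converter detects the format automatically; that detection is not reproduced here). The function read_samplesheet_rows is a hypothetical name. Empty cells and optional columns that are left out entirely both end up as None.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional

REQUIRED = ("sample", "library", "readgroup", "R1")
OPTIONAL = ("R1_md5", "R2", "R2_md5")


def read_samplesheet_rows(lines: Iterable[str],
                          delimiter: str = ",") -> List[Dict[str, Optional[str]]]:
    """Hypothetical reader: one dict per line, empty or absent optional fields as None."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for row in reader:
        # Empty cells become None rather than "".
        cleaned = {key: (value if value else None) for key, value in row.items()}
        # Optional columns that were left out entirely are also treated as None.
        for col in OPTIONAL:
            cleaned.setdefault(col, None)
        rows.append(cleaned)
    return rows


example = "sample,library,readgroup,R1,R2\ns1,lib1,rg1,r1_1.fq,\n"
rows = read_samplesheet_rows(io.StringIO(example))
# rows[0]["R2"] is None (empty cell) and rows[0]["R1_md5"] is None (absent column)
```

Pass delimiter="\t" for a tab-delimited samplesheet.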


Additional properties at the sample level can be set using additional columns:


Additional properties for a sample only have to be defined on one line. This avoids a lot of duplication for samples with many readgroups or libraries and makes the file easier to read.


If an additional column contains two conflicting values for the same sample, an error is thrown.
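The two rules above (a property needs to be filled in on only one line per sample, and conflicting values are an error) can be sketched in Python. This is a hypothetical illustration, not the converter's code: it works on rows shaped like those a CSV reader would produce (one dict per line, empty cells as None), and the set of standard column names is taken from the table in this document.

```python
from typing import Dict, List, Optional

STANDARD_COLUMNS = {"sample", "library", "readgroup", "R1", "R1_md5", "R2", "R2_md5"}


def collect_sample_properties(
        rows: List[Dict[str, Optional[str]]]) -> Dict[str, Dict[str, Optional[str]]]:
    """Hypothetical helper: gather extra columns per sample, rejecting conflicts."""
    properties: Dict[str, Dict[str, Optional[str]]] = {}
    for row in rows:
        sample_props = properties.setdefault(row["sample"], {})
        for column, value in row.items():
            if column in STANDARD_COLUMNS:
                continue
            previous = sample_props.get(column)
            if value is None:
                # An empty cell never overrides a value given on another line.
                sample_props.setdefault(column, None)
            elif previous is None:
                sample_props[column] = value
            elif previous != value:
                raise ValueError(
                    f"Conflicting values for '{column}' in sample "
                    f"'{row['sample']}': {previous!r} and {value!r}")
    return properties
```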

Creating comma-delimited files

These files can be easily generated using a spreadsheet program (such as Microsoft Excel or LibreOffice Calc).

Create a table:

    sample  library  readgroup  R1       R1_md5                            R2       R2_md5                            HiSeq4000  other_property
    s1      lib1     rg1        r1_1.fq  181a657e3f9c3cde2d3bb14ee7e894a3  r1_2.fq                                    yes        pizza
    s2      lib1     rg1        r2_1.fq                                    r2_2.fq  dc2776dc3a07c4f468455bae1a8ff872  no


Optional fields can be left blank.

Then save the table in CSV or TSV format from your spreadsheet program.
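As an alternative to a spreadsheet program, the example table can also be written from Python with the standard csv module. This is a sketch; write_samplesheet is a hypothetical helper, and empty strings produce the blank optional cells.

```python
import csv

HEADER = ["sample", "library", "readgroup", "R1", "R1_md5", "R2", "R2_md5",
          "HiSeq4000", "other_property"]
ROWS = [
    ["s1", "lib1", "rg1", "r1_1.fq", "181a657e3f9c3cde2d3bb14ee7e894a3",
     "r1_2.fq", "", "yes", "pizza"],
    ["s2", "lib1", "rg1", "r2_1.fq", "", "r2_2.fq",
     "dc2776dc3a07c4f468455bae1a8ff872", "no", ""],
]


def write_samplesheet(path: str, delimiter: str = "\t") -> None:
    """Hypothetical helper: write the example table as TSV (or CSV with delimiter=",")."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=delimiter)
        writer.writerow(HEADER)
        writer.writerows(ROWS)
```

Calling write_samplesheet("samplesheet.tsv") would write a tab-delimited file; passing delimiter="," produces a comma-delimited one instead.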

YAML format

Alternatively, the samplesheet can be written in YAML format:

    - id: s1
        - id: lib1
            - id: rg1
                R1: r1_1.fq
                R1_md5: 181a657e3f9c3cde2d3bb14ee7e894a3
                R2: r1_2.fq
                R2_md5: ebe473b62926dcf6b38548851715820e
    - id: s2
        - id: lib1
            - id: rg1
                R1: r2_1.fq
                R1_md5: 7e79b87d95573b06ff2c5e49508e9dbf
                R2: r2_2.fq
                R2_md5: dc2776dc3a07c4f468455bae1a8ff872

Optional fields can be omitted and extra properties can be added:

    - id: s1
      HiSeq4000: no
        - id: lib1
            - id: rg1
                R1: r1_1.fq
                R1_md5: 181a657e3f9c3cde2d3bb14ee7e894a3
                R2: r1_2.fq
    - id: s2
      HiSeq4000: yes
        - id: lib1
            - id: rg1
                R1: r2_1.fq
                R2: r2_2.fq



  • Bugfix: R1_md5 and R2_md5 columns are no longer required in a CSV file.


  • Make sure only one line of additional properties per sample is needed in a CSV file.
  • Fix a bug where an empty field for an additional property in a csv samplesheet would be defined as "" instead of None.


  • Added documentation and readthedocs page
  • Added changelog and release procedures
  • Added test suite with coverage metrics, enabled CI
  • Added validate flag to allow users to validate files
  • Added command line interface with ability to write to stdout and files
  • Added ability to check files for presence and md5sum checking
  • Added sample group -> old style JSON/YAML conversion
  • Added sample group -> new style JSON/YAML conversion
  • Added yaml -> sample group conversion
  • Reworked csv conversion by @DavyCats to fit the new sample group structure
  • Added sample group structure to enable any-to-any conversions