
TwistFlex program



General information:

The TwistFlex program analyzes DNA flexibility at the twist angle.
The user supplies a sequence, and the analysis is performed in overlapping windows along it. The flexibility is calculated from the following table of twist-angle values, adopted from:
Sarai et al., Biochemistry 28:7842-7849 (1989).

I \ I+1    A      C      G      T
A          7.6    10.9   8.8    12.5
C          14.6   7.2    11.1   8.8
G          8.2    8.9    7.2    10.9
T          25     8.2    14.6   7.6
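
As an illustration only, the table can be held as a lookup keyed by di-nucleotide. This is a minimal sketch, not the TwistFlex source: the names BASES, TWIST_TABLE, TWIST and step_values are hypothetical, and it assumes that the row letter is base I and the column letter is base I+1.

```python
# Twist-angle flexibility values transcribed from the table above.
# Assumption: the row letter is base I and the column letter is base I+1.
BASES = "ACGT"
TWIST_TABLE = [
    [7.6, 10.9, 8.8, 12.5],   # A followed by A, C, G, T
    [14.6, 7.2, 11.1, 8.8],   # C followed by A, C, G, T
    [8.2, 8.9, 7.2, 10.9],    # G followed by A, C, G, T
    [25.0, 8.2, 14.6, 7.6],   # T followed by A, C, G, T
]

# Flatten the table into a dictionary such as {"AA": 7.6, "AC": 10.9, ...}.
TWIST = {
    first + second: TWIST_TABLE[i][j]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
}


def step_values(sequence):
    """Return the table value of every consecutive di-nucleotide step."""
    seq = sequence.upper()
    return [TWIST[seq[i:i + 2]] for i in range(len(seq) - 1)]


print(step_values("ATAC"))  # [12.5, 25.0, 10.9]
```

A sequence of n bases yields n - 1 steps, which is why the smallest meaningful window is 2 bases.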

The TwistFlex program and this web page were written by Neta Ben-Porat, Eitan Zlotorynski and Prof. Batsheva Kerem.

All default values are set according to: Mishmar et al., PNAS 95:8141-8146 (1998) and: Mishmar et al., Am J Hum Genet 64(3):908-910 (1999).

The TwistFlex program is based on the FlexStab program, written by Yael Mandel-Gutfreund. The FlexStab program can be found at: http://bioinfo.md.huji.ac.il/marg/Flexstab/.

We recently published a study in MCB (Zlotorynski et al., Mol Cell Biol 23:7143-7151 (2003)) in which we used the TwistFlex program.


Introduction:

The TwistFlex program calculates flexibility at the twist angle of the DNA.

  • The calculation is made for overlapping windows along a given sequence. Within each window the flexibility is calculated for consecutive di-nucleotide steps, and the average value of all steps in the window is assigned to the first di-nucleotide step.

  • Area: Adjacent overlapping windows whose twist-angle scores exceed a specified threshold are referred to as 'flexibility peaks'. A flexibility area (the number of base pairs within the peak and their twist angle values) is calculated for each flexibility peak.

  • Unified peaks: When the distance between the last base of one peak and the first base of the next peak is smaller than the window size, the two peaks are treated as a single peak, called a unified peak. Data for both types of peaks, with and without unification, are provided to the user.

  • Clusters: A cluster is a group of at least a minimal number of peaks in which the distance between the last base of each peak and the first base of the next peak is smaller than the maximal distance allowed between peaks in a cluster. The user sets both the minimal number of peaks that can define a cluster and the maximal distance allowed between two adjacent peaks in a cluster. Data for both types of clusters, those built from peaks without unification and those built from unified peaks, are provided to the user.
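
The four definitions above can be sketched in a few short functions. This is a hedged illustration, not the TwistFlex implementation: all function names are hypothetical, positions are 0-based window starts, a peak is assumed to run from the start of its first flexible window to the start of its last one, and a gap exactly equal to the maximal cluster distance is assumed to be allowed. The flexibility area is not computed here.

```python
def window_scores(steps, window, leap=1):
    """Average the step values of each window of `window` bases.

    A window of `window` bases holds window - 1 di-nucleotide steps. Each
    average is paired with the index of the window's first step.
    """
    per_window = window - 1
    return [
        (start, sum(steps[start:start + per_window]) / per_window)
        for start in range(0, len(steps) - per_window + 1, leap)
    ]


def find_peaks(scores, threshold, leap=1):
    """Join adjacent windows scoring above threshold into (first, last) peaks."""
    peaks = []
    for start, score in scores:
        if score <= threshold:
            continue
        if peaks and start - peaks[-1][1] == leap:
            peaks[-1] = (peaks[-1][0], start)   # extend the current peak
        else:
            peaks.append((start, start))        # open a new peak
    return peaks


def group_by_gap(peaks, keep_together):
    """Split ordered peaks into groups; keep_together(gap) joins neighbours."""
    groups = []
    for peak in peaks:
        if groups and keep_together(peak[0] - groups[-1][-1][1]):
            groups[-1].append(peak)
        else:
            groups.append([peak])
    return groups


def unify_peaks(peaks, window):
    """Merge peaks whose gap is smaller than the window size."""
    groups = group_by_gap(peaks, lambda gap: gap < window)
    return [(group[0][0], group[-1][1]) for group in groups]


def find_clusters(peaks, min_peaks, max_distance):
    """Keep groups of at least min_peaks peaks with gaps within max_distance."""
    groups = group_by_gap(peaks, lambda gap: gap <= max_distance)
    return [group for group in groups if len(group) >= min_peaks]


# Toy step values (not real twist angles), chosen to give three short peaks.
steps = [5, 5, 20, 20, 5, 5, 20, 20, 5, 5, 5, 5, 5, 20, 20, 5]
scores = window_scores(steps, window=3)
peaks = find_peaks(scores, threshold=10)
unified = unify_peaks(peaks, window=3)
clusters = find_clusters(peaks, min_peaks=2, max_distance=4)
print(peaks)     # [(1, 3), (5, 7), (12, 14)]
print(unified)   # [(1, 7), (12, 14)]
print(clusters)  # [[(1, 3), (5, 7)]]
```

Passing the unified peaks to find_clusters instead of the raw peaks gives the second type of cluster described above.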



Using the program:

    The user should provide the program with values for the following variables:
    Input file format:
    a) One-line format - only the sequence, on a single line, without a header, numbers or spaces.

    b) GCG format - if the sequence is given as a file, the sequence itself should start with two dots (..). If the user wishes to add a header, it should precede the two dots. If the sequence is pasted, no header is allowed. Spaces and numbers between bases are allowed.

    c) FASTA format - if the sequence is given as a file, the first line should start with a "greater than" sign (>). This line holds the header of the sequence. All the lines that follow should contain the sequence itself. If the sequence is pasted, no header is allowed. Spaces and numbers between bases are allowed.

    For more details please read the Sequence format page.
    Window size:
    The number of bases in each window.
    The window size should be ≥ 2, because the flexibility is calculated for di-nucleotide steps.
    Default window value - 100 bp
    Leap:
    The number of bases the window is moved in each step.
    The leap should be a positive integer.
    Default leap value - 1 bp
    Threshold Value:
    The threshold for the twist angle. If the average twist angle of a window is greater than the Threshold Value, the window is defined as a flexible window.
    The Threshold Value can be zero (and then the whole sequence will be defined as flexible) or any positive number.
    Default Threshold Value - 13.7
    Normalization Value:
    This variable enables the user to create a common denominator for comparing the flexibility of sequences with different lengths.
    The Normalization Value should be a positive integer.
    Default Normalization Value - 10,000 bp
    Discrete Display Value:
    The sequence can be divided into segments whose length is the Discrete Display Value. For each segment, the lengths of the flexibility peaks, the number of peaks and the number of clusters are summed and displayed. This makes it easy to plot the results, with the position along the sequence on the X-axis, in units of the Discrete Display Value, and the total length of the flexibility peaks, the number of peaks or the number of clusters in each segment on the Y-axis.
    The Discrete Display Value should be a positive integer.
    Default Discrete Display Value - 100,000 bp
    Minimal number of peaks in a cluster:
    The minimal number of peaks that define a cluster.
    This minimal number should be a positive integer.
    Default minimal value - 3 peaks
    Maximal distance between peaks in a cluster:
    The maximal number of windows between two adjacent peaks in a cluster. If the distance between two adjacent peaks is larger than the given maximal distance, those peaks cannot define a cluster.
    The maximal distance should be a positive integer.
    Default maximal value - 5,000 windows
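
The defaults and allowed ranges listed above can be collected and checked as in the sketch below, which also shows one possible reading of the Normalization Value and the Discrete Display Value. All names here are hypothetical, and two points are assumptions rather than documented behaviour: that normalization rescales the total peak length to a sequence of the Normalization Value's length, and that a peak is counted in the segment containing its first base.

```python
# Default values as listed above; the key names are illustrative only.
DEFAULTS = {
    "window": 100,                    # bases per window
    "leap": 1,                        # bases the window moves in each step
    "threshold": 13.7,                # average twist-angle value
    "normalization": 10_000,          # bp
    "discrete_display": 100_000,      # bp per segment
    "min_peaks_in_cluster": 3,        # peaks
    "max_distance_in_cluster": 5_000, # windows
}

POSITIVE_INTEGERS = ("leap", "normalization", "discrete_display",
                     "min_peaks_in_cluster", "max_distance_in_cluster")


def check_params(params):
    """Return a list of messages for values outside the documented ranges."""
    problems = []
    if params["window"] < 2:
        problems.append("window size must be >= 2")
    if params["threshold"] < 0:
        problems.append("threshold must be zero or positive")
    for name in POSITIVE_INTEGERS:
        value = params[name]
        if not (isinstance(value, int) and value >= 1):
            problems.append(f"{name} must be a positive integer")
    return problems


def normalized_length(total_peak_length, sequence_length, normalization):
    """Assumed rescaling: peak length per `normalization` bp of sequence."""
    return total_peak_length * normalization / sequence_length


def discrete_display(peaks, segment):
    """Count peaks and sum their lengths per segment index.

    `peaks` holds (start, length) pairs with 0-based starts. Assumption: a
    peak is credited to the segment that contains its first base.
    """
    table = {}
    for start, length in peaks:
        count, total = table.get(start // segment, (0, 0))
        table[start // segment] = (count + 1, total + length)
    return table


print(check_params(DEFAULTS))                        # []
print(normalized_length(500, 250_000, 10_000))       # 20.0
print(discrete_display([(50_000, 300), (90_000, 120), (250_000, 80)],
                       DEFAULTS["discrete_display"]))
# {0: (2, 420), 2: (1, 80)}
```

With this reading, 500 bp of peaks in a 250,000 bp sequence and 20 bp of peaks in a 10,000 bp sequence both normalize to 20.0, so the two sequences can be compared directly.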


    Output files:

    There are three output files. The files will be displayed on the screen when the execution of the program ends.

    We no longer support sending links via e-mail.

    - Output file with the suffix .miscel.out. This file contains:
    a) Parameters of the specific analysis.

    b) Total length of the sequence.

    c) Nucleotide and di-nucleotide composition of the sequence.

    d) Total length of the peaks and normalized length of the peaks.

    e) Total length of the unified peaks and normalized length of the unified peaks.

    f) Average and standard deviation of all the twist angles along the sequence.

    - Output file with the suffix .peaks.out. This file contains:
    a) Parameters of the specific analysis.

    b) Total number of peaks.

    c) Discrete display of the number and length of peaks.

    d) Peaks ordered by their location in the sequence.
    For each peak - its location in the sequence, its length, area and nucleotide and di-nucleotide composition.

    e) Peaks ordered by their length.
    For each peak - its location in the sequence, its length and area.

    f) Number of flexibility peaks ordered in groups by their length.

    g) Total nucleotide and di-nucleotide composition of the peaks.

    h) Total length of the peaks.

    i) Average and standard deviation of the lengths of peaks.

    j) Total number of unified peaks.

    k) Discrete display of the number and length of unified peaks.

    l) Unified peaks ordered by their location in the sequence.
    For each peak - its location in the sequence, its length and nucleotide and di-nucleotide composition.

    m) Unified peaks ordered by their length.
    For each peak - its location in the sequence and its length.

    n) Number of unified peaks ordered in groups by their length.

    o) Total nucleotide and di-nucleotide composition of the unified peaks.

    p) Total length of the unified peaks.

    q) Average and standard deviation of the lengths of unified peaks.

    - Output file with the suffix .cluster.out. This file contains:
    a) Parameters of the specific analysis.

    b) Total number of clusters.

    c) Discrete display of the number of clusters.

    d) Clusters ordered by their location in the sequence.
    For each cluster - all the peaks in the cluster.
    For each peak - its location in the sequence, its length and area.

    e) Total number of clusters comprised of unified peaks.

    f) Discrete display of the clusters comprised of unified peaks.

    g) Clusters comprised of unified peaks ordered by their location in the sequence.
    For each cluster - all the peaks in the cluster.
    For each peak - its location in the sequence and its length.


    Notes:

  • The sequence length should exceed the window length.

  • The input file should be in GCG, FASTA or one-line format only. No other types of files are allowed.
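
The three accepted formats and the length requirement in the notes above can be sketched as follows. This is an assumption-laden illustration rather than the program's own parser: read_sequence and check_length are hypothetical names, the format is passed in explicitly instead of being detected, and anything that is not a letter (spaces, digits, line breaks) is simply dropped from GCG and FASTA input.

```python
def read_sequence(text, fmt):
    """Return the bare, upper-case sequence from text in the given format.

    fmt is one of "one-line", "gcg" or "fasta" (labels chosen for this sketch).
    """
    if fmt == "one-line":
        body = text.strip()
        if not body.isalpha():
            raise ValueError("one-line format allows only the sequence itself")
        return body.upper()
    if fmt == "gcg":
        # A header, if present, precedes the two dots; pasted input has none.
        _header, dots, after = text.partition("..")
        body = after if dots else text
    elif fmt == "fasta":
        lines = text.splitlines()
        if lines and lines[0].startswith(">"):
            lines = lines[1:]          # drop the header line
        body = "".join(lines)
    else:
        raise ValueError("format must be one-line, gcg or fasta")
    # Spaces and numbers between bases are allowed, so keep letters only.
    return "".join(ch for ch in body if ch.isalpha()).upper()


def check_length(sequence, window):
    """Raise if the sequence does not exceed the window length."""
    if len(sequence) <= window:
        raise ValueError("the sequence length should exceed the window length")
    return sequence


print(read_sequence("acgtacgt\n", "one-line"))                 # ACGTACGT
print(read_sequence("my header ..\n1 ACGT ACGT\n9 TT", "gcg"))  # ACGTACGTTT
print(read_sequence(">seq1 test\nACGT 10\nacgt", "fasta"))     # ACGTACGT
```

Calling check_length on the result before the analysis starts catches sequences that are too short for the chosen window.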



Contact us:

    Please send comments, suggestions, bug reports and any other questions to
    kerem@cc.huji.ac.il or to netab@hotmail.com


    Copyright ©, 1997, The Hebrew University of Jerusalem. All Rights Reserved.