Table of contents:
How to fill in the form: Job Name
Algorithm: Type 1 logo
=============================================

How to fill in the form:

Job Name: Input a meaningful name for your job (for example, Ecoli_RBS). If you don't input anything, the program will produce a temporary name automatically.

DNA or RNA: like ACGTCGTAGGAAATC or AGCUCCCUUUAGCAAUC. Sequences can be entered in multi-fasta format or as plain text with one sequence per line.

Calculated from your sequences: the program will calculate the background frequency of every symbol from the sequences you have entered. When the sequences are codons, the background frequency of nucleotides can be calculated separately for each of the three codon positions.

Full length of your shortest sequence: If you have entered sequences of different lengths, the program will create a logo whose number of positions equals the length of your shortest sequence.

Logo type: The type 1 logo is similar to WebLogo, and the type 2 logo is designed to show sequence bias. In the type 2 logo, over-represented symbols are stacked above zero and under-represented symbols below zero. The height of every symbol is proportional to its information content.

Middle height (Pixel): sets the height of the figure. A higher value makes the figure bigger and clearer.

Algorithm
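The formula itself is not shown in this text. One form consistent with the definitions that follow is the relative entropy of each position against the background. This is an assumption, not a confirmed statement of the program's formula, and the base-2 logarithm (heights in bits) is likewise assumed:

```latex
H(i,L) = P(i,L)\,\log_2 \frac{P(i,L)}{P_i}, \qquad H(L) = \sum_i H(i,L)
```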
where L is the position in the sequences; i is a symbol (A, T, C or G for nucleotides, or one of the 20 amino acids for protein sequences); P(i,L) is the average probability of symbol i at position L; and Pi is the background probability of symbol i. H(i,L) is positive when P(i,L) is greater than Pi, and negative when P(i,L) is smaller than Pi. For the type 1 logo, the total height of the symbols at position L equals H(L), and the height of every symbol i is proportional to its observed frequency. For the type 2 logo, the letters in each stack are ordered from the largest H(i,L) to the smallest; letters with a positive H(i,L) are stacked above zero and letters with a negative H(i,L) below zero. The height of symbol i equals H(i,L).
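The stacking rules above can be sketched in a few lines of Python. This is a minimal illustration, not the program's own code: it assumes H(i,L) = P(i,L) * log2(P(i,L) / Pi), a background pooled over all input symbols, and truncation to the shortest sequence. All function names are hypothetical.

```python
import math
from collections import Counter

def column_probs(seqs):
    """Per-position symbol probabilities, truncated to the shortest sequence."""
    n_pos = min(len(s) for s in seqs)
    cols = []
    for pos in range(n_pos):
        counts = Counter(s[pos] for s in seqs)
        cols.append({sym: c / len(seqs) for sym, c in counts.items()})
    return cols

def background(seqs):
    """Background frequency of every symbol, pooled over all input sequences."""
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    return {sym: c / total for sym, c in counts.items()}

def symbol_info(p, bg):
    """H(i,L) for each observed symbol (assumed form: p * log2(p / background)).

    Symbols absent from the position are skipped; their term would be zero.
    """
    return {sym: p_i * math.log2(p_i / bg[sym]) for sym, p_i in p.items()}

def type1_stack(p, bg):
    """Type 1: total height H(L), shared in proportion to observed frequency."""
    total = sum(symbol_info(p, bg).values())
    return {sym: total * p_i for sym, p_i in p.items()}

def type2_stack(p, bg):
    """Type 2: symbols ordered from the largest H(i,L) to the smallest.

    Positive values are drawn above zero, negative values below.
    """
    return sorted(symbol_info(p, bg).items(), key=lambda kv: kv[1], reverse=True)

seqs = ["ACGT", "ACGA", "ACTTG", "AGGT"]
cols = column_probs(seqs)   # 4 positions: the shortest input has length 4
bg = background(seqs)
for pos, p in enumerate(cols):
    print(pos, type1_stack(p, bg), type2_stack(p, bg))
```

In this toy input, position 1 holds C three times and G once, so C is over-represented relative to the pooled background and lands above zero in the type 2 stack, while G lands below.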
For every symbol at every position of the sequences, a chi-square test or Fisher's exact test is used to evaluate the statistical significance of the difference between the frequency of that symbol at the position and the background (expected) frequency of that symbol. The program chooses the chi-square test for large samples (number of sequences > 400, observed or expected number of the symbol in each position > 20) and Fisher's exact test for small samples (all other conditions). The difference is considered significant if the P-value is less than a threshold (the default value is 0.05, which can be modified by the user).
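The test selection described above might look like the following sketch, written with the Python standard library only. Several points are assumptions rather than details given in the text: the comparison is laid out as a 2x2 table (symbol vs. other symbols, position vs. background pool), the count condition is read as both the observed and the expected count exceeding 20, the chi-square test uses no continuity correction, and all function names are hypothetical.

```python
import math

def chi_square_p(a, b, c, d):
    """P-value of the 2x2 chi-square test (1 degree of freedom, no correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:          # an empty row or column: no evidence of a difference
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact test on a 2x2 table.

    Sums the probabilities of all tables with the same margins that are
    no more probable than the observed one.
    """
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

def choose_test(n_seqs, observed, expected):
    """Chi-square for large samples, Fisher's exact test otherwise (assumed reading)."""
    large = n_seqs > 400 and observed > 20 and expected > 20
    return "chi-square" if large else "fisher"

def symbol_p_value(observed, n_seqs, bg_count, bg_total):
    """P-value for one symbol at one position against the background pool."""
    expected = n_seqs * bg_count / bg_total
    table = (observed, n_seqs - observed, bg_count, bg_total - bg_count)
    if choose_test(n_seqs, observed, expected) == "chi-square":
        return chi_square_p(*table)
    return fisher_exact_p(*table)

# A symbol seen 9 times in 12 sequences, against 30 of 120 background symbols.
p = symbol_p_value(observed=9, n_seqs=12, bg_count=30, bg_total=120)
print("significant" if p < 0.05 else "not significant", p)
```

The 0.05 cut-off in the last line mirrors the default threshold; a user-supplied threshold would replace it.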