gp_mkmtx - calculate frequencies of nucleotides


gp_mkmtx [-a] [-g value] [-l] [-q] [-v] [-d] [-h] [inputfile] [outputfile]


-a
Print only the absolute numbers of occurrences.

-g value
Divide each frequency by the expected frequency at a GC content of value %.

-l
Do not apply logarithmic scaling (by default, gp_mkmtx calculates the logarithm of the frequencies).

-v
Prints the version information.

-d
Prints lots of debugging information.

-h
Shows usage information.

inputfile
File to process; if not given, standard input is used.

outputfile
File to write the data to; if not given, standard output is used.


gp_mkmtx is a tool for easily creating matrices for the gp_matrix program. It takes a set of sequences and calculates the frequency of each nucleotide at every position, starting from the first nucleotide and ending with the last nucleotide of the shortest sequence. For each position, four values are printed in a row, for A, C, G and T/U respectively. Each value is the logarithm of the calculated frequency (logarithmic scaling can be suppressed with the -l option). If the -g option is used, the values are divided, prior to the logarithmic scaling, by the expected frequency at the given GC content (for example, at GC=50%, 0.25 for each nucleotide).
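The calculation described above can be sketched in a few lines of Python. This is only an illustration, not the gp_mkmtx source: the function names expected_frequencies and position_matrix are hypothetical, the natural logarithm is an assumption (the base is not stated here), and returning -inf for a nucleotide that never occurs at a position is also an assumption. The expected frequencies follow the GC=50% example: G and C each get half of the GC fraction, A and T/U each get half of the remainder.

```python
import math

NUCLEOTIDES = "ACGT"  # column order of each row: A, C, G, T/U


def expected_frequencies(gc_percent):
    """Expected frequency of each nucleotide at a GC content given in percent."""
    if not 0 < gc_percent < 100:
        raise ValueError("GC content must be strictly between 0 and 100")
    gc = gc_percent / 100.0
    return {"A": (1 - gc) / 2, "C": gc / 2, "G": gc / 2, "T": (1 - gc) / 2}


def position_matrix(sequences, gc_percent=None, log_scale=True):
    """Return one row [A, C, G, T/U] per position, up to the shortest sequence."""
    length = min(len(s) for s in sequences)
    expected = None if gc_percent is None else expected_frequencies(gc_percent)
    rows = []
    for pos in range(length):
        # Collect the nucleotide found at this position in every sequence;
        # U is counted together with T.
        column = [s[pos].upper().replace("U", "T") for s in sequences]
        row = []
        for nt in NUCLEOTIDES:
            value = column.count(nt) / len(sequences)
            if expected is not None:
                # Normalise by the expected frequency before taking the log.
                value /= expected[nt]
            if log_scale:
                # Assumption: natural log, with -inf for a zero frequency.
                value = math.log(value) if value > 0 else float("-inf")
            row.append(value)
        rows.append(row)
    return rows
```

Calling position_matrix(sequences, gc_percent=50) corresponds roughly to running with -g 50, and passing log_scale=False corresponds to the -l option.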


gp_mkmtx -g 50 somesequence.fasta somesequence.mtx

will produce a matrix file somesequence.mtx which, after some editing, will be directly suitable for the gp_matrix program.


Genpak(1) gp_acc(1) gp_cusage(1) gp_digest(1) gp_dimer(1) gp_findorf(1) gp_gc(1) gp_getseq(1) gp_map(1) gp_matrix(1) gp_pattern(1) gp_primer(1) gp_qs(1) gp_randseq(1) gp_seq2prot(1) gp_slen(1) gp_tm(1) gp_trimer(1)


All Genpak programs complain in the situations where you would complain too, such as when they cannot find a sequence you gave them or when the sequence is not valid.

The Genpak programs do not write over existing files. I have found this feature very useful :-)


I tried to clean up every bug I could find, but I'm sure there are plenty left, so please mail me if you find any.


January Weiner III <>