Chromosome data files are large, so any improvement in compression saves bandwidth and transmission time. A 30% improvement would also help a great deal with storage space, and is particularly welcome for those with slow internet access.
2bit compression uses the fact that the four values (G, C, A, T) can be expressed as the binary numbers 00, 01, 10, and 11, so four bases fit in one byte. Packing the data this way and then running any of the standard compression programs gave results from 19% to 35% better than standard compression alone, in tests with chrY.fa. See the URL below for more details and the GPL-licensed Python program. It may require Python 2.3; it hasn't been tested with anything older.
http://jcomeau.com/genome/#compress
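To illustrate the packing idea, here is a minimal sketch in Python 3. It is not the program at the URL above: the function names `pack_2bit` and `unpack_2bit` are made up for this example, the bit order within each byte is an assumption, and the code mapping simply follows the order the bases are listed in this post. It also assumes the input contains only the letters G, C, A, and T; a real FASTA file also has a header line and line breaks, and may contain other symbols, none of which are handled here.

```python
import zlib

# Assumed mapping, following the order given in the post:
# G, C, A, T -> 00, 01, 10, 11.
CODES = {"G": 0, "C": 1, "A": 2, "T": 3}
BASES = "GCAT"


def pack_2bit(seq):
    """Pack a string of G/C/A/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so the first base always
        # sits in the two highest bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_2bit(data, length):
    """Reverse pack_2bit; `length` is the original number of bases."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    # Drop the padding bases that came from the last, partly filled byte.
    return "".join(bases[:length])


if __name__ == "__main__":
    seq = "GATTACAGATTACA"
    packed = pack_2bit(seq)
    # Second stage: hand the packed bytes to a standard compressor.
    compressed = zlib.compress(packed)
    restored = unpack_2bit(zlib.decompress(compressed), len(seq))
    print(restored == seq)
```

The original length has to be stored alongside the packed bytes, because the last byte may be only partly filled. zlib stands in here for "any of the standard compression programs"; it was picked only because it ships with Python.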
Discussion forums: Better genome data compression