[BiO BB] BLASTing SCO (re: Linux IP, IBM suit, shred algorithm, etc)

Harry Mangalam hjm at tacgi.com
Tue Sep 9 12:13:26 EDT 2003

I've been watching the SCO vs IBM suit with some interest and this piqued my 

Eric Raymond has apparently reworked some old 'shred' code which calculates MD5 
hashes for long (from the molbio perspective) words (~3 lines at a time) and 
then sorts the hashes to identify sections of the Linux source code tree which 
are identical to sections of the SCO-owned System V Unix code base.
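
To make the idea concrete, here is a minimal sketch of that kind of 
hash-and-sort comparison as I understand it from the description above. It is 
not Raymond's code: the function names, the whitespace normalization and the 
dict-of-files input are all assumptions made for illustration.

```python
import hashlib
from itertools import groupby

def normalize(line):
    # Assumption: collapse whitespace so reformatting alone does not hide a match.
    return " ".join(line.split())

def shingle_hashes(text, width=3):
    """Yield (md5_hex, start_line) for every run of `width` non-blank lines."""
    lines = [(n, normalize(l)) for n, l in enumerate(text.splitlines(), start=1)]
    lines = [(n, l) for n, l in lines if l]
    for k in range(len(lines) - width + 1):
        window = lines[k:k + width]
        joined = "\n".join(l for _, l in window)
        yield hashlib.md5(joined.encode("utf-8")).hexdigest(), window[0][0]

def common_shingles(tree_a, tree_b, width=3):
    """tree_a, tree_b: dict of file name -> file text (hypothetical input format).

    Returns a list of ((file_a, line_a), (file_b, line_b)) pairs whose
    `width`-line windows hash to the same MD5 value.
    """
    records = []
    for label, tree in (("A", tree_a), ("B", tree_b)):
        for name, text in tree.items():
            for digest, line in shingle_hashes(text, width):
                records.append((digest, label, name, line))
    records.sort()  # sorting makes identical hashes adjacent
    matches = []
    for _, group in groupby(records, key=lambda r: r[0]):
        group = list(group)
        a_hits = [(f, n) for _, lab, f, n in group if lab == "A"]
        b_hits = [(f, n) for _, lab, f, n in group if lab == "B"]
        matches.extend((a, b) for a in a_hits for b in b_hits)
    return matches
```

Because a window has to hash identically to count, changing a single 
identifier inside the window is enough to hide it from this kind of search.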

This sounds a bit like the initial pass for BLAT, which generates hashes for 
much smaller words and uses the hashes in comparisons.


Could BLAST not be used to identify, faster and much more sensitively, sections 
of code that are similar as well as those that are identical?

It would have to be modified to do an 'all against all' comparison and would 
also have to take into account line numbers and file names, but here's a good 
undergrad programming project for someone, with the possibility of getting some 
good press and creating a tool that will undoubtedly be used again in litigation 
(read: it could be worth real money).
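
As a rough illustration of what such a project might start from, here is a toy 
seed-and-extend search over source-code tokens that reports file names and line 
numbers. It is only a sketch of the general BLAST idea (exact word seeds, then 
ungapped extension with a drop-off limit), not BLAST itself: the tokenizer, the 
+1/-1 scoring and the parameters k, xdrop and min_score are all assumptions.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def tokenize(text):
    """Return a list of (token, line_number) pairs."""
    return [(tok, n)
            for n, line in enumerate(text.splitlines(), start=1)
            for tok in TOKEN.findall(line)]

def build_index(words_by_file, k):
    """Map each k-token word to the (file, offset) positions where it occurs."""
    index = defaultdict(list)
    for name, words in words_by_file.items():
        for i in range(len(words) - k + 1):
            index[tuple(words[i:i + k])].append((name, i))
    return index

def walk(a, b, ia, jb, step, xdrop):
    """Extend in one direction; return (best score, length at best score)."""
    score = best = best_len = n = 0
    while 0 <= ia < len(a) and 0 <= jb < len(b):
        score += 1 if a[ia] == b[jb] else -1
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score >= xdrop:
            break  # score fell too far below the best seen: stop extending
        ia += step
        jb += step
    return best, best_len

def find_similar(query_files, subject_files, k=4, xdrop=3, min_score=8):
    """All query files against all subject files (dict of file name -> text).

    Returns sorted tuples of
    (score, query_file, q_first_line, q_last_line,
     subject_file, s_first_line, s_last_line).
    """
    q = {n: tokenize(t) for n, t in query_files.items()}
    s = {n: tokenize(t) for n, t in subject_files.items()}
    s_words = {n: [t for t, _ in toks] for n, toks in s.items()}
    index = build_index(s_words, k)
    hits = set()
    for qname, qtoks in q.items():
        qw = [t for t, _ in qtoks]
        for i in range(len(qw) - k + 1):
            for sname, j in index.get(tuple(qw[i:i + k]), ()):
                sw = s_words[sname]
                rs, rl = walk(qw, sw, i + k, j + k, +1, xdrop)
                ls, ll = walk(qw, sw, i - 1, j - 1, -1, xdrop)
                score = k + rs + ls
                if score >= min_score:
                    hits.add((score,
                              qname, qtoks[i - ll][1], qtoks[i + k + rl - 1][1],
                              sname, s[sname][j - ll][1], s[sname][j + k + rl - 1][1]))
    return sorted(hits, reverse=True)
```

For example, calling find_similar on two files that contain the same loop with 
the loop variable renamed from i to j should still report the region, since the 
extension step tolerates the mismatched tokens. A real tool would also want 
gapped alignment and some kind of significance estimate.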

Then again, Raymond's shred approach is probably good enough.

Cheers, Harry
