[BiO BB] BLASTing SCO (re: Linux IP, IBM suit, shred algorithm, etc)
Harry Mangalam
hjm at tacgi.com
Tue Sep 9 12:13:26 EDT 2003
I've been watching the SCO vs IBM suit with some interest, and the article
linked below caught my attention.
Eric Raymond has apparently reworked some old 'shred' code that calculates MD5
hashes for long (from the molbio perspective) words (~3 lines at a time) and
then sorts the hashes to identify sections of the Linux source tree that are
identical to sections of the SCO-owned System V Unix code base.
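To make the idea concrete, here is a minimal Python sketch of the shredding
approach as I understand it from the description above. It is not Raymond's
actual code: the function names, the whitespace normalization, the skipping of
blank lines, and the use of a dictionary lookup in place of sorting the hashes
are all my own assumptions.

```python
import hashlib
from collections import defaultdict


def normalize(line):
    # Assumption: collapse runs of whitespace so reindented code still matches.
    return " ".join(line.split())


def shred(name, text, window=3):
    """Yield (md5_hex, file_name, starting_line) for each window of non-blank lines."""
    lines = [(num, normalize(raw)) for num, raw in enumerate(text.splitlines(), 1)]
    lines = [(num, norm) for num, norm in lines if norm]
    for k in range(len(lines) - window + 1):
        chunk = "\n".join(norm for _, norm in lines[k:k + window])
        digest = hashlib.md5(chunk.encode("utf-8")).hexdigest()
        yield digest, name, lines[k][0]


def common_shreds(tree_a, tree_b, window=3):
    """Return sorted ((file_a, line_a), (file_b, line_b)) pairs with identical shreds.

    tree_a and tree_b map file names to file contents.
    """
    index = defaultdict(list)
    for name, text in tree_a.items():
        for digest, fname, line_no in shred(name, text, window):
            index[digest].append((fname, line_no))
    hits = []
    for name, text in tree_b.items():
        for digest, fname, line_no in shred(name, text, window):
            for match in index.get(digest, ()):
                hits.append((match, (fname, line_no)))
    return sorted(hits)
```

Each hit gives the file name and starting line of a 3-line window on both
sides, so runs of consecutive hits mark longer identical sections. Like any
exact-hash scheme, a single changed line inside a window hides that window.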
This sounds a bit like the initial pass for BLAT, which generates hashes for
much smaller words and uses the hashes in comparisons.
http://www.eweek.com/article2/0,4149,1257617,00.asp
Could BLAST not be used to identify, faster and much more sensitively, sections
of code that are similar as well as identical?
It would have to be modified to do an 'all against all' comparison and to keep
track of line numbers and file names, but here's a good undergrad programming
project for someone, with the possibility of getting some good press and
creating a tool that will undoubtedly be used again in litigation
(read: it could be worth real money).
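As a rough illustration of what such a project might start from, here is a
Python sketch of a BLAST-style seed-and-extend comparison over source lines.
This is not BLAST itself, only the general idea: each normalized line is
treated as one "letter", an identical line in two files is a seed, and the seed
is extended along its diagonal with +1 per matching line and -1 per mismatch
until the score falls too far below its best value. The function names, the
scoring, and the `min_score` and `xdrop` parameters are all hypothetical.

```python
from collections import defaultdict


def tokens(text):
    """Return [(line_number, normalized_line)] for the non-blank lines of text."""
    out = []
    for num, raw in enumerate(text.splitlines(), 1):
        norm = " ".join(raw.split())
        if norm:
            out.append((num, norm))
    return out


def extend(a, b, i, j, step, xdrop):
    """Extend from seed (i, j) in direction step; return (best_gain, best_length)."""
    score = best = best_len = n = 0
    i += step
    j += step
    while 0 <= i < len(a) and 0 <= j < len(b) and best - score <= xdrop:
        score += 1 if a[i][1] == b[j][1] else -1
        n += 1
        if score > best:
            best, best_len = score, n
        i += step
        j += step
    return best, best_len


def similar_regions(files, min_score=3, xdrop=2):
    """All-against-all comparison of files (a dict of file name -> contents).

    Returns (score, file_a, first_line_a, last_line_a,
             file_b, first_line_b, last_line_b) tuples, best score first.
    """
    toks = {name: tokens(text) for name, text in files.items()}
    index = defaultdict(list)
    for name, t in toks.items():
        for pos, (_, line) in enumerate(t):
            index[line].append((name, pos))
    # Every identical line shared by two different files is a seed.
    seeds = sorted(
        (fa, fb, i, j)
        for occ in index.values()
        for fa, i in occ
        for fb, j in occ
        if fa < fb
    )
    covered = {}  # (file_a, file_b, diagonal) -> last position already aligned
    hits = []
    for fa, fb, i, j in seeds:
        diag = (fa, fb, i - j)
        if i <= covered.get(diag, -1):
            continue  # this seed lies inside a region already reported or tried
        a, b = toks[fa], toks[fb]
        left, llen = extend(a, b, i, j, -1, xdrop)
        right, rlen = extend(a, b, i, j, +1, xdrop)
        covered[diag] = i + rlen
        score = 1 + left + right
        if score >= min_score:
            hits.append((score, fa, a[i - llen][0], a[i + rlen][0],
                         fb, b[j - llen][0], b[j + rlen][0]))
    return sorted(hits, reverse=True)
```

Unlike exact hashing of 3-line windows, this reports a 5-line region with one
edited line in the middle as a single hit. Very common lines (a lone closing
brace, say) generate a large number of seeds, so a real tool would need to
filter or down-weight them, much as BLAST masks low-complexity sequence.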
Then again, Raymond's shred approach is probably good enough.
Comments?
--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (v&f) - hjm at tacgi.com
<<plain text preferred>>