[Scriptome-users] Merging with scriptome command

Amir Karger akarger at CGR.Harvard.edu
Wed May 24 14:41:17 EDT 2006


> So this is a wonderful web page that was pointed out to me 
> the other day!

Thanks! Merging is one of the most popular things people use the
Scriptome for, since Excel doesn't do it.

(Pardon the long-windedness of my explanations below.)

> I have 2 files that I would like to combine.  File 1 contains:
> chrno,marker,homozygotes,rare,observed,expected,pvalue,af
> 01,rs11206709,714,6,178,169,0.2147,Cases
> 01,rs11206709,746,11,158,162,0.4557,Controls
> 
> File 2 contains counts of inheritance inconsistencies:
> 01,rs10157092,1
> 01,rs1075364,2
> 
> File 1 contains 952,185 lines (excluding the header) and File 2
> contains 7632. The majority of the markers don't have an inconsistency.
> 
> I would like to merge these files by marker so that File 1 gains a
> new column with the count of inheritance inconsistencies. I tried
> using the merge command with $col1=1 but got an empty output file.

A few points.

1. The files for merge tools (in fact, for most of the tools) need to be
tab-separated, not comma-separated. But don't panic! Just use
change_any_separator_to_tab (the first tool on the Change page) to
change the commas to tabs. In fact, the example for that tool already
uses a comma as the separator, so all you need to edit are the filenames.

2. For the actual merge, it sounds like you want to use
merge_lines_based_on_shared_column. It takes each pair of lines from the
two files that share a value in the chosen column and merges them into
a new, single line with more columns. The other merge tools, like
merge_intersection, don't combine lines: they print some lines from one
file and then, after all of those, some lines from the other file. (See
the examples for the different merge tools on the website to clarify
what each one does.)
 
3. You'll need to enter, for each file, the number of the column to
merge on. Here that is the marker column (the second column in both
files), which means
	$col1=1; $col2=1;

4. Once you've merged, Excel should be able to read the tab-delimited
output, as long as it has no more than 65,536 rows (the most Excel can
hold). You can use Excel or other Scriptome tools to change back to
comma-separated format, get rid of some columns, etc.
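For anyone who wants to see the same idea outside the Scriptome, here is a minimal Python sketch of the merge Peggy describes. It is not the Scriptome's merge_lines_based_on_shared_column; it is a stand-alone illustration that assumes both files are comma-separated, that File 1 has a header and File 2 does not (as in the sample lines), and that markers missing from File 2 should get a count of 0. The names merge_counts and merge_files, and the new column name "inconsistencies", are hypothetical.

```python
import csv
import io

def merge_counts(file1_rows, file2_rows, key_col=1, count_col=2):
    """Append File 2's count to every File 1 row with the same marker.

    Columns are 0-based here, so key_col=1 is the marker column in both
    files. Markers missing from File 2 get "0" (an assumption: no listed
    inconsistency is treated as a count of zero).
    """
    # Build a lookup table: marker -> inconsistency count (skip blank rows).
    counts = {row[key_col]: row[count_col] for row in file2_rows if row}
    header, *data = file1_rows
    merged = [header + ["inconsistencies"]]
    for row in data:
        if not row:
            continue
        # Both the Cases and Controls lines for a marker get the same count.
        merged.append(row + [counts.get(row[key_col], "0")])
    return merged

def merge_files(path1, path2, out_path):
    """Read two comma-separated files and write the merged rows tab-separated."""
    with open(path1, newline="") as f1, open(path2, newline="") as f2:
        rows1 = list(csv.reader(f1))
        rows2 = list(csv.reader(f2))
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerows(merge_counts(rows1, rows2))

# Demo on the sample lines from the question (these markers don't overlap).
file1_sample = """chrno,marker,homozygotes,rare,observed,expected,pvalue,af
01,rs11206709,714,6,178,169,0.2147,Cases
01,rs11206709,746,11,158,162,0.4557,Controls
"""
file2_sample = """01,rs10157092,1
01,rs1075364,2
"""
merged = merge_counts(list(csv.reader(io.StringIO(file1_sample))),
                      list(csv.reader(io.StringIO(file2_sample))))
for row in merged:
    print("\t".join(row))
```

A dictionary lookup keeps every File 1 line (a "left join"), which matches the request to add a column to File 1 rather than keep only matching markers. Calling merge_files("file1.csv", "file2.csv", "merged.txt") (hypothetical filenames) on the full data would write more than 952,000 lines, well past the Excel row limit mentioned in point 4, so the result would need filtering before opening it in Excel.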

Please write again if you have more questions, or if things break.

- Amir Karger
Research Computing
Bauer Center for Genomics Research
Harvard University
617-496-0626

> 
> Thanks very much in advance, Peggy White 5/24
> 

