E-MAP Imputation Version 1.1

This zip file contains the following files:

symmetricNN.py : a Python implementation of the Symmetric Nearest Neighbour algorithm described in "Missing Value Imputation for Epistatic MAPs". This is licensed under the Apache License, Version 2.0, details of which are embedded in the file and detailed at the bottom of this document.

symmetricLLS.py : a Python implementation of the Symmetric Local Least Squares algorithm described in "Missing Value Imputation for Epistatic MAPs". This is licensed under the Apache License, Version 2.0, details of which are embedded in the file and detailed at the bottom of this document.

test_imputation.py : a Python script that performs k-fold cross-validation to estimate the accuracy of imputation on a given dataset.

esp.txt : a sample input file - the Early Secretory Pathway dataset, downloaded from http://interactome-cmp.ucsf.edu/. This dataset is from the paper "Exploration of the function and organization of the yeast early secretory pathway through an epistatic mini array profile (E-MAP)" by Schuldiner, M., S. R. Collins, N. J. Thompson, V. Denic, A. Bhamidipati, T. Punna, J. Ihmels, B. Andrews, C. Boone, J. F. Greenblatt, J. S. Weissman and N. J. Krogan, published in Cell, 2005 Nov 4;123(3):507-19.
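
For orientation, the kind of k-fold cross-validation that test_imputation.py is described as performing can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in test_imputation.py: the names kfold_symmetric and mean_impute, the use of None for missing values, the root-mean-square error measure and the mean-fill imputer are all hypothetical stand-ins.

```python
import random


def kfold_symmetric(matrix, k, impute, seed=0):
    """Estimate imputation error by hiding known entries one fold at a time.

    matrix: square list of lists; None marks a missing value (assumption).
    impute: hypothetical callable that takes a matrix and returns a filled copy.
    Returns the root-mean-square error over all hidden entries.
    """
    n = len(matrix)
    # Collect each observed pair once (upper triangle, diagonal excluded).
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if matrix[i][j] is not None]
    random.Random(seed).shuffle(pairs)
    folds = [pairs[f::k] for f in range(k)]

    squared_errors = []
    for fold in folds:
        masked = [row[:] for row in matrix]
        for i, j in fold:
            # Hide both symmetric copies so the true value cannot leak back.
            masked[i][j] = masked[j][i] = None
        filled = impute(masked)
        squared_errors.extend((filled[i][j] - matrix[i][j]) ** 2
                              for i, j in fold)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


def mean_impute(matrix):
    """Placeholder imputer: fill gaps with the mean of the observed values."""
    observed = [v for row in matrix for v in row if v is not None]
    mean = sum(observed) / len(observed)
    return [[mean if v is None else v for v in row] for row in matrix]
```

Hiding both matrix[i][j] and matrix[j][i] matters for symmetric data: if only one copy were masked, an imputer could trivially recover the value from its mirror entry and the error estimate would be too optimistic.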

The programs can be run as follows (shown for symmetricNN.py; symmetricLLS.py takes the same options) :

python symmetricNN.py -i <INPUT_FILENAME> -o <OUTPUT_FILENAME>

with the following optional parameters :

-k --neighbours (number): the number of neighbours to use for imputation, defaults to 50 for NN, 20 for LLS
-u --unweighted: run nearest neighbours without any weighting, has no effect on LLS
-s --separator (character): the separator used in the input file, defaults to tab
-m --missing (string): the string to indicate a missing value, defaults to ""
-h --help: displays the usage details for the program 
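
The options above can be summarised as a parser definition. The sketch below uses argparse purely for illustration; it is an assumption that mirrors the documented flags and defaults, and the scripts themselves may parse their arguments differently. The function name build_parser and the attribute names input and output are hypothetical.

```python
import argparse


def build_parser(default_neighbours=50):
    """Build a parser mirroring the documented options (illustrative only).

    default_neighbours: 50 is the documented default for NN; pass 20 for LLS.
    """
    parser = argparse.ArgumentParser(
        description="Impute missing values in a symmetric interaction matrix")
    parser.add_argument("-i", dest="input", required=True,
                        help="input filename")
    parser.add_argument("-o", dest="output", required=True,
                        help="output filename")
    parser.add_argument("-k", "--neighbours", type=int,
                        default=default_neighbours,
                        help="number of neighbours to use for imputation")
    parser.add_argument("-u", "--unweighted", action="store_true",
                        help="run nearest neighbours without weighting")
    parser.add_argument("-s", "--separator", default="\t",
                        help="separator used in the input file")
    parser.add_argument("-m", "--missing", default="",
                        help="string that indicates a missing value")
    # argparse adds -h/--help automatically.
    return parser


args = build_parser().parse_args(["-i", "esp.txt", "-o", "esp_imputed.txt"])
```

Here esp_imputed.txt is only a placeholder output name.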

The expected input is a symmetric interaction matrix, with the first row and column containing gene names. 

Entries are expected to be separated by tabs, but other separators can be specified using the "-s" parameter.
Missing values are expected to be indicated by the empty string "", but again other strings can be specified using the "-m" parameter.
This is useful for comparison with other imputation implementations that require "999.0" or "NA" to indicate missing values, or that use spaces or commas to separate entries.
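
To make the expected input format concrete, the following sketch reads such a matrix with the Python standard library. It is not taken from symmetricNN.py or symmetricLLS.py; the function names read_emap and is_symmetric are hypothetical, and it assumes that the header row has a (possibly empty) corner cell before the gene names.

```python
import csv
import io


def read_emap(handle, separator="\t", missing=""):
    """Parse a labelled square matrix; missing entries become None."""
    # Skip completely blank lines, e.g. a trailing newline at end of file.
    rows = [row for row in csv.reader(handle, delimiter=separator) if row]
    genes = rows[0][1:]  # first row: corner cell, then gene names
    matrix = []
    for row in rows[1:]:
        # row[0] is the gene name for this row; the rest are scores.
        matrix.append([None if cell == missing else float(cell)
                       for cell in row[1:]])
    return genes, matrix


def is_symmetric(matrix):
    """Check that the matrix is square and equal to its transpose."""
    n = len(matrix)
    return all(len(row) == n for row in matrix) and all(
        matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


# A tiny tab-separated example with empty strings as missing values.
sample = "\tA\tB\tC\nA\t\t1.5\t-0.3\nB\t1.5\t\t\nC\t-0.3\t\t\n"
genes, matrix = read_emap(io.StringIO(sample))
```

A comma-separated file that marks missing values with "NA" could be read with read_emap(handle, separator=",", missing="NA"), which corresponds to passing -s and -m on the command line.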

The NumPy library is required for symmetricLLS.py, which has been tested with NumPy version 1.4.0.
See http://numpy.scipy.org/ for details of how to obtain this library.

Further information and updates to this implementation will be available at : http://www.bioinformatics.org/emapimputation

For any queries please contact colm.ryan@ucd.ie

######################################################

License for symmetricNN.py and symmetricLLS.py :

Copyright 2009 Colm Ryan 

The files symmetricNN.py and symmetricLLS.py are licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. 
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions 
and limitations under the License. 