From p.molenaar at amc.uva.nl Fri Oct 5 05:37:54 2007
From: p.molenaar at amc.uva.nl (piet molenaar)
Date: Fri, 5 Oct 2007 11:37:54 +0200
Subject: [BiO BB] Symposium: Integrative Bioinformatics;
schedule and titles of talks definitive
Message-ID: <554293e20710050237n316bc1aey65421de2c742834b@mail.gmail.com>
One day symposium
8 November 2007 - Academic Medical Center - Amsterdam- Netherlands
Integrative Bioinformatics:
At the cutting edge of network analysis and biological data integration
Every molecular biologist working with high-throughput data is confronted
with the need to reconstruct and validate the underlying complex regulatory
networks. At this one-day symposium, some of the most prominent leaders in
the field will present their approaches and views. The symposium offers a
platform for discussing state-of-the-art biological network analyses.
Confirmed speakers and titles:
- Leroy Hood: Biological networks and disease
- Ruedi Aebersold: Protein-centered networks in systems biology:
Analysis and visualization
- Trey Ideker: Gaining power in gene association studies with
Cytoscape
- Andrew Hopkins: Network Pharmacology: chemical opportunities for
systems biology
- Peter Sorger: Modeling Mammalian Death and Survival Pathways
- Ewan Birney: Reactome, Networks and Genomes
- Rogier Versteeg: Oncogenic networks of cancer pathways in childhood
cancer
- Benno Schwikowski: Computational tools increasing sensitivity and
reliability of mass spec-based proteomics
- Chris Sander: Systems Biology of Cancer Pathways: from Molecular
Perturbations to Cellular Phenotypes
- Jean-Daniel Fekete: Visualizing Dense Networks with Enhanced and
Hybrid Matrices
Please distribute the attached poster (pdf) in your institute.
The symposium is part of the 5th annual Cytoscape Public Symposium and
Developers Retreat.
For registration, Symposium programme and directions go to
www.cytoscape.org/retreat2007
We look forward to welcoming you to Amsterdam!
The Organizing Committee, 5th Cytoscape Retreat 2007
w: www.cytoscape.org/retreat2007
e: cytoretreat at cytoscape.org
Department of Human Genetics - M1-131
Academic Medical Center - University of Amsterdam
Meibergdreef 9 - 1100 DD - Amsterdam - the Netherlands
From jeedward at gmail.com Fri Oct 5 20:24:33 2007
From: jeedward at gmail.com (John Edward)
Date: Fri, 5 Oct 2007 20:24:33 -0400
Subject: [BiO BB] BCBGC-2008 Call for papers
Message-ID: <000301c807af$475d0990$6401a8c0@cisnotebookbp>
Call for papers
The 2008 International Conference on Bioinformatics, Computational Biology,
Genomics and Chemoinformatics (BCBGC-08) (website: www.PromoteResearch.org)
will be held July 7-10, 2008, in Orlando, FL, USA. We invite draft paper
submissions and session proposals.
The conference will be held at the same time and place where several other
major events are taking place. The website contains more details.
Sincerely
John Edward
From christoph.gille at charite.de Tue Oct 9 12:09:19 2007
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Tue, 9 Oct 2007 18:09:19 +0200 (CEST)
Subject: [BiO BB] genbank2swissprot ?
Message-ID: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de>
Is there a mapping of identifiers from GenBank nt sequences
to identifiers of Swiss-Prot (protein)?
Using some tables in
ftp://ftp.ncbi.nih.gov/refseq/
ftp://ftp.ncbi.nih.gov/gene/DATA
there seems to be an indirect way via the proteinkb.
But perhaps there is a more direct way?
Many thanks
Christoph
From boris.steipe at utoronto.ca Tue Oct 9 14:17:49 2007
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Tue, 9 Oct 2007 14:17:49 -0400
Subject: [BiO BB] genbank2swissprot ?
In-Reply-To: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de>
References: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de>
Message-ID: <3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca>
Does the UniProt ID mapping service fit your requirements?
http://www.pir.uniprot.org/search/idmapping.shtml
Boris
From christoph.gille at charite.de Tue Oct 9 16:56:53 2007
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Tue, 9 Oct 2007 22:56:53 +0200 (CEST)
Subject: [BiO BB] genbank2swissprot ?
In-Reply-To: <3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca>
References: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de>
<3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca>
Message-ID: <61125.84.190.65.179.1191963413.squirrel@webmail.charite.de>
Many thanks Boris,
this is quite good, though the table is incomplete.
I think I saw somewhere
an assignment of an mRNA to each Swiss-Prot entry.
Unfortunately I cannot find it any more.
Christoph
From Stan.Gaj at BIGCAT.unimaas.nl Wed Oct 10 07:58:55 2007
From: Stan.Gaj at BIGCAT.unimaas.nl (Gaj Stan (BIGCAT))
Date: Wed, 10 Oct 2007 13:58:55 +0200
Subject: [BiO BB] genbank2swissprot ?
In-Reply-To: <3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca>
References: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de>
<3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca>
Message-ID: <1C6586068DF1054DA3899159C38B32C90B04ABD1@um-mail0136.unimaas.nl>
Hi Christoph,
There are two other possibilities:
a) Use BioMart at www.ensembl.org to retrieve Ensembl gene IDs for your list of RefSeq IDs (you did mention NM_ IDs, so I assume you mean RefSeq IDs) and export this list together with its UniProt cross-links. A problem you will surely encounter with this approach is that in some cases more than one UniProt ID is associated with an Ensembl gene. The generated list contains this information, but on separate lines, so you will need to filter the list for this.
b) The RefSeq group has recently announced that their databases now carry cross-references to UniProt (the two groups collaborated closely on this). I can't find the archive of their Gene-Announce list, but here is the announcement:
=======
Announcing the availability of RefSeq-UniProtKB cross-link data
In collaboration with UniProtKB (http://www.pir.uniprot.org/) , the RefSeq group is now reporting explicit cross-references to Swiss-Prot and TrEMBL proteins that correspond to a RefSeq protein. These correspondences are being calculated by the UniProtKB group, and will be updated every three weeks to correspond to UniProt's release cycle. The data are being made available from several sites within NCBI:
1. The full report from Entrez Gene, in the Reference Sequences section.
For an example, go to the Full Report page for the sevenless gene of Drosophila melanogaster (http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=DetailsSearch&Term=32039%5Buid%5D) and click on the Reference Sequences section in the table of contents on the right. You will see
mRNA and Protein(s)
NM_078559.2 -> NP_511114.2 sevenless CG18085-PA [Drosophila melanogaster]
UniProtKB/Swiss-Prot P13368 <--- new data
2. Links in NCBI's Protein database
Explicit links between corresponding RefSeq and Swiss-Prot proteins are now provided within the NCBI Protein database. These links are available in the 'Links' menu located at the upper right of the protein display page. The link names are:
Protein (RefSeq): provides a link from a Swiss-Prot record to the corresponding RefSeq record
Protein (UniProtKB): provides a link to the equivalent Swiss-Prot record
3. Filter choices in NCBI's Protein database
protein protein refseq2uniprot find RefSeq protein records with a link to a UniProtKB protein in NCBI's protein database
protein protein uniprot2refseq find UniProtKB protein records with a link to a RefSeq protein in NCBI's protein database
4. ftp sites
A new file was added to the gene and refseq ftp sites to report the relationship between NCBI Reference Sequence protein accessions and UniProtKB protein accessions. The new gene_refseq_uniprotkb_collab.gz file specifies the corresponding pairs of NCBI and UniProtKB protein accessions.
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_refseq_uniprotkb_collab.gz
or
ftp://ftp.ncbi.nlm.nih.gov/refseq/uniprotkb/gene_refseq_uniprotkb_collab.gz
The README file on the gene and refseq ftp sites has been updated to document this addition. See:
ftp://ftp.ncbi.nlm.nih.gov/gene/README
ftp://ftp.ncbi.nlm.nih.gov/refseq/README
5. the ASN.1 in Entrez Gene
New implementation of a gene-commentary:
Each cross-reference will be reported in a gene-commentary of type other. Note: more than one cross-reference per RefSeq protein record is possible.
type other,
source {
  {
    src {
      db "UniProtKB/Swiss-Prot",
      tag str "P23760"
    },
    anchor "P23760"
  },
  {
    src {
      db "UniProtKB/TrEMBL",
      tag str "O23760"
    },
    anchor "O23760"
  }
}
====
I haven't tested this one out myself, but I think it might do the trick for you (:
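Filtering the BioMart export from option a) could look roughly like the Python sketch below. It assumes a tab-separated export with one header row and exactly two columns, Ensembl gene ID followed by UniProt ID; that layout and the gene/protein IDs in the example are hypothetical, not a documented BioMart format.

```python
import csv
import io
from collections import defaultdict

def group_uniprot_by_gene(tsv_text):
    """Collapse a two-column Ensembl-gene/UniProt export into one entry per gene.

    Assumes a tab-separated export with a header row; rows with an empty
    UniProt column (a gene without a cross-link) are skipped.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    genes = defaultdict(list)
    for row in reader:
        if len(row) < 2 or not row[1].strip():
            continue
        gene_id, uniprot_id = row[0].strip(), row[1].strip()
        if uniprot_id not in genes[gene_id]:  # drop exact duplicate lines
            genes[gene_id].append(uniprot_id)
    return dict(genes)

# Hypothetical export: ENSG_A maps to two UniProt IDs on separate lines.
example = (
    "Ensembl Gene ID\tUniProt ID\n"
    "ENSG_A\tP11111\n"
    "ENSG_A\tQ22222\n"
    "ENSG_B\tP33333\n"
    "ENSG_C\t\n"
)
mapping = group_uniprot_by_gene(example)
ambiguous = {g: ids for g, ids in mapping.items() if len(ids) > 1}
print(ambiguous)  # {'ENSG_A': ['P11111', 'Q22222']}
```

Genes listed in `ambiguous` are the ones that need a manual decision; the rest map to a single UniProt ID.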
Best wishes,
-- Stan
From dan.bolser at gmail.com Wed Oct 10 05:26:07 2007
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 10 Oct 2007 11:26:07 +0200
Subject: [BiO BB] genbank2swissprot ?
In-Reply-To: <61125.84.190.65.179.1191963413.squirrel@webmail.charite.de>
References: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de>
<3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca>
<61125.84.190.65.179.1191963413.squirrel@webmail.charite.de>
Message-ID: <2c8757af0710100226r68e05c9elc1f03f636415036a@mail.gmail.com>
From a recent 'refseq-announce' email:
*Announcing the availability of RefSeq-UniProtKB cross-link data*
In collaboration with UniProtKB (http://www.pir.uniprot.org/), the RefSeq group is now reporting explicit cross-references to Swiss-Prot and TrEMBL proteins that correspond to a RefSeq protein. These correspondences are being calculated by the UniProtKB group, and will be updated every three weeks to correspond to UniProt's release cycle. The data are being made available from several sites within NCBI:
2. Links in NCBI's Protein database
Explicit links between corresponding RefSeq and Swiss-Prot proteins are now provided within the NCBI Protein database. These links are available in the 'Links' menu located at the upper right of the protein display page. The link names are:
* Protein (RefSeq): provides a link from a Swiss-Prot record to the corresponding RefSeq record
* Protein (UniProtKB): provides a link to the equivalent Swiss-Prot record
4. ftp sites
A new file was added to the gene and refseq ftp sites to report the relationship between NCBI Reference Sequence protein accessions and UniProtKB protein accessions. The new gene_refseq_uniprotkb_collab.gz file specifies the corresponding pairs of NCBI and UniProtKB protein accessions.
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_refseq_uniprotkb_collab.gz
or
ftp://ftp.ncbi.nlm.nih.gov/refseq/uniprotkb/gene_refseq_uniprotkb_collab.gz
The README file on the gene and refseq ftp sites has been updated to document this addition. See:
ftp://ftp.ncbi.nlm.nih.gov/gene/README
ftp://ftp.ncbi.nlm.nih.gov/refseq/README
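Reading that file could be sketched in Python as below. The column layout is an assumption: tab-separated, header lines starting with '#', the NCBI protein accession in the first column and the UniProtKB accession in the second. Check it against the README before relying on it. The sample pair NP_511114.2/P13368 is taken from the sevenless example in the RefSeq announcement.

```python
import gzip
import io
from collections import defaultdict

def load_refseq_to_uniprot(source):
    """Map NCBI protein accessions to sets of UniProtKB accessions.

    `source` is a path or a binary file object for the gzipped file.
    Assumed layout: tab-separated, '#' header lines, column 1 = NCBI
    protein accession, column 2 = UniProtKB accession. A set is used
    because one RefSeq protein may have several cross-references.
    """
    mapping = defaultdict(set)
    with gzip.open(source, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                mapping[fields[0]].add(fields[1])
    return dict(mapping)

# In-memory stand-in for the downloaded file; normally you would pass
# the path, e.g. load_refseq_to_uniprot("gene_refseq_uniprotkb_collab.gz").
sample = gzip.compress(
    b"#NCBI_protein_accession\tUniProtKB_protein_accession\n"
    b"NP_511114.2\tP13368\n"
)
refseq2uniprot = load_refseq_to_uniprot(io.BytesIO(sample))
print(sorted(refseq2uniprot["NP_511114.2"]))  # ['P13368']
```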
HTH,
Dan.
From yanz at email.unc.edu Mon Oct 22 18:38:49 2007
From: yanz at email.unc.edu (Yan Zhang)
Date: Mon, 22 Oct 2007 18:38:49 -0400
Subject: [BiO BB] Call for Participation: NSF Biomedical Informatics
Workshop, Dec.
Message-ID: <20071022183849.mxbzw6t7kk4ksow8@webmail3.isis.unc.edu>
Call for Participation: NSF Biomedical Informatics Workshop, Dec. 4-5,
2007, Portland, OR
We are seeking US-based participants for the upcoming NSF sponsored
workshop on Biomedical Informatics. The core program of the workshop
will feature speakers and panelists who have been invited to join the
workshop. However, we have reserved a few speaker and panel slots for
participants who have not been invited but may qualify
based on current research and interests.
All costs associated with the workshop event will be covered for most
participants selected based on this CFP. If you currently have funding
from a US Government Agency to conduct research on an area related to
the main themes of the workshop, it is highly likely you will be
selected to participate and you will be reimbursed for your cost from
_our_ NSF workshop grant. You will be expected to conduct a panel
presentation or a poster presentation at the workshop. Please contact
Javed Mostafa (jm at unc.edu) if you are interested in participating. In
your email to Javed, do include a paragraph describing your current
research and / or point to current research materials, and describe any
funded projects you are currently engaged in or recently completed. We
will be able to respond to you with an answer fairly soon after we
receive your interest statement.
Below we are providing a brief overview of the workshop. More
information on the workshop can be found at the workshop website:
http://biomedweb.info.
We are planning to explore research challenges and emerging solutions
for handling health data from the point of origin (i.e., data
collection) to the presentation and manipulation stages. We are also
interested in exploring a wide spectrum of health data, starting at
the genomics and cellular level, through the patient level, and ultimately
to the population level. Some specific topics that we will cover
include:
* Data Acquisition: Capturing data, conversion to appropriate
structures/formats, and natural language processing
* Data Standards: Limitations of current standards such as the HL7V3
standard but also potential utility of emerging standards
* Semantic Interoperability: Vocabularies, ontologies, and techniques
for semantic level sharing of data
* Data Management: Challenges associated with the scale, heterogeneity,
and distributed, fragmentary nature of data
* Data Presentation: Visual, adaptive, and optimal presentation of data
for enhancing use and understanding
* Data Services: Emerging applications for supporting research, quality
and safety management, public health studies, etc.
Beyond the areas above, we are also interested in the following
topics: Cyberinfrastructures for health information delivery,
social/economic/political barriers in expanding access to health
information, and privacy/security issues in health information
delivery. We are planning to have panels focusing on these topics at
the workshop.
To encourage further research on improving health care information
systems, NSF has provided funds to Indiana University and University of
North Carolina at Chapel Hill to conduct this workshop. Our aim is to
bring together experts from research, professional, and government
sectors. The workshop will dedicate two days to individual and panel
presentations and birds-of-a-feather events to identify and prioritize
key challenges. Current NSF and NIH grantees, academics with
substantial track records in the area, industry experts, and program
officers in key US agencies have been invited to participate. The
workshop will be held on December 4th and 5th, 2007, in Portland, OR.
Yan Zhang [Submitted on behalf of Dr. Javed Mostafa: jm at unc.edu]
From ngadewal at yahoo.com Tue Oct 23 06:24:49 2007
From: ngadewal at yahoo.com (nikhil gadewal)
Date: Tue, 23 Oct 2007 03:24:49 -0700 (PDT)
Subject: [BiO BB] sequence analysis
Message-ID: <838860.9612.qm@web51504.mail.re2.yahoo.com>
Hello all
I am interested in finding tetramer peptides that match 100% to the carboxyl terminus of human protein sequences.
Is there any specific tool available to do so?
Can BLAST or FASTA handle it by changing the parameters?
Thank you in advance.
Nikhil
NIKHIL S. GADEWAL ACTREC, Tata Memorial Centre, Kharghar, Navi Mumbai, India Great minds discuss ideas; Average minds discuss events; Small minds discuss people.
From marty.gollery at gmail.com Tue Oct 23 19:27:26 2007
From: marty.gollery at gmail.com (Martin Gollery)
Date: Tue, 23 Oct 2007 16:27:26 -0700
Subject: [BiO BB] sequence analysis
In-Reply-To: <838860.9612.qm@web51504.mail.re2.yahoo.com>
References: <838860.9612.qm@web51504.mail.re2.yahoo.com>
Message-ID:
For an exact match you can simply use grep, or open the target in an
editor such as textpad and use the search function.
Marty
--
Martin Gollery
Senior Bioinformatics Scientist
TimeLogic- a Division of Active Motif
775-833-9113
880 Northwood Blvd. Suite 7
Incline Village, NV 89451
From marty.gollery at gmail.com Tue Oct 23 19:54:02 2007
From: marty.gollery at gmail.com (Martin Gollery)
Date: Tue, 23 Oct 2007 16:54:02 -0700
Subject: [BiO BB] sequence analysis
In-Reply-To: <200710231843.55373.kanzure@gmail.com>
References: <838860.9612.qm@web51504.mail.re2.yahoo.com>
<200710231843.55373.kanzure@gmail.com>
Message-ID:
No, the proteins are not too big, only about 20 MB.
Marty
On 10/23/07, Bryan Bishop wrote:
>
> "Of the protein sequences from human." Martin, it sounds like what
> Nikhil wants to do would require much more memory than textpad would be
> allowed to allocate. Right?
>
> - Bryan
From marchywka at hotmail.com Tue Oct 23 21:13:28 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 23 Oct 2007 21:13:28 -0400
Subject: [BiO BB] sequence analysis
In-Reply-To:
Message-ID:
> > > > I am interested to find tetramer peptide matching 100% to the
> > > > carboxyl terminal of the protein sequences from human. Is there any
> > > > specific tool available to do so.
> > > > Can BLAST or FASTA handle it by changing the parameters.
I took a quick look at the BLAST documentation and didn't find anything.
I also tried the standard Perl regex anchors $ and ^, but they didn't seem to work.
What is your objective? I've had a heck of a time with downloaded PROSITE
rules that have anchors, as I have to turn them off for translation products
from non-edited transcripts. The reason I mention this is that you may in
fact not want to limit yourself until you see what the other hits look like.
With only a few amino acids you may have to sort through a lot of hits, but
also see whether the conserved domain tools offer any help, as apparently
some of these can be position-sensitive.
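Mike's own PROSITE-to-Perl translation scheme is not shown in the thread. As an independent minimal sketch of the same idea, the Python function below converts the core PROSITE pattern syntax (x, [..], {..}, repeat counts, and the < and > terminal anchors) into a regular expression. It can also drop the anchors, which matches what he describes needing for translation products. It does not handle less common forms such as '>' inside square brackets.

```python
import re

# One PROSITE element: x, a single residue, [alternatives] or {exclusions},
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern, keep_anchors=True):
    """Translate a basic PROSITE pattern into a Perl/Python-style regex."""
    pattern = pattern.strip().rstrip(".")
    n_term = pattern.startswith("<")   # '<' = N-terminal anchor
    c_term = pattern.endswith(">")     # '>' = C-terminal anchor
    pattern = pattern.lstrip("<").rstrip(">")
    pieces = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            piece = "."
        elif residue.startswith("{"):
            piece = "[^" + residue[1:-1] + "]"
        else:
            piece = residue
        if low is not None:
            piece += "{" + low + ("," + high if high is not None else "") + "}"
        pieces.append(piece)
    regex = "".join(pieces)
    if keep_anchors:
        regex = ("^" if n_term else "") + regex + ("$" if c_term else "")
    return regex

# A made-up pattern, not a real PROSITE entry.
print(prosite_to_regex("<M-x(2)-[ST]-{P}>."))                      # ^M.{2}[ST][^P]$
print(prosite_to_regex("<M-x(2)-[ST]-{P}>.", keep_anchors=False))  # M.{2}[ST][^P]
```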
I've got my own code that does the opposite of what you want ( :) ).
That is, given a few hundred sequences and some patterns in the form of
Perl regexes, I can find each pattern in each sequence and use this
information for alignment and clustering (at least that is the hope, and
initial results don't look foolish).
If there is a real need, I guess I could modify this to do the opposite.
That is, if you get a few thousand proteins that contain your query anywhere,
and you can't separate them with existing tools but you can phrase your
query as a Perl regex, I may have something that helps.
But sure, you can just download all the BLAST hits that have your peptide
anywhere and grep for all the occurrences anchored to the end (this can be
a hassle, as some straddle line breaks, etc.).
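The anchored search can be sketched in Python. In line-oriented tools such as grep, `$` matches at the end of every wrapped line of a FASTA file. That means a C-terminal pattern can produce false hits in the middle of a protein and miss hits that straddle a line break. Joining each record into one string first and anchoring with `\Z` avoids both problems. The pattern `[ST].[LV]` is an arbitrary example, not one from this thread.

```python
import re

def records(lines):
    """Yield (header, sequence) from FASTA lines, joining wrapped sequence lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)

def c_terminal_regex_hits(lines, pattern):
    """Return (header, matched_text) for records whose sequence ends with `pattern`."""
    anchored = re.compile("(?:" + pattern + r")\Z")  # \Z = true end of the sequence
    hits = []
    for header, seq in records(lines):
        m = anchored.search(seq)
        if m:
            hits.append((header, m.group(0)))
    return hits

# Toy records: p3 has a wrapped line ending in TSV, but the protein ends in SVA.
fasta = [">p1", "MKTAYIAKQRQIS", "FVKSHFSETSV", ">p2", "MSETSVAAAA", ">p3", "MAAAATSV", "A"]
print(c_terminal_regex_hits(fasta, "[ST].[LV]"))  # [('p1', 'TSV')]
```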
Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.
From christoph.gille at charite.de Wed Oct 24 03:17:08 2007
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Wed, 24 Oct 2007 09:17:08 +0200 (CEST)
Subject: [BiO BB] sequence analysis
In-Reply-To: <838860.9612.qm@web51504.mail.re2.yahoo.com>
References: <838860.9612.qm@web51504.mail.re2.yahoo.com>
Message-ID: <37573.141.42.56.114.1193210228.squirrel@webmail.charite.de>
You could use fgrep, which is faster than grep.
First, you should transform the database file so that
each sequence takes one line, without blanks and without
line breaks (using tr and sed).
Database files are optimized for punch cards for
historical reasons. Lines are wrapped after at most 72
characters, which prevents the use of fgrep.
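As an alternative to tr and sed, the linearization can be sketched in Python. It joins each FASTA record onto one line (header, tab, sequence) and then answers the tetramer question with a plain suffix test. The tab-separated layout and the toy records are illustrative assumptions.

```python
def linearize_fasta(lines):
    """Yield (header, sequence) with each sequence joined into one string."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.replace(" ", ""))  # drop blanks inside sequence lines
    if header is not None:
        yield header, "".join(chunks)

def c_terminal_hits(lines, tetramer):
    """Headers of records whose sequence ends exactly with `tetramer`."""
    tetramer = tetramer.upper()
    hits = []
    for header, seq in linearize_fasta(lines):
        # Some protein FASTA files mark the stop codon with a trailing '*'.
        if seq.upper().rstrip("*").endswith(tetramer):
            hits.append(header)
    return hits

fasta = """>prot1 example
MKTAYIAKQR
QISFVKSHFSETSV
>prot2 example
MSETSVAAAA
""".splitlines()

for header, seq in linearize_fasta(fasta):
    print(f"{header}\t{seq}")  # one record per line, like the tr/sed output
print(c_terminal_hits(fasta, "ETSV"))  # ['prot1 example']
```

The one-line-per-record output can also be written to a file and searched with fgrep as suggested above.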
From Sterten at aol.com Wed Oct 24 03:26:09 2007
From: Sterten at aol.com (Sterten at aol.com)
Date: Wed, 24 Oct 2007 03:26:09 EDT
Subject: [BiO BB] genbank orthography
Message-ID:
Names are not spelled uniformly, e.g. Viet Nam and Vietnam,
and there are also many typos; this makes it very difficult to sort and analyse the entries
by computer.
I'm looking for a complete list of the different spellings
(thousands of entries...) and the suggested standard, so we can
correct/standardize them automatically.
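Applying such a list automatically could look like the following Python sketch. It assumes a hand-curated alias table that maps normalized variants to one chosen standard spelling. The table entries shown are illustrative only; no complete list of this kind is referenced in the thread.

```python
import re
import unicodedata

# Hypothetical alias table: normalized variant -> chosen standard spelling.
# A real table would have to be built (and reviewed) from the GenBank data itself.
ALIASES = {
    "viet nam": "Vietnam",
    "vietnam": "Vietnam",
}

def normalize_key(name):
    """Lower-case, strip accents, and collapse punctuation/whitespace to single spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[\W_]+", " ", no_accents.lower()).strip()

def standardize(name, aliases=ALIASES):
    """Return the standard spelling if the variant is known, else the input trimmed."""
    return aliases.get(normalize_key(name), name.strip())

for raw in ["Viet Nam", "VIETNAM", "Viet-Nam", "Việt Nam", "Laos"]:
    print(raw, "->", standardize(raw))
```

Unknown names pass through unchanged, so the table can be grown incrementally as new variants turn up.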
From dan.bolser at gmail.com Wed Oct 24 04:41:05 2007
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 24 Oct 2007 10:41:05 +0200
Subject: [BiO BB] genbank orthography
In-Reply-To:
References:
Message-ID: <2c8757af0710240141l288546e0qf90cd627a746aa11@mail.gmail.com>
On 24/10/2007, Sterten at aol.com wrote:
>
> names are not spelled uniformly, e.g. Viet Nam and Vietnam,
> also many typos, this makes it very difficult to sort and analyse the entries
> by computer.
> I'm looking for a complete list of different spellings
> (thousands of entries...) and the suggested standard so we can
> correct/uniformify them automatically.
Great idea. The PDB needs something similar also!
From marchywka at hotmail.com Wed Oct 24 08:27:36 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Wed, 24 Oct 2007 08:27:36 -0400
Subject: [BiO BB] sequence analysis
In-Reply-To: <37573.141.42.56.114.1193210228.squirrel@webmail.charite.de>
Message-ID:
>Before, you should transform the database file such that
I've taken my local BLAST databases and used their FASTA form for
"grepping" (using my own code that calls either the GRETA or Boost regex
libraries) against various genome sequences. It turns out to be too slow
for repetitive use, but I would comment as follows.
The patterns of biological interest tend to be a subset of regex, so you can
implement special code that is a lot faster when your query isn't
BLAST-friendly.
For example, a "conserved" domain may look like "neutral"-many irrelevant-
cysteine-X-cysteine-many irrelevant-H- etc. (I just made this up, but it is
based on many things I've seen in the literature). You may have a hard time
BLASTing for this, but you can grep for it with something like
[ANCQGILMFPSTWYV].{50,60}C.C.{10,100}H
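The same made-up pattern can be run with Python's re module, which accepts this Perl-style syntax. The sketch reports the span of each non-overlapping match in a synthetic sequence built only to exercise the pattern.

```python
import re

# Mike's illustrative (made-up) domain pattern, unchanged.
DOMAIN = re.compile(r"[ANCQGILMFPSTWYV].{50,60}C.C.{10,100}H")

def find_domains(seq):
    """Return (start, end) spans of non-overlapping pattern matches in `seq`."""
    return [m.span() for m in DOMAIN.finditer(seq)]

toy = "A" + "G" * 55 + "CKC" + "G" * 20 + "H"  # synthetic 80-residue test sequence
print(find_domains(toy))  # [(0, 80)]
```

Because the quantifiers are greedy and finditer does not return overlapping matches, overlapping candidate domains would need a different strategy, such as restarting the search at each start position.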
If you want a real-life example, here are some from PROSITE using my
PROSITE-to-Perl translation scheme (I hate illustrating with real things
that may not be right):
[LIVM][VIC].[^H]G[DENQTA].[GAC][^L].[LIVMFY]{4}.{2}G >rule|16|PEPDTIDE Prosite CNMP_BINDING_1
[EQ][^LNYH].[ATV][FY][^LDAM][^T]W[^PG]N >rule|18|PEPDTIDE Prosite ACTININ_1
From what I've seen, this is too slow for grepping against many genes (or
pre-translated peptides), but you can compile the query and target for much
faster searching (similar to a transient database index). Even literal
string matching can be slow without doing this. I have 500k empirically
discovered repeats (highly redundant, lots of junk) that I can now label
against 100 sequences of 60 kb each in "reasonable" time, which I could not
do before. This also works fine for the 600 or so miRNA sequences I finally
figured out how to download from Sanger :)
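Mike does not say how he compiles the query and target. One simple way to get the effect of a transient index for literal patterns is sketched below in Python. Every k-mer of the targets is indexed once, and each pattern is then checked only at positions that share its first k-mer. The function names and the default k=8 are arbitrary choices for illustration; an Aho-Corasick automaton over the patterns is another standard approach.

```python
from collections import defaultdict

def build_kmer_index(targets, k):
    """Map every k-mer in every target to a list of (target_name, offset)."""
    index = defaultdict(list)
    for name, seq in targets.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index

def find_literals(patterns, targets, k=8):
    """Exact occurrences of literal patterns (each at least k long) in the targets."""
    index = build_kmer_index(targets, k)  # built once, reused for every pattern
    hits = []
    for pat in patterns:
        if len(pat) < k:
            raise ValueError(f"pattern shorter than k={k}: {pat!r}")
        # Candidate positions share the pattern's first k-mer; verify the rest.
        for name, start in index.get(pat[:k], ()):
            if targets[name].startswith(pat, start):
                hits.append((pat, name, start))
    return hits

targets = {"seqA": "ACGTACGTTTGACCA", "seqB": "TTTGACCATTTT"}  # toy sequences
print(find_literals(["TTTGACCA", "ACGTACGT"], targets))
```

The index costs memory roughly proportional to the total target length, which is the usual trade-off for this kind of speed-up.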
Mike Marchywka
From Sterten at aol.com Wed Oct 24 09:38:00 2007
From: Sterten at aol.com (Sterten at aol.com)
Date: Wed, 24 Oct 2007 09:38:00 EDT
Subject: [BiO BB] genbank orthography
Message-ID:
I once made a list for influenza, but it is not nearly complete: just a few
hundred of the most common (mis)spellings, i.e. with respect to the genes.
In an email dated 24.10.2007 10:41:32 Western European Time,
dan.bolser at gmail.com writes:
On 24/10/2007, Sterten at aol.com wrote:
>
> names are not spelled uniformly, e.g. Viet Nam and Vietnam,
> also many typos, this makes it very difficult to sort and analyse the entries
> by computer.
> I'm looking for a complete list of different spellings
> (thousands of entries...) and the suggested standard so we can
> correct/uniformify them automatically.
Great idea. The PDB needs something similar also!
From marchywka at hotmail.com Wed Oct 24 10:25:25 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Wed, 24 Oct 2007 10:25:25 -0400
Subject: [BiO BB] genbank orthography
In-Reply-To:
Message-ID:
I deleted most of the posts on this thread, but as with the other thread,
if you can reduce the db to text, there are plenty of good tools for
one-time text processing; this is easy with sed and Perl. There are
indexing scripts on the web that are only 10-20 lines long. It isn't hard
to find typos in such a list, with or without a spelling dictionary.
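Mike's suggestion can be sketched in Python with the standard difflib module. Count each distinct spelling, then flag rarer spellings that closely resemble a more frequent one. The cutoff of 0.8 is an arbitrary starting point. Genuinely different names (for example Niger and Nigeria) can be flagged too, so the output is a list for manual review rather than an automatic correction.

```python
import difflib
from collections import Counter

def suspicious_variants(values, cutoff=0.8):
    """Pair each spelling with a more frequent, similar spelling (likely typo or variant)."""
    counts = Counter(v.strip() for v in values if v.strip())
    ranked = [name for name, _ in counts.most_common()]  # most frequent first
    pairs = []
    for i, name in enumerate(ranked):
        # Compare case-insensitively against all spellings seen more often.
        earlier = {other.lower(): other for other in ranked[:i]}
        match = difflib.get_close_matches(name.lower(), list(earlier), n=1, cutoff=cutoff)
        if match:
            pairs.append((name, earlier[match[0]]))
    return pairs

countries = ["Vietnam"] * 5 + ["Germany"] * 3 + ["Viet Nam"] * 2 + ["Vietanm"]
print(suspicious_variants(countries))
```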
From kanzure at gmail.com Tue Oct 23 19:43:55 2007
From: kanzure at gmail.com (Bryan Bishop)
Date: Tue, 23 Oct 2007 18:43:55 -0500
Subject: [BiO BB] sequence analysis
In-Reply-To:
References: <838860.9612.qm@web51504.mail.re2.yahoo.com>
Message-ID: <200710231843.55373.kanzure@gmail.com>
"Of the protein sequences from human." Martin, it sounds like what
Nikhil wants to do would require much more memory than textpad would be
allowed to allocate. Right?
- Bryan
From Lambert at Chatham.edu Tue Oct 23 20:46:51 2007
From: Lambert at Chatham.edu (Lambert, Lisa)
Date: Tue, 23 Oct 2007 20:46:51 -0400
Subject: [BiO BB] sequence analysis
References: <838860.9612.qm@web51504.mail.re2.yahoo.com>
Message-ID: <7980CF1A43C6564184A9A92D278F839E06AFBDDD@hickory.chatham.local>
I would suggest using PatScan: http://www-unix.mcs.anl.gov/compbio/PatScan/.
Unlike doing a plain text search, it will let you specify that the pattern must be at the end or the beginning of a sequence.
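For readers without PatScan at hand, the same end-anchored exact search can be sketched in a few lines of Python. This is a minimal illustration, not PatScan itself: it assumes the human proteins are in a FASTA file, joins wrapped sequence lines, ignores a trailing stop symbol "*" if present, and reports records whose last four residues equal the query. The function names, the file name human_proteins.fasta and the example tetramer KDEL (the ER-retention signal) are placeholders chosen for illustration.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining sequence lines that were wrapped."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def c_terminal_hits(handle: TextIO, tetramer: str) -> List[str]:
    """Return headers of sequences whose C-terminus is exactly `tetramer`."""
    motif = tetramer.upper()
    hits = []
    for header, seq in read_fasta(handle):
        # Drop a trailing stop symbol, if present, before comparing the last residues.
        if seq.upper().rstrip("*").endswith(motif):
            hits.append(header)
    return hits


if __name__ == "__main__":
    # Tiny in-memory example; for real data use open("human_proteins.fasta").
    demo = io.StringIO(">p1\nMKTA\nYKDEL\n>p2\nMKDELAA\n>p3\nAAKDEL*\n")
    print(c_terminal_hits(demo, "KDEL"))  # p1 and p3 end in KDEL; p2 does not
```

Because FASTA lines are wrapped, a plain grep on the raw file can miss a tetramer that is split across two lines, and a pattern such as "KDEL$" also matches wrapped lines in the middle of a sequence; joining the lines first and testing the end of the whole sequence avoids both problems.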
Lisa
-----Original Message-----
From: bio_bulletin_board-bounces+lambert=chatham.edu at bioinformatics.org on behalf of nikhil gadewal
Sent: Tue 10/23/2007 6:24 AM
To: bio_bulletin_board at bioinformatics.org
Cc:
Subject: [BiO BB] sequence analysis[Scanned]
Hello all
I am interested in finding a tetramer peptide matching 100% to the carboxyl terminus of the protein sequences from human.
Is there any specific tool available to do so?
Can BLAST or FASTA handle it by changing the parameters?
Thank you in advance.
Nikhil
NIKHIL S. GADEWAL ACTREC, Tata Memorial Centre, Kharghar, Navi Mumbai, India Great minds discuss ideas; Average minds discuss events; Small minds discuss people.
From Richard.Squires at UTSouthwestern.edu Wed Oct 24 11:10:48 2007
From: Richard.Squires at UTSouthwestern.edu (Richard Squires)
Date: Wed, 24 Oct 2007 10:10:48 -0500
Subject: [BiO BB] genbank orthography
Message-ID: <471F1A28020000E50003C47E@swnw124.swmed.edu>
For correct spellings you can use a gazetteer. As for a resource of misspellings, I am not aware of one.
Burke
---
Burke Squires
BioHealthBase BRC
Influenza Bioinformaticist
University of Texas Southwestern Medical Center at Dallas
richard.squires at utsouthwestern.edu
(214) 648-4952
>>> 10/24/07 2:26 AM >>>
names are not spelled uniformly, e.g. Viet Nam and Vietnam, and there are
also many typos; this makes it very difficult to sort and analyse the entries
by computer.
I'm looking for a complete list of the different spellings
(thousands of entries...) and the suggested standard form, so that we can
correct/standardize them automatically.
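As a rough illustration of automatic clean-up, here is a minimal Python sketch that maps free-text country names onto a standard form. It assumes you already have a list of canonical names (for example, taken from a gazetteer) and a small hand-made alias table; both lists below, and all function names, are hypothetical. Names are first compared after case-folding and removing accents, spaces and punctuation, so that "Viet Nam" and "Vietnam" collapse to the same key; remaining typos are matched with difflib.get_close_matches from the Python standard library, and anything below the similarity cutoff comes back as None for manual review.

```python
import difflib
import unicodedata
from typing import Dict, Iterable, Optional


def name_key(name: str) -> str:
    """Case-fold, strip accents and drop non-alphanumerics ('Viet Nam' -> 'vietnam')."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.casefold() if c.isalnum())


def build_index(canonical: Iterable[str], aliases: Dict[str, str]) -> Dict[str, str]:
    """Map normalized keys to standard names, including known aliases."""
    index = {name_key(name): name for name in canonical}
    index.update({name_key(alias): target for alias, target in aliases.items()})
    return index


def standardize(raw: str, index: Dict[str, str], cutoff: float = 0.8) -> Optional[str]:
    """Return the standard name for `raw`, or None if no close enough match exists."""
    key = name_key(raw)
    if key in index:
        return index[key]
    # Fall back to fuzzy matching to catch typos such as 'Germnay'.
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None


if __name__ == "__main__":
    # Hypothetical standard names and aliases, for illustration only.
    idx = build_index(["Viet Nam", "United Kingdom", "Germany"], {"UK": "United Kingdom"})
    for raw in ["Vietnam", "viet-nam", "UK", "Germnay", "Atlantis"]:
        print(raw, "->", standardize(raw, idx))
```

The cutoff is a trade-off: lowering it catches more typos but also risks mapping a genuinely different place onto the wrong standard name, so fuzzy matches are best reviewed before being written back.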
From ma11 at gen.cam.ac.uk Wed Oct 24 12:58:38 2007
From: ma11 at gen.cam.ac.uk (Michael Ashburner)
Date: Wed, 24 Oct 2007 18:58:38 +0200
Subject: [BiO BB] genbank orthography
In-Reply-To: <2c8757af0710240141l288546e0qf90cd627a746aa11@mail.gmail.com>
References:
<2c8757af0710240141l288546e0qf90cd627a746aa11@mail.gmail.com>
Message-ID:
I agree it is a terrible mess. Note the new gaz.obo project. This is
an attempt to build an artefact in OBO format for geographical
locations. The current version has about 20,000 locations. We have a
parse of about 45,000 from GenBank, but it will take some time to check
them and get them into this file.
http://obo.cvs.sourceforge.net/obo/obo/ontology/environmental/gaz.obo?view=log
Michael Ashburner
The file is available from the OBO CVS site.
On 24 Oct 2007, at 10:41, Dan Bolser wrote:
> On 24/10/2007, Sterten at aol.com wrote:
>>
>> names are not spelled uniformly, e.g. Viet Nam and Vietnam, and there
>> are also many typos; this makes it very difficult to sort and analyse
>> the entries by computer.
>> I'm looking for a complete list of the different spellings
>> (thousands of entries...) and the suggested standard form, so that we can
>> correct/standardize them automatically.
>
> Great idea. The PDB needs something similar also!
> --
> hello
From postelv at gmail.com Thu Oct 25 06:33:38 2007
From: postelv at gmail.com (vladi postal)
Date: Thu, 25 Oct 2007 12:33:38 +0200
Subject: [BiO BB] readseq(converting genbank to gcg format)
Message-ID: <81e4d3400710250333k75b9823fv80480ba8280a23e0@mail.gmail.com>
Hi,
I use the readseq program on Unix to convert GenBank files to GCG format.
The problem is that the program doesn't work for some files:
for example, gbpat2.seq, gbpat3.seq and gbpat8.seq fail, but
gbpat1.seq, gbpat4.seq, gbpat6.seq and gbpat7.seq work fine.
Does anyone have the same problem?
How can I solve it?
Any help will be appreciated.
vladi
From jeff at bioinformatics.org Fri Oct 26 18:00:05 2007
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Fri, 26 Oct 2007 18:00:05 -0400
Subject: [BiO BB] Course: Mitochondriomics
Message-ID: <47226365.8030309@bioinformatics.org>
Mitochondriomics
Online at the Bioinformatics Organization
In collaboration with Roskilde University, Denmark
November 5-9, 2007
CONTENTS:
1. BACKGROUND
2. OBJECTIVE
3. INSTRUCTORS
4. COURSE OUTLINE
5. ADDITIONAL INFORMATION
----------------------------------------
1. BACKGROUND
----------------------------------------
Mitochondria are semiautonomous organelles, presumed to be the evolutionary product of a symbiosis between a eukaryote and a prokaryote. The organelle is present in almost all eukaryotic cells, at 10^3-10^4 copies per cell. The main functions of mitochondria are the production of ATP by oxidative phosphorylation and involvement in apoptosis. The organelles contain almost exclusively maternally inherited mtDNA, and they have specific systems for the transcription, translation and replication of mtDNA.
Mitochondrial dysfunction has been correlated with mitochondrial diseases whose clinical pathologies are believed to include infertility, diabetes, blindness, deafness, stroke, migraine, and heart, kidney and liver diseases. Recently, cancer was added to this list when investigations of human cancer cells from breast, bladder, neck and lung revealed a high occurrence of mutations in mtDNA. With the emerging understanding of the role of mitochondria in a vast array of pathologies, research on mitochondria and mitochondrial dysfunction has in the last decade yielded a huge amount of data in the form of publications and databases. Nevertheless, the field of mitochondrial research is still far from exhausted, with many unknown factors yet to be discovered.
----------------------------------------
2. OBJECTIVE
----------------------------------------
The purpose of this course is to introduce the student to the various databases and wet-lab methods available for mitochondrial research. Furthermore, through selected articles, the course will give an understanding of the pitfalls and limitations of these databases and methods.
----------------------------------------
3. INSTRUCTORS
----------------------------------------
* Claus Desler (cdesler at ruc.dk)
* Prashanth Suravajhala (prash at ruc.dk)
----------------------------------------
4. COURSE OUTLINE
----------------------------------------
* Day 1: Introduction to mitochondria, their pathways and genetics; proteomics of mitochondria
* Day 2: Various assays used and advances in mitochondrial research
* Day 3: Tools and databases used in mitochondrial research; exercises
* Day 4: Exercises, report, review of literature continues
* Day 5: Summary and questions and answers; evaluation
----------------------------------------
5. ADDITIONAL INFORMATION
----------------------------------------
Please visit:
* http://wiki.bioinformatics.org/BI221A_Mitochondriomics
From harry.mangalam at uci.edu Fri Oct 26 20:07:11 2007
From: harry.mangalam at uci.edu (Harry Mangalam)
Date: Fri, 26 Oct 2007 17:07:11 -0700
Subject: [BiO BB] sequence analysis
In-Reply-To: <37573.141.42.56.114.1193210228.squirrel@webmail.charite.de>
References: <838860.9612.qm@web51504.mail.re2.yahoo.com>
<37573.141.42.56.114.1193210228.squirrel@webmail.charite.de>
Message-ID: <200710261707.11239.harry.mangalam@uci.edu>
And not to start a 'grep' war, but nrgrep and agrep are also faster
than either fgrep or grep and allow searching with errors. nrgrep
decays more smoothly with more complex patterns but agrep has more
implementations.
Both allow searching across arbitrary boundaries.
Both are free.
hjm
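To make "searching with errors" concrete: agrep-style tools report places where the pattern matches with at most k insertions, deletions or substitutions. The minimal Python sketch below computes the same thing with the classic dynamic-programming method usually attributed to Sellers. It only illustrates what is being computed; agrep and nrgrep use much faster bit-parallel algorithms, and the function name and example strings here are hypothetical.

```python
from typing import List


def approx_find(pattern: str, text: str, k: int) -> List[int]:
    """Return end positions j (the match ends at text[j - 1]) where `pattern`
    occurs in `text` with at most k insertions, deletions or substitutions."""
    m = len(pattern)
    # prev[i] = smallest edit distance between pattern[:i] and any substring
    # of text ending just before the current character.
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in the text
                         cur[i - 1] + 1)      # pattern character missing from the text
        if cur[m] <= k:
            hits.append(j)
        prev = cur
    return hits


if __name__ == "__main__":
    # KDEL with one substitution (E -> Q) is found when k = 1 but not when k = 0.
    print(approx_find("KDEL", "AAKDQLAA", 0))
    print(approx_find("KDEL", "AAKDQLAA", 1))
```

This runs in time proportional to the pattern length times the text length, which is exactly the cost the bit-parallel tools avoid for short patterns.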
On Wednesday 24 October 2007, Dr. Christoph Gille wrote:
> You could use fgrep. Fgrep is faster than grep.
>
> First, you should transform the database file so that
> each sequence takes one line, without blanks and without
> line breaks (using tr and sed).
>
> Database files are formatted for punch cards for
> historical reasons. Lines are wrapped after at most 72
> characters, so a pattern can be split across a line break,
> which prevents the use of fgrep.
--
Harry Mangalam - Research Computing, NACS, E2148, Engineering Gateway,
UC Irvine 92697 949 824 0084(o), 949 285 4487(c)
harry.mangalam at uci.edu
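The one-line-per-sequence transformation suggested above can be sketched in Python as a stand-in for the tr/sed pipeline (it is not a reproduction of that pipeline). This minimal version assumes FASTA input and writes each record as the header, a tab, then the unwrapped sequence; the function name is hypothetical.

```python
import io
from typing import Iterable, Iterator


def linearize_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield one line per FASTA record: the header, a tab, then the whole sequence."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header + "\t" + "".join(chunks)
            header, chunks = line, []
        elif line:
            # Remove any internal blanks, in case residues are grouped in blocks.
            chunks.append("".join(line.split()))
    if header is not None:
        yield header + "\t" + "".join(chunks)


if __name__ == "__main__":
    wrapped = ">seq1 demo\nMKTAYIAKQR\nQISFVKSHFS\n>seq2\nAAAA\n"
    flat = list(linearize_fasta(io.StringIO(wrapped)))
    # The motif KQRQIS spans a line break in the wrapped text...
    print("KQRQIS" in wrapped)                   # False
    # ...but is found once each record sits on a single line.
    print(any("KQRQIS" in rec for rec in flat))  # True
```

The output can then be searched with fgrep (grep -F); because the header shares the line with its sequence, each hit also shows which entry matched.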
From me.lixue at gmail.com Sat Oct 27 00:14:06 2007
From: me.lixue at gmail.com (Xue Li)
Date: Fri, 26 Oct 2007 23:14:06 -0500
Subject: [BiO BB] need help on HMM(Hidden Markov Model) package
Message-ID: <62ed16460710262114u2fe5fb98ie8efcf03daef7947@mail.gmail.com>
Hello all,
Does anyone know a good HMM (Hidden Markov Model) package? It would be
perfect if it were written in Perl, or could be called from Perl.
Thanks a lot!
--
Xue, Li
Bioinformatics and Computational Biology program @ ISU
Ames, IA 50010
515-450-7183
From landman at scalableinformatics.com Sat Oct 27 00:20:14 2007
From: landman at scalableinformatics.com (Joe Landman)
Date: Sat, 27 Oct 2007 00:20:14 -0400
Subject: [BiO BB] need help on HMM(Hidden Markov Model) package
In-Reply-To: <62ed16460710262114u2fe5fb98ie8efcf03daef7947@mail.gmail.com>
References: <62ed16460710262114u2fe5fb98ie8efcf03daef7947@mail.gmail.com>
Message-ID: <4722BC7E.8010207@scalableinformatics.com>
Xue Li wrote:
> Hello all,
>
> Does anyone know a good HMM (Hidden Markov Model) package? It would be
> perfect if it were written in Perl, or could be called from Perl.
HMMER (http://hmmer.janelia.org/) is one of the standard tools, and it
can be used from within BioPerl (http://www.bioperl.org/wiki/Main_Page)
fairly easily (see
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SearchIO/hmmer.html).
> Thanks a lot!
Joe
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 866 888 3112
cell : +1 734 612 4615