MedlineR: an open source library in R for Medline literature data mining
Authors Simon M. Lin (1,2), Patrick McConnell (1), Kimberly F. Johnson (1), and Jennifer Shoemaker (1,2)
(1) Duke Comprehensive Cancer Center and (2) Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC 27710
Abstract

 

We describe an open source library written in the R programming language for Medline literature data mining. The MedlineR library includes programs to query Medline through the NCBI PubMed database, to construct a co-occurrence matrix, and to visualize the network topology of query terms. Because the library is open source, users can extend it freely in R, a statistical programming language. To demonstrate its utility, we have built a term-association analysis application in only ten lines of code. We provide MedlineR as a foundation on which bioinformaticians and statisticians can build more sophisticated literature data mining applications.
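The co-occurrence and network steps mentioned in the abstract can be illustrated with a minimal sketch. This is not MedlineR's own code or API: the function names `cooccurrence_matrix` and `pajek_lines`, the `count_fn` argument (a stand-in for a PubMed hit-count query), and the toy corpus are all hypothetical, and the export assumes Pajek's plain-text .net layout of a vertex list followed by weighted edges.

```r
# Build a symmetric co-occurrence matrix for a set of query terms.
# `count_fn` is a hypothetical stand-in for a PubMed hit-count query:
# it takes a query string and returns the number of matching records.
cooccurrence_matrix <- function(terms, count_fn) {
  n <- length(terms)
  m <- matrix(0, nrow = n, ncol = n, dimnames = list(terms, terms))
  for (i in seq_len(n)) {
    for (j in i:n) {
      # Diagonal: single-term count; off-diagonal: joint count.
      query <- if (i == j) terms[i] else paste(terms[i], "AND", terms[j])
      m[i, j] <- m[j, i] <- count_fn(query)
    }
  }
  m
}

# Render the matrix as lines of a Pajek-style .net file (assumed layout).
pajek_lines <- function(m) {
  n <- nrow(m)
  vertices <- sprintf('%d "%s"', seq_len(n), rownames(m))
  # One edge per term pair with a non-zero joint count.
  idx <- which(upper.tri(m) & m > 0, arr.ind = TRUE)
  edges <- sprintf("%d %d %g", idx[, 1], idx[, 2], m[idx])
  c(sprintf("*Vertices %d", n), vertices, "*Edges", edges)
}

# Toy in-memory corpus used instead of a live PubMed query.
toy_docs <- list(
  c("p53", "apoptosis"),
  c("p53", "BRCA1"),
  c("p53", "apoptosis", "BRCA1"),
  c("BRCA1")
)
toy_count <- function(query) {
  wanted <- strsplit(query, " AND ", fixed = TRUE)[[1]]
  sum(vapply(toy_docs, function(doc) all(wanted %in% doc), logical(1)))
}

m <- cooccurrence_matrix(c("p53", "apoptosis", "BRCA1"), toy_count)
net <- pajek_lines(m)
```

Passing the counting function as an argument keeps the matrix logic independent of how the counts are obtained, so the same sketch can be tried offline with toy data before a real query function is substituted.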

Correspondence to Simon Lin, M.D.
Duke University Medical Center, Box 3958, Durham, NC 27710
Phone: 919-681-9646   Email: Lin00025(at)mc.duke.edu
Publication URL TBA (link to the journal's website)
PubMed URL TBA
Publication Citation Lin et al., MedlineR: an open source library in R for Medline literature data mining, 2004 (submitted)
Keywords literature data mining, Medline, PubMed, co-occurrence
 
Download the Source Code
System Requirements
To run the MedlineR library, you need:
  • R version 1.6 or above
  • XML library in R
  • Pajek for visualization 
  • Internet connection
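The R-side requirements above can be checked from within R before installing. The following is a minimal sketch, not part of MedlineR: `check_requirements` is a hypothetical helper name, and it relies on `getRversion()` and `requireNamespace()`, which are available in current R releases but may not exist in a release as old as R 1.6. Pajek and the Internet connection are external and are not checked.

```r
# Report whether the R-side requirements appear to be met.
# Pajek and network access are external and are not checked here.
check_requirements <- function(min_r = "1.6.0") {
  c(
    r_version_ok  = getRversion() >= min_r,
    xml_available = requireNamespace("XML", quietly = TRUE)
  )
}

status <- check_requirements()
print(status)
```

If `xml_available` is reported as FALSE, the XML package can usually be installed with `install.packages("XML")`.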
Installation
To keep the source code transparent, we provide two simple ways to download and install it.
  • Method A
In R, simply type
          source ("http://dbsr.duke.edu/pub/MedlineR/MedlineR_v20.txt")
  • Method B
Download the source code into a local directory, and then use the source() command in R to load it.
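For Method B, a minimal sketch of loading a locally saved copy is shown here. The helper name `load_local_source` and the example path are hypothetical; only `source()` itself comes from the instructions above. Loading into a separate environment is optional and simply makes it easy to see which objects the file defines.

```r
# Source a locally saved R file and report the objects it defines.
# `path` is wherever the downloaded file was saved (hypothetical here).
load_local_source <- function(path, envir = globalenv()) {
  if (!file.exists(path)) {
    stop("Source file not found: ", path)
  }
  before <- ls(envir)
  source(path, local = envir)
  # Names that were not present before sourcing.
  invisible(setdiff(ls(envir), before))
}

# Example usage (path is an assumption; adjust to your download location):
# medliner_env <- new.env()
# added <- load_local_source("C:/MedlineR/MedlineR_v20.txt", medliner_env)
# print(added)
```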
Developer's Corner: open source, open development
The following website was created to facilitate further development of this open source library. It includes the program schema, older releases, and bug tracking.
http://bioinformatics.org/project/?group_id=358

Supplemental Information
Files



 
Description                   File Name
Poster slides (PDF format)    MedlineR_slide.PDF
Source code (RTF format)      MedlineR_v20.rtf
About this webpage
Created 2-22-04. Last updated 5-10-04. Hosted by the Duke Bioinformatics Shared Resource.
