User:DanBolser
From Bioinformatics.Org Wiki
Hi,
Feel free to Speak!
In my spare time I work on MetaBase.
See my profile on OpenWetWare
I was asked by Jeff to create a profile on Bioinformatics.Org, which you can find here.
Bioinformatics.Org Wiki TODO
- Test the rewrite rules for shifting the old URL system to the new one here: User:Dan/RewriteRule test links.
Reorganizing content
- CATEGORIES
- It seems they were never used.
- Structure data
- As we will soon have SMW installed, we can start to organise some of the database-like data in the wiki. The first thing to do will be books, because there is an existing, well-defined template used to store book data. Next we can look at software.
New sections to add
- Projects
- Every project should have a wiki page. Categorize 'featured projects'.
- Grants
- Some pages related to grant proposals.
- Papers
- Some pages related to paper proposals.
Data to import...
I had planned to import a list of software for NGS into the wiki here (specifically using SMW). However, I now see that the list is already in an impressive wiki (the time that one click can save!). So now I need to decide what to do... The site is being maintained by a community of users, so what are the advantages of dumping the data here?
https://wiki.nbic.nl/index.php/High_throughput_sequencing#Software
Extensions to look at
- Semantic sign up
- OpenID
CODE
Book hack
I used this to tidy up the value of the 'title' field of the ISBN template (137 pages). It strips the formatting out of that field in the template call and defines a new field, 'subtitle'. This was done because the template should do the formatting internally; the user should not pass formatted text to the template. The change was necessary to integrate the data with Semantic MediaWiki (SMW). The next step is to apply a similar cleanup to the other fields and annotate them with semantic properties.
#!/usr/bin/perl -w

use strict;
use Data::Dumper;
use MediaWiki::API;

my $mw = MediaWiki::API->new();
$mw->{config}->{api_url} =
  'http://www.bioinformatics.org/w/api.php';

# log in
$mw->login( { lgname     => 'Dan Bolser Bot',
              lgpassword => 'xxxxxx' } )
  || die $mw->{error}->{code} . ': ' . $mw->{error}->{details};

## Make the above error code a bit more convenient...
$mw->{config}->{on_error} = sub {
  print "Error code: ". $mw->{error}->{code}. "\n";
  print "Details: ". $mw->{error}->{details}. "\n\n";
  print $mw->{error}->{stacktrace}."\n";
  die;
};

## Get a list of pages in the Book Category (fewer than 500, so we
## don't need to worry about the size of the results set).
## For an explanation of the query parameters, see:
## http://www.mediawiki.org/wiki/API
## http://www.mediawiki.org/wiki/API:Query_-_Lists
## Get 'cmlimit' pages at a time, with a 'max' of 20 batches
my $data =
  $mw->list({ action  => 'query',
              list    => 'categorymembers',
              cmtitle => 'Category:Book',
              #cmlimit => 3
            },
            { max => 20 }
           );

warn "processing ", scalar(@$data), " pages\n";

for my $x (@$data){
  my $title = $x->{'title'};
  print $title, "\n";

  my $page =
    $mw->get_page( { title => $title } );
  my $text = $page->{'*'};

  ## Try to grab the title parameter from the ISBN template call.
  ## Note! This regexp is not correct in the general case! It happens
  ## to work given the style of template layout used here!
  ## Literal braces are escaped, because newer versions of Perl may
  ## warn about (or reject) an unescaped '{' in a pattern.
  $text =~
    /\{\{ISBN(.*?)^\| title = (.*?)$/sm
      or die "Fail : '$text'\n";
  my $isbnTitle = $2;
  #print $isbnTitle, "\n";

  ## Split the formatted value into the bare title ($1) and the
  ## optional subtitle after the line break ($2)
  $isbnTitle =~
    /^'''\[\[Special:Booksources&isbn=.*?\|(.*?)\]\]'''(?:<br \/>(.*?))?$/
      or die "Fail : '$isbnTitle'\n";
  my $newIsbnTitle = "| title = $1\n| subtitle = ". ($2 || ''); # ' # Geshi bashi!
  #print $newIsbnTitle, "\n";

  ## OK, now we match in one step, with string replacement
  $text =~
    s/\{\{ISBN(.*?)^\| title = '''\[\[Special:Booksources&isbn=.*?\|(.*?)\]\]'''(?:<br \/>(.*?))?$/{{Book$1$newIsbnTitle/sm
      or die "Epic fail\n";
  #print "'$text'\n\n";

  ## Time to write the page back!
  # to avoid edit conflicts
  my $timestamp = $page->{timestamp};
  $mw->edit({ action        => 'edit',
              title         => $title,
              basetimestamp => $timestamp,
              text          => $text,
              bot           => 'true',
              summary       => "Renaming template. Removing markup from the 'title' string (template handles formatting). Explicitly setting 'subtitle' field (template handles formatting)"
            });
  #exit;
}

warn "OK\n";
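To make the effect of the substitution easier to see without touching the wiki, here is a minimal offline sketch of the same cleanup. The sample page text, the placeholder ISBN and the helper name tidy_title are all made up for illustration; the pattern is the one used in the script above, and it still only works for the one-field-per-line layout.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical page text (not taken from the wiki), laid out with one
# "| field = value" per line, which is what the pattern relies on.
my $sample = <<'EOT';
{{ISBN
| isbn = 0000000000
| title = '''[[Special:Booksources&isbn=0000000000|Example Title]]'''<br />An example subtitle
| author = A. N. Author
}}
EOT

# Same pattern as in the script above: $1 is everything between the
# template name and the title line, $2 the bare title, $3 the optional
# subtitle that follows the <br />.
my $pattern = qr/\{\{ISBN(.*?)^\| title = '''\[\[Special:Booksources&isbn=.*?\|(.*?)\]\]'''(?:<br \/>(.*?))?$/sm;

# tidy_title is a hypothetical helper: it returns the rewritten text,
# or undef when the text does not contain a matching template call.
sub tidy_title {
    my ($wikitext) = @_;
    return undef unless $wikitext =~ $pattern;
    my ($head, $title, $subtitle) = ($1, $2, defined $3 ? $3 : '');
    $wikitext =~ s/$pattern/{{Book$head| title = $title\n| subtitle = $subtitle/;
    return $wikitext;
}

my $tidied = tidy_title($sample);
print defined $tidied ? $tidied : "no match\n";
```

Returning undef instead of dying makes it possible to skip pages that do not match, rather than aborting the whole run as the script above does.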
Code for categorizing the 'software' pages in the wiki
Without categories like these, the pages are practically invisible to the user.
#!/usr/bin/perl -w

use strict;
use Getopt::Long;
use MediaWiki::API;
use Data::Dumper;

# For debugging
my $force = 0;
my $verbose = 0;

GetOptions( "force"    => \$force,
            "verbose+" => \$verbose,
          )
  or die "problem with command line arguments\n";

# Start talking to the wiki
my $mw = MediaWiki::API->new();
$mw->{config}->{api_url} =
  'http://www.bioinformatics.org/w/api.php';

# log in
$mw->login( { lgname     => 'Dan Bolser Bot',
              lgpassword => 'xxxxxx' } )
  || die $mw->{error}->{code} . ': ' . $mw->{error}->{details};

## Make the above error code a bit more convenient...
$mw->{config}->{on_error} = sub {
  print "Error code: ". $mw->{error}->{code}. "\n";
  print "Details: ". $mw->{error}->{details}. "\n\n";
  print $mw->{error}->{stacktrace}."\n";
  die;
};

warn "logged in\n";

## Get a list of pages that link to a certain other page
my $list =
  $mw->list ( { action  => 'query',
                list    => 'backlinks',
                bltitle => 'Software',
                bllimit => '250'
              },
              { max => 4 }
            );

print "got ", scalar(@$list), " pages\n";

# Process the page list
for my $page (@$list){
  #print Dumper $page, "\n";

  # Ignore redirects
  next if exists $page->{'redirect'};

  # Get the page name
  my $pagename = $page->{title};

  # Get the page
  my $mwPage =
    $mw->get_page( { title => $pagename } );

  # Get the page text
  my $text = $mwPage->{'*'};
  #print $mwPage->{'*'}, "\n";

  # Does this look like one of Jeff's pages?
  if($text =~ /^==Description==$/sm &&
     $text =~ /^==Home page==$/sm &&
     $text =~ /^==Supported platforms==$/sm &&
     $text =~ /^==Documentation==$/sm &&
     $text =~ /^==See also==$/sm){

    # Make sure we are not duplicating our efforts...
    next if $text =~ /^\[\[Category:Software\]\]$/sm;

    warn "updating $pagename\n"
      if $verbose > 0;

    # Avoid edit conflicts
    my $timestamp = $mwPage->{timestamp};

    $mw->edit( { action        => 'edit',
                 title         => $pagename,
                 # Avoid edit conflicts
                 basetimestamp => $timestamp,
                 text          => $text. "\n\n[[Category:Software]]",
                 bot           => ''
               } );

    warn "OK\n"
      if $verbose > 0;
    #exit;
  }
  else{
    warn "not touching $pagename\n";
  }
  #exit;
}
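The test for "does this look like one of Jeff's pages?" can be tried out without a wiki connection. The following is only a sketch: the function names and the sample text are hypothetical, and the list of headings is copied from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Section headings that the script above expects on a software page.
my @required = ('Description', 'Home page', 'Supported platforms',
                'Documentation', 'See also');

# True (1) if every required heading sits on a line of its own.
sub looks_like_software_page {
    my ($text) = @_;
    for my $heading (@required) {
        return 0 unless $text =~ /^==\Q$heading\E==$/m;
    }
    return 1;
}

# Return the text with the category appended, or undef if the page
# should be left alone (wrong layout, or already categorised).
sub add_software_category {
    my ($text) = @_;
    return undef unless looks_like_software_page($text);
    return undef if $text =~ /^\[\[Category:Software\]\]$/m;
    return $text . "\n\n[[Category:Software]]";
}

# Hypothetical page text, for illustration only.
my $sample  = join "\n", map { "==$_==\nsome text" } @required;
my $updated = add_software_category($sample);
print defined $updated ? "would update\n" : "would skip\n";
```

Keeping the check in a function of its own means the heading list can be changed in one place, and the rule can be tested on saved page text before the bot edits anything.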
Journal attack
#!/usr/bin/perl -w

use strict;
use Getopt::Long;
use MediaWiki::API;
use Data::Dumper;

# For debugging
my $force = 0;
my $verbose = 0;

GetOptions( "force"    => \$force,
            "verbose+" => \$verbose,
          )
  or die "problem with command line arguments\n";

# Start talking to the wiki
my $mw = MediaWiki::API->new();
$mw->{config}->{api_url} =
  'http://www.bioinformatics.org/w/api.php';

# log in
$mw->login( { lgname     => 'Dan Bolser Bot',
              lgpassword => '000006' } )
  || die $mw->{error}->{code} . ': ' . $mw->{error}->{details};

## Make the above error code a bit more convenient...
$mw->{config}->{on_error} = sub {
  print "Error code: ". $mw->{error}->{code}. "\n";
  print "Details: ". $mw->{error}->{details}. "\n\n";
  print $mw->{error}->{stacktrace}."\n";
  die;
};

warn "logged in\n";

## Process the journal list (taken from
## http://www.bioinformatics.org/wiki/Journals)
open( my $journals, '<', 'journal_list.tab' )
  or die "cannot open journal_list.tab: $!\n";

while( <$journals> ){
  chomp;
  my @data = split(/\t/, $_);
  die "problem\n" unless @data == 6;

  my $title = $data[1];
  warn "doing '$title'\n";

  ## Minor reformatting
  $data[2] = '' if $data[2] eq 'n/a';

  ## Get a page object
  my $page = $mw->
    get_page( { title => $title } );

  ## Test if this page exists in the wiki
  if($page && !defined($page->{'missing'})){
    warn "page already exists in the wiki\n\n";
    next unless $force == 1;
    #sleep 5;
  }

  warn "creating '$title'\n\n";

  ## Prepare the page text
  my $text = <<EOT;
{{journal
|influence=$data[2]
|copyright=$data[3]
|access=$data[4]
|publication charge=$data[5]
|homepage=$data[0]
}}
EOT

  ## Write the page text to the wiki
  ## Using timestamp avoids edit conflicts
  my $timestamp = $page->{timestamp};
  $mw->edit( { action        => 'edit',
               title         => $title,
               # to avoid edit conflicts
               basetimestamp => $timestamp,
               text          => $text,
               bot           => '',
               # the edit comment is passed as 'summary' (was 'reason')
               summary       => 'automatic import',
             } );

  # Debugging
  #exit;
}

close($journals);

warn "OK\n";
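The step that turns one line of journal_list.tab into page text can be checked on its own before anything is written to the wiki. This is a rough sketch: the function name journal_page and the sample record are made up, and the column order (homepage, title, influence, copyright, access, publication charge) is inferred from how the script above uses @data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn one tab-separated record into a (title, wikitext) pair.
# journal_page is a hypothetical helper, not part of the script above.
sub journal_page {
    my ($line) = @_;
    chomp $line;

    # A limit of -1 keeps trailing empty fields, so an empty last
    # column still counts as a field.
    my @data = split /\t/, $line, -1;
    die "expected 6 fields, got " . scalar(@data) . "\n"
        unless @data == 6;

    my ($homepage, $title, $influence, $copyright, $access, $charge) = @data;

    # Minor reformatting, as in the script above
    $influence = '' if $influence eq 'n/a';

    my $text = <<"EOT";
{{journal
|influence=$influence
|copyright=$copyright
|access=$access
|publication charge=$charge
|homepage=$homepage
}}
EOT
    return ($title, $text);
}

# Hypothetical record, for illustration only.
my $record = join "\t", 'http://example.org/', 'Example Journal',
                        'n/a', 'author', 'open', 'none';
my ($title, $text) = journal_page($record);
print "$title\n$text";
```

Note one difference from the script above: there, split is called without a limit, so a record whose last column is empty has only five fields and hits the "problem" check. Whether such records should be accepted is a judgment call.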