User:DanBolser

From Bioinformatics.Org Wiki


Hi,

Feel free to Speak!

In my spare time I work on MetaBase.

See my profile on OpenWetWare

I was asked by Jeff to create a profile on Bioinformatics.Org, which you can find here.


Bioinformatics.Org Wiki TODO

Reorganizing content

CATEGORIES 
Seems they were never used.
Structure data 
As we will soon have SMW installed, we can start to organise some of the database-like data in the wiki. Books come first, because an existing, well-defined template already stores the book data. Next we can look at software.

New sections to add

Projects 
Every project should have a wiki page. Categorize 'featured projects'.
Grants 
Some pages related to grant proposals.
Papers 
Some pages related to paper proposals.

Data to import...

I had planned to import a list of software for NGS into the wiki here (specifically using SMW). However, I now see that the list is already in an impressive wiki (the time that one click can save!). So now I need to decide what to do... The site is being maintained by a community of users, so what are the advantages of dumping the data here?

https://wiki.nbic.nl/index.php/High_throughput_sequencing#Software


Extensions to look at


CODE

Book hack

I used this to tidy up the value of the 'title' field of the ISBN template (137 pages). It strips the formatting out of that field in the template call, defines a new field 'subtitle', and renames the template call from ISBN to Book. The template should do the formatting internally, so the user should not pass formatted text to it. This was necessary to integrate the data with Semantic MediaWiki (SMW). The next step is to apply a similar cleanup to the other fields and annotate them with semantic properties.

#!/usr/bin/perl -w

use strict;
use Data::Dumper;

use MediaWiki::API;

my $mw = MediaWiki::API->new();

$mw->{config}->{api_url} =
  'http://www.bioinformatics.org/w/api.php';

# log in
$mw->login( { lgname => 'Dan Bolser Bot',
              lgpassword => 'xxxxxx' } )
  || die $mw->{error}->{code} . ': ' . $mw->{error}->{details};

## Make the above error code a bit more convenient...

$mw->{config}->{on_error} = sub {
  print "Error code: ". $mw->{error}->{code}. "\n";
  print "Details: ". $mw->{error}->{details}. "\n\n";
  print $mw->{error}->{stacktrace}."\n";
  die;
};


## Get a list of pages in the Book Category (fewer than 500, so we
## don't need to worry about the size of the results set).

## For an explanation of the query parameters, see:
## http://www.mediawiki.org/wiki/API
## http://www.mediawiki.org/wiki/API:Query_-_Lists


## Get 'cmlimit' pages at a time, with a 'max' of 20 batches

my $data =
  $mw->list({ action  => 'query',
              list    => 'categorymembers',
              cmtitle => 'Category:Book',
              #cmlimit => 3
            },
            { max => 20 }
           );

warn "processing ", scalar(@$data), " pages\n";

for my $x (@$data){
  my $title = $x->{'title'};

  print $title, "\n";

  my $page =
    $mw->get_page( { title => $title } );

  my $text = $page->{'*'};

  ## Try to grab the title parameter from the ISBN template call.

  ## Note! This regexp is not correct in the general case! It happens
  ## to work given the style of template layout used here!

  ## The braces are escaped because newer versions of Perl complain
  ## about unescaped literal braces in a pattern.

  $text =~
    /\{\{ISBN(.*?)^\| title = (.*?)$/sm
      or die "Fail : '$text'\n";

  my $isbnTitle = $2;
  #print $isbnTitle, "\n";

  $isbnTitle =~
    /^'''\[\[Special:Booksources&isbn=.*?\|(.*?)\]\]'''(?:<br \/>(.*?))?$/
      or die "Fail : '$isbnTitle'\n";

  ## $1 is the plain title, $2 the optional subtitle
  my $newIsbnTitle = "| title = $1\n| subtitle = ". ($2 || '');
  #print $newIsbnTitle, "\n";


  ## OK, now we match in one step, with string replacement. Here $1 is
  ## whatever sits between '{{ISBN' and the title line.

  $text =~
    s/\{\{ISBN(.*?)^\| title = '''\[\[Special:Booksources&isbn=.*?\|(.*?)\]\]'''(?:<br \/>(.*?))?$/{{Book$1$newIsbnTitle/sm
      or die "Epic fail\n";

  #print "'$text'\n\n";


  ## Time to write the page back!

  # to avoid edit conflicts
  my $timestamp = $page->{timestamp};

  $mw->edit({ action => 'edit',
              title  => $title,
              basetimestamp => $timestamp,
              text => $text,
              bot => 'true',
              summary => "Renaming template. Removing markup from the 'title' string (template handles formatting). Explicitly setting 'subtitle' field (template handles formatting)"
            });


  #exit;
}


warn "OK\n";
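
The title clean-up can be tried without touching the wiki. The sketch below applies the same pattern to a made-up 'title' value. The helper name split_isbn_title and the sample string are hypothetical and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): split a formatted
# 'title' value into a plain title and a subtitle (possibly empty).
sub split_isbn_title {
    my ($value) = @_;

    # Same shape as the pattern in the script: a bold link to
    # Special:Booksources, optionally followed by '<br />' and a subtitle.
    return unless $value =~
        m{^'''\[\[Special:Booksources&isbn=.*?\|(.*?)\]\]'''(?:<br />(.*?))?$};

    my ($title, $subtitle) = ($1, $2);
    return ($title, defined $subtitle ? $subtitle : '');
}

# Made-up sample value, for illustration only.
my $sample = q{'''[[Special:Booksources&isbn=0000000000|An Example Title]]'''<br />An example subtitle};

my ($title, $subtitle) = split_isbn_title($sample);
print "| title = $title\n| subtitle = $subtitle\n";
```

Keeping the pattern in a small function like this makes it easy to check odd title values before the bot edits any pages.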


Code for categorizing the 'software' pages in the wiki

Without categories like these, the pages are practically invisible to the user.

#!/usr/bin/perl -w

use strict;
use Getopt::Long;
use MediaWiki::API;

use Data::Dumper;



# For debugging
my $force = 0;
my $verbose = 0;

GetOptions( "force" => \$force,
            "verbose+" => \$verbose,
          )
  or die "problem with command line arguments\n";



# Start talking to the wiki

my $mw = MediaWiki::API->new();

$mw->{config}->{api_url} =
  'http://www.bioinformatics.org/w/api.php';

# log in
$mw->login( { lgname => 'Dan Bolser Bot',
              lgpassword => 'xxxxxx' } )
  || die $mw->{error}->{code} . ': ' . $mw->{error}->{details};



## Make the above error code a bit more convenient...

$mw->{config}->{on_error} = sub {
  print "Error code: ". $mw->{error}->{code}. "\n";
  print "Details: ". $mw->{error}->{details}. "\n\n";
  print $mw->{error}->{stacktrace}."\n";
  die;
};

warn "logged in\n";



## Get a list of pages that link to a certain other page
## (250 pages at a time, with a 'max' of 4 batches)

my $list =
  $mw->list ( { action  => 'query',
                list    => 'backlinks',
                bltitle => 'Software',
                bllimit => '250'
              },
              { max => 4 }
            );

print "got ", scalar(@$list), " pages\n";



# Process the page list

for my $page (@$list){
  #print Dumper $page, "\n";

  # Ignore redirects
  next if exists $page->{'redirect'};

  # Get the page name
  my $pagename = $page->{title};

  # Get the page
  my $mwPage =
    $mw->get_page( { title => $pagename } );

  # Get the page text
  my $text = $mwPage->{'*'};

  #print $mwPage->{'*'}, "\n";



  # Does this look like one of Jeff's pages?
  if($text =~ /^==Description==$/sm &&
     $text =~ /^==Home page==$/sm &&
     $text =~ /^==Supported platforms==$/sm &&
     $text =~ /^==Documentation==$/sm &&
     $text =~ /^==See also==$/sm){

    # Make sure we are not duplicating our efforts...
    next if $text =~ /^\[\[Category:Software\]\]$/sm;

    warn "updating $pagename\n"
      if $verbose > 0;

    # Avoid edit conflicts
    my $timestamp = $mwPage->{timestamp};

    $mw->edit( { action => 'edit',
                 title => $pagename,
                 basetimestamp => $timestamp,
                 text => $text. "\n\n[[Category:Software]]",
                 bot => ''
               } );

    warn "OK\n"
      if $verbose > 0;
    #exit;
  }
  else{
    warn "not touching $pagename\n";
  }

  #exit;
}
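
The test for "one of Jeff's pages" can be pulled out into a function and checked offline. This is only a sketch of that idea: the name needs_software_category and the sample page text are hypothetical, while the five headings and the category tag are the ones the script looks for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The section headings the script expects on a software page.
my @REQUIRED_HEADINGS = ( 'Description', 'Home page',
                          'Supported platforms', 'Documentation',
                          'See also' );

# Hypothetical helper (not in the original script): true if the page has
# all five headings, each on its own line, and is not yet categorized.
sub needs_software_category {
    my ($text) = @_;

    for my $heading (@REQUIRED_HEADINGS) {
        return 0 unless $text =~ /^==\Q$heading\E==$/m;
    }

    # Already tagged, nothing to do.
    return 0 if $text =~ /^\[\[Category:Software\]\]$/m;

    return 1;
}

# Made-up page text, for illustration only.
my $sample = join "\n",
    '==Description==', 'An example tool.',
    '==Home page==', '==Supported platforms==',
    '==Documentation==', '==See also==', '';

print needs_software_category($sample) ? "would tag\n" : "would skip\n";
```

Looping over a list of headings keeps the check in one place if the page layout ever gains or loses a section.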

Journal attack

#!/usr/bin/perl -w
 
use strict;
use Getopt::Long;
use MediaWiki::API;
 
use Data::Dumper;
 
 
 
# For debugging
my $force = 0;
my $verbose = 0;
 
GetOptions( "force" => \$force,
	    "verbose+" => \$verbose,
	  )
  or die "problem with command line arguments\n";
 
 
 
# Start talking to the wiki
 
my $mw = MediaWiki::API->new();
 
$mw->{config}->{api_url} =
  'http://www.bioinformatics.org/w/api.php';
 
# log in
$mw->login( { lgname => 'Dan Bolser Bot',
	      lgpassword => 'xxxxxx' } )
  || die $mw->{error}->{code} . ': ' . $mw->{error}->{details};
 
 
 
## Make the above error code a bit more convenient...
 
$mw->{config}->{on_error} = sub {
  print "Error code: ". $mw->{error}->{code}. "\n";
  print "Details: ". $mw->{error}->{details}. "\n\n";
  print $mw->{error}->{stacktrace}."\n";
  die;
};
 
warn "logged in\n";
 
 
 
## Process the journal list (taken from
## http://www.bioinformatics.org/wiki/Journals)
 
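open( my $journals, '<', 'journal_list.tab' )
  or die "cannot open journal_list.tab: $!\n";
 
while( <$journals> ){
  chomp;
 
  ## A limit of -1 keeps trailing empty fields
  my @data =
    split(/\t/, $_, -1);
 
  die "expected 6 tab-separated fields on line $.\n"
    unless @data == 6;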
 
  my $title = $data[1];
  warn "doing '$title'\n";
 
  ## Minor reformatting
  $data[2] = '' if $data[2] eq 'n/a';
 
 
 
  ## Get a page object
 
  my $page = $mw->
    get_page( { title => $title } );
 
 
 
  ## Test if this page exists in the wiki
 
  if($page && !defined($page->{'missing'})){
    warn "page already exists in the wiki\n\n";
    next unless
      $force == 1;
    #sleep 5;
  }
  warn "creating '$title'\n\n";
 
  ## Prepare the page text
  my $text = <<EOT;
{{journal
 |influence=$data[2]
 |copyright=$data[3]
 |access=$data[4]
 |publication charge=$data[5]
 |homepage=$data[0]
}}
EOT
 
 
 
  ## Write the page text to the wiki
 
  ## Using timestamp avoids edit conflicts
  my $timestamp = $page->{timestamp};
 
  $mw->edit( { action => 'edit',
	       title => $title,
	       # to avoid edit conflicts
	       basetimestamp => $timestamp,
	       text => $text,
	       bot => '',
	       summary => 'automatic import',
	     } );
 
  # Debugging
  #exit;
}
 
 
 
warn "OK\n";
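
The row-to-template step can be checked without a wiki connection. The sketch below assumes the same column order the script uses (homepage, title, influence, copyright, access, publication charge). The helper name journal_page_text and the sample row are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): turn one tab-separated
# row into a page title and the text of a {{journal}} template call.
sub journal_page_text {
    my ($line) = @_;
    chomp $line;

    # A limit of -1 keeps trailing empty fields.
    my @data = split /\t/, $line, -1;
    return unless @data == 6;

    my ($homepage, $title, $influence, $copyright, $access, $charge) = @data;

    # Same minor reformatting as the script.
    $influence = '' if $influence eq 'n/a';

    my $text = join "\n",
        '{{journal',
        " |influence=$influence",
        " |copyright=$copyright",
        " |access=$access",
        " |publication charge=$charge",
        " |homepage=$homepage",
        '}}',
        '';

    return ($title, $text);
}

# Made-up row, for illustration only.
my $row = join "\t", 'http://example.org/journal', 'Example Journal',
                     'n/a', 'author', 'open', 'none';

my ($title, $text) = journal_page_text($row);
print "$title\n$text";
```

Returning an empty list for a malformed row lets the caller decide whether to skip the row or stop the import.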