User:DanBolser

From Bioinformatics.Org Wiki


Hi,

Feel free to Speak!

In my spare time I work on MetaBase.

See my profile on OpenWetWare

I was asked by Jeff to create a profile on Bioinformatics.Org, which you can find here.

Bioinformatics.Org Wiki TODO

Test the rewrite rules for shifting the old URL system to the new one here: User:Dan/RewriteRule test links.

Bioinformatics.Org Wiki TODO

Reorganizing content

CATEGORIES: It seems they were never used.
Structured data: As we will soon have SMW installed, we can start to organise some of the database-like data in the wiki. The first thing to do will be books, because there is an existing, well-defined template used to store book data. Next we can look at software.

New sections to add

Projects: Every project should have a wiki page. Categorize 'featured projects'.
Grants: Some pages related to grant proposals.
Papers: Some pages related to paper proposals.

Data to import...

I had planned to import a list of software for NGS into the wiki here (specifically using SMW). However, I now see that the list is already in an impressive wiki (the time that one click can save!). So now I need to decide what to do... The site is being maintained by a community of users, so what are the advantages of dumping the data here?

https://wiki.nbic.nl/index.php/High_throughput_sequencing#Software

Extensions to look at

Semantic sign up
OpenID

CODE

Book hack

I used this to tidy up the value of the 'title' field of the ISBN template (137 pages). It strips out the formatting from that field in the template call and defines a new field 'subtitle'. This was done because the template should do the formatting internally. The user should not pass formatted text to the template. This was necessary to integrate the data with Semantic MediaWiki (SMW). The next step is to apply similar clean up to other fields and annotate them with semantic properties.
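The core of the clean-up can be tried without touching the wiki. The following is only a minimal sketch: it reuses the regular expression from the script below, but the helper name `split_isbn_title` and the sample field value are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): split a formatted
# 'title' value into a plain title and a subtitle. Returns an empty list
# when the value does not have the expected layout.
sub split_isbn_title {
    my ($value) = @_;
    return unless $value =~
        /^'''\[\[Special:Booksources&isbn=.*?\|(.*?)\]\]'''(?:<br \/>(.*?))?$/;
    my ($title, $subtitle) = ($1, $2);
    return ($title, defined $subtitle ? $subtitle : '');
}

# Made-up field value in the layout the script expects.
my $sample =
    q{'''[[Special:Booksources&isbn=0000000000|Example Title]]'''<br />An example subtitle};

my ($title, $subtitle) = split_isbn_title($sample);
print "| title = $title\n| subtitle = $subtitle\n";
```

Running it on a handful of real field values before the bulk edit is a cheap way to spot pages whose layout the regular expression does not cover.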

```
#!/usr/bin/perl -w

use strict;
use Data::Dumper;

use MediaWiki::API;

my $mw = MediaWiki::API->new();

$mw->{config}->{api_url} =
  'http://www.bioinformatics.org/w/api.php';

# log in
$mw->login( { lgname => 'Dan Bolser Bot',
              lgpassword => 'xxxxxx' } )
  || die $mw->{error}->{code} . ': ' . $mw->{error}->{details};

## Make the above error code a bit more convenient...

$mw->{config}->{on_error} = sub {
  print "Error code: ". $mw->{error}->{code}. "\n";
  print "Details: ". $mw->{error}->{details}. "\n\n";
  print $mw->{error}->{stacktrace}."\n";
  die;
};

## Get a list of pages in the Book Category (fewer than 500, so we
## don't need to worry about the size of the results set).

## For an explanation of the query parameters, see:
## http://www.mediawiki.org/wiki/API
## http://www.mediawiki.org/wiki/API:Query_-_Lists

my $data =
  $mw->list({ action  => 'query',
              list    => 'categorymembers',
              cmtitle => 'Category:Book',
              #cmlimit => 3
            },
            { max => 20 }
           );

## Get 'cmlimit' pages at a time, with a 'max' of 20 batches

warn "processing ", scalar(@$data), " pages\n";

for my $x (@$data){
  my $title = $x->{'title'};

  print $title, "\n";

  my $page =
    $mw->get_page( { title => $title } );

  my $text = $page->{'*'};

  ## Try to grab the title parameter from the ISBN template call.

  ## Note! This regexp is not correct in the general case! It happens
  ## to work given the style of template layout used here!

  ## (Literal braces are escaped to keep newer versions of Perl happy.)
  $text =~
    /\{\{ISBN(.*?)^\| title = (.*?)$/sm
      or die "Fail : '$text'\n";

  my $isbnTitle = $2;
  #print $isbnTitle, "\n";

  $isbnTitle =~
    /^'''\[\[Special:Booksources&isbn=.*?\|(.*?)\]\]'''(?:<br \/>(.*?))?$/
      or die "Fail : '$isbnTitle'\n";

  ## Use a defined test so that a subtitle of '0' is not lost
  my $newIsbnTitle =
    "| title = $1\n| subtitle = ". (defined $2 ? $2 : '');
  #print $newIsbnTitle, "\n";

  ## OK, now we match in one step, with string replacement

  $text =~
    s/\{\{ISBN(.*?)^\| title = '''\[\[Special:Booksources&isbn=.*?\|(.*?)\]\]'''(?:<br \/>(.*?))?$/{{Book$1$newIsbnTitle/sm
      or die "Epic fail\n";

  #print "'$text'\n\n";

  ## Time to write the page back!

  # to avoid edit conflicts
  my $timestamp = $page->{timestamp};

  $mw->edit({ action => 'edit',
              title  => $title,
              basetimestamp => $timestamp,
              text => $text,
              bot => 'true',
              summary => "Renaming template. Removing markup from the 'title' string (template handles formatting). Explicitly setting 'subtitle' field (template handles formatting)"
            });

  #exit;
}

warn "OK\n"
```

Code for categorizing the 'software' pages in the wiki

Without categories like these, the pages are practically invisible to the user.
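The decision the script below makes for each page can be isolated into two small functions and tested offline. This is a rough sketch rather than the script itself: the function names `looks_like_software_page` and `add_software_category` and the sample page text are invented for the example, while the five headings and the category line are the ones the script checks for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the text has all five section headings
# that the script uses to recognise a software page.
sub looks_like_software_page {
    my ($text) = @_;
    for my $heading ('Description', 'Home page', 'Supported platforms',
                     'Documentation', 'See also') {
        return 0 unless $text =~ /^==\Q$heading\E==$/m;
    }
    return 1;
}

# Hypothetical helper: append the category once, and only to pages that
# look like software pages. Text that already has it is returned as-is.
sub add_software_category {
    my ($text) = @_;
    return $text unless looks_like_software_page($text);
    return $text if $text =~ /^\[\[Category:Software\]\]$/m;
    return $text . "\n\n[[Category:Software]]";
}

# Made-up page text containing the expected headings.
my $sample = join "\n",
    map { ("==$_==", 'placeholder text') }
    ('Description', 'Home page', 'Supported platforms',
     'Documentation', 'See also');

my $once  = add_software_category($sample);
my $twice = add_software_category($once);
print $once eq $twice ? "second pass changed nothing\n" : "second pass changed the text\n";
```

Because the second pass leaves the text unchanged, rerunning the bot over the same pages should not stack up duplicate category links.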

```
#!/usr/bin/perl -w

use strict;
use Getopt::Long;
use MediaWiki::API;

use Data::Dumper;

# For debugging
my $force = 0;
my $verbose = 0;

GetOptions( "force" => \$force,
            "verbose+" => \$verbose,
          )
  or die "problem with command line arguments\n";

# Start talking to the wiki

my $mw = MediaWiki::API->new();

$mw->{config}->{api_url} =
  'http://www.bioinformatics.org/w/api.php';

# log in
$mw->login( { lgname => 'Dan Bolser Bot',
              lgpassword => 'xxxxxx' } )
  || die $mw->{error}->{code} . ': ' . $mw->{error}->{details};

## Make the above error code a bit more convenient...

$mw->{config}->{on_error} = sub {
  print "Error code: ". $mw->{error}->{code}. "\n";
  print "Details: ". $mw->{error}->{details}. "\n\n";
  print $mw->{error}->{stacktrace}."\n";
  die;
};

warn "logged in\n";

## Get a list of pages that link to a certain other page

my $list =
  $mw->list ( { action  => 'query',
                list    => 'backlinks',
                bltitle => 'Software',
                bllimit => '250'
              },
              { max => 4 }
            );

print "got ", scalar(@$list), " pages\n";

# Process the page list

for my $page (@$list){
  #print Dumper $page, "\n";

  # Ignore redirects
  next if exists $page->{'redirect'};

  # Get the page name
  my $pagename = $page->{title};

  # Get the page
  my $mwPage =
    $mw->get_page( { title => $pagename } );

  # Get the page text
  my $text = $mwPage->{'*'};

  #print $mwPage->{'*'}, "\n";

  # Does this look like one of Jeff's pages?
  if($text =~ /^==Description==$/sm &&
     $text =~ /^==Home page==$/sm &&
     $text =~ /^==Supported platforms==$/sm &&
     $text =~ /^==Documentation==$/sm &&
     $text =~ /^==See also==$/sm){

    # Make sure we are not duplicating our efforts...
    next if $text =~ /^\[\[Category:Software\]\]$/sm;

    warn "updating $pagename\n"
      if $verbose > 0;

    # Avoid edit conflicts
    my $timestamp = $mwPage->{timestamp};

    $mw->edit( { action => 'edit',
                 title => $pagename,
                 # Avoid edit conflicts
                 basetimestamp => $timestamp,
                 text => $text. "\n\n[[Category:Software]]",
                 bot => ''
               } );

    warn "OK\n"
      if $verbose > 0;
    #exit;
  }
  else{
    warn "not touching $pagename\n";
  }

  #exit;
}
```

Journal attack
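The import script below reads `journal_list.tab` and expects six tab-separated fields per line: homepage, title, influence, copyright, access and publication charge, in that order. As a small offline sketch of that mapping, the function below turns one line into a page title and template text. The name `journal_line_to_page` and the sample line are invented for illustration, and passing `-1` to `split` (to keep trailing empty fields) is a choice made here, not something the script does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert one tab-separated line into
# (page title, wikitext for the journal template).
sub journal_line_to_page {
    my ($line) = @_;
    chomp $line;

    # -1 keeps trailing empty fields (an assumption of this sketch)
    my @data = split /\t/, $line, -1;
    die "expected 6 fields, got " . scalar(@data) . "\n"
        unless @data == 6;

    my ($homepage, $title, $influence, $copyright, $access, $charge) = @data;

    # Same minor reformatting as the import script
    $influence = '' if $influence eq 'n/a';

    my $text = <<"EOT";
{{journal
 |influence=$influence
 |copyright=$copyright
 |access=$access
 |publication charge=$charge
 |homepage=$homepage
}}
EOT
    return ($title, $text);
}

# Made-up input line, not taken from the real journal list.
my $sample = join "\t",
    'http://example.org/journal', 'Example Journal',
    'n/a', 'author', 'open', 'none';

my ($title, $text) = journal_line_to_page($sample);
print "Page: $title\n$text";
```

Printing the generated text for a few lines before the real run makes it easy to confirm that the columns of the input file are in the order the template expects.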

```
#!/usr/bin/perl -w

use strict;
use Getopt::Long;
use MediaWiki::API;

use Data::Dumper;

# For debugging
my $force = 0;
my $verbose = 0;

GetOptions( "force" => \$force,
            "verbose+" => \$verbose,
          )
  or die "problem with command line arguments\n";

# Start talking to the wiki

my $mw = MediaWiki::API->new();

$mw->{config}->{api_url} =
  'http://www.bioinformatics.org/w/api.php';

# log in
$mw->login( { lgname => 'Dan Bolser Bot',
              lgpassword => 'xxxxxx' } )
  || die $mw->{error}->{code} . ': ' . $mw->{error}->{details};

## Make the above error code a bit more convenient...

$mw->{config}->{on_error} = sub {
  print "Error code: ". $mw->{error}->{code}. "\n";
  print "Details: ". $mw->{error}->{details}. "\n\n";
  print $mw->{error}->{stacktrace}."\n";
  die;
};

warn "logged in\n";

## Process the journal list (taken from
## http://www.bioinformatics.org/wiki/Journals)

open( my $journals, '<', 'journal_list.tab' )
  or die "cannot open journal_list.tab: $!\n";

while( <$journals> ){
  chomp;

  ## Fields: homepage, title, influence, copyright, access,
  ## publication charge
  my @data =
    split(/\t/, $_);

  die "expected 6 fields on line $.\n"
    unless @data == 6;

  my $title = $data[1];
  warn "doing '$title'\n";

  ## Minor reformatting
  $data[2] = '' if $data[2] eq 'n/a';

  ## Get a page object

  my $page = $mw->
    get_page( { title => $title } );

  ## Test if this page exists in the wiki

  if($page && !defined($page->{'missing'})){
    warn "page already exists in the wiki\n\n";
    next unless
      $force == 1;
    #sleep 5;
  }
  warn "creating '$title'\n\n";

  ## Prepare the page text
  my $text = <<EOT;
{{journal
 |influence=$data[2]
 |copyright=$data[3]
 |access=$data[4]
 |publication charge=$data[5]
 |homepage=$data[0]
}}
EOT

  ## Write the page text to the wiki

  ## Using timestamp avoids edit conflicts
  my $timestamp = $page->{timestamp};

  $mw->edit( { action => 'edit',
               title => $title,
               # to avoid edit conflicts
               basetimestamp => $timestamp,
               text => $text,
               bot => '',
               # the edit API calls the edit comment 'summary'
               summary => 'automatic import',
             } );

  # Debugging
  #exit;
}

close $journals;

warn "OK\n";
```
