Simon Twigger @ MCW

bioinformatics and related stuff

Configuring a New BioMart Instance with Taverna

By Simon • Dec 1st, 2005 • Category: Uncategorized

I recently played around with installing BioMart with a view to loading our own data and using the Taverna workflow tool as a way of integrating this data with other web services. I’ve documented the procedures and steps I took in the hope that this might be of use to others.

My installation was done on a dual-processor Mac G5 with 4 GB RAM, running Mac OS X 10.4 and MySQL 4.0.23-standard.

Step 1 - Download and install BioMart

  • I wanted to install Uniprot so I FTP’d the current version of Uniprot Mart from the BioMart FTP site.
  • I then followed the BioMart installation instructions as written, creating a MySQL database called msd_mart_3 and importing the Uniprot table data into the database.
  • I had some annoying install issues related to file and directory permissions for the mysqlimport command, so here’s what should work to avoid these:
  • Download the data files from the FTP site into a directory that is readable by the mysql user that you use to load the database, and make sure all the files are readable by this user too. Installing on OS X, I ended up moving the FTP’d directory and files to /tmp (readable by all) and making the directory and all the downloaded files world readable: chmod 666 *.*
  • Uncompress the files: gunzip *.gz
  • Create the tables as described in Section 3.1.5 of the BioMart manual.
  • Run mysqlimport as described. Note that the files are all *.txt, not *.txt.table as in the documentation. If there are mysqlimport errors related to ‘can’t stat’ (error codes 2 or 13), refer to this page for help on the paths and permissions that mysqlimport needs: http://electrictoolbox.com/article/mysql/mysqlimport-errors/
  • On my machine (dual processor, 4Gb RAM) this all took a while to load but I didn’t have to set a tmpdir for MySQL to use for scratch space during the load process.
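
The file preparation steps above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name prepare_mart_files and the /tmp/uniprot_mart directory are hypothetical, and the mysqlimport line is left as a comment because the user name and paths depend on your setup.

```shell
#!/bin/sh
# Sketch only: make downloaded mart files readable and uncompress them.
# The function name and the directory you pass in are illustrative assumptions.
prepare_mart_files() {
    dir="$1"
    # The directory needs read and execute (search) permission for other users.
    chmod a+rx "$dir" || return 1
    # Uncompress every .gz file; skip quietly if none are present.
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue
        gunzip "$f" || return 1
    done
    # Make the resulting data files world readable.
    for f in "$dir"/*.txt; do
        [ -e "$f" ] || continue
        chmod a+r "$f" || return 1
    done
}

# Example use (directory and user name are placeholders):
#   prepare_mart_files /tmp/uniprot_mart
#   mysqlimport -u mysql_username_here -p msd_mart_3 /tmp/uniprot_mart/*.txt
```

The sketch uses chmod a+r, which only adds read permission. That is narrower than chmod 666 and should be enough for the load, since the files only need to be read.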

Step 2a - Test BioMart Installation

  • Having installed BioMart on my machine I wanted to test it to make sure it was functioning as expected.
  • I downloaded Martj, the Java library and tools developed by the BioMart team, and used martexplorer to connect to the database. You add a new mart connection via the Settings > Add Mart menu option, which brings up a configuration screen like the one below.

Martj Database Config

  • Make sure the user that you connect with has SELECT privileges on your new mart database from the host that you are connecting from. The MySQL documentation gives more information about granting users appropriate access to databases from local and remote machines. If this isn’t correct, you will get a connection error. If the mysql command line client is available on the machine you are connecting from, you can test by trying to log in to the mart database directly. If this works, Martj should have no problem.
  • Once you have connected to the database using martexplorer, you can create a new query to test the connection and see if you can get some data (see figure below).

Martj Database Query
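
The command line check mentioned above might look like the following sketch. The helper name check_mart_access is hypothetical, and the host, user and database are whatever you entered in the martexplorer configuration screen. The GRANT statement in the comments is an assumption about how an administrator might fix a denied connection, not something taken from the BioMart instructions.

```shell
#!/bin/sh
# Sketch only: check that a user can read the mart database from this machine.
# check_mart_access is a hypothetical helper name.
check_mart_access() {
    host="$1"; user="$2"; db="$3"
    # -p makes the client prompt for the password.
    # SHOW TABLES lists the tables this user is allowed to see in the database.
    mysql -h "$host" -u "$user" -p -e 'SHOW TABLES;' "$db"
}

# Example use (placeholder values):
#   check_mart_access hostname.mcw.edu mysql_username_here msd_mart_3
#
# If access is denied, an administrator could grant read access with something
# like the following (syntax as accepted by MySQL 4.x; newer servers differ):
#   GRANT SELECT ON msd_mart_3.* TO 'mysql_username_here'@'client.host.name'
#       IDENTIFIED BY 'mysql_password_here';
```

If the table list comes back, the same host, user and password should work in martexplorer.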

Step 2b - Test BioMart Installation with a config file

  • You can configure new sources for martexplorer using the Add Mart menu option as described above. These connections are stored on your local machine (~/.martj_preferences on UNIX) so they are available when you use martexplorer later on.
  • Another way to configure BioMart is with a registry configuration file, defaultMartRegistry.xml, which can be found in martj-0.3/data/defaultMartRegistry.xml.
  • This isn’t a required step, as the connection we made above will work fine. However, to connect with Taverna you will need a configuration file like this, so this is a good way to test it before you throw it into the Taverna environment.
  • When you first look at the file it looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE MartRegistry>
<MartRegistry>

<RegistryDBLocation
name = "pointer"
databaseType = "mysql"
host = "martdb.ebi.ac.uk"
port = "3306"
instanceName = "central_registry"
user = "anonymous"
password = ""
/>
</MartRegistry>

  • This contains a RegistryDBLocation that points to the main BioMart registry at the EBI. You can add your new mart to this configuration file by adding a <DatabaseLocation> entry. For the mart installed above, it might look something like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE MartRegistry>
<MartRegistry>

<RegistryDBLocation
name = "pointer"
databaseType = "mysql"
host = "martdb.ebi.ac.uk"
port = "3306"
instanceName = "central_registry"
user = "anonymous"
password = ""
/>

<DatabaseLocation
name = "biomart_test"
databaseType = "mysql"
host = "hostname.mcw.edu"
port = "3306"
instanceName = "msd_mart_3"
schema = "msd_mart_3"
user = "mysql_username_here"
password = "mysql_password_here"
includeDatasets = "uniprot"
default = "1"
visible = "1"
/>

</MartRegistry>

  • When I was testing my connection, the EBI MySQL servers were having some problems and the connection would hang, so I ended up commenting out the RegistryDBLocation block and leaving just the DatabaseLocation block. This seemed to work fine.
  • Once your entries are in the defaultMartRegistry.xml file, you can share this file with other users so they don’t have to manually configure different data sources.
  • To test that it’s working, remove any files from ~/.mart_preferences that were added in the previous manual configuration step (they all begin with ‘.’ so use ls -a on a UNIX machine to see them).
  • Now that we know the configuration file works as expected, you can move on to the final step of the process: connecting your Mart database to Taverna.
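
For reference, a registry file with the EBI RegistryDBLocation block commented out, as described above, could be generated with a sketch like this one. The helper name write_local_registry is hypothetical, and the attribute values are the same placeholders used in the example earlier in this step, so substitute your own host, user and password.

```shell
#!/bin/sh
# Sketch only: write a registry file that lists just the local mart, with the
# EBI RegistryDBLocation block wrapped in an XML comment.
# write_local_registry is a hypothetical helper name.
write_local_registry() {
    out="$1"
    # The quoted 'EOF' stops the shell from expanding anything in the XML.
    cat > "$out" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE MartRegistry>
<MartRegistry>

<!--
<RegistryDBLocation
name = "pointer"
databaseType = "mysql"
host = "martdb.ebi.ac.uk"
port = "3306"
instanceName = "central_registry"
user = "anonymous"
password = ""
/>
-->

<DatabaseLocation
name = "biomart_test"
databaseType = "mysql"
host = "hostname.mcw.edu"
port = "3306"
instanceName = "msd_mart_3"
schema = "msd_mart_3"
user = "mysql_username_here"
password = "mysql_password_here"
includeDatasets = "uniprot"
default = "1"
visible = "1"
/>

</MartRegistry>
EOF
}

# Example use (output path is a placeholder):
#   write_local_registry ./defaultMartRegistry.xml
```

If xmllint happens to be installed, running xmllint --noout on the generated file is a quick way to confirm it is well-formed before pointing any tool at it.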

Step 3 - Linking your Mart into Taverna

  • We now have BioMart installed and working, and we have a defaultMartRegistry.xml file that we know works for our installation. It’s now time to connect this all into the Taverna system.
  • There are two ways to do this: manually via the ‘Add new BioMart Instance’ option in the Available Services window, or by using the registry file you just created. The manual route works fine (Taverna 1.2, Mac OS X) but has to be repeated each time you use Taverna, so if you want to use a resource regularly, adding an entry to the configuration file is the best way to go.
  • To add this entry, find the mygrid.properties file (on a Mac it can be found here: /Applications/Taverna1.2/Taverna1.2.app/Contents/taverna-workbench-1.0/conf/mygrid.properties), then find the DEFAULT SERVICES section and the line that reads:

taverna.defaultmartregistry = http://www.ebi.ac.uk/~tmo/defaultMartRegistry.xml

  • This is the existing default setting that points back to EBI and provides access to all the BioMart instances at EBI.
  • The file has the same format as the one we created above, so if you have access to a web server you can place your defaultMartRegistry.xml file in an accessible place on your server and alter the line in mygrid.properties to point to your file. When you fire up Taverna, your Mart instances will then be available for use.
  • It is also possible to append your own registry URL(s) to this line, separated by commas, as shown below (see also the taverna.defaultwsdl configuration that is above the taverna.defaultmartregistry line in the mygrid.properties file).

taverna.defaultmartregistry = http://www.ebi.ac.uk/~tmo/defaultMartRegistry.xml, http://myurl.com/defaultMartRegistry.xml
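
If you would rather script this edit than make it by hand, a minimal sketch follows. The helper name add_mart_registry is hypothetical, and it assumes the property sits on a single line as shown above. It is worth keeping a copy of mygrid.properties before trying it.

```shell
#!/bin/sh
# Sketch only: append a registry URL to the taverna.defaultmartregistry line.
# add_mart_registry is a hypothetical helper name. The URL is pasted into a sed
# replacement, so it must not contain '|', '&' or backslashes.
add_mart_registry() {
    props="$1"; url="$2"
    tmp="$props.tmp"
    # Rewrite to a temporary file and move it into place, which avoids the
    # differences between GNU and BSD versions of "sed -i".
    sed "s|^\(taverna\.defaultmartregistry[ ]*=.*\)\$|\1, $url|" "$props" > "$tmp" &&
        mv "$tmp" "$props"
}

# Example use (path and URL are placeholders):
#   cp mygrid.properties mygrid.properties.bak
#   add_mart_registry mygrid.properties http://myurl.com/defaultMartRegistry.xml
```

Other lines in the file are passed through unchanged; only the line that starts with taverna.defaultmartregistry gets the extra URL.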

You can find more information on configuring the BioMart registry file in Section 4.1 of the BioMart documentation. The Taverna documentation has a lot more information on integrating Mart instances with workflows and other web services.
