Friday 2 August 2013

Migrating from InterProScan v4.x to InterProScan v5

First, a bit of background...

It's been a while in development but finally we are ready to make InterProScan v5 the official release of InterProScan, meaning that v4.x will be retired.  The development of InterProScan 5 was motivated by several factors - both feedback from our users and internal needs to have a robust pipeline for keeping all the calculations of InterPro signatures against UniProtKB proteins up-to-date.

As explained on the Google Code wiki for InterProScan 5, there are numerous differences between the two versions, most notably the change from a Perl-based architecture to a Java-based one.  We've also simplified the installation process, added a new analysis type (Phobius), improved the output formats available for results and tailored this version to work better at large scale than ever before.

Migrating from v4 to v5


To help users move from v4 to v5, we have created a page on our Google Code wiki that explains what might need to be changed.  The initial version of the page (which we will add to as and when users ask for clarification) describes the changes to the available command-line options and to the output formats generated.  This is mainly of interest to users who have downloaded InterProScan and installed it locally.

Changes will also be reflected in the EBI-hosted versions, with both the web interface and SOAP/REST web services changing to v5 shortly.  We will continue to run the hosted InterProScan v4.8 in parallel with v5 whilst people switch over, but the older version will no longer be actively developed (and will receive no further data releases).

Timelines


It is our intention to officially launch InterProScan 5 with the next public release of the InterPro database, currently scheduled for 19th September 2013.  

If you have any questions or suggestions about the above, please contact us.

Thursday 1 August 2013

InterPro 43.1 is released, fixing the previous data problems

Good news!

Last week we were able to finally release an update to release 43 to fix the problems we identified.

43.1 contains the same member databases as v43.0 (i.e. an upgrade to Pfam from v26.0 to v27.0 compared with the InterPro 42.0 release) but there are 2 main differences:
  1. The pre-calculated match information we supply via the InterPro website and the downloadable match_complete.xml file that is used by InterProScan is now correct
  2. Additional Pfam signatures have been integrated into InterPro entries since v43.0.  This is why in the release notes for 43.1, 13579/14831 Pfam signatures belong to an InterPro entry but in 43.0 only 13079/14831 signatures were included.  (Our curators have been busy!)
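
To put the Pfam integration figures above in context, here is a small sketch that turns them into percentages.  The two counts and the Pfam total are the ones quoted in the release notes; the variable and function names are just illustrative.

```python
# Pfam signature counts quoted in the 43.0 and 43.1 release notes.
TOTAL_PFAM_SIGNATURES = 14831
INTEGRATED = {"43.0": 13079, "43.1": 13579}


def coverage_percent(integrated: int, total: int = TOTAL_PFAM_SIGNATURES) -> float:
    """Return the percentage of Pfam signatures that belong to an InterPro entry."""
    return 100.0 * integrated / total


# Number of Pfam signatures integrated between 43.0 and 43.1.
newly_integrated = INTEGRATED["43.1"] - INTEGRATED["43.0"]

for release, count in INTEGRATED.items():
    print(f"InterPro {release}: {count}/{TOTAL_PFAM_SIGNATURES} "
          f"({coverage_percent(count):.2f}%)")
print(f"Newly integrated since 43.0: {newly_integrated}")
```

In other words, 500 more Pfam signatures belong to an InterPro entry in 43.1 than in 43.0.
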
We apologise again for this problem and the inconvenience it might have caused our users.  We have now put in place measures to ensure that this particular problem won't happen again.

Sarah

Monday 17 June 2013

InterPro is temporarily reverted to v42.0

An update on our progress.  We decided to revert the InterPro website back to the previous release's data (v42.0).  This means that the Pfam release that was incorporated into release v43.0 is no longer visible via the website, at least until the fix is completed.  The full status of all our services is now as follows:

InterPro website

Currently displays v42.0 data - all protein match information visible on the site is now correct and can be used with confidence.  The version of Pfam that is visible is v26.0, however.

InterProScan5 (downloadable)

The InterProScan5 current version (RC6) was built against v42.0.  We hadn't built and distributed the version (RC7) that was for v43.0 of the data and so users are still safe using InterProScan5 RC6.

InterProScan4 (downloadable)

Standalone InterProScan4 (downloadable from our FTP site) had data released for v43.0, which included Pfam 27.0.  However, only the match_complete.xml file was affected by the data problem.  Users can either run their InterProScan4 installation with the 43.0 data and the -nocrc option on the command-line, or download the data for release 42.0 from the FTP site (ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/DATA/iprscan_MATCH_DATA_42.0.tar.gz and ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/DATA/iprscan_DATA_42.0.tar.gz) and revert to that version.
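
For anyone scripting the revert, here is a minimal sketch that builds the two download URLs given above.  The function name is hypothetical, and the idea that other releases follow the same file naming pattern is an assumption; only the 42.0 URLs are confirmed in this post.

```python
# Base directory of the InterProScan4 data files, as given in this post.
DATA_BASE_URL = "ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/DATA/"


def iprscan4_data_urls(release: str) -> list:
    """Build the match-data and data tarball URLs for one InterPro release.

    Assumption: the file names follow the pattern seen for release 42.0.
    """
    return [
        f"{DATA_BASE_URL}iprscan_MATCH_DATA_{release}.tar.gz",
        f"{DATA_BASE_URL}iprscan_DATA_{release}.tar.gz",
    ]


# Print the URLs for the release to revert to; fetch them with any FTP client.
for url in iprscan4_data_urls("42.0"):
    print(url)
```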

InterProScan4 (EBI-hosted)

InterProScan4 is currently running using InterPro release 42.0 data and can therefore be used with confidence.  The version of Pfam included is v26.0.

Next steps

We hope to make a new public release next week, which will bring Pfam 27.0 and correct protein match information to the website.  Updates to InterProScan v4 data and InterProScan 5 (RC7) will follow shortly afterwards.  These updates will be announced on the Twitter feed and mailing lists as v43.1.

Again, many thanks for your patience whilst we sort out these issues.

Friday 14 June 2013

Update on fix to InterPro 43.0

We're still working on fixing release 43.0 and we are aiming to release a fixed version (v43.1) next week.  We're sorry it's taking so long to sort out but we are working hard to do so.  It's highly likely we'll temporarily revert the public data to release 42.0 if we've not fixed 43 by Monday.

InterProScan

We have had some questions about the use of InterProScan.

Users of InterProScan v5 will be pleased to know that we noticed the problem with 43.0 before we had updated I5.  All the data coming from InterProScan 5 should therefore be correct and you can use it with confidence.

InterProScan4, however, is affected by this problem if you have not used the "-nocrc" option on the command-line of the standalone version, or if you have used the EBI-hosted version without specifying "-nocrc".  Running InterProScan 4 with the lookup disabled (using "-nocrc") will not use the problematic dataset, and so the results should be OK.

Once again, we apologise for any inconvenience this might have caused our users.

Monday 10 June 2013

Problem with InterPro release 43.0

For the first time, we've discovered a major problem with the match data generated for InterPro.

This has resulted in incorrect InterPro calculations for approximately 3 million protein sequences in the UniParc database.  It is therefore highly likely that a number of UniProtKB proteins will have incorrect match data visible in the InterPro web interface.  At the same time, we have noticed that some of the pathway mappings associated with InterPro entries (e.g. mappings of entries to KEGG, Reactome, etc.) are incorrect.

We are currently working to fix this problem and re-release the data as soon as possible.  Note that this potentially affects the data in both the InterPro website and InterProScan XML files.

We apologise for the inconvenience and will make a new announcement once the problem is fixed and the new data is available (it will be called InterPro v43.1).

Please let us know if you have any questions about the above by using our support channels.

Tuesday 26 March 2013

New (RESTful) interface for InterPro


In September 2012, InterPro launched a new look and feel for its website (http://www.ebi.ac.uk/interpro/).  We re-designed the pages based on feedback from our users (we did a lot of usability testing, user surveys and talking to people directly about what they wanted from our website).  We hope that, as a consequence, people have found it much easier to find the information they need and the site more enjoyable to use.
The redesign of the website was almost entirely cosmetic and very little of the underlying codebase and database was changed before it was released. However, immediately after launching the re-styled site, we decided to re-factor the entire web application and its back-end database. This may seem like a back-to-front way to do things, but doing things that way round meant we were confident that all of the data we were delivering through our web app was utilised in some way. Rather than using our production database schema (as in the past), we generated a new query-optimised warehouse and built a Spring MVC web application on top of it. The new web site that we launched on March 11th is faster, less prone to failures and much easier for the team to maintain (hurrah!). An additional happy consequence is that we've been able to produce RESTful URLs for the data in InterPro.

For example, if you:


There are RESTful URLs for all of the entry and protein entity pages in InterPro.  Please take a look around the site and let us know if there are any features that you think we should add.  If you link to us already, please update your resource to use these new URLs.

Sarah
on behalf of the InterPro team

Wednesday 20 March 2013

A new blog for InterPro

So many changes have been happening at InterPro recently, we thought it was time we explained a bit of what we've been doing.  And starting a blog seemed like the perfect way to do this.

So, from time to time, we'll be using this blog to explain what new features are coming in the InterPro website and InterProScan software; to highlight some of the content of the database and to give you a heads-up when we are going to be attending conferences, or giving training courses.

Most importantly, it gives our users a chance to give us direct feedback on these things, adding to the number of channels you can use to contact us.

Currently, we have a Twitter account and a number of mailing lists.  These are listed below.

Twitter: @InterProDB

interproscan-announce@ebi.ac.uk - used to announce InterPro releases and major changes.  Subscribe to this mailing list if you want to receive these kinds of updates directly to your inbox

interhelp@ebi.ac.uk or support@ebi.ac.uk - contact us with questions, suggestions or problems with using InterPro or InterProScan

We hope that our users will find this useful!

Sarah
on behalf of the InterPro team