International Union of Biological Sciences Taxonomic Databases Working Group
1999 TDWG Meeting, Harvard Herbarium, Cambridge, USA, 29th - 31st October, 1999
Registration for the meeting was available from 12 until 5.30pm on Friday, 29th October, followed by a "mixer" reception, giving everyone the chance to meet fellow attendees and start the discussion!
Saturday 30th October
Mike Donoghue, Director of the Harvard University Herbaria, welcomed everyone to Harvard, and to the 1999 TDWG (Taxonomic Databases Working Group) Meeting. He thanked the organisers, led by David Boufford, and hoped that it would be a useful meeting. He felt that the topics under discussion were very important and the outcome was of general interest.
Stan Blum - TDWG and the Standards Process The main aim of the meeting was to clarify some general standards issues, such as the role of standards in interoperability, when recommendations might be more appropriate, and TDWG's general role in facilitating the development of standards and recommendations. To begin this, Stan Blum, from the California Academy of Sciences, gave a thought-provoking presentation about standards and where TDWG fits in [see http://www.tdwg.org/process/tdwg99_blum.html]. He began by identifying TDWG's original aims, which have since been expanded to include all biological organisms. In the past TDWG has published several "standards" but these have not been widely accepted - perhaps because they are too specific, not well publicised and not easily available over the internet. After describing the different types of standards, Stan suggested ways in which TDWG could be useful in the development of more widely accepted standards, and how the organisation of TDWG could be improved.
Dave Vieglas - Integrating Disparate Biodiversity Resources using the Information Retrieval Standard Z39.50 Dave Vieglas, from the Museum of Natural History, University of Kansas, illustrated the use of a general information retrieval standard to retrieve biological information from very different resources - the Z39.50 search and retrieval protocol (maintained by the Library of Congress). This protocol is a stable standard which has been used for several years now, since before HTML and XML, to transfer structured metadata. The Z39.50 server software presents different schemas so that they all look the same to the Z39.50 client. The server generally returns data in one of 3 record formats: SUTRS (unstructured text), MARC (library catalogue format) or GRS-1 (XML-like tree structure). There are some problems with the protocol (some areas are not well defined), but these are being worked on, and with the work of ZBIG, the Z39.50 Biology Implementers Group, the protocol now forms the basis of a number of projects focussed primarily on collections, and has led to the development of "The Species Analyst", see http://habanero.nhm.ukans.edu/SpeciesAnalyst. Using this tool, it is possible to search several databases at once, and merge the results into applications such as MS Excel. A web interface has been developed and tests have been carried out with ITIS (Integrated Taxonomic Information System), GenBank and Zoological Record. There can be problems with the merged data - names may not necessarily mean the same in each database, and this is really the limiting factor for effective combination of collections databases. To help with this, the Taxonomic Authority system is being developed - there is a need for two distinct though interconnected databases: a name tracking database (derived from the printed literature); and a name grouping (classification) database.
There are scientific name translations between these, to enable searches of local name databases using the local name. This "Z-Thes" profile is currently only partially implemented, and a major limitation is populating the taxonomic lineage database. What about Z39.50 versus http + XML? A well-defined stable standard versus a rapidly developing but widely implemented standard with immense industry support? Both return structured data, so both can co-exist. It is straightforward to encode Z39.50 in XML. So adoption of newer ("easier") technology (e.g. XML) does not preclude existing infrastructure.
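The name problem described above, where merged results only make sense once each source's names are translated to a common one, can be illustrated with a minimal Python sketch. It does not use Z39.50 or The Species Analyst; the synonym table, source names and records are hypothetical, and a real system would draw the translations from the name tracking and name grouping databases.

```python
from collections import defaultdict

# Hypothetical translation table: each local name maps to one accepted name.
ACCEPTED_NAME = {
    "Felis concolor": "Puma concolor",
    "Puma concolor": "Puma concolor",
}

def merge_by_accepted_name(result_sets, accepted=ACCEPTED_NAME):
    """Group records from several sources under one accepted name."""
    merged = defaultdict(list)
    for source, records in result_sets.items():
        for record in records:
            name = record["name"]
            # Names missing from the table are kept unchanged.
            key = accepted.get(name, name)
            merged[key].append({**record, "source": source})
    return dict(merged)

# Illustrative records from two made-up collections databases.
results = {
    "museum_a": [{"name": "Felis concolor", "locality": "Kansas"}],
    "museum_b": [{"name": "Puma concolor", "locality": "Texas"}],
}
merged = merge_by_accepted_name(results)
```

Without the translation step the two records would sit under different names and appear to describe different taxa.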
John Rumble - The Standards Process John Rumble, from the National Institute of Standards and Technology, and currently President of CODATA (Committee on Data for Science and Technology, see http://www.codata.org), gave a concise presentation on the need for standards, and the standards process. His presentation can be viewed. He described how CODATA fit into this process, illustrating how it might be useful for TDWG to be associated with CODATA (thus beginning the debate of where TDWG should "sit" institutionally, to be more visible and effective). CODATA, a multi-national, interdisciplinary scientific committee of ICSU (International Council of Scientific Unions), has a lot of experience in the development of data reporting requirements. The "workings" of CODATA involve task groups, working groups and commissions. The next meeting will be in October 2000, and there is a quest for new working groups, so TDWG's interest is well timed.
Walter Berendsohn - TDWG & CODATA/GBIF Walter Berendsohn (Botanischer Garten und Botanisches Museum, Berlin) began by mentioning that the Global Plant Checklist project he was involved in was "sponsored" by CODATA and they had found it very useful in bringing people together. But he also wanted the meeting to consider possibilities with GBIF, the Global Biodiversity Information Facility (see http://www.york.biosis.org/gbif/index.htm for further details). One of their prime areas of concern was the creation of checklists, and digitisation of databases, and Walter thought it should be possible for TDWG to get secretarial support, and financial help to attend standardisation meetings etc. He asked for permission to raise this with GBIF officials, and there was general agreement that he should go ahead with this.
The meeting continued after lunch with some presentations representing the Taxon x Character data "problem", experienced by users of packages such as DELTA, LucID, PAUP, etc. One aim of the meeting was that TDWG could facilitate a dialogue among the software developers.
Mike Dallwitz - Data Requirements for Natural-language Descriptions and Identification Mike Dallwitz (CSIRO, Canberra) discussed two uses of descriptive taxonomic data, the generation of descriptions in natural language and identification, and how the current DELTA system could be used for these. Mike's paper can be seen in full at http://biodiversity.uno.edu/delta/www/descdata.htm.
Kevin Thiele - LucID Kevin Thiele (Centre for Biodiversity, Canberra) began by saying that people might want different data structures for different purposes, and that this might also be true for characteristics, e.g. LucID for identification; DELTA for natural language description; NEXUS for phylogeny, a taxon by character matrix; LucID for a state by character matrix (whether characteristics are present or not). He felt that there had been a need to develop LucID, because DELTA was not adequate for keys. They had received criticism for not using the "standard", but Kevin thought that some of the competition had been good for all concerned. With LucID, it is possible to have relationships between blocks of data - the matrices can be linked together to build hierarchical sets of data (keys and sub keys).
An animated discussion followed these talks, highlighting the various merits of LucID, DELTA and NEXUS. It was suggested that it would be good if a superset from the three systems could be created and become a standard.
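For readers unfamiliar with the taxon by character matrix that these packages share in some form, the following Python sketch shows the general idea and how it supports identification by elimination. It is an illustration only: the taxa, characters and states are made up, and the structure is not the DELTA, LucID or NEXUS file format.

```python
# Hypothetical taxon-by-character matrix: each taxon maps characters to
# the set of states it can show.
MATRIX = {
    "Taxon A": {"leaf arrangement": {"opposite"}, "flower colour": {"white", "pink"}},
    "Taxon B": {"leaf arrangement": {"alternate"}, "flower colour": {"white"}},
    "Taxon C": {"leaf arrangement": {"opposite"}, "flower colour": {"yellow"}},
}

def remaining_taxa(matrix, observations):
    """Return the taxa consistent with every observed character state."""
    keep = []
    for taxon, characters in matrix.items():
        # A taxon survives if each observed state is among its recorded
        # states; characters not recorded for a taxon do not exclude it.
        if all(state in characters.get(char, {state})
               for char, state in observations.items()):
            keep.append(taxon)
    return keep

# Each further observation narrows the list of candidate taxa.
candidates = remaining_taxa(MATRIX, {"leaf arrangement": "opposite",
                                     "flower colour": "white"})
```

The same matrix could also drive a natural-language description by printing each taxon's characters and states in turn, which is one reason a shared superset format was attractive.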
Gregor Hagedorn - How Should Original Observations be Recorded and Documented? Gregor Hagedorn (Institut für Mikrobiologie, BBA, Germany) began by saying that much of the recording of original descriptive data was not integrated into current central databases, but that relational databases were good at handling complex data. He summarised the advantages of relational databases as provision of multi-user support, network support, record locking, and security. The use of XML will help with the transfer of data, but the documentation about data needs to be better for interoperability. XML isn't the complete answer but will help.
The final session of the day was a "Round Table Discussion" on XML and descriptions of organisms, following short presentations. Jim Beach, from the Museum of Natural History, University of Kansas, introduced the session by talking about the increasing popularity of XML and its usefulness in data exchange.
Robert Stevenson and Robert Morris, from the University of Massachusetts, Boston, talked about their Electronic Field Guide Project - see http://www.cs.umb.edu/efg/ for details. The web interface uses no HTML, just style sheets (XSL). They demonstrated that this makes it particularly easy to take data and display it differently, as well as easing data exchange.
Don Kirkup, from the Royal Botanic Gardens, Kew, described a project at Kew that was using XML to mark up the Flora of East Africa. He said they were interested in XML because there are a lot of descriptive data that are currently not available to most people, mainly because the current standards are not flexible enough.
Dave Vieglas gave a further demonstration of retrieving data from a variety of sources and displaying it by using XML, e.g. producing an author list in Word.
A lively discussion followed, showing that there was much interest in the use of XML, and an increasing number of projects beginning to actually use it.
The discussion was widened to include the positioning of TDWG. It seemed to be generally agreed that approaching both GBIF and CODATA was desirable, since there was much work to be done, and any help would be gratefully accepted.
Sunday 31st October
Bryan Heidorn - Biological Information Browsing Environment (presentation: http://oak.lis.uiuc.edu/~pbh/TDWG1999/) Bryan Heidorn (University of Illinois) has been looking at dealing with natural language strings at a higher level than fields. He has been working on web-based information retrieval, using vector model retrieval, allowing multiple searches simultaneously. For further information, look at http://bibiana.lis.uiuc.edu/~webvibe/ with ID = vibe and password = VIBE. It is planned to test the system for ease of use by asking high school students to try to identify unknown plants using information retrieval rather than keys. The results are not specific, but it is possible to get close, then browse the search results to get the answer. Currently the project is using Excite and Alta Vista, but it is hoped to make use of Z39.50 servers in future.
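As a rough illustration of what vector model retrieval means here, the Python sketch below ranks short free-text descriptions against a query by cosine similarity of word-count vectors. It is a generic textbook version under simplifying assumptions (whitespace tokenising, raw term counts, made-up descriptions) and is not the system demonstrated at the meeting.

```python
import math
from collections import Counter

def vectorise(text):
    """Turn text into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank(query, documents):
    """Return document ids ordered from most to least similar to the query."""
    q = vectorise(query)
    scores = {doc_id: cosine(q, vectorise(text))
              for doc_id, text in documents.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Made-up descriptions standing in for field-guide text.
DOCS = {
    "oak": "tree with lobed leaves and acorns",
    "maple": "tree with opposite lobed leaves and winged fruit",
    "pine": "tree with needles and cones",
}
ranking = rank("lobed leaves acorns", DOCS)
```

The ranked list gets the user close to an answer without a key, which matches the browse-the-results approach described above.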
Kevin Thiele and David Yeates - LucID v2 beta Kevin Thiele and David Yeates (University of Queensland, Australia) demonstrated the beta of version 2 of LucID. They reported that the interface was not very different from the current version, but there was added functionality such as the ability to download keys from the internet using FTP, and to incorporate HTML to reference other sites. This new version is scheduled for release in the next 6 months.
TreeBASE stores phylogenetic trees and the data matrices used to generate them from published research papers. This is quite different from The Tree of Life project, which is a collection of web pages stored hierarchically, with lots of phylogenetic information, but not really a database. TreeBASE is based on the literature, not summaries of the literature, and does not make judgements. It is available over the web, see http://herbaria.harvard.edu/treebase/, and keyword searches are based on abstracts, although the abstracts are not yet publicly available. People who are about to get their phylogenetic studies published are asked to submit them to TreeBASE as well - in fact, some journals are now requiring this (e.g. Mycologia).
Nicholas Lander - FloraBase http://www.calm.wa.gov.au/science/florabase.html Nicholas Lander (Western Australian Herbarium) demonstrated the searchable web interface of FloraBase. This database has nomenclatural, specimen, descriptive and library data as well as digital maps and photos. Many community sources such as floral societies have contributed their time to help populate the database. It is an Oracle-based system, with descriptions in DELTA (but accessible online), and also follows the HISPID standard (see below).
Tony Kirchgessner - New York Botanical Garden Online Specimen Catalog During the morning break, a poster by Tony Kirchgessner from the New York Botanical Garden was presented. This described the NYBG's Online Specimen Catalog - see http://www.nybg.org/bsci/hcol for further details.
Barry Conn - HISCOM http://www.rbgsyd.gov.au/HISCOM/ Barry Conn (Royal Botanic Gardens, Sydney) reported that overall, HISPID (the Herbarium Information Standards and Protocols for Interchange of Data) had been successful in enabling data sharing, but that it was now time to look back and assess the direction for future work. He felt there was a need to work more closely with collections managers - quite often the people who would benefit from exchange standards do not know about them. The Virtual Australian Herbarium is in the early stages of development, but was a big step politically. It "will comprise primarily an interactive web front end linked to a shared scientific names database (the Australian Plant Names Index) with remote internet links to distributed specimen and other taxon- or specimen-associated datasets (e.g. vouchered spatial collection locality data) in the Australian herbaria".
Stuart Poss - Database-related Activities Associated with ICZ XVIII [18th International Zoological Congress] Stuart Poss (Gulf Coast Research Laboratory, Ocean Springs, MS) started by saying that zoology was very fragmented and that there had not been an international conference for 40 years. Information on the conference can be seen on the web site at http://museum.ims.usm.edu/~musweb/icz_xviii/icz_home1.html [address modified 26th January 2000; previously http://www.ims.usm.edu/~musweb/icz_xviii/icz_home.html], with a link to the general discussion topic "Coordinated Development and Use of Collections Databases" organised by Stuart. He hopes there will be some pre-conference activity on the web, and wondered if there could be some input from botanists, who appear to be much more active in this area.
Charles Hussey - Natural History Museum Projects Update Charles Hussey (NHM, London) briefly described the NHM's new Collections Management System, using a nested relational product by Unidata. The public will not have direct access to the operational systems, but will have access to a summary system covering all five departments in the museum. On top of this there will be a Dublin Core multi-platform layer. Library access is also needed, as are subject specific "hubs" to feed into UK higher education projects. Other projects the NHM is involved in include CIMI (Consortium for the Computer Interchange of Museum Information - see http://www.cimi.org), SPICE (for Species 2000 - NHM are contributing datasets - see http://biodiversity.soton.ac.uk/spice/), VIADOCS (advances in data capture, in conjunction with University of Essex), and projects in conjunction with the UK National Biodiversity Network (see http://www.nbn.org.uk).
Larry Speers - Biological Data Mining Larry Speers (ECORC, Agriculture Canada) emphasised that access to names was very important, especially vernacular names for clients not used to scientific names.
ITIS*ca, the Canadian version of ITIS, illustrates this (http://res.agr.ca/itis/). It is based on the same data model as ITIS, and the text for web pages comes out of the database - this means that different language versions can be produced within a day. The system "enables users to search for biological information associated with scientific names, vernaculars and synonyms and then seamlessly expand their query to the rest of the Internet" i.e. it is all based on names, but "mines" further information from the internet, interconnecting the nomenclatural core to other sources held elsewhere. Synonymy is built in, and the system allows for multiple taxonomies and classification systems. The system uses the Z39.50 software BookWhere, and there are links to The Species Analyst at http://habanero.nhm.ukans.edu/SpeciesAnalyst in Kansas. The live version of ITIS*ca now uses XML.
Sunday pm - TDWG Plenary Session
Subgroup reports:
Economic Botany Mark Jackson from Kew gave a report for Frances Cook, chair of the economic uses sub-group. The sub-group had circulated a questionnaire to get feedback, so Mark summarised the results of that exercise:
a) users wanted a web site where they could go to view the standard itself
b) developers wanted information as to how the standard might be implemented
c) some people wanted to extend the standard
d) people wanted to reappraise the structure of the standard
e) some people wanted a simpler version
The members present strongly urged that a) and b) be addressed, and Mark said that he thought there was no technical problem with Kew hosting this information by the next meeting. He suggested people take a look at the Internet SEPASAL database to see an example of the standard in use - see http://www.rbgkew.org.uk/ceb/sepasal/internet/
Geography Mark Jackson also gave a report for Dick Brummitt, chair of this sub-group. The sub-group had completed a revision of the standard and intended to publish it by the end of the year, and Mark provided a list of the changes to anyone interested. Since the changes are a revision rather than a new standard, the meeting felt that publication could proceed. Several people asked that the standard be published on the web, which Mark said he thought would be possible. In fact, it emerged at this point that some changes had been submitted to the TDWG constitution which required standards to be made electronically available. Peter Stevens announced that both Dick and Sue Hollis were retiring from the sub-group, so a new chairperson was needed, raising the question of how to deal with subgroups with no real director. Stan Blum suggested that information about subgroups and "standards" be posted openly on the web site, but it was pointed out that one of the traditional ways of working within TDWG was that interested groups discussed interim documents etc before presenting their thoughts and recommendations to the main body. It was generally agreed that standards should be widely available on the internet.
Taxonomic literature, ed. 2 - TL2 Richard Pankhurst reported his progress on transferring the TL2 information to database format. Currently, only some of the TL2 information is available in the database, and there were some inconsistencies e.g. abbreviations for Africa. There was discussion on where TL2 belonged and how data could be made more available, with discussion on the possibility of getting funding to help with the databasing work. Richard and Peter agreed to work on ways of finding a home for and upkeep of TL2. It was thought that a group of those interested should get together to discuss this further.
Francisco Pando (Paco) had sent in his report and a copy should have been in everyone's registration packet. After being secretary for 6 years, he was now standing down. Everyone agreed that he should be thanked for his efforts, without which TDWG might no longer exist. Peter Stevens reported that he had approached Georgina MacKenzie to be the next secretary, and she had agreed. There seemed to be general acceptance of this and she was voted in.
Treasurer's Report: Peter Stevens presented John Wiersema's report, saying that the expenses for this meeting (about $3000) were still to be subtracted from the balance. Currently the money is held in a non-interest bearing account so John proposed that some of the balance should be moved to an interest bearing account. There was a suggestion that TDWG use some of the money to support projects, but the treasurer had made the observation that if we were trying to step up activities, there were not really enough funds. It was suggested that invoices for dues should be sent out, and that payment by credit card would certainly make life easier for institutions not in the US. Further investigations should be made into this (possibly asking IAPT how they achieved it), and paying via the web.
The agenda item of where TDWG should "sit" institutionally had been discussed earlier in the meeting, and it was generally agreed that both GBIF and CODATA should be approached to see if they could help with making TDWG a more prominent body, and with funding.
Next year's meeting: Walter Berendsohn had agreed to organise this, but had suggested that it should be at Senckenberg Museum in Frankfurt, which would better suit the wider scope of TDWG, rather than the botanic gardens in Berlin. He had gone ahead and booked rooms at the Museum, from 3rd to the 5th November, 2000 [moved to 10-13 November 2000, as of 16th December 1999] - see http://www.bgbm.fu-berlin.de/tdwg/2000 for further details. A theme for the meeting was thought to be a good idea, so Walter asked for suggestions. There was a general feeling that some time should be set aside for subgroup discussions - possibly on the Friday before the main meeting began. It was also felt that there needed to be time to make decisions - there really hadn't been enough time to discuss the presentations on Saturday morning about the focus of TDWG.
Stan Blum suggested that there should be an email list to facilitate discussion about the organisation of TDWG, with a committee to spearhead ideas. He agreed to organise it.
Gregor Hagedorn reported on the meeting of those interested in descriptive data - the subgroup would discuss a requirements analysis, and would present a draft document in XML by the next meeting.
Ballot on Constitution Changes: There had been a ballot on proposed constitution changes regarding removing specific references to plants in the name etc, on the updating of standards, and on the actual voting mechanism. David Boufford reported that there was only one vote against, so these changes have been accepted.
The final discussion of the meeting concerned making standards generally available, particularly electronically. There was to be a recommendation made to Bob Kiger to make BPH (Botanico-Periodicum-Huntianum) available electronically. The meeting closed with a final thanks to the organisers.