speciesLink
A System for integrating distributed primary biodiversity data
Vanderlei Perez Canhos
Centro de Referência em Informação Ambiental, CrIA

Overview
• CRIA
• SinBiota and The Species Analyst
• speciesLink
• Type of collections involved
• Number of records
• Technical features
• Future plans

Focus on Biodiversity Informatics
• Open source software
• Standards and protocols
• Systems interoperability
• Partnerships
CrIA, Reference Center on Environmental Information
http://www.cria.org.br

The Species Analyst
http://speciesanalyst.net/
• Location of participant collections: mainly United States
• Taxonomic groups: several taxa
• Protocol: Z39.50 (migration to DiGIR in progress)
• Number of records: ~50,000,000

Importance of data sharing
[Figure labels: Paris; Field Museum; British Museum; KU – Natural History Museum]

The main goal of speciesLink was to build a distributed system that integrates several biological collections and makes their primary data available on the Internet.

speciesLink
Distributed Information System for Biological Collections
http://splink.cria.org.br

São Paulo State Collections
• fish: 3
• mites: 2
• herbaria: 4
• microorganisms: 3
• inventories: SinBiota

Geographic distribution of the participant collections – phase I

Number of Records (available of existing)
Herbaria                  72,000 of 740,000
Microorganisms             1,000 of 2,700
Mites                     18,000 of 22,000
Fish                      70,000 of 123,000
Inventories (species)     38,000 of 38,000
Total                   ~200,000 of ~1,000,000

Collection Management Software

Collection            Available    Existing
Botanical Collections
ESA                         730      80,000
SP                       11,280     350,000
IAC                      25,245      45,000
SPF                      21,828     133,500
UEC                      12,860     130,000
Zoological Collections
ACARISJRP                 5,382       7,000
ACARIESALQ               12,392      15,000
DSZSJRP (fish)            5,714      23,000
LIRP (fish)               4,314      30,000
MZUSP (fish)             60,000     110,000
Microbial Collections
CBMAI                       110         700
IBSBF                       929       2,000
Observational Data
SinBiota                 38,109      38,109

Support for collections
• Providing basic equipment and network infrastructure
• Helping to choose a management system, when needed
• Helping with training and data import, when needed

Protocol and Content Schema
• DiGIR protocol (Distributed Generic Information Retrieval)
– Potential to be globally accepted
• DiGIR software (Java Portal & PHP Provider)
– Collaborative development
• DarwinCore v.2
– Covers the basic content elements (taxonomic identification, location and date of collecting event)

Simple Search Interface

System Architecture
• speciesLink site: DiGIR Portal (Java), Perl Presentation Layer
• Regional Server: PostgreSQL, PHP Provider, SOAP Server
• Fast and stable connectivity
– Collection A: Data, SQL, Collection Management System, PHP Provider
• Slow or unstable connectivity
– Collection B: Data, SQL, Collection Management System, SOAP client, Data Repository
– Collection C: Data, SQL, Collection Management System, SOAP client, Data Repository

Network Design
[Diagram: four Regional Servers]

Data Migration Client
• Platform independent (Java)
• Connects to any database accessible via JDBC (simple text files are also supported)
• Complete control over data
• Low traffic
• Possibility to filter sensitive data using a regular expression

Regional server
PostgreSQL, SOAP Server (Perl)
Features
• Perl / PostgreSQL combination
• Can hold data from several collections
• Interpretation rules can be applied to specific data

Provider (PHP)

Query Result (brief)

speciesLink – phase II
>35 collections available

Future plans
• Mapping tools
• Data cleaning tools
• Modelling framework

Infrastructure for Species Distribution Modelling
• Modelling algorithms: Bioclim, Neural Net, GARP, ACME
• Environmental layers: Vegetation, Precipitation, Temperature
• Specimens: DiGIR Portal, BioCASE Portal

Acknowledgements (phase I)
• Universidade Estadual Paulista
• Instituto de Botânica
• Universidade Estadual de Campinas
• Instituto Agronômico de Campinas
• Escola Superior de Agricultura “Luiz de Queiroz”
• Instituto Biológico
• Universidade de São Paulo

Fellowships
• Visiting researchers
– Andrew Townsend Peterson (3 months)
– Arthur Chapman (1 year)
• Postdoctoral fellow
– Ingrid Koch
• Technical training (6 TT fellowships)

Summing up
• Achieved proof of concept
• Data is already available
• Low cost for connecting new collections
• Triggered a movement within the collections to improve the quality of data and to increase the amount of available information
• Adoption of standards and protocols
• International partnerships: DiGIR, modelling framework
• Interoperability with similar initiatives

Thank you!
http://splink.cria.org.br
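
Note on the Data Migration Client
The Data Migration Client slide mentions the possibility of filtering sensitive data with a regular expression before it leaves the collection. The slides do not show how this is configured, or whether matching records are dropped entirely or only partly masked. The following is a minimal sketch in Python (the actual client is Java), assuming records are simple dictionaries with DarwinCore-like field names and that the location fields of matching records are blanked. The pattern, the field list and the function names are all hypothetical.

```python
import re

# Hypothetical rule: taxa whose precise localities should not be published.
SENSITIVE_NAME = re.compile(r"^(Cattleya|Laelia)\b", re.IGNORECASE)

# Fields blanked for sensitive records (illustrative, DarwinCore-like names).
LOCATION_FIELDS = ("Locality", "Latitude", "Longitude")


def filter_record(record, pattern=SENSITIVE_NAME):
    """Return a copy of the record; blank location fields if the name matches."""
    if pattern.search(record.get("ScientificName", "")):
        return {key: ("" if key in LOCATION_FIELDS else value)
                for key, value in record.items()}
    return dict(record)


def export(records, pattern=SENSITIVE_NAME):
    """Apply the filter to every record before sending it to the regional server."""
    return [filter_record(record, pattern) for record in records]


# Made-up sample records, for illustration only.
records = [
    {"ScientificName": "Cattleya walkeriana", "Locality": "Site A",
     "Latitude": "-22.0", "Longitude": "-47.9", "YearCollected": "1998"},
    {"ScientificName": "Astyanax fasciatus", "Locality": "Site B",
     "Latitude": "-21.2", "Longitude": "-47.8", "YearCollected": "2001"},
]

filtered = export(records)
for rec in filtered:
    print(rec["ScientificName"], "|", rec["Locality"] or "(withheld)")
```

Filtering on the client side, before transfer, is consistent with the "complete control over data" point on the same slide: the collection decides what is exported, and the regional server never receives the withheld values.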