speciesLink
A system for integrating distributed primary biodiversity data
Vanderlei Perez Canhos
Centro de Referência em Informação Ambiental (CRIA)
Overview
• CRIA
• SinBiota and The Species Analyst
• speciesLink
• Types of collections involved
• Number of records
• Technical features
• Future plans
Focus on Biodiversity Informatics
• Open source software
• Standards and protocols
• Systems interoperability
• Partnerships
CRIA
Reference Center on Environmental Information
http://www.cria.org.br
The Species Analyst
http://speciesanalyst.net/
Location of participant collections: mainly United States
Taxonomic groups: several taxa
Protocol: Z39.50 (migration to DiGIR in progress)
Number of records: ~50,000,000
Importance of data sharing
Figure: data held at Paris, the Field Museum, the British Museum and the KU Natural History Museum
The main goal of speciesLink was to build a distributed system integrating several biological collections and making their primary data available on the Internet.
speciesLink
Distributed Information System for Biological Collections
http://splink.cria.org.br
São Paulo State Collections
fish: 3
mites: 2
herbaria: 4
microorganisms: 3
inventories: SinBiota
Geographic distribution of the participant collections – phase I
Number of Records (available of existing)
• Herbaria: 72,000 of 740,000
• Microorganisms: 1,000 of 2,700
• Mites: 18,000 of 22,000
• Fish: 70,000 of 123,000
• Inventories (species): 38,000 of 38,000
• Total: ~200,000 of ~1,000,000
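As a quick check on the Number of Records figures, the short Python sketch below recomputes the share of records available in each group and the exact totals. The numbers are copied from the slide; the names `RECORDS`, `share_available` and `totals` are illustrative only.

```python
# (available, existing) record counts per group, as reported on the slide.
RECORDS = {
    "Herbaria": (72_000, 740_000),
    "Microorganisms": (1_000, 2_700),
    "Mites": (18_000, 22_000),
    "Fish": (70_000, 123_000),
    "Inventories (species)": (38_000, 38_000),
}


def share_available(records):
    """Percentage of existing records already available, rounded to 0.1."""
    return {group: round(100 * avail / total, 1)
            for group, (avail, total) in records.items()}


def totals(records):
    """Exact (available, existing) sums over all groups."""
    return (sum(a for a, _ in records.values()),
            sum(t for _, t in records.values()))


if __name__ == "__main__":
    for group, pct in share_available(RECORDS).items():
        print(f"{group}: {pct}%")
    avail, total = totals(RECORDS)
    print(f"All groups: {avail:,} of {total:,} ({100 * avail / total:.1f}%)")
```

The exact sums come to 199,000 of 925,700 records (about 21.5% available), in line with the rounded totals of ~200,000 and ~1,000,000 on the slide.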
Collection Management Software (records available of existing)
Botanical Collections
• ESA: 730 of 80,000
• SP: 11,280 of 350,000
• IAC: 25,245 of 45,000
• SPF: 21,828 of 133,500
• UEC: 12,860 of 130,000
Zoological Collections
• DSZSJRP (fish): 5,714 of 23,000
• LIRP (fish): 4,314 of 30,000
• MZUSP (fish): 60,000 of 110,000
• ACARISJRP (mites): 5,382 of 7,000
• ACARIESALQ (mites): 12,392 of 15,000
Microbial Collections
• CBMAI: 110 of 700
• IBSBF: 929 of 2,000
Observational Data
• SinBiota: 38,109 of 38,109
Support to collections
• Providing basic equipment and network infrastructure
• Helping to choose a management system, when needed
• Helping with training and data import, when needed
Protocol and Content Schema
• DiGIR protocol (Distributed Generic Information Retrieval)
– Potential to be globally accepted
• DiGIR software (Java portal and PHP provider)
– Collaborative development
• Darwin Core v2
– Covers the basic content elements (taxonomic identification, location and date of the collecting event)
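To make the idea of a shared content schema concrete, here is a minimal Python sketch of a specimen record reduced to a few Darwin Core v2 elements covering the three basic content areas. The element names reflect our reading of Darwin Core v2 and should be checked against the schema; the flat `<record>` wrapper, the helper names and the sample values are hypothetical and are not the DiGIR response format.

```python
import xml.etree.ElementTree as ET

# Assumed subset of Darwin Core v2 elements, grouped by basic content area.
BASIC_CONTENT = {
    "taxonomic identification": ["ScientificName"],
    "location": ["Country", "StateProvince", "Locality", "Latitude", "Longitude"],
    "collecting date": ["YearCollected", "MonthCollected", "DayCollected"],
}
# Identifiers that let a portal trace a record back to its collection.
IDENTIFIERS = ["InstitutionCode", "CollectionCode", "CatalogNumber"]


def _has_value(value):
    """True for non-empty values; 0 (e.g. a latitude) counts as a value."""
    return value is not None and str(value).strip() != ""


def missing_basic_content(record):
    """List the basic content areas for which the record has no value at all."""
    return [area for area, names in BASIC_CONTENT.items()
            if not any(_has_value(record.get(n)) for n in names)]


def to_xml(record):
    """Serialize the known, non-empty elements as a flat <record> element."""
    root = ET.Element("record")
    for name in IDENTIFIERS + [n for names in BASIC_CONTENT.values() for n in names]:
        if _has_value(record.get(name)):
            ET.SubElement(root, name).text = str(record[name])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = {  # hypothetical values, for illustration only
        "InstitutionCode": "EXAMPLE", "CollectionCode": "HERB",
        "CatalogNumber": "000123", "ScientificName": "Genus species",
        "Country": "Brazil", "StateProvince": "São Paulo", "YearCollected": 1998,
    }
    print(missing_basic_content(sample))
    print(to_xml(sample))
```

Checking for the three content areas, rather than for every element, mirrors the slide's point that Darwin Core v2 covers the basic elements a portal needs to answer "what, where and when" questions.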
Simple Search Interface
System’s Architecture
Figure: the speciesLink site (presentation layer, Perl) queries the DiGIR Portal (Java). Collections with fast and stable connectivity, such as Collection A, run a PHP provider that reads data by SQL from their collection management system. Collections with slow or unstable connectivity, such as Collections B and C, use a SOAP client to send data from their collection management system to the SOAP server of a regional server, which keeps it in a PostgreSQL database published through its own PHP provider. A data repository is also shown beside each SOAP client.
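From the portal's point of view, a collection's own PHP provider and a regional server's provider are both just endpoints to query. The Python sketch below shows that fan-out pattern under simplifying assumptions: real DiGIR providers are queried over HTTP with XML requests and responses, which this sketch replaces with in-process functions, and the provider names and `portal_search` helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def make_provider(records):
    """Hypothetical stand-in for a provider: filters its records by name."""
    def search(scientific_name):
        return [r for r in records if r.get("ScientificName") == scientific_name]
    return search


def portal_search(providers, scientific_name, timeout=30.0):
    """Query all providers in parallel and merge their answers.

    Returns (records, failed): each record is tagged with the provider it
    came from; providers that raised or did not answer in time are listed
    in `failed`, so one unreachable collection does not block the others.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(providers)))
    futures = {pool.submit(search, scientific_name): name
               for name, search in providers.items()}
    done, not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on providers that timed out

    records = []
    failed = [futures[f] for f in not_done]
    for fut in done:
        name = futures[fut]
        try:
            records.extend({**rec, "provider": name} for rec in fut.result())
        except Exception:
            failed.append(name)
    records.sort(key=lambda r: (r["provider"], r.get("CatalogNumber", "")))
    return records, sorted(failed)
```

A regional server holding data from several slow-connectivity collections appears here as one more entry in `providers`, which is what lets the portal treat both connection styles uniformly.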
Network Design
Figure: four regional servers
Data Migration Client
• Platform independent (Java)
• Connects to any database accessible via JDBC (simple text files are also supported)
• Complete control over data
• Low traffic
• Possibility to filter sensitive data using a regular expression
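The regular-expression filter and the low-traffic goal can be sketched as follows in Python, assuming the client reads a tab-delimited text export, drops every record in which a chosen field matches a pattern, and then sends only records whose content changed since the previous run. The column names, the pattern and the hash-based change detection are assumptions for illustration; the actual client is written in Java and reads databases through JDBC.

```python
import csv
import hashlib
import io
import re


def read_tab_delimited(text):
    """Parse a tab-delimited export with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def filter_sensitive(records, pattern, fields):
    """Drop records in which any of `fields` matches the regular expression.

    Returns (kept_records, number_dropped).
    """
    rx = re.compile(pattern)
    kept, dropped = [], 0
    for rec in records:
        if any(rx.search(rec.get(f) or "") for f in fields):
            dropped += 1
        else:
            kept.append(rec)
    return kept, dropped


def record_digest(rec):
    """Stable hash of a record's fields, used to detect edits between runs."""
    keys = sorted(k for k in rec if k is not None)
    canonical = "\x1f".join(f"{k}={rec[k]}" for k in keys)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def changed_since(records, previous_digests, key="CatalogNumber"):
    """Return (records that are new or edited, digest map to store for next run)."""
    current = {rec[key]: record_digest(rec) for rec in records}
    changed = [rec for rec in records
               if previous_digests.get(rec[key]) != current[rec[key]]]
    return changed, current
```

A first run with an empty digest map sends everything; later runs send only new or edited records, which is one way a client could keep traffic low while the collection keeps full control over what leaves its database.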
Regional server
Figure: PostgreSQL database loaded by a SOAP server (Perl) and published by a PHP provider
Features
• Perl/PostgreSQL combination
• Can hold data from several collections
• Interpretation rules can be applied to specific data
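As an illustration of interpretation rules, the Python sketch below applies per-collection functions to selected fields, here normalizing a country name and converting degrees-minutes-seconds coordinates to decimal degrees, while leaving the stored raw record untouched. The collection code `COLLECTION_X`, the rules and the lookup table are hypothetical; the regional server itself uses Perl and PostgreSQL.

```python
import re

# Hypothetical lookup table for country spellings found in source data.
COUNTRY_NAMES = {"Brasil": "Brazil", "BR": "Brazil"}

# Matches e.g. 22°54'30"S (degrees, minutes, seconds, hemisphere).
_DMS = re.compile(r"\s*(\d+)°\s*(\d+)'\s*(\d+(?:\.\d+)?)\"\s*([NSEW])\s*")


def dms_to_decimal(value):
    """Convert a DMS string to signed decimal degrees; leave other values as-is."""
    m = _DMS.fullmatch(value)
    if m is None:
        return value
    deg, minutes, seconds, hemisphere = m.groups()
    decimal = int(deg) + int(minutes) / 60 + float(seconds) / 3600
    return -decimal if hemisphere in "SW" else decimal


def normalize_country(value):
    cleaned = value.strip()
    return COUNTRY_NAMES.get(cleaned, cleaned)


# Rules are attached to specific collections and fields (hypothetical).
RULES = {
    "COLLECTION_X": [
        ("Country", normalize_country),
        ("Latitude", dms_to_decimal),
        ("Longitude", dms_to_decimal),
    ],
}


def interpret(record, collection_code, rules=RULES):
    """Return an interpreted copy of `record`; the original is not modified."""
    interpreted = dict(record)
    for field, rule in rules.get(collection_code, []):
        value = interpreted.get(field)
        if isinstance(value, str):
            interpreted[field] = rule(value)
    return interpreted
```

Keeping the raw record and producing an interpreted copy means a rule can be corrected later and re-applied without asking the collection to resend its data.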
Query Result (brief)
speciesLink – phase II
>35 collections available
Future plans
• Mapping tools
• Data cleaning tools
• Modelling framework
Infrastructure for Species Distribution Modelling
Figure labels: modelling algorithms (Bioclim, Neural Net, GARP); environmental layers (vegetation, precipitation, temperature); specimens (DiGIR Portal, ACME, BioCASE Portal)
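Bioclim, one of the algorithms in the figure, is a climatic-envelope method: a site is predicted suitable when every environmental variable lies within the range observed at the species' occurrence points. The Python sketch below is a simplified version under that description, with an optional symmetric percentile trim; the variable names and values are hypothetical, and published Bioclim implementations differ in details such as how the envelope percentiles are chosen.

```python
def _quantile(sorted_values, q):
    """Nearest-rank style quantile on an already sorted list."""
    idx = round(q * (len(sorted_values) - 1))
    return sorted_values[idx]


def fit_envelope(occurrences, trim=0.0):
    """Build a per-variable (low, high) envelope from occurrence sites.

    `occurrences` is a list of dicts mapping environmental variable -> value
    at each specimen locality. trim=0.0 gives the plain min/max envelope;
    trim=0.05 would keep roughly the central 90% of values.
    """
    if not occurrences or not 0.0 <= trim < 0.5:
        raise ValueError("need occurrences and 0 <= trim < 0.5")
    envelope = {}
    for var in occurrences[0]:
        values = sorted(o[var] for o in occurrences)
        envelope[var] = (_quantile(values, trim), _quantile(values, 1 - trim))
    return envelope


def is_suitable(envelope, site):
    """A site is suitable only if every variable falls inside the envelope."""
    return all(low <= site[var] <= high for var, (low, high) in envelope.items())
```

The envelope needs only presence records and environmental layers, which is why georeferenced specimen data served through portals such as DiGIR is a natural input for it.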
Acknowledgements (phase I)
Universidade Estadual Paulista
Instituto de Botânica
Universidade Estadual de Campinas
Instituto Agronômico de Campinas
Escola Superior de Agricultura “Luiz de Queiroz”
Instituto Biológico
Universidade de São Paulo
Fellowships
• Visiting researchers
– Andrew Townsend Peterson (3 months)
– Arthur Chapman (1 year)
• Postdoctoral fellowship
– Ingrid Koch
• Technical training (6 TT fellowships)
Summing up
• Achieved proof of concept
• Data is already available
• Low cost for connecting new collections
• Triggered a movement within the collections to improve data quality and increase the amount of available information
• Adoption of standards and protocols
• International partnerships: DiGIR, modelling framework
• Interoperability with similar initiatives
Thank you!
http://splink.cria.org.br