Synergies among Grid, Peer-to-Peer
and Cloud Computing
(Towards e-Science Communities)
Luís Antunes Veiga
[email protected]
Distributed Systems Group
INESC-ID Lisboa / Instituto Superior Técnico
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
...Why...: e-Science
Most science is becoming e-Science
large data repositories
growing every day
processed in myriads of ways
powerful applications
computationally intensive
increasing demand for resources
Encontro Ciência 2009 / Science 2009 – Luís Veiga 2009/07/30
...Why...: e-Science Communities
Researchers form natural communities
they tend to gather around...
research areas
tools, instruments, applications
data repositories used
affiliation, geography
projects, consortia
special kind of “social” network
...Why...: Synergies...
Leverage globally available computing resources
harness resources of whatever shape or source
e.g., Clusters, Grids, multiprocessors
P2P voluntary cycle-sharing, Desktop Grids
Utility and Cloud Computing
provide uniform and easy-to-use interface to resources
data storage, sharing, transfer
resource allocation and scheduling
work/task distribution
most e-scientists will not be programmers (no-code)!
...Where...: e-Science At Large...
E-Science examples...
video coding
video and image processing
raytracing, high-res rendering
face recognition in pictures/movies
molecular modeling
chemical reaction simulation
...e-Science At Large...
...E-Science examples
network protocol simulation
financial investments
stock exchange
derivatives
statistical
numerical methods, data processing
language/speech processing
...e-Science At Large
What is common to all these e-Science activities?
large amounts of data
complex methods/algorithms
long processing times, resource intensive
no software development / classical programming
languages, API, sockets, synchronization, MPI, etc.
use mostly pre-developed/deployed applications
scripting, customization, configuration
possibly intricate and very advanced
comprises large numbers of parallelizable tasks
most can be made completely independent
Synergy Vision
Resources from P2P, Grid, Utility Computing
Deployed Tasks
Job Result
...What:...Synergy
Application Execution Model
Gridlet concept: intuitive, simple to use, data-centered
suited to most applications used by researchers
Resource Sharing Architecture
leverage mostly any computing and storage provider
a P2P-based Cloud encompassing Clusters, Grids, PCs
Community Support
social network integration (Facebook, hi5)
deployable via BOINC (SETI@home)
...How:...Gridlet
Gridlet
uniform basis of workload division, computation off-load
chunk of data with associated operations to be performed
parameters, scripts, configuration files, ...
cost estimate: G$: (CPU, Bandwidth, Memory, Disk)
jobs are gridlets sent to applications
allow adaptation of unmodified applications
operation/data transformation via XML policies
intuitive approach to
data-partitions, task-spawning, resource management
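A minimal sketch of the gridlet idea described above: a chunk of data, the operation to apply to it, optional parameters, and a G$ cost estimate over the four resources named on the slide. The names `Gridlet`, `Cost`, `estimate_cost`, `split_into_gridlets` and the per-byte cost model are assumptions made for illustration; the slides do not specify how G$ is computed or how data is partitioned.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cost:
    """Hypothetical G$ estimate: one value per resource named on the slide."""
    cpu: float
    bandwidth: float
    memory: float
    disk: float


@dataclass
class Gridlet:
    """A chunk of data plus the operation to be performed on it."""
    gridlet_id: int
    data: bytes
    operation: str
    parameters: dict = field(default_factory=dict)

    def estimate_cost(self, cpu_per_byte: float = 1.0) -> Cost:
        # Assumed cost model: every resource scales linearly with chunk size.
        size = len(self.data)
        return Cost(cpu=size * cpu_per_byte, bandwidth=size, memory=size, disk=size)


def split_into_gridlets(data: bytes, chunk_size: int, operation: str) -> list[Gridlet]:
    """Divide a job's input into fixed-size, independent gridlets."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        Gridlet(index, data[offset:offset + chunk_size], operation)
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]
```

Fixed-size chunking is the simplest possible partitioning; format-aware partitioning, as in the video transcoding scenario, would replace `split_into_gridlets` while keeping the same `Gridlet` shape.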
...How:...Infrastructure
Synergy Infrastructure
Extendable peer-to-peer architecture
harness cycles of desktops, clusters, utility-computing
gathers asymmetric participants, different capabilities
Hybrid structured/unstructured overlay
structured: data repository, caching, results, indexes
unstructured: execution scheduled on any node
Hierarchical overlay
super-peers aggregate information of neighbors
resources, applications, reputation, cached data, ...
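The super-peer role above can be sketched as a small in-memory directory: neighbors report their resources, installed applications and reputation, and the super-peer answers summary and candidate queries. The names `PeerInfo`, `SuperPeer`, `register`, `summary` and `candidates`, and the ranking by reputation, are assumptions made for illustration; the slides do not describe the actual protocol or selection policy.

```python
from dataclasses import dataclass, field


@dataclass
class PeerInfo:
    """What a neighbor is assumed to report to its super-peer."""
    peer_id: str
    free_cpu_cores: int
    applications: set
    reputation: float


@dataclass
class SuperPeer:
    """Aggregates information about neighboring peers."""
    neighbors: dict = field(default_factory=dict)

    def register(self, info: PeerInfo) -> None:
        # A later report from the same peer replaces the earlier one.
        self.neighbors[info.peer_id] = info

    def summary(self) -> dict:
        """Aggregate view that could be advertised to other super-peers."""
        peers = list(self.neighbors.values())
        return {
            "peers": len(peers),
            "free_cpu_cores": sum(p.free_cpu_cores for p in peers),
            "applications": set().union(*(p.applications for p in peers)),
        }

    def candidates(self, application: str, min_reputation: float = 0.0) -> list:
        """Peers able to run `application`, best reputation first."""
        eligible = [
            p for p in self.neighbors.values()
            if application in p.applications
            and p.free_cpu_cores > 0
            and p.reputation >= min_reputation
        ]
        eligible.sort(key=lambda p: (-p.reputation, p.peer_id))
        return [p.peer_id for p in eligible]
```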
Synergy Infrastructure
cloud on overlay/mesh
oceans of gridlets flow across the overlay
gridlet lifecycle: submitted, received, served, returned
cost estimate: G$ = (CPU, BW)
[Figure: P2P overlay network with gridlets flowing between peers]
...How:...Community Support
e-Science Infrastructure driven by Communities.
Social network integration
Facebook, hi5, widgets on web pages
execute code on idle computers of “friends”
discover similar interests
e.g., tools, applications
Community-driven portals
data sets, benchmark data, results
algorithm, topology, process descriptions
ask/donate storage and CPU
code deployable via BOINC (SETI@home)
...What For:...Current and Next Activities
Application Scenarios
Video Transcoding
Network Topology/Protocol Simulation
Raytracing for 3D rendering
Face Detection on Film Archives (e.g., Cinemateca)
Synergy VM for transactional-memory applications
Execution Infrastructures (combined)
P2P cycle-sharing, volunteer computing
Clustered Virtual Machines (e.g., Java, .Net)
Grids, Utility Computing Infrastructures
...What For:...Video Transcoding (1)
file splitting
semantics-aware data-partitioning
append/prepend gridlet-data
complete frames
movie header information
keep full (intra) & predicted frames
XML description: format, headers, boundaries, constraints, transformations
[Figure: a movie file (e.g., mpg, avi, flv, mov, wmv) consisting of header H and frames I1 P1 P1 P1 I2 P2 P2 I3 P3 P3 P3 P3 I4 P4 P4 I5 P5 P5 I6 P6 enters the Gridlet Manager, which uses the XML Format Description to split it into six gridlets; each gridlet carries a copy of H and one intra frame (I1 to I6) with its predicted frames]
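The semantics-aware splitting for video transcoding can be sketched as follows: cut the frame sequence only at intra (I) frames, so each gridlet holds complete, decodable frames, and prepend a copy of the movie header to each one. Frames are modeled here as plain labels such as "I1" and "P1"; the function name `split_at_intra_frames` and this representation are assumptions made for illustration, not the Gridlet Manager's actual interface.

```python
def split_at_intra_frames(header: str, frames: list) -> list:
    """Group frames so that each gridlet starts with the header and an I frame.

    Predicted (P) frames stay with the intra frame they depend on.
    """
    gridlets = []
    for frame in frames:
        if frame.startswith("I"):
            # A new intra frame opens a new gridlet, with its own header copy.
            gridlets.append([header, frame])
        elif not gridlets:
            raise ValueError("stream must start with an intra frame")
        else:
            gridlets[-1].append(frame)
    return gridlets


# The frame sequence shown on the slide.
movie = "I1 P1 P1 P1 I2 P2 P2 I3 P3 P3 P3 P3 I4 P4 P4 I5 P5 P5 I6 P6".split()
for gridlet in split_at_intra_frames("H", movie):
    print(" ".join(gridlet))
```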
...What For:...Video Transcoding (2)
gather available gridlet-results sent by servicing peers
extract result data & discard headers
reassemble file according to semantics
new header
ordering
constraints
transformations
special cases:
discard gridlets
crypto-challenge
[Figure: gridlet-results arrive out of order at the Gridlet Manager, each with its own header H' (H' I4' P4' P4'; H' I2' P2' P2'; H' I1' P1' P1' P1'; H' I3' P3' P3' P3' P3'; H' I6' P6'; H' I5' P5' P5'); using the XML Format Description, the manager produces the converted/processed movie file H' I1' P1' P1' P1' I2' P2' P2' I3' P3' P3' P3' P3' I4' P4' P4' I5' P5' P5' I6' P6']
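The reassembly step can be sketched as follows: gridlet-results arrive in any order, each is stripped of its per-gridlet header, and the frames are concatenated in original order behind a single new header. The function name `reassemble`, the index-keyed dictionary of results, and the choice to fail on missing results are assumptions made for illustration; the special cases on the slide (discarded gridlets, crypto-challenge) are not modeled.

```python
def reassemble(new_header: str, results: dict, expected: int) -> list:
    """Rebuild the output file from gridlet-results keyed by gridlet index.

    Each result is a list whose first element is that gridlet's own header.
    """
    missing = [i for i in range(expected) if i not in results]
    if missing:
        raise ValueError(f"missing gridlet-results: {missing}")
    output = [new_header]
    for index in range(expected):
        _header, *frames = results[index]  # discard the per-gridlet header
        output.extend(frames)
    return output


# Results inserted in arrival order, which differs from playback order.
arrived = {
    3: ["H'", "I4'", "P4'", "P4'"],
    1: ["H'", "I2'", "P2'", "P2'"],
    0: ["H'", "I1'", "P1'", "P1'", "P1'"],
    2: ["H'", "I3'", "P3'", "P3'", "P3'", "P3'"],
    5: ["H'", "I6'", "P6'"],
    4: ["H'", "I5'", "P5'", "P5'"],
}
print(" ".join(reassemble("H'", arrived, expected=6)))
```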
...What For:...Network Simulation
COGITARE addresses:
limits on size & complexity of simulations
inefficient resource utilization (e.g., multi-core)
no agnostic topology description languages
no repository for research result interchange
absence of a teaching platform
Conclusion
e-Science is becoming dominant
increasing demand for computing resources
harness resources from various sources (P2P, Grid, Cloud)
only a minority of e-scientists are computer researchers or programmers
intuitive application and resource model
manage activities around communities
Future Work
assessment of financial derivative products
chemical reaction and process simulation
Thank you: Questions?
www.gsd.inesc-id.pt/~lveiga