Synergies among Grid, Peer-to-Peer and Cloud Computing
(Towards e-Science Communities)

Luís Antunes Veiga
Distributed Systems Group
INESC-ID Lisboa / Instituto Superior Técnico
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa

...Why...: e-Science
Most science is becoming e-Science
  - large data repositories
      - growing every day
      - processed in myriads of ways
  - powerful applications
      - computationally intensive
      - increasing demand for resources

15:47 Encontro Ciência 2009 / Science 2009 – Luís Veiga 2009/07/30

...Why...: e-Science Communities
Researchers form natural communities
  - they tend to gather around...
      - research areas
      - tools, instruments, applications
      - data repositories used
      - affiliation, geography
      - projects, consortia
  - special kind of “social” network

...Why...: Synergies...
Leverage globally available computing resources
  - harness resources of whatever shape or source, e.g.,
      - Clusters, Grids, multiprocessors
      - P2P voluntary cycle-sharing, Desktop Grids
      - Utility and Cloud Computing
  - provide uniform and easy-to-use interface to resources
      - data storage, sharing, transfer
      - resource allocation and scheduling
      - work/task distribution
  - most e-scientists will not be programmers (no-code)!

...Where...: e-Science At Large...
e-Science examples...
  - video coding
  - video and image processing
  - raytracing, high-res rendering
  - face recognition in pictures/movies
  - molecular modeling
  - chemical reaction simulation

...e-Science At Large...
...e-Science examples
  - network protocol simulation
  - financial investments
  - stock exchange derivatives
  - statistical numerical methods, data processing
  - language/speech processing

...e-Science At Large
What is common to all these e-Science activities?
  - large amounts of data
  - complex methods/algorithms
  - long processing times, resource intensive
  - no software development or classical programming
      - languages, APIs, sockets, synchronization, MPI, etc.
  - use mostly pre-developed/deployed applications
      - scripting, customization, configuration possible
      - intricate and very advanced
  - comprises large numbers of parallelizable tasks
      - most can be made completely independent

Synergy Vision
[Figure: resources from P2P, Grid and Utility Computing; labels: Deployed Tasks, Job, Result]

...What:...Synergy
Application Execution Model
  - Gridlet concept: intuitive, simple to use, data-centered
  - suited to most applications used by researchers
Resource Sharing Architecture
  - leverage mostly any computing and storage provider
  - a P2P-based Cloud encompassing Clusters, Grids, PCs
Community Support
  - social network integration (Facebook, hi5)
  - deployable via BOINC (SETI@home)

...How:...Gridlet
Gridlet
  - uniform basis of workload division, computation off-load
  - chunk of data with associated operations to be performed
      - parameters, scripts, configuration files, ...
  - cost estimate: G$: (CPU, Bandwidth, Memory, Disk)
  - jobs are gridlets sent to applications
  - allow adaptation of unmodified applications
      - operation/data transformation via XML policies
  - intuitive approach to data-partitions, task-spawning, resource management

...How:...Infrastructure
Synergy Infrastructure
  - Extendable peer-to-peer architecture
      - harness cycles of desktops, clusters, utility-computing
      - gathers asymmetric participants, different capabilities
  - Hybrid structured/unstructured overlay
      - structured: data repository, caching, results, indexes
      - unstructured: execution scheduled on any node
  - Hierarchical overlay
      - super-peers aggregate information of neighbors
      - resources, applications, reputation, cached data, ...

Synergy Infrastructure
  - cloud on overlay/mesh
  - oceans of gridlets flow across the overlay
[Figure: gridlet lifecycle on the P2P overlay network; labels: cost estimate G$ = (CPU, BW), gridlets served, gridlets received, gridlets submitted, gridlets returned]

...How:...Community Support
e-Science Infrastructure driven by Communities.
Social network integration
  - Facebook, hi5, widgets on web pages
  - execute code on idle computers of “friends”
  - discover similar interests, e.g., tools, applications
Community-driven portals
  - data sets, benchmark data, results
  - algorithm, topology, process descriptions
  - ask/donate storage and CPU
  - code deployable via BOINC (SETI@home)

...What For:...Current and Next Activities
Application Scenarios
  - Video Transcoding
  - Network Topology/Protocol Simulation
  - Raytracing for 3D rendering
  - Face Detection on Film Archives (e.g., Cinemateca)
  - Synergy VM for transactional-memory applications
Execution Infrastructures (combined)
  - P2P cycle-sharing, volunteer computing
  - Clustered Virtual Machines (e.g., Java, .NET)
  - Grids, Utility Computing Infrastructures

...What For:...Video Transcoding (1)
file splitting
  - semantics-aware data-partitioning
  - append/prepend gridlet-data
      - complete frames
      - movie header information
  - keep full (intra) & predicted frames
XML description: format, headers, boundaries, constraints, transformations
[Figure: a movie file (e.g., mpg, avi, flv, mov, wmv) laid out as
 H I1 P1 P1 P1 I2 P2 P2 I3 P3 P3 P3 P3 I4 P4 P4 I5 P5 P5 I6 P6
 passes, together with the XML Format Description, through the Gridlet Manager, which emits gridlets that each carry a copy of the header H followed by complete frame groups (e.g., H I2 P2 P2)]

...What For:...Video Transcoding (2)
gather available gridlet-results
  - sent by servicing peers
extract result data & discard headers
reassemble file according to semantics
  - new header
  - ordering
  - constraints
  - transformations
special cases:
  - discard gridlets
  - crypto-challenge
[Figure: gridlet-results (e.g., H' I4' P4' P4', H' I6' P6') arrive in arbitrary order; the Gridlet Manager, guided by the XML Format Description, reassembles the converted/processed movie file as
 H' I1' P1' P1' P1' I2' P2' P2' I3' P3' P3' P3' P3' I4' P4' P4' I5' P5' P5' I6' P6']

...What For:...Network Simulation
COGITARE addresses:
  - limits on size & complexity of simulations
  - inefficient resource utilization (e.g., multi-core)
  - no agnostic topology description languages
  - no repository for research result interchange
  - absence of a teaching platform

Conclusion
e-Science is becoming dominant
  - increasing demand for computing resources
  - harness resources from various sources (P2P, Grid, Cloud)
  - minority of computer researchers and programmers
  - intuitive application and resource model
  - manage activities around communities
Future Work
  - assessment of financial derivative products
  - chemical reaction and process simulation

Thank you. Questions?
www.gsd.inesc-id.pt/~lveiga
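
Illustrative sketch (not part of the original slides): the Video Transcoding slides describe splitting a movie at intra-frame boundaries, prepending the header to each gridlet, and reassembling results in order. The Python below is a minimal model of that idea under stated assumptions. The names Gridlet, split_movie, process and reassemble are hypothetical, frames are plain labels rather than real video data, the XML format description is replaced by a hard-coded "starts with I" rule, and the cost estimate is a toy count rather than the G$ metric actually used by the system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gridlet:
    """Hypothetical gridlet: a chunk of data plus the operation to apply."""
    index: int         # position of this chunk in the original file
    header: str        # copy of the movie header carried by every gridlet
    frames: List[str]  # complete frames only, e.g. ["I2", "P2", "P2"]
    operation: str     # label of the work to perform, e.g. "transcode"

    def cost_estimate(self) -> dict:
        # Toy stand-in for G$ = (CPU, BW): one CPU unit per frame,
        # bandwidth counts the frames plus the prepended header.
        return {"cpu": len(self.frames), "bandwidth": len(self.frames) + 1}


def split_movie(header: str, frames: List[str],
                operation: str = "transcode") -> List[Gridlet]:
    """Cut before every intra (I) frame so predicted frames stay with it."""
    groups: List[List[str]] = []
    for frame in frames:
        if frame.startswith("I") or not groups:
            groups.append([])
        groups[-1].append(frame)
    return [Gridlet(i, header, g, operation) for i, g in enumerate(groups)]


def process(gridlet: Gridlet) -> Gridlet:
    """Stand-in for a servicing peer: marks header and frames as converted."""
    return Gridlet(gridlet.index, gridlet.header + "'",
                   [f + "'" for f in gridlet.frames], gridlet.operation)


def reassemble(results: List[Gridlet]) -> Tuple[str, List[str]]:
    """Order results by index, keep one header, concatenate the frame data."""
    ordered = sorted(results, key=lambda g: g.index)
    new_header = ordered[0].header if ordered else ""
    frames = [f for g in ordered for f in g.frames]
    return new_header, frames


# Frame layout taken from the Video Transcoding (1) figure.
movie = "I1 P1 P1 P1 I2 P2 P2 I3 P3 P3 P3 P3 I4 P4 P4 I5 P5 P5 I6 P6".split()
gridlets = split_movie("H", movie)
results = [process(g) for g in reversed(gridlets)]  # simulate out-of-order arrival
new_header, new_frames = reassemble(results)
```

Sorting by a per-gridlet index is only one way to satisfy the ordering constraint named on the slide; a real format description could impose further constraints or transformations that this sketch ignores.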