CCF YOCSEF
Shanghai
Big Data Beyond Hadoop
Real-Time Analytical Processing (RTAP) Using Spark and Shark
Jason Dai
Engineering Director & Principal Engineer
Intel Software and Services Group
Agenda
Big Data beyond Hadoop
Introduction to Spark and Shark
Case study: real-time analytical processing (RTAP)
Big Data beyond Hadoop
Big Data today
• The elephant (Hadoop) is in the room
Big Data beyond Hadoop
• Real-time analytical processing (RTAP)
  – Discover and explore data iteratively and interactively for real-time insights
• Advanced machine learning and data mining (MLDM)
  – Graph-parallel predictive analytics (non-SQL)
• Distributed in-memory analytics
  – Exploit the main memory available across the entire cluster for >100x speedup
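The idea behind distributed in-memory analytics can be sketched in plain Python: load the data once, keep each partition resident in RAM, and answer repeated interactive queries from memory instead of re-reading from disk. `InMemoryDataset` and `load_from_disk` are hypothetical names for illustration, not Spark's API; on a real cluster each partition would sit on a different node and be scanned in parallel. Spark applies the same idea by letting an RDD be kept in memory (for example via `cache()`).

```python
from typing import Callable, List, Optional


class InMemoryDataset:
    """Hypothetical stand-in for a dataset spread over a cluster's RAM.

    Each inner list models one node's partition. The slow loader runs once;
    afterwards every query scans the cached partitions in memory.
    """

    def __init__(self, loader: Callable[[], List[List[int]]]) -> None:
        self._loader = loader
        self._partitions: Optional[List[List[int]]] = None
        self.loads = 0  # how many times the (slow) loader actually ran

    def partitions(self) -> List[List[int]]:
        # Load on first use, then keep the partitions resident in memory.
        if self._partitions is None:
            self._partitions = self._loader()
            self.loads += 1
        return self._partitions

    def count(self, pred: Callable[[int], bool]) -> int:
        # Partial count per partition (parallel on a real cluster), then a global sum.
        return sum(sum(1 for x in part if pred(x)) for part in self.partitions())


def load_from_disk() -> List[List[int]]:
    # Hypothetical loader: three partitions standing in for three nodes.
    return [list(range(0, 100)), list(range(100, 200)), list(range(200, 300))]


ds = InMemoryDataset(load_from_disk)
print(ds.count(lambda x: x % 2 == 0))  # 150
print(ds.count(lambda x: x > 250))     # 49
print(ds.loads)                        # 1: both queries were served from memory
```

Only the first query pays the loading cost; every later query touches memory only, which is where the speedup for iterative and interactive workloads comes from.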
RTAP: Real-Time Analytical Processing
Real-Time Analytical Processing (RTAP)
• Data ingested & processed in a streaming fashion
• Real-time data queried and presented in an online fashion
• Real-time and history data combined and mined interactively
• Predominantly RAM-based processing
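The four RTAP properties above can be illustrated with a small single-process sketch: events are ingested one at a time (streaming), a sliding window of recent events is kept in RAM, and a query returns both the real-time count and that count combined with historical totals. `RealTimeCounter`, the 60-second window, and the pre-aggregated `history` dictionary are assumptions made for illustration; this is not how Spark or Shark implement RTAP.

```python
from collections import Counter, deque
from typing import Deque, Dict, Tuple


class RealTimeCounter:
    """Hypothetical RTAP sketch: in-memory sliding window plus history totals."""

    def __init__(self, history: Dict[str, int], window_seconds: int) -> None:
        self.history = Counter(history)  # assumed pre-aggregated batch (history) data
        self.window_seconds = window_seconds
        self.events: Deque[Tuple[int, str]] = deque()  # (timestamp, key), arrival order
        self.recent: Counter = Counter()  # per-key counts inside the current window

    def ingest(self, ts: int, key: str) -> None:
        # Streaming ingest: update in-memory state as each event arrives.
        # Assumes timestamps arrive in non-decreasing order.
        self.events.append((ts, key))
        self.recent[key] += 1
        self._expire(ts)

    def _expire(self, now: int) -> None:
        # Keep only events with timestamp in (now - window_seconds, now].
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.recent[old_key] -= 1
            if self.recent[old_key] == 0:
                del self.recent[old_key]

    def query(self, key: str) -> Tuple[int, int]:
        # Online query, as of the most recent event:
        # (real-time count, real-time count + historical total).
        return self.recent[key], self.recent[key] + self.history[key]


rt = RealTimeCounter(history={"page_a": 1000}, window_seconds=60)
for ts, key in [(0, "page_a"), (30, "page_a"), (70, "page_a")]:
    rt.ingest(ts, key)
print(rt.query("page_a"))  # (2, 1002): the event at t=0 has left the window
rt.ingest(90, "page_b")
print(rt.query("page_a"))  # (1, 1001)
print(rt.query("page_b"))  # (1, 1): no history for page_b
```

All state lives in memory, so each ingest and query touches only the window and two counters rather than rescanning stored data.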
Advanced, Graph-Parallel MLDM
Advanced machine learning and data mining (MLDM)
• Information retrieval (e.g., PageRank)
• Recommendation engine (e.g., alternating least squares, ALS)
• Social network analysis (e.g., clustering)
• Natural language processing (e.g., named entity recognition, NER)
• …
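PageRank, the information-retrieval example above, shows why these workloads are iterative: every node's rank is recomputed from its in-neighbors' ranks until the values settle. Below is a minimal single-machine sketch of the standard power-iteration formulation. The damping factor 0.85 and the fixed iteration count are conventional choices assumed here, and dangling nodes (no out-links) spread their rank evenly over all nodes.

```python
from typing import Dict, List


def pagerank(
    links: Dict[str, List[str]], damping: float = 0.85, iters: int = 50
) -> Dict[str, float]:
    """Power-iteration PageRank over an adjacency list.

    Every node must appear as a key in `links`, even if it has no out-links.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Teleport term: each node receives (1 - d) / n regardless of links.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in links.items():
            if outs:
                # Split v's rank evenly across its out-links.
                share = damping * rank[v] / len(outs)
                for w in outs:
                    new[w] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new  # ranks still sum to 1 after every iteration
    return rank


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this three-page graph, `b` is linked only from half of `a`'s rank, so it ends up ranked lowest. Each iteration reads every edge, which is why keeping the graph and ranks in memory matters at scale.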
Graph-parallel computations
• A sparse graph G(V, E)
• A vertex program P runs on each vertex in parallel & repeatedly
• Vertices interact along edges
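The graph-parallel model above can be sketched as a synchronous loop. In each round (a "superstep"), the same vertex program P runs on every vertex of G(V, E), reading its neighbors' current values, and the loop repeats until no vertex changes. `run_vertex_program` and `min_label` are hypothetical names, and gathering neighbor values directly (rather than exchanging messages) is a simplification. This is not the API of any particular framework. The example program computes connected components by min-label propagation, a basic building block for clustering-style social network analysis.

```python
from typing import Callable, Dict, List

Graph = Dict[int, List[int]]  # undirected adjacency list for G(V, E)


def run_vertex_program(
    graph: Graph,
    init: Callable[[int], int],
    program: Callable[[int, List[int]], int],
    max_supersteps: int = 100,
) -> Dict[int, int]:
    """Synchronously apply `program` to every vertex until nothing changes."""
    state = {v: init(v) for v in graph}
    for _ in range(max_supersteps):
        # Every vertex reads its neighbors' values from the same snapshot
        # (interaction along edges), so these calls could run in parallel.
        new_state = {v: program(state[v], [state[u] for u in graph[v]]) for v in graph}
        if new_state == state:  # converged: no vertex changed its value
            break
        state = new_state
    return state


def min_label(own: int, neighbor_labels: List[int]) -> int:
    # Connected components: adopt the smallest label seen among self and neighbors.
    return min([own] + neighbor_labels)


g: Graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
components = run_vertex_program(g, init=lambda v: v, program=min_label)
print(components)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 6}
```

Each vertex ends up labeled with the smallest vertex id in its component. Labels travel one hop per superstep, so the number of rounds grows with the graph's diameter.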