PaperReading20200421

CanChen ggchen@mail.ustc.edu.cn


 

Spark on Hadoop vs MPI/OpenMP on Beowulf

Today I read this paper to broaden my knowledge.

  1. Spark and OpenMP/MPI are two cluster computing frameworks. Spark is good at fault tolerance and data replication, while OpenMP/MPI is designed to maximize performance for high-performance computing.
  2. GCP: Google Cloud Platform
  3. Core data unit in Spark: Resilient Distributed Datasets (RDDs).
  4. Hadoop: a programming model, MapReduce (map: process data locally; shuffle: redistribute data over the network; reduce: summarize data); a distributed file system, HDFS; and a cluster manager, YARN (handles resources and job scheduling).
  5. MPI: Message Passing Interface, a communication protocol supporting point-to-point and collective communication. Disadvantages: it does not support fault tolerance and is not suitable for fine-grained (small-grain) parallelism.
  6. OpenMP: a user-friendly interface that makes it easy to parallelize complex algorithms; it supports fine-grained parallelism.
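
To make the three MapReduce phases from item 4 concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop or Spark code and is not taken from the paper. The word-count task and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are my own illustrative assumptions, and the "network" shuffle is simulated with an in-memory dictionary.

```python
from collections import defaultdict


def map_phase(partition):
    """Map: process one local partition, emitting (word, 1) pairs."""
    return [(word.lower(), 1) for line in partition for word in line.split()]


def shuffle_phase(mapped_partitions):
    """Shuffle: regroup pairs by key, standing in for redistribution over the network."""
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: summarize the values collected for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(partitions):
    # Each partition stands for the data held locally by one (hypothetical) worker.
    mapped = [map_phase(p) for p in partitions]
    return reduce_phase(shuffle_phase(mapped))


# Two hypothetical partitions of input lines.
partitions = [["spark on hadoop", "mpi on beowulf"], ["spark and mpi"]]
counts = word_count(partitions)
```

In a real cluster each `map_phase` call would run on the node that stores its partition, and only the shuffle step would move data between nodes.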
posted @ 2020-04-21 17:20  Klaus-Chen