Parallel Computing and Parallel Algorithms



Projects

The source code and documents for each project should be uploaded to the FTP site at ftp://public.sjtu.edu.cn (username: weihao, password: epcc2019).

Projects 1-3 are to be carried out on your own cluster, which must consist of at least three laptops, PCs, or VMs.

Project 1: Have Fun with Parallel Programming with MPI

  • MPI Hello World: Write a parallel variant of Kernighan and Ritchie's classic "hello, world" program. Each process should print a message of the form "Hello world from process 'r' of 'n' on 'h'", where 'r' is its rank, 'n' is the total number of processes, and 'h' is the name of the specific CPU/host it runs on. You may use the following MPI APIs:

    • MPI_Comm_size.

    • MPI_Comm_rank.

    • MPI_Get_processor_name. We call this function only out of curiosity. Processes seldom need to know which physical processors they run on, but the function is part of the MPI standard and was meant to be used in process migration. Here we use it simply to show that the MPI processes indeed run on different CPUs/machines. We could just as well have used the standard UNIX function 'gethostname', but that would return the same name for every process if the MPI program were run on a large SMP or a large single parallel machine such as a Cray X1. MPI_Get_processor_name, on the other hand, is meant to return a name that identifies the specific physical processor a given MPI process runs on; the exact form of that name is implementation-dependent.

    • Analysis: Run your program with different numbers of processes and host nodes. Instead of the default mapping of process ranks to nodes, you may also try other mapping schemes (e.g., -loadbalance, -bynode, -nolocal). Analyze your program's output.

  • Calculate PI: Write a parallel MPI program to calculate the value of PI. The serial version of calculating PI using the rectangle rule is shown in the attached slide; you can also refer to Exercise 4.11. Your program should at least output the value of PI and the program's running time. You may use the following MPI APIs:

    • MPI_Bcast.

    • MPI_Reduce.

    • MPI_Wtime.

    • Analysis: Benchmark your program with different numbers of processes, host nodes, and intervals (in the rectangle rule). Analyze your program's output.
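
As a rough illustration of how the rectangle rule can be split across processes, the following Python sketch simulates the decomposition inside a single process. It is not MPI code. The names partial_pi and estimate_pi are hypothetical, the cyclic assignment of intervals to ranks is only one possible choice, the built-in sum stands in for the role MPI_Reduce would play, and time.perf_counter stands in for MPI_Wtime. It also assumes the integrand 4/(1+x^2) on [0, 1] evaluated at rectangle midpoints; the attached slide may use a different formulation.

```python
import math
import time


def partial_pi(rank, size, n):
    """Partial midpoint-rule sum over the intervals assigned to one rank."""
    h = 1.0 / n  # width of each rectangle
    total = 0.0
    # Cyclic distribution: rank r handles intervals r, r + size, r + 2*size, ...
    for i in range(rank, n, size):
        x = h * (i + 0.5)  # midpoint of interval i
        total += 4.0 / (1.0 + x * x)
    return h * total


def estimate_pi(size, n):
    """Combine the partial sums of all simulated ranks (the reduction step)."""
    return sum(partial_pi(rank, size, n) for rank in range(size))


if __name__ == "__main__":
    start = time.perf_counter()
    pi = estimate_pi(size=4, n=100_000)
    elapsed = time.perf_counter() - start
    print(f"pi ~= {pi:.12f}, error = {abs(pi - math.pi):.2e}, time = {elapsed:.4f} s")
```

In a real MPI program each rank would compute only its own partial sum, and the number of intervals would typically be distributed from rank 0 to the other ranks before the loop starts.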


Project 2: MapReduce in Hadoop System

Requirements

Each student group should submit a project proposal for implementing a wordcount program using MapReduce.
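
To make the map, shuffle, and reduce phases of a wordcount concrete, here is a minimal single-process sketch in plain Python. It does not use the Hadoop API; the function names (map_words, shuffle, reduce_counts, word_count) are hypothetical, and splitting on whitespace after lowercasing is an assumed tokenization that a real project may want to refine.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce phase: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hello world", "Hello Hadoop"]))
```

In Hadoop, the framework performs the shuffle step itself and runs the map and reduce functions on different nodes; only the per-record logic needs to be written by the group.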


Project 3: Big Data Analysis in Hadoop System

Requirements

Each student group should submit a project proposal for implementing a weatherdata program using MapReduce.

Weatherdata is a project that retrieves temperature data from the website at http://ram-n.github.io/weatherData/

Use your weatherdata program to compute the highest and lowest temperatures.
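
One way to picture the highest/lowest temperature job is the single-process Python sketch below. It does not use the Hadoop API, and everything about the input is an assumption: the record layout "station,date,temperature" is hypothetical and will most likely differ from the data actually published on the website, as are the function names map_record, reduce_extremes, and temperature_extremes.

```python
from collections import defaultdict


def map_record(line):
    """Map phase: parse one record and emit (station, temperature).

    Assumes a hypothetical 'station,date,temperature' layout.
    """
    station, _date, temperature = line.strip().split(",")
    return station, float(temperature)


def reduce_extremes(temperatures):
    """Reduce phase: return (highest, lowest) for one station's readings."""
    return max(temperatures), min(temperatures)


def temperature_extremes(lines):
    """Group readings by station, then reduce each group to its extremes."""
    groups = defaultdict(list)
    for line in lines:
        station, temperature = map_record(line)
        groups[station].append(temperature)
    return {station: reduce_extremes(temps) for station, temps in groups.items()}


if __name__ == "__main__":
    sample = ["A,2019-01-01,3.5", "A,2019-01-02,-1.0", "B,2019-01-01,10.0"]
    print(temperature_extremes(sample))
```

Keying by station is only one option; keying by date or by year would give per-period extremes with the same reduce step.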


Important Dates and Submissions:

Deadline: All work must be finished before 23:59 on May 31st, 2019.

Create a folder named with your student IDs that contains two folders named "project1" and "project3". Put the related files in the corresponding folder, and name them according to the directory structure shown below.

  • Project1

  • ----report_for_projetc1.pdf // You should use the official double-column ACM SIG Proceedings Templates to write your project report (either Word or LaTeX is OK). The report should be at least 4 pages long. Download link: http://www.acm.org/sigs/publications/proceedings-templates#aL1

  • ----run.sh // The TA will only use the "./run.sh" command under the "project1" folder to run the executable file. Please make sure it works.

  • ----tips_for_run.txt // Please give some examples of how to execute "run.sh". You may include descriptions of its parameters.

  • ----Your source code files


  • Project3

  • ----report_for_projetc3.pdf // You should use the official double-column ACM SIG Proceedings Templates to write your project report (either Word or LaTeX is OK). The report should be at least 4 pages long. Download link: http://www.acm.org/sigs/publications/proceedings-templates#aL1

  • ----run.sh // The TA will only use the "./run.sh" command under the "project3" folder to run the executable file. Please make sure it works.

  • ----tips_for_run.txt // Please give some examples of how to execute "run.sh". You may include descriptions of its parameters.

  • ----Your source code files

Project Management

You should strongly consider using either Subversion or CVS to perform source code control for your project and the paper you write describing it. I suggest Subversion.

I also strongly suggest writing your course project report using LaTeX. It is the de facto tool in which most CS research papers are written. While it has a bit of startup cost, it is much easier to collaboratively write complex research papers using LaTeX than using Word.

Writing Papers

Analysis

  • Books: Raj Jain, The Art of Computer Systems Performance Analysis. A very good overview of many mathematical techniques and queueing topics, aimed at a systems audience.




Homework download: https://jbox.sjtu.edu.cn/l/WuCIJ9 (password: zzis)

Uploads for teachers: https://pan.baidu.com/s/1c8fe6MQCiCiOUBI8wh-14Q (password: 8rgz)