Full name:
Jackson Davis
Email: [email protected]
Username: jdavis70
The main challenges were communicating with the different nodes involved in a job and implementing the different workflows.
My implementation comprises 1 new software component, totaling 550 added lines of code over the previous implementation. Key challenges included:
- Coordinating communication and data transfer between the coordinator and worker nodes during the different MapReduce phases. I solved this by carefully designing the message passing and notification system using the existing distribution framework.
- Implementing the shuffle phase to correctly partition and distribute the map output to the appropriate reducer nodes. I addressed this by leveraging consistent hashing and introducing a new append method in the store to efficiently group the shuffled data.
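The shuffle step described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the submitted code: `HashRing`, `AppendStore`, `shuffle`, and the node names are hypothetical, and the in-memory stores stand in for real worker nodes. Each key is hashed onto a ring and sent to the first node clockwise from it, and the store's append method groups all values for a key on that node.

```python
import bisect
import hashlib
from collections import defaultdict


def _hash(text: str) -> int:
    """Map a string to a large integer with SHA-256 (stable across runs)."""
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring: a key belongs to the first node clockwise from it."""

    def __init__(self, node_ids):
        self._ring = sorted((_hash(n), n) for n in node_ids)
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Wrap around to the first node when the key hashes past the last point.
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]


class AppendStore:
    """Hypothetical stand-in for one node's store with an append method."""

    def __init__(self):
        self.data = defaultdict(list)

    def append(self, key, value):
        # Append instead of overwrite, so values for the same key accumulate.
        self.data[key].append(value)


def shuffle(map_output, stores):
    """Route every (key, value) pair to the store of the node that owns the key."""
    ring = HashRing(list(stores))
    for key, value in map_output:
        stores[ring.node_for(key)].append(key, value)


stores = {name: AppendStore() for name in ("node-a", "node-b", "node-c")}
map_output = [("apple", 1), ("pear", 1), ("apple", 1), ("fig", 1), ("pear", 1)]
shuffle(map_output, stores)
```

Because the owning node depends only on the key's hash, every mapper sends a given key to the same reducer without any extra coordination.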
Describe how you characterized the correctness and performance of your implementation
Correctness: I characterized the correctness of my implementation by:
- Developing comprehensive test cases that cover various workflows and edge cases
- Comparing the output of the distributed execution with the expected results from local execution
- Validating the behavior of individual components like the coordinator, mappers, and reducers
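One way to compare distributed output against local execution is sketched below. This is an illustrative Python sketch, not the actual test suite: `run_local`, `same_result`, and the word-count job are hypothetical, and the `distributed_pairs` list is hand-written to stand in for results gathered from the reducers. The comparison ignores ordering, since reducers may return keys in any order, but rejects a key that is emitted twice.

```python
from collections import defaultdict


def run_local(records, mapper, reducer):
    """Single-process reference: map every record, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def same_result(distributed_pairs, expected):
    """Order-insensitive comparison that also fails on duplicate keys."""
    seen = {}
    for key, value in distributed_pairs:
        if key in seen:
            return False  # a key reduced on two nodes signals a shuffle bug
        seen[key] = value
    return seen == expected


def word_mapper(line):
    return [(word, 1) for word in line.split()]


def word_reducer(key, values):
    return sum(values)


expected = run_local(["a b", "a"], word_mapper, word_reducer)
distributed_pairs = [("b", 1), ("a", 2)]  # stand-in for gathered reducer output
matches = same_result(distributed_pairs, expected)
```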
Performance: I characterized the performance of my implementation by:
- Measuring the execution time of MapReduce jobs with varying input sizes and comparing it with local execution
- Evaluating the impact of additional features like compaction on overall performance
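A timing harness for this kind of measurement could look like the following sketch. It is illustrative Python only: `time_job` and `local_word_count` are hypothetical names, the input sizes are arbitrary, and a real run would pass the distributed job in place of the local one. It repeats each run and keeps the best time to reduce noise from other processes.

```python
import time


def time_job(job, records, repeats=3):
    """Return the best wall-clock time (in seconds) over several runs of job."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        job(records)
        best = min(best, time.perf_counter() - start)
    return best


def local_word_count(records):
    """Local baseline job used for comparison."""
    counts = {}
    for line in records:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


# Time the same job at several (arbitrarily chosen) input sizes.
timings = {n: time_job(local_word_count, ["a b c a"] * n) for n in (100, 1000, 10000)}
```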
Which extra features did you implement and how?
- Compaction functions: I added support for user-defined compact functions that can be run on the map output to minimize data transfer between the map and reduce tasks. This is achieved by aggregating values with the same key before shuffling, reducing network bandwidth usage.
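The effect of compaction can be shown with a small sketch. It is illustrative Python rather than the submitted code: the `compact` helper and the `(key, values)` signature of the compact function are assumptions. Values that share a key are merged on the mapper's side, so fewer pairs need to cross the network during the shuffle.

```python
from collections import defaultdict


def compact(map_output, compact_fn):
    """Group one mapper's output by key and collapse each group to one value."""
    grouped = defaultdict(list)
    for key, value in map_output:
        grouped[key].append(value)
    return [(key, compact_fn(key, values)) for key, values in grouped.items()]


map_output = [("apple", 1), ("pear", 1), ("apple", 1), ("apple", 1)]
compacted = compact(map_output, lambda key, values: sum(values))
# Four pairs shrink to two before shuffling: [("apple", 3), ("pear", 1)]
```

This only preserves the final result when the reducer gives the same answer on partially aggregated values, as a sum or a maximum does. A reducer such as a plain average would need to carry extra state, for example a running sum and count.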
Roughly, how many hours did this milestone take you to complete?
Hours: 30-35