Duplication removal project in cloud computing

Hey guys, I have to do a cloud computing project this semester.

This is my project statement: Cloud-based improved file handling and duplicate removal using MD5.

Actually, I am new to cloud computing and I am stuck at the first step xD. I don't know how I should start the project or what I should use.

Can someone guide me in the right direction, please?

