Cloud architecture for divide and conquer / scatter-gather?


I'm building a feature for a cloud service that receives a huge file as input, splits it into smaller files, analyzes and modifies each smaller file, and then reassembles the results into a single large file.
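
To make the intended flow concrete, here is a minimal single-machine sketch in Python. Everything in it is a placeholder for illustration: the chunk size, the names `split`, `process_chunk`, and `scatter_gather`, and the upper-casing step that stands in for the real analysis and modification. It only shows the split, process-in-parallel, reassemble-in-order shape, not a cloud implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

CHUNK_SIZE = 4  # bytes; deliberately tiny for the demo (hypothetical value)


def split(data: bytes, size: int) -> List[bytes]:
    """Scatter step: cut the input into fixed-size chunks, in order."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def process_chunk(chunk: bytes) -> bytes:
    """Placeholder for the real per-chunk analysis/modification."""
    return chunk.upper()


def scatter_gather(data: bytes, size: int = CHUNK_SIZE, workers: int = 4) -> bytes:
    """Split the input, process chunks concurrently, then reassemble them."""
    chunks = split(data, size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so the gather step
        # can simply concatenate them.
        results = list(pool.map(process_chunk, chunks))
    return b"".join(results)


if __name__ == "__main__":
    print(scatter_gather(b"hello world"))
```

In the real service the chunks would presumably be separate objects in S3 and the workers separate processes or machines rather than threads; how best to do that mapping is the part I'm asking about.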

Can anyone give me any architecture pointers to achieve this?

I have access to AWS, so I can use S3, a DB, and pretty much any other service offered by Amazon.

Obviously this needs to be highly performant and highly scalable.


