I run an app that can have heavy computing tasks ON-DEMAND (users can trigger both long-running and short-running tasks at will).
Essentially, it is very hard for me to predict my users' activity and correctly size any AWS EC2 instance, VPS, or anything else. Sometimes I might just need a simple machine with 8 vCPUs and 32 GB of RAM, but at any given moment I may need tens of machines with 4x Tesla GPUs and 128 GB of RAM each, because multiple users are doing things at the same time.
AWS Lambda's autoscaling would be ideal, except that it does not support GPUs and it limits each process to 15 minutes, which is a no-go for my app. Also, I can't wait more than 500 ms between the moment the user presses the button and the moment the process starts to run (some tasks require near-real-time latency), which rules out ECS Fargate (auto-provisioning new containers takes 20 seconds on average).
I understand I need some machine that is up 24/7 and able to respond to users in real time, but I cannot afford tens of such beasts running 24/7 when I only need them 300 hours per month (heavily underused).
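To put a rough number on that underuse, here is a small sketch of the arithmetic. The 300 hours come from my estimate above; the 730-hour average month and the `utilization` helper are just assumptions for illustration, not anything a provider defines.

```python
# Rough utilization arithmetic for a machine that is billed around the clock.
HOURS_PER_MONTH = 730  # assumption: average month, 365 * 24 / 12


def utilization(used_hours: float, billed_hours: float = HOURS_PER_MONTH) -> float:
    """Return the fraction of billed hours that actually run user tasks."""
    if billed_hours <= 0:
        raise ValueError("billed_hours must be positive")
    return used_hours / billed_hours


used = 300  # hours of real work per month, from the estimate above
idle = HOURS_PER_MONTH - used
print(f"utilization: {utilization(used):.0%}, billed but idle: {idle} h/month")
```

Under those assumptions, more than half of every always-on machine's bill would pay for idle time.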
I believe cloud providers could leverage virtualization to offer a giant "shared" supercomputer that I could deploy my containerized app on, while paying ONLY for my ACTUAL usage. I mean billing me precisely for the computing power I strictly use, per GB-second, like AWS Lambda.
I am aware of both ECS Fargate and Jelastic Cloud, but that's not quite what I am looking for: both still need to provision resources (containers and cloudlets, respectively) with some delay (25+ seconds).
Are there any cloud providers offering such solutions? Is there any other way I can achieve real-time response (as close to 0 ms latency as possible) with more than 30 minutes of processing, without 24/7 billing for unused resources?