Cloud Dataproc: Google’s new managed service for Hadoop and Spark
Google has unveiled Cloud Dataproc, a new Cloud Platform service that provides on-demand access to Spark and Hadoop processing, with the aim of making it easier for customers to derive useful insights from large datasets. Dataproc’s own charge comes on top of the cost of the other Cloud Platform resources a cluster uses.
Dataproc is billed by the minute, so users pay only for the resources they are actually using instead of committing to contracts or long-term fees. This gives them finer control over the cost of running Hadoop on Google Cloud Platform. Clusters can also include preemptible instances, which have lower compute prices, to reduce costs even further.
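To show why per-minute billing matters for short jobs, here is a simplified cost model. It is a sketch under stated assumptions, not Google’s pricing formula: it assumes partial minutes are rounded up, and it folds every charge into a single all-in price per vCPU-hour. The rates (`STANDARD_RATE`, `PREEMPTIBLE_RATE`) and the function names are hypothetical and made up for illustration.

```python
import math

# Hypothetical all-in prices in USD per vCPU-hour, for illustration only.
# They are NOT Google's published rates.
STANDARD_RATE = 0.05
PREEMPTIBLE_RATE = 0.015


def billed_minutes(runtime_seconds: float) -> int:
    """Round a runtime up to whole minutes (assumed per-minute granularity)."""
    return math.ceil(runtime_seconds / 60)


def cluster_cost(standard_vcpus: int,
                 preemptible_vcpus: int,
                 runtime_seconds: float,
                 standard_rate: float = STANDARD_RATE,
                 preemptible_rate: float = PREEMPTIBLE_RATE) -> float:
    """Estimate the cost of one cluster run under per-minute billing."""
    hours = billed_minutes(runtime_seconds) / 60
    hourly = standard_vcpus * standard_rate + preemptible_vcpus * preemptible_rate
    return hours * hourly


if __name__ == "__main__":
    # 16 standard + 32 preemptible vCPUs for 17.5 minutes: billed as 18 minutes.
    print(round(cluster_cost(16, 32, 17.5 * 60), 4))
```

With these made-up rates the 17.5-minute run costs about $0.384. If the same run were billed by the hour, it would be charged for a full 60 minutes, roughly 3.3 times as much. The preemptible vCPUs contribute less per hour than the standard ones.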
Cloud Dataproc clusters can be started, scaled and shut down in an average of 90 seconds per operation. Because clusters come up this quickly, users can set up ad-hoc clusters when they need them, and because the service is managed, Google handles the administration for them.
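Fast start-up and shutdown make an “ephemeral cluster” pattern practical: create a cluster for one job, then delete it as soon as the job ends, even if the job fails. The sketch below expresses that pattern as a Python context manager. The `create` and `delete` hooks are hypothetical placeholders. In practice they might run `gcloud dataproc clusters create` and `gcloud dataproc clusters delete` or call a client library, but those calls are not shown, and the cluster name is made up.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ephemeral_cluster(name: str,
                      create: Callable[[str], None],
                      delete: Callable[[str], None]) -> Iterator[str]:
    """Create a cluster, yield its name, and always delete it afterwards.

    `create` and `delete` are hypothetical hooks supplied by the caller.
    """
    create(name)
    try:
        yield name
    finally:
        # Runs whether the job succeeded or raised, so the cluster never sits idle.
        delete(name)


if __name__ == "__main__":
    log = []
    with ephemeral_cluster(
        "adhoc-etl",  # hypothetical cluster name
        create=lambda n: log.append(("create", n)),
        delete=lambda n: log.append(("delete", n)),
    ) as cluster:
        log.append(("run job on", cluster))
    print(log)
```

Using `try`/`finally` inside the context manager ties the cluster’s lifetime to the `with` block. A failed job therefore cannot leave a billed cluster running.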
Offered as a managed service on Google Cloud Platform, Cloud Dataproc is aimed at users of open-source tools such as Hadoop and Spark who want to automate the management of their clusters.
Unsurprisingly, Dataproc is also integrated with the rest of Google’s cloud services, including BigQuery, Cloud Storage, Cloud Bigtable, Cloud Logging and Cloud Monitoring. Companies can use it to extract, transform and load terabytes of raw log data directly into BigQuery for business reporting, for example.
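To make the log-processing example more concrete, here is a minimal sketch of the “transform” step. It assumes that the raw logs are in Apache Common Log Format; the article does not say what format they use. Each line becomes a flat record, and the output is newline-delimited JSON, one of the source formats that BigQuery load jobs accept. The field names and functions are hypothetical, and the load into BigQuery itself is not shown. On a Dataproc cluster, a function like `parse_line` could be applied in parallel with Spark’s RDD `map` and `filter` operations.

```python
import json
import re
from datetime import datetime
from typing import Iterable, Iterator, Optional

# Assumed input format: Apache Common Log Format. The pattern below is a
# simplified illustration, not a complete CLF parser.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line: str) -> Optional[dict]:
    """Transform one raw log line into a flat row, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    # Normalise the CLF timestamp (e.g. 10/Oct/2000:13:55:36 -0700) to ISO 8601.
    row["timestamp"] = datetime.strptime(
        row["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    ).isoformat()
    row["status"] = int(row["status"])
    # CLF uses "-" when no response body was sent.
    row["bytes"] = 0 if row["bytes"] == "-" else int(row["bytes"])
    return row


def to_ndjson(lines: Iterable[str]) -> Iterator[str]:
    """Yield one JSON object per well-formed line, skipping malformed input."""
    for line in lines:
        row = parse_line(line)
        if row is not None:
            yield json.dumps(row)


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326',
        "this line is not a log entry",
    ]
    for record in to_ndjson(sample):
        print(record)
```

Malformed lines are silently dropped here. A production pipeline would more likely count or quarantine them so that data loss is visible in reporting.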
On paper Dataproc looks highly competitive, and it is likely to be a popular addition to the lineup among customers who already use Google’s broad range of cloud products. Whether the service will capture market share from other providers, however, remains to be seen. This is because users interact with their Spark and Hadoop clusters through the Google Developers Console.
“In the time it takes you to read this blog post, you can have a Spark or Hadoop cluster created, configured, and ready to work for you,” said Google product manager James Malone, announcing the new service in a post on the company’s blog. When a cluster is no longer in use, it can be shut down so that no money is spent on idle resources.