Google App Engine is a great place to run your applications, but for some workloads you may want more fine-grained control over the environment your app runs in. You may need to fine-tune how or when scaling occurs, customize the load balancer, or write code in a language that App Engine doesn’t support.

Today we’re excited to introduce a solution paper and tutorial for Scalable and Resilient Web Applications to help you – you guessed it – build a scalable and resilient web application on Google Cloud Platform. The solution includes a technical paper that discusses the application architecture and key design decisions as well as a functional, open source application and tutorial hosted on GitHub that you can deploy or even use as a starting point for your own applications.

You may have read our previous post about Google Compute Engine Load Balancer easily handling 1,000,000 requests per second, or watched the live demo where Compute Engine Autoscaler added enough instances to handle over 1,500,000 requests per second, and wondered: how exactly did they do that?

The sample implementation uses Cloud Deployment Manager to provision a load balancer as well as multi-zone auto-scaled Compute Engine instances to serve the Redmine project management web app. The architecture uses Google Cloud SQL and Google Cloud Storage to reliably and scalably store the app’s data. Here’s an overview of the complete architecture:

You’ll also learn how to use Chef and Compute Engine startup-scripts to configure and install software on instances at boot time. There’s a lot of technical content we think you’ll find useful – check out the article, then head over to the GitHub project page (where you’ll also find the tutorial and can ask questions or make suggestions in the issues section) and start building more scalable and resilient apps.

-Posted by Evan Brown, Solutions Architect

One of the most compelling benefits of building and deploying solutions on public cloud platforms is the speed at which you can move from idea to running applications. We offer you a continuum of compute options – from high performance VMs and container-based services to managed PaaS – so you can choose the most suitable option.

For those of you who need a VM-based solution, deploying an application requires that all underlying runtime components and packages be in place and configured correctly. This often becomes a labor-intensive, time-consuming task. Developers should spend most of their time on design and writing code. Time spent finding and deploying libraries, fixing dependencies, resolving versioning issues and configuring tooling is time away from that work.

Today, we're introducing Google Cloud Launcher, where you can launch more than 120 popular open source packages that have been configured by Bitnami or Google Click to Deploy. Deployment is incredibly straightforward: users simply select a package from the library, specify a few parameters and the package is up and running in a few clicks. Cloud Launcher is designed to make developers more efficient, removing operational deployment and configuration tasks so developers can focus on what matters – their application and their users.

Cloud Launcher includes developer tools and stacks such as Apache Solr, Django, Gitlab, Jenkins, LAMP, Node.js, Ruby on Rails, and Tomcat. It also includes popular databases like MongoDB, MySQL, PostgreSQL and popular applications like Wordpress, Drupal, JasperReports, Joomla and SugarCRM. Many of these packages have been specifically built and performance-tuned for Google Cloud Platform, and we’re actively working to ensure these packages are well integrated with Google Cloud Monitoring so you can review health and performance metrics, create custom dashboards and set alerts for your cloud infrastructure and software packages in one place. This will roll out to all supported packages on Cloud Launcher this spring.

When you visit Cloud Launcher, you can search for your desired package, or filter and browse categories such as Database, CRM or CMS.

“We are excited to partner with Google to simplify the deployment and configuration of servers and applications and look forward to continue to expand our integration with Google Compute Engine. Delivering an exceptional user experience is important to us, and Compute Engine gives Bitnami users another great way to deploy their favorite app in just few clicks,” said Erica Brescia, COO at Bitnami Inc.

You can get started with Cloud Launcher today to launch your favorite software package on Google Cloud Platform in a matter of minutes. And do remember to give us feedback via the links in Cloud Launcher or join our mailing list for updates and discussions. Enjoy building!

-Posted by Varun Talwar, Product Manager, Google Cloud Platform

Currently, the way that doctors and clinicians approach medical treatment is to look at a patient’s symptoms, determine a prognosis, and assign the appropriate treatment. While sensible, this reactive approach leaves a lot open to interpretation and may not home in on critical clues such as a predisposition to genetic mutation or the length of time an illness lingered before symptoms appeared. With added insights about genetic makeup, environment, socioeconomic factors and family medical history, doctors and clinicians gain the ability to better tailor and individualize medical treatment.

Doctors need new technologies in order to provide this individualized care. Researchers devoted to personalized medicine can now use big data tools to analyze clinical records, genomic sequences, and laboratory data. All of this valuable data may reveal how differences in an individual’s genetics, lifestyle, and environment influence reactions to disease. And ultimately, it may show us that customized treatments can improve outcomes. To get there, we first need to overcome the challenge of data inundation. Vast health datasets create significant impediments to storage, computation, analysis, and data visualization. The raw information for a single human genome is over 100 GB spanning over 20,000 genes, and the doctors’ handwritten notes are hard for computers (and people) to make sense of. There just aren’t enough tools and data scientists available to leverage large scale health data.

At Northrop Grumman, we’ve prototyped a personalized health analytics platform, using Google Cloud Platform and Google Genomics, to improve knowledge extraction from health data and facilitate personalized medicine research. With our personalized health analytics platform, a genomics researcher would be able to evaluate diseases across a set of patients with genomic and health information. In the past, a simple question about which genetic variants are linked to a medical condition might take hours, or even days, to answer. By leveraging Google Cloud Platform, in combination with our own algorithms, the analysis of 1,000 patients’ genomic data, across 218 diseases, generates near real-time results.

Northrop Grumman’s analytics platform would provide multiple benefits to researchers. With Google Genomics and Google BigQuery, terabytes of genomics information can be analyzed in only a few seconds, so researchers would see faster research results. This increase in the speed of discovery deepens our understanding of how genetic variations contribute to health and disease. In addition, the scalable storage and analysis tools provided by Google Cloud Platform and Google Genomics reduce costs and increase security when compared against in-house IT systems. And lastly, our platform aims to improve patient health by expanding the knowledge base for personalized medicine with discovery of complex hidden patterns across long time periods and among large study populations.

The Architecture

To make personalized medicine research easier, we architected our health analytics platform in layers. Here they are starting from the base layer, progressing upward:

  1. Massive Data Storage: A storage layer uses Google Genomics to efficiently store and access genomic data at the petabyte scale, and Northrop Grumman knowledge engines and frameworks to efficiently process and store electronic health record (EHR) data.
  2. Annotation Layer: The annotation layer provides tools to extract clinical knowledge from structured and unstructured EHR data sources. It also includes a database containing aggregated phenotypic and disease associations from public sources. These enable improved functional annotation of the genomic data.
  3. Analytics Layer: The analytics layer is built on top of Google BigQuery and Google Compute Engine to provide high-performance modeling and analytics tools. With these, we can demonstrate genomic risk modeling with analysis time scales of only several seconds.
  4. Visualization & Collaboration Layer: The visualization and collaboration layer provides a framework for high-level analytics, visualization, and collaboration tools.
The system architecture for Northrop Grumman’s personalized health analytics platform. A layered approach is designed to provide an integrated research environment with greater access to storage infrastructure, improved information extraction and annotation tools, more powerful computational platforms and improved collaboration and visualization tools. 

New Breakthroughs in Personalized Medicine

Today our personalized health analytics platform is a prototype, but the results are promising. Our health analytics platform may improve a researcher’s speed of discovery, lower the cost of storing massive amounts of health data, offer better security than in-house IT systems, and ultimately lead to breakthroughs in personalized medicine and treatment. If you’re interested in learning more, please contact Northrop Grumman.

- Posted by Leon Li, Future Technical Leader and Systems Engineer at Northrop Grumman Corporation

More and more organizations have learned, through experimentation, how much latent value exists in large scale data and how it can be unearthed via parallelized data processing. Bringing these practices into production requires faster, easier and more reliable data processing pipelines.

Google Cloud Dataflow is designed to meet these requirements. It’s a fully managed, highly scalable, strongly consistent processing service for both batch and stream processing. It merges batch and stream into a unified programming model that offers programming simplicity, powerful semantics and operational robustness. The first two of these benefits are properties of the Dataflow programming model itself, which Google has released in open source via an SDK and which is not tied to running on Google Cloud Platform.
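The core idea of the unified model, that the same chain of transforms applies whether the input is bounded (batch) or unbounded (stream), can be illustrated with a small Python sketch. This is purely illustrative pseudocode of the concept, not the Dataflow SDK itself (which is Java):

```python
# Illustrative sketch of a unified batch/stream programming model (not the
# Dataflow SDK): one transform definition serves both execution modes.
def word_pairs(lines):
    """The user-defined transform: split lines into (word, 1) pairs."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def run_batch(lines):
    # Batch mode: consume the whole bounded input, return the final result.
    counts = {}
    for word, n in word_pairs(lines):
        counts[word] = counts.get(word, 0) + n
    return counts

def run_stream(lines):
    # Stream mode: same transform, but emit running counts as data arrives.
    counts = {}
    for word, n in word_pairs(lines):
        counts[word] = counts.get(word, 0) + n
        yield (word, counts[word])

print(run_batch(["to be or not to be"]))        # final counts
print(list(run_stream(["to be or not to be"]))) # incremental counts
```

The point of the model is exactly this symmetry: the pipeline author writes `word_pairs` once, and the runner decides how to execute it over bounded or unbounded data.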

Today, we’re announcing another deployment option for your Dataflow processing pipelines. The team behind the fast-growing Apache Flink project has released a Cloud Dataflow runner for Flink, allowing any Dataflow program to execute on a Flink cluster. Apache Flink is a new Apache Top-Level project that offers APIs and a distributed processing engine for batch and stream data processing.

By running on Flink, Dataflow pipelines benefit not only from the power of the Dataflow programming model, but also from the portability, performance and flexibility of the Flink runtime, which provides a robust execution engine with custom memory management and a cost-based optimizer. And best of all, you have the assurance that your Dataflow pipelines are portable beyond Google Cloud Dataflow: via the Flink runner, your pipelines can execute either on-premises (virtualized or bare metal) or in the cloud (on VMs).

This brings the number of production-ready deployment runtimes for your Dataflow pipelines to three and gives you the flexibility to choose the right platform and the right runtime for your jobs, and keep your options open as the big data landscape continues to evolve. Available Dataflow runners include:

For more information, see the blog post by data Artisans, who created the Google Cloud Dataflow runner for Flink.
We’re thrilled by the growth of deployment options for the portable Dataflow programming model. No matter where you deploy your Dataflow jobs, join us using the “google-cloud-dataflow” tag on StackOverflow and let us know if you have any questions.

-Posted by William Vambenepe, Product Manager

Your new website is growing exponentially. After a few rounds of high fives, you start scaling to meet this unexpected demand. While you can always add more front-end servers, eventually your database becomes a bottleneck, which leads you to . . .

  • Add more replicas for better read throughput and data durability
  • Introduce sharding to scale your write throughput and let your data set grow beyond a single machine
  • Create separate replica pools for batch jobs and backups, to isolate them from live traffic
  • Clone the whole deployment into multiple datacenters worldwide for disaster recovery and lower latency

At YouTube, we went on that journey as we scaled our MySQL deployment, which today handles the metadata for billions of daily video views and 300 hours of new video uploads per minute. To do this, we developed the Vitess platform, which addresses scaling challenges while hiding the associated complexity from the application layer.
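The sharding step above is the one that most changes an application's view of its database: each row must be routed to the shard that owns its key. A minimal Python sketch of range-based sharding, hashing a key to a 64-bit "keyspace ID" and routing by range, illustrates the concept (the function names here are illustrative, not Vitess APIs):

```python
import hashlib

# Illustrative sketch of range-based sharding (not Vitess code): hash the
# sharding key to a uniformly distributed 64-bit integer, then route to the
# shard whose half-open range of the hash space covers it.
NUM_SHARDS = 4
SHARD_SIZE = 2**64 // NUM_SHARDS

def keyspace_id(sharding_key: str) -> int:
    """Map a key to a 64-bit integer, spreading keys evenly over shards."""
    digest = hashlib.sha1(sharding_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def shard_for(sharding_key: str) -> int:
    """Pick the shard whose range contains the key's keyspace ID."""
    return keyspace_id(sharding_key) // SHARD_SIZE

# All reads and writes for one entity deterministically hit the same shard:
print(shard_for("user:42"), shard_for("user:42"), shard_for("user:7"))
```

Range-based routing like this is what lets a platform split or merge shards later: a shard's data is a contiguous slice of the hash space, so resharding moves ranges rather than rehashing every row.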

Vitess is available as an open-source project and runs best in a containerized environment. With Kubernetes and Google Container Engine as your container cluster manager, it's now a lot easier to get started. We’ve created a single deployment configuration for Vitess that works on any platform that Kubernetes supports.

In addition to being easy to deploy in a container cluster, Vitess also takes full advantage of the benefits offered by a container cluster manager, in particular:

  • Horizontal scaling – add capacity by launching additional nodes rather than making one huge node
  • Dynamic placement – let the cluster manager schedule Vitess containers wherever it wants
  • Declarative specification – describe your desired end state, and let the cluster manager create it
  • Self-healing components – recover automatically from machine failures

In this environment, Vitess provides a MySQL storage layer with improved durability, scalability, and manageability.

We're just getting started with this integration, but you can already run Vitess on Kubernetes yourself. For more on Vitess, check out our website, ask questions on our forum, or join us on GitHub. In particular, take a look at our overview to understand the trade-offs of Vitess versus NoSQL solutions and fully-managed MySQL solutions like Google Cloud SQL.

-Posted by Anthony Yeh, Software Engineer, YouTube

Businesses generate a staggering amount of log data that contains rich information on systems, applications, user requests, and administrative actions. When managed effectively, this treasure trove of data can help you investigate and debug system issues, gain operational and business insights and meet security and compliance needs.

But log management is challenging. You need to manage very high volumes of streaming data, provision resources to handle peak loads, scale fast and efficiently and have the capability to analyze data in real-time.

Starting today, Google Cloud Logging is available in beta to help you manage all of your Google Compute Engine and Google App Engine logs in one place, and collect, view, analyze and export them. By combining Google Cloud Monitoring with Cloud Logging, you gain a powerful set of tools for managing operations and increasing business insights.

The Cloud Logging service allows you to:

  • Ingest and view the log data, so that you can see all your logs in one place
  • Search the log data in real-time, so that you can resolve operational issues
  • Analyze the log data in real-time, so that you can glean actionable insights
  • Archive log data for longer periods, to meet backup and compliance requirements

Several customers are already using the features for logs viewing and analysis. Here’s what Wix has to say about Cloud Logging.
“At Wix we use BigQuery to analyze logs of Compute Engine auto-scaled deployments. We get a large volume of syslog data that we send to BigQuery to get insights on system health state and error rates. We generate time series data and integrate it with Google Cloud Monitoring to monitor system performance and business metrics. This provides us with essential insight for the running of our operations.” - Dmitry Shestak, Engineer@Infrastructure team, Wix

Ingest and view the log data

We understand that it’s important for you to keep all your logs in one place so that you can easily analyze and correlate the data. Cloud Logging solves this problem in several ways:

  • Compute Engine VM logs can be automatically collected for about two dozen log types through the Google-packaged fluentd agent, with additional logs possible through custom configuration.
  • Compute Engine Activity logs, which record all system actions and API calls, are enabled by default, with no agent installation required.
  • App Engine logs that include syslog, request logs and application logs are automatically enabled for all App Engine projects, including applications using Managed VM runtimes.

You can view the logs in the Logs Viewer (shown below) in the Google Developers Console by clicking on the “Logs” link under “Monitoring.”
When viewing logs in the Logs Viewer, you can filter results using filter text or drop-downs

Search the log data in real-time

The Logs Viewer lets you quickly investigate and debug issues, correlate logs between different services and find the root cause of an outage. You can filter logs using the drop-down menu and the filter bar, stream logs in real-time ("tail -f") and navigate through your log timeline without awkward next/previous page buttons.

Here’s an example that shows how you can filter Compute Engine logs to see only Compute Engine “Firewall” service logs, pick a particular firewall resource to see the logs and do this for a particular log level.
A filtered view of logs data using the Logs Viewer
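Conceptually, the filtering shown above is a chain of predicates applied to structured log entries: match on service, then resource, then a minimum severity. A small Python sketch of the idea (the field names are simplified stand-ins, not the exact Cloud Logging entry schema):

```python
# Illustrative sketch of Logs Viewer-style filtering over structured log
# entries. Field names ("service", "resource", "severity") are simplified
# stand-ins for the real Cloud Logging schema.
SEVERITY_ORDER = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

def filter_logs(entries, service=None, resource=None, min_severity="DEBUG"):
    """Yield entries matching the service/resource filters at or above
    the given severity threshold."""
    threshold = SEVERITY_ORDER.index(min_severity)
    for entry in entries:
        if service is not None and entry.get("service") != service:
            continue
        if resource is not None and entry.get("resource") != resource:
            continue
        if SEVERITY_ORDER.index(entry.get("severity", "DEBUG")) < threshold:
            continue
        yield entry

entries = [
    {"service": "firewall", "resource": "fw-1", "severity": "WARNING"},
    {"service": "firewall", "resource": "fw-2", "severity": "INFO"},
    {"service": "compute",  "resource": "vm-1", "severity": "ERROR"},
]
matches = list(filter_logs(entries, service="firewall", min_severity="WARNING"))
print(matches)  # only the fw-1 WARNING entry
```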

Analyze the log data in real-time

Many scenarios will require complex querying of the logs data in real-time. Cloud Logging allows you to easily stream logs to Google BigQuery as they arrive, letting you search, aggregate and view your data using SQL-like queries. To learn how to configure BigQuery export, visit the Exports tab of the Logs Viewer, or see the detailed documentation.

Once you enable BigQuery export, you can stream logs to BigQuery in real-time, and view them there in seconds.
Log data in the BigQuery tables
Let’s explore a couple of examples of how this data and the analysis capability can be really useful to you.

  • Monitoring Code Performance: There are situations when a log entry indicates that something unexpected happened or that a problem is imminent, e.g. “disk space low.” With Compute Engine log data in BigQuery, you can generate a time series and monitor logs of a particular severity. It’s simple: you just query metadata.severity = “WARNING” in the relevant tables. E.g.
     SELECT COUNT(*) AS total, DATE(metadata.timestamp) AS time FROM (TABLE_DATE_RANGE(TABLE ID, TIMESTAMP('2015-03-01'), TIMESTAMP('2015-03-12'))) WHERE metadata.severity = "WARNING" GROUP BY time ORDER BY total;
  • Monitoring Request Latency: High latency leads to poor user experience and failed requests, which can lead to frustrated users and lost revenue. With App Engine log data in BigQuery, you can create time series of latency data by aggregating and charting the “protoPayload.latency” field. You can see unusual latencies in real-time and take steps to resolve the issue.
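The latency aggregation described above boils down to computing percentiles over the logged latency values. A small, self-contained Python sketch of that computation (the sample values are made up; in practice the inputs would come from the exported protoPayload.latency field):

```python
# Illustrative sketch: compute latency percentiles from request-log values,
# the same aggregation you would chart from exported App Engine logs.
# The sample latencies below are invented for demonstration.
def percentile(sorted_values, p):
    """Nearest-rank percentile of a pre-sorted list, for p in 0..100."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(0, round(p / 100 * len(sorted_values)) - 1)
    return sorted_values[rank]

latencies_ms = sorted([120, 85, 95, 430, 101, 97, 88, 1020, 110, 93])
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"p50={p50}ms p99={p99}ms")
```

Tail percentiles like p99 are the ones worth alerting on: a healthy median can hide a long tail of slow requests, which is exactly the pattern the real-time charting described above is meant to expose.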

Archive log data for longer periods

Cloud Logging retains logs in the Logs Viewer for 30 days, but in some scenarios you need to store log data for a longer period. With the click of a button, you can configure export to Google Cloud Storage. Export also gives you another channel for moving data into BigQuery, Cloud Dataflow or any Hadoop solution for further processing and analysis. This makes it easier to meet your business or compliance requirements. And with the recent launch of Google Cloud Storage Nearline, long-term log storage becomes even more affordable.

Getting Started

If you’re a current Google Cloud Platform user, Cloud Logging is available to you at no additional charge. Applicable charges for using Google Cloud Platform services (such as BigQuery and Cloud Storage) will still apply. For more information, visit the Cloud Logging documentation page and share your feedback.

- Posted by Deepak Tiwari, Product Manager

Today, we’re making it even easier to deploy Open Source Puppet on Google Compute Engine with Click to Deploy. Now you can quickly set up a Puppet master configured with node_gce and gce_compute modules to provision and manage resources on Compute Engine.

Whether you’re managing one virtual machine or thousands, Puppet can help make system configuration easier. Puppet is a declarative language for expressing system configuration, coupled with an agent/master framework for distributing and enforcing the configuration. In a typical web server deployment, for example, you can define how you’d like Apache configured, and then deploy it to multiple virtual machines easily. Puppet is used by more than 22,000 companies around the world and Puppet Forge has more than 3,100 modules for provisioning and managing a wide variety of system resources.
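For the Apache example above, a minimal manifest might look like the following sketch (package and service names vary by platform; this one assumes a Debian-based image where the Apache package is `apache2`):

```puppet
# Minimal sketch: declare the desired state of an Apache web server.
# The Puppet agent on each VM converges the machine to this state.
package { 'apache2':
  ensure => installed,
}

service { 'apache2':
  ensure  => running,
  enable  => true,
  require => Package['apache2'],  # install the package before managing the service
}
```

Because the manifest is declarative, applying it to one VM or a thousand is the same operation: each agent compares its machine's actual state to the declared state and changes only what differs.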

"We're excited to see Google recognize the benefits and pervasiveness of Puppet, making it the first IT automation tool available as a Click to Deploy solution. This solution lowers the time required to get a functional Puppet master up and running and is the first step toward fully automating the management of projects in Google Compute Engine," said Nigel Kersten, CIO of Puppet Labs.

Learn more about running Open Source Puppet on Google Compute Engine and deploy a Puppet master today. Please feel free to let us know what you think about this feature. You can also contact Puppet Labs for professional services, premium support, or training. Deploy away!

-Posted by Pratul Dublish, Technical Program Manager