Strategies for Testing Client-Server Interactions in Mobile Applications ∗
How to Move Fast and not Break Things

Niranjan Tulpule
Google Inc.
[email protected]

Abstract

Modern smartphone ecosystems have a unique set of constraints that make testing the contract between clients and servers hard. In this paper we describe the Google+ team’s approaches to solving this problem. We describe our testing philosophy, followed by a couple of frameworks and test design patterns that we have found to be really useful in a client-server testing context.

Categories and Subject Descriptors D.2.5 [Testing and Debugging]

Keywords mobile; protocol testing

Background

A vast majority of native mobile applications talk to a backend server for various reasons. Some applications, such as mobile games, only need to communicate with their servers for authentication, downloading new levels of play, and storing scores and leaderboards, while multiplayer games require a constant, active internet connection. Other applications, such as social networking applications and local search and discovery apps, need to be connected to the internet to function. The Google+ server and web engineering teams release new versions of services multiple times per week, but the native iOS and Android applications are released less frequently, usually twice monthly. This mismatch in the client and server deployment rhythm makes it much more likely that an engineer on the server team makes a change that breaks the client-server contract between the mobile applications and the server. We often encounter users running old versions of the applications, which requires server changes to be backward compatible for a few versions. Testing this client-server contract in an automated fashion, with the right set of safety nets to prevent defective code that breaks the contract from being committed to the source repository, is the only way to sustain the fast development and release cycles on the server.

Testing Philosophy

Testing software at Google’s scale is a unique challenge due to the sheer volume of code being committed to its source repository. We routinely see more than 20 changes per minute, which require more than 75 million automated test cases to be run each day [1]. The continuous integration system uses dependency analysis to determine all the tests a change transitively affects and then runs only those tests for every change [2]. In spite of these optimizations, a poorly designed test suite can result in wasteful use of resources and, more importantly, can cause productivity issues for engineering teams. The Google+ engineering team tries to adhere to the following guiding principles so that we can deliver high quality software at a rapid pace.

• We prefer to prevent code changes with defects from being committed to the source repository rather than finding these defects later. Most of our automated test infrastructure on the Google+ team is designed to run tests as a part of the process of submitting a change to the source repository.

• We prefer to keep our test sizes small and encourage all engineers to write unit tests before they tackle other tests. All our functional tests are optimized for a fast turnaround time, which sets in motion a virtuous cycle of developers authoring and sustaining more such tests and helps us achieve higher code coverage.

• We keep all of our test environments “hermetically sealed” [3], which means that our tests should not depend on any ancillary servers and resources that cannot be compiled and deployed on the same test runner.

We will now take a look at two different approaches used to test the client-server interactions for the Google+ mobile applications on the Android and iOS platforms.

∗ Experience report based on the work done by the Google+ Mobile Engineering Teams

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. MobileDeLi ’13, October 28, 2013, Indianapolis, Indiana, USA. Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM 978-1-4503-2603-2/13/10...$15.00. http://dx.doi.org/10.1145/2542128.2542134

Monolithic Test Setup for Client and Server

In our initial iteration we used a monolithic approach to develop our client-server test framework. The test setup process works as follows:

First We build and deploy our server stack on a remote test execution agent using a Hermetic Server configuration similar to the one described by Narla and Salas [3].

Second The test runner processes start a specially crafted version of the Android Emulator on the same remote test execution agent, configured to divert the network traffic from the Application Under Test (the Google+ Android application) to our local server and to enable some options for logging and debugging.

Third The test runner processes then install our Application Under Test on this emulator and run our suite of functional tests on the application.

This approach made it easy for engineers to author tests, especially for engineers who were new to mobile app development but had a background in developing web services such as Gmail, Google Maps, etc. After running these tests for a few months we realized that they were very costly in practice: they were quite flaky, and it was hard to debug them and isolate the cause of failures. They were being used to validate server behavior as well as client functionality. Through analysis of our test usage and defects, we realized that almost 99% of the changes being made by the developers were either purely server changes or purely client changes, and that protocol inconsistencies accounted for a vast majority of the defects. We were consuming too much time and resources testing server code when only the client code changed, and vice versa.

Testing the Protocol

We decided to try a new approach: rather than installing every supported version of the application and maintaining a separate test suite for each, we replaced the application with a lightweight stub and used it as a source of requests to the server, which the test suites could then use to verify the correct behavior of the protocol for the given version. Figure 1 shows a simplified view of this test setup. This approach has three major components:

Figure 1. Google+ Mobile Client-Server Test Architecture

Test Generator or “Record Mode” We instrumented the Google+ mobile applications on iOS and Android to log the contents of each request to our server and the corresponding response. The data contained in requests and responses is serialized using the JavaScript Object Notation (JSON) format. We added a special “Record Mode” option to the versions of these applications used for testing. When we run our application in “Record Mode”, all the requests and responses are stored within the application and can be easily extracted and used for testing. We call these stored requests and responses “Golden”.

Test Suites or “Replay Mode” We then launch our Servers under Test using the Hermetic Servers configuration. They are configured to use the same user accounts and data stores that were used to generate the “Golden” responses. This ensures that the test data is consistent. We then launch our test suite, which uses the JSON data stored in these “Golden Requests” to send requests to the Server under Test; when the server responds, the suite compares the data in those responses to the “Golden” responses using a JSON comparison tool.

Smart JSON diffing This tool does a smart comparison of the JSON files. There are a few reasons why the actual responses sometimes differ from the stored “Golden Responses”, and to account for these nuances we developed a smart JSON diff tool.

First The actual response has an extra field. This is usually harmless, as our wire formats are designed to be extended and mobile clients will not fail. If a server developer’s change gets flagged, they can easily update the “golden” JSON responses.

Second The value of a field is different in the actual and golden responses. Some fields hold dynamic data, such as timestamps, and differences in them are usually harmless for the tests. When we store the golden responses using “Record Mode”, we annotate them so that the values of those dynamic fields are ignored. On the other hand, if the field contains static data and the value has changed, this change might break mobile clients if they are using this field. This will cause the test suite to fail.

Third The actual response does not have a field that is expected by the “golden” response. This usually indicates that the mobile clients will break, and hence the test suite will fail.

Conclusions

This approach has improved the quality and coverage of our tests dramatically. The running time for tests was cut down from over 30 minutes for every supported version of the application to less than 5 minutes. Over a six month period, the replay tests exhibited less than 1% flakiness over 1000 daily invocations of the tests. These tests have prevented many server defects which would have affected the Google+ mobile applications.

Acknowledgments

I would like to thank the engineers who have designed, developed and evangelized the testing solutions described above: Eduardo Bravo, Matthew DeVore, Grygorii Luchytskyi and Sreevidya Tangellamudi. We owe a huge amount of gratitude to all the talented engineers across Google who have contributed towards building an amazing developer productivity ecosystem.

References

[1] John Micco. Continuous Integration at Google Scale. EclipseCon 2013, Boston, MA. URL eclipsecon.org/2013/sites/eclipsecon.org.2013/files/Continuous%20Integration%20at%20Google%20Scale.pdf

[2] Pooja Gupta, Mark Ivey and Jon Penix. Testing at the speed and scale of Google. Google Engineering Tools Blog. URL google-engtools.blogspot.com/2011/06/testing-at-speed-and-scale-of-google.html

[3] Chaitali Narla and Diego Salas. Hermetic Servers. Google Testing Blog. URL googletesting.blogspot.com/2012/10/hermetic-servers.html
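The “Replay Mode” comparison against “Golden” responses, together with the three smart JSON diff rules (extra field, changed value, missing field), can be illustrated with a minimal Python sketch. This is not the Google+ team's tool: the names `smart_diff`, `replay`, and `send`, the `IGNORE_VALUE` marker used to annotate dynamic fields, and the layout of a golden entry (`name`, `request`, `response`) are all assumptions made for illustration. The sketch also assumes that extra fields are silently tolerated and that lists must match element by element.

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical annotation: a golden value equal to this marker means
# "the field must be present, but its value is dynamic and not compared".
IGNORE_VALUE = "<<IGNORE>>"


def smart_diff(golden: Any, actual: Any, path: str = "$") -> List[str]:
    """Return the differences likely to break a client built against `golden`."""
    if golden == IGNORE_VALUE:
        # Rule 2, dynamic case (e.g. timestamps): presence alone is enough.
        return []
    if isinstance(golden, dict) and isinstance(actual, dict):
        problems: List[str] = []
        for key, expected in golden.items():
            if key not in actual:
                # Rule 3: a field the golden response expects is missing.
                problems.append(f"{path}.{key}: missing from actual response")
            else:
                problems.extend(smart_diff(expected, actual[key], f"{path}.{key}"))
        # Rule 1: keys that exist only in `actual` are extra fields; tolerated.
        return problems
    if isinstance(golden, list) and isinstance(actual, list):
        if len(golden) != len(actual):
            return [f"{path}: expected {len(golden)} items, got {len(actual)}"]
        problems = []
        for index, (g, a) in enumerate(zip(golden, actual)):
            problems.extend(smart_diff(g, a, f"{path}[{index}]"))
        return problems
    if golden != actual:
        # Rule 2, static case: a changed value may break clients using it.
        return [f"{path}: expected {golden!r}, got {actual!r}"]
    return []


def replay(goldens: Iterable[Dict[str, Any]],
           send: Callable[[Dict[str, Any]], Dict[str, Any]]) -> Dict[str, List[str]]:
    """Send every golden request through `send` and diff each reply.

    `send` stands in for whatever transport reaches the Server under Test.
    The result maps the name of each failing golden entry to its problems.
    """
    failures: Dict[str, List[str]] = {}
    for entry in goldens:
        problems = smart_diff(entry["response"], send(entry["request"]))
        if problems:
            failures[entry["name"]] = problems
    return failures
```

In a setup like the one described, `send` would issue the recorded request to the hermetic Server under Test and return the parsed JSON reply; a test would then assert that `replay(...)` returns an empty dictionary. Passing the transport in as a function keeps the comparison logic testable without any server at all.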
