Yesterday I encountered a really good example of how looking up reference data online from a survey can dramatically improve performance. The following image shows the CPU usage on an 8-processor data collection server at the point when a single survey was modified to use online lookups instead of downloading all the reference data to the phones.

The heavy load prior to the switch was negatively impacting all users.
The cause of the problem
A large number of field workers were updating a survey with a hundred thousand records that was the source of reference data for other surveys. These people were working online with data SIMs in their phones, so as soon as a field worker completed a survey the results were sent to the server, updating the reference data. That automatically triggered the download of all 100,000 records, now including the updated reference data, to all the other field workers. Each field worker had the latest reference data each time they completed a survey, but the load was high and response times were getting slow.
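To get a feel for why the load grew so quickly, here is a rough back-of-envelope model in Python. Only the 100,000 record figure comes from the incident above. The number of workers and the number of submissions per worker are made-up values, and the model assumes the worst case in which every submission causes a separate full download to every other phone and each online lookup returns a single record.

```python
# Rough model of records transferred per day under the two approaches.
# Everything except REFERENCE_RECORDS is a hypothetical value for illustration.

REFERENCE_RECORDS = 100_000  # size of the reference survey in the incident


def full_download_records(workers: int, submissions_per_worker: int) -> int:
    """Worst case: every submission triggers a full download to all other workers."""
    total_submissions = workers * submissions_per_worker
    return total_submissions * (workers - 1) * REFERENCE_RECORDS


def online_lookup_records(workers: int, submissions_per_worker: int,
                          lookups_per_survey: int = 1) -> int:
    """Each completed survey fetches only the records it actually needs."""
    return workers * submissions_per_worker * lookups_per_survey


if __name__ == "__main__":
    workers, per_worker = 50, 20  # assumed values, not taken from the incident
    print(full_download_records(workers, per_worker))
    print(online_lookup_records(workers, per_worker))
```

With these assumed numbers the full download approach moves about 4.9 billion records a day, against 1,000 for online lookups. The real figures will differ, but the shape of the problem is the same: the download cost grows with the square of the number of workers.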
Choosing a solution for using reference data
The decision about how to access reference data depends on the type of work your field workers are doing and whether or not they have a network connection while they are doing it. Here are three scenarios to help you choose:
Scenario 1 – Working offline without a network connection
In this scenario your workforce might return every evening to a central location that has Wi-Fi, where they refresh their phones, uploading any data they have collected and downloading changes to surveys and reference data. The next day they head back out into the field and work offline.
The surveys can use the following functions to get reference data from other surveys and from CSV files:
- search. To get choices for a select question.
- pulldata. To look up reference values.
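As a rough illustration of what an offline lookup involves, the Python sketch below loads a whole reference file and then pulls one value out of it by key. This is not how search or pulldata are implemented, and the file contents, column names and function names are all hypothetical. It only shows the general idea: the full file has to be on the phone and loaded before any single value can be found.

```python
import csv
import io

# Hypothetical reference data; on a phone this would be a downloaded CSV file.
SAMPLE_CSV = """household_id,village,head_name
H001,Riverside,A. Tembo
H002,Hilltop,B. Phiri
"""


def load_reference(csv_text: str, key_column: str) -> dict[str, dict[str, str]]:
    """Read the whole file once and index the rows by key_column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: row for row in reader}


def pull_value(index: dict[str, dict[str, str]], key: str,
               return_column: str, default: str = "") -> str:
    """Return one column of the matching row, or default if there is no match."""
    row = index.get(key)
    return row[return_column] if row is not None else default


index = load_reference(SAMPLE_CSV, "household_id")
print(pull_value(index, "H002", "village"))  # Hilltop
```

With two rows the load step is instant. With a hundred thousand rows it is the part you wait for when opening the survey.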
So what happens if the reference data gets really large and is updated continuously by the field workers? Well, this is not actually a problem. The server will not be overwhelmed, since the workers are only synchronizing their phones once per day. They should actually press the refresh button twice, perhaps once in the evening and once in the morning. The first refresh uploads all the new data, and the second downloads the updated reference information, which includes all of the previous day's work from all of the field workers.
Scenario 2 – Working online with a permanent network connection
Now the field workers are submitting data as they complete surveys and getting updates to reference data whenever it changes. Because they are online, you can use the following functions to get reference data:
- lookup_choices. To get choices for a select question.
- lookup. To look up reference data.
Now when completing a survey, if reference data is required, there will be a small delay while the network call is made, and then the required reference data will be made available. It is no longer necessary to download large amounts of reference data, most of which will never be used, and store it on the phone. If the reference data is large you should find this a faster approach on the phone, as you no longer need to wait for the reference data to be loaded into the survey when you open it.
If your reference data files are small (only a few thousand records) or are not updated often, then you can use either the online or the offline functions in this scenario.
Scenario 3 – A hybrid approach
In this scenario perhaps the workers are online most of the time and want immediate updates to reference data, but occasionally they wander outside network coverage and still need to look up those references.
At the moment you would need to use the offline approach and look out for potential performance problems from large, frequently updated sets of reference data. However, we do have a change request pending to synchronise only the updates to the reference data, rather than downloading all the data each time a single record changes. This change request will allow the hybrid scenario to scale without causing performance problems. Let us know if you need it and I will increase its priority.
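To show what synchronising only the updates could look like, here is a minimal Python sketch of one common way to do it, using a version number per change. This is an illustration of the general technique and not the design of the pending change request. The class and method names are hypothetical, and deletions are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceServer:
    """Toy server that records a version number for every change."""
    version: int = 0
    records: dict[str, str] = field(default_factory=dict)
    changed_at: dict[str, int] = field(default_factory=dict)

    def update(self, key: str, value: str) -> None:
        """Store a record and remember the version at which it changed."""
        self.version += 1
        self.records[key] = value
        self.changed_at[key] = self.version

    def changes_since(self, version: int) -> dict[str, str]:
        """Return only the records changed after the given version."""
        return {k: self.records[k]
                for k, v in self.changed_at.items() if v > version}


@dataclass
class Phone:
    """Toy client that keeps a local copy and remembers its last sync."""
    synced_version: int = 0
    records: dict[str, str] = field(default_factory=dict)

    def sync(self, server: ReferenceServer) -> int:
        """Apply the changes since the last sync and return how many there were."""
        delta = server.changes_since(self.synced_version)
        self.records.update(delta)
        self.synced_version = server.version
        return len(delta)


server = ReferenceServer()
for i in range(1000):
    server.update(f"H{i:04d}", "initial")

phone = Phone()
print(phone.sync(server))   # first sync transfers all 1000 records
server.update("H0007", "visited")
print(phone.sync(server))   # next sync transfers only the 1 changed record
```

Because the phone keeps a full local copy, lookups still work when it wanders out of coverage, while each sync costs only as much as the number of records that changed.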







