
In this section, we present three data engineering use cases from other industries: transportation, retail, and agriculture. The use cases illustrate the importance and usage of data engineering. For each example, we review the data collected, the consumers of the data, and the value to the organisation. Similarities and potential applications to the pharmaceutical industry are also discussed.


When it comes to moving people and making deliveries, few companies are more widespread and more widely-recognised than Uber. Uber is working to make transportation safer and more accessible, helping people order food quickly and affordably, reducing congestion in cities by getting more people into fewer cars, and creating opportunities for people to work on their own terms. [1]

But how do they do it? As it turns out, a great deal of data is collected, produced and visualised behind the scenes, all working to create a more efficient company and to influence transportation as a whole. [2]

Data is Uber’s biggest asset, and the company collects huge amounts of it to run its business. Its platforms ingest billions of GPS locations and process millions of events. Uber stores data about every trip to predict supply and demand, and it also collects data on drivers, such as their vehicle, location and speed.

All of this data is then analysed and visualised to predict wait times, passenger demand, optimal driver locations, and more.
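As a rough illustration of how trip records might feed a demand estimate, the sketch below counts trip requests per pickup zone and hour of day, then averages over the days observed. It is not Uber's method: the zone names, timestamps and the `average_hourly_demand` helper are all hypothetical and exist only to show the idea of aggregating events into a demand signal.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical trip-request events: (pickup zone, ISO timestamp).
# Zone names and timestamps are made up for illustration.
TRIP_REQUESTS = [
    ("downtown", "2018-06-04T08:05:00"),
    ("downtown", "2018-06-04T08:40:00"),
    ("airport", "2018-06-04T08:15:00"),
    ("downtown", "2018-06-05T08:20:00"),
    ("airport", "2018-06-05T17:30:00"),
    ("downtown", "2018-06-05T17:45:00"),
]


def average_hourly_demand(requests):
    """Return {(zone, hour): mean number of requests per observed day}."""
    counts = defaultdict(int)  # (zone, hour) -> total requests seen
    days = set()               # distinct calendar days in the data
    for zone, stamp in requests:
        moment = datetime.fromisoformat(stamp)
        counts[(zone, moment.hour)] += 1
        days.add(moment.date())
    n_days = len(days)
    return {key: total / n_days for key, total in counts.items()}


demand = average_hourly_demand(TRIP_REQUESTS)
# The zone and hour with the highest average demand in this toy data set.
busiest = max(demand, key=demand.get)
```

A table of this kind, keyed by zone and hour, is the sort of summary that could then be visualised or used to suggest where drivers are likely to be needed.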

They use data visualisation to understand safety, efficiency, and traffic; visual analytics makes their data actionable. [3]

Uber feels that if they don’t use technology to analyse and interpret data, it is a missed opportunity to better understand their business. They don’t sit on their databases. They look for connections in every possible ounce of their data. [2]

This use case may make you wonder whether the pharmaceutical industry uses every ounce of data collected in a clinical trial or electronic health record. Is the pharma industry “sitting” on databases and missing opportunities? Can more data be collected and analysed in real time? If so, the pharmaceutical industry must identify opportunities to leverage both clinical and real-world data, together with visual analytics, to improve the efficiency and effectiveness of drug development.


When Amazon first launched, it had a clear and ambitious mission to offer Earth’s biggest selection and to be Earth’s most customer-centric company. [4]

Their goal is still the same 20 years later: they continue to innovate to make things easier, faster, better, and more cost-effective, with a core focus on being a customer-centric business.

According to founder and CEO Jeff Bezos, technology is central to supporting this focus on the customer. In the company’s 2010 Annual Report (Amazon, 2011) he said:

“Look inside a current textbook on software architecture, and you’ll find few patterns that we don’t apply at Amazon. We use high-performance transactions systems, complex rendering and object caching, workflow and queuing systems, business intelligence and data analytics, machine learning and pattern recognition, neural networks and probabilistic decision making, and a wide variety of other techniques. And while many of our systems are based on the latest in computer science research, this often hasn’t been sufficient: our architects and engineers have had to advance research in directions that no academic had yet taken. Many of the problems we face have no textbook solutions, and so we —happily —invent new approaches… All the effort we put into technology might not matter that much if we kept technology off to the side in some sort of R&D department, but we don’t take that approach. Technology infuses all of our teams, all of our processes, our decision-making, and our approach to innovation in each of our businesses. It is deeply integrated into everything we do”. [4]

Like Uber, Amazon collects large volumes of data, including inventory levels, costs, sales, customer demographics, buying patterns, and competitor data. The data is both structured and unstructured and comes from many diverse sources. It is used in a variety of ways, but most importantly to understand customer behaviour in order to drive customer growth, expansion, retention and personalisation. This is aligned with Amazon’s customer-centric business model.

In addition to customer-focused analytics, Amazon uses data to optimise its supply chain, streamline product fulfilment, anticipate shipping needs and recommend products. Because Amazon recognises the importance of technology, and of artificial intelligence and machine learning in particular, it makes data widely available within the organisation to encourage data-driven management. Amazon does not silo or restrict access to its databases because it believes siloed data slows decision-making. The databases are consumed across the organisation by both data scientists and machines. [4]

In this example, Amazon uses data engineering and technology to unlock insights from its customer data and turn them into a competitive advantage. It makes its data widely available within the organisation. Is this mindset and approach applicable in the pharmaceutical industry? What are the compliance concerns and risks of providing broad access to experimental clinical data? And how can the pharmaceutical industry use real-world evidence and historical clinical trial data, much as Amazon uses its customer database, as a competitive advantage?


The field of agriculture (no pun intended!) has also moved into using formal analytical tools to aid producers (farmers) and oversight (government). A main area of focus is crop prediction. To that end, a variety of data is collected, for example: weather, soil, past crop results. Precise and accurate crop prediction can create an environment where yields are known and can be priced accordingly.
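To make the idea of crop prediction concrete, the following minimal sketch fits a straight line relating seasonal rainfall to yield by ordinary least squares and uses it to predict yield for a new season. The observations are made up for illustration, and a single-predictor linear model is an assumption chosen for simplicity; real crop models combine many variables (weather, soil, past results) and are considerably more complex.

```python
# Made-up seasonal observations: (rainfall in mm, yield in tonnes per hectare).
OBSERVATIONS = [(300, 2.1), (400, 2.4), (500, 3.1), (600, 3.4)]


def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of cross-deviations and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(OBSERVATIONS)

# Predicted yield for a hypothetical season with 450 mm of rainfall.
predicted_yield = intercept + slope * 450
```

With a fitted relationship of this kind, a producer or agency could estimate expected yield ahead of harvest and price or plan accordingly.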

This analytical area is different from pharma in two major ways:

1. Predictor variables are highly environment-dependent (e.g., rain, air temperature) instead of highly controlled (e.g., fixed-dose regimens)

2. Regulatory requirements are not as stringent

Still, there is some potential to borrow tools and ideas from agricultural research for use in pharmaceutical research.

Here are four examples of software tools used in agriculture analytics:


HARVIST

The HARVIST (Heterogeneous Agricultural Research Via Interactive, Scalable Technology) project integrates multiple Earth Science data sources into a single graphical user interface that allows for the investigation of connections between different variables. In particular, the focus is on relationships between weather and crop yield. [5]

AgrometShell (AMS)

This software is designed to help government agencies monitor the growing season. It is licence-free and bridges the gap between agrometeorological, remote-sensing and socio-economic datasets. [6]

Crop Yield Predictor

This software was designed as an interactive decision tool to predict crop yields and economic returns for deficit-irrigated crops. Users can designate potential irrigation schedules to optimise yields and net returns. These schedules can be tested with a range of annual precipitation to find yield and income risks from wet, average, and dry years. Alternative irrigation schedules could include pre-season irrigation, irrigation amounts and frequency, earlier or later commencement of irrigation, and earlier or later cessation of irrigation. [7]

Descartes Labs

This platform provides immediate, easy access to global data, both historical and near real-time. Its Python APIs return data about the Earth within seconds.

Users can write Python functions, and run them on thousands of images in parallel. They can run NDVI, BAI, NDSII, SAVI, or any other calculations over each pixel within their area of interest. As an example, one can fuse satellite data with weather data and Descartes Labs-generated data sets into a single global-scale analysis using the Descartes Labs platform. [8]
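As an example of the kind of per-pixel calculation mentioned above, the sketch below computes NDVI (Normalised Difference Vegetation Index), defined as (NIR - Red) / (NIR + Red), over a small grid of reflectance values. It deliberately does not use the Descartes Labs API: the function names and the tiny 2 x 2 "image" are hypothetical, and returning 0.0 for pixels with no signal is a convention assumed here.

```python
def ndvi(nir, red):
    """NDVI for one pixel: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        # Assumed convention: report 0.0 where both bands are empty.
        return 0.0
    return (nir - red) / total


def ndvi_image(nir_band, red_band):
    """Apply NDVI pixel by pixel to two equally sized 2-D bands."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Made-up near-infrared and red reflectance values for a 2 x 2 area.
NIR_BAND = [[0.50, 0.40], [0.30, 0.0]]
RED_BAND = [[0.10, 0.40], [0.10, 0.0]]

ndvi_values = ndvi_image(NIR_BAND, RED_BAND)
```

Values near 1 suggest dense green vegetation, while values near 0 or below suggest bare soil, water or cloud. On a real platform the same function would be applied to full satellite scenes, typically with an array library and in parallel.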

In all of the industry use cases above, data engineering and technology are used to unlock insights. Might such concepts also apply in pharmaceutical research?


[1] “About Uber - Our Story - Vision for Our Future | Uber.” Driver Requirements | How To Drive With Uber | Uber, 2018.

[2] Jacob, Sherice. “How Uber Uses Data to Improve Their Service And Create The New Wave of Mobility.” NEILPATEL.

[3] “Engineering Intelligence Through Data Visualization at Uber.” Uber Engineering Blog, 2016.

[4] Chaffey, Dave. “ Case Study - 2018 Update.” Smart Insights, Smart Insights, 14 Aug. 2018.

[5] “About HARVIST.” NASA, Jet Propulsion Laboratory, California Institute of Technology.

[6] Hoefsloot, Peter. “AgrometShell.” Hoefsloot Spatial Solutions, Peter Hoefsloot.

[7] “Crop Yield Predictor.” Crop Water Allocator | K-State Mobile Irrigation Lab, K-State Research & Extension Mobile Irrigation Lab.

[8] “Descartes Labs: Home.” Descartes Labs, Descartes Labs.