This section examines the change management needs that arise from culture, benefits and challenges.


Whenever a new methodology is introduced to an existing team, organisation or sector, it is important to understand the prevailing culture of that incumbent organism. As the phrase often attributed to Peter Drucker states, “Culture eats Strategy for breakfast…” [1] If uptake depends on human decision-making, then it does not matter how innovative the technology or how efficient and sensible the new methodology may be: it will not succeed if the people responsible for integrating, deploying and maintaining this new way of working do not, for whatever reason, want it to happen.

For the pharmaceutical industry, we must look at where we are, where we have come from and where we are heading in order to gain a holistic understanding of our industry culture. Only then can we make the case, for or against, for adopting a data engineering approach in the clinical data management space.

Three main drivers seem to recur in data engineering discussions within the pharmaceutical industry:

Innovative Science

Traditionally the pharmaceutical sector would initiate drug discovery through two main avenues. The first, and principal, route is internal development: discovering research projects that can be carried forward into human experimentation models within an acceptable budget. Although it is highly rewarding for an organisation to see a compound go from discovery to marketing approval, this approach is rarely enough to sustain a robust product development pipeline.

A second route comes from product in-licensing, co-development partnerships, and/or mergers and acquisitions that bring new products into the organisation’s development pipeline. Ideally this approach leads to an eventual submission to the health authorities for marketing approval, although this is not always the case. It often requires significant up-front investment, but may also yield a high return.

As early as 2011, even before the Big Data boom, MIT’s Sloan School of Management provided evidence that data-driven organisations performed 5-6% more effectively than their non-analytic cousins. [2]

As such, there is now an increasing move towards a third way: pharmaceutical organisations are paying more attention to Data Engineering and Data Science so that they can leverage their vast amounts of historical clinical study data and augment it with new external data sources, such as wearables, genomics and electronic health records. This approach offers new opportunities to conduct research, leveraging data to its fullest potential. Combining it with the two routes described above gives the organisation multiple avenues for innovation.

Patient Protection

Patient safety is, rightly, a major concern that needs to be addressed when introducing the topic of Data Engineering within the pharmaceutical industry. Risk needs to be managed effectively to ensure the safety of patients, hence the structure incorporated in all sponsor companies for identifying, managing and reporting serious adverse events. Regulations have developed over time and are continually monitored and reviewed to ensure that data in this important and sensitive area are handled appropriately (for instance, 21 CFR Part 11).

However, we must not be so averse to risk that we miss the potential improvement opportunities offered by data augmentation, automation, enrichment and standardisation (key components of Data Engineering). Hiring programmers to re-program tasks that are already in standard use within the organisation could be deemed an inefficient use of a resource that would be better invested in discovering further treatments. Data Engineering, with the appropriate level of checks and balances, could offer a “safer” data environment by adopting “write-once, use-often” programs and building in automation where it adds value. Such a change may increase overall productivity while also mitigating the risk of human error.

Value Generation

As in other industries, value generation is an important factor in ensuring cost-efficient drug development in the pharmaceutical industry. The cost of investing in research, change management and new technology, often with no guarantee of return, has long been a closely monitored metric. Investing in Data Engineering techniques that enable organisations to reduce costs and shorten timelines, while also utilising Data Science across legacy datastores, delivers a double impact of value generation for executive leadership teams.

Therefore, presenting Data Engineering as a way to preserve or advance innovative science, patient protection and value generation within the organisation may motivate the change and enable quicker adoption.

With these cultural precepts in mind we can begin to look into the benefits and challenges of applying emerging techniques in Data Engineering within the pharmaceutical industry.


Because much study data is similar from one study to the next, automating its collection, validation, reporting and analysis could vastly reduce timescales for set-up, database lock and time to submission. If eCRF screens are created using standard variables for patient numbers, investigator details, substance lists etc., this can free up the data architect/engineer to determine how best to deal with non-conformed data (such as exploratory biomarker and real-world evidence data).

Data validation, which can be a main source of contention in a study timeline, can, if automated, reduce human error whilst automatically auditing how errors are dealt with. Regulations are clear that modifications to data items need to be handled appropriately. A locked-down standardisation routine can manage the majority of “common” data errors (e.g. a missing or incorrect zip code for an address), with a formal query management method auditing both the automatic handling and the responses from investigators.
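To make the idea concrete, the following is a minimal Python sketch of such a routine, not a definitive implementation. The record layout, the `validate_zip` function, the `zip_code` and `record_id` field names and the US ZIP format are all assumptions made for illustration. The routine applies only a safe, standard correction (trimming whitespace) automatically and records it in an audit trail; anything it cannot resolve becomes a query for the investigator.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Assumed format for illustration only: US ZIP or ZIP+4.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


@dataclass
class ValidationResult:
    record: dict                                      # cleaned copy of the input
    audit_trail: list = field(default_factory=list)   # automatic changes made
    queries: list = field(default_factory=list)       # issues for the investigator


def validate_zip(record: dict) -> ValidationResult:
    """Trim the ZIP code automatically; raise a query for anything else."""
    result = ValidationResult(record=dict(record))    # never mutate the input
    record_id = record.get("record_id")
    raw = record.get("zip_code")
    cleaned = raw.strip() if isinstance(raw, str) else raw

    # Automatic correction: always logged with old value, new value and reason.
    if cleaned != raw:
        result.record["zip_code"] = cleaned
        result.audit_trail.append({
            "record_id": record_id,
            "field": "zip_code",
            "old_value": raw,
            "new_value": cleaned,
            "reason": "whitespace trimmed by standard routine",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    # Anything the routine cannot safely fix is queried, never guessed.
    if not isinstance(cleaned, str) or cleaned == "":
        result.queries.append(
            {"record_id": record_id, "field": "zip_code", "issue": "missing"})
    elif ZIP_PATTERN.fullmatch(cleaned) is None:
        result.queries.append(
            {"record_id": record_id, "field": "zip_code", "issue": "invalid format"})
    return result
```

Calling `validate_zip({"record_id": "R1", "zip_code": " 12345 "})` returns a cleaned record with one audit entry and no queries, whereas an empty or malformed value leaves the data untouched and produces a query. Keeping automatic changes and queries in separate lists is a design choice that makes each path easy to review.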

If centralised data is also adopted, this lays a good foundation for Data Science to come to the fore. Being able to analyse across studies to answer “never before thought of” questions is a huge benefit to companies. It can also help address problems such as patient scarcity, by identifying patients from previous studies who may benefit from new trials. [3]

Risk-based monitoring and adaptive trials already look to centralised data as their source. However, because of the siloed study culture in most organisations, these departments often rely on pooling data after the event rather than on a fully automated, near-real-time, centrally populated conformed database.

These are just a few examples of how data engineering techniques can improve quality and efficiency when processing the information captured in clinical trials. As we further explore Data Engineering and learn of the many techniques and tools available, many more applications will arise, bringing forth greater value and benefit to the industry. 


The disadvantages of adopting data engineering generally fall into the following categories:

  • Initial Investment

  • Ongoing Maintenance

Creating a data engineering culture within an organisation requires time and money, yet this is often at odds with the financial model, as the investment is not directly associated with a compound or clinical study.

Additionally, a certain amount of centralisation is required to enable digital connectivity between systems and tools, which brings additional challenges relating to security, data protection and maintenance of the study blind. Strategic investment is therefore required, together with a robust security model and executive buy-in from senior leaders who can envision the longer-term benefits of data automation.

The upfront organisational design required covers data capture, data validation, data storage and reporting/submission. Data engineering works best within a set of standards, of which the pharmaceutical industry has no shortage. For example, CDASH for data collection and SDTM/ADaM for data submission can be utilised effectively within an engineered data flow. However, each study needs architects involved early in study set-up to ensure that standards, where applicable, are adhered to. When current standards do not apply, the architect determines the appropriate method to cater for these unique data elements and, if appropriate, bakes them into future standards.
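As a rough illustration of how standards and exceptions might coexist in an engineered data flow, the Python sketch below separates fields covered by a standard mapping from those that need an architect's decision. The `STANDARD_MAP` dictionary, the `apply_standards` function and every field name are hypothetical; they are not taken from CDASH or SDTM.

```python
# Hypothetical library of collected-field -> conformed-variable names.
# Illustrative only: these names do not come from CDASH or SDTM.
STANDARD_MAP = {
    "subject_number": "SUBJECT_ID",
    "investigator_name": "INVESTIGATOR",
    "substance": "TREATMENT",
}


def apply_standards(collected: dict, standard_map: dict = STANDARD_MAP):
    """Split a collected record into conformed and non-standard fields."""
    conformed = {}      # fields covered by the standard, renamed
    needs_review = {}   # unique data elements for the architect to assess
    for name, value in collected.items():
        target = standard_map.get(name)
        if target is None:
            needs_review[name] = value
        else:
            conformed[target] = value
    return conformed, needs_review


conformed, needs_review = apply_standards(
    {"subject_number": "001", "substance": "Drug A", "biomarker_x": 1.2}
)
```

Here `biomarker_x` lands in `needs_review` rather than being silently dropped or forced into the standard. Once the architect has decided how to handle such an element, adding it to the mapping is one way of baking it into future standards.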

From an ongoing perspective, organisations need skills different from those of the traditional clinical Data Manager. The Data Engineer who creates and maintains these automated data pipelines must be able to think big picture while keeping a precise focus on detail, so that individual studies do not lose their innovative science whilst conforming to centralisation. An automated test harness needs to be employed to ensure that ongoing maintenance does not break what is already being captured. Lastly, a pragmatic mindset that guards against overengineering is essential for an optimised data engineering model. [4]
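One simple form such a test harness could take is a set of "golden" input/output pairs that is re-run after every change to a pipeline step. The Python sketch below assumes a hypothetical step, `conform_record`, with invented field names; a real pipeline would substitute its own steps and cases.

```python
def conform_record(raw: dict) -> dict:
    """Hypothetical pipeline step: conform one raw eCRF record."""
    return {
        "subject_id": str(raw["subj"]).strip().upper(),
        "visit_date": raw["visit_dt"],
        "weight_kg": float(raw["weight"]),
    }


# Golden cases: known inputs paired with the output the step must keep producing.
GOLDEN_CASES = [
    ({"subj": " ab-001 ", "visit_dt": "2018-03-01", "weight": "70.5"},
     {"subject_id": "AB-001", "visit_date": "2018-03-01", "weight_kg": 70.5}),
    ({"subj": "AB-002", "visit_dt": "2018-03-02", "weight": 82},
     {"subject_id": "AB-002", "visit_date": "2018-03-02", "weight_kg": 82.0}),
]


def run_regression(step, cases) -> list:
    """Run every golden case through `step`; return a list of failure messages."""
    failures = []
    for index, (raw, expected) in enumerate(cases):
        try:
            actual = step(dict(raw))  # copy so a step cannot alter the case
        except Exception as exc:      # a crash is also a regression
            failures.append(f"case {index}: raised {type(exc).__name__}: {exc}")
            continue
        if actual != expected:
            failures.append(f"case {index}: expected {expected}, got {actual}")
    return failures
```

An empty list from `run_regression(conform_record, GOLDEN_CASES)` indicates that maintenance has not changed what is already being captured. Any non-empty result can be used to block the change until it is reviewed.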


[1] Campbell, David J., et al. Business Strategy: an Introduction. Palgrave Macmillan, 2011.

[2] Brynjolfsson, Erik, et al. “Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance?” SSRN, 22 Apr. 2011, https://ssrn.com/abstract=1819486.

[3] Swanger, David. “Abundance of Data + Scarcity of Patients = Clinical Complexity.” Geeks Talk Clinical, Medidata, 2018, https://blog.mdsol.com/abundance-data-scarcity-patients-clinical-complexity.

[4] Savva, Nicos, and Gabriel Straub. “Making Big Data Deliver.” London Business School, 2018, www.london.edu/faculty-and-research/lbsr/making-big-data-deliver.