Machine Learning in the Oil and Gas Industry Can Revolutionize How Companies Do Business



Researchers at oil and gas companies can use machine learning to develop predictive algorithms that can be used to find solutions to the major challenges facing the industry. Image Credit: Flickr user x6e38

Oil and gas companies are always looking to optimize extraction methods, but it can be challenging for computational scientists to piece together the implications of the millions of data points generated by monitoring systems at every stage of the extraction process. Identifying connections between seemingly unrelated processes is critical for making accurate predictions about the best extraction methods, and machine learning is a very effective way of doing this. In particular, researchers at oil and gas companies are increasingly turning to supervised learning to discern these relationships.[1]

In supervised learning, a computer is provided with a training set made up of existing inputs and their corresponding responses. Based on this information, the computer identifies patterns in the data in order to create a generalized algorithm that can predict the response values for new input data with which it is presented.[2] As a result, connections between parameters, from the most obvious causal factors to the weakest of associations, can be incorporated into the final predictions, provided they are represented in the training data.
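As a minimal sketch of this train-then-predict pattern, the Python below fits a straight line to a handful of hypothetical input/response pairs using ordinary least squares, one of the simplest supervised learning methods, and then predicts the response for an input it has not seen. The data and the function names `fit_linear` and `predict` are illustrative assumptions; real extraction datasets would involve many parameters and usually a more capable method.

```python
from typing import Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Learn (intercept, slope) minimizing squared error over labelled training pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired training examples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("training inputs must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model: Tuple[float, float], x: float) -> float:
    """Apply the learned relationship to a new input."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical training set: one input parameter and its measured response.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_linear(train_x, train_y)
print(predict(model, 5.0))  # 11.0
```

The "training" here is the least-squares fit; everything the model knows about the relationship comes from the example pairs it was given.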

In order to take full advantage of the power of machine learning, researchers at oil and gas companies need to be able to work with many large datasets from multiple research departments, and it can be tough to keep it all in order using traditional paper notebooks. With electronic lab notebooks, scientists can organize and manage training datasets for supervised learning much more easily. Once all of the necessary training data has been collected, scientists can then use a workflow authoring application to integrate the data and initiate the machine learning process.

Designing the Training Set

The first step in the supervised learning process is the organization of the experimental data and results that are to be included in the training set. In general, larger training sets tend to produce more accurate algorithms, so it is important for researchers to include as much relevant experimental data as possible. Electronic lab notebooks support this goal in a few ways:

  • Easing Information Access


When preparing the training set, researchers may not have all of the relevant experimental data on hand. Sometimes, they must include data that was collected months or even years before. Electronic lab notebooks make it easy to access results regardless of when they were collected so that scientists don’t have to go digging through old lab notebooks or try to decipher the notes of other researchers who have since left the lab.

  • Facilitating Collaboration Between Researchers and Departments


In order to design a training set with a sufficient amount of information, it is often necessary to incorporate data from multiple labs and departments. For instance, a training set designed to predict materials that can be used to increase gas pipeline integrity may need to include data from chemists studying the properties of the gas being transported, field scientists identifying the environmental factors that could damage the pipeline, and engineers examining the constructability of the project. Electronic lab notebooks make it easy for all of these researchers, regardless of their physical locations, to share data in real time. That way, the original training set can include all of their results, and it can quickly be updated as they make new discoveries.

  • Preventing Data Loss


Losing data that could have been incorporated into a training set can be catastrophic for the final algorithm. Not only does a greater quantity of data make it easier for the computer to identify patterns, but if a particularly critical piece of information is left out of the training set, the final algorithm may be unable to make accurate predictions no matter how much other data it receives, because it never sees the variable that actually drives the outcome. By preventing data loss, electronic lab notebooks help ensure that scientists are preparing complete training sets.
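The effect of leaving out a critical variable can be sketched with hypothetical data in which the response depends only on a parameter `x2`, while a second parameter `x1` is unrelated to it. Fitting the same least-squares model on each parameter alone and measuring the in-sample mean absolute error shows that no fit on `x1` can recover the pattern. The variable names, values, and helper functions are invented for illustration.

```python
from typing import Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_abs_error(model: Tuple[float, float], xs: Sequence[float], ys: Sequence[float]) -> float:
    """Average absolute gap between the model's predictions and the true responses."""
    intercept, slope = model
    return sum(abs(intercept + slope * x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: the response depends only on x2 (the critical variable),
# and x1 is uncorrelated with x2.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 3.0, 3.0, 1.0]
y = [2.0 * v for v in x2]

print(mean_abs_error(fit_linear(x1, y), x1, y))  # x2 left out: 2.0
print(mean_abs_error(fit_linear(x2, y), x2, y))  # x2 included: 0.0
```

Adding more rows of `x1` alone would not help here; the missing information has to be put back into the training set.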

Putting it All Together

After all of the relevant training data has been collected, researchers need to bring it all together into a single training set to be passed to the computer. This can be a major challenge, given that scientists are usually working with an extremely high volume of data from a wide variety of sources. With a workflow authoring application, scientists can automate data aggregation and the initiation of the machine learning process. Many of these applications also offer a graphical user interface, so they can be utilized by researchers who don’t have extensive experience in computer science. As a result, the company won’t need to contract out the work or hire a new IT specialist on whom scientists would have to rely during key steps in the research process.
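A minimal sketch of one aggregation step, assuming each department's results are keyed by a shared sample ID (the IDs, field names, and values below are hypothetical): the function performs an inner join, keeping only samples that appear in every source, and also reports which samples were dropped so that gaps can be followed up rather than silently lost. A workflow authoring application would typically perform this kind of join through its own components; this plain-Python version only illustrates the logic.

```python
from typing import Dict, List, Tuple

# A record maps hypothetical field names to measured values.
Record = Dict[str, float]


def merge_sources(*sources: Dict[str, Record]) -> Tuple[Dict[str, Record], List[str]]:
    """Inner-join per-sample records from several sources on a shared sample ID.

    Returns (merged, dropped): merged rows for samples found in every source,
    and the sorted IDs of samples missing from at least one source.
    If two sources use the same field name, the later source wins.
    """
    if not sources:
        return {}, []
    all_ids = set().union(*sources)
    common_ids = set(sources[0]).intersection(*sources[1:])
    merged: Dict[str, Record] = {}
    for sample_id in sorted(common_ids):
        row: Record = {}
        for source in sources:
            row.update(source[sample_id])
        merged[sample_id] = row
    return merged, sorted(all_ids - common_ids)


# Hypothetical departmental results keyed by sample ID.
chemistry = {"S1": {"h2s_ppm": 12.0}, "S2": {"h2s_ppm": 4.0}, "S3": {"h2s_ppm": 7.5}}
field = {"S1": {"soil_ph": 6.1}, "S2": {"soil_ph": 7.3}}
engineering = {"S1": {"wall_mm": 9.5}, "S2": {"wall_mm": 11.0}, "S3": {"wall_mm": 10.0}}

training_set, dropped = merge_sources(chemistry, field, engineering)
print(len(training_set), dropped)  # 2 ['S3']
```

Sample S3 is excluded because no field data exists for it; surfacing that ID is what lets researchers recover the missing measurement instead of training on an incomplete set.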

Organizing and Utilizing Algorithm Outputs

Given the complexity of the challenges that oil and gas companies face today, an algorithm that is produced through the supervised learning process will likely generate complicated outputs that contain massive amounts of information. A workflow application can automate the organization of the data so that results can be easily visualized, analyzed and shared among the researchers and departments that can benefit from them. Based on the algorithm’s predictions, scientists can design future experiments and develop new products that advance the company’s goals.
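As an illustration of organizing outputs for sharing, the sketch below ranks hypothetical candidate materials by a predicted score and renders the result as CSV text that other departments could open in a spreadsheet. The candidate names, scores, and column headings are assumptions, not output from any particular tool.

```python
import csv
import io
from typing import Dict


def predictions_to_csv(predictions: Dict[str, float]) -> str:
    """Sort candidates by predicted score (highest first) and render them as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["rank", "candidate", "predicted_score"])
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    for rank, (candidate, score) in enumerate(ranked, start=1):
        writer.writerow([rank, candidate, f"{score:.2f}"])
    return buffer.getvalue()


# Hypothetical model outputs for three candidate pipeline coatings.
print(predictions_to_csv({"coating_A": 0.72, "coating_B": 0.91, "coating_C": 0.55}))
```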

Of course, even if a large amount of data is included in the original training set, there may be cases in which the algorithm generated in the machine learning process does not make accurate predictions. When this happens, electronic lab notebooks make it easier for researchers to go back to the original training set, figure out what was missing and design a new training set that makes up for the shortcomings. And when the use of a particular training set does lead to the production of an accurate algorithm, electronic lab notebooks give scientists the opportunity to learn from their success. As they design a new training set for a supervised learning task intended to tackle a similar problem, they can quickly compare it to the old training set in order to ensure that they are including the same type and quantity of data.
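The comparison between an old and a new training set can be sketched as a simple check, assuming each training set is a list of records mapping field names to values (a hypothetical representation): it flags fields present in the old set but missing from the new one, and warns if the new set has substantially fewer rows. The function name and the 90% threshold are arbitrary choices made for illustration.

```python
from typing import Dict, List

# Hypothetical representation: a training set is a list of records.
Row = Dict[str, float]


def compare_training_sets(old: List[Row], new: List[Row], min_ratio: float = 0.9) -> List[str]:
    """Return warnings wherever the new training set falls short of the old one."""
    warnings: List[str] = []
    old_fields = set().union(*(row.keys() for row in old))
    new_fields = set().union(*(row.keys() for row in new))
    # Type check: fields the old set covered that the new set never supplies.
    for name in sorted(old_fields - new_fields):
        warnings.append(f"missing field: {name}")
    # Quantity check: the new set should be roughly as large as the old one.
    if len(new) < min_ratio * len(old):
        warnings.append(f"row count {len(new)} is below {min_ratio:.0%} of {len(old)}")
    return warnings


old = [{"h2s_ppm": 12.0, "soil_ph": 6.1, "wall_mm": 9.5} for _ in range(10)]
new = [{"h2s_ppm": 4.0, "wall_mm": 11.0} for _ in range(6)]
for warning in compare_training_sets(old, new):
    print(warning)
```

Both warnings fire for this example: the new set lacks `soil_ph` entirely and has only 6 rows against the old set's 10.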

BIOVIA Electronic Lab Notebooks and BIOVIA Pipeline Pilot make it easier for oil and gas companies to harness the power of machine learning to find innovative solutions to the problems they face. As your firm expands into this area of computational research, these software solutions offer the data management and collaboration capabilities that are essential for success. Contact us today to learn more about our Unified Lab Management offerings.  

  1. “New Ways of Working – Big Data and Machine Learning Are On Their Way,” November 2015.
  2. “Supervised Learning,” 2016.