petrobras / 3w

Promotes development of ML algorithms for early detection and classification of undesirable events in offshore oil wells.

License: Apache License 2.0

Python 0.25% Jupyter Notebook 99.75%
anomaly-detection data-science machine-learning multivariate-time-series-analysis oil-well-monitoring

3w's Introduction

Introduction

This is the first repository published by Petrobras on GitHub. It supports the 3W Project, which aims to promote experimentation and development of Machine Learning-based approaches and algorithms for specific problems related to detection and classification of undesirable events that occur in offshore oil wells.

The 3W Project is based on the 3W Dataset, a database described in this paper, and on the 3W Toolkit, a software package that promotes experimentation with the 3W Dataset for specific problems. The name 3W was chosen because this dataset is composed of instances that come from 3 different sources and contain undesirable events that occur in oil Wells.

Motivation

Timely detection of undesirable events in oil wells can help prevent production losses and reduce maintenance costs, environmental accidents, and human casualties. Losses related to this type of event can reach 5% of production in certain scenarios, especially in areas such as Flow Assurance and Artificial Lifting Methods. In terms of maintenance, the cost of a maritime probe, required to perform various types of operations, can exceed US$500,000 per day.

Creating a dataset and making it publicly available for open experimentation can greatly foster the development of tools that can:

  • Improve the process of identifying undesirable events in the drilling, completion and production phases of offshore wells;
  • Increase the efficiency of monitoring the integrity of wells and subsea systems, whose related problems can cause incalculable losses for people, the environment, and the company's image.

Strategy

3W is the first pilot of a Petrobras program called Conexões para Inovação - Módulo Open Lab. This pilot is an open project composed of two major resources:

  • The 3W Dataset, which will be evolved and supplemented with more instances from time to time;
  • The 3W Toolkit, which will also be evolved (in many ways) to cover an increasing number of undesirable events during its development.

Therefore, our strategy is to make these resources publicly available so that we can develop the 3W Project collaboratively with a global community.

Ambition

With this project, Petrobras intends to develop (fix, improve, supplement, etc.):

  • The 3W Dataset itself;
  • The 3W Toolkit itself;
  • Approaches and algorithms that can be incorporated into systems dedicated to monitoring undesirable events in offshore oil wells during their respective drilling, completion and production phases;
  • Tools that can be useful for our ambition.

Governance

The 3W Project was conceived and publicly launched on May 30, 2022 as a strategic action by Petrobras, led by its department responsible for Flow Assurance and its research center (CENPES). Since then, 3W has become increasingly consolidated at Petrobras in several aspects: more professionals specialized in labeling instances, more projects and teams using the resources made available by 3W, more investment in developing the digital tools needed to label and export instances, more interest in including different types of undesirable events that occur in wells during the drilling, completion and production phases, etc.

As a result of this evolution, since May 1, 2024, the governance of 3W has included the participation of the Petrobras department responsible for Well Integrity.

Contributions

We expect to receive various types of contributions from individuals, research institutions, startups, companies and partner oil operators.

Before you can contribute to this project, you need to read and agree to the following documents:

It is also very important to know about, participate in, and follow the discussions. See the discussions section.

Licenses

All the code of this project is licensed under the Apache 2.0 License and all 3W Dataset's data files (Parquet files saved in subdirectories of the dataset directory) are licensed under the Creative Commons Attribution 4.0 International License.

Versioning

In the 3W Project, three types of versions will be managed as follows.

  • Version of the 3W Toolkit: specified in the __init__.py file;
  • Version of the 3W Dataset: specified in the dataset.ini file;
  • Version of the 3W Project: specified with tags in the git repository;
  • We will exclusively use the semantic versioning defined in https://semver.org;
  • Versions will always be updated manually;
  • The versions of the 3W Toolkit and the 3W Dataset are completely independent of each other;
  • The version of the 3W Project will be updated whenever, and only when, there is a new commit in the main branch of the repository, regardless of the updated resource: 3W Toolkit, 3W Dataset, 3W Project's documentation, example of use, etc;
  • We will only use annotated tags and for each tag there will be a release in the remote repository (GitHub);
  • Content for each release will be automatically generated with functionality provided by GitHub.
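
As a rough illustration of the semantic versioning rules referenced above, the sketch below parses and compares plain MAJOR.MINOR.PATCH strings. It is not part of the 3W Toolkit: the names `parse_version` and `is_newer` are hypothetical, and pre-release and build metadata are deliberately left out.

```python
import re

# Matches plain MAJOR.MINOR.PATCH versions only (no pre-release or build metadata).
# Leading zeros are rejected, as required by semver.org.
_SEMVER = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def parse_version(text):
    """Return (major, minor, patch) as integers, or raise ValueError."""
    match = _SEMVER.match(text.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_newer(candidate, reference):
    """True if `candidate` has higher precedence than `reference`."""
    # Tuples compare numerically field by field, so 1.10.0 ranks above 1.9.0.
    return parse_version(candidate) > parse_version(reference)
```

Comparing integer tuples rather than raw strings avoids the common mistake of ranking "1.9.0" above "1.10.0".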

Questions

See the discussions section. If you don't find the clarification you need, please open a discussion to ask your questions so we can answer them.

3W Dataset

To the best of its authors' knowledge, this is the first realistic and public dataset with rare undesirable real events in oil wells that can be readily used as a benchmark dataset for development of machine learning techniques related to inherent difficulties of actual data. For more information about the theory behind this dataset, refer to the paper A realistic and public dataset with rare undesirable real events in oil wells published in the Journal of Petroleum Science and Engineering (link here).

Structure

The 3W Dataset consists of multiple Parquet files saved in subdirectories of the dataset directory and structured as detailed here.
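
The layout described above can be explored with a few lines of Python. This is a minimal sketch under stated assumptions, not the 3W Toolkit's own loader: it assumes the Parquet files sit exactly one level below the dataset directory and use the `.parquet` extension, and the function name `group_parquet_files` is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def group_parquet_files(dataset_dir):
    """Map each immediate subdirectory name to its sorted Parquet file names."""
    groups = defaultdict(list)
    # "*/*.parquet" only matches files one level below dataset_dir (assumption).
    for path in Path(dataset_dir).glob("*/*.parquet"):
        groups[path.parent.name].append(path.name)
    return {name: sorted(files) for name, files in sorted(groups.items())}
```

For example, `group_parquet_files("dataset")` returns a dictionary keyed by subdirectory name. Each listed file can then be opened with any Parquet reader, such as `pandas.read_parquet`.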

Overview

A 3W Dataset's general presentation with some quantities and statistics is available in this Jupyter Notebook.

3W Toolkit

The 3W Toolkit is a software package written in Python 3 that contains resources that make the following easier:

  • 3W Dataset overview generation;
  • Experimentation and comparative analysis of Machine Learning-based approaches and algorithms for specific problems related to undesirable events that occur in offshore oil wells during their respective drilling, completion and production phases;
  • Standardization of key points of the Machine Learning-based algorithm development pipeline.

It is important to note that there are arbitrary choices in this toolkit, but they have been carefully made to allow adequate comparative analysis without compromising the ability to experiment with different approaches and algorithms.

Structure

The 3W Toolkit is implemented in sub-modules as described here.

Incorporated Problems

Specific problems will be incorporated into this project gradually. At this point, we can work on:

All specifications are detailed in the CONTRIBUTING GUIDE.

Examples of Use

The list below, with examples of how to use the 3W Toolkit, will be expanded throughout its development.

For a contribution of yours to be listed here, follow the instructions detailed in the CONTRIBUTING GUIDE.

Reproducibility

For all results generated by the 3W Toolkit to be consistent, we recommend that you create and use a virtual environment with the package versions specified in environment.yml, which was generated with conda. Our current recommendation is to use the conda distributed by Miniforge.

  • Download and install Miniforge according to the official instructions;
  • Open a prompt on your operating system (Windows, Linux or macOS);
  • Make sure the current directory is the directory where you have the 3W;
  • Run the following commands as needed:

  • To create the virtual environment from environment.yml:
$ conda env create -f environment.yml
  • To activate the created virtual environment:
$ conda activate 3W
  • To use the 3W Toolkit resources interactively:
$ python
  • To initialize a local Jupyter Notebook server:
$ jupyter notebook
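
Before running experiments, it can help to confirm that the interpreter is actually running inside the activated environment. The sketch below is an illustration only and not part of the 3W Toolkit: it relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation, takes the environment name 3W from the activation command above, and uses the hypothetical helper names `active_conda_env` and `in_expected_env`.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active."""
    environ = os.environ if environ is None else environ
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV") or None


def in_expected_env(expected="3W", environ=None):
    """True if the active conda environment has the expected name."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    if not in_expected_env():
        print("Warning: the 3W environment does not appear to be active.")
```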

3w's People

Contributors

afraniomelo, andrepaulofm, araujomarins, pivettamarcos, ricardoevvargas, thadeuluiz, victorrezende

3w's Issues

Values of simulated instances for event '8' missing.

The sensor values of the simulated instances for event '8' are missing.

They don't have the original values that were available in the previous version (1.0.0), except for the P-PDG tag values, which remain the same.

Vulnerability ID: CVE-2022-39286

Type: Regular
CVSS: 8.8
CWE ID: CWE-250
Severity: High
Date Published 2022-10-26T05:15:00Z
Description: Jupyter Core is a package for the core common functionality of Jupyter projects. Jupyter Core prior to version 4.11.2 contains an arbitrary code execution vulnerability in "jupyter_core" that stems from "jupyter_core" executing untrusted files in CWD. This vulnerability allows one user to run code as another. Version 4.11.2 contains a patch for this issue. There are no known workarounds.
Affected packages: Python-jupyter-core-4.11.1

How to handle memory issues

Hi,
I am trying to prepare data for my model by transforming the original data into sliding windows shifted by 1 step. However, I run into memory issues when creating the sliding-window dataset, because it makes the data much bigger. Is it still worthwhile to resample the data to hourly frequency? If it is better to keep the data frequency as is, what should I use to handle the memory issues?

thank you
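
One possible angle on the memory question above, offered only as a sketch and not as guidance from the 3W maintainers: windows can be produced lazily, one at a time, instead of materializing the whole sliding-window dataset up front. The generator name `sliding_windows` and its parameters are hypothetical, and the sketch does not address whether resampling is appropriate.

```python
from collections import deque
from itertools import islice


def sliding_windows(rows, size, step=1):
    """Yield windows of `size` consecutive rows, advancing `step` rows each time."""
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    iterator = iter(rows)
    # Only one window is held in memory at any moment.
    window = deque(islice(iterator, size), maxlen=size)
    if len(window) < size:
        return
    yield tuple(window)
    while True:
        chunk = list(islice(iterator, step))
        if len(chunk) < step:
            return  # not enough rows left for another full shift
        window.extend(chunk)  # maxlen drops the oldest rows automatically
        yield tuple(window)
```

A training loop can consume this generator in batches, so peak memory depends on the window and batch size rather than on the total number of windows.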

Mais example is not working

Hi, I am interested in using this dataset. My plan is to use the same data processing and split so that I can get an apples-to-apples comparison with previous work. However, I couldn't run train_lgbm in the mais folder; it fails with the error below:
ImportError: cannot import name 'prepare_data' from 'tune_lgbm' (\3W-main\3W-main\toolkit\mais\training\multiclass\tune_lgbm.py)

Please advise.

Vulnerability ID: CVE-2022-45199

Type: Regular
CVSS: 5.3
CWE ID: CWE-400
Severity: Medium
Date Published 2022-11-14T06:26:00Z
Description: Pillow prior to 9.3.0 allows denial of service via SAMPLESPERPIXEL.
Affected packages: Python-Pillow-9.2.0

Vulnerability ID: CVE-2023-24816

Type: Regular
CVSS: 7
CWE ID: CWE-20, CWE-78
Severity: High
Date Published 2023-02-13
Description: IPython (Interactive Python) is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language. Versions prior to 8.1.0 are subject to a command injection vulnerability with very specific prerequisites. This vulnerability requires that the function IPython.utils.terminal.set_term_title be called on Windows in a Python environment where ctypes is not available. The dependency on ctypes in IPython.utils._process_win32 prevents the vulnerable code from ever being reached in the ipython binary. However, as a library that could be used by another tool set_term_title could be called and hence introduce a vulnerability. Should an attacker get untrusted input to an instance of this function they would be able to inject shell commands as current process and limited to the scope of the current process. Users of ipython as a library are advised to upgrade. Users unable to upgrade should ensure that any calls to the IPython.utils.terminal.set_term_title function are done with trusted or filtered input.
Affected packages: Ipython-8.5.0
