equinor / oneseismic

License: GNU Affero General Public License v3.0

Languages: Python 27.10%, CMake 1.08%, C++ 36.33%, Go 25.84%, Dockerfile 1.65%, C 0.90%, Shell 0.68%, JavaScript 2.49%, HTML 0.46%, Bicep 3.47%

Topics: seismic, hacktoberfest

oneseismic's Introduction

oneseismic

Next-generation seismic in the cloud.

Please note that oneseismic is under heavy development, and parts of this document may be outdated.

What is oneseismic?

Technically, oneseismic is an API for reading seismic data in an easy and scalable manner. The biggest challenge with seismic is its size - single surveys range from a few hundred megabytes to tens or even hundreds of gigabytes. The standard format for storing seismic, SEG-Y, is unfit for efficient data extraction and querying.

The guiding design principle and focus of oneseismic is programs first - the idea that if you can build a solid foundation, then imagining new applications on top of it is fast and easy. The aim is to provide a powerful foundation that empowers developers and geoscientists to develop new and novel applications, and get results faster and with less effort.

The best way to illustrate this is with a motivating example:

import oneseismic
import oneseismic.simple as simple

cubeid = '...'
client = simple.client('https://oneseismic.url')
inline24    = client.sliceByLineno(cubeid, dim = 0, lineno = 24 )().numpy()
crossline13 = client.sliceByLineno(cubeid, dim = 1, lineno = 13 )().numpy()
time220     = client.sliceByLineno(cubeid, dim = 2, lineno = 220)().numpy()

This Python program fetches three slices - an inline slice, a crossline slice, and a time slice - and makes them immediately available as numpy arrays. This simple example only demonstrates fetching arbitrary data, but we can also do something useful with it:

vintage1 = '...'
vintage2 = '...'
client = simple.client('https://oneseismic.url')
proc1 = client.sliceByLineno(vintage1, dim = 1, lineno = 13)()
proc2 = client.sliceByLineno(vintage2, dim = 1, lineno = 13)()
slicev1 = proc1.numpy()
slicev2 = proc2.numpy()

diff = slicev2 - slicev1

This program computes the sample-by-sample difference between two vintages of the same field.

Notice that instead of immediately realising the data as a numpy array, this program uses the temporaries proc1 and proc2. Creating a process schedules the fetch, but does not serve the data right away. This lets the oneseismic server work on both queries in parallel.
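
The same pattern scales to more than two queries: schedule all the processes first, then realise the results. A minimal sketch, reusing sliceByLineno from the examples above (the cube id and line numbers are placeholders):

import oneseismic.simple as simple

cubeid = '...'
client = simple.client('https://oneseismic.url')

# schedule all fetches up front; the server works on them concurrently
procs = [client.sliceByLineno(cubeid, dim = 0, lineno = n)() for n in range(24, 34)]

# realise the results only once they are needed
slices = [proc.numpy() for proc in procs]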

Is oneseismic a database?

That depends on the definition of database. The goal of oneseismic is not to be a universal storage solution for seismic data, but rather an efficient way to work with and request bits and pieces of seismic data. In that sense, it is a database.

Why not SEG-Y?

SEG-Y was designed for data exchange, where density, a single file, and in-band metadata are useful properties because they allow space-efficient and lossless transfer between parties. SEG-Y works well for this (with the exception of rampant SEG-Y standards violations). However, SEG-Y is unfit for modern computer programs:

  1. Meta-data is interleaved with the data. That means the file can be split in multiple places (good for tape!), but also means that the data cannot be copied contiguously.
  2. SEG-Y is very trace-oriented, but there is no requirement that traces are laid out for efficient access to 3D shapes. Well-organised files are laid out for efficient inline access, but locating traces without an index requires extra information or a linear scan.
  3. Even if the file is well organised, reading a single time/depth slice or horizon is very time consuming.

Installation

Oneseismic is primarily an API, so there is no installation - the system is up and running and can be queried at any time. The Python package is a user-friendly way to use the API, and can be installed with:

pip install oneseismic

However, the API is perfectly usable without going through the Python package. Please note that it is still under heavy development, and may change with little notice.

Examples

Developer's corner

This section is for the developers of oneseismic, and describes the architecture and design choices that power oneseismic.

Offline partitioning

When a volume is uploaded, it is partitioned into equally-sized chunks, which are then addressed by their coordinates in this coarser grid. This process is time consuming, but is only performed once. With a unique address <volume>/<resolution>/<partitioning>/<chunk>, which can easily be computed from any coordinate, oneseismic can efficiently fetch arbitrary shapes from large volumes.
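
A minimal sketch of how such an address could be computed from a global (inline, crossline, sample) index; the chunk shape and the resolution label are illustrative assumptions, not the actual oneseismic layout:

def chunk_address(volume, index, chunk_shape = (64, 64, 64), resolution = 'src'):
    # integer-divide every coordinate by the chunk size to find the chunk it falls in
    chunk = tuple(i // c for i, c in zip(index, chunk_shape))
    partitioning = '-'.join(str(c) for c in chunk_shape)
    return '{}/{}/{}/{}'.format(volume, resolution, partitioning, '-'.join(map(str, chunk)))

# chunk_address('my-volume', (120, 33, 900)) == 'my-volume/src/64-64-64/1-0-14'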

Terminology

A lot of the familiar terminology in oneseismic is lifted from the unix family of operating systems, since the concepts in use in oneseismic map pretty well onto the concepts in unix. This is an incomplete list of terms used throughout code and documentation:

process : The process is the high-level procedure from a user request until data is delivered.

PID : The PID, process identifier, is the key used to identify a single process across the subsystems. Please note that unlike traditional unix systems, it is represented as a string, not a single integer.

A day in the life (of a request)

License

The server is licensed under AGPL v3+, while the connector and python libraries are licensed under the LGPL v3+.

oneseismic's People

Contributors

achaikou, anetteu, autt, christdej, dependabot[bot], eivindjahren, erlendhaa, jokva, kjellkongsvik, reedonly, sindreosnes, sveinung-r, terryhannant, zaker


oneseismic's Issues

Add CI

Set up continuous integration on seismic-cloud

closed by #9

API: Log to "central repository"

  • Write logs to a database (PostgreSQL for mobility?)
  • Write an error handler / severity handler for logging
  • Rewrite / refactor out existing log/logger calls

Benchmark suite

There are several techniques we can try to improve performance, but without a simple and correct way of comparing results it's impossible to experiment.

So what we need is a benchmark suite - a set of test queries (lines, horizontal slices, curtains, horizons and horizons with window) with timing reports.

The timing reports from the driver program can be part of the response itself, at least behind a parameter, along with wall-clock start-to-finish timing from the client.

Obviously, on top of that, the response should be tested for correctness, i.e. that the correct data is returned.
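
A minimal sketch of the client-side wall-clock timing mentioned above, built on the simple client from the examples earlier (the query parameters are placeholders):

import time
import oneseismic.simple as simple

client = simple.client('https://oneseismic.url')

def timed_slice(cubeid, dim, lineno):
    # measure start-to-finish, including scheduling, transfer, and decoding
    start = time.perf_counter()
    result = client.sliceByLineno(cubeid, dim = dim, lineno = lineno)().numpy()
    return result, time.perf_counter() - start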

Persist profiling numbers server side

  • Provide a file/pipe for stitch so the wrapper API can read profiling numbers from stitch
  • Write profiles to the persistence layer
  • Expose an endpoint that provides access to profiling data for sessions, i.e. /profile/{sessionID} (see the sketch below)
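
A hypothetical sketch of a client reading the persisted profiling numbers from that endpoint; the base URL and the response format are assumptions:

import requests

def fetch_profile(base_url, session_id):
    # GET /profile/{sessionID} as proposed above
    response = requests.get('{}/profile/{}'.format(base_url, session_id))
    response.raise_for_status()
    return response.json()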

API: API to SC integration

At a minimum we should discuss the elements below with regard to integration between the two components:

  • Manifest
  • Byte stream
  • Error handling (Including what should be propagated to the client calling the API)

API: Integration test in CI

Add an integration test with docker containers. Use the python client and some setup to run a little battery of test surfaces.

Integration tests:
Read manifests from Path

API: API definition

So we need an API definition and a swagger file. Arguably, the API is the real product here, and needs a lot of love.

The API definition work can be done in parallel with the benchmark suite (at least for the most part), both in terms of implementation, testing, and design.

API: Manifests should be stored in a DB

The manifest can be read and stored in memory on startup of the API

  • Create a CosmosDB instance in Azure
  • Access CosmosDB with the MongoDB driver for Golang
  • Read the manifest from the DB (see the sketch below)
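
The issue calls for the Go MongoDB driver; purely to illustrate the read path, here is a sketch using pymongo against the Cosmos DB Mongo API. The connection string, database, collection, and field names are assumptions:

import os
from pymongo import MongoClient

client = MongoClient(os.environ['COSMOSDB_CONNECTION_STRING'])
manifests = client['oneseismic']['manifests']

def read_manifest(cube_id):
    # fetch the manifest document for a given cube
    return manifests.find_one({ 'guid': cube_id })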

API: Add profiling middleware

We need a profiling middleware that times the request and sets a header so the client knows how long the server spent working on the request.
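
A framework-agnostic sketch of the idea in Python: time the handler and report the duration to the client in a response header. The header name and the request/response objects are assumptions; the actual middleware would live in the Go API:

import time

def with_server_timing(handler):
    def wrapped(request):
        start = time.perf_counter()
        response = handler(request)
        elapsed_ms = (time.perf_counter() - start) * 1000
        # tell the client how long the server spent on the request
        response.headers['Server-Timing'] = 'app;dur={:.1f}'.format(elapsed_ms)
        return response
    return wrapped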

Realistic (very large) synthetic SEGY dataset prep

@maxschub

During our investigation efforts, in order to understand the demands of processing and querying a SEGY dataset on any Azure data service(s), it is crucial that we work with a realistic (very large) synthetic SEGY dataset.

  • How big should we go here (time, surface coverage area, approx. file size)?
  • When can you provide it?

Thanks!

API: Integrate with core grpc server

Core is going to be a grpc server implemented in Python. We will have to implement a layer that interfaces with this new direction. The core will have the option of running as an independent server, but we will be doing the deployment/management of it.

Sample data prep and queries

  • @jokva Export the Volve (SEG-Y) sample dataset as CSV. Place it under a new container/folder in the segyngrmdfdev Azure storage account.
  • @jokva Provide sample cube queries (code) for several different cases: axis-parallel slicing, curtain section, amplitude/attribute map, and the most complex, generalization/random points in the cube.

Investigate, evaluate Azure data services

  • @CarlosSardo Investigate, evaluate what Azure data services fit best, several options:
  • Azure Data Explorer: fast and highly scalable data exploration service; ideal for analyzing large volumes of diverse data; Scales quickly to terabytes of data, in minutes, allowing rapid iterations of data exploration to discover relevant insights; Supports analysis of high volumes of heterogeneous data (structured and unstructured); Offers an innovative query language, optimized for high performance data analytics.
  • Azure HDInsight: a cost-effective, enterprise-grade service for open source analytics that runs popular open source frameworks, including Apache Hadoop, Hive, Spark, and Kafka. Effortlessly process massive amounts of data and get all the benefits of the broad open source ecosystem with the global scale of Azure.
  • Azure Databricks: an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform; dynamically autoscales clusters up and down.

API: Create monitoring dashboard

Ingestion: Test segmentation outputs segments with the right name for primary/secondary keys

        {
            "binInfoStart": {
                "crosslineNumber": 20,
                "ensembleXCoordinate": 0.0,
                "ensembleYCoordinate": 0.0,
                "inlineNumber": 1
            },
            "binInfoStop": {
                "crosslineNumber": 24,
                "ensembleXCoordinate": 0.0,
                "ensembleYCoordinate": 0.0,
                "inlineNumber": 1
            },
            "primaryKey": 1,
            "traceStart": 0,
            "traceStop": 4
        },

This is fine when inline/crossline are 189/193, but check against what SEGYScan outputs for other keys, and verify they're the same.
