das-rcn / das_metadata
Tools for standardizing Distributed Acoustic Sensing (DAS) metadata
License: Creative Commons Attribution 4.0 International
Feature request:
Hi everyone, I wanted to propose a few additional entries for the metadata. I grouped them according to the categories agreed upon so far. Most of them probably should not always be required; I leave that up to you to decide. Feel free to change/adapt as needed.
-- Overview
-- Cable and Fiber:
-- Interrogator:
-- Acquisition:
-- Channel
Other comments/ideas:
Reserving a number of free metadata slots for custom/optional entries with descriptions could be helpful for specific cases; I guess the volume increase for a few extra entries is not dramatic. These entries could be given custom names (e.g. Feature1, Feature2, ...), or alternatively one could be assigned to each broad category (Overview, Interrogator, ...). Some of these may need to be customized during processing, not only prior to acquisition.
I also think it may be a good option to allow coordinate reference frames that are not geographical, such as a user-defined Cartesian system with units of choice, for example for small cables, geotechnical applications, ...
As proposed in the meeting today, I also agree to add a new category for pre-processing. It may even become customary to apply several procedures (perhaps some not yet popular...) to the raw data during acquisition.
Would a single, separate file for all the metadata make sense in the end? It could also include an overview with warnings about acquisition parameters changes during a given campaign. However, this may be a bit tricky to agree upon.
The structure of the metadata makes it seem very suitable for JSON format.
This would also be easy to read in various other codes, and to export if the metadata is used as a header in a file format (e.g. miniDAS, https://github.com/DAS-RCN/RCN_DASformat)
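As a minimal, stdlib-only sketch of the point above: an Overview entry round-trips cleanly through JSON, whether it lives in a standalone metadata file or is embedded as a header in a container format. The field names follow the proposal in this thread; the values are made up.

```python
import json

# Hypothetical Overview metadata entry, using field names
# proposed in this thread; the values are illustrative only.
overview = {
    "location": "Parkfield, California, USA",
    "number_of_interrogators": 1,
    "principle_investigators": "P.I. Doe",
    "start_datetime": "2018-02-11T00:00:00",
}

# Serialize to text, e.g. for a standalone metadata file ...
text = json.dumps(overview, indent=2)

# ... and read it back, e.g. before embedding it as a header
# in a container format such as miniDAS.
restored = json.loads(text)
assert restored == overview
```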
Reading the metadata descriptions, several points come up that perhaps need clarification:
Overview:
Cable
Acquisition
Channel
Right now it reads like there's just one cable-fiber ID for each portion of the fiber. Am I understanding that correctly?
As I'm working on an implementation of this in DASCore, I'm realizing it probably makes sense to keep a fiber ID and a cable ID for each portion of the light path.
Even though this may be redundant for some experiments, it's common enough to have a mix of straight stretches and loop-backs that I think it could help clarify acquisition setup for others after the fact.
A couple of examples: this lets you leave a clear record when you start collecting data on a new fiber in the same cable. It also lets you clearly differentiate two fibers with a U connection at the end that sit in the same cable from two fibers in two separate cables that are side-by-side, or from a single cable deployed as a U. Or, if you have single-mode DAS and multimode DTS recording in the same cable (e.g. to do temperature corrections on your DAS data), and you use a similar metadata standard for DTS, you can make it clear that they are in the same cable (and therefore share thermal properties) rather than in side-by-side cables (with potentially quite different thermal responses).
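A minimal sketch of what keeping both IDs per light-path section could look like (the class and field names are illustrative, not part of the current spec):

```python
from dataclasses import dataclass

# Illustrative record for one section of the light path; not part
# of the current spec, just a sketch of the cable-ID/fiber-ID idea.
@dataclass
class LightPathSection:
    cable_id: str
    fiber_id: str
    start_distance_m: float  # distance along the optical path
    end_distance_m: float

# Two fibers joined by a U connection at the far end of one cable:
# same cable_id, different fiber_id, so the geometry stays explicit
# instead of being inferred from channel distances.
path = [
    LightPathSection("cable_A", "fiber_1", 0.0, 5000.0),
    LightPathSection("cable_A", "fiber_2", 5000.0, 10000.0),
]
assert path[0].cable_id == path[1].cable_id
assert path[0].fiber_id != path[1].fiber_id
```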
I want to propose moving the proposal to Markdown instead of a static PDF.
Markdown has the advantage that it can be discussed, edited, and revised in a transparent and open fashion. Once the community / owner is happy, a Markdown document can be rendered into any format: PDF, HTML, Word, LaTeX, or whatever.
For a proper and transparent RFC, I strongly advocate for modern Markdown on GitHub.
I propose to formalize the metadata proposal in a machine- and human-readable format. I vote for JSON Schema; the schemas can easily be validated and translated into various other schemas, see https://app.quicktype.io/.
The thingy would look like this
{
  "title": "Distributed Acoustic Sensing Metadata - Overview",
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "$id": "http://example.com/example.json",
  "type": "object",
  "default": {},
  "required": [
    "location",
    "number_of_interrogators",
    "principle_investigators",
    "start_datetime"
  ],
  "properties": {
    "location": {
      "type": "string",
      "default": "",
      "title": "Description of geographic location",
      "examples": [
        "Parkfield, California, USA"
      ]
    },
    "deployment_type": {
      "type": "string",
      "default": "",
      "title": "Describes the permanency of the deployment",
      "examples": [
        "permanent"
      ]
    },
    "number_of_interrogators": {
      "type": "number",
      "default": 0,
      "title": "Number of interrogators used to collect data over the course of data collection",
      "examples": [
        2
      ]
    },
    "principle_investigators": {
      "type": "string",
      "default": "",
      "title": "Point of Contact(s)",
      "examples": [
        "P.I. Doe"
      ]
    },
    "start_datetime": {
      "type": "string",
      "format": "date-time",
      "default": "",
      "title": "Start date of experiment",
      "examples": [
        "2018-02-11T00:00:00"
      ]
    }
  },
  "examples": [
    {
      "location": "",
      "number_of_interrogators": 1,
      "principle_investigators": "",
      "start_datetime": ""
    }
  ]
}
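A full validator such as the Python jsonschema package can check instances against a schema like this directly. As a minimal stdlib-only illustration of what the "required" keyword buys us, here is the check a real validator would perform for that keyword alone (the instance values are made up):

```python
# Minimal stdlib-only sketch of the check a real JSON Schema
# validator (e.g. the Python "jsonschema" package) would perform
# for the "required" keyword of the Overview schema above.
required = [
    "location",
    "number_of_interrogators",
    "principle_investigators",
    "start_datetime",
]

def missing_required(instance: dict) -> list:
    """Return the required keys absent from a metadata instance."""
    return [key for key in required if key not in instance]

complete = {
    "location": "Parkfield, California, USA",
    "number_of_interrogators": 1,
    "principle_investigators": "P.I. Doe",
    "start_datetime": "2018-02-11T00:00:00",
}
assert missing_required(complete) == []
assert missing_required({"location": "Parkfield"}) != []
```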
The PRODML standard has a concept called 'Optical Path', which resembles our cable/fiber but is intended to capture the entire optical path during a single acquisition and would map to an OTDR measurement (which might be linked as an ancillary file). In this way the entire response could be captured including any splices or connections and could be validated by comparison with an OTDR measurement, which is usually conducted prior to data collection anyway. Perhaps something to think about.
Experience with many other formats shows that, during conception of a format, it is never fully anticipated how it will be used.
Often, pre-defined header fields are cannibalized for other purposes (notably in SEGY...)
I would suggest adding an empty container that can be used for custom (non-standard) header information, where people could write things like VSP charge, or moment tensors, or weather conditions...
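One way to express such an empty container in JSON Schema terms is an object property that explicitly allows arbitrary members, so the standard fields stay strict while anything non-standard is quarantined in one place. A sketch (the property name "user_defined" is illustrative, not agreed):

```python
# Sketch of a free-form container in JSON Schema terms; the
# property name "user_defined" is illustrative, not part of
# the current spec.
schema_fragment = {
    "user_defined": {
        "type": "object",
        "title": "Non-standard, user-defined header information",
        # Arbitrary keys are allowed inside this one container.
        "additionalProperties": True,
        "examples": [
            {"vsp_charge_kg": 0.5, "weather": "light rain"}
        ],
    }
}
assert schema_fragment["user_defined"]["type"] == "object"
```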
For completeness
SSA 2022 / Seattle, Washington
• Important parameters for using a DAS dataset:
  o Method of photonic estimation (dual-pulse, single-pulse, chirp, local oscillator), or at least require the category of photonic estimation (quantitative, non-quantitative)
  o SEAFOM standard of noise level (e.g., noise level estimate in rad/√Hz at 1, 5, 10, 50 km)
• Less important, but still relevant:
  o Provenance of location estimation (why, how, when, who)
  o Dark fiber vs. direct install
  o Fiber owner
  o Fiber operator
  o OTDR for array (with provenance, as fibers do change)
EGU 2022
(related) Ways to transfer large amounts of data?
Add trace start time to channel metadata? (But would this also imply sample rate and number of samples?)
Other
Timing could be other than GPS (e.g., NTP or PTP).
Could add a timing metric for segments where timing lock was missing.
Ownership of the cable, location of the first repeater, depth of water (marine or lake)
Use a pointer to a file containing locations rather than repeat (implemented in July 25 version)
Add metadata version number
Add “user-defined” space
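On the trace-start-time question in the notes above: start time, sample rate, and sample count do come as a package, since together they pin down the timestamp of every sample. A stdlib sketch:

```python
from datetime import datetime, timedelta

def sample_time(start: datetime, sample_rate_hz: float, index: int) -> datetime:
    """Timestamp of sample `index`, given trace start time and sample rate."""
    return start + timedelta(seconds=index / sample_rate_hz)

start = datetime(2018, 2, 11, 0, 0, 0)
# With 1000 Hz sampling, sample 500 falls half a second after the start.
assert sample_time(start, 1000.0, 500) == datetime(2018, 2, 11, 0, 0, 0, 500000)
```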
Feature request (reason, suggestion to implement):
Coordinate system descriptions are too vague. If provided in the metadata, geodetic or projected coordinate systems should be fully defined by things like horizontal datum, vertical datum, ellipsoid, units, projection, zone, etc. EPSG codes exist for this purpose, and I propose that they be required whenever any sensing-cable coordinates are included. Descriptive text is fine too, but EPSG codes should be a requirement.
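The proposed rule is simple to state in code. A sketch of the check (field names are illustrative; a real implementation would also verify the code against the EPSG registry, e.g. via pyproj):

```python
# Sketch of the proposed requirement: any coordinate block must
# carry an EPSG code. Field names here are illustrative, not spec.
def has_valid_crs(coords_meta: dict) -> bool:
    code = coords_meta.get("epsg_code")
    # EPSG codes are positive integers (e.g. 4326 = WGS 84 geographic).
    return isinstance(code, int) and code > 0

# Descriptive text alone would no longer be enough:
assert has_valid_crs({"epsg_code": 4326, "description": "WGS 84"})
assert not has_valid_crs({"description": "local grid, metres"})
```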
The spec currently suggests that the orientation of the fibre at each channel be given in strike ('degrees clockwise from east positive') and dip (angle downwards from the horizontal?).
Firstly, it feels unnatural to define an orientation or direction in space using 'strike'; it is usually used to define planes. (Trend and plunge are more common for directions or orientations.) Secondly, strike is more commonly measured as an azimuth from north, not from east. Finally, there are multiple geological conventions on how strike is measured, making this more ambiguous than need be.
I propose that we match existing conventions. SEED defines channel directions with azimuth (degrees east from local north) and dip (degrees down from the local horizontal; both in blockette 52). SAC uses inclination (degrees downwards from the local vertical direction) instead of dip.
My own preference is for the SAC convention, but would be happy with either as an improvement over the current proposal.
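The two conventions differ only by a constant offset: SEED dip is measured down from the local horizontal (-90° to 90°), while SAC-style inclination is measured down from the local vertical (0° to 180°), so inclination = dip + 90°. A sketch:

```python
def dip_to_inclination(dip_deg: float) -> float:
    """Convert SEED dip (degrees down from horizontal, -90..90)
    to SAC-style inclination (degrees from vertical, 0..180)."""
    return dip_deg + 90.0

# A horizontal channel: SEED dip 0 <-> inclination 90.
assert dip_to_inclination(0.0) == 90.0
# A vertical, downward-pointing channel: SEED dip 90 <-> inclination 180.
assert dip_to_inclination(90.0) == 180.0
```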
Tangentially, it is worth considering that strain(-rate) systems are orientational, not directional, so the direction of measure is 180°-ambiguous. In other words, it doesn't matter whether a channel is defined as pointing one way or the other in space.
Given that, should the spec constrain azimuths to lie in the range [0°, 180°) (or [–90°, 90°))? Or is it better to leave open the possibility of specifying the channel orientation in two ways? Or should it perhaps be defined as the direction in which the laser travels? That would remove the directional ambiguity.
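If the constrained-range option were chosen, folding a directional azimuth into [0°, 180°) is a one-liner, which makes the 180°-ambiguity of strain(-rate) measurements explicit (a sketch, assuming the [0°, 180°) convention from the question above):

```python
def normalize_orientation(azimuth_deg: float) -> float:
    """Fold a directional azimuth (degrees east of north) into
    [0, 180), reflecting the 180-degree ambiguity of strain(-rate)
    measurements: a channel "pointing" either way along the fiber
    measures the same quantity."""
    return azimuth_deg % 180.0

# Azimuths 250 deg and 70 deg describe the same channel orientation.
assert normalize_orientation(250.0) == 70.0
assert normalize_orientation(180.0) == 0.0
```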
Reading through the whitepaper, it is not clear how the distinction between channels, cable distance, and geographical coordinates is incorporated.
It may be useful to have a dedicated section on coordinates somewhere with this info. It all seems to be in there, but a bit scattered; since this is some of the most critical information in the data, it might deserve a more prominent position.