
Exchangeable and archivable format for magnetotelluric time series to better serve the community through FAIR principles.

Home Page: https://mth5.readthedocs.io/en/latest/index.html

License: MIT License


mth5's Introduction

MTH5


MTH5 is an HDF5 data container for magnetotelluric time series data, but it could be extended to other data types. This package provides tools for reading, writing, and manipulating MTH5 files.

MTH5 uses h5py to interact with the HDF5 file, xarray to interact with the data conveniently, and mt_metadata to handle all metadata. This project was developed in cooperation with the Incorporated Research Institutions for Seismology, the U.S. Geological Survey, and other collaborators (see Credits below).

  • Version: 0.4.3
  • Free software: MIT license
  • Documentation: https://mth5.readthedocs.io.
  • Examples: click the Binder badge above; Jupyter Notebook examples are in docs/examples/notebooks
  • Suggested Citation: Peacock, J. R., Kappler, K., Ronan, T., Heagy, L., Kelbert, A., Frassetto, A. (2022) MTH5: An archive and exchangeable data format for magnetotelluric time series data, Computers & Geoscience, 162, doi:10.1016/j.cageo.2022.105102

Features

  • Read and write HDF5 files formatted for magnetotelluric time series, transfer functions, and Fourier coefficients.
  • With MTH5 a user can create a file and get/add/remove stations, runs, channels, transfer functions, Fourier coefficients, and filters, along with all associated metadata.
  • Data are stored as xarray objects, which house the data and metadata together; data are indexed by time.
  • Readers for some data types are included as plugins, namely
    • Z3D
    • NIMS BIN
    • USGS ASCII
    • LEMI
    • StationXML + miniSEED

Introduction

The goal of MTH5 is to provide a self-describing hierarchical data format for working with, sharing, and archiving magnetotelluric time series. MTH5 was developed cooperatively with community input and follows logically how magnetotelluric data are collected. This module provides open-source tools to interact with an MTH5 file.

The metadata follows the standards proposed by the IRIS-PASSCAL MT Software working group and documented in MT Metadata Standards. Note: if you would like to comment or contribute, check out Issues or Slack.

MTH5 Format

  • The basic format of MTH5 is illustrated below, where metadata is attached at each level.

MTH5 File Version 0.1.0

[Figure: MTH5 file structure, version 0.1.0]

MTH5 file version 0.1.0 was the original file version, with Survey as the highest level of the file. This has the limitation that only one Survey can be saved in a single file; to store multiple Surveys, a higher level, Experiment, needs to be added.

Important: Some MTH5 0.1.0 files have already been archived on ScienceBase; version 0.1.0 has been used as the working format for Aurora and is described here for reference. Moving forward, the new format is 0.2.0 as described below.

MTH5 File Version 0.2.0

[Figure: MTH5 file structure, version 0.2.0]

MTH5 file version 0.2.0 has Experiment as the top level. This allows multiple Surveys to be included in a single file and therefore allows more flexibility. For example, if you would like to remote-reference stations in a local survey against stations from a different survey collected at the same time, you can keep all of those surveys and stations in the same file, which makes processing easier.

Hint: MTH5 is comprehensively logged, so if any problems arise you can always check mth5_debug.log (if you are in debug mode; change the mode in mth5/__init__.py) and mth5_error.log, both of which are written to your current working directory.

Examples

Make a simple MTH5 (version 0.2.0) with two stations, two runs, two channels, one transfer function, and one set of Fourier coefficients:

from mth5.mth5 import MTH5

with MTH5() as mth5_object:
    mth5_object.open_mth5(r"/home/mt/example_mth5.h5", "a")

    # add a survey
    survey_group = mth5_object.add_survey("example")

    # add a station with metadata
    station_group = mth5_object.add_station("mt001", survey="example")
    station_group = survey_group.stations_group.add_station("mt002")
    station_group.metadata.location.latitude = "40:05:01"
    station_group.metadata.location.longitude = -122.3432
    station_group.metadata.location.elevation = 403.1
    station_group.metadata.acquired_by.author = "me"
    station_group.metadata.orientation.reference_frame = "geomagnetic"

    # IMPORTANT: Must always use the write_metadata method when metadata is updated.
    station_group.write_metadata()

    # add runs
    run_01 = mth5_object.add_run("mt002", "001", survey="example")
    run_02 = station_group.add_run("002")

    # add channels
    ex = mth5_object.add_channel("mt002", "001", "ex", "electric", None, survey="example")
    hy = run_01.add_channel("hy", "magnetic", None)
    
    # add transfer functions
    tf = station_group.transfer_functions_group.add_transfer_function("tf01")
    
    # add Fourier Coefficients
    fcs = station_group.fourier_coefficients_group.add_fc_group("fc01")

    print(mth5_object)

    /:
    ====================
        |- Group: Experiment
        --------------------
            |- Group: Reports
            -----------------
            |- Group: Standards
            -------------------
                --> Dataset: summary
                ......................
            |- Group: Surveys
            -----------------
                |- Group: example
                -----------------
                    |- Group: Filters
                    -----------------
                        |- Group: coefficient
                        ---------------------
                        |- Group: fap
                        -------------
                        |- Group: fir
                        -------------
                        |- Group: time_delay
                        --------------------
                        |- Group: zpk
                        -------------
                    |- Group: Reports
                    -----------------
                    |- Group: Standards
                    -------------------
                        --> Dataset: summary
                        ......................
                    |- Group: Stations
                    ------------------
                        |- Group: mt001
                        ---------------
                            |- Group: Fourier_Coefficients
                            ------------------------------
                            |- Group: Transfer_Functions
                            ----------------------------
                        |- Group: mt002
                        ---------------
                            |- Group: 001
                            -------------
                                --> Dataset: ex
                                .................
                                --> Dataset: hy
                                .................
                            |- Group: 002
                            -------------
                            |- Group: Fourier_Coefficients
                            ------------------------------
                                |- Group: fc01
                                --------------
                            |- Group: Transfer_Functions
                            ----------------------------
                                |- Group: tf01
                                --------------
            --> Dataset: channel_summary
            ..............................
            --> Dataset: tf_summary
            .........................
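To read the example back, here is a hedged sketch; the get_* accessors are assumed to mirror the add_* calls above (get_station, get_run, get_channel, and to_runts appear elsewhere in this document, while get_survey is an assumption):

from mth5.mth5 import MTH5

with MTH5() as m:
    m.open_mth5(r"/home/mt/example_mth5.h5", "r")

    # Assumed accessors mirroring the add_* calls above.
    survey_group = m.get_survey("example")
    station_group = survey_group.stations_group.get_station("mt002")
    run_group = station_group.get_run("001")
    ex = run_group.get_channel("ex")

    # Convert the run to an xarray-backed RunTS (see to_runts() usage below).
    run_ts = run_group.to_runts()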

Credits

This project is in cooperation with the Incorporated Research Institutions for Seismology, the U.S. Geological Survey, and other collaborators. Facilities of the IRIS Consortium are supported by the National Science Foundation’s Seismological Facilities for the Advancement of Geoscience (SAGE) Award under Cooperative Support Agreement EAR-1851048. USGS is partially funded through the Community for Data Integration and IMAGe through the Minerals Resources Program.


mth5's Issues

test_process_parkfield_run.py fails due to

The good news is that my modification to tests/parkfield/test_process_parkfield_run.py has eliminated the error where it could not find the config.json file. The bad news is that this test is running into an mth5 exception.

Below I paste the GitHub Actions output. This may actually be an mth5 issue, or it may be an error because the path in the config is not found, in which case it is the aurora-generated config that needs fixing.

@kkappler : TODO check whether the mth5 path in the config file is absolute, and if so, whether the h5 file is being generated on the server prior to the running of this test.

@kujaku11 : Don't worry about this issue until Karl reviews the pathing issue - this may be a properly identified exception because the path to the h5 is not well defined on the server.


============================= test session starts ==============================
platform linux -- Python 3.9.6, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /usr/share/miniconda/envs/aurora-test/bin/python
cachedir: .pytest_cache
rootdir: /home/runner/work/aurora/aurora
plugins: cov-2.12.1
collecting ... collected 21 items / 1 error / 20 selected

==================================== ERRORS ====================================
________ ERROR collecting tests/parkfield/test_process_parkfield_run.py ________
tests/parkfield/test_process_parkfield_run.py:7: in
tf_collection = process_mth5_run(processing_cfg, run_id, units="SI")
aurora/pipelines/process_mth5.py:216: in process_mth5_run
run_config, mth5_obj = initialize_pipeline(run_cfg)
aurora/pipelines/process_mth5.py:35: in initialize_pipeline
mth5_obj.open_mth5(config["mth5_path"], mode="r")
/usr/share/miniconda/envs/aurora-test/lib/python3.9/site-packages/mth5/mth5.py:499: in open_mth5
raise MTH5Error(msg)
E mth5.utils.exceptions.MTH5Error: Cannot open new file in mode r
------------------------------- Captured stderr --------------------------------
2021-08-28 00:34:23,940 [line 143] mt_metadata.utils.mttime.MTime.setup_logger - INFO: Logging file can be found /usr/share/miniconda/envs/aurora-test/lib/python3.9/site-packages/logs/mt_time.log
2021-08-28 00:34:23,941 [line 498] mth5.mth5.MTH5.open_mth5 - ERROR: Cannot open new file in mode r

Cannot Convert FDSN Channel to MTH5 Channel Back to FDSN Channel

FDSN channel:
networks = ['BK']
stations = ['PKD']
channels = {'BQ2':''}
starttime = '2004-09-28T00:00:00.000000'
endtime = '2004-09-28T01:59:59.974999'

is being collected through the IRIS web client, and its metadata is converted to an MTH5 framework using:

translator = XMLInventoryMTExperiment()
experiment = translator.xml_to_mt(inv)

When trying to convert the electric channel back to an FDSN channel code using fdsn_tools.make_channel_code(mth5_chan), the error below is thrown.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-a6645837fb05> in <module>
----> 1 MakeMTH5.make_mth5_from_fdsnclient(networks, stations, channels, starttime, endtime, path=None, client="IRIS")

~/Desktop/Projects/mth5/mth5/clients/make_mth5.py in make_mth5_from_fdsnclient(networks, stations, channels, starttime, endtime, path, client)
     97                     mth5_chan = m.stations_group.get_station(msta_id)\
     98                                  .get_run(mrun_id).get_channel(mchan_id)
---> 99                     print(fdsn_tools.make_channel_code(mth5_chan))
    100                     # Map the channel but mth5 and ph5
    101 

~/Desktop/Projects/mth5/mth5/utils/fdsn_tools.py in make_channel_code(channel_obj)
    166 
    167     period_code = get_period_code(channel_obj.sample_rate)
--> 168     sensor_code = get_measurement_code(channel_obj.type)
    169     if "z" in channel_obj.component.lower():
    170         orientation_code = get_orientation_code(

AttributeError: 'ElectricDataset' object has no attribute 'type'

Upon inspection, the electric channel does not seem to have a type:

Channel Electric:
-------------------
	component:        ey
	data type:        electric
	data format:      int32
	data shape:       (1,)
	start:            2003-09-12T18:54:00+00:00
	end:              2005-03-15T16:45:00+00:00
	sample rate:      40.0

Is this coming from how the channel is built, or because this is missing from the original StationXML?

Add a transfer function group

Should add a transfer functions group to the file, under experiment -> survey -> station -> transfer_functions.

There can be a group for each estimate, which can then be saved as multi-dimensional arrays.

This way an MTH5 can be a collection of transfer functions as well.

subclassing xarray discussion

RunTS has-a dataset, which is an xr.Dataset object. It would be good to discuss the pros and cons of extending xr.Dataset so that RunTS is-a xr.Dataset: what would the advantages be, and which aspects of the changeover would be difficult?

Jared, Joe, Karl, Lindsey to discuss when Jared returns from AK?

make_mth5 from atomic list of streams

Here is some rough pseudocode for a Requested Function:

This is a conceptual sketch of a method that we would want to expand on in the future.

There are some classes that could be used to do a lot of this.

Namely: aurora/aurora/sandbox/io_helpers/fdsn_dataset_config.py

def add_to_existing_mth5(row, mth5_obj):
    fdsn_dataset_config = FDSNDatasetConfig.from_df_row(row)
    inventory = fdsn_dataset_config.get_inventory()
    stream = fdsn_dataset_config.get_data_via_fdsn_client()

    # <VOODOO>
    add_metadata_to_mth5(mth5_obj, inventory)
    add_datastream_to_mth5(mth5_obj, stream)
    # </VOODOO>
    return



def make_mth5_from_dataframe(df, h5_path=None):
    """
    df: pandas.DataFrame
        Has the following columns: ["NETWORK", "STATION", "CHANNEL", "START TIME", "END TIME"]

    h5_path: pathlib.Path or string or None
        This is the path to the mth5 file that will get built by the function.

    Behaviour:
    The function iterates over each row of the dataframe and accesses the
    metadata and data associated with that row. The data and metadata are
    added to the mth5 object.

    This means that metadata from a new stream can be augmented to the mth5 "experiment".

    Test that the mth5 can be saved, and that the mth5 can be opened and all
    the data read back (maybe plotted as a check that everything is fine).

    After a first cut works, an obvious thing to do is merge the stream queries.
    """
    mth5_obj = initialize_mth5(h5_path)  # returns an mth5_obj and handles the
    # already-exists case
    for _, row in df.iterrows():  # iterrows yields (index, row) pairs
        add_to_existing_mth5(row, mth5_obj)
    mth5_obj.close()  # etc.
    return h5_path
    

Augment create_examle_mth5_file ipynb with example of data access

Add a little snippet, once the mth5 is created, that either calls an internal plotter or, perhaps better, just:

import matplotlib.pyplot as plt

time_series_data = command_to_extract_a_few_thousand_samples_from_some_run_channel()
plt.plot(time_series_data)
plt.show()

Setup GitActions

Set up GitHub Actions so that the workflow is comprehensive.
@lheagy Any chance you could take a look at this, or walk me through the process. Seems like there is a bunch of stuff to use.

channel number missing

When using run_group.get_channel() from an open MTH5 file, the channel number gets lost somehow and is set to None.

reading from zarr

Accessing data chokes on decompression for some zarr stores when scale-offset compression is used. Check which compression filters are supported by zarr.

Entry Point for Making MTH5 files based on User Inputs and Various Ingestion Protocols

There does not seem to be a callable class for building MTH5 files from user inputs and various ingestion protocols. We likely need functions similar to /mth5/examples/make_mth5_from_iris_dmc_local.py, /mth5/examples/make_mth5_from_iris_dmc.py, /mth5/examples/make_mth5_from_nims.py... but that are generalized, callable, and accept arguments.

This class could eventually be expanded to include other ingestion protocols for NetCDF and other file types.

Make NIMS reader robust

Need to make the NIMS reader robust. There is an issue with how gaps in the data are accounted for with GPS stamps. The code was translated from A. Kelbert's NIMSread.m MATLAB code and verified using Octave, but more testing should be done with different NIMS files.

When resizing a dataset the chunk size does not change

When adding a channel with no data, the initial dataset is set to a size of (1,), which can be extended with a max shape of (None,). The issue is that if you want to resize this dataset, the chunk size stays (1,); there is then a bunch of overhead and extra metadata for each stored chunk, which makes setting the data take a long time and bloats the HDF5 file.

There are a few possible ways around this (see the sketch after this list):

  • If the input metadata has a start time, end time, and sampling rate, make an array of zeros with the expected shape.
  • Set the chunk size manually to something that could be efficient.
  • Reset the chunk size somehow.
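To illustrate the second option, here is a minimal standalone h5py sketch (the file name, dataset name, and chunk size are assumptions, not MTH5 code):

import h5py
import numpy as np

with h5py.File("example.h5", "w") as f:
    # Instead of chunks=(1,), choose a chunk sized for the expected data
    # (here ~1 MB of float32) so resizing does not create millions of tiny chunks.
    ds = f.create_dataset(
        "ex",
        shape=(1,),
        maxshape=(None,),
        dtype="float32",
        chunks=(262144,),  # chunk shape is fixed at creation, so pick it up front
    )
    ds.resize((4096000,))
    ds[:] = np.random.rand(4096000).astype("float32")

Note that the HDF5 chunk shape cannot be changed after a dataset is created, which is why the third bullet (resetting the chunk size) would effectively require rewriting the dataset.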

Plot large time series efficiently

Would be nice to be able to plot the time series in its entirety if possible.

Possible solutions would be using something like Bokeh, Datashader, HoloViz.

FutureWarning when run_ts_obj accesses its time series

When I call:

run_ts_obj = run_obj.to_runts()

I get this warning:
/home/kkappler/software/irismt/mth5/mth5/timeseries/channel_ts.py:531: FutureWarning: Timestamp.freq is deprecated and will be removed in a future version
  if self._ts.coords.indexes["time"][0].freq is None:
/home/kkappler/software/irismt/mth5/mth5/timeseries/channel_ts.py:541: FutureWarning: Timestamp.freq is deprecated and will be removed in a future version
  sr = 1e9 / self._ts.coords.indexes["time"][0].freq.nanos

This is discussed here: dask/dask#7783

The issue is that the freq property will not be associated with a single element of a series of timestamps.

There are two solutions worth trying here:

  1. We could stop accessing an individual timestamp, i.e. replace

sr = 1e9 / self._ts.coords.indexes["time"][0].freq.nanos
with
sr = 1e9 / self._ts.coords.indexes["time"].freq.nanos

  2. If that doesn't solve the issue, then how about something like

sr = 1./(self._ts.coords.indexes["time"].diff().median())

or

sr = 1./(self._ts.coords.indexes["time"][1] - self._ts.coords.indexes["time"][0])

These assume a uniform sampling rate throughout the series, which we should confirm is a property of an mth5 time series. If that is not always the case, then there should be a boolean property, self.uniformly_sampled or similar, that we can check to make sure sample_rate is well defined (see the sketch below).
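As a minimal illustration of the index-difference approach (a standalone pandas/numpy sketch, with a hypothetical time index standing in for self._ts.coords.indexes["time"]):

import numpy as np
import pandas as pd

# Hypothetical uniform 10 Hz index standing in for the channel's time coordinate.
time_index = pd.date_range("2020-01-01T12:00:00", periods=4096, freq="100ms")

# Differences between consecutive timestamps, in nanoseconds.
deltas_ns = np.diff(time_index.values).astype("timedelta64[ns]").astype(np.int64)

# Median-based estimate; does not touch the deprecated Timestamp.freq.
sample_rate = 1e9 / np.median(deltas_ns)  # 10.0 Hz

# Check the uniform-sampling assumption before trusting sample_rate.
uniformly_sampled = np.unique(deltas_ns).size == 1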

make_mth5 example not packing time series into container

I tried running examples/notebooks/make_mth5_driver.ipynb and I do get an h5 file, but it is only 222 kB, which seems small.

After the notebook, I added the following commands in the cells:

cas04 = mth5_obj.stations_group.get_station("CAS04")
run_b = cas04.get_run('b')
ts = run_b.to_runts()
ts.dataset.max()
ts.dataset.min()

I wind up with max and min being zero everywhere. Not sure what is going on but we should have some data in this file.

Get only metadata from data center

There has been a request by Anna to get only the metadata from a request to a data center, to get data availability basically.

This shouldn't be too hard to implement; we just need to add something to MakeMTH5.

@timronan Do you have thoughts on this?

I was thinking of just adding a keyword to MakeMTH5.inv_from_df, like MakeMTH5.inv_from_df(df, client, data=False), to get only the metadata.

Do you know if there is a tool to convert an inventory to a data frame? That could be handy; it probably wouldn't be too hard to build if there is not.

Need a way to update summary tables when metadata is changed

Need a method to update summary tables when metadata are changed.

For instance, if you add a channel, the metadata associated with that channel is used to make a summary table entry. But if you then update the metadata in the channel, there is no mechanism to update the table entry. Need to make sure everything is consistent.

Could have objects that watch for updates and then update the entry on the fly; this would be optimal. Another way might be to validate all entries before the file is closed.
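A generic sketch of the watcher idea (illustrative Python only, not MTH5 code):

class WatchedMetadata:
    """Minimal observer: notify a summary table whenever a field changes."""

    def __init__(self, on_change):
        self._on_change = on_change
        self._fields = {}

    def set(self, key, value):
        self._fields[key] = value
        self._on_change(key, value)  # update the summary entry on the fly

def update_summary_entry(key, value):
    # Stand-in for rewriting the corresponding summary table row.
    print(f"summary table updated: {key} = {value}")

meta = WatchedMetadata(update_summary_entry)
meta.set("time_period.start", "2020-01-01T00:00:00")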

add sample_interval property

Priority: Low
Difficulty: Low
The same way we have a sample_rate property, it would be nice to have a sample_interval property that returns 1/sample_rate. It would make the code more readable. Also, we could consider aliasing it as "dt", maybe.
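A minimal sketch of what the property might look like (assuming the existing sample_rate property; the alias is optional):

class ChannelTS:
    # ... existing class body elided ...

    @property
    def sample_interval(self) -> float:
        """Seconds between samples; the reciprocal of sample_rate."""
        if self.sample_rate:
            return 1.0 / self.sample_rate
        return 0.0

    dt = sample_interval  # optional alias, per the suggestion above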

Add another group Experiment

Right now only one survey can exist per MTH5 file. It might be useful to add an extra layer so that multiple surveys can be in one file; that layer would be Experiment, matching how the metadata is set up.

Experiment
    |-> Survey
        |-> Station
            |-> Run
                |-> Channel

logger checking tzinfo every few milliseconds

Running aurora tests on GitHub Actions, I found the test report is completely dominated by tzinfo logging messages.

I count around 5000 lines of this message:

2021-08-28 21:36:10,105 [line 387] mt_metadata.utils.mttime.MTime.validate_tzinfo - INFO: Local timezone identified setting to UTC

It seems like some condition is being checked many times per second and logged.

Branch for adding helper functions?

@kujaku11 : I am going to port a few of the helper functions that are mth5 related out of aurora.

These methods will be standalone, so they should not break any tests. Any objection to me adding them to the master branch in mth5/utils/helpers.py?

Configure for PyPI and Conda

Need to make sure the configuration files for PyPI and Conda are all set up so users can install the package, and that all dependencies are accounted for.

Convert to Physical Units

It would be beneficial if MTH5 provided a tool to convert archived data that is in counts to physical units. This would mean applying any filters, either in the time domain or the frequency domain. We already have a channel response object; we just need to find an efficient way to apply the filters. The reverse, physical units to counts, should also be available for archiving.

In doing so the mth5.level would be updated to 2 instead of 1.
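As a rough frequency-domain sketch (assuming response is the complex channel response evaluated at the np.fft.rfftfreq frequencies; this is not the MTH5 API):

import numpy as np

def counts_to_physical(data_counts, response):
    # Remove the instrument filters by spectral division.
    spectrum = np.fft.rfft(data_counts)
    return np.fft.irfft(spectrum / response, n=data_counts.size)

def physical_to_counts(data_physical, response):
    # Inverse operation, for converting back to counts for archiving.
    spectrum = np.fft.rfft(data_physical)
    return np.fft.irfft(spectrum * response, n=data_physical.size)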

@kkappler, @lheagy, @timronan Do you have any thoughts on an efficient method?

Channel size estimate dtype error

Inside master_station_run_channel.py around line 1134 on branch make_mth5_51 the code was taking the difference between time_period.end and time_period.start.

This fails because they are both strings. Below is a snippet that fixes the issue.

REPLACE

estimate_size = (
    channel_metadata.time_period.end - channel_metadata.time_period.start
) * channel_metadata.sample_rate

which actually should return a tuple, WITH

if channel_metadata.time_period.start != channel_metadata.time_period.end:
    if channel_metadata.sample_rate > 0:
        end_time = UTCDateTime(channel_metadata.time_period.end)
        start_time = UTCDateTime(channel_metadata.time_period.start)
        delta_time_seconds = end_time - start_time
        n_samples_ish = delta_time_seconds * channel_metadata.sample_rate
        estimate_size = (int(n_samples_ish),)

N.B. A better solution is to add a "duration" method to channel_metadata, so that it returns the number of seconds, i.e.:

if channel_metadata.time_period.start != channel_metadata.time_period.end:
    if channel_metadata.sample_rate > 0:
        delta_time_seconds = channel_metadata.time_period.duration
        n_samples_ish = delta_time_seconds * channel_metadata.sample_rate
        estimate_size = (int(n_samples_ish),)
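A sketch of that duration property (assuming obspy's UTCDateTime, as in the snippet above; the TimePeriod class here is a stand-in, not the mt_metadata object):

from obspy import UTCDateTime

class TimePeriod:
    def __init__(self, start, end):
        self.start = start
        self.end = end

    @property
    def duration(self):
        # UTCDateTime subtraction returns elapsed seconds as a float.
        return UTCDateTime(self.end) - UTCDateTime(self.start)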

@timronan I think you were going to add a temporary fix on the make_mth5_51 branch

Make conversion of time series object to NetCDF robust

xarray natively supports NetCDF (now NetCDF4, which is based on HDF5). It would be nice to make ChannelTS.ts.to_netcdf and RunTS.dataset.to_netcdf robust so that other programs can use the output. Relatively low priority, but it would be nice to have.
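For reference, a minimal usage sketch with plain xarray (assuming run_ts is a RunTS; note that attrs set to None, as in the metadata examples later in this document, are not valid NetCDF attributes and raise on write, so they would need sanitizing first):

import xarray as xr

# Write the run's xr.Dataset to a NetCDF4 (HDF5-based) file.
run_ts.dataset.to_netcdf("run_001.nc", engine="netcdf4")

# Read it back with plain xarray, as another program would.
ds = xr.open_dataset("run_001.nc")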

Get slice when requesting a RunTS

When getting a RunTS from the data it would be nice if you could specify a time window.

Suggest adding start and end time to the arguments:

run_group.to_runts(start=None, end=None)

@kkappler I will try to add this functionality soon.

Incorrect metadata when pulling from IRIS

  • When pulling data from network 'EM' the channel metadata is not propagated to each run
  • The units for the data are incorrect, they should all be counts
  • All filters are marked as 'applied' even though the units are counts.

Fail to propagate time-period into channel summary

When reading in metadata to create a channel, the stations_group.summary_table does not populate with the appropriate date and time for time_period.start and time_period.end.

This occurs with mth5/examples/make_mth5_file_from_xml.py

Namely, this issue will occur any time metadata is updated after a channel is added to a run group. There needs to be functionality to update the stations_group.summary_table entry when metadata is updated; this should be done automatically and not left to the user, for consistency. Moreover, there should be a check when a file is closed to make sure that all the channel entries are consistent.

Build Data Frame that Indexes Acquisition Runs

There should be a function that outputs a pandas data frame containing the acquisition runs available in an MTH5 experiment. Each row should represent one run and should contain: Station ID, Run ID, List of Channels, Sample Rate, Start Time, End Time.

This function will verify that all channels listed per run have the same sample rate. If the sample rates within a run are inconsistent, then an error will be thrown.

There should be an optional argument that makes the output a data frame with one channel per row; the DF columns would then be: Station ID, Run ID, Channel, Sample Rate, Start Time, End Time.
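A hedged sketch of the per-run summary using pandas groupby (the per-channel records are hypothetical; in practice they would be gathered from the MTH5 file):

import pandas as pd

# Hypothetical pre-collected per-channel records.
channel_records = [
    {"station": "mt001", "run": "001", "channel": "ex", "sample_rate": 40.0,
     "start": "2020-01-01T00:00:00", "end": "2020-01-02T00:00:00"},
    {"station": "mt001", "run": "001", "channel": "hy", "sample_rate": 40.0,
     "start": "2020-01-01T00:00:00", "end": "2020-01-02T00:00:00"},
]
channel_df = pd.DataFrame(channel_records)

def summarize_runs(channel_df):
    # One row per run; raises if a run's channels disagree on sample rate.
    rows = []
    for (station, run), group in channel_df.groupby(["station", "run"]):
        if group["sample_rate"].nunique() != 1:
            raise ValueError(f"Inconsistent sample rates in {station}/{run}")
        rows.append({
            "station": station,
            "run": run,
            "channels": sorted(group["channel"]),
            "sample_rate": group["sample_rate"].iloc[0],
            "start": group["start"].min(),
            "end": group["end"].max(),
        })
    return pd.DataFrame(rows)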

mth5 filters not inheriting all properties from mt_metadata filters

Specifically: the FIR filter in mt_metadata, accessed via experiment.surveys[0].filters, has new properties that I am creating via the obspy mapping. In this case the property I am looking for is decimation_input_sample_rate.

However, when I access the filters via, for example:

hx = run.get_channel('hx')
hx.channel_response_filter.filters_list[3]

I do not have the attribute available; I get instead:

AttributeError: 'FIRFilter' object has no attribute 'decimation_input_sample_rate'

So we need to add this attribute. However, that is a one-off fix. The larger issue is that we need a way for mth5 to get a comprehensive list of attributes from mt_metadata; otherwise we will forever have this issue whenever new attrs are added (which could happen a few times in the next few months).

Ideally we would make a test in mt_metadata called something like test_can_make_list_of_filter_attributes(FIRFilter), which generates a list of attrs that we expect to be populated. The underlying list maker there would be used in mth5.

Let's discuss.
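For instance, a minimal sketch of such a list maker (generic Python introspection, not existing mt_metadata code):

def list_expected_attributes(filter_obj):
    # Public, non-callable attributes of a filter instance, e.g. for
    # comparing an mt_metadata FIRFilter against its mth5 counterpart.
    return sorted(
        name
        for name in dir(filter_obj)
        if not name.startswith("_") and not callable(getattr(filter_obj, name))
    )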

Revamp/update the documentation

Since the metadata has been split off into a new repository, we need to revamp the documentation to reflect that and just provide a link to the mt_metadata Read the Docs. And continue to update the documentation.

logging_config is not installed from pip or conda

The logging_config.yml is not included in the pip or conda installs. This causes an error in the load_logging_config function when mth5 is initialized.

Suggest adding the .yml to the manifest and adding a fallback: if the file is not there, use some default values (see the sketch below).
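A hedged sketch of that fallback (the real load_logging_config signature and config layout may differ):

import logging
import logging.config
from pathlib import Path

import yaml

def load_logging_config(config_path="logging_config.yml"):
    config_file = Path(config_path)
    if config_file.exists():
        with open(config_file) as fid:
            logging.config.dictConfig(yaml.safe_load(fid))
    else:
        # File missing from the pip/conda install: fall back to sane defaults.
        logging.basicConfig(level=logging.INFO)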

run group not inheriting station coordinates

Context: I am trying to add latitude to my synthetic station in aurora/tests/synthetic and am not able to add coordinates at the run level.

Not sure if a run should inherit coordinates from the station, but I would have naively expected it to.

Here is a code snippet, the output of which is:
17.006 ?= 0.00

from mth5.mth5 import MTH5
m = MTH5()
m.open_mth5("tmp.h5", mode="w")
station_group = m.add_station("test1")
station_group.metadata.location.latitude = 17.006
run_group = station_group.add_run("001")
print(station_group.metadata.location.latitude,"?=", run_group.station_group.metadata.location.latitude)

Make channel summary more efficient

Right now, making the channel summary takes a long time even for small MTH5 files. This is because it creates a group object each time, which validates the metadata and slows things down.

Suggest just looping through the HDF5 groups and datasets and getting the metadata directly from the attributes (see the sketch below).
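A minimal h5py sketch of the suggested direct traversal (standalone illustration with an assumed file name, not MTH5 code):

import h5py

rows = []

def collect(name, obj):
    # Read metadata straight from the HDF5 attributes, skipping group validation.
    if isinstance(obj, h5py.Dataset):
        rows.append({"hdf5_path": name, **dict(obj.attrs)})

with h5py.File("example_mth5.h5", "r") as fid:
    fid.visititems(collect)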

Validate metadata when input into xarray.attrs

@lheagy , @kkappler , @domfournier, @kujaku11 Need help figuring out an efficient way to validate any metadata that is input into xarray.DataArray.attrs, which is the container for the data.

For example, say we have a channel of 1-D array magnetic data. The container is mth5.timeseries.ChannelTS, and it will have appropriate metadata attached to it. Here is a simple example:

import numpy as np
from mth5.timeseries import ChannelTS

hx = ChannelTS("magnetic")

# sample rate is a property which uses hx.metadata.sample_rate under the hood
hx.sample_rate = 10

# start is a property which uses hx.metadata.time_period.start under the hood
hx.start = "2020-01-01T12:00:00"

# Set the data to random numbers
hx.ts = np.random.rand(4096)

# Print a summary of the channel
hx
Out[13]:
Channel Summary:
Station: None
Run: None
Channel Type: magnetic
Component: None
Sample Rate: 10.0
Start: 2020-01-01T12:00:00+00:00
End: 2020-01-01T12:06:49.500000+00:00
N Samples: 4096

# Print what the metadata looks like
hx.metadata
Out[14]:
{
"magnetic": {
"channel_number": null,
"component": null,
"data_quality.rating.value": 0,
"filter.applied": [
false
],
"filter.name": [
"none"
],
"location.elevation": 0.0,
"location.latitude": 0.0,
"location.longitude": 0.0,
"measurement_azimuth": 0.0,
"measurement_tilt": 0.0,
"sample_rate": 10.0,
"sensor.id": null,
"sensor.manufacturer": null,
"sensor.type": null,
"time_period.end": "2020-01-01T12:06:49.500000+00:00",
"time_period.start": "2020-01-01T12:00:00+00:00",
"type": "magnetic",
"units": null
}
}

# Print what the xarray metadata looks like
hx.ts.attrs
Out[15]:
{'channel_number': None,
'component': None,
'data_quality.rating.value': 0,
'filter.applied': [False],
'filter.name': ['none'],
'location.elevation': 0.0,
'location.latitude': 0.0,
'location.longitude': 0.0,
'measurement_azimuth': 0.0,
'measurement_tilt': 0.0,
'sample_rate': 10.0,
'sensor.id': None,
'sensor.manufacturer': None,
'sensor.type': None,
'time_period.end': '2020-01-01T12:06:49.500000+00:00',
'time_period.start': '2020-01-01T12:00:00+00:00',
'type': 'magnetic',
'units': None}

The current method to update the xarray attrs when a user changes some metadata is with a manual function call:

hx.metadata.sensor.id = 4096
hx.metadata.measurement_azimuth = "90"

hx.metadata
Out[19]:
{
"magnetic": {
"channel_number": null,
"component": null,
"data_quality.rating.value": 0,
"filter.applied": [
false
],
"filter.name": [
"none"
],
"location.elevation": 0.0,
"location.latitude": 0.0,
"location.longitude": 0.0,
"measurement_azimuth": 90.0,
"measurement_tilt": 0.0,
"sample_rate": 10.0,
"sensor.id": "4096",
"sensor.manufacturer": null,
"sensor.type": null,
"time_period.end": "2020-01-01T12:06:49.500000+00:00",
"time_period.start": "2020-01-01T12:00:00+00:00",
"type": "magnetic",
"units": null
}
}

# Update xarray attrs manually
hx.update_xarray_metadata()
hx.ts.attrs
Out [21]:
{'channel_number': None,
'component': None,
'data_quality.rating.value': 0,
'filter.applied': [False],
'filter.name': ['none'],
'location.elevation': 0.0,
'location.latitude': 0.0,
'location.longitude': 0.0,
'measurement_azimuth': 90.0,
'measurement_tilt': 0.0,
'sample_rate': 10.0,
'sensor.id': '4096',
'sensor.manufacturer': None,
'sensor.type': None,
'time_period.end': '2020-01-01T12:06:49.500000+00:00',
'time_period.start': '2020-01-01T12:00:00+00:00',
'type': 'magnetic',
'units': None}

MTH5 to miniseed and StationXML

Need to develop a tool to output miniSEED data and a StationXML for archiving at IRIS. The tools are there; we just need to assemble them into a function (see the sketch below).
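A rough sketch of the assembly (the make_* helpers are hypothetical stand-ins for the existing conversion tools; the write calls are standard obspy):

# Hypothetical conversion step from an open MTH5 object.
stream = make_stream_from_mth5(mth5_obj)        # obspy Stream
inventory = make_inventory_from_mth5(mth5_obj)  # obspy Inventory

stream.write("mt001.mseed", format="MSEED")
inventory.write("mt001_stationxml.xml", format="STATIONXML")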

add setup.py to mth5_test_data repo

Once there is a setup.py, we can make this package (and the data) accessible by adding the following to, say, util.py:


import inspect
import os
import mth5_test_data

MTH5_TEST_DATA_DIR = os.path.dirname(inspect.getfile(mth5_test_data))
#may need one more os.path.dirname() on there, and could cast to Path() as well


mt_code nomenclature definitions

ex, ey, hx, hy vs e1, e2, h1, h2

Inside mth5/mth5/utils/fdsn_tools.py there are rules for assigning an "mt_code" to channels.
This is called when we make channels from obspy in mth5/mth5/timeseries/channel_ts.py
in the from_obspy_trace() method.

Specifically,

mt_code = fdsn_tools.make_mt_channel(
    fdsn_tools.read_channel_code(obspy_trace.stats.channel)
)

is changing ex to e1.

This comes from fdsn tools.

Let's review these conventions. I had thought that ex, ey had to do more with local strike than cardinal direction.

def make_mt_channel(code_dict, angle_tol=15):
    """
    :param code_dict: DESCRIPTION
    :type code_dict: TYPE
    :return: DESCRIPTION
    :rtype: TYPE
    """

    mt_comp = mt_code_dict[code_dict["component"]]

    if not code_dict["vertical"]:
        if (
            code_dict["orientation"]["min"] >= 0
            and code_dict["orientation"]["max"] <= angle_tol
        ):
            mt_dir = "x"
        elif (
            code_dict["orientation"]["min"] >= angle_tol
            and code_dict["orientation"]["max"] <= 45
        ):
            mt_dir = "1"
        if (
            code_dict["orientation"]["min"] >= (90 - angle_tol)
            and code_dict["orientation"]["max"] <= 90
        ):
            mt_dir = "y"
        elif code_dict["orientation"]["min"] >= 45 and code_dict["orientation"][
            "max"
        ] <= (90 - angle_tol):
            mt_dir = "2"
