Quick Start

This guide will get you up and running with gwframe in just a few minutes.

Installation

pip install gwframe

Reading GWF Files

Basic Reading

import gwframe

# Read a single channel
data = gwframe.read('data.gwf', 'L1:STRAIN')

# Access the data and metadata
print(f"Channel: {data.name}")
print(f"Sample rate: {data.sample_rate} Hz")
print(f"Duration: {data.duration} seconds")
print(f"Data shape: {data.array.shape}")

The read() function returns a TimeSeries object with:

  • array: NumPy array containing the data
  • name: Channel name
  • dtype: NumPy dtype of the samples (mirrors array.dtype)
  • start: Start time (GPS seconds)
  • dt: Sample spacing (seconds)
  • duration: Total duration (seconds)
  • sample_rate: Sampling rate (Hz)
  • unit: Physical unit
  • type: Channel type ('proc', 'adc', or 'sim')
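
The timing attributes are related in the usual way for uniformly sampled data. The sketch below assumes dt = 1 / sample_rate and duration = number of samples × dt, which the attribute descriptions suggest but this guide does not state outright. `sample_times` is a hypothetical helper for illustration, not part of gwframe.

```python
def sample_times(start, sample_rate, n_samples):
    """GPS time of each sample, assuming uniform spacing dt = 1 / sample_rate."""
    dt = 1.0 / sample_rate
    return [start + i * dt for i in range(n_samples)]


# One second of data at 4 Hz starting at GPS 1234567890
start, sample_rate, n_samples = 1234567890.0, 4, 4
times = sample_times(start, sample_rate, n_samples)
duration = n_samples / sample_rate  # seconds covered by the samples

print(times)
print(duration)
```

Under these assumptions the last sample sits one dt before start + duration, so back-to-back series do not share a sample.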

Reading Multiple Channels

# Read all channels
all_data = gwframe.read('data.gwf', channels=None)
for name, ts in all_data.items():
    print(f"{name}: {len(ts.array)} samples")

# Read specific channels
channels = ['L1:STRAIN', 'L1:AUX-CHANNEL']
data_dict = gwframe.read('data.gwf', channels)

Time-Based Slicing

# Read data for a specific time range
data = gwframe.read(
    'multi_frame.gwf',
    'L1:STRAIN',
    start=1234567890.0,  # GPS start time
    end=1234567900.0     # GPS end time
)

gwframe automatically finds, reads, and stitches together all frames that overlap the requested time range.
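
To make "overlapping" concrete, here is a minimal sketch of selecting frames by time range. It is an illustration only, not gwframe's implementation: `overlapping_frames` is a hypothetical helper, frames are represented as plain (start, duration) tuples, and the half-open interval convention is an assumption, since this guide does not say how boundaries are treated.

```python
def overlapping_frames(frames, start, end):
    """Return frames whose [f_start, f_start + f_dur) span intersects [start, end)."""
    return [
        (f_start, f_dur)
        for f_start, f_dur in frames
        if f_start < end and f_start + f_dur > start
    ]


# Five consecutive 4-second frames starting at GPS 1234567884
frames = [(1234567884.0 + 4 * i, 4.0) for i in range(5)]
selected = overlapping_frames(frames, start=1234567890.0, end=1234567900.0)

print(selected)
```

Here the request starts partway through the second frame, so a reader following this scheme would also need to trim the first and last selected frames to the requested range before stitching.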

Reading from Memory

# Read from file-like object
with open('data.gwf', 'rb') as f:
    data = gwframe.read(f, 'L1:STRAIN')

# Read from bytes
with open('data.gwf', 'rb') as f:
    gwf_bytes = f.read()
data = gwframe.read_bytes(gwf_bytes, 'L1:STRAIN')

Writing GWF Files

Simple Write

import numpy as np
import gwframe

# Generate one second of a 10 Hz sine wave sampled at 16384 Hz
# (arange keeps the sample spacing at exactly 1/16384 s)
t = np.arange(16384) / 16384
data = np.sin(2 * np.pi * 10 * t)

# Write to file
gwframe.write(
    'output.gwf',
    data,
    start=1234567890.0,  # GPS start time
    sample_rate=16384,   # Hz
    name='L1:TEST',
    unit='strain'
)

Writing Multiple Frames

The key feature of gwframe is efficient multi-frame writing:

with gwframe.FrameWriter('output.gwf') as writer:
    for i in range(100):
        data = np.random.randn(16384)
        writer.write(
            data,
            start=1234567890.0 + i,
            sample_rate=16384,
            name='L1:TEST'
        )

If a context manager doesn't fit your workflow, use open() and close() directly:

writer = gwframe.FrameWriter('output.gwf')
writer.open()
for i in range(100):
    data = np.random.randn(16384)
    writer.write(
        data,
        start=1234567890.0 + i,
        sample_rate=16384,
        name='L1:TEST'
    )
writer.close()
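
With the manual pattern, an exception raised inside the loop would skip close(). Wrapping the writes in try/finally avoids that. The sketch below uses a hypothetical stand-in class with the same open/write/close shape, rather than gwframe.FrameWriter itself, so that it runs without a real output file.

```python
class StandInWriter:
    """Hypothetical stand-in with the same open/write/close shape as the writer above."""

    def __init__(self):
        self.is_open = False
        self.frames_written = 0

    def open(self):
        self.is_open = True

    def write(self, data):
        if data is None:
            raise ValueError("no data for this frame")
        self.frames_written += 1

    def close(self):
        self.is_open = False


writer = StandInWriter()
writer.open()
try:
    # The third chunk is None and triggers an error partway through
    for chunk in ([0.0, 1.0], [2.0, 3.0], None):
        writer.write(chunk)
except ValueError as exc:
    print(f"write failed: {exc}")
finally:
    writer.close()  # runs whether or not write() raised

print(writer.frames_written, writer.is_open)
```

The same try/finally structure should carry over to the real writer, which is what the context-manager form provides automatically.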

Writing Multiple Channels

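# Placeholder data for the two channels
strain_data = np.random.randn(16384)
aux_data = np.random.randn(16384)

# Single frame with multiple channels
gwframe.write(
    'output.gwf',
    channels={
        'L1:STRAIN': strain_data,
        'L1:AUX': aux_data
    },
    start=1234567890.0,
    sample_rate=16384,
    name='L1'
)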

Advanced Frame Creation

For more control, use the Frame class:

# Create frame
frame = gwframe.Frame(
    start=1234567890.0,
    duration=1.0,
    name='L1',
    run=1
)

# Add channels
frame.add_channel(
    'L1:STRAIN',
    strain_data,
    sample_rate=16384,
    unit='strain',
    comment='Calibrated strain'
)

# Add metadata
frame.add_history('CREATOR', 'my_pipeline')
frame.add_history('VERSION', '1.0.0')

# Write frame
frame.write('output.gwf')

Inspecting GWF Files

# Get file information
info = gwframe.get_info('data.gwf')
print(f"Number of frames: {info.num_frames}")
for frame in info.frames:
    print(f"Frame {frame.index}: {frame.name} at GPS {frame.start}, duration {frame.duration}s")

# Get available channels
channels = gwframe.get_channels('data.gwf')
for channel in channels:
    print(channel)

Masked Arrays and Invalid Data

ADC channels in GWF files can carry a data-valid flag indicating the entire channel contains suspect data. By default, reading such channels raises an InvalidDataError:

import gwframe

# Raises InvalidDataError if the channel is flagged invalid
data = gwframe.read('data.gwf', 'H1:ADC-CHANNEL')

To read the data anyway, pass allow_invalid=True. The returned TimeSeries then holds a NumPy masked array in its array attribute, with every sample masked:

data = gwframe.read('data.gwf', 'H1:ADC-CHANNEL', allow_invalid=True)
print(type(data.array))  # numpy.ma.MaskedArray

When writing masked arrays, the behavior depends on the channel type:

  • ADC channels: The channel-level data-valid flag is set. Per-sample mask detail is lost (the entire channel is flagged invalid).
  • Proc/sim channels: The mask is discarded (these channel types have no data-valid field in the frame format).
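
The loss of per-sample detail can be illustrated with NumPy alone. The sketch below collapses a per-sample mask into a single channel-level flag. It assumes that any masked sample is enough to flag the channel, which this guide does not spell out, and it is not gwframe's own code.

```python
import numpy as np

samples = np.array([0.1, 0.2, 0.3, 0.4])
masked = np.ma.MaskedArray(samples, mask=[False, True, False, False])

# Per-sample view: exactly which samples are bad
per_sample_mask = np.ma.getmaskarray(masked)
n_masked = int(per_sample_mask.sum())

# Channel-level view: a single yes/no flag (assumed rule: any masked sample)
channel_invalid = bool(per_sample_mask.any())

print(channel_invalid)
print(n_masked, masked.size)
```

Once only the single flag is stored, a reader can no longer tell that three of the four samples were good.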

Use on_mask_loss to control what happens when mask information is lost:

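import numpy as np

# Placeholder samples and a per-sample quality mask (True = bad sample)
samples = np.random.randn(16384)
quality_mask = np.zeros(16384, dtype=bool)
quality_mask[:1024] = True

masked = np.ma.MaskedArray(samples, mask=quality_mask)
frame = gwframe.Frame(start=1234567890.0, duration=1.0, name='H1')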

# Default: warns when mask info is lost
frame.add_channel('H1:TEST', masked, sample_rate=16384,
                  channel_type='proc')

# Raise an error instead
frame.add_channel('H1:TEST', masked, sample_rate=16384,
                  channel_type='proc', on_mask_loss='raise')

# Silently discard the mask
frame.add_channel('H1:TEST', masked, sample_rate=16384,
                  channel_type='proc', on_mask_loss='ignore')

Data Validation

Enable CRC checksum validation for data integrity:

# Validate checksums when reading
data = gwframe.read(
    'data.gwf',
    'L1:STRAIN',
    validate_checksum=True
)
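
As background on what a checksum check buys you, the sketch below shows a CRC catching a single flipped bit. It uses the generic CRC-32 from Python's zlib module, which is not necessarily the CRC variant the frame format uses, and the payload is an arbitrary byte string rather than real frame data.

```python
import zlib

payload = b"example frame payload"
stored_crc = zlib.crc32(payload)  # checksum recorded when the data was written

# Flip one bit in the first byte to simulate corruption
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]

print(zlib.crc32(payload) == stored_crc)
print(zlib.crc32(corrupted) == stored_crc)
```

The intact payload reproduces the stored checksum, while the corrupted copy does not.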

Next Steps