
Implement NASA flight data loader#696

Open
ThreeMonth03 wants to merge 1 commit into solvcon:master from ThreeMonth03:data_processing

Conversation

@ThreeMonth03
Collaborator

I implemented the data loader for the NASA flight dataset to support the future work in issue #685. The following command demonstrates how to get data from the iterator.

(base) threemonth03@threemonthwin:~/Downloads/modmesh$ python3 modmesh/track/load.py
Loaded 120791 events.
Event 0:
  timestamp=1.60259601021904e+18
  source=ground_truth
  data=GroundTruthData(con_pos_in_ecef=array([-1387897.36558835, -5268929.31643981,  3306577.65409484]), con_vel_in_ecef=array([ 0.00220006, -0.0081095 ,  0.00414972]), quat_con_ecef=array([-0.45840088, -0.1767584 ,  0.51147502,  0.70499532]))

Event 1:
  timestamp=1.60259601022904e+18
  source=ground_truth
  data=GroundTruthData(con_pos_in_ecef=array([-1387897.36564722, -5268929.31643027,  3306577.65410693]), con_vel_in_ecef=array([ 0.00197762, -0.00815516,  0.00414352]), quat_con_ecef=array([-0.45840118, -0.17675819,  0.51147449,  0.70499557]))

Event 2:
  timestamp=1.6025960102293e+18
  source=imu
  data=IMUData(dvel_in_imu=array([-0.18792725, -0.00048828, -0.04785156]), dangle_in_imu=array([-1.90734863e-06,  3.81469727e-06,  1.90734863e-06]))

Event 3:
  timestamp=1.60259601023904e+18
  source=ground_truth
  data=GroundTruthData(con_pos_in_ecef=array([-1387897.36570593, -5268929.31642279,  3306577.65411988]), con_vel_in_ecef=array([ 0.00190321, -0.00814863,  0.00428433]), quat_con_ecef=array([-0.45840138, -0.17675799,  0.51147416,  0.70499573]))

Event 4:
  timestamp=1.60259601024904e+18
  source=ground_truth
  data=GroundTruthData(con_pos_in_ecef=array([-1387897.36576642, -5268929.31641798,  3306577.65413258]), con_vel_in_ecef=array([ 0.00201493, -0.00820102,  0.00423451]), quat_con_ecef=array([-0.45840157, -0.17675801,  0.51147413,  0.70499562]))

Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

@yungyuc Please review this pull request when you are available. Thanks.

Comment on lines +33 to +58
@dataclass
class GroundTruthData:
    """
    Data structure for ground truth data.
    """
    con_pos_in_ecef: np.ndarray
    con_vel_in_ecef: np.ndarray
    quat_con_ecef: np.ndarray


@dataclass
class IMUData:
    """
    Data structure for IMU data.
    """
    dvel_in_imu: np.ndarray
    dangle_in_imu: np.ndarray


@dataclass
class LidarData:
    """
    Data structure for Lidar data.
    """
    range_m: np.ndarray
    doppler_speed_mps: np.ndarray
Collaborator Author

Data structures for the different sensors.

Comment on lines +61 to +69
@dataclass(order=True)
class SensorEvent:
    """
    A generic sensor event with a timestamp,
    source identifier, and data payload.
    """
    timestamp: float
    source: str = field(compare=False)
    data: Any = field(compare=False)
Collaborator Author

A generic sensor event with a timestamp,
source identifier, and data payload.

Comment on lines +99 to +108
def load(self):
    """
    Load imu, lidar, and ground-truth data,
    then sort events by timestamp.
    """
    self.load_imu()
    self.load_lidar()
    self.load_ground_truth()
    self.events.sort()

Collaborator Author

This function loads data from the different source files, then sorts the events by timestamp.

Comment on lines +221 to +232
def main():
    dataset = FlightDataset()
    dataset.load()
    print(f"Loaded {len(dataset)} events.")
    it = dataset.iter_events()
    for i in range(5):
        event = next(it)
        print(f"Event {i}:")
        print(f"  timestamp={event.timestamp}")
        print(f"  source={event.source}")
        print(f"  data={event.data}")
        print()
Collaborator Author

A simple example of loading the dataset through the iterator.

Member

Why do you ever need to have this CLI in the first place? It looks like just an example. If you only need an example, why don't you just create unit tests?

Member

@yungyuc yungyuc left a comment

There was already a script track/dataset.py, and this PR adds a new one. They are not integrated, but they should be. It is also questionable why you need a CLI for load.py in this PR.

Points to address:

  • Clarify why the CLI is ever needed.
  • Add unit tests. The code does not look friendly to testing. You may need to think about how to restructure it for clean testing.
  • Do not directly import module content.
  • Clarify if it is better to make FlightDataset an iterator.
  • Name mangling is not necessary. Use _load_text() instead.

The code quality is not very good. We may need more discussions.

from dataclasses import dataclass, field
import numpy as np
from pathlib import Path
from typing import Any, Iterable, Optional
Member

Do not directly import module content. That is, do not from ... import .... Use only import ... for all modules and entities.

import typing, and then consume the contents like typing.Any, instead of from typing import Any; Any. The latter removes the module name at the consuming site.

dataset = FlightDataset()
dataset.load()
print(f"Loaded {len(dataset)} events.")
it = dataset.iter_events()
Member

What is preventing you from using enumerate()?

for i, event in enumerate(dataset):
    if i >= 5:
        break

Making dataset an iterator is more straightforward.

Collaborator Author

There are around 130,000 event data points; I'm not sure it is a good idea to load them without an iterator.

Member

enumerate() uses the iterator protocol as well.

data: Any = field(compare=False)


class FlightDataset:
Member

It is not a good idea for high-performance computing to rely on looping in Python. As a prototype, it's fine. But eventually you will need to get this thing into C++ for speed and saving memory.

"""
return self.events[idx]

def iter_events(self, sources: Optional[str | Iterable[str]] = None):
Member

Typing information in Python will get in your way when moving the code into C++.

Let Python be dynamic. If you need typing information, it is a sign that the code needs to be in C++ soon.

Iterate over events of one or more sources.

:param sources: Event source(s). If None, iterate over all events.
Sources can be a single source string or an iterable of source strings.
Member

Sphinx docstring format is off.

"""
if sources is None:
yield from self.events
return
Member

Why do you return after yield? Is it ever reaching the return?

Load imu data.
"""
data = self.__load_text(self.imu_csv)
for row in data:
Member

May be very slow compared to C++. I would guess 20-100x.

)
self.events.append(event)

def __load_text(self, path):
Member

Name mangling is not necessary. Use _load_text() instead.

@yungyuc yungyuc changed the title Implement NASA flight data loader. Implement NASA flight data loader Mar 18, 2026
@yungyuc
Member

yungyuc commented Mar 20, 2026

@ThreeMonth03 update?

@ThreeMonth03 ThreeMonth03 force-pushed the data_processing branch 3 times, most recently from 0ef7fc3 to 022c53d Compare March 20, 2026 16:10
@yungyuc yungyuc added the array Multi-dimensional array implementation label Mar 21, 2026
@yungyuc yungyuc moved this to In Progress in tabular data processing Mar 21, 2026
@ThreeMonth03 ThreeMonth03 force-pushed the data_processing branch 4 times, most recently from fd28592 to 88d49f0 Compare March 22, 2026 05:31
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

@yungyuc Please review this pull request.

Comment on lines +197 to +330

class TimeSeriesDataFrameTC(unittest.TestCase):

    col_sol = ['DELTA_VEL[1]', 'DELTA_VEL[2]', 'DELTA_VEL[3]']
    col_sol2 = ['DELTA_VEL[1]', 'DELTA_VEL[2]', 'EPOCH', 'DELTA_VEL[3]']

    dlc_data = """EPOCH ,DELTA_VEL[1] ,DELTA_VEL[2] ,DELTA_VEL[3]
1.6025960102293e+18,-0.18792724609375,-0.00048828125,-0.0478515625
1.60259601024931e+18,-0.1903076171875,-0.0009765625,-0.0489501953125
1.60259601026931e+18,-0.18743896484375,0.0006103515625,-0.0498046875
1.60259601028932e+18,-0.18927001953125,-0.0009765625,-0.04840087890625
1.60259601030931e+18,-0.188720703125,-0.00103759765625,-0.0504150390625
1.60259601032931e+18,-0.18951416015625,-0.000732421875,-0.0489501953125
1.60259601034931e+18,-0.18902587890625,-0.000732421875,-0.0489501953125
1.6025960103693e+18,-0.1895751953125,-0.00128173828125,-0.04925537109375
1.60259601038931e+18,-0.18841552734375,6.103515625e-05,-0.0489501953125
1.60259601040931e+18,-0.1884765625,-0.00042724609375,-0.04840087890625
"""
    modified_dlc_data = """DELTA_VEL[1] ,DELTA_VEL[2] ,EPOCH ,DELTA_VEL[3]
-0.18792724609375,-0.00048828125,1602596010229299968,-0.0478515625
-0.1903076171875,-0.0009765625,1602596010249309952,-0.0489501953125
-0.18743896484375,0.0006103515625,1602596010269309952,-0.0498046875
-0.18927001953125,-0.0009765625,1602596010289319936,-0.04840087890625
-0.188720703125,-0.00103759765625,1602596010309309952,-0.0504150390625
-0.18951416015625,-0.000732421875,1602596010329309952,-0.0489501953125
-0.18902587890625,-0.000732421875,1602596010349309952,-0.0489501953125
-0.1895751953125,-0.00128173828125,1602596010369299968,-0.04925537109375
-0.18841552734375,6.103515625e-05,1602596010389309952,-0.0489501953125
-0.1884765625,-0.00042724609375,1602596010409309952,-0.04840087890625
"""
    unsorted_dlc_data = """EPOCH ,DELTA_VEL[1] ,DELTA_VEL[2] ,DELTA_VEL[3]
1.60259601024931e+18,-0.1903076171875,-0.0009765625,-0.0489501953125
1.60259601034931e+18,-0.18902587890625,-0.000732421875,-0.0489501953125
1.60259601040931e+18,-0.1884765625,-0.00042724609375,-0.04840087890625
1.60259601032931e+18,-0.18951416015625,-0.000732421875,-0.0489501953125
1.6025960102293e+18,-0.18792724609375,-0.00048828125,-0.0478515625
1.6025960103693e+18,-0.1895751953125,-0.00128173828125,-0.04925537109375
1.60259601030931e+18,-0.188720703125,-0.00103759765625,-0.0504150390625
1.60259601026931e+18,-0.18743896484375,0.0006103515625,-0.0498046875
1.60259601028932e+18,-0.18927001953125,-0.0009765625,-0.04840087890625
1.60259601038931e+18,-0.18841552734375,6.103515625e-05,-0.0489501953125
"""

    def test_read_from_text_file_basic(self):
        tsdf = TimeSeriesDataFrame()

        tsdf.read_from_text_file(io.StringIO(self.dlc_data))
        self.assertEqual(tsdf._columns, self.col_sol)
        self.assertEqual(len(tsdf._columns), 3)
        for i in range(len(tsdf._columns)):
            self.assertEqual(tsdf._data[i].ndarray.shape[0], 10)
        self.assertEqual(tsdf._index_name, 'EPOCH')

        tsdf.read_from_text_file(
            io.StringIO(self.modified_dlc_data),
            delimiter=',',
            timestamp_column='EPOCH'
        )

        self.assertEqual(tsdf._columns, self.col_sol)
        self.assertEqual(len(tsdf._columns), 3)
        for i in range(len(tsdf._columns)):
            self.assertEqual(tsdf._data[i].ndarray.shape[0], 10)
        self.assertEqual(tsdf._index_name, 'EPOCH')

        tsdf.read_from_text_file(
            io.StringIO(self.modified_dlc_data),
            delimiter=',',
            timestamp_in_file=False
        )
        self.assertEqual(tsdf._columns, self.col_sol2)
        self.assertEqual(len(tsdf._columns), 4)
        for i in range(len(tsdf._columns)):
            self.assertEqual(tsdf._data[i].ndarray.shape[0], 10)
        self.assertEqual(tsdf._index_name, 'Index')

    def test_dataframe_attribute_columns(self):
        tsdf = TimeSeriesDataFrame()
        tsdf.read_from_text_file(io.StringIO(self.dlc_data))
        self.assertEqual(tsdf.columns, self.col_sol)

    def test_dataframe_attribute_shape(self):
        tsdf = TimeSeriesDataFrame()
        tsdf.read_from_text_file(io.StringIO(self.dlc_data))
        self.assertEqual(tsdf.shape, (10, 3))

    def test_dataframe_attribute_index(self):
        tsdf = TimeSeriesDataFrame()
        tsdf.read_from_text_file(io.StringIO(self.dlc_data))

        nd_arr = np.genfromtxt(io.StringIO(self.dlc_data), delimiter=',')[1:]

        self.assertEqual(
            list(tsdf.index), list(nd_arr[:, 0].astype(np.uint64))
        )

    def test_dataframe_get_column(self):
        tsdf = TimeSeriesDataFrame()
        tsdf.read_from_text_file(io.StringIO(self.dlc_data))

        col_data = tsdf['DELTA_VEL[1]']

        nd_arr = np.genfromtxt(io.StringIO(self.dlc_data), delimiter=',')[1:]

        self.assertEqual(list(col_data), list(nd_arr[:, 1]))

    def test_dataframe_sort(self):
        tsdf = TimeSeriesDataFrame()
        tsdf.read_from_text_file(io.StringIO(self.unsorted_dlc_data))

        # Test out-of-place sort
        reordered_tsdf = tsdf.sort(tsdf.columns, index_column=None,
                                   inplace=False)
        col_data = reordered_tsdf['DELTA_VEL[1]']
        nd_arr = np.genfromtxt(io.StringIO(self.dlc_data), delimiter=',')[1:]
        self.assertEqual(list(col_data), list(nd_arr[:, 1]))

        # Test inplace sort_by_index
        tsdf.sort_by_index()
        col_data = tsdf['DELTA_VEL[1]']
        nd_arr = np.genfromtxt(io.StringIO(self.dlc_data), delimiter=',')[1:]
        self.assertEqual(list(col_data), list(nd_arr[:, 1]))

        # Test out-of-place sort with index_column
        tsdf.read_from_text_file(io.StringIO(self.unsorted_dlc_data),
                                 timestamp_in_file=False)

        reordered_tsdf = tsdf.sort(['EPOCH', 'DELTA_VEL[1]'],
                                   index_column='EPOCH', inplace=False)
        col_data = reordered_tsdf['DELTA_VEL[1]']
        nd_arr = np.genfromtxt(io.StringIO(self.dlc_data), delimiter=',')[1:]
        self.assertEqual(list(col_data), list(nd_arr[:, 1]))

# vim: set ff=unix fenc=utf8 et sw=4 ts=4 sts=4:
Collaborator Author

Moved the unit tests from tests/test_timeseries_dataframe.py to tests/test_track.py.

Member

Moving the original test file is refactoring. Do not mix refactoring with adding features in the same PR. Revert the change to the original test file.

Add new testing code in a new file.

Comment on lines +27 to 38

import os
import contextlib
from io import StringIO
import unittest

import numpy as np

from modmesh import SimpleArrayUint64, SimpleArrayFloat64

__all__ = ['TimeSeriesDataFrame']


class TimeSeriesDataFrame(object):

Collaborator Author

Moved tests/test_timeseries_dataframe.py to modmesh/track/timeseries_dataframe.py.

Comment on lines +128 to +135
def load(self):
    """
    Load all source datasets and build the timestamp timeline.
    """
    self.dataframes["imu"] = self._load_dataframe(self.imu_csv)
    self.dataframes["lidar"] = self._load_dataframe(self.lidar_csv)
    self.dataframes["ground_truth"] = self._load_dataframe(self.gt_csv)
    self._rebuild_timeline()
Collaborator Author

Load data with the timeseries_dataframe module.

Comment on lines +154 to +175
def _rebuild_timeline(self):
    """
    Rebuild the timestamp timeline from loaded source dataframes.
    """
    timeline_map: dict[int, list[EventReference]] = {}
    for source, dataframe in self.dataframes.items():
        for row_index, timestamp in enumerate(dataframe.index):
            timestamp = int(timestamp)
            timeline_map.setdefault(timestamp, []).append(
                EventReference(
                    dataset=self,
                    timestamp=timestamp,
                    source=source,
                    row=row_index,
                )
            )

    self.events = [
        ref
        for timestamp in sorted(timeline_map)
        for ref in timeline_map[timestamp]
    ]
Collaborator Author

Build a sorted list to record the order of timestamps.

Comment on lines +112 to +120
@property
def data(self):
    """
    Return a lazy view of the original row data.

    :return: Lazy row view backed by the source dataframe.
    :rtype: _EventDataView
    """
    return _EventDataView(self.dataset.dataframes[self.source], self.row)
Collaborator Author

Get data by _EventDataView.

@ThreeMonth03 ThreeMonth03 requested a review from yungyuc March 22, 2026 05:32
@yungyuc
Member

yungyuc commented Mar 22, 2026

@yungyuc Please review this pull request.

I am not seeing your annotations to explain how you addressed the points I made. Please help my review by leaving annotations where you address the points.

Without your annotations, it will take a lot of time to review. It should not take an unreasonable amount of time for the author to explain how the points are addressed with annotations.

There was already a script track/dataset.py, and this PR adds a new one. They are not integrated, but they should be. It is also questionable why you need a CLI for load.py in this PR.

Points to address:

  • Clarify why the CLI is ever needed.
  • Add unit tests. The code does not look friendly to testing. You may need to think about how to restructure it for clean testing.
  • Do not directly import module content.
  • Clarify if it is better to make FlightDataset an iterator.
  • Name mangling is not necessary. Use _load_text() instead.

The code quality is not very good. We may need more discussions.

Additional point:

  • Rename the overly-long module name timeseries_dataframe to dataframe.

import dataclasses
import pathlib

from .timeseries_dataframe import TimeSeriesDataFrame
Member

Do not import module contents. Import the module itself and use the contents from the module, like import timeseries_dataframe; timeseries_dataframe.TimeSeriesDataFrame.

Member

timeseries_dataframe is too long a module name. Name it as dataframe.

@ThreeMonth03 ThreeMonth03 force-pushed the data_processing branch 3 times, most recently from 74aaa9e to 6b5c6e3 Compare March 22, 2026 10:13
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

  • Clarify why the CLI is ever needed.
    We don't need the CLI because the usage of the dataset is demonstrated in the unit tests.
  • Add unit tests. The code does not look like friendly to testing. You may need to think about how to restructure to make pretty testing.
  • Do not directly import module content.
  • Clarify if it is better to make FlightDataset an iterator.
    No, the data is not large, so I implemented the __getitem__ method instead.
  • Name mangling is not necessary. Use _load_text() instead.
  • Rename the overly-long module name timeseries_dataframe to dataframe.

@yungyuc Please review this PR. Thanks.

Comment on lines +39 to +53
def _write_csv(self, directory, name, content):
    path = pathlib.Path(directory) / name
    path.write_text(content, encoding="utf-8")
    return str(path)

def test_load_flight_dataset(self):
    with tempfile.TemporaryDirectory() as tmpdir:
        imu_data = (
            "TIME_NANOSECONDS_TAI ,DATA_DELTA_VEL[1] ,"
            "DATA_DELTA_VEL[2] ,DATA_DELTA_VEL[3] ,"
            "DATA_DELTA_ANGLE[1] ,DATA_DELTA_ANGLE[2] ,"
            "DATA_DELTA_ANGLE[3]\n"
            "30,3.0,3.1,3.2,30.0,30.1,30.2\n"
            "10,1.0,1.1,1.2,10.0,10.1,10.2\n"
        )
Collaborator Author

Added a unit test for the data loader.

Comment on lines +31 to +34
import dataclasses
import json
import ssl
import pathlib
Collaborator Author

Import the module rather than its contents.

Comment on lines +250 to +265
def _load_dataframe(self, path):
    """
    Load one CSV file into a time-series dataframe.

    :param path: Path to a source CSV file.
    :type path: pathlib.Path or str
    :return: Loaded dataframe for the source file.
    :rtype: DataFrame
    """
    tsdf = DataFrame()
    tsdf.read_from_text_file(
        path,
        delimiter=",",
        timestamp_column="TIME_NANOSECONDS_TAI",
    )
    return tsdf
Collaborator Author

Remove dunder naming.


class TimeSeriesDataFrame(object):

class DataFrame(object):
Collaborator Author

Rename the long module.

Member

@yungyuc yungyuc left a comment

Please pay attention to the review comments. I believe I already mentioned the following point twice in this PR, and you still got it wrong (this is the third time):

  • Use relative import for importing modules in the project.

This PR is to add new feature. Do not include refactoring work.

  • Add new testing code in a new file.
  • I do not see _load_text(). Clarify where it is.

import zipfile

from pathlib import Path
from modmesh.track.dataframe import DataFrame
Member

I think it's the third time I am making the comment in this PR: do not directly import module content. Import the module, and call the module content.

And you should use relative import for importing modules in the project.

If you are not familiar with PEP-8, get familiar with it.


Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

  • Use relative import for importing modules in the project.
  • Add new testing code in a new file.
  • I do not see _load_text(). Clarify where it is.
    It has been removed because NasaDataset.load() now loads data via modmesh.track.dataframe, which was developed by Zong-han.

@yungyuc Please review this PR again. Thanks.

import zipfile

from pathlib import Path
from . import dataframe
Collaborator Author

Use a relative import to import the module.

import tempfile
import unittest

import modmesh.track.dataset as dataset
Collaborator Author

Avoid importing module content, and add the new unit test in a new file.

Member

@yungyuc yungyuc Mar 22, 2026

This is bad Python code. Change it to the idiomatic Python relative-style import:

from modmesh.track import dataset

Comment on lines -34 to -46
from modmesh import SimpleArrayUint64, SimpleArrayFloat64


class TimeSeriesDataFrame(object):

    def __init__(self):
        self._init_members()

    def _init_members(self):
        self._columns = list()
        self._index_data = None
        self._index_name = None
        self._data = list()
Collaborator Author

Move dataframe to modmesh/track/dataframe.py.

@ThreeMonth03 ThreeMonth03 requested a review from yungyuc March 22, 2026 13:23
Member

@yungyuc yungyuc left a comment

I found myself repeating the same points and have lost count of them. @ThreeMonth03, I hope I will not need to repeat them another time.

  • Clarify if it is better to make FlightDataset an iterator.
  • Name mangling is not necessary. Use _load_text() instead.
  • I do not see _load_text(). Clarify where it is.

Additional point to address:

  • Use the idiomatic relative import: from modmesh.track import dataset

import tempfile
import unittest

import modmesh.track.dataset as dataset
Member

@yungyuc yungyuc Mar 22, 2026


This is bad Python code. Change it to the idiomatic Python relative import:

from modmesh.track import dataset

Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment


@yungyuc I think I have fixed the import statements this time, and I have opened a new issue about refactoring the import statements in this project; see #701.

import tempfile
import unittest

from modmesh.track import dataset
Collaborator Author


Import the module with a Pythonic statement.

True)
"""
return self.sort(columns=columns, index_column=None, inplace=inplace)
from modmesh.track import dataframe
Collaborator Author


Import the module with a Pythonic statement.

@ThreeMonth03 ThreeMonth03 requested a review from yungyuc March 22, 2026 15:00
@yungyuc
Member

yungyuc commented Mar 24, 2026

I found myself repeating the same points and have lost count of them. @ThreeMonth03, I hope I will not need to repeat them another time.

  • Clarify if it is better to make FlightDataset an iterator.
  • Name mangling is not necessary. Use _load_text() instead.
  • I do not see _load_text(). Clarify where it is.

@ThreeMonth03 you have not addressed the above points or made them clear with inline annotations. Please read my points carefully and address them. I really hope that I do not need to repeat them.

Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment


  • Clarify if it is better to make FlightDataset an iterator.
  • Name mangling is not necessary. Use _load_text() instead.
  • I do not see _load_text(). Clarify where it is.
    @yungyuc I think I have further explained the points in the inline annotations this time.

Comment on lines +292 to +319
def __iter__(self):
"""
Iterate over events in chronological order.

:return: Iterator over timestamp-ordered event references.
:rtype: iter[EventReference]
"""
return iter(self.events)

def __len__(self):
"""
Return the number of events in the timeline.

:return: Number of loaded events.
:rtype: int
"""
return len(self.events)

def __getitem__(self, idx):
"""
Return the event reference at ``idx``.

:param idx: Event index.
:type idx: int
:return: Event reference at the specified index.
:rtype: EventReference
"""
return self.events[idx]
Collaborator Author


It is not better to make FlightDataset an iterator. Sometimes we want to take rows of data based on a series of unsorted indexes, which an iterator does not suit because iterators are poor at random access.
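For illustration, a minimal sketch (with a hypothetical `Timeline` class and made-up event data) of why the sequence-style design keeps random access cheap while still supporting ordered iteration:

```python
class Timeline:
    """Sequence-style dataset: events stored in a timestamp-ordered list."""

    def __init__(self, events):
        self.events = events  # list ordered by timestamp

    def __len__(self):
        return len(self.events)

    def __getitem__(self, idx):
        return self.events[idx]


tl = Timeline(["ev0", "ev1", "ev2", "ev3"])

# Random access with unsorted indexes: cheap on a sequence, but an
# iterator would have to be consumed from the front every time.
picked = [tl[i] for i in (3, 0, 2)]
print(picked)  # ['ev3', 'ev0', 'ev2']

# Ordered iteration still works: iter() falls back to the legacy
# sequence protocol when only __getitem__ is defined.
print(list(tl))  # ['ev0', 'ev1', 'ev2', 'ev3']
```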

Member


Is it correct that you renamed FlightDataset to NasaDataset? Correct me if I am wrong. (If you did, it is something that should have been annotated when the code review was requested.)

I am not sure you understand what I meant to make NasaDataset to be an iterator: https://docs.python.org/3/glossary.html#term-iterator ?

Comment on lines +237 to +264
def load(self):
"""
Load all source datasets and build the timestamp timeline.

:return: None
:rtype: None
"""
self.dataframes["imu"] = self._load_dataframe(self.imu_csv)
self.dataframes["lidar"] = self._load_dataframe(self.lidar_csv)
self.dataframes["ground_truth"] = self._load_dataframe(self.gt_csv)
self._rebuild_timeline()

def _load_dataframe(self, path):
"""
Load one CSV file into a time-series dataframe.

:param path: Path to a source CSV file.
:type path: pathlib.Path or str
:return: Loaded dataframe for the source file.
:rtype: DataFrame
"""
tsdf = dataframe.DataFrame()
tsdf.read_from_text_file(
path,
delimiter=",",
timestamp_column="TIME_NANOSECONDS_TAI",
)
return tsdf
Collaborator Author

@ThreeMonth03 ThreeMonth03 Mar 24, 2026


I removed __load_text() in this pull request because I replaced __load_text() with read_from_text_file() to load tabular data, which was developed by Zong-han. Furthermore, I have avoided name mangling when creating member functions.
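For reference, a minimal sketch of the name-mangling behavior being avoided here: inside a class body, Python rewrites a double-underscore name such as `__load_text` to `_ClassName__load_text`, whereas a single leading underscore is only a naming convention (the `Loader` class below is hypothetical):

```python
class Loader:
    def __load_text(self):   # name-mangled to _Loader__load_text
        return "mangled"

    def _load_text(self):    # single underscore: convention only
        return "plain"


obj = Loader()
print(obj._load_text())           # 'plain'
print(obj._Loader__load_text())   # the mangled name must be spelled out
# obj.__load_text() would raise AttributeError outside the class body,
# which is why mangling complicates subclassing and testing.
```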

Member


I see. The explanation is clear. It was not clear from the code, and you have rewritten the history.

While the code is still being discussed, keeping the history helps track the changes a little bit. In the end the changesets should be reorganized, so it is not a problem to rebase and squash early. You, the author, are responsible for helping make the review go smoothly.

Member

@yungyuc yungyuc left a comment


It's good that we are able to move the review forward again. Points to address:

  • Use a one-liner when you can fit a variable within an 80-char line.
  • Clarify the design for NasaDataset to support iterator.

self.download_dir = Path.cwd() / ".cache" / "download"
self.download_dir = pathlib.Path.cwd() / ".cache" / "download"
self.filename = filename
self.csv_dir = (
Member


Use a one-liner when you can fit a variable within an 80-char line.

Do not abuse multi-line constructs. The abuse makes code harder to read.

There is a very easy idiom to shorten the 16 lines between 159-176 to 5:

        _ = self.download_dir = pathlib.Path.cwd() / ".cache" / "download"
        self.csv_dir = _ / "Flight1_Catered_Dataset-20201013" / "Data"
        self.imu_csv = self.csv_dir / "dlc.csv"
        self.lidar_csv = self.csv_dir / "commercial_lidar.csv"
        self.gt_csv = self.csv_dir / "truth.csv"

for ref in timeline_map[timestamp]
]

def __iter__(self):
Member


You already provided __iter__() in NasaDataset. Why not also provide __next__()?

IIRC, Python automatically provides a default pair of __iter__() and __next__() when you implement both __len__() and __getitem__() taking an integer. If you rely on that automatically provided iterator protocol implementation, you do not need to provide a custom __iter__().

That is, having __iter__() without __next__() looks like a mistake. What do you really want to do here?
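As a side note, a minimal sketch of the two protocols under discussion (hypothetical classes; as far as I know, the legacy fallback needs only __getitem__ accepting integer indexes from zero, while __next__ is required only when the object is its own iterator):

```python
class SeqStyle:
    """Legacy sequence protocol: iter() falls back to __getitem__ with
    integer indexes starting at 0, stopping on IndexError."""

    def __init__(self, events):
        self.events = events

    def __getitem__(self, idx):
        return self.events[idx]


class IterStyle:
    """Iterator protocol: the object is its own iterator, so it must
    provide both __iter__() and __next__()."""

    def __init__(self, events):
        self.events = events
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self.events):
            raise StopIteration
        item = self.events[self._pos]
        self._pos += 1
        return item


print(list(SeqStyle([1, 2, 3])))   # [1, 2, 3]
print(list(IterStyle([1, 2, 3])))  # [1, 2, 3]
```

A third option is an iterable container whose __iter__() returns a separate iterator (for example, iter(self.events)); that pattern needs no __next__() on the container itself.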


Labels

array Multi-dimensional array implementation

Projects

Status: In Progress


2 participants