
Implement NASA flight data loader#696

Open
ThreeMonth03 wants to merge 1 commit into solvcon:master from ThreeMonth03:data_processing

Conversation

@ThreeMonth03
Collaborator

I implemented the data loader for the NASA flight dataset to support the future work in issue #685. The following command demonstrates how to get data from the iterator.

(base) threemonth03@threemonthwin:~/Downloads/modmesh$ python3 modmesh/track/load.py
Loaded 120791 events.
Event 0:
  timestamp=1.60259601021904e+18
  source=ground_truth
  data=GroundTruthData(con_pos_in_ecef=array([-1387897.36558835, -5268929.31643981,  3306577.65409484]), con_vel_in_ecef=array([ 0.00220006, -0.0081095 ,  0.00414972]), quat_con_ecef=array([-0.45840088, -0.1767584 ,  0.51147502,  0.70499532]))

Event 1:
  timestamp=1.60259601022904e+18
  source=ground_truth
  data=GroundTruthData(con_pos_in_ecef=array([-1387897.36564722, -5268929.31643027,  3306577.65410693]), con_vel_in_ecef=array([ 0.00197762, -0.00815516,  0.00414352]), quat_con_ecef=array([-0.45840118, -0.17675819,  0.51147449,  0.70499557]))

Event 2:
  timestamp=1.6025960102293e+18
  source=imu
  data=IMUData(dvel_in_imu=array([-0.18792725, -0.00048828, -0.04785156]), dangle_in_imu=array([-1.90734863e-06,  3.81469727e-06,  1.90734863e-06]))

Event 3:
  timestamp=1.60259601023904e+18
  source=ground_truth
  data=GroundTruthData(con_pos_in_ecef=array([-1387897.36570593, -5268929.31642279,  3306577.65411988]), con_vel_in_ecef=array([ 0.00190321, -0.00814863,  0.00428433]), quat_con_ecef=array([-0.45840138, -0.17675799,  0.51147416,  0.70499573]))

Event 4:
  timestamp=1.60259601024904e+18
  source=ground_truth
  data=GroundTruthData(con_pos_in_ecef=array([-1387897.36576642, -5268929.31641798,  3306577.65413258]), con_vel_in_ecef=array([ 0.00201493, -0.00820102,  0.00423451]), quat_con_ecef=array([-0.45840157, -0.17675801,  0.51147413,  0.70499562]))

Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

@yungyuc Please review this pull request when you are available. Thanks.

Comment on lines +33 to +58
@dataclass
class GroundTruthData:
    """
    Data structure for ground truth data.
    """
    con_pos_in_ecef: np.ndarray
    con_vel_in_ecef: np.ndarray
    quat_con_ecef: np.ndarray


@dataclass
class IMUData:
    """
    Data structure for IMU data.
    """
    dvel_in_imu: np.ndarray
    dangle_in_imu: np.ndarray


@dataclass
class LidarData:
    """
    Data structure for Lidar data.
    """
    range_m: np.ndarray
    doppler_speed_mps: np.ndarray
Collaborator Author

Data structures for the different sensors.

Comment on lines +61 to +69
@dataclass(order=True)
class SensorEvent:
    """
    A generic sensor event with a timestamp,
    source identifier, and data payload.
    """
    timestamp: float
    source: str = field(compare=False)
    data: Any = field(compare=False)
Collaborator Author

A generic sensor event with a timestamp,
source identifier, and data payload.

Comment on lines +99 to +108
def load(self):
    """
    Load imu, lidar, and ground-truth data,
    then sort events by timestamp.
    """
    self.load_imu()
    self.load_lidar()
    self.load_ground_truth()
    self.events.sort()

Collaborator Author

This function loads data from the different source files, then sorts the events by timestamp.

Comment on lines +221 to +232
def main():
    dataset = FlightDataset()
    dataset.load()
    print(f"Loaded {len(dataset)} events.")
    it = dataset.iter_events()
    for i in range(5):
        event = next(it)
        print(f"Event {i}:")
        print(f"  timestamp={event.timestamp}")
        print(f"  source={event.source}")
        print(f"  data={event.data}")
        print()
Collaborator Author

A simple example of loading the dataset through the iterator.

Member

Why do you ever need to have this CLI in the first place? It looks like just an example. If you only need an example, why don't you just create unit tests?

Member

@yungyuc yungyuc left a comment

There was already a script track/dataset.py, and this PR adds a new one. They are not integrated, but they should be. It is also questionable why you need a CLI for load.py in this PR.

Points to address:

  • Clarify why the CLI is ever needed.
  • Add unit tests. The code does not look friendly to testing. You may need to think about how to restructure it for clean testing.
  • Do not directly import module content.
  • Clarify if it is better to make FlightDataset an iterator.
  • Name mangling is not necessary. Use _load_text() instead.

The code quality is not very good. We may need more discussions.

from dataclasses import dataclass, field
import numpy as np
from pathlib import Path
from typing import Any, Iterable, Optional
Member

Do not directly import module content. That is, do not from ... import .... Use only import ... for all modules and entities.

import typing, and then consume the contents like typing.Any, instead of from typing import Any; Any. The latter removes the module name at the consuming site.

dataset = FlightDataset()
dataset.load()
print(f"Loaded {len(dataset)} events.")
it = dataset.iter_events()
Member

What is preventing you from using enumerate()?

for i, event in enumerate(dataset):
    if i >= 5:
        break

Making dataset an iterator is more straightforward.

Collaborator Author

There are around 130,000 event data points; I'm not sure it is a good idea to load them without an iterator.

Member

enumerate() uses the iterator protocol as well.

data: Any = field(compare=False)


class FlightDataset:
Member

It is not a good idea for high-performance computing to rely on looping in Python. As a prototype, it's fine. But eventually you will need to get this thing into C++ for speed and saving memory.

"""
return self.events[idx]

def iter_events(self, sources: Optional[str | Iterable[str]] = None):
Member

Typing information in Python will get in your way when moving the code into C++.

Let Python be dynamic. If you need typing information, it is a sign that the code needs to be in C++ soon.

Iterate over events of one or more sources.

:param sources: Event source(s). If None, iterate over all events.
Sources can be a single source string or an iterable of source strings.
Member

Sphinx docstring format is off.

"""
if sources is None:
yield from self.events
return
Member

Why do you return after yield? Is it ever reaching the return?

Load imu data.
"""
data = self.__load_text(self.imu_csv)
for row in data:
Member

May be very slow compared to C++. I would guess 20-100x.

)
self.events.append(event)

def __load_text(self, path):
Member

Name mangling is not necessary. Use _load_text() instead.

@yungyuc yungyuc changed the title Implement NASA flight data loader. Implement NASA flight data loader Mar 18, 2026
@yungyuc
Member

yungyuc commented Mar 20, 2026

@ThreeMonth03 update?

@ThreeMonth03 ThreeMonth03 force-pushed the data_processing branch 3 times, most recently from 0ef7fc3 to 022c53d Compare March 20, 2026 16:10
@yungyuc yungyuc added the array Multi-dimensional array implementation label Mar 21, 2026
@yungyuc yungyuc moved this to In Progress in tabular data processing Mar 21, 2026
@ThreeMonth03 ThreeMonth03 force-pushed the data_processing branch 4 times, most recently from fd28592 to 88d49f0 Compare March 22, 2026 05:31
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

@yungyuc Please review this pull request.

Comment on lines +197 to +330

class TimeSeriesDataFrameTC(unittest.TestCase):

    col_sol = ['DELTA_VEL[1]', 'DELTA_VEL[2]', 'DELTA_VEL[3]']
    col_sol2 = ['DELTA_VEL[1]', 'DELTA_VEL[2]', 'EPOCH', 'DELTA_VEL[3]']

    dlc_data = """EPOCH ,DELTA_VEL[1] ,DELTA_VEL[2] ,DELTA_VEL[3]
1.6025960102293e+18,-0.18792724609375,-0.00048828125,-0.0478515625
1.60259601024931e+18,-0.1903076171875,-0.0009765625,-0.0489501953125
1.60259601026931e+18,-0.18743896484375,0.0006103515625,-0.0498046875
1.60259601028932e+18,-0.18927001953125,-0.0009765625,-0.04840087890625
1.60259601030931e+18,-0.188720703125,-0.00103759765625,-0.0504150390625
1.60259601032931e+18,-0.18951416015625,-0.000732421875,-0.0489501953125
1.60259601034931e+18,-0.18902587890625,-0.000732421875,-0.0489501953125
1.6025960103693e+18,-0.1895751953125,-0.00128173828125,-0.04925537109375
1.60259601038931e+18,-0.18841552734375,6.103515625e-05,-0.0489501953125
1.60259601040931e+18,-0.1884765625,-0.00042724609375,-0.04840087890625
"""
    modified_dlc_data = """DELTA_VEL[1] ,DELTA_VEL[2] ,EPOCH ,DELTA_VEL[3]
-0.18792724609375,-0.00048828125,1602596010229299968,-0.0478515625
-0.1903076171875,-0.0009765625,1602596010249309952,-0.0489501953125
-0.18743896484375,0.0006103515625,1602596010269309952,-0.0498046875
-0.18927001953125,-0.0009765625,1602596010289319936,-0.04840087890625
-0.188720703125,-0.00103759765625,1602596010309309952,-0.0504150390625
-0.18951416015625,-0.000732421875,1602596010329309952,-0.0489501953125
-0.18902587890625,-0.000732421875,1602596010349309952,-0.0489501953125
-0.1895751953125,-0.00128173828125,1602596010369299968,-0.04925537109375
-0.18841552734375,6.103515625e-05,1602596010389309952,-0.0489501953125
-0.1884765625,-0.00042724609375,1602596010409309952,-0.04840087890625
"""
    unsorted_dlc_data = """EPOCH ,DELTA_VEL[1] ,DELTA_VEL[2] ,DELTA_VEL[3]
1.60259601024931e+18,-0.1903076171875,-0.0009765625,-0.0489501953125
1.60259601034931e+18,-0.18902587890625,-0.000732421875,-0.0489501953125
1.60259601040931e+18,-0.1884765625,-0.00042724609375,-0.04840087890625
1.60259601032931e+18,-0.18951416015625,-0.000732421875,-0.0489501953125
1.6025960102293e+18,-0.18792724609375,-0.00048828125,-0.0478515625
1.6025960103693e+18,-0.1895751953125,-0.00128173828125,-0.04925537109375
1.60259601030931e+18,-0.188720703125,-0.00103759765625,-0.0504150390625
1.60259601026931e+18,-0.18743896484375,0.0006103515625,-0.0498046875
1.60259601028932e+18,-0.18927001953125,-0.0009765625,-0.04840087890625
1.60259601038931e+18,-0.18841552734375,6.103515625e-05,-0.0489501953125
"""

    def test_read_from_text_file_basic(self):
        tsdf = TimeSeriesDataFrame()

        tsdf.read_from_text_file(io.StringIO(self.dlc_data))
        self.assertEqual(tsdf._columns, self.col_sol)
        self.assertEqual(len(tsdf._columns), 3)
        for i in range(len(tsdf._columns)):
            self.assertEqual(tsdf._data[i].ndarray.shape[0], 10)
        self.assertEqual(tsdf._index_name, 'EPOCH')

        tsdf.read_from_text_file(
            io.StringIO(self.modified_dlc_data),
            delimiter=',',
            timestamp_column='EPOCH'
        )

        self.assertEqual(tsdf._columns, self.col_sol)
        self.assertEqual(len(tsdf._columns), 3)
        for i in range(len(tsdf._columns)):
            self.assertEqual(tsdf._data[i].ndarray.shape[0], 10)
        self.assertEqual(tsdf._index_name, 'EPOCH')

        tsdf.read_from_text_file(
            io.StringIO(self.modified_dlc_data),
            delimiter=',',
            timestamp_in_file=False
        )
        self.assertEqual(tsdf._columns, self.col_sol2)
        self.assertEqual(len(tsdf._columns), 4)
        for i in range(len(tsdf._columns)):
            self.assertEqual(tsdf._data[i].ndarray.shape[0], 10)
        self.assertEqual(tsdf._index_name, 'Index')

    def test_dataframe_attribute_columns(self):
        tsdf = TimeSeriesDataFrame()
        tsdf.read_from_text_file(io.StringIO(self.dlc_data))
        self.assertEqual(tsdf.columns, self.col_sol)

    def test_dataframe_attribute_shape(self):
        tsdf = TimeSeriesDataFrame()
        tsdf.read_from_text_file(io.StringIO(self.dlc_data))
        self.assertEqual(tsdf.shape, (10, 3))

    def test_dataframe_attribute_index(self):
        tsdf = TimeSeriesDataFrame()
        tsdf.read_from_text_file(io.StringIO(self.dlc_data))

        nd_arr = np.genfromtxt(io.StringIO(self.dlc_data), delimiter=',')[1:]

        self.assertEqual(
            list(tsdf.index), list(nd_arr[:, 0].astype(np.uint64))
        )

    def test_dataframe_get_column(self):
        tsdf = TimeSeriesDataFrame()
        tsdf.read_from_text_file(io.StringIO(self.dlc_data))

        col_data = tsdf['DELTA_VEL[1]']

        nd_arr = np.genfromtxt(io.StringIO(self.dlc_data), delimiter=',')[1:]

        self.assertEqual(list(col_data), list(nd_arr[:, 1]))

    def test_dataframe_sort(self):
        tsdf = TimeSeriesDataFrame()
        tsdf.read_from_text_file(io.StringIO(self.unsorted_dlc_data))

        # Test out-of-place sort
        reordered_tsdf = tsdf.sort(tsdf.columns, index_column=None,
                                   inplace=False)
        col_data = reordered_tsdf['DELTA_VEL[1]']
        nd_arr = np.genfromtxt(io.StringIO(self.dlc_data), delimiter=',')[1:]
        self.assertEqual(list(col_data), list(nd_arr[:, 1]))

        # Test inplace sort_by_index
        tsdf.sort_by_index()
        col_data = tsdf['DELTA_VEL[1]']
        nd_arr = np.genfromtxt(io.StringIO(self.dlc_data), delimiter=',')[1:]
        self.assertEqual(list(col_data), list(nd_arr[:, 1]))

        # Test out-of-place sort with index_column
        tsdf.read_from_text_file(io.StringIO(self.unsorted_dlc_data),
                                 timestamp_in_file=False)

        reordered_tsdf = tsdf.sort(['EPOCH', 'DELTA_VEL[1]'],
                                   index_column='EPOCH', inplace=False)
        col_data = reordered_tsdf['DELTA_VEL[1]']
        nd_arr = np.genfromtxt(io.StringIO(self.dlc_data), delimiter=',')[1:]
        self.assertEqual(list(col_data), list(nd_arr[:, 1]))

# vim: set ff=unix fenc=utf8 et sw=4 ts=4 sts=4:
Collaborator Author

Moved the unit tests from tests/test_timeseries_dataframe.py to tests/test_track.py.

Member

Moving the original test file is refactoring. Do not mix refactoring with adding features in the same PR. Revert the change to the original test file.

Add new testing code in a new file.

Comment on lines +27 to 38

import os
import contextlib
from io import StringIO
import unittest

import numpy as np

from modmesh import SimpleArrayUint64, SimpleArrayFloat64

__all__ = ['TimeSeriesDataFrame']


class TimeSeriesDataFrame(object):

Collaborator Author

Moved tests/test_timeseries_dataframe.py to modmesh/track/timeseries_dataframe.py.

Comment on lines +128 to +135
def load(self):
    """
    Load all source datasets and build the timestamp timeline.
    """
    self.dataframes["imu"] = self._load_dataframe(self.imu_csv)
    self.dataframes["lidar"] = self._load_dataframe(self.lidar_csv)
    self.dataframes["ground_truth"] = self._load_dataframe(self.gt_csv)
    self._rebuild_timeline()
Collaborator Author

Load data with the timeseries_dataframe module.

Comment on lines +154 to +175
def _rebuild_timeline(self):
    """
    Rebuild the timestamp timeline from loaded source dataframes.
    """
    timeline_map: dict[int, list[EventReference]] = {}
    for source, dataframe in self.dataframes.items():
        for row_index, timestamp in enumerate(dataframe.index):
            timestamp = int(timestamp)
            timeline_map.setdefault(timestamp, []).append(
                EventReference(
                    dataset=self,
                    timestamp=timestamp,
                    source=source,
                    row=row_index,
                )
            )

    self.events = [
        ref
        for timestamp in sorted(timeline_map)
        for ref in timeline_map[timestamp]
    ]
Collaborator Author

Build a sorted list to record the order of timestamps.

Comment on lines +112 to +120
@property
def data(self):
    """
    Return a lazy view of the original row data.

    :return: Lazy row view backed by the source dataframe.
    :rtype: _EventDataView
    """
    return _EventDataView(self.dataset.dataframes[self.source], self.row)
Collaborator Author

Get data by _EventDataView.

@ThreeMonth03 ThreeMonth03 requested a review from yungyuc March 22, 2026 05:32
@yungyuc
Member

yungyuc commented Mar 22, 2026

@yungyuc Please review this pull request.

I am not seeing your annotations to explain how you addressed the points I made. Please help my review by leaving annotations where you address the points.

Without your annotations, it will take a lot of time to review. It should not take an unreasonable amount of time for the author to explain how the points are addressed with annotations.

There was already a script track/dataset.py, and this PR adds a new one. They are not integrated, but they should be. It is also questionable why you need a CLI for load.py in this PR.

Points to address:

  • Clarify why the CLI is ever needed.
  • Add unit tests. The code does not look friendly to testing. You may need to think about how to restructure it for clean testing.
  • Do not directly import module content.
  • Clarify if it is better to make FlightDataset an iterator.
  • Name mangling is not necessary. Use _load_text() instead.

The code quality is not very good. We may need more discussions.

Additional point:

  • Rename the overly-long module name timeseries_dataframe to dataframe.

import dataclasses
import pathlib

from .timeseries_dataframe import TimeSeriesDataFrame
Member

Do not import module contents. Import the module itself and use the contents from the module, like import timeseries_dataframe; timeseries_dataframe.TimeSeriesDataFrame.

Member

timeseries_dataframe is too long a module name. Name it as dataframe.

@ThreeMonth03 ThreeMonth03 force-pushed the data_processing branch 3 times, most recently from 74aaa9e to 6b5c6e3 Compare March 22, 2026 10:13
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

  • Clarify why the CLI is ever needed.
    We don't need the CLI because the usage of the dataset is demonstrated in the unit tests.
  • Add unit tests. The code does not look like friendly to testing. You may need to think about how to restructure to make pretty testing.
  • Do not directly import module content.
  • Clarify if it is better to make FlightDataset an iterator.
    No, the data is not large, so I implemented the __getitem__ method instead.
  • Name mangling is not necessary. Use _load_text() instead.
  • Rename the overly-long module name timeseries_dataframe to dataframe.

@yungyuc Please review this PR. Thanks.

Comment on lines +39 to +53
def _write_csv(self, directory, name, content):
    path = pathlib.Path(directory) / name
    path.write_text(content, encoding="utf-8")
    return str(path)

def test_load_flight_dataset(self):
    with tempfile.TemporaryDirectory() as tmpdir:
        imu_data = (
            "TIME_NANOSECONDS_TAI ,DATA_DELTA_VEL[1] ,"
            "DATA_DELTA_VEL[2] ,DATA_DELTA_VEL[3] ,"
            "DATA_DELTA_ANGLE[1] ,DATA_DELTA_ANGLE[2] ,"
            "DATA_DELTA_ANGLE[3]\n"
            "30,3.0,3.1,3.2,30.0,30.1,30.2\n"
            "10,1.0,1.1,1.2,10.0,10.1,10.2\n"
        )
Collaborator Author

Added a unit test for the data loader.

Comment on lines +31 to +34
import dataclasses
import json
import ssl
import pathlib
Collaborator Author

Import the module rather than its contents.

Comment on lines +250 to +265
def _load_dataframe(self, path):
    """
    Load one CSV file into a time-series dataframe.

    :param path: Path to a source CSV file.
    :type path: pathlib.Path or str
    :return: Loaded dataframe for the source file.
    :rtype: DataFrame
    """
    tsdf = DataFrame()
    tsdf.read_from_text_file(
        path,
        delimiter=",",
        timestamp_column="TIME_NANOSECONDS_TAI",
    )
    return tsdf
Collaborator Author

Remove dunder naming.


class TimeSeriesDataFrame(object):

class DataFrame(object):
Collaborator Author

Rename the long module.

Member

@yungyuc yungyuc left a comment

Please pay attention to the review comments. I believe I already mentioned the following point twice in this PR, and you still got it wrong (this is the third time):

  • Use relative import for importing modules in the project.

This PR is to add new feature. Do not include refactoring work.

  • Add new testing code in a new file.
  • I do not see _load_text(). Clarify where it is.

import zipfile

from pathlib import Path
from modmesh.track.dataframe import DataFrame
Member

I think it's the third time I am making the comment in this PR: do not directly import module content. Import the module, and call the module content.

And you should use relative import for importing modules in the project.

If you are not familiar with PEP-8, get familiar with it.


Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

  • Use relative import for importing modules in the project.
  • Add new testing code in a new file.
  • I do not see _load_text(). Clarify where it is.
    It has been removed because NasaDataset.load() now loads data via modmesh.track.dataframe, which was developed by Zong-han.

@yungyuc Please review this PR again. Thanks.

import zipfile

from pathlib import Path
from . import dataframe
Collaborator Author

Use a relative import to import the module.

import tempfile
import unittest

import modmesh.track.dataset as dataset
Collaborator Author

Avoid importing module content, and add the new unit test in a new file.

Member

@yungyuc yungyuc Mar 22, 2026

This is bad Python code. Change it to the idiomatic Python relative-style import:

from modmesh.track import dataset

Comment on lines -34 to -46
from modmesh import SimpleArrayUint64, SimpleArrayFloat64


class TimeSeriesDataFrame(object):

    def __init__(self):
        self._init_members()

    def _init_members(self):
        self._columns = list()
        self._index_data = None
        self._index_name = None
        self._data = list()
Collaborator Author

Move dataframe to modmesh/track/dataframe.py.

@ThreeMonth03 ThreeMonth03 requested a review from yungyuc March 22, 2026 13:23
Member

@yungyuc yungyuc left a comment

I found myself repeating the same points and have lost count of them. @ThreeMonth03, I hope I will not need to repeat them another time.

  • Clarify if it is better to make FlightDataset an iterator.
  • Name mangling is not necessary. Use _load_text() instead.
  • I do not see _load_text(). Clarify where it is.

Additional point to address:

  • Use the idiomatic relative import: from modmesh.track import dataset

import tempfile
import unittest

import modmesh.track.dataset as dataset
Member

@yungyuc yungyuc Mar 22, 2026


This is bad Python code. Change it to the idiomatic Python relative import:

from modmesh.track import dataset

Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment


@yungyuc I think I have fixed the import statements this time, and I have opened a new issue about refactoring the import statements in this project; see #701.

import tempfile
import unittest

from modmesh.track import dataset
Collaborator Author


Import the module with a Pythonic statement.

True)
"""
return self.sort(columns=columns, index_column=None, inplace=inplace)
from modmesh.track import dataframe
Collaborator Author


Import the module with a Pythonic statement.

@ThreeMonth03 ThreeMonth03 requested a review from yungyuc March 22, 2026 15:00
@yungyuc
Member

yungyuc commented Mar 24, 2026

I found myself repeating the same points and have lost count of them. @ThreeMonth03, I hope I will not need to repeat them another time.

  • Clarify if it is better to make FlightDataset an iterator.
  • Name mangling is not necessary. Use _load_text() instead.
  • I do not see _load_text(). Clarify where it is.

@ThreeMonth03 you have not addressed the above points or made them clear with inline annotations. Please read my points carefully and address them. I really hope that I do not need to repeat them.

Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment


  • Clarify if it is better to make FlightDataset an iterator.
  • Name mangling is not necessary. Use _load_text() instead.
  • I do not see _load_text(). Clarify where it is.
    @yungyuc I think I have further explained the points in the inline annotations this time.

Comment on lines +292 to +319
def __iter__(self):
"""
Iterate over events in chronological order.

:return: Iterator over timestamp-ordered event references.
:rtype: iter[EventReference]
"""
return iter(self.events)

def __len__(self):
"""
Return the number of events in the timeline.

:return: Number of loaded events.
:rtype: int
"""
return len(self.events)

def __getitem__(self, idx):
"""
Return the event reference at ``idx``.

:param idx: Event index.
:type idx: int
:return: Event reference at the specified index.
:rtype: EventReference
"""
return self.events[idx]
Collaborator Author


It is not better to make FlightDataset an iterator. Sometimes we want to take rows of data based on a series of unsorted indexes, which an iterator does not suit because iterators are poor at random access.
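For illustration, a minimal sketch (with a hypothetical `Timeline` class and made-up event data) of why the sequence-style design keeps random access cheap while still supporting ordered iteration:

```python
class Timeline:
    """Sequence-style dataset: events stored in a timestamp-ordered list."""

    def __init__(self, events):
        self.events = events  # list ordered by timestamp

    def __len__(self):
        return len(self.events)

    def __getitem__(self, idx):
        return self.events[idx]


tl = Timeline(["ev0", "ev1", "ev2", "ev3"])

# Random access with unsorted indexes: cheap on a sequence, but an
# iterator would have to be consumed from the front every time.
picked = [tl[i] for i in (3, 0, 2)]
print(picked)  # ['ev3', 'ev0', 'ev2']

# Ordered iteration still works: iter() falls back to the legacy
# sequence protocol when only __getitem__ is defined.
print(list(tl))  # ['ev0', 'ev1', 'ev2', 'ev3']
```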

Member


Is it correct that you renamed FlightDataset to NasaDataset? Correct me if I am wrong. (If you did, it is something that should have been annotated when the code review was requested.)

I am not sure you understand what I meant to make NasaDataset to be an iterator: https://docs.python.org/3/glossary.html#term-iterator ?

Comment on lines +237 to +264
def load(self):
"""
Load all source datasets and build the timestamp timeline.

:return: None
:rtype: None
"""
self.dataframes["imu"] = self._load_dataframe(self.imu_csv)
self.dataframes["lidar"] = self._load_dataframe(self.lidar_csv)
self.dataframes["ground_truth"] = self._load_dataframe(self.gt_csv)
self._rebuild_timeline()

def _load_dataframe(self, path):
"""
Load one CSV file into a time-series dataframe.

:param path: Path to a source CSV file.
:type path: pathlib.Path or str
:return: Loaded dataframe for the source file.
:rtype: DataFrame
"""
tsdf = dataframe.DataFrame()
tsdf.read_from_text_file(
path,
delimiter=",",
timestamp_column="TIME_NANOSECONDS_TAI",
)
return tsdf
Collaborator Author

@ThreeMonth03 ThreeMonth03 Mar 24, 2026


I removed __load_text() in this pull request because I replaced __load_text() with read_from_text_file() to load tabular data, which was developed by Zong-han. Furthermore, I have avoided name mangling when creating member functions.
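For reference, a minimal sketch of the name-mangling behavior being avoided here: inside a class body, Python rewrites a double-underscore name such as `__load_text` to `_ClassName__load_text`, whereas a single leading underscore is only a naming convention (the `Loader` class below is hypothetical):

```python
class Loader:
    def __load_text(self):   # name-mangled to _Loader__load_text
        return "mangled"

    def _load_text(self):    # single underscore: convention only
        return "plain"


obj = Loader()
print(obj._load_text())           # 'plain'
print(obj._Loader__load_text())   # the mangled name must be spelled out
# obj.__load_text() would raise AttributeError outside the class body,
# which is why mangling complicates subclassing and testing.
```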

Member


I see. The explanation is clear. It was not clear from the code, and you have rewritten the history.

While the code is still being discussed, keeping the history helps track the changes a little bit. In the end the changesets should be reorganized, so it is not a problem to rebase and squash early. You, the author, are responsible for helping make the review go smoothly.

Member

@yungyuc yungyuc left a comment


It's good that we are able to move the review forward again. Points to address:

  • Use a one-liner when you can fit a variable within an 80-char line.
  • Clarify the design for NasaDataset to support iterator.

self.download_dir = Path.cwd() / ".cache" / "download"
self.download_dir = pathlib.Path.cwd() / ".cache" / "download"
self.filename = filename
self.csv_dir = (
Member


Use a one-liner when you can fit a variable within an 80-char line.

Do not abuse multi-line constructs. The abuse makes code harder to read.

There is a very easy idiom to shorten the 16 lines between 159-176 to 5:

        _ = self.download_dir = pathlib.Path.cwd() / ".cache" / "download"
        self.csv_dir = _ / "Flight1_Catered_Dataset-20201013" / "Data"
        self.imu_csv = self.csv_dir / "dlc.csv"
        self.lidar_csv = self.csv_dir / "commercial_lidar.csv"
        self.gt_csv = self.csv_dir / "truth.csv"

for ref in timeline_map[timestamp]
]

def __iter__(self):
Member


You already provided __iter__() in NasaDataset. Why not also provide __next__()?

IIRC, Python automatically provides a default pair of __iter__() and __next__() when you implement both __len__() and __getitem__() taking an integer. If you rely on that automatically provided iterator protocol implementation, you do not need to provide a custom __iter__().

That is, having __iter__() without __next__() looks like a mistake. What do you really want to do here?
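As a side note, a minimal sketch of the two protocols under discussion (hypothetical classes; as far as I know, the legacy fallback needs only __getitem__ accepting integer indexes from zero, while __next__ is required only when the object is its own iterator):

```python
class SeqStyle:
    """Legacy sequence protocol: iter() falls back to __getitem__ with
    integer indexes starting at 0, stopping on IndexError."""

    def __init__(self, events):
        self.events = events

    def __getitem__(self, idx):
        return self.events[idx]


class IterStyle:
    """Iterator protocol: the object is its own iterator, so it must
    provide both __iter__() and __next__()."""

    def __init__(self, events):
        self.events = events
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self.events):
            raise StopIteration
        item = self.events[self._pos]
        self._pos += 1
        return item


print(list(SeqStyle([1, 2, 3])))   # [1, 2, 3]
print(list(IterStyle([1, 2, 3])))  # [1, 2, 3]
```

A third option is an iterable container whose __iter__() returns a separate iterator (for example, iter(self.events)); that pattern needs no __next__() on the container itself.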


Labels

array Multi-dimensional array implementation

Projects

Status: In Progress


2 participants