Implement NASA flight data loader #696
Conversation
ThreeMonth03
left a comment
@yungyuc Please review this pull request when you are available. Thanks.
modmesh/track/load.py (Outdated)

@dataclass
class GroundTruthData:
    """
    Data structure for ground truth data.
    """
    con_pos_in_ecef: np.ndarray
    con_vel_in_ecef: np.ndarray
    quat_con_ecef: np.ndarray


@dataclass
class IMUData:
    """
    Data structure for IMU data.
    """
    dvel_in_imu: np.ndarray
    dangle_in_imu: np.ndarray


@dataclass
class LidarData:
    """
    Data structure for Lidar data.
    """
    range_m: np.ndarray
    doppler_speed_mps: np.ndarray
Data structures for the different sensors.
modmesh/track/load.py (Outdated)

@dataclass(order=True)
class SensorEvent:
    """
    A generic sensor event with a timestamp,
    source identifier, and data payload.
    """
    timestamp: float
    source: str = field(compare=False)
    data: Any = field(compare=False)
A generic sensor event with a timestamp, source identifier, and data payload.
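The comparison behavior this dataclass relies on can be checked with a small sketch (field names follow the quoted snippet; the sample values are made up for illustration):

```python
import dataclasses
import typing


@dataclasses.dataclass(order=True)
class SensorEvent:
    # Only timestamp participates in ordering; source and data are
    # excluded from the generated comparison methods by compare=False.
    timestamp: float
    source: str = dataclasses.field(compare=False)
    data: typing.Any = dataclasses.field(compare=False)


events = [SensorEvent(2.0, "lidar", None), SensorEvent(1.0, "imu", None)]
events.sort()  # sorts by timestamp only
print([e.source for e in events])
```

Because `order=True` generates `__lt__` and friends from the comparable fields only, a plain `list.sort()` orders events chronologically.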
modmesh/track/load.py (Outdated)

def load(self):
    """
    Load imu, lidar, and ground-truth data,
    then sort events by timestamp.
    """
    self.load_imu()
    self.load_lidar()
    self.load_ground_truth()
    self.events.sort()
This function loads data from the different files, then sorts the events by timestamp.
modmesh/track/load.py (Outdated)

def main():
    dataset = FlightDataset()
    dataset.load()
    print(f"Loaded {len(dataset)} events.")
    it = dataset.iter_events()
    for i in range(5):
        event = next(it)
        print(f"Event {i}:")
        print(f"  timestamp={event.timestamp}")
        print(f"  source={event.source}")
        print(f"  data={event.data}")
        print()
A simple usage example that loads the dataset through an iterator.
Why do you ever need to have this CLI in the first place? It looks like just an example. If you only need an example, why don't you just create unit tests?
There is already a script track/dataset.py, and this PR adds a new one. They should be integrated but are not. It is also questionable why you need a CLI for load.py in this PR.
Points to address:
- Clarify why the CLI is ever needed.
- Add unit tests. The code does not look friendly to testing. You may need to think about how to restructure it to make testing pretty.
- Do not directly import module content.
- Clarify if it is better to make FlightDataset an iterator.
- Name mangling is not necessary. Use _load_text() instead.

The code quality is not very good. We may need more discussion.
modmesh/track/load.py (Outdated)

from dataclasses import dataclass, field
import numpy as np
from pathlib import Path
from typing import Any, Iterable, Optional
Do not directly import module content. That is, do not use from ... import .... Use only import ... for all modules and entities.
Import typing, and then consume the contents like typing.Any, instead of from typing import Any; Any. The latter removes the module name at the consuming site; the former keeps it.
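The module-only import style the review asks for can be sketched as follows (the helper function and values are illustrative only):

```python
# Module-only imports keep the provenance of every name visible
# at the call site: typing.Optional, pathlib.Path, and so on.
import pathlib
import typing


def first_or_none(items):
    # The annotation is spelled through the module name.
    value: typing.Optional[int] = items[0] if items else None
    return value


print(first_or_none([3, 2, 1]))
print(pathlib.Path("data") / "dlc.csv")
```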
modmesh/track/load.py (Outdated)

dataset = FlightDataset()
dataset.load()
print(f"Loaded {len(dataset)} events.")
it = dataset.iter_events()
What is preventing you from using enumerate()?

for i, event in enumerate(dataset):
    if i > 5:
        break

Making dataset an iterator is more straightforward.
There are around 130000 event data points; I'm not sure whether it is a good idea without an iterator.
enumerate() uses the iterator protocol as well.
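This point can be demonstrated with a small sketch: enumerate() pulls items lazily through the iterator protocol, so it never materializes all events at once (the generator below stands in for the dataset):

```python
# enumerate() drives any iterable lazily via the iterator protocol,
# so early break stops consumption without loading everything.
def generate_events():
    for ts in (10, 20, 30, 40):
        yield ts


seen = []
for i, ts in enumerate(generate_events()):
    if i >= 2:
        break  # remaining events are never produced
    seen.append(ts)
print(seen)
```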
modmesh/track/load.py (Outdated)

data: Any = field(compare=False)


class FlightDataset:
It is not a good idea for high-performance computing to rely on looping in Python. As a prototype, it's fine. But eventually you will need to get this thing into C++ for speed and saving memory.
modmesh/track/load.py (Outdated)

    """
    return self.events[idx]


def iter_events(self, sources: Optional[str | Iterable[str]] = None):
Typing information in Python will get in your way when moving the code into C++.
Let Python be dynamic. If you need typing information, it is a sign that the code needs to be in C++ soon.
modmesh/track/load.py (Outdated)

    Iterate over events of one or more sources.

    :param sources: Event source(s). If None, iterate over all events.
        Sources can be a single source string or an iterable of source strings.
Sphinx docstring format is off.
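For reference, a conventional Sphinx field-list layout for this docstring might look like the following sketch (the type names are illustrative):

```python
def iter_events(sources=None):
    """
    Iterate over events of one or more sources.

    :param sources: Event source(s). If None, iterate over all
        events. May be a single source string or an iterable of
        source strings.
    :type sources: str or collections.abc.Iterable or None
    :return: Events matching the requested sources.
    :rtype: collections.abc.Iterator
    """
```

Continuation lines of a field are indented, and each field starts at the same indentation as the summary text.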
modmesh/track/load.py (Outdated)

    """
    if sources is None:
        yield from self.events
        return
Why do you return after yield? Does it ever reach the return?
modmesh/track/load.py (Outdated)

    Load imu data.
    """
    data = self.__load_text(self.imu_csv)
    for row in data:
May be very slow compared to C++. I would guess 20-100x.
modmesh/track/load.py (Outdated)

        )
        self.events.append(event)


def __load_text(self, path):
Name mangling is not necessary. Use _load_text() instead.
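The difference is easy to see in a small sketch: a leading double underscore triggers name mangling, which complicates testing and subclassing, while a single underscore is the plain "internal use" convention:

```python
class Loader:
    def __load_text(self):
        # Name-mangled: stored on the class as _Loader__load_text.
        return "mangled"

    def _load_text(self):
        # Single underscore: conventional private, no mangling.
        return "internal"


loader = Loader()
print(loader._load_text())
print(loader._Loader__load_text())  # the mangled name is required externally
```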
@ThreeMonth03 update?
Force-pushed 0ef7fc3 to 022c53d
Force-pushed fd28592 to 88d49f0
ThreeMonth03
left a comment
@yungyuc Please review this pull request.
class TimeSeriesDataFrameTC(unittest.TestCase):

    col_sol = ['DELTA_VEL[1]', 'DELTA_VEL[2]', 'DELTA_VEL[3]']
    col_sol2 = ['DELTA_VEL[1]', 'DELTA_VEL[2]', 'EPOCH', 'DELTA_VEL[3]']

    dlc_data = """EPOCH ,DELTA_VEL[1] ,DELTA_VEL[2] ,DELTA_VEL[3]
1.6025960102293e+18,-0.18792724609375,-0.00048828125,-0.0478515625
1.60259601024931e+18,-0.1903076171875,-0.0009765625,-0.0489501953125
1.60259601026931e+18,-0.18743896484375,0.0006103515625,-0.0498046875
1.60259601028932e+18,-0.18927001953125,-0.0009765625,-0.04840087890625
1.60259601030931e+18,-0.188720703125,-0.00103759765625,-0.0504150390625
1.60259601032931e+18,-0.18951416015625,-0.000732421875,-0.0489501953125
1.60259601034931e+18,-0.18902587890625,-0.000732421875,-0.0489501953125
1.6025960103693e+18,-0.1895751953125,-0.00128173828125,-0.04925537109375
1.60259601038931e+18,-0.18841552734375,6.103515625e-05,-0.0489501953125
1.60259601040931e+18,-0.1884765625,-0.00042724609375,-0.04840087890625
"""
    modified_dlc_data = """DELTA_VEL[1] ,DELTA_VEL[2] ,EPOCH ,DELTA_VEL[3]
-0.18792724609375,-0.00048828125,1602596010229299968,-0.0478515625
-0.1903076171875,-0.0009765625,1602596010249309952,-0.0489501953125
-0.18743896484375,0.0006103515625,1602596010269309952,-0.0498046875
-0.18927001953125,-0.0009765625,1602596010289319936,-0.04840087890625
-0.188720703125,-0.00103759765625,1602596010309309952,-0.0504150390625
-0.18951416015625,-0.000732421875,1602596010329309952,-0.0489501953125
-0.18902587890625,-0.000732421875,1602596010349309952,-0.0489501953125
-0.1895751953125,-0.00128173828125,1602596010369299968,-0.04925537109375
-0.18841552734375,6.103515625e-05,1602596010389309952,-0.0489501953125
-0.1884765625,-0.00042724609375,1602596010409309952,-0.04840087890625
"""
    unsorted_dlc_data = """EPOCH ,DELTA_VEL[1] ,DELTA_VEL[2] ,DELTA_VEL[3]
1.60259601024931e+18,-0.1903076171875,-0.0009765625,-0.0489501953125
1.60259601034931e+18,-0.18902587890625,-0.000732421875,-0.0489501953125
1.60259601040931e+18,-0.1884765625,-0.00042724609375,-0.04840087890625
1.60259601032931e+18,-0.18951416015625,-0.000732421875,-0.0489501953125
1.6025960102293e+18,-0.18792724609375,-0.00048828125,-0.0478515625
1.6025960103693e+18,-0.1895751953125,-0.00128173828125,-0.04925537109375
1.60259601030931e+18,-0.188720703125,-0.00103759765625,-0.0504150390625
1.60259601026931e+18,-0.18743896484375,0.0006103515625,-0.0498046875
1.60259601028932e+18,-0.18927001953125,-0.0009765625,-0.04840087890625
1.60259601038931e+18,-0.18841552734375,6.103515625e-05,-0.0489501953125
"""

    def test_read_from_text_file_basic(self):
        tsdf = TimeSeriesDataFrame()

        tsdf.read_from_text_file(io.StringIO(self.dlc_data))
        self.assertEqual(tsdf._columns, self.col_sol)
        self.assertEqual(len(tsdf._columns), 3)
        for i in range(len(tsdf._columns)):
            self.assertEqual(tsdf._data[i].ndarray.shape[0], 10)
        self.assertEqual(tsdf._index_name, 'EPOCH')

        tsdf.read_from_text_file(
            io.StringIO(self.modified_dlc_data),
            delimiter=',',
            timestamp_column='EPOCH'
        )

        self.assertEqual(tsdf._columns, self.col_sol)
        self.assertEqual(len(tsdf._columns), 3)
        for i in range(len(tsdf._columns)):
            self.assertEqual(tsdf._data[i].ndarray.shape[0], 10)
        self.assertEqual(tsdf._index_name, 'EPOCH')

        tsdf.read_from_text_file(
            io.StringIO(self.modified_dlc_data),
            delimiter=',',
            timestamp_in_file=False
        )
        self.assertEqual(tsdf._columns, self.col_sol2)
        self.assertEqual(len(tsdf._columns), 4)
        for i in range(len(tsdf._columns)):
            self.assertEqual(tsdf._data[i].ndarray.shape[0], 10)
        self.assertEqual(tsdf._index_name, 'Index')

    def test_dataframe_attribute_columns(self):
        tsdf = TimeSeriesDataFrame()
        tsdf.read_from_text_file(io.StringIO(self.dlc_data))
        self.assertEqual(tsdf.columns, self.col_sol)

    def test_dataframe_attribute_shape(self):
        tsdf = TimeSeriesDataFrame()
        tsdf.read_from_text_file(io.StringIO(self.dlc_data))
        self.assertEqual(tsdf.shape, (10, 3))

    def test_dataframe_attribute_index(self):
        tsdf = TimeSeriesDataFrame()
        tsdf.read_from_text_file(io.StringIO(self.dlc_data))

        nd_arr = np.genfromtxt(io.StringIO(self.dlc_data), delimiter=',')[1:]

        self.assertEqual(
            list(tsdf.index), list(nd_arr[:, 0].astype(np.uint64))
        )

    def test_dataframe_get_column(self):
        tsdf = TimeSeriesDataFrame()
        tsdf.read_from_text_file(io.StringIO(self.dlc_data))

        col_data = tsdf['DELTA_VEL[1]']

        nd_arr = np.genfromtxt(io.StringIO(self.dlc_data), delimiter=',')[1:]

        self.assertEqual(list(col_data), list(nd_arr[:, 1]))

    def test_dataframe_sort(self):
        tsdf = TimeSeriesDataFrame()
        tsdf.read_from_text_file(io.StringIO(self.unsorted_dlc_data))

        # Test out-of-place sort
        reordered_tsdf = tsdf.sort(tsdf.columns, index_column=None,
                                   inplace=False)
        col_data = reordered_tsdf['DELTA_VEL[1]']
        nd_arr = np.genfromtxt(io.StringIO(self.dlc_data), delimiter=',')[1:]
        self.assertEqual(list(col_data), list(nd_arr[:, 1]))

        # Test inplace sort_by_index
        tsdf.sort_by_index()
        col_data = tsdf['DELTA_VEL[1]']
        nd_arr = np.genfromtxt(io.StringIO(self.dlc_data), delimiter=',')[1:]
        self.assertEqual(list(col_data), list(nd_arr[:, 1]))

        # Test out-of-place sort with index_column
        tsdf.read_from_text_file(io.StringIO(self.unsorted_dlc_data),
                                 timestamp_in_file=False)

        reordered_tsdf = tsdf.sort(['EPOCH', 'DELTA_VEL[1]'],
                                   index_column='EPOCH', inplace=False)
        col_data = reordered_tsdf['DELTA_VEL[1]']
        nd_arr = np.genfromtxt(io.StringIO(self.dlc_data), delimiter=',')[1:]
        self.assertEqual(list(col_data), list(nd_arr[:, 1]))


# vim: set ff=unix fenc=utf8 et sw=4 ts=4 sts=4:
Move unittest from tests/test_timeseries_dataframe.py to tests/test_track.py.
Moving the original test file is refactoring. Do not mix refactoring with adding features in the same PR. Revert the change to the original test file.
Add new testing code in a new file.
import os
import contextlib
from io import StringIO
import unittest

import numpy as np

from modmesh import SimpleArrayUint64, SimpleArrayFloat64

__all__ = ['TimeSeriesDataFrame']


class TimeSeriesDataFrame(object):
Move tests/test_timeseries_dataframe.py to modmesh/track/timeseries_dataframe.py
modmesh/track/load.py (Outdated)

def load(self):
    """
    Load all source datasets and build the timestamp timeline.
    """
    self.dataframes["imu"] = self._load_dataframe(self.imu_csv)
    self.dataframes["lidar"] = self._load_dataframe(self.lidar_csv)
    self.dataframes["ground_truth"] = self._load_dataframe(self.gt_csv)
    self._rebuild_timeline()
Load data with timeseries_dataframe().
modmesh/track/load.py (Outdated)

def _rebuild_timeline(self):
    """
    Rebuild the timestamp timeline from loaded source dataframes.
    """
    timeline_map: dict[int, list[EventReference]] = {}
    for source, dataframe in self.dataframes.items():
        for row_index, timestamp in enumerate(dataframe.index):
            timestamp = int(timestamp)
            timeline_map.setdefault(timestamp, []).append(
                EventReference(
                    dataset=self,
                    timestamp=timestamp,
                    source=source,
                    row=row_index,
                )
            )

    self.events = [
        ref
        for timestamp in sorted(timeline_map)
        for ref in timeline_map[timestamp]
    ]
Build a sorted list to record the order of timestamps.
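The merge strategy described here can be sketched standalone: bucket per-source row references by timestamp with setdefault, then flatten in sorted timestamp order (the index data below is made up):

```python
# Hypothetical per-source timestamp indexes.
dataframe_index = {"imu": [30, 10], "lidar": [20]}

timeline_map = {}
for source, stamps in dataframe_index.items():
    for row_index, timestamp in enumerate(stamps):
        # One bucket per timestamp; each entry remembers source and row.
        timeline_map.setdefault(int(timestamp), []).append((source, row_index))

# Flatten buckets in ascending timestamp order.
events = [ref for ts in sorted(timeline_map) for ref in timeline_map[ts]]
print(events)
```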
modmesh/track/load.py (Outdated)

@property
def data(self):
    """
    Return a lazy view of the original row data.

    :return: Lazy row view backed by the source dataframe.
    :rtype: _EventDataView
    """
    return _EventDataView(self.dataset.dataframes[self.source], self.row)
Get data by _EventDataView.
I am not seeing your annotations explaining how you addressed the points I made. Please help my review by leaving annotations where you address the points. Without your annotations, it will take a lot of time to review. It would not take an unreasonable amount of time for the author to explain how the points are addressed with annotations.
Additional point:
modmesh/track/load.py (Outdated)

import dataclasses
import pathlib

from .timeseries_dataframe import TimeSeriesDataFrame
Do not import module contents. Import the module itself and use the contents from the module, like import timeseries_dataframe; timeseries_dataframe.TimeSeriesDataFrame.
timeseries_dataframe is too long a module name. Name it as dataframe.
Force-pushed 74aaa9e to 6b5c6e3
ThreeMonth03
left a comment
- Clarify why the CLI is ever needed.
  We don't need the CLI because we have demonstrated the usage of the dataset in the unit tests.
- Add unit tests. The code does not look friendly to testing. You may need to think about how to restructure it to make testing pretty.
- Do not directly import module content.
- Clarify if it is better to make FlightDataset an iterator.
  No, the data is not large. Therefore, I implemented the __getitem__ method.
- Name mangling is not necessary. Use _load_text() instead.
- Rename the overly-long module name timeseries_dataframe to dataframe.

@yungyuc Please review this PR. Thanks.
def _write_csv(self, directory, name, content):
    path = pathlib.Path(directory) / name
    path.write_text(content, encoding="utf-8")
    return str(path)

def test_load_flight_dataset(self):
    with tempfile.TemporaryDirectory() as tmpdir:
        imu_data = (
            "TIME_NANOSECONDS_TAI ,DATA_DELTA_VEL[1] ,"
            "DATA_DELTA_VEL[2] ,DATA_DELTA_VEL[3] ,"
            "DATA_DELTA_ANGLE[1] ,DATA_DELTA_ANGLE[2] ,"
            "DATA_DELTA_ANGLE[3]\n"
            "30,3.0,3.1,3.2,30.0,30.1,30.2\n"
            "10,1.0,1.1,1.2,10.0,10.1,10.2\n"
        )
Add a unit test for the data loader.
import dataclasses
import json
import ssl
import pathlib
Import modules without directly importing their contents.
def _load_dataframe(self, path):
    """
    Load one CSV file into a time-series dataframe.

    :param path: Path to a source CSV file.
    :type path: pathlib.Path or str
    :return: Loaded dataframe for the source file.
    :rtype: DataFrame
    """
    tsdf = DataFrame()
    tsdf.read_from_text_file(
        path,
        delimiter=",",
        timestamp_column="TIME_NANOSECONDS_TAI",
    )
    return tsdf
Remove dunder naming.
class TimeSeriesDataFrame(object):

class DataFrame(object):
Rename the long module.
Force-pushed 6b5c6e3 to eca4ec8
Please pay attention to the review comments. I believe I have already mentioned the following point twice in this PR, and you still made it wrong (this is the third time):
- Use relative import for importing modules in the project.
- This PR is to add a new feature. Do not include refactoring work.
- Add new testing code in a new file.
- I do not see _load_text(). Clarify where it is.
modmesh/track/dataset.py (Outdated)

import zipfile

from pathlib import Path
from modmesh.track.dataframe import DataFrame
I think this is the third time I am making this comment in this PR: do not directly import module content. Import the module, and call the module content.
And you should use relative import for importing modules in the project.
If you are not familiar with PEP-8, get familiar with it.
Force-pushed eca4ec8 to 8661708
ThreeMonth03
left a comment
- Use relative import for importing modules in the project.
- Add new testing code in a new file.
- I do not see _load_text(). Clarify where it is.
  It has been removed because NasaDataset.load() loads data by modmesh.track.dataframe now, which is developed by Zong-han.

@yungyuc Please review this PR again. Thanks.
import zipfile

from pathlib import Path
from . import dataframe
Use relative import to import module.
tests/test_track.py (Outdated)

import tempfile
import unittest

import modmesh.track.dataset as dataset
Avoid importing module content, and add the new unittest in a new file.
This is bad Python code. Change it to the idiomatic Python relative import:
from modmesh.track import dataset

from modmesh import SimpleArrayUint64, SimpleArrayFloat64
class TimeSeriesDataFrame(object):

    def __init__(self):
        self._init_members()

    def _init_members(self):
        self._columns = list()
        self._index_data = None
        self._index_name = None
        self._data = list()
Move dataframe to modmesh/track/dataframe.py.
I found myself repeating the same points and have lost count of them. @ThreeMonth03, I hope I will not need to repeat them another time.
- Clarify if it is better to make FlightDataset an iterator.
- Name mangling is not necessary. Use _load_text() instead.
- I do not see _load_text(). Clarify where it is.

Additional point to address:
- Use the Pythonic relative import: from modmesh.track import dataset
tests/test_track.py (Outdated)

import tempfile
import unittest

import modmesh.track.dataset as dataset
This is bad Python code. Change it to the idiomatic Python relative import:
from modmesh.track import dataset

Force-pushed 8661708 to 5413205
import tempfile
import unittest

from modmesh.track import dataset
Import the module with a Pythonic statement.
    True)
    """
    return self.sort(columns=columns, index_column=None, inplace=inplace)
from modmesh.track import dataframe
Import the module with a Pythonic statement.
@ThreeMonth03 you have not addressed the above points and made them clear with inline annotations. Please read my points carefully and address them. I really hope that I do not need to repeat them.
ThreeMonth03
left a comment
- Clarify if it is better to make FlightDataset an iterator.
- Name mangling is not necessary. Use _load_text() instead.
- I do not see _load_text(). Clarify where it is.

@yungyuc I think I have further explained the points in the inline annotations this time.
def __iter__(self):
    """
    Iterate over events in chronological order.

    :return: Iterator over timestamp-ordered event references.
    :rtype: iter[EventReference]
    """
    return iter(self.events)

def __len__(self):
    """
    Return the number of events in the timeline.

    :return: Number of loaded events.
    :rtype: int
    """
    return len(self.events)

def __getitem__(self, idx):
    """
    Return the event reference at ``idx``.

    :param idx: Event index.
    :type idx: int
    :return: Event reference at the specified index.
    :rtype: EventReference
    """
    return self.events[idx]
It is not better to make FlightDataset an iterator. Sometimes we want to take rows of data based on a series of unsorted indexes, which is not suitable for an iterator because iterators are poor at random access.
Is it correct that you renamed FlightDataset to NasaDataset? Correct me if I am wrong. (If you did, it is something that should have been annotated when the code review was requested.)
I am not sure you understand what I meant by making NasaDataset an iterator: https://docs.python.org/3/glossary.html#term-iterator
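The distinction at issue can be sketched in a few lines: an iterable container supports len(), random access, and repeated iteration, while an iterator (in the glossary sense) returns itself from __iter__ and is consumed once by next(). The class below is hypothetical, not the PR's actual NasaDataset:

```python
class Timeline:
    # An iterable container: len(), random access, and fresh iteration
    # each time. An iterator would instead be single-pass.
    def __init__(self, events):
        self._events = sorted(events)

    def __len__(self):
        return len(self._events)

    def __getitem__(self, idx):
        return self._events[idx]

    def __iter__(self):
        # Returns a new iterator on every call, so repeated
        # iteration and enumerate() both work.
        return iter(self._events)


tl = Timeline([3.0, 1.0, 2.0])
print(list(tl), tl[0], len(tl))
```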
def load(self):
    """
    Load all source datasets and build the timestamp timeline.

    :return: None
    :rtype: None
    """
    self.dataframes["imu"] = self._load_dataframe(self.imu_csv)
    self.dataframes["lidar"] = self._load_dataframe(self.lidar_csv)
    self.dataframes["ground_truth"] = self._load_dataframe(self.gt_csv)
    self._rebuild_timeline()

def _load_dataframe(self, path):
    """
    Load one CSV file into a time-series dataframe.

    :param path: Path to a source CSV file.
    :type path: pathlib.Path or str
    :return: Loaded dataframe for the source file.
    :rtype: DataFrame
    """
    tsdf = dataframe.DataFrame()
    tsdf.read_from_text_file(
        path,
        delimiter=",",
        timestamp_column="TIME_NANOSECONDS_TAI",
    )
    return tsdf
I have removed __load_text() in this pull request because I replaced __load_text() with read_from_text_file() to load tabular data, which is developed by Zong-han. Furthermore, I've avoided name mangling when creating member functions.
I see. The explanation is clear. It was not clear from the code, and you have rewritten the history.
While the code is still being discussed, keeping the history helps in tracking the changes. In the end the changesets should be reorganized, so it is not a problem to rebase and squash early. You, the author, are responsible for helping make the review easy to go through.
yungyuc
left a comment
It's good that we are able to move the review forward again. Points to address:
- Use a one-liner when you can fit a variable within an 80-char line.
- Clarify the design for NasaDataset to support the iterator protocol.
| self.download_dir = Path.cwd() / ".cache" / "download" | ||
| self.download_dir = pathlib.Path.cwd() / ".cache" / "download" | ||
| self.filename = filename | ||
| self.csv_dir = ( |
There was a problem hiding this comment.
Use a one-liner when you can fit a variable within an 80-char line.
Do not abuse multi-line constructs; the abuse makes code harder to read.
There is a very easy idiom to shorten the 16 lines between 159-176 to 5:
_ = self.download_dir = pathlib.Path.cwd() / ".cache" / "download"
self.csv_dir = _ / "Flight1_Catered_Dataset-20201013" / "Data"
self.imu_csv = self.csv_dir / "dlc.csv"
self.lidar_csv = self.csv_dir / "commercial_lidar.csv"
self.gt_csv = self.csv_dir / "truth.csv"
| def __iter__(self): | ||
| """ | ||
| Iterate over events in chronological order. | ||
|
|
||
| :return: Iterator over timestamp-ordered event references. | ||
| :rtype: iter[EventReference] | ||
| """ | ||
| return iter(self.events) | ||
|
|
||
| def __len__(self): | ||
| """ | ||
| Return the number of events in the timeline. | ||
|
|
||
| :return: Number of loaded events. | ||
| :rtype: int | ||
| """ | ||
| return len(self.events) | ||
|
|
||
| def __getitem__(self, idx): | ||
| """ | ||
| Return the event reference at ``idx``. | ||
|
|
||
| :param idx: Event index. | ||
| :type idx: int | ||
| :return: Event reference at the specified index. | ||
| :rtype: EventReference | ||
| """ | ||
| return self.events[idx] |
| for ref in timeline_map[timestamp] | ||
| ] | ||
|
|
||
| def __iter__(self): |
There was a problem hiding this comment.
You already provided __iter__() in NasaDataset. Why not just provide __next__()?
IIRC, Python automatically provides iteration when you implement __getitem__() that takes integers starting from zero (the legacy sequence protocol; __len__() is not strictly required). If you rely on that automatically provided iteration, you do not need to provide a custom __iter__().
That is, having __iter__() without __next__() should be a mistake. What do you really want to do here?
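The two options the review contrasts can be sketched side by side. Both classes below are hypothetical stand-ins (the real NasaDataset API may differ): one relies on the legacy sequence protocol, the other implements the full iterator protocol from the linked glossary entry.

```python
class SequenceTimeline:
    # Option A: rely on the legacy sequence protocol. An integer
    # __getitem__() that raises IndexError past the end is enough for
    # for-loops and iter(); no custom __iter__() is needed.
    def __init__(self, events):
        self._events = list(events)

    def __len__(self):
        return len(self._events)

    def __getitem__(self, idx):
        return self._events[idx]  # raises IndexError when exhausted


class IteratorTimeline:
    # Option B: implement the full iterator protocol, i.e. both
    # __iter__() returning self and __next__() raising StopIteration.
    def __init__(self, events):
        self._events = list(events)
        self._cursor = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._cursor >= len(self._events):
            raise StopIteration
        event = self._events[self._cursor]
        self._cursor += 1
        return event


print(list(SequenceTimeline([1, 2, 3])))  # [1, 2, 3]
print(list(IteratorTimeline([1, 2, 3])))  # [1, 2, 3]
```

One caveat with Option B as written: the object is its own single-pass iterator, so a second for-loop over the same instance yields nothing unless the cursor is reset. Option A supports repeated iteration and random access for free.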
I implemented the data loader for the NASA flight dataset to help us finish the future work in issue #685. As for the usage of the data loader, you can see the following command and understand how to get data from an iterator.
(base) threemonth03@threemonthwin:~/Downloads/modmesh$ python3 modmesh/track/load.py
Loaded 120791 events.
Event 0:
timestamp=1.60259601021904e+18 source=ground_truth
data=GroundTruthData(con_pos_in_ecef=array([-1387897.36558835, -5268929.31643981, 3306577.65409484]), con_vel_in_ecef=array([ 0.00220006, -0.0081095 , 0.00414972]), quat_con_ecef=array([-0.45840088, -0.1767584 , 0.51147502, 0.70499532]))
Event 1:
timestamp=1.60259601022904e+18 source=ground_truth
data=GroundTruthData(con_pos_in_ecef=array([-1387897.36564722, -5268929.31643027, 3306577.65410693]), con_vel_in_ecef=array([ 0.00197762, -0.00815516, 0.00414352]), quat_con_ecef=array([-0.45840118, -0.17675819, 0.51147449, 0.70499557]))
Event 2:
timestamp=1.6025960102293e+18 source=imu
data=IMUData(dvel_in_imu=array([-0.18792725, -0.00048828, -0.04785156]), dangle_in_imu=array([-1.90734863e-06, 3.81469727e-06, 1.90734863e-06]))
Event 3:
timestamp=1.60259601023904e+18 source=ground_truth
data=GroundTruthData(con_pos_in_ecef=array([-1387897.36570593, -5268929.31642279, 3306577.65411988]), con_vel_in_ecef=array([ 0.00190321, -0.00814863, 0.00428433]), quat_con_ecef=array([-0.45840138, -0.17675799, 0.51147416, 0.70499573]))
Event 4:
timestamp=1.60259601024904e+18 source=ground_truth
data=GroundTruthData(con_pos_in_ecef=array([-1387897.36576642, -5268929.31641798, 3306577.65413258]), con_vel_in_ecef=array([ 0.00201493, -0.00820102, 0.00423451]), quat_con_ecef=array([-0.45840157, -0.17675801, 0.51147413, 0.70499562]))