The data is currently spread across multiple folders as JSON files, in the following structure:
out_dir
└── user
    └── spec_id
        └── key
            └── {start_ts}_{end_ts}.json
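For illustration, the layout above could be walked with something like the sketch below. The names `DataFile` and `iter_data_files` are hypothetical, and it assumes that each key is stored as a single directory level and that `start_ts` and `end_ts` are numeric timestamps; neither assumption is confirmed by the current dump.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class DataFile(NamedTuple):
    """Values recovered from one out_dir/user/spec_id/key/{start_ts}_{end_ts}.json path."""
    user: str
    spec_id: str
    key: str
    start_ts: float
    end_ts: float
    path: Path


def iter_data_files(out_dir: Path) -> Iterator[DataFile]:
    """Yield one DataFile per JSON file found three directories below out_dir."""
    for path in sorted(out_dir.glob("*/*/*/*.json")):
        # The three directory levels give user, spec_id and key.
        user, spec_id, key = path.relative_to(out_dir).parts[:3]
        # The file name (without .json) gives the time range.
        start_ts, end_ts = path.stem.split("_", 1)
        yield DataFile(user, spec_id, key, float(start_ts), float(end_ts), path)
```

If keys such as background/battery turn out to be stored as nested directories, the glob pattern and the unpacking would need to change accordingly.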
Note that all of the PhoneView-related data is in the following format:
{
    "_id": {
        "$oid": "..."
    },
    "metadata": {
        "key": "...",
        "platform": "...",
        "read_ts": ...,
        "time_zone": "...",
        "type": "...",
        "write_ts": ...
    },
    "user_id": {
        "$uuid": "..."
    },
    "data": {
        <keys are dependent on key specified in metadata>
    }
}
Thus, I propose a series of Pandas DataFrames with the following columns:
- A DataFrame consisting of metadata
  - user
  - spec_id
  - key
  - start_ts
  - end_ts
  - _id
  - platform
  - read_ts
  - time_zone
  - type
  - write_ts
  - user_id
- A DataFrame for each key, whose columns correspond to the fields in the data sub-object of the PhoneView data files. For instance, the DataFrame for the background/battery key on android devices could have these columns:
  - _id
  - android_health
  - android_plugged
  - android_technology
  - android_temperature
  - android_voltage
  - battery_level_pct
  - battery_status
  - ts
  - write_ts
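As a rough sketch of how the two kinds of DataFrames could be built from parsed entries: the function name `entries_to_frames` and the `file_info` dict (holding the user, spec_id, start_ts and end_ts recovered from the file path) are hypothetical, and copying `write_ts` from the metadata into the per-key frame is my reading of the column list above rather than something settled.

```python
import pandas as pd


def entries_to_frames(entries, file_info):
    """Split PhoneView entries into a metadata frame and a per-key data frame.

    `entries` is a list of dicts in the PhoneView format shown earlier;
    `file_info` is a hypothetical dict with user, spec_id, start_ts, end_ts.
    """
    meta_rows, data_rows = [], []
    for entry in entries:
        entry_id = entry["_id"]["$oid"]
        metadata = entry["metadata"]
        meta_rows.append({
            "user": file_info["user"],
            "spec_id": file_info["spec_id"],
            "key": metadata["key"],
            "start_ts": file_info["start_ts"],
            "end_ts": file_info["end_ts"],
            "_id": entry_id,
            "platform": metadata["platform"],
            "read_ts": metadata["read_ts"],
            "time_zone": metadata["time_zone"],
            "type": metadata["type"],
            "write_ts": metadata["write_ts"],
            "user_id": entry["user_id"]["$uuid"],
        })
        # Columns here depend on the key; _id links each row back to the metadata frame.
        data_rows.append({"_id": entry_id, **entry["data"], "write_ts": metadata["write_ts"]})
    return pd.DataFrame(meta_rows), pd.DataFrame(data_rows)
```

Entries for different keys would be passed in separate calls, so that each per-key DataFrame only holds the columns relevant to that key.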
Given the amount of data, though, Pandas DataFrames might prove insufficient and clunky. A SQLite database might be more appropriate, since primary/foreign keys (a role that _id can fill) would help organize the data as a whole.
When all is said and done, we would end up with these tables, which together would encompass all our data:
- Metadata
- BackgroundBattery
- BackgroundFilteredLocation
- BackgroundLocation
- BackgroundMotionActivity
- ManualEvaluationTransition
- StatemachineTransition
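To make the primary/foreign key idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It only covers the Metadata and BackgroundBattery tables; the helper name `create_db`, the `TEXT` type on `_id`, and the choice to leave the other columns untyped are assumptions for illustration, not a settled schema.

```python
import sqlite3

# _id is the primary key of Metadata and, in each per-key table,
# both the primary key and a foreign key pointing back at Metadata.
SCHEMA = """
CREATE TABLE IF NOT EXISTS Metadata (
    _id TEXT PRIMARY KEY,
    "user", spec_id, "key", start_ts, end_ts,
    platform, read_ts, time_zone, "type", write_ts, user_id
);
CREATE TABLE IF NOT EXISTS BackgroundBattery (
    _id TEXT PRIMARY KEY REFERENCES Metadata(_id),
    android_health, android_plugged, android_technology,
    android_temperature, android_voltage,
    battery_level_pct, battery_status, ts, write_ts
);
"""


def create_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With this layout, a per-key row can be joined to its metadata on `_id`, and inserting a battery row whose `_id` has no Metadata counterpart is rejected. The remaining per-key tables would follow the same pattern.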
Thoughts? @shankari