c2g-dev/city2graph-case-study

Case Study for City2Graph

Liverpool case study for City2Graph.

Repository structure

city2graph-case-study
├── configs
│   └── experiment_config.yaml
├── data
│   ├── outputs
│   │   ├── checkpoints
│   │   ├── clusters
│   │   ├── embeddings
│   │   ├── figures
│   │   └── tables
│   ├── processed
│   │   ├── features
│   │   ├── graphs
│   │   └── isochrones
│   └── raw
│       ├── gtfs
│       ├── output_area
│       └── overture
├── notebooks
│   ├── 01_data_processing.ipynb
│   ├── 02_graph_construction.ipynb
│   ├── 03_model_training.ipynb
│   ├── 04_evaluation.ipynb
│   ├── 05_visualization.ipynb
│   └── appendix_evaluation_hdbscan.ipynb
├── notebooks_samples
│   ├── data
│   ├── morphology.ipynb
│   ├── morphology_combined.jpg
│   ├── morphology_graph.jpg
│   ├── morphology_steps.jpg
│   └── transportation_mobility.ipynb
├── src
│   ├── baselines
│   │   ├── __init__.py
│   │   └── kmeans.py
│   └── models
│       ├── __init__.py
│       ├── gat_gae.py
│       ├── han_gae.py
│       └── utils.py
├── pyproject.toml
├── uv.lock
├── .gitignore
├── .python-version
└── README.md

Data (Zenodo)

The full data directory is hosted on Zenodo:

Sato, Y. (2026). Case Study Data for City2Graph: Clustering Urban Functions in Liverpool [Data set]. Zenodo. https://doi.org/10.5281/zenodo.18396286

Download the Zenodo archive and unzip it to the repository root so the data/ directory matches the expected structure.
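After unzipping, a quick check that data/ matches the tree shown under "Repository structure" can catch an archive extracted one level too deep. The following is a minimal sketch, not part of the repository: the helper name missing_data_dirs is hypothetical, and the directory list is copied from the tree above.

```python
from pathlib import Path

# Sub-directories of data/ as listed under "Repository structure".
EXPECTED_DATA_DIRS = [
    "outputs/checkpoints",
    "outputs/clusters",
    "outputs/embeddings",
    "outputs/figures",
    "outputs/tables",
    "processed/features",
    "processed/graphs",
    "processed/isochrones",
    "raw/gtfs",
    "raw/output_area",
    "raw/overture",
]


def missing_data_dirs(repo_root):
    """Return the expected data/ sub-directories that are absent under repo_root.

    Hypothetical helper for illustration; it only checks that the
    directories exist, not what they contain.
    """
    data_root = Path(repo_root) / "data"
    return [d for d in EXPECTED_DATA_DIRS if not (data_root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs(".")
    if missing:
        print("Missing directories under data/:")
        for name in missing:
            print(f"  {name}")
    else:
        print("data/ matches the expected structure.")
```

Run it from the repository root; an empty result means every listed directory is present.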

Models and baselines

  • GATGAE: 2-layer GAT encoder with DistMult structure decoder for the homogeneous contiguity graph.
  • HANGAE: 2-layer HAN encoder with semantic attention across metapaths and a DistMult decoder per relation.
  • run_kmeans: K-Means clustering for embeddings and baseline feature clustering.
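Both autoencoders above are described as using a DistMult decoder. As a rough illustration of what such a decoder computes, the sketch below scores a candidate edge from two node embeddings and a relation vector using the standard DistMult form, sum_k z_u[k] * r[k] * z_v[k]. It is written in plain Python for readability and is not the code in src/models; the function names and the example vectors are made up for illustration.

```python
import math


def distmult_score(z_u, z_v, relation):
    """Standard DistMult score: sum over k of z_u[k] * relation[k] * z_v[k].

    z_u and z_v are node embeddings; relation is a per-relation weight
    vector of the same length (one vector per edge type in a
    heterogeneous graph).
    """
    if not (len(z_u) == len(z_v) == len(relation)):
        raise ValueError("embeddings and relation vector must have equal length")
    return sum(a * r * b for a, r, b in zip(z_u, relation, z_v))


def edge_probability(z_u, z_v, relation):
    """Squash the DistMult score to (0, 1) with a logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-distmult_score(z_u, z_v, relation)))


# Illustrative values only (not taken from the case study).
z_u = [1.0, 2.0]
z_v = [0.5, -1.0]
r = [2.0, 1.0]
print(distmult_score(z_u, z_v, r))  # 1*2*0.5 + 2*1*(-1) = -1.0
```

Because the score is a sum of element-wise products, it is symmetric in the two nodes: swapping z_u and z_v gives the same value.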

Quickstart (notebooks)

  1. Prepare the data in data/
  2. Run notebooks/01_data_processing.ipynb

     Figures: fig8-1_land_use, fig8-2_poi

  3. Run notebooks/02_graph_construction.ipynb / notebooks/05_visualization.ipynb

     Figures: fig9_liverpool_contig, fig10_liverpool_metapaths

  4. Run notebooks/03_model_training.ipynb

     Figures: model-1, model-2, model-3, model-4, model-1s, model-2s, model-3s, model-4s

  5. Run notebooks/04_evaluation.ipynb

     Figures: clusters, similarity, isochrones
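The notebooks can also be executed without opening Jupyter. The sketch below builds one `jupyter nbconvert --execute` command per numbered notebook and is only an illustration: the helper name notebook_commands is hypothetical, it assumes nbconvert is available in the uv environment, and it sorts by numeric prefix, which may not match the order used in the steps above (they pair 02 with 05). The appendix notebook is skipped because it has no numeric prefix.

```python
from pathlib import Path


def notebook_commands(notebook_dir="notebooks"):
    """Build one nbconvert command per numbered notebook, sorted by file name.

    Hypothetical helper: it only returns the argument lists and does not
    run anything. Ordering by numeric prefix is an assumption.
    """
    notebooks = sorted(Path(notebook_dir).glob("[0-9][0-9]_*.ipynb"))
    return [
        [
            "uv", "run", "jupyter", "nbconvert",
            "--to", "notebook", "--execute", "--inplace", str(path),
        ]
        for path in notebooks
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for command in notebook_commands():
        print(" ".join(command))
```

To actually run them, each argument list could be passed to subprocess.run(command, check=True) so execution stops at the first failing notebook.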

Outputs

Results (embeddings, clusters, tables, and figures) are written under data/outputs/.

Reproducibility note

This case study uses uv for dependency management and environment reproducibility.

  • Dependency specification: pyproject.toml
  • Resolved, reproducible lockfile: uv.lock
  • Python version pin: .python-version (3.12.8)

To reproduce the exact environment from this repository:

uv sync
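After syncing, it can be useful to confirm that the interpreter in the environment matches the pin in .python-version. This is a minimal sketch under the assumption that the file holds a plain dotted version such as 3.12.8; the helper name matches_python_pin is hypothetical.

```python
import sys
from pathlib import Path


def matches_python_pin(pin_file=".python-version", running=None):
    """Return True if the running Python version matches the pinned one.

    Only the components present in the pin are compared, so a pin of
    "3.12" accepts any 3.12.x interpreter. Assumes the file contains a
    plain dotted version string.
    """
    pinned = Path(pin_file).read_text().strip()
    if running is None:
        running = ".".join(str(part) for part in sys.version_info[:3])
    pinned_parts = pinned.split(".")
    running_parts = running.split(".")[: len(pinned_parts)]
    return running_parts == pinned_parts
```

Calling matches_python_pin() from the repository root inside the uv environment (for example via uv run python) should return True when the pin is respected.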

To verify installed package versions in the uv environment:

uv run python - <<'PY'
from importlib.metadata import version

packages = [
  "city2graph",
  "contextily",
  "geopandas",
  "hdbscan",
  "ipykernel",
  "jupyter",
  "mapclassify",
  "matplotlib",
  "matplotlib-scalebar",
  "networkx",
  "numpy",
  "pandas",
  "PyYAML",
  "scikit-learn",
  "seaborn",
  "splot",
  "torch",
  "torch-geometric",
  "torchaudio",
  "torchvision",
]

for pkg in packages:
  print(f"{pkg}=={version(pkg)}")
PY

This case study was run on an Apple M2 (ARM) CPU with 16 GB RAM; CUDA was not used.

Data sources and copyright

| Source | Data used | License / attribution | Source URL(s) |
| --- | --- | --- | --- |
| Office for National Statistics (ONS) | Output Areas (Dec 2021) EW BGC V2 boundaries; Output Areas (Dec 2021) population-weighted centroids V3 | Open Government Licence v3.0; Contains OS data © Crown copyright and database right 2023 (boundaries). © Crown copyright and database right 2024 (centroids). See https://www.ons.gov.uk/methodology/geography/licences. | https://geoportal.statistics.gov.uk/datasets/6beafcfd9b9c4c9993a06b6b199d7e6d_0; https://geoportal.statistics.gov.uk/datasets/ons::output-areas-december-2021-ew-population-weighted-centroids-v3 |
| Overture Maps Foundation | Places (POIs), Base (land_use), Transportation (segment + connector), release 2025-12-17.0 | © OpenStreetMap contributors, Overture Maps Foundation. Accessed on January 28th, 2026. See https://docs.overturemaps.org/attribution/. | https://overturemaps.org |
| UK Department for Transport (DfT) | Bus Open Data (GTFS timetables), North West feed (accessed Dec 10, 2025) | Open Government Licence v3.0; © Crown copyright. See https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/. | https://findtransportdata.dft.gov.uk/dataset/bus-open-data---download-all-timetable-data--18335fb19c4 |
| Metropolitan Transportation Authority (MTA) | GTFS schedules for NYC Subway (used in notebook samples) | Use is subject to MTA data feed terms and conditions. See https://www.mta.info/developers/terms-and-conditions | https://www.mta.info/developers |
| NY Open Data | MTA Subway Origin–Destination Ridership Estimate: Beginning 2025 (used in notebook samples) | Attribution in dataset metadata: “Metropolitan Transportation Authority”, with attribution link https://www.mta.info/open-data. | https://data.ny.gov/Transportation/MTA-Subway-Origin-Destination-Ridership-Estimate-B/y2qv-fytt |
