feat: add Vietnamese (vi) locale #641

Open
nhannht wants to merge 17 commits into wanasit:master from nhannht:feat/vi-locale

Conversation

@nhannht nhannht commented Apr 12, 2026

Summary

This PR adds a full Vietnamese (vi) locale to chrono-node, bringing it to parity with the English locale in terms of parser and refiner coverage.

What's included

11 parsers:

  • VIStandardParser: ngày D tháng M năm YYYY (with/without ngày prefix, optional year)
  • VIMonthYearParser: tháng M năm YYYY / tháng M/YYYY
  • VIYearParser: standalone năm YYYY and bare YYYY TCN (BC years, no năm prefix required)
  • VICasualDateParser: hôm nay, hôm qua, ngày mai, ngày kia, bây giờ
  • VICasualTimeParser: sáng, trưa, chiều, tối, đêm, nửa đêm, bình minh, sáng sớm
  • VITimeExpressionParser: lúc/vào 7 giờ 30 phút chiều, 15:30
  • VIWeekdayParser: thứ hai–thứ bảy, chủ nhật, with tới/sau/qua modifiers
  • VITimeUnitAgoFormatParser: 3 ngày trước, 1 tháng qua
  • VITimeUnitLaterFormatParser: 2 tuần sau, 3 ngày nữa
  • VITimeUnitWithinFormatParser: trong vòng 2 giờ
  • VITimeUnitCasualRelativeFormatParser: tuần này/trước/sau, tháng này/trước/sau (number optional; bare unit words supported)

3 refiners:

  • VIMergeDateTimeRefiner — merges date + time results separated by lúc/vào
  • VIMergeDateRangeRefiner — merges date ranges separated by –/đến/tới/và
  • VIMergeWeekdayComponentRefiner — standard weekday merge

16 test files, 95 tests covering standard dates, casual dates/times, time expressions, weekdays, slash dates, date ranges, month/year, year, time units (ago/later/within/casual-relative), strict mode, forward date, negative cases, and isCertain/toBeDate assertions.

Design notes

  • YEAR_PATTERN supports BC years via TCN suffix (trước Công nguyên); VIYearParser also handles bare YYYY TCN without a năm prefix (e.g. "179 TCN")
  • parseYear() calls findMostLikelyADYear() for short years (e.g. 1945 stays 1945, 75 → 1975)
  • Meridiem mapping: sáng=AM (12 giờ sáng=midnight), trưa=noon (~11 AM–1 PM), chiều=15h PM, tối=19h PM, đêm=22h PM, nửa đêm=0h AM, bình minh/sáng sớm=6h AM
  • VITimeUnitCasualRelativeFormatParser uses an optional-number pattern so bare unit words (tuần này, tháng trước) match without a numeric prefix; defaults to quantity 1
  • strictMode constructor parameter on all relative unit parsers (consistent with DE/FR)
  • Exports added to src/index.ts
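
The year handling in the first two design notes can be sketched roughly as follows. This is a self-contained illustration, not the PR's code: `parseViYearSketch`, `expandShortYearSketch`, and the regex are hypothetical names, the pivot of 50 for two-digit years is an assumption standing in for whatever findMostLikelyADYear() does, and returning BC years as negative numbers is likewise assumed.

```typescript
// Hypothetical stand-in for findMostLikelyADYear(); the pivot of 50 is an assumption.
function expandShortYearSketch(year: number): number {
    if (year >= 100) return year; // already a full year, e.g. 1945
    return year > 50 ? 1900 + year : 2000 + year;
}

// Accepts "năm 1975", "75", "179 TCN", and "năm 179 TCN".
const VI_YEAR_SKETCH = /^(?:năm\s+)?(\d{1,4})(\s*TCN)?$/i;

function parseViYearSketch(text: string): number | null {
    const m = VI_YEAR_SKETCH.exec(text.trim());
    if (m === null) return null;
    const value = parseInt(m[1], 10);
    // A TCN suffix marks a BC year; it is represented here as a negative number.
    if (m[2] !== undefined) return -value;
    return expandShortYearSketch(value);
}
```

Under these assumptions, `parseViYearSketch("75")` gives 1975, `parseViYearSketch("1945")` gives 1945, and `parseViYearSketch("179 TCN")` gives -179.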

Bug fixes included (found during self-review)

  • VITimeExpressionParser: trưa meridiem was incorrectly grouped with chiều/tối/đêm, so 11 giờ trưa returned 23:00 instead of 11:00 AM. Fixed by separating trưa handling: hour < 10 → PM (+12), hour 10-11 → AM (keep as-is), hour 12 → PM (noon)
  • VITimeExpressionParser: 12 giờ sáng returned noon instead of midnight — added if (hour === 12) hour = 0 guard matching EN convention
  • VITimeExpressionParser: added minute validation — reject minute >= 60 (matching AbstractTimeExpressionParser behavior)
  • VICasualTimeParser: removed redundant \b from PATTERN — JS \b fails on Vietnamese đ (non-ASCII), silently breaking standalone đêm parsing. AbstractParserWithWordBoundaryChecking already provides left-boundary via (\W|^)
  • VIWeekdayParser: modifier group was non-capturing, making qua (last) undetectable; fixed to capturing group; also corrected quả typo → qua
  • VITimeUnitCasualRelativeFormatParser: bare tuần này/tháng trước never matched due to required numeric prefix in TIME_UNITS_PATTERN; replaced with optional-number pattern
  • package-lock.json: reverted to upstream — removed unrelated lockfile drift (dayjs removal, peer dependency changes)
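
The three VITimeExpressionParser fixes listed above amount to a small hour-adjustment rule. The sketch below is a hypothetical reconstruction from the bullet text, not the parser's actual code: `resolveViHourSketch` and `ViMeridiemWord` are invented names, and the handling of hour 12 with chiều/tối/đêm is not described in this PR, so it is simply left unchanged here.

```typescript
type ViMeridiemWord = "sáng" | "trưa" | "chiều" | "tối" | "đêm";

interface ViTimeSketch {
    hour: number;   // 24-hour value
    minute: number;
}

function resolveViHourSketch(hour: number, minute: number, word: ViMeridiemWord): ViTimeSketch | null {
    // Minute validation: reject impossible minutes such as 61 or 99.
    if (minute >= 60 || hour > 23) return null;
    // Hours above 12 are already in 24-hour form.
    if (hour > 12) return { hour, minute };
    switch (word) {
        case "sáng":
            // 12 giờ sáng is midnight, matching the EN convention.
            return { hour: hour === 12 ? 0 : hour, minute };
        case "trưa":
            // hour < 10 is PM (+12); 10 and 11 stay AM; 12 is noon.
            return { hour: hour < 10 ? hour + 12 : hour, minute };
        default:
            // chiều, tối, đêm: PM for hours below 12 (hour 12 left as-is, an assumption).
            return { hour: hour < 12 ? hour + 12 : hour, minute };
    }
}
```

With this rule, 11 giờ trưa resolves to 11:00, 1 giờ trưa to 13:00, and 12 giờ sáng to 00:00.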

🤖 Generated with Claude Code

@nhannht nhannht marked this pull request as draft April 12, 2026 11:24
@nhannht nhannht marked this pull request as ready for review April 15, 2026 11:33
nhannht added a commit to nhannht/obsidian-historica that referenced this pull request Apr 15, 2026
- vendor/chrono: fork of wanasit/chrono as git submodule on feat/vi-locale
  branch; contains full VI locale (11 parsers, 3 refiners, 41 tests)
- package.json: switch chrono-node to file:./vendor/chrono so VI locale
  is available at runtime
- tsconfig.json: exclude vendor/ from root tsc to prevent test-file errors
- ChronoParser.ts: add setupCustomChronoVi() and vi case in
  getParserForLanguage(); vie already mapped in FRANC_TO_LOCALE

PR to upstream: wanasit/chrono#641

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Owner

@wanasit wanasit left a comment

Thanks for the change. Could you remove some of the changes I mentioned in the comment?

Comment thread test/vi/vi_corpus.test.ts Outdated
Owner

I don't think this test and its fixtures suit the project.

  • It doesn't test specific patterns we want to support.
  • Ensuring 85%+ accuracy on a corpus is not useful, and a failure would be difficult to debug.
  • Embedding a corpus also makes the project too large.

Author

Thank you for the feedback — completely agree on all three points.

Removed test/vi/vi_corpus.test.ts and the entire test/vi/fixtures/ directory (19 Wikipedia articles + curated JSON). The corpus added ~13k lines and wasn't testing specific parser patterns.

The VI locale now has 92 targeted unit tests across 16 suites, covering all 11 parsers and 3 refiners — including strict mode, forwardDate, isCertain() assertions, and negative cases. All tests follow the same testSingleCase/testUnexpectedResult conventions used by the EN locale.

Comment thread tsconfig.build.json Outdated
  "outDir": "dist/cjs",
- "module": "commonjs"
+ "module": "commonjs",
+ "typeRoots": ["./node_modules/@types"]
Owner

Please do not modify the compiler options and dependencies in the same commit. If you think this is necessary or good for the project, please create a separate CL.

Author

Done — reverted tsconfig.build.json to match upstream. The typeRoots change was needed to resolve a build issue on my end, but I understand the preference to keep compiler/dependency changes in a separate CL. I'll submit that separately if needed.

nhannht and others added 15 commits April 20, 2026 11:14
Implements full Vietnamese date/time parsing for chrono-node with
EN-locale parity. Parsers cover all major Vietnamese temporal patterns
extracted from 19 Wikipedia war/history articles (1132 annotated fixtures).

Parsers:
- VIStandardParser      — ngày D tháng M năm YYYY / D tháng M năm YYYY
- VIMonthYearParser     — tháng M năm YYYY
- VIYearParser          — năm YYYY, năm N TCN (BC)
- VICasualDateParser    — hôm nay, hôm qua, ngày mai, ngày kia, bây giờ
- VICasualTimeParser    — buổi sáng/trưa/chiều/tối/đêm, nửa đêm
- VIWeekdayParser       — thứ Hai–CN, t2–t7/cn abbreviations
- VITimeExpressionParser — X giờ Y phút, HH:MM, lúc/vào prefixes, meridiem
- VITimeUnitAgoFormatParser     — N ngày/tuần/tháng/năm trước/qua
- VITimeUnitLaterFormatParser   — N ngày/tuần/tháng/năm sau/nữa/tới
- VITimeUnitWithinFormatParser  — trong (vòng) N ngày/tuần/tháng
- VITimeUnitCasualRelativeFormatParser — tuần này/trước/tới, tháng sau

Refiners:
- VIMergeDateTimeRefiner, VIMergeDateRangeRefiner, VIMergeWeekdayComponentRefiner

Common parsers inherited: ISOFormatParser, SlashDateFormatParser (DD/MM/YYYY)

Tests: 10 test files, 41 cases, all passing.
Wikipedia corpus: 19 articles, 1132 annotated date fixtures in
test/vi/fixtures/wikiwars_vi_curated.json

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- VIStandardParser: improve boundary handling
- VICasualTimeParser: fix meridiem mapping and implied hours
- VITimeExpressionParser: correct colon-format minute group
- VIWeekdayParser: fix modifier detection
- vi_standard.test.ts: remove unreliable ngày-32 partial-match assertion
- vi_weekday.test.ts: align modifier expectations
- wikiwars_vi_curated.json: minor annotation correction
… typo

Non-capturing (?:...) on modifier group meant match[MODIFIER_GROUP] was
always undefined, making next/last weekday logic dead code. Changed to a
capturing group so modifier text is available. Also corrected the typo
'quả' (fruit) → 'qua' (past) in the last-weekday branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
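
The effect of the non-capturing modifier group can be reproduced in isolation. The patterns below are simplified, hypothetical stand-ins (the real VIWeekdayParser pattern is not shown on this page); they differ only in whether the modifier alternation is captured.

```typescript
// Group 1 is the weekday; group 2 is meant to hold the modifier.
const MODIFIER_GROUP_SKETCH = 2;

// Modifier wrapped in (?:...): nothing is ever stored at index 2.
const NON_CAPTURING_SKETCH = /(thứ hai|chủ nhật)(?:\s+(?:tới|sau|qua))?/i;

// Same pattern with a capturing group around the modifier words.
const CAPTURING_SKETCH = /(thứ hai|chủ nhật)(?:\s+(tới|sau|qua))?/i;

function weekdayModifierSketch(pattern: RegExp, text: string): string | undefined {
    const m = pattern.exec(text);
    return m === null ? undefined : m[MODIFIER_GROUP_SKETCH];
}
```

For the input "thứ hai qua", the non-capturing variant yields undefined while the capturing variant yields "qua", which is what the next/last branch needs to see.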
…er group index

TIME_UNITS_PATTERN contains zero capturing groups (repeatedTimeunitPattern
strips them all to non-capturing). Group layout is:
  1 = prefix modifier, 2 = unit (prefix form)
  3 = unit (suffix form), 4 = suffix modifier

match[5] was off by one, so suffix-form 'tuần trước' / 'tháng qua' always
fell through to an undefined modifier and produced a future date instead of
past. Fixed to match[4].

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
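
One way to guard against this kind of off-by-one is to count the capturing groups of an embedded pattern rather than assume them. The helper and both patterns below are hypothetical illustrations, not code from the PR; `INNER_UNITS_SKETCH` merely stands in for a TIME_UNITS_PATTERN that contains only non-capturing groups.

```typescript
// Counts capturing groups by appending an empty alternative: the new pattern
// always matches "", and the match array has one slot per group plus one
// for the full match.
function countCapturingGroupsSketch(pattern: RegExp): number {
    const m = new RegExp(pattern.source + "|").exec("");
    return m === null ? 0 : m.length - 1;
}

// Hypothetical inner pattern: optional count plus unit, no capturing groups.
const INNER_UNITS_SKETCH = /(?:\d+\s*)?(?:ngày|tuần|tháng|năm)/;

// Hypothetical outer pattern: group 1 wraps the inner pattern, group 2 is the modifier.
const OUTER_SKETCH = new RegExp(`(${INNER_UNITS_SKETCH.source})\\s+(trước|qua)`, "i");
```

Because the inner pattern contributes zero groups, the modifier in `OUTER_SKETCH` sits at index 2; reading index 3 returns undefined at runtime even though TypeScript types it as a string.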
In chrono's 12-hour convention, AM = hours 0–11, PM = hours 12–23.
Noon (12:00) is PM. The AM assignment caused noon to be interpreted as
midnight in downstream meridiem-aware code. Added inline comment to
explain the convention for future readers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
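
The convention stated in this commit message can be written down in a few lines. `MeridiemSketch` and `meridiemForHourSketch` are local, hypothetical names for illustration and are not imported from chrono.

```typescript
enum MeridiemSketch {
    AM = 0,
    PM = 1,
}

// Hours 0-11 are AM and hours 12-23 are PM, so noon (12) is PM.
function meridiemForHourSketch(hour24: number): MeridiemSketch {
    return hour24 < 12 ? MeridiemSketch.AM : MeridiemSketch.PM;
}
```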
… never used

All three relative-unit parsers accepted a strictMode constructor param but
innerPattern() always returned the casual PATTERN, making the parameter dead
code. Added STRICT_PATTERN (aliased to PATTERN — VI has no unit abbreviations
so both modes are identical) and switched innerPattern() to return the correct
variant. Matches the API contract established by EN/IT locales.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
STRICT_PATTERN was assigned as an alias for PATTERN (since VI has no unit
abbreviations) and then used in a ternary that always evaluated the same
branch. Remove the alias and dead conditional; move the explanatory comment
onto innerPattern() where it belongs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
vi_corpus.test.ts: exercises all 1132 WikiWars-VI fixtures covering
full_date, month_year, year_only, slash_date, and bc_year expression
types. Accuracy: 1132/1132 (100%).

VIYearParser: extend pattern to also match bare 'YYYY TCN' without the
'năm' prefix (e.g. '179 TCN'). Previously only 'năm YYYY TCN' was
supported, leaving bare BC year expressions unparsed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s, number-words, weekday modifiers

New test files:
- vi_month_year.test.ts: VIMonthYearParser (previously zero coverage) —
  tháng M năm YYYY, tháng M/YYYY slash form, month-only implies year,
  month > 12 rejected
- vi_casual_time.test.ts: VICasualTimeParser — trưa=PM, bình minh/sáng
  sớm=AM, chiều/tối/đêm=PM, nửa đêm=AM, buổi prefix, date+time merge
- vi_negative_cases.test.ts: invalid day/month, invalid slash date, bare
  4-digit number, phone number false positive

Expanded existing files:
- vi_weekday.test.ts: next (tới/sau) and last (qua) modifier assertions
  with concrete expected dates
- vi_time_units_ago.test.ts: number-word durations (hai/ba/một)
- vi_time_units_later.test.ts: number-word durations (ba/hai)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- VITimeExpressionParser: move trưa to PM branch (1 giờ trưa → 13:00);
  add hour=0 guard for 12 giờ sáng → midnight
- VIWeekdayParser: fix capturing group for modifier; correct quả→qua typo;
  add negative lookahead for 'sau khi' conjunction
- VITimeUnitCasualRelativeFormatParser: make number optional so bare
  'tuần này'/'tháng trước' match; use references helpers in casual date
- VIYearParser: support bare 'YYYY TCN' without năm prefix (e.g. 179 TCN)
- VIMonthYearParser: wire MONTH_DICTIONARY for word-form months
  (tháng ba, tháng giêng, tháng chạp); use references.yesterday/tomorrow
- VICasualDateParser: add hôm kia (-2 days); use references helpers
- README: add vi to supported locales list (lines 40 and 215)
- Tests: add vi_date_range, vi_casual_time, vi_month_year,
  vi_negative_cases; expand vi_time_exp, vi_weekday, vi_time_units_*
  (15 test files, 77 tests total)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
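
The optional-number change for VITimeUnitCasualRelativeFormatParser can be illustrated with a reduced pattern. Everything here is a hypothetical sketch: the regex, `parseCasualRelativeSketch`, and the signed `amount` result are assumptions chosen for the example, and only three modifiers are covered.

```typescript
// Optional count (group 1), unit (group 2), modifier (group 3).
const OPTIONAL_NUMBER_SKETCH = /(?:(\d+)\s+)?(ngày|tuần|tháng|năm)\s+(này|trước|sau)/i;

interface CasualRelativeSketch {
    unit: string;
    amount: number; // negative = past, 0 = current, positive = future
}

function parseCasualRelativeSketch(text: string): CasualRelativeSketch | null {
    const m = OPTIONAL_NUMBER_SKETCH.exec(text);
    if (m === null) return null;
    // A bare unit word such as "tuần này" has no count and defaults to quantity 1.
    const quantity = m[1] === undefined ? 1 : parseInt(m[1], 10);
    const modifier = m[3].toLowerCase();
    const sign = modifier === "trước" ? -1 : modifier === "sau" ? 1 : 0;
    return { unit: m[2].toLowerCase(), amount: sign * quantity };
}
```

With the count optional, "tháng trước" parses as one month back and "2 tuần sau" as two weeks ahead; with a required numeric prefix, the first input would not match at all.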
…ayParser

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Maintainer feedback: corpus benchmark tests don't fit the project's
testing philosophy, and embedding the corpus makes the project too large.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Maintainer wants compiler/dependency changes in a separate CL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds expect(r.index).toBe(...) checks to sentence-embedded test cases
across casual, standard, time expression, weekday, slash, ago, and
later test files — matching the pattern used in EN/DE test suites.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes test coverage gaps identified by comparing EN (154 tests) with VI:

- vi_strict.test.ts: verify strict mode rejects casual/weekday-only
  expressions while accepting standard dates and explicit time units
- vi_forward_date.test.ts: verify forwardDate option rolls time-only,
  weekday, slash date, and month expressions to future dates
- vi_casual.test.ts: add isCertain assertions for casual date components
- vi_time_exp.test.ts: add isCertain assertions for hour and meridiem
- vi_standard.test.ts: add isCertain assertions for full and partial dates
- vi_negative_cases.test.ts: add bare numbers, currency, version numbers,
  hyphenated ranges, and URL-encoded string rejection tests

VI locale: 75 → 92 tests across 14 → 16 suites.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nhannht nhannht marked this pull request as draft April 20, 2026 04:16
Author

nhannht commented Apr 20, 2026

Hi @wanasit, thank you for the review. I've addressed both comments:

  1. Corpus test removed — deleted test/vi/vi_corpus.test.ts and all test/vi/fixtures/ (19 Wikipedia .txt files + curated JSON). Replaced with 92 targeted unit tests across 16 suites that test specific patterns, following the same testSingleCase/testUnexpectedResult conventions as the EN locale.

  2. tsconfig.build.json reverted — restored to match upstream exactly. Will submit compiler changes as a separate CL if needed.

Additionally:

  • Rebased on upstream/master to incorporate the Finnish locale merge (feat: Add Finnish (fi) locale support #642)
  • Added strict mode, forwardDate, isCertain(), and expanded negative case tests to match EN test quality

All 92 tests pass. Will mark as ready for review once I've done a final check.

nhannht and others added 2 commits April 20, 2026 11:41
… drift

- VITimeExpressionParser: separate "trưa" handling from "chiều/tối/đêm" —
  "11 giờ trưa" is 11 AM (approaching noon), not 23:00
- VICasualTimeParser: remove redundant \b from PATTERN — JS \b fails on
  Vietnamese đ (non-ASCII), breaking standalone "đêm" parsing.
  AbstractParserWithWordBoundaryChecking already provides left-boundary
- Revert package-lock.json to upstream (unrelated lockfile drift)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…edge cases

- VITimeExpressionParser: reject minute >= 60 (matches AbstractTimeExpressionParser)
- vi_casual: add toBeDate() assertions for today/yesterday/tomorrow/now
- vi_forward_date: add same-weekday edge case (stays on same day)
- vi_strict: add slash dates acceptance test (30/4/1975, 15/3)
- vi_negative_cases: add impossible minute test (61, 99)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author

nhannht commented Apr 20, 2026

Pushed two additional commits addressing issues found during self-review:

86f0076 fix(vi): fix trưa meridiem bug, \b word boundary bug, revert lockfile drift

  • VITimeExpressionParser: "trưa" was incorrectly grouped with "chiều/tối/đêm", causing 11 giờ trưa to return 23:00 instead of 11:00 AM
  • VICasualTimeParser: removed redundant \b from regex — JS \b doesn't work with Vietnamese đ (non-ASCII), silently breaking standalone đêm parsing
  • Reverted package-lock.json to match upstream (unrelated lockfile drift)
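
The \b failure is easy to reproduce outside chrono. In JavaScript regular expressions, \b is defined in terms of \w, which covers only ASCII letters, digits, and underscore, so there is no word boundary in front of a word that starts with đ. The patterns below are simplified, hypothetical examples rather than the parser's real PATTERN.

```typescript
// Leading \b: requires a \w character on exactly one side of the position.
const WITH_WORD_BOUNDARY_SKETCH = /\b(sáng|đêm)/i;

// Left boundary expressed as "non-word character or start of input",
// similar in spirit to the (\W|^) prefix mentioned above.
const WITH_LEFT_BOUNDARY_SKETCH = /(\W|^)(sáng|đêm)/i;

function matchesCasualTimeSketch(pattern: RegExp, text: string): boolean {
    return pattern.test(text);
}
```

Here the \b variant matches "sáng" (ASCII s follows the start of input) but not "đêm" or "vào đêm", while the (\W|^) variant matches all three.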

8a476a6 test(vi): add minute validation, toBeDate assertions, strict/forward edge cases

  • Added minute >= 60 rejection in VITimeExpressionParser (matching AbstractTimeExpressionParser)
  • Added toBeDate() assertions to casual tests (matching EN conventions)
  • Added strict mode slash date acceptance test, forwardDate same-weekday edge case, impossible minute negative test

Test count: 95 tests across 16 suites, all passing. Full suite (610 tests / 130 suites) also clean.

@nhannht nhannht marked this pull request as ready for review April 20, 2026 04:59