License Restructuring and Dataset Removal Implementation Plan

For agentic workers: REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Remove SWOW (CC BY-NC-ND 3.0), IPhOD2 (GPL v2), and PHOIBLE remnants (CC BY-SA 3.0) from PhonoLex, then restructure licensing to Apache 2.0 + Proprietary split.

Architecture: Six targeted removal/restructuring tasks across the monorepo. Each removes a dataset or updates licensing. No new features. All deferred tasks (phonotactic recalculation, pickle regeneration) are documented but not implemented here.

Tech Stack: Python (data loaders, pipeline, governors, dashboard), TypeScript (web API, frontend, dashboard frontend), licensing files


File Structure

No new source code files. Changes to existing files only, plus new LICENSE/NOTICE files.

Task 1 (SWOW removal):

- Modify: packages/data/src/phonolex_data/loaders/associations.py — delete load_swow()
- Modify: packages/data/src/phonolex_data/loaders/__init__.py — remove exports
- Modify: packages/data/src/phonolex_data/pipeline/edges.py — remove SWOW edge assembly
- Modify: packages/data/src/phonolex_data/pipeline/schema.py — remove swow_strength field
- Modify: packages/governors/src/phonolex_governors/thematic.py — update signature + docstrings
- Modify: packages/governors/src/phonolex_governors/boosts.py — update comment
- Modify: packages/dashboard/server/model.py — stop loading SWOW
- Modify: packages/web/workers/src/config/edgeTypes.ts — remove SWOW entry
- Modify: packages/web/workers/scripts/config.py — remove SWOW entry
- Modify: packages/web/workers/src/types.ts — remove swow_strength
- Modify: packages/web/workers/src/routes/associations.ts — remove swow_strength
- Modify: packages/web/workers/scripts/export-to-d1.py — remove swow_strength
- Modify: packages/web/frontend/src/types/phonology.ts — remove edge fields
- Modify: packages/web/frontend/src/services/apiClient.ts — remove swow_strength
- Modify: packages/web/frontend/src/components/tools/LookupTool.tsx — remove SWOW column/display
- Modify: packages/web/frontend/src/components/AppHeader.tsx — update stats/chip text
- Modify: packages/web/frontend/public/landing/phonological-therapy-materials.html — update copy
- Modify: packages/web/workers/src/__tests__/api.test.ts — update edge-types assertions
- Modify: packages/data/tests/test_pipeline.py — update EdgeRecord test

Task 2 (IPhOD2 removal):

- Modify: packages/data/src/phonolex_data/loaders/norms.py — delete load_iphod()
- Modify: packages/data/src/phonolex_data/loaders/__init__.py — remove exports
- Modify: packages/data/src/phonolex_data/pipeline/schema.py — remove 6 IPhOD fields
- Modify: packages/data/src/phonolex_data/pipeline/words.py — remove IPhOD import/loader/mapping
- Modify: packages/web/workers/src/config/properties.ts — remove PHONOTACTIC_PROBABILITY category
- Modify: packages/web/workers/scripts/config.py — remove PHONOTACTIC_PROBABILITY category
- Modify: packages/web/workers/scripts/export-to-d1.py — remove 6 property + 6 percentile columns
- Modify: packages/web/frontend/src/types/phonology.ts — remove 18 IPhOD fields
- Modify: packages/dashboard/server/schemas.py — remove biphone_avg/pos_seg_avg from Phono
- Modify: packages/dashboard/frontend/src/types.ts — remove from Phono interface
- Modify: packages/dashboard/scripts/build_lookup.py — remove fallback writes
- Modify: packages/dashboard/server/model.py — remove IPhOD field population
- Modify: packages/data/tests/test_new_loaders.py — remove test_load_iphod
- Modify: packages/data/tests/test_datasets.py — remove test_load_phonotactic_probability
- Modify: packages/dashboard/server/tests/test_schemas.py — remove biphone_avg/pos_seg_avg

Task 3 (PHOIBLE remnants):

- Delete: packages/data/src/phonolex_data/loaders/phoible.py
- Modify: packages/data/src/phonolex_data/loaders/__init__.py — remove exports
- Modify: packages/web/workers/src/lib/similarity.ts — update comments
- Modify: packages/web/frontend/src/components/PhonemePickerDialog.tsx — update comment
- Modify: packages/features/src/phonolex_features/validate.py — update docstrings
- Modify: packages/data/src/phonolex_data/pipeline/schema.py — update DerivedData docstring
- Modify: packages/data/tests/test_datasets.py — remove PHOIBLE tests

Task 4 (License restructuring):

- Rewrite: LICENSE
- Create: packages/governors/LICENSE
- Create: packages/dashboard/LICENSE
- Create: NOTICE
- Rewrite: docs/about/license.md
- Modify: docs/about/citations.md

Task 5 (Documentation updates):

- Modify: CLAUDE.md
- Modify: README.md
- Delete or stub: docs/reference/phoible-features.md

Task 6 (TypeScript fix — already implemented):

- Modify: packages/web/frontend/src/components/tools/ContrastiveInterventionTool.tsx


Task 1: Remove SWOW

Files: See file structure above (19 files)

  • [ ] Step 1: Remove load_swow() from associations.py

Delete lines 11-49 (the entire load_swow function) from packages/data/src/phonolex_data/loaders/associations.py.

  • [ ] Step 2: Remove SWOW from loaders __init__.py

In packages/data/src/phonolex_data/loaders/__init__.py:

- Line 10: remove load_swow from the import
- Line 24: remove "load_swow" from __all__

The import line becomes:

from phonolex_data.loaders.associations import load_free_association, load_simlex, load_men, load_wordsim, load_spp, load_eccc

The __all__ line becomes:

    "load_free_association", "load_simlex", "load_men", "load_wordsim", "load_spp", "load_eccc",

  • [ ] Step 3: Remove swow_strength from EdgeRecord

In packages/data/src/phonolex_data/pipeline/schema.py, delete line 91:

    swow_strength: float | None = None

  • [ ] Step 4: Remove SWOW edge assembly from edges.py

In packages/data/src/phonolex_data/pipeline/edges.py:

- Remove load_swow from the imports (line 6)
- Delete lines 41-56 (the entire # 1. SWOW block)
- Update the docstring on build_edges (line 32) from "7 association datasets" to "6 association datasets"

  • [ ] Step 5: Update thematic.py — remove SWOW parameter

In packages/governors/src/phonolex_governors/thematic.py:

Update module docstring (lines 1-9):

"""ThematicConstraint — semantic field boosting via cognitive association graphs.

Uses free-association data (USF) to define semantic fields from seed
words and boost tokens whose corresponding vocabulary word is associated with
any of the seeds.

    ThematicConstraint(seed_words=["dog", "cat"], strength=1.5, threshold=0.02)
        → LogitBoost where each token's boost = max_over_seeds(assoc_strength) * strength
          (zero for words below threshold; zero for words with no association data)
"""

Update build_assoc_graph (lines 32-61):

def build_assoc_graph(
    usf: dict[str, dict[str, float]],
) -> AssocGraph:
    """Build an undirected association graph from USF free-association data.

    Input is a ``{cue: {response: strength}}`` dict.  The result uses canonical
    ``(min(a, b), max(a, b))`` tuple keys so lookups are order-independent.

    Args:
        usf: USF association data.

    Returns:
        AssocGraph — a flat dict keyed by canonical (word, word) tuples.
    """
    merged: AssocGraph = {}

    for cue, responses in usf.items():
        cue_lower = cue.lower()
        for response, strength in responses.items():
            resp_lower = response.lower()
            key = (min(cue_lower, resp_lower), max(cue_lower, resp_lower))
            if strength > merged.get(key, 0.0):
                merged[key] = strength

    return merged
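
A quick sanity check of the canonical-key behavior the docstring describes (the function is inlined here so the example is self-contained; the tiny input dict is made up):

```python
AssocGraph = dict[tuple[str, str], float]


def build_assoc_graph(usf: dict[str, dict[str, float]]) -> AssocGraph:
    """Copy of the updated function above, inlined for a standalone check."""
    merged: AssocGraph = {}
    for cue, responses in usf.items():
        cue_lower = cue.lower()
        for response, strength in responses.items():
            resp_lower = response.lower()
            key = (min(cue_lower, resp_lower), max(cue_lower, resp_lower))
            if strength > merged.get(key, 0.0):
                merged[key] = strength
    return merged


# Canonical (min, max) keys make lookups order-independent, and the max
# strength wins when both directions of a pair appear in the data:
graph = build_assoc_graph({"Dog": {"cat": 0.3}, "cat": {"DOG": 0.1}})
assert graph == {("cat", "dog"): 0.3}
```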

Update ThematicConstraint class docstring (line 68):

    graph (USF free-association data).  The boost for each token is:

  • [ ] Step 6: Update boosts.py comment

In packages/governors/src/phonolex_governors/boosts.py line 44, change:

        """Build a boost from a dict of token_id → score.

        Useful for AFINN sentiment, SWOW association weights, etc.
        """
to:
        """Build a boost from a dict of token_id → score.

        Useful for AFINN sentiment, association weights, etc.
        """

  • [ ] Step 7: Update dashboard model.py — stop loading SWOW

In packages/dashboard/server/model.py, replace lines 209-214:

        from phonolex_data.loaders import load_swow, load_free_association
        from phonolex_governors.thematic import build_assoc_graph
        print("Loading association graph (SWOW + USF)...")
        swow = load_swow()
        usf = load_free_association()
        _assoc_graph = build_assoc_graph(swow, usf)
with:
        from phonolex_data.loaders import load_free_association
        from phonolex_governors.thematic import build_assoc_graph
        print("Loading association graph (USF)...")
        usf = load_free_association()
        _assoc_graph = build_assoc_graph(usf)

  • [ ] Step 8: Remove SWOW from web API types and config

In packages/web/workers/src/types.ts, delete swow_strength from both interfaces:

- Line 36 in EdgeRow: swow_strength: number | null;
- Line 141 in EdgeResponse: swow_strength?: number | null;

In packages/web/workers/src/config/edgeTypes.ts, delete lines 12-16:

  SWOW: {
    label: 'Free Association (SWOW)',
    description: 'Small World of Words — forward association strength',
    strength_key: 'swow_strength',
  },

In packages/web/workers/scripts/config.py, delete lines 658-662:

    "SWOW": {
        "label": "Free Association (SWOW)",
        "description": "Small World of Words — forward association strength",
        "strength_key": "swow_strength",
    },

In packages/web/workers/src/routes/associations.ts, delete line 16:

    swow_strength: row.swow_strength,

In packages/web/workers/scripts/export-to-d1.py, remove the "swow_strength" entry from EDGE_COLUMNS (line 63). The list becomes:

EDGE_COLUMNS = [
    "source", "target", "edge_sources",
    "usf_forward", "usf_backward",
    "men_relatedness",
    "eccc_consistency", "eccc_n_instances", "eccc_phoneme_distance",
    "spp_first_priming", "spp_other_priming",
    "spp_fas", "spp_lsa",
    "simlex_similarity", "simlex_pos",
    "wordsim_relatedness",
]

  • [ ] Step 9: Remove SWOW from frontend types and components

In packages/web/frontend/src/types/phonology.ts, delete the | 'SWOW' member from the EdgeType union (line 237), so the union now begins:

export type EdgeType =
  | 'USF'

In packages/web/frontend/src/services/apiClient.ts, delete line 51:

  swow_strength?: number;

In packages/web/frontend/src/components/tools/LookupTool.tsx:

- Delete line 97 (SWOW: { label: 'SWOW', color: '#1976d2' },)
- Delete line 399 (the SWOW % table header cell)
- Lines 412-414: change the strength fallback from edge.swow_strength ?? edge.usf_forward ?? ... to edge.usf_forward ?? ...
- Delete lines 432-434 (the SWOW % table data cell)

In packages/web/frontend/src/components/AppHeader.tsx:

- Line 353: remove the SWOW bullet and show USF instead
- Line 413: change "SWOW, USF, MEN, ECCC, SPP, SimLex-999, WordSim-353" to "USF, MEN, ECCC, SPP, SimLex-999, WordSim-353"

In packages/web/frontend/public/landing/phonological-therapy-materials.html line 187:

- Change "free association norms (SWOW, USF)" to "free association norms (USF)"
- Update "7 relationship types" to "6 relationship types"

  • [ ] Step 10: Update tests

In packages/web/workers/src/__tests__/api.test.ts:

- Line 63: change expect(keys.length).toBe(7) to expect(keys.length).toBe(6)
- Delete line 64: expect(keys).toContain('SWOW');
- Delete lines 69-72 (the SWOW property checks):

    const swow = body.SWOW as Record<string, unknown>;
    expect(swow).toHaveProperty('label');
    expect(swow).toHaveProperty('description');
    expect(swow).toHaveProperty('strength_key');

In packages/data/tests/test_pipeline.py, update test_edge_record_creation (lines 38-47):

    er = EdgeRecord(
        source="cat",
        target="dog",
        edge_sources=["USF"],
        usf_forward=0.08,
    )
    assert er.source == "cat"
    assert er.edge_sources == ["USF"]
    assert er.men_relatedness is None  # defaults to None

  • [ ] Step 11: Run tests
cd packages/web/workers && npm test
uv run python -m pytest packages/governors/tests/test_thematic.py -v
uv run python -m pytest packages/data/tests/test_pipeline.py -v

Expected: all pass.

  • [ ] Step 12: Commit
git commit -m "remove: SWOW dataset (CC BY-NC-ND 3.0) — incompatible license"

Task 2: Remove IPhOD2

Files: See file structure above (15 files)

  • [ ] Step 1: Remove load_iphod() from norms.py

In packages/data/src/phonolex_data/loaders/norms.py, delete lines 319-347 (the entire load_iphod function).

  • [ ] Step 2: Remove IPhOD from loaders __init__.py

In packages/data/src/phonolex_data/loaders/__init__.py:

- Remove load_iphod from the norms import (line 8)
- Remove "load_iphod" from __all__ (line 23)

  • [ ] Step 3: Remove 6 IPhOD fields from WordRecord

In packages/data/src/phonolex_data/pipeline/schema.py, delete lines 66-72:

    # Phonotactic probability (IPhOD)
    neighborhood_density: int | None = None
    phono_prob_avg: float | None = None
    positional_prob_avg: float | None = None
    str_phono_prob_avg: float | None = None
    str_positional_prob_avg: float | None = None
    str_neighborhood_density: int | None = None

  • [ ] Step 4: Remove IPhOD from pipeline words.py

In packages/data/src/phonolex_data/pipeline/words.py:

- Remove load_iphod from imports (line 19)
- Remove the 6 IPhOD entries from _NORM_FIELD_MAP (lines 68-74):

    # IPhOD
    "neighborhood_density": "neighborhood_density",
    "phono_prob_avg": "phono_prob_avg",
    "positional_prob_avg": "positional_prob_avg",
    "str_neighborhood_density": "str_neighborhood_density",
    "str_phono_prob_avg": "str_phono_prob_avg",
    "str_positional_prob_avg": "str_positional_prob_avg",

- Remove the IPhOD loader from the norm_loaders list (line 180):

        ("IPhOD", load_iphod),

  • [ ] Step 5: Remove PHONOTACTIC_PROBABILITY from properties.ts

In packages/web/workers/src/config/properties.ts, delete the entire block from line 59 ({) through line 112 (},) — the phonotactic_probability category object.

  • [ ] Step 6: Remove PHONOTACTIC_PROBABILITY from config.py

In packages/web/workers/scripts/config.py:

- Delete lines 89-162 (the entire PHONOTACTIC_PROBABILITY variable)
- Remove PHONOTACTIC_PROBABILITY from the PROPERTY_CATEGORIES tuple (line 615)
- Remove the two IPhOD integer columns from INTEGER_PROPERTY_COLUMNS:

    "neighborhood_density", "str_neighborhood_density",

  • [ ] Step 7: Remove IPhOD columns from export-to-d1.py

In packages/web/workers/scripts/export-to-d1.py, remove the 6 IPhOD entries from PROPERTY_COLUMNS (lines 42-44):

    "phono_prob_avg", "positional_prob_avg",
    "neighborhood_density",
    "str_phono_prob_avg", "str_positional_prob_avg", "str_neighborhood_density",

  • [ ] Step 8: Remove IPhOD from frontend types

In packages/web/frontend/src/types/phonology.ts:

- Delete lines 47-53 (the 6 IPhOD fields in the Word interface):

  // Phonotactic probability
  phono_prob_avg: number | null;
  positional_prob_avg: number | null;
  neighborhood_density: number | null;
  str_phono_prob_avg: number | null;
  str_positional_prob_avg: number | null;
  str_neighborhood_density: number | null;

- Delete lines 129-140 (the 12 min/max filter fields in the filters interface)

Also update the comment on line 30 from "35 filterable properties from 18 research datasets" to "29 filterable properties from 15 research datasets".

Note: the WordRow interface in packages/web/workers/src/types.ts uses dynamic [key: string] indexing, so there are no named IPhOD fields to remove. No change needed there.

  • [ ] Step 9: Remove IPhOD from dashboard schemas + types

In packages/dashboard/server/schemas.py, delete lines 14-15 from the Phono model:

    biphone_avg: float
    pos_seg_avg: float

In packages/dashboard/frontend/src/types.ts, delete lines 9-10 from the Phono interface:

  biphone_avg: number;
  pos_seg_avg: number;

In packages/dashboard/scripts/build_lookup.py, remove "biphone_avg": 0.0 and "pos_seg_avg": 0.0 from both fallback dicts (lines 237-238 and 257-258).

In packages/dashboard/server/model.py, remove the IPhOD field population (lines 132-133):

                biphone_avg=phono_data.get("biphone_avg", 0.0),
                pos_seg_avg=phono_data.get("pos_seg_avg", 0.0),

  • [ ] Step 10: Update tests

In packages/data/tests/test_new_loaders.py, delete the test_load_iphod test function.

In packages/data/tests/test_datasets.py, delete test_load_phonotactic_probability.

In packages/dashboard/server/tests/test_schemas.py, remove "biphone_avg": 0.05 and "pos_seg_avg": 0.03 from the make_phono helper (lines 51-52).

  • [ ] Step 11: Run tests
cd packages/web/workers && npm test
cd packages/dashboard/frontend && npx vitest run
uv run python -m pytest packages/dashboard/server/tests/test_schemas.py -v
uv run python -m pytest packages/data/tests/test_pipeline.py -v

Expected: all pass.

  • [ ] Step 12: Commit
git commit -m "remove: IPhOD2 dataset (GPL v2) — incompatible license"

Task 3: Remove PHOIBLE Remnants

Files: 7 files (1 deletion, 6 modifications)

  • [ ] Step 1: Delete phoible.py loader
rm packages/data/src/phonolex_data/loaders/phoible.py
  • [ ] Step 2: Remove PHOIBLE from loaders __init__.py

In packages/data/src/phonolex_data/loaders/__init__.py:

- Delete line 4: from phonolex_data.loaders.phoible import load_phoible, load_phonotactic_probability
- Remove "load_phoible", "load_phonotactic_probability", from __all__

  • [ ] Step 3: Update stale PHOIBLE comments

In packages/web/workers/src/lib/similarity.ts:

- Line 2: change "Soft Levenshtein similarity using phoneme-level PHOIBLE feature vectors." to "Soft Levenshtein similarity using phoneme-level feature vectors."
- Line 20: change "Phoneme norms and pairwise dot products from PHOIBLE 76d vectors." to "Phoneme norms and pairwise dot products from learned feature vectors."

In packages/web/frontend/src/components/PhonemePickerDialog.tsx:

- Line 65: change "data-driven from Phoible features" to "data-driven from phoneme features"

In packages/features/src/phonolex_features/validate.py:

- Update any docstring references from "PHOIBLE" to "feature vectors"

In packages/data/src/phonolex_data/pipeline/schema.py:

- Line 113: change "Computed data derived from word records and PHOIBLE vectors." to "Computed data derived from word records and learned feature vectors."

  • [ ] Step 4: Remove PHOIBLE tests

In packages/data/tests/test_datasets.py, delete the test_load_phoible test function and the load_phoible import. (Note: test_load_phonotactic_probability was already removed in Task 2 Step 10.)

  • [ ] Step 5: Run tests
uv run python -m pytest packages/data/tests/ -v --ignore=packages/data/tests/test_datasets.py --ignore=packages/data/tests/test_new_loaders.py
cd packages/web/workers && npm test

Expected: all pass.

  • [ ] Step 6: Commit
git commit -m "remove: PHOIBLE remnants (CC BY-SA 3.0) — replaced by learned feature vectors"

Task 4: License Restructuring

Files: 6 files (2 rewrites, 3 creations, 1 modification)

  • [ ] Step 1: Rewrite root LICENSE to Apache 2.0

Replace the entire contents of LICENSE with the standard Apache 2.0 text, prefixed with:

Copyright 2025-2026 Neumann's Workshop, LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

===============================================================================

SCOPE

This license applies to all files in this repository EXCEPT:
  - packages/governors/ — see packages/governors/LICENSE
  - packages/dashboard/ — see packages/dashboard/LICENSE

Those packages are proprietary software of Neumann's Workshop, LLC.

===============================================================================

DISCLAIMER

This software and data are provided "AS IS" without warranty of any kind.
While reasonable efforts have been made to ensure accuracy, users should:
  * Verify results independently
  * Cite original data sources appropriately
  * Not use for clinical diagnosis or treatment without professional oversight

See NOTICE for third-party attribution requirements.

  • [ ] Step 2: Create proprietary LICENSE files

Create packages/governors/LICENSE:

Copyright (c) 2025-2026 Neumann's Workshop, LLC. All rights reserved.

This software is proprietary and confidential. No part of this software
may be reproduced, distributed, or transmitted in any form or by any means
without the prior written permission of Neumann's Workshop, LLC.

For licensing inquiries: https://phonolex.com

Create packages/dashboard/LICENSE with the same content.

  • [ ] Step 3: Create NOTICE file

Create NOTICE at repo root:

PhonoLex
Copyright 2025-2026 Neumann's Workshop, LLC

This product includes data and software from the following sources:

================================================================================

CMU Pronouncing Dictionary
Copyright (c) 1993-2014 Carnegie Mellon University
Licensed under a Modified BSD License
http://www.speech.cs.cmu.edu/cgi-bin/cmudict

================================================================================

Edinburgh Closed-set Confusability Corpus (ECCC)
Marxer, R., Barker, J., Martin, N., & Coleman, J. (2016)
Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)
https://datashare.ed.ac.uk/handle/10283/2791

================================================================================

ipa-dict
Open Dict Data
Licensed under CC0 1.0 Universal (Public Domain)
https://github.com/open-dict-data/ipa-dict

================================================================================

Psycholinguistic Norm Datasets

This software incorporates published psycholinguistic norms from multiple
research groups. These datasets are used under standard academic practice
with appropriate citation. See docs/about/citations.md for full references.

Datasets: SUBTLEX-US, Kuperman AoA, Brysbaert Concreteness, Warriner VAD,
Glasgow Norms, Lancaster Sensorimotor, MorphoLex, English Lexicon Project,
Semantic Diversity, Socialness, BOI, Iconicity, Prevalence, CYP-LEX

Cognitive Association Datasets: USF Free Association, SimLex-999,
WordSim-353, MEN, Semantic Priming Project (SPP)

  • [ ] Step 4: Update docs/about/license.md

Rewrite to:

# License

PhonoLex uses a split license model:

## Open Source (Apache 2.0)

The data layer, web tools, and learned feature vectors are licensed under the
[Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0):

- `packages/data/` — shared data loaders and phonological computation
- `packages/features/` — learned phoneme feature vectors
- `packages/web/` — API and frontend

## Proprietary

The constraint engine and governed chat platform are proprietary software
of Neumann's Workshop, LLC:

- `packages/governors/` — phonological constraint engine
- `packages/dashboard/` — governed chat server and frontend

For licensing inquiries: [phonolex.com](https://phonolex.com)

## Third-Party Data

See [Citations](citations.md) for full attribution of all research datasets.
See the `NOTICE` file in the repository root for third-party license details.

## Citation

If you use PhonoLex in your research or clinical work, please cite:

Neumann's Workshop, LLC. (2025). PhonoLex: Phonological Analysis Platform. https://phonolex.com

## Disclaimer

© 2025-2026 Neumann's Workshop, LLC. Provided as-is without warranty.

  • [ ] Step 5: Update docs/about/citations.md

Remove these three sections:

- "### PHOIBLE" (lines 15-18)
- "### Small World of Words (SWOW):" (lines 74-76)
- "### Phonotactic Probability" (lines 52-56 — Vitevitch & Luce 2004)

Update the intro line 3 from "18 research sources" to "15 research sources".

  • [ ] Step 6: Commit
git commit -m "license: restructure to Apache 2.0 + Proprietary split"

Task 5: Documentation Updates

Files: CLAUDE.md, README.md, docs/reference/phoible-features.md

  • [ ] Step 1: Update CLAUDE.md

Key changes:

- Update "35 psycholinguistic properties from 18 datasets" to reflect the new counts (29 filterable properties from 15 datasets)
- Update "1M+ cognitive association edges" to reflect SWOW removal (~72K USF edges)
- Update "7 relationship types" to "6 relationship types"
- Remove the PHOIBLE terminology note ("Use PHOIBLE vectors" → just "feature vectors, not embeddings")
- Update the property category count ("9 property categories" → "8 property categories", since PHONOTACTIC_PROBABILITY is gone)
- Remove any remaining SWOW/IPhOD2/PHOIBLE references in the body

  • [ ] Step 2: Update README.md

  • Update license badge from CC BY-SA 3.0 to Apache 2.0
  • Update data source counts
  • Remove PHOIBLE, SWOW, IPhOD2 from any listed data sources

  • [ ] Step 3: Handle phoible-features.md

Delete docs/reference/phoible-features.md — the page is entirely about PHOIBLE features, which are replaced by learned vectors. If mkdocs references it, remove the nav entry from mkdocs.yml.

  • [ ] Step 4: Run all tests
cd packages/web/workers && npm test
cd packages/web/frontend && npx tsc --noEmit
cd packages/dashboard/frontend && npx vitest run
uv run python -m pytest packages/governors/tests/ -v
uv run python -m pytest packages/dashboard/server/tests/ -v

Expected: all pass, zero TypeScript errors.

  • [ ] Step 5: Commit
git commit -m "docs: update CLAUDE.md, README, citations for dataset removal and new license"

Task 6: Commit Pre-existing TypeScript Fix

The null guard in ContrastiveInterventionTool.tsx was already implemented earlier in this session.

  • [ ] Step 1: Commit the fix
git add packages/web/frontend/src/components/tools/ContrastiveInterventionTool.tsx
git commit -m "fix: add null guard for phoneme_count in contrastive position filter"

Task 7: Final Verification

  • [ ] Step 1: Run all test suites
# Web API
cd packages/web/workers && npm test

# Web frontend type check
cd packages/web/frontend && npx tsc --noEmit

# Dashboard frontend
cd packages/dashboard/frontend && npx vitest run

# Python — governors
uv run python -m pytest packages/governors/tests/ -v

# Python — dashboard server
uv run python -m pytest packages/dashboard/server/tests/ -v

# Python — data package (excluding dataset-dependent tests)
uv run python -m pytest packages/data/tests/ -v --ignore=packages/data/tests/test_datasets.py --ignore=packages/data/tests/test_new_loaders.py

Expected: all pass, zero TypeScript errors.

  • [ ] Step 2: Verify no remaining references
grep -ri "swow" packages/ --include="*.py" --include="*.ts" --include="*.tsx" | grep -v node_modules | grep -v __pycache__ | grep -v ".pyc"
grep -ri "iphod" packages/ --include="*.py" --include="*.ts" --include="*.tsx" | grep -v node_modules | grep -v __pycache__
grep -ri "phoible" packages/ --include="*.py" --include="*.ts" --include="*.tsx" | grep -v node_modules | grep -v __pycache__

Expected: zero matches in active code (historical docs/specs are OK).


Deferred: Task F — Recalculate Phonotactic Probability from CMU Dict

Not implemented in this plan. To be designed and implemented in a separate spec after this work merges. Will restore the 6 PHONOTACTIC_PROBABILITY properties with clean provenance computed from CMU Dict biphone frequencies and edit-distance-1 neighbors.
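
For orientation only (Task F is explicitly deferred), the planned computation might look roughly like this: biphone probabilities estimated as relative frequencies over CMU Dict transcriptions, in a simple position-independent formulation. The eventual spec may weight by word frequency or use positional probabilities instead:

```python
from collections import Counter


def biphone_probs(transcriptions: list[list[str]]) -> dict[tuple[str, str], float]:
    """Relative frequency of each adjacent phoneme pair across the corpus."""
    counts: Counter = Counter()
    for phones in transcriptions:
        counts.update(zip(phones, phones[1:]))
    total = sum(counts.values())
    return {bp: n / total for bp, n in counts.items()}


def phono_prob_avg(phones: list[str], probs: dict[tuple[str, str], float]) -> float:
    """Mean biphone probability of a word; 0.0 for single-phoneme words."""
    biphones = list(zip(phones, phones[1:]))
    if not biphones:
        return 0.0
    return sum(probs.get(bp, 0.0) for bp in biphones) / len(biphones)
```

Edit-distance-1 neighborhood density would be computed separately over the same transcriptions; both will be specified properly in the follow-up spec.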

Deferred: Task G — Regenerate Pickle and D1 Seed

Run locally after Tasks 1-3 merge:

python packages/data/src/phonolex_data/pipeline/main.py
python packages/web/workers/scripts/export-to-d1.py
npx wrangler d1 execute phonolex --local --file packages/web/workers/scripts/d1-seed.sql

Deferred: Task H — Audit Frontend Copy and Docs

Comprehensive review of all user-facing text, landing pages, mkdocs pages, and component copy to verify consistency with the dataset removals and license change. Check for stale property counts, dataset names, edge type references, or license mentions that may have been missed in the code-focused tasks above.