Grammario Performance and ML/NLP Feature Upgrade
This document describes every change made to the Grammario codebase during the performance optimization and advanced ML/NLP feature upgrade. It is organized by phase, with exact file paths, code explanations, and architectural rationale.
System Architecture
1. Problem Statement and Root Cause Analysis
Symptom: Each sentence analysis query took 10-15 seconds to return on a 2GB DigitalOcean VM.
Root cause: The original NLPService.analyze_text() ran two expensive operations sequentially:
- Stanza neural inference (~3-5s on a small CPU): Loading a full neural NLP pipeline (tokenizer, multi-word token expander, POS tagger, lemmatizer, dependency parser) and running it through PyTorch on a CPU without AVX2 support.
- LLM API call (~3-8s): A synchronous HTTP call to OpenRouter/OpenAI for pedagogical explanations, including JSON parsing and retries.
These two operations are completely independent -- the LLM only needs the raw text, not the parse output -- yet they were chained sequentially, meaning the total wall-clock time was their sum.
Additionally:
- The route handler was declared `async def` but called the synchronous `analyze_text()`, which blocks the event loop.
- No caching existed. Identical sentences triggered full re-analysis every time.
- Stanza is 10-50x slower than spaCy on CPU for the same tasks.
Before (sequential):
User request -> Stanza (4s) -> LLM (5s) -> Response = 9s total
After (parallel + spaCy + cache):
User request -> Cache check (1ms)
-> [miss] -> spaCy (0.3s) | LLM (4s) | Embedding (0.1s) [parallel]
-> Post-processing (5ms)
-> Cache set
-> Response = ~4s total
-> [hit] -> Response = 5ms
Analysis Request Flow
2. Phase 1: Performance Fixes
1A. Parallel Inference with asyncio
The NLPService class was rewritten to expose an analyze_text_async() method that uses asyncio.gather() with a ThreadPoolExecutor to run three independent operations concurrently:
- NLP Pipeline (spaCy/Stanza): Tokenization, POS tagging, lemmatization, dependency parsing
- LLM Call: Pedagogical data generation (translation, grammar concepts, tips, error detection)
- Embedding: Sentence vector encoding for similarity search
`run_in_executor` with a thread pool lets both CPU-bound (NLP) and I/O-bound (LLM) tasks run without blocking the FastAPI event loop.
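The shape of this fan-out can be sketched as follows. This is a minimal illustration, not the actual service code: the three stage functions are placeholders standing in for the real spaCy/Stanza pipeline, the LLM client, and the embedder.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Placeholder stand-ins for the real stages (hypothetical bodies):
def run_nlp(text: str) -> dict:          # CPU-bound parse
    return {"tokens": text.split()}

def call_llm(text: str) -> dict:         # blocking HTTP call in reality
    return {"translation": f"<translation of {text!r}>"}

def embed(text: str) -> list:            # CPU-bound encoding
    return [0.0] * 384

_executor = ThreadPoolExecutor(max_workers=4)

async def analyze_text_async(text: str) -> dict:
    loop = asyncio.get_running_loop()
    # Run all three independent stages concurrently; wall-clock time is
    # max(stage times) instead of their sum.
    nlp, llm, vec = await asyncio.gather(
        loop.run_in_executor(_executor, run_nlp, text),
        loop.run_in_executor(_executor, call_llm, text),
        loop.run_in_executor(_executor, embed, text),
    )
    return {"parse": nlp, "pedagogy": llm, "embedding": vec}

result = asyncio.run(analyze_text_async("Il gatto mangia"))
```

Because the executor threads release the GIL during I/O (and PyTorch/spaCy release it during native inference), this pattern gets real concurrency without multiprocessing.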
Parallel NLP Pipeline
1B. Redis Caching Layer
A CacheService class backed by Redis:
- Keys: SHA-256 hashes of `"{language}:{text.strip().lower()}"`, prefixed with `grammario:analysis:`
- Values: JSON-serialized analysis results
- TTL: 24 hours (configurable)
- Graceful degradation: If Redis is unreachable, caching is silently disabled
Hit-rate tracking is exposed via the /health endpoint and /api/v1/cache/stats.
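A minimal sketch of the key scheme and the degradation behavior, with a plain dict standing in for the Redis client (the class and method names here are illustrative, not the repo's exact API):

```python
import hashlib
import json

PREFIX = "grammario:analysis:"

def cache_key(language: str, text: str) -> str:
    # Normalize so "Il gatto" and " il gatto " share one cache entry.
    normalized = f"{language}:{text.strip().lower()}"
    return PREFIX + hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class CacheService:
    """Sketch of the caching layer; a dict stands in for redis.Redis."""

    def __init__(self, ttl_seconds: int = 24 * 3600):
        self.ttl = ttl_seconds   # passed as the expiry to Redis SET in production
        self.hits = 0
        self.misses = 0
        self._store = {}

    def get(self, language, text):
        try:
            raw = self._store.get(cache_key(language, text))
        except Exception:
            return None          # Redis unreachable: degrade silently
        if raw is None:
            self.misses += 1
            return None
        self.hits += 1
        return json.loads(raw)

    def set(self, language, text, result):
        try:
            self._store[cache_key(language, text)] = json.dumps(result)
        except Exception:
            pass                 # caching silently disabled on failure
```

Hashing the normalized text keeps keys fixed-length and avoids putting user input directly into Redis key names.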
Redis Caching Flow
1C. spaCy as a Faster NLP Engine
spaCy is 10-50x faster than Stanza on CPU. A SpacyManager singleton was added with:
- Supported languages: IT, ES, DE, RU (Turkish has no spaCy model with dependency parsing)
- Models: `it_core_news_md`, `es_core_news_md`, `de_core_news_md`, `ru_core_news_md`
- Auto-download: If `spacy.load()` fails, the manager calls `spacy.cli.download()` and retries
- Fallback: If spaCy fails at runtime, the service falls back to Stanza transparently
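The load-retry-fallback path can be sketched generically, with callables standing in for `spacy.load`, `spacy.cli.download`, and the Stanza engine (this helper and its name are illustrative, not the SpacyManager's exact code):

```python
def load_with_retry(load, download, fallback):
    """Generic form of the manager's load path: try load(); on failure
    download the model and retry; if that still fails, fall back."""
    try:
        return load()
    except OSError:            # spacy.load raises OSError for missing models
        try:
            download()         # spacy.cli.download(model_name) in the real code
            return load()
        except Exception:
            return fallback()  # transparent Stanza fallback
```

Catching `OSError` specifically matters: spaCy raises it when a model package is absent, which is the one failure auto-download can actually fix.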
1D. Truly Async Route Handlers
The /analyze endpoint was changed from synchronous to properly async, preventing the uvicorn event loop from blocking during analysis.
3. Phase 2: ML/NLP Feature Additions
2A. CEFR Difficulty Scoring
A feature-engineered classification pipeline that rates sentence difficulty on the CEFR scale (A1-C2) using 10 linguistic features:
| Feature | What It Measures |
|---|---|
| Sentence length | Raw complexity |
| Average word length | Morphological richness |
| Type-token ratio | Lexical diversity |
| Tree depth | Embedding depth of clauses |
| Tree width | Parallel structure complexity |
| Subordinate clause count | Syntactic subordination |
| Morphological complexity | Inflectional richness |
| Unique POS count | Syntactic variety |
| Rare word proportion | Vocabulary difficulty |
| Lexical density | Information density |
Tree depth and subordination get the highest weights (0.20 each) because embedded clauses are the strongest predictor of syntactic difficulty in second language acquisition research.
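A sketch of the weighted-sum scoring step. Only the two 0.20 weights come from the text above; the remaining weights and the band cut-offs are placeholder values for illustration.

```python
# Weights: tree depth and subordination at 0.20 each (per the text);
# the rest are hypothetical values summing to the remaining 0.60.
WEIGHTS = {
    "sentence_length": 0.10,
    "avg_word_length": 0.05,
    "type_token_ratio": 0.05,
    "tree_depth": 0.20,
    "tree_width": 0.05,
    "subordinate_clauses": 0.20,
    "morph_complexity": 0.10,
    "unique_pos": 0.05,
    "rare_word_ratio": 0.10,
    "lexical_density": 0.10,
}

# Hypothetical score cut-offs for the six CEFR bands.
BANDS = [(0.17, "A1"), (0.33, "A2"), (0.50, "B1"),
         (0.67, "B2"), (0.83, "C1"), (1.01, "C2")]

def cefr_level(features: dict):
    """features: each value already normalized to [0, 1]."""
    score = sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    for cutoff, label in BANDS:
        if score < cutoff:
            return score, label
    return score, "C2"
```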
2B. Word Frequency Analysis
Each word is tagged with a frequency band (1-5):
| Band | Rank Range | Label |
|---|---|---|
| 1 | Top 500 | Very Common |
| 2 | 501-2000 | Common |
| 3 | 2001-5000 | Intermediate |
| 4 | 5001-10000 | Uncommon |
| 5 | 10001+ | Rare |
Frequency data is loaded from per-language JSON files generated from corpus frequency data. In the UI, colored dots indicate frequency bands (green to red).
2C. Sentence Embeddings and Similarity Search
Model: paraphrase-multilingual-MiniLM-L12-v2 from sentence-transformers
- 384-dimensional normalized vectors
- Supports 50+ languages
- ~90MB, runs fast on CPU (~50-100ms per sentence)
Embeddings enable future similarity search: "find sentences I've studied that are similar to this one."
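Because the vectors are unit-normalized, cosine similarity reduces to a dot product; the similarity search sketched below is a stdlib illustration of the idea (in production the IVFFlat index in pgvector does this server-side):

```python
import math

def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    # For unit-normalized vectors, cosine similarity is just the dot product.
    return sum(x * y for x, y in zip(a, b))

def most_similar(query, corpus):
    """corpus: list of (sentence, normalized_vector) pairs."""
    return max(corpus, key=lambda item: cosine_similarity(query, item[1]))[0]
```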
2D. Grammar Error Detection
A dual-approach system:
Rule-based (from parse tree):
- DET-NOUN gender/number agreement (IT, ES)
- ADJ-NOUN case/gender agreement (DE, RU)
- Vowel harmony violations (TR)
- Subject-verb number/person agreement (all languages)
LLM-based (from prompt):
- Spelling, agreement, conjugation, case, word order, preposition, and article errors
- Each error includes the word, type, correction, and explanation
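One rule from the rule-based side, DET-NOUN agreement, can be sketched like this. Tokens are plain dicts here for illustration; the real checks read the same fields (POS, head, dependency relation, morphological features) off the parse output.

```python
def det_noun_agreement_errors(tokens):
    """Flag determiners whose Gender/Number features disagree with the
    noun they modify (the IT/ES rule from the list above)."""
    errors = []
    for tok in tokens:
        if tok["pos"] != "DET" or tok["dep"] != "det":
            continue
        noun = tokens[tok["head"]]  # head is an index into the token list
        for feat in ("Gender", "Number"):
            det_val = tok["morph"].get(feat)
            noun_val = noun["morph"].get(feat)
            if det_val and noun_val and det_val != noun_val:
                errors.append({
                    "word": tok["text"],
                    "type": f"det-noun {feat.lower()} agreement",
                    "head": noun["text"],
                })
    return errors
```

Checking only when both features are present avoids false positives on invariable determiners.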
4. Phase 3: Data Engineering -- Spaced Repetition System
SM-2 Algorithm
A TypeScript implementation of the SuperMemo SM-2 spaced repetition algorithm (the algorithm Anki's scheduler is based on):
- Quality ratings 0-5
- Adaptive ease factor (minimum 1.3)
- Interval scheduling: 1 day, then 6 days, then the previous interval multiplied by `easeFactor`
- Mastery score: weighted combination of repetitions, interval length, and ease factor
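The repo's implementation is TypeScript; the update rule itself can be sketched in Python following the published SM-2 formulas (the `CardState` shape here is illustrative):

```python
from dataclasses import dataclass

@dataclass
class CardState:
    repetitions: int = 0
    interval_days: int = 0
    ease_factor: float = 2.5   # SM-2's starting E-Factor

def sm2_review(state: CardState, quality: int) -> CardState:
    """One SM-2 update. quality: 0-5 self-rating; >= 3 counts as recalled."""
    if quality < 3:
        # Failed recall: restart the sequence; SM-2 leaves EF unchanged here.
        return CardState(0, 1, state.ease_factor)
    if state.repetitions == 0:
        reps, interval = 1, 1
    elif state.repetitions == 1:
        reps, interval = 2, 6
    else:
        reps = state.repetitions + 1
        interval = round(state.interval_days * state.ease_factor)
    # E-Factor update, clamped at SM-2's minimum of 1.3.
    ef = state.ease_factor + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return CardState(reps, interval, max(1.3, ef))
```

A perfect rating (5) nudges EF up by 0.1; a barely-correct rating (3) drops it by 0.14, so struggling cards come back sooner.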
Vocabulary Review UI
A full flashcard review interface with:
- Stats bar (Total Words, Due Today, Mastered)
- Flashcard with show/hide answer
- Three-button rating (Wrong/Hard/Good) plus advanced 0-5 ratings
- Progress bar and session summary
Spaced Repetition Cycle (SM-2)
5. Phase 4: Observability and ML Ops
Enhanced Health Endpoint
The /health endpoint reports:
- Service status (LLM, Redis, Embeddings)
- Engine info (spaCy/Stanza loaded models)
- Feature flags
- Memory usage (RSS/VMS in MB)
Structured Performance Logging
Every service logs execution time in milliseconds:
NLP pipeline (spacy) completed in 45ms for lang=it
LLM completed in 2300ms: translation=The cat eats..., concepts=3, tips=2, errors=0
Encoded sentence in 52ms (dim=384)
Cache HIT for key=grammario:analysis:a3b2c1d4e5f6
Total analysis completed in 2410ms (parallel)
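Log lines like these can come from a small timing decorator; this sketch is illustrative (the `log_duration` name and `grammario.perf` logger are assumptions, not the repo's exact code):

```python
import functools
import logging
import time

logger = logging.getLogger("grammario.perf")

def log_duration(label: str):
    """Wrap a function and log its wall-clock duration in milliseconds."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s completed in %.0fms", label, elapsed_ms)
            return result
        return wrapper
    return decorate

@log_duration("NLP pipeline (spacy)")
def run_pipeline(text):
    return text.split()
```

`time.perf_counter()` is monotonic and high-resolution, which makes it the right clock for durations (unlike `time.time()`, which can jump on NTP adjustments).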
6. Phase 5: Admin Dashboard
A standalone admin console at /admin with:
- Overview: KPI cards, language breakdown, recent activity
- Users: Full management table with inline edit, delete, search, pagination
- Requests & Data: Every analysis with full raw JSON viewer, copy-to-clipboard
- Vocabulary: All saved vocabulary across all users
- Backend: Live health monitoring with service status, engine info, memory usage
Admin access is restricted to the hardcoded admin user ID.
7. Schema and Type Updates
New Pydantic Models (Backend)
- `LLMGrammarError`: Grammar error detected by the LLM
- `RuleBasedError`: Grammar error from parse-tree heuristics
- `DifficultyInfo`: CEFR assessment with score and linguistic features
Extended Models
- `TokenNode` gained `frequency_band`
- `PedagogicalData` gained `errors`
- `SentenceAnalysis` gained `difficulty`, `grammar_errors`, `embedding`
Database Schema
- pgvector extension for embedding storage
- `difficulty_level`, `difficulty_score`, `embedding` columns on analyses
- IVFFlat index for similarity search
- `match_analyses()` PostgreSQL function
8. Infrastructure Changes
Docker Compose
- Redis service with `allkeys-lru` eviction and a persistent volume
- Backend depends on Redis health
- CPU-only PyTorch build for smaller image size
Production Dockerfile
- Multi-stage build with spaCy, Stanza, and sentence-transformers models pre-downloaded
- Frequency data bundled in image
New Dependencies
- `spacy`, `sentence-transformers`, `scikit-learn`, `joblib`, `numpy`, `redis`, `psutil`
9. Complete File Manifest
22 new files and 20 modified files across backend services, frontend components, API routes, database schema, Docker configuration, and deployment scripts.
Key New Services
| Service | Purpose |
|---|---|
| Redis Cache | Analysis result caching with 24h TTL |
| spaCy Manager | Fast CPU NLP engine (10-50x faster than Stanza) |
| Difficulty Scorer | CEFR level classification via linguistic features |
| Frequency Service | Word frequency band lookups |
| Embedding Service | Sentence vectors for similarity search |
| Error Detector | Rule-based grammar error detection |
| SM-2 Algorithm | Spaced repetition for vocabulary review |