Grammario Performance and ML/NLP Feature Upgrade
This document describes every change made to the Grammario codebase during the performance optimization and advanced ML/NLP feature upgrade. It is organized by phase, with exact file paths, code explanations, and architectural rationale.
System Architecture
1. Problem Statement and Root Cause Analysis
Symptom: Each sentence analysis query took 10-15 seconds to return on a 2GB DigitalOcean VM.
Root cause: The original NLPService.analyze_text() ran two expensive operations sequentially:
- Stanza neural inference (~3-5s on a small CPU): Loading a full neural NLP pipeline (tokenizer, multi-word token expander, POS tagger, lemmatizer, dependency parser) and running it through PyTorch on a CPU without AVX2 support.
- LLM API call (~3-8s): A synchronous HTTP call to OpenRouter/OpenAI for pedagogical explanations, including JSON parsing and retries.
These two operations are completely independent -- the LLM only needs the raw text, not the parse output -- yet they were chained sequentially, meaning the total wall-clock time was their sum.
Additionally:
- The route handler was declared `async def` but called the synchronous `analyze_text()`, which blocks the event loop.
- No caching existed. Identical sentences triggered full re-analysis every time.
- Stanza is 10-50x slower than spaCy on CPU for the same tasks.
Before (sequential):
User request -> Stanza (4s) -> LLM (5s) -> Response = 9s total
After (parallel + spaCy + cache):
User request -> Cache check (1ms)
-> [miss] -> spaCy (0.3s) | LLM (4s) | Embedding (0.1s) [parallel]
-> Post-processing (5ms)
-> Cache set
-> Response = ~4s total
-> [hit] -> Response = 5ms
Analysis Request Flow
2. Phase 1: Performance Fixes
1A. Parallel Inference with asyncio
The NLPService class was rewritten to expose an analyze_text_async() method that uses asyncio.gather() with a ThreadPoolExecutor to run three independent operations concurrently:
- NLP Pipeline (spaCy/Stanza): Tokenization, POS tagging, lemmatization, dependency parsing
- LLM Call: Pedagogical data generation (translation, grammar concepts, tips, error detection)
- Embedding: Sentence vector encoding for similarity search
`run_in_executor` with a thread pool lets both CPU-bound (NLP) and I/O-bound (LLM) tasks run without blocking the FastAPI event loop.
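The shape of this fan-out can be sketched as follows. This is a minimal illustration, not the actual service code: the three stage functions are placeholders standing in for the real spaCy/Stanza pipeline, the LLM client, and the embedder.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Placeholder stand-ins for the real stages (hypothetical bodies):
def run_nlp(text: str) -> dict:          # CPU-bound parse
    return {"tokens": text.split()}

def call_llm(text: str) -> dict:         # blocking HTTP call in reality
    return {"translation": f"<translation of {text!r}>"}

def embed(text: str) -> list:            # CPU-bound encoding
    return [0.0] * 384

_executor = ThreadPoolExecutor(max_workers=4)

async def analyze_text_async(text: str) -> dict:
    loop = asyncio.get_running_loop()
    # Run all three independent stages concurrently; wall-clock time is
    # max(stage times) instead of their sum.
    nlp, llm, vec = await asyncio.gather(
        loop.run_in_executor(_executor, run_nlp, text),
        loop.run_in_executor(_executor, call_llm, text),
        loop.run_in_executor(_executor, embed, text),
    )
    return {"parse": nlp, "pedagogy": llm, "embedding": vec}

result = asyncio.run(analyze_text_async("Il gatto mangia"))
```

Because the executor threads release the GIL during I/O (and PyTorch/spaCy release it during native inference), this pattern gets real concurrency without multiprocessing.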
Parallel NLP Pipeline
1B. Redis Caching Layer
A CacheService class backed by Redis:
- Keys: SHA-256 hashes of `"{language}:{text.strip().lower()}"`, prefixed with `grammario:analysis:`
- Values: JSON-serialized analysis results
- TTL: 24 hours (configurable)
- Graceful degradation: If Redis is unreachable, caching is silently disabled
Hit-rate tracking is exposed via the /health endpoint and /api/v1/cache/stats.
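A minimal sketch of the key scheme and the degradation behavior, with a plain dict standing in for the Redis client (the class and method names here are illustrative, not the repo's exact API):

```python
import hashlib
import json

PREFIX = "grammario:analysis:"

def cache_key(language: str, text: str) -> str:
    # Normalize so "Il gatto" and " il gatto " share one cache entry.
    normalized = f"{language}:{text.strip().lower()}"
    return PREFIX + hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class CacheService:
    """Sketch of the caching layer; a dict stands in for redis.Redis."""

    def __init__(self, ttl_seconds: int = 24 * 3600):
        self.ttl = ttl_seconds   # passed as the expiry to Redis SET in production
        self.hits = 0
        self.misses = 0
        self._store = {}

    def get(self, language, text):
        try:
            raw = self._store.get(cache_key(language, text))
        except Exception:
            return None          # Redis unreachable: degrade silently
        if raw is None:
            self.misses += 1
            return None
        self.hits += 1
        return json.loads(raw)

    def set(self, language, text, result):
        try:
            self._store[cache_key(language, text)] = json.dumps(result)
        except Exception:
            pass                 # caching silently disabled on failure
```

Hashing the normalized text keeps keys fixed-length and avoids putting user input directly into Redis key names.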
Redis Caching Flow
1C. spaCy as a Faster NLP Engine
spaCy is 10-50x faster than Stanza on CPU. A SpacyManager singleton was added with:
- Supported languages: IT, ES, DE, RU (Turkish has no spaCy model with dependency parsing)
- Models: `it_core_news_md`, `es_core_news_md`, `de_core_news_md`, `ru_core_news_md`
- Auto-download: If `spacy.load()` fails, the manager calls `spacy.cli.download()` and retries
- Fallback: If spaCy fails at runtime, the service falls back to Stanza transparently
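The load-retry-fallback path can be sketched generically, with callables standing in for `spacy.load`, `spacy.cli.download`, and the Stanza engine (this helper and its name are illustrative, not the SpacyManager's exact code):

```python
def load_with_retry(load, download, fallback):
    """Generic form of the manager's load path: try load(); on failure
    download the model and retry; if that still fails, fall back."""
    try:
        return load()
    except OSError:            # spacy.load raises OSError for missing models
        try:
            download()         # spacy.cli.download(model_name) in the real code
            return load()
        except Exception:
            return fallback()  # transparent Stanza fallback
```

Catching `OSError` specifically matters: spaCy raises it when a model package is absent, which is the one failure auto-download can actually fix.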
1D. Truly Async Route Handlers
The /analyze endpoint was changed from synchronous to properly async, preventing the uvicorn event loop from blocking during analysis.
3. Phase 2: ML/NLP Feature Additions
2A. CEFR Difficulty Scoring
A feature-engineered classification pipeline that rates sentence difficulty on the CEFR scale (A1-C2) using 10 linguistic features:
| Feature | What It Measures |
|---|---|
| Sentence length | Raw complexity |
| Average word length | Morphological richness |
| Type-token ratio | Lexical diversity |
| Tree depth | Embedding depth of clauses |
| Tree width | Parallel structure complexity |
| Subordinate clause count | Syntactic subordination |
| Morphological complexity | Inflectional richness |
| Unique POS count | Syntactic variety |
| Rare word proportion | Vocabulary difficulty |
| Lexical density | Information density |
Tree depth and subordination get the highest weights (0.20 each) because embedded clauses are the strongest predictor of syntactic difficulty in second language acquisition research.
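A sketch of the weighted-sum scoring step. Only the two 0.20 weights come from the text above; the remaining weights and the band cut-offs are placeholder values for illustration.

```python
# Weights: tree depth and subordination at 0.20 each (per the text);
# the rest are hypothetical values summing to the remaining 0.60.
WEIGHTS = {
    "sentence_length": 0.10,
    "avg_word_length": 0.05,
    "type_token_ratio": 0.05,
    "tree_depth": 0.20,
    "tree_width": 0.05,
    "subordinate_clauses": 0.20,
    "morph_complexity": 0.10,
    "unique_pos": 0.05,
    "rare_word_ratio": 0.10,
    "lexical_density": 0.10,
}

# Hypothetical score cut-offs for the six CEFR bands.
BANDS = [(0.17, "A1"), (0.33, "A2"), (0.50, "B1"),
         (0.67, "B2"), (0.83, "C1"), (1.01, "C2")]

def cefr_level(features: dict):
    """features: each value already normalized to [0, 1]."""
    score = sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    for cutoff, label in BANDS:
        if score < cutoff:
            return score, label
    return score, "C2"
```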
2B. Word Frequency Analysis
Each word is tagged with a frequency band (1-5):
| Band | Rank Range | Label |
|---|---|---|
| 1 | Top 500 | Very Common |
| 2 | 501-2000 | Common |
| 3 | 2001-5000 | Intermediate |
| 4 | 5001-10000 | Uncommon |
| 5 | 10001+ | Rare |
Frequency data is loaded from per-language JSON files generated from corpus frequency data. In the UI, colored dots indicate frequency bands (green to red).
2C. Sentence Embeddings and Similarity Search
Model: paraphrase-multilingual-MiniLM-L12-v2 from sentence-transformers
- 384-dimensional normalized vectors
- Supports 50+ languages
- ~90MB, runs fast on CPU (~50-100ms per sentence)
Embeddings enable future similarity search: "find sentences I've studied that are similar to this one."
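Because the vectors are unit-normalized, cosine similarity reduces to a dot product; the similarity search sketched below is a stdlib illustration of the idea (in production the IVFFlat index in pgvector does this server-side):

```python
import math

def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    # For unit-normalized vectors, cosine similarity is just the dot product.
    return sum(x * y for x, y in zip(a, b))

def most_similar(query, corpus):
    """corpus: list of (sentence, normalized_vector) pairs."""
    return max(corpus, key=lambda item: cosine_similarity(query, item[1]))[0]
```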
2D. Grammar Error Detection
A dual-approach system:
Rule-based (from parse tree):
- DET-NOUN gender/number agreement (IT, ES)
- ADJ-NOUN case/gender agreement (DE, RU)
- Vowel harmony violations (TR)
- Subject-verb number/person agreement (all languages)
LLM-based (from prompt):
- Spelling, agreement, conjugation, case, word order, preposition, and article errors
- Each error includes the word, type, correction, and explanation
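One rule from the rule-based side, DET-NOUN agreement, can be sketched like this. Tokens are plain dicts here for illustration; the real checks read the same fields (POS, head, dependency relation, morphological features) off the parse output.

```python
def det_noun_agreement_errors(tokens):
    """Flag determiners whose Gender/Number features disagree with the
    noun they modify (the IT/ES rule from the list above)."""
    errors = []
    for tok in tokens:
        if tok["pos"] != "DET" or tok["dep"] != "det":
            continue
        noun = tokens[tok["head"]]  # head is an index into the token list
        for feat in ("Gender", "Number"):
            det_val = tok["morph"].get(feat)
            noun_val = noun["morph"].get(feat)
            if det_val and noun_val and det_val != noun_val:
                errors.append({
                    "word": tok["text"],
                    "type": f"det-noun {feat.lower()} agreement",
                    "head": noun["text"],
                })
    return errors
```

Checking only when both features are present avoids false positives on invariable determiners.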
4. Phase 3: Data Engineering -- Spaced Repetition System
SM-2 Algorithm
A TypeScript implementation of the SuperMemo SM-2 spaced repetition algorithm (the algorithm Anki's scheduler is based on):
- Quality ratings 0-5
- Adaptive ease factor (minimum 1.3)
- Interval scheduling: 1 day, then 6 days, then the previous interval multiplied by `easeFactor`
- Mastery score: weighted combination of repetitions, interval length, and ease factor
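The repo's implementation is TypeScript; the update rule itself can be sketched in Python following the published SM-2 formulas (the `CardState` shape here is illustrative):

```python
from dataclasses import dataclass

@dataclass
class CardState:
    repetitions: int = 0
    interval_days: int = 0
    ease_factor: float = 2.5   # SM-2's starting E-Factor

def sm2_review(state: CardState, quality: int) -> CardState:
    """One SM-2 update. quality: 0-5 self-rating; >= 3 counts as recalled."""
    if quality < 3:
        # Failed recall: restart the sequence; SM-2 leaves EF unchanged here.
        return CardState(0, 1, state.ease_factor)
    if state.repetitions == 0:
        reps, interval = 1, 1
    elif state.repetitions == 1:
        reps, interval = 2, 6
    else:
        reps = state.repetitions + 1
        interval = round(state.interval_days * state.ease_factor)
    # E-Factor update, clamped at SM-2's minimum of 1.3.
    ef = state.ease_factor + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return CardState(reps, interval, max(1.3, ef))
```

A perfect rating (5) nudges EF up by 0.1; a barely-correct rating (3) drops it by 0.14, so struggling cards come back sooner.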
Vocabulary Review UI
A full flashcard review interface with:
- Stats bar (Total Words, Due Today, Mastered)
- Flashcard with show/hide answer
- Three-button rating (Wrong/Hard/Good) plus advanced 0-5 ratings
- Progress bar and session summary
Spaced Repetition Cycle (SM-2)
5. Phase 4: Observability and ML Ops
Enhanced Health Endpoint
The /health endpoint reports:
- Service status (LLM, Redis, Embeddings)
- Engine info (spaCy/Stanza loaded models)
- Feature flags
- Memory usage (RSS/VMS in MB)
Structured Performance Logging
Every service logs execution time in milliseconds:
NLP pipeline (spacy) completed in 45ms for lang=it
LLM completed in 2300ms: translation=The cat eats..., concepts=3, tips=2, errors=0
Encoded sentence in 52ms (dim=384)
Cache HIT for key=grammario:analysis:a3b2c1d4e5f6
Total analysis completed in 2410ms (parallel)
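Log lines like these can come from a small timing decorator; this sketch is illustrative (the `log_duration` name and `grammario.perf` logger are assumptions, not the repo's exact code):

```python
import functools
import logging
import time

logger = logging.getLogger("grammario.perf")

def log_duration(label: str):
    """Wrap a function and log its wall-clock duration in milliseconds."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s completed in %.0fms", label, elapsed_ms)
            return result
        return wrapper
    return decorate

@log_duration("NLP pipeline (spacy)")
def run_pipeline(text):
    return text.split()
```

`time.perf_counter()` is monotonic and high-resolution, which makes it the right clock for durations (unlike `time.time()`, which can jump on NTP adjustments).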
6. Phase 5: Admin Dashboard
A standalone admin console at /admin with:
- Overview: KPI cards, language breakdown, recent activity
- Users: Full management table with inline edit, delete, search, pagination
- Requests & Data: Every analysis with full raw JSON viewer, copy-to-clipboard
- Vocabulary: All saved vocabulary across all users
- Backend: Live health monitoring with service status, engine info, memory usage
Admin access is restricted to the hardcoded admin user ID.
7. Schema and Type Updates
New Pydantic Models (Backend)
- `LLMGrammarError`: Grammar error detected by the LLM
- `RuleBasedError`: Grammar error from parse-tree heuristics
- `DifficultyInfo`: CEFR assessment with score and linguistic features
Extended Models
- `TokenNode` gained `frequency_band`
- `PedagogicalData` gained `errors`
- `SentenceAnalysis` gained `difficulty`, `grammar_errors`, `embedding`
Database Schema
- pgvector extension for embedding storage
- `difficulty_level`, `difficulty_score`, `embedding` columns on analyses
- IVFFlat index for similarity search
- `match_analyses()` PostgreSQL function
8. Infrastructure Changes
Docker Compose
- Redis service with `allkeys-lru` eviction and a persistent volume
- Backend depends on Redis health
- CPU-only PyTorch build for smaller image size
Production Dockerfile
- Multi-stage build with spaCy, Stanza, and sentence-transformers models pre-downloaded
- Frequency data bundled in image
New Dependencies
- `spacy`, `sentence-transformers`, `scikit-learn`, `joblib`, `numpy`, `redis`, `psutil`
9. Complete File Manifest
22 new files and 20 modified files across backend services, frontend components, API routes, database schema, Docker configuration, and deployment scripts.
Key New Services
| Service | Purpose |
|---|---|
| Redis Cache | Analysis result caching with 24h TTL |
| spaCy Manager | Fast CPU NLP engine (10-50x faster than Stanza) |
| Difficulty Scorer | CEFR level classification via linguistic features |
| Frequency Service | Word frequency band lookups |
| Embedding Service | Sentence vectors for similarity search |
| Error Detector | Rule-based grammar error detection |
| SM-2 Algorithm | Spaced repetition for vocabulary review |