Grammario
April 22, 2026 (v1.3.0)

Public Launch

Grammario is now open to the public. Stripe payments are live, the free demo requires no account, and the educator channel is fully operational.

Stripe Payments Live

Pro subscriptions are now enabled. Monthly ($5) and annual billing are available directly from the pricing page and from in-app upgrade prompts.

Free Demo — No Login

Try a full grammar analysis at grammario.ai/demo without creating an account. Paste any sentence in any of the 6 supported languages.

Blog & Comparison Pages

Five new articles published — including Grammario vs. LanguageTool, vs. DeepL, and per-language grammar guides. Competitor pages for all four major tools.

Educator Onboarding

Teachers can now select 'I'm a teacher/tutor' during onboarding to be routed to the teacher dashboard and receive educator-specific emails.

Shareable Analysis Cards

Share button now prominently placed in the analyzer action bar. Twitter Card meta tags on /share/[id] for rich Discord and social previews.

Mobile-Responsive Analyzer

The analyzer input bar, layout toggles, and action buttons now adapt to small screens — fully usable on any phone.

Full changelist

  • Stripe checkout wired in pricing-client.tsx — purchases are live
  • Annual billing toggle with "Save 20%" callout and calculated annual total
  • Stripe Billing Portal linked in /app/settings for Pro users
  • Class tier Stripe product defined; hasClassTier() helper for future gating
  • Rewardful affiliate tracking script loaded; client_reference_id passed on every checkout
  • CRO improvements on pricing page: Most Popular badge, price anchoring, 30-day guarantee, Cancel anytime
  • Usage-based upgrade modal at daily cap and language-lock friction point
  • Soft upgrade banner on app home for active free users
  • Referral program card in /app/settings with link, count, and reward copy
  • Referral CTA added to Day 1 welcome email
  • Day 3, 14, 21 emails added to the email-sequences Edge Function
  • Educator Day 2 and Day 5 email sequence
  • Role-selection step in onboarding modal (learner vs. teacher)
  • Educator role sets is_educator = true and routes to /app/teacher
  • Database migration: is_educator boolean DEFAULT false column + partial index
  • isTeacher() returns true for Pro users who are flagged as educators
  • Share button added to analyzer action bar
  • Twitter Card and Open Graph meta tags on /share/[id]
  • Free demo at /demo — no auth, IP rate-limited, teased results
  • Hero CTAs updated: "Try it free" → /demo, "See pricing" replaces beta copy
  • Testimonials section with 6 reviews and 4.9/5 star rating on landing page
  • Inline star rating + 10,000+ analyses social proof in hero
  • /blog index + /blog/[slug] article pages with 5 published posts
  • Competitor pages with real content for LanguageTool, DeepL, Reverso, ChatGPT
  • Mobile: analyzer input bar stacks vertically; toggles and action buttons icon-only on small screens
  • Mobile screenshot converted to next/image; preconnect hints added to layout
  • robots.txt and updated sitemap.ts with all new routes
  • Google Search Console verification meta tag support (NEXT_PUBLIC_GSC_VERIFICATION)
  • Product Hunt 60-second Remotion composition at 1200×800
  • checkout_completed PostHog event + success toast on return from Stripe
Prior Release (v1.2.0)

Performance & ML/NLP Upgrade

A major upgrade to how Grammario analyzes sentences, with new features for learners and significant speed improvements under the hood.

Faster Analysis

Sentence analysis now runs 2-3x faster by processing multiple tasks at the same time instead of one after another.

Instant Repeats

If you analyze a sentence you've seen before, the result comes back instantly from cache instead of being re-computed.

Difficulty Ratings

Every sentence now gets a difficulty rating from A1 (beginner) to C2 (mastery) based on its grammar complexity and vocabulary.

Smart Word Insights

Each word is color-coded by how common it is in the language, and grammar mistakes are automatically detected and explained.

Vocabulary Review

A built-in flashcard system uses spaced repetition (the same method Anki uses) to help you memorize saved vocabulary over time.

Admin Dashboard

A full admin console for monitoring system health, managing users, and viewing all analysis data across the platform.

Technical Implementation Details

Grammario Performance and ML/NLP Feature Upgrade

This document describes every change made to the Grammario codebase during the performance optimization and advanced ML/NLP feature upgrade. It is organized by phase, with exact file paths, code explanations, and architectural rationale.


Table of Contents

  1. Problem Statement and Root Cause Analysis
  2. Phase 1: Performance Fixes
  3. Phase 2: ML/NLP Feature Additions
  4. Phase 3: Data Engineering -- Spaced Repetition System
  5. Phase 4: Observability and ML Ops
  6. Phase 5: Admin Dashboard
  7. Schema and Type Updates
  8. Infrastructure Changes
  9. Complete File Manifest

[Figure: System Architecture]


1. Problem Statement and Root Cause Analysis

Symptom: Each sentence analysis query took 10-15 seconds to return on a 2GB DigitalOcean VM.

Root cause: The original NLPService.analyze_text() ran two expensive operations sequentially:

  1. Stanza neural inference (~3-5s on a small CPU): Loading a full neural NLP pipeline (tokenizer, multi-word token expander, POS tagger, lemmatizer, dependency parser) and running it through PyTorch on a CPU without AVX2 support.
  2. LLM API call (~3-8s): A synchronous HTTP call to OpenRouter/OpenAI for pedagogical explanations, including JSON parsing and retries.

These two operations are completely independent -- the LLM only needs the raw text, not the parse output -- yet they were chained sequentially, meaning the total wall-clock time was their sum.

Additionally:

  • The route handler was declared async def but called the synchronous analyze_text(), which blocks the event loop.
  • No caching existed. Identical sentences triggered full re-analysis every time.
  • Stanza is 10-50x slower than spaCy on CPU for the same tasks.

Before (sequential):

User request -> Stanza (4s) -> LLM (5s) -> Response = 9s total

After (parallel + spaCy + cache):

User request -> Cache check (1ms)
             -> [miss] -> spaCy (0.3s) | LLM (4s) | Embedding (0.1s)  [parallel]
                       -> Post-processing (5ms)
                       -> Cache set
                       -> Response = ~4s total
             -> [hit]  -> Response = 5ms

[Figure: Analysis Request Flow]


2. Phase 1: Performance Fixes

1A. Parallel Inference with asyncio

The NLPService class was rewritten to expose an analyze_text_async() method that uses asyncio.gather() with a ThreadPoolExecutor to run three independent operations concurrently:

  • NLP Pipeline (spaCy/Stanza): Tokenization, POS tagging, lemmatization, dependency parsing
  • LLM Call: Pedagogical data generation (translation, grammar concepts, tips, error detection)
  • Embedding: Sentence vector encoding for similarity search

run_in_executor with a thread pool lets both CPU-bound (NLP) and I/O-bound (LLM) tasks run without blocking the FastAPI event loop.
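
As a minimal sketch of the pattern described above, the snippet below runs three blocking stages concurrently with asyncio.gather() and run_in_executor(). The functions run_nlp, call_llm, and embed are hypothetical stand-ins that sleep to simulate latency; the real stage implementations and the shape of the combined result are not shown in this document.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the three blocking stages (assumed names).
def run_nlp(text: str) -> dict:
    time.sleep(0.03)  # simulates parser latency
    return {"tokens": text.split()}


def call_llm(text: str) -> dict:
    time.sleep(0.05)  # simulates a synchronous HTTP call
    return {"translation": f"<translation of {text!r}>"}


def embed(text: str) -> list:
    time.sleep(0.01)  # simulates sentence encoding
    return [0.0] * 384


_executor = ThreadPoolExecutor(max_workers=3)


async def analyze_text_async(text: str) -> dict:
    loop = asyncio.get_running_loop()
    # Each blocking call is handed to a worker thread; gather() awaits all
    # three together, so wall-clock time tracks the slowest stage, not the sum.
    nlp, pedagogy, vector = await asyncio.gather(
        loop.run_in_executor(_executor, run_nlp, text),
        loop.run_in_executor(_executor, call_llm, text),
        loop.run_in_executor(_executor, embed, text),
    )
    return {"nlp": nlp, "pedagogy": pedagogy, "embedding": vector}


if __name__ == "__main__":
    print(asyncio.run(analyze_text_async("Il gatto mangia il pesce")))
```

Because the event loop only awaits the futures, other requests can be served while the worker threads are busy.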

[Figure: Parallel NLP Pipeline]

1B. Redis Caching Layer

A CacheService class backed by Redis:

  • Keys: SHA-256 hashes of "{language}:{text.strip().lower()}", prefixed with grammario:analysis:
  • Values: JSON-serialized analysis results
  • TTL: 24 hours (configurable)
  • Graceful degradation: If Redis is unreachable, caching is silently disabled

Hit-rate tracking is exposed via the /health endpoint and /api/v1/cache/stats.
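
The key scheme and the graceful-degradation behavior can be sketched as follows. This is an illustration under assumptions, not the actual CacheService: the method names, the hit/miss counters, and the injected client are hypothetical. The client is expected to offer get() and setex() the way a redis-py client does.

```python
import hashlib
import json

KEY_PREFIX = "grammario:analysis:"


def make_cache_key(language: str, text: str) -> str:
    # Normalize so that case and surrounding whitespace do not create new keys.
    normalized = f"{language}:{text.strip().lower()}"
    return KEY_PREFIX + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class CacheService:
    """Hypothetical sketch: a thin wrapper that never lets cache errors escape."""

    def __init__(self, client, ttl_seconds: int = 24 * 60 * 60):
        self.client = client  # e.g. a redis.Redis instance, or None if disabled
        self.ttl_seconds = ttl_seconds
        self.hits = 0
        self.misses = 0

    def get(self, language: str, text: str):
        if self.client is None:
            return None
        try:
            raw = self.client.get(make_cache_key(language, text))
        except Exception:
            return None  # cache unreachable: behave like a miss
        if raw is None:
            self.misses += 1
            return None
        self.hits += 1
        return json.loads(raw)

    def set(self, language: str, text: str, result: dict) -> None:
        if self.client is None:
            return
        try:
            key = make_cache_key(language, text)
            self.client.setex(key, self.ttl_seconds, json.dumps(result))
        except Exception:
            pass  # a failed write must not fail the analysis request
```

Hashing the normalized string keeps keys at a fixed length regardless of sentence length, and the hit/miss counters are the kind of data a stats endpoint could report.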

[Figure: Redis Caching Flow]

1C. spaCy as a Faster NLP Engine

spaCy is 10-50x faster than Stanza on CPU. A SpacyManager singleton was added with:

  • Supported languages: IT, ES, DE, RU (Turkish has no spaCy model with dependency parsing)
  • Models: it_core_news_md, es_core_news_md, de_core_news_md, ru_core_news_md
  • Auto-download: If spacy.load() fails, the manager calls spacy.cli.download() and retries
  • Fallback: If spaCy fails at runtime, falls back to Stanza transparently
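
The load, download, retry, and fallback sequence might look roughly like the sketch below. The function load_pipeline and its return convention are hypothetical; the loader and downloader are passed in so the logic can be exercised without spaCy installed. In production they would presumably be spacy.load and spacy.cli.download.

```python
from typing import Any, Callable, Optional, Tuple

# Model names as listed above; Turkish is absent, so it falls through to Stanza.
SPACY_MODELS = {
    "it": "it_core_news_md",
    "es": "es_core_news_md",
    "de": "de_core_news_md",
    "ru": "ru_core_news_md",
}


def load_pipeline(
    lang: str,
    load: Callable[[str], Any],
    download: Callable[[str], Any],
) -> Tuple[str, Optional[Any]]:
    """Return ("spacy", pipeline), or ("stanza", None) when spaCy cannot serve lang."""
    model = SPACY_MODELS.get(lang)
    if model is None:
        return ("stanza", None)
    try:
        return ("spacy", load(model))
    except OSError:
        # spacy.load raises OSError when the model package is not installed.
        try:
            download(model)
            return ("spacy", load(model))
        except (Exception, SystemExit):
            # Assumption: the downloader may exit the process on failure,
            # so SystemExit is caught alongside ordinary exceptions.
            return ("stanza", None)
```

A call such as load_pipeline("it", spacy.load, spacy.cli.download) would then cover the installed, missing, and unsupported cases in one place. Depending on the environment, a freshly installed model may only become loadable after a restart, which is one reason the retry is wrapped rather than assumed to succeed.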

1D. Truly Async Route Handlers

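The /analyze handler was already declared async def, but it called the synchronous analyze_text() and therefore blocked the uvicorn event loop. It now awaits analyze_text_async(), so the loop stays free to serve other requests while an analysis is running.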


3. Phase 2: ML/NLP Feature Additions

2A. CEFR Difficulty Scoring

A feature-engineered classification pipeline that rates sentence difficulty on the CEFR scale (A1-C2) using 10 linguistic features:

Feature                     What It Measures
Sentence length             Raw complexity
Average word length         Morphological richness
Type-token ratio            Lexical diversity
Tree depth                  Embedding depth of clauses
Tree width                  Parallel structure complexity
Subordinate clause count    Syntactic subordination
Morphological complexity    Inflectional richness
Unique POS count            Syntactic variety
Rare word proportion        Vocabulary difficulty
Lexical density             Information density

Tree depth and subordination get the highest weights (0.20 each) because embedded clauses are the strongest predictor of syntactic difficulty in second language acquisition research.
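
To make the weighting concrete, here is a minimal sketch of a weighted-sum scorer. Only the two 0.20 weights come from the text above. The even split of the remaining 0.60 across the other eight features, the assumption that every feature is pre-normalized to the range 0 to 1, and the equal-width mapping from score to CEFR level are all illustrative guesses, as are the feature key names.

```python
# Weights: tree depth and subordination at 0.20 each (from the text);
# the other eight features share the remaining 0.60 evenly (assumption).
WEIGHTS = {
    "tree_depth": 0.20,
    "subordinate_clauses": 0.20,
    "sentence_length": 0.075,
    "avg_word_length": 0.075,
    "type_token_ratio": 0.075,
    "tree_width": 0.075,
    "morph_complexity": 0.075,
    "unique_pos": 0.075,
    "rare_word_proportion": 0.075,
    "lexical_density": 0.075,
}

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def cefr_level(features: dict) -> tuple:
    """Map normalized features (each assumed to lie in [0, 1]) to (level, score)."""
    score = 0.0
    for name, weight in WEIGHTS.items():
        value = min(max(features.get(name, 0.0), 0.0), 1.0)  # clamp to [0, 1]
        score += weight * value
    # Equal-width bands over [0, 1]; the real thresholds are not documented here.
    index = min(int(score * len(LEVELS)), len(LEVELS) - 1)
    return LEVELS[index], score
```

With this layout, a sentence that maxes out only tree depth and subordination already scores 0.40, which shows how strongly those two features steer the result.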

2B. Word Frequency Analysis

Each word is tagged with a frequency band (1-5):

Band    Rank Range     Label
1       Top 500        Very Common
2       501-2000       Common
3       2001-5000      Intermediate
4       5001-10000     Uncommon
5       10001+         Rare

Frequency data is loaded from per-language JSON files generated from corpus frequency data. In the UI, colored dots indicate frequency bands (green to red).
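
The band lookup reduces to a threshold scan over the rank ranges in the table. The sketch below assumes that a word missing from the frequency list is treated as Rare; that choice and the function name are illustrative, not taken from the codebase.

```python
from typing import Optional, Tuple

# (upper rank bound, band, label), taken from the table above.
BANDS = [
    (500, 1, "Very Common"),
    (2000, 2, "Common"),
    (5000, 3, "Intermediate"),
    (10000, 4, "Uncommon"),
]


def frequency_band(rank: Optional[int]) -> Tuple[int, str]:
    """Return (band, label) for a 1-based corpus frequency rank."""
    if rank is None or rank < 1:
        return 5, "Rare"  # assumption: unknown words count as rare
    for upper, band, label in BANDS:
        if rank <= upper:
            return band, label
    return 5, "Rare"
```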

2C. Sentence Embeddings and Similarity Search

Model: paraphrase-multilingual-MiniLM-L12-v2 from sentence-transformers

  • 384-dimensional normalized vectors
  • Supports 50+ languages
  • ~90MB, runs fast on CPU (~50-100ms per sentence)

Embeddings enable future similarity search: "find sentences I've studied that are similar to this one."
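
Because the vectors are normalized, cosine similarity reduces to a plain dot product. The following sketch ranks stored sentences against a query in pure Python; the function names and the in-memory dictionary are hypothetical, and a production lookup would more likely run inside the database.

```python
import math
from typing import Dict, List, Tuple


def normalize(vector: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def most_similar(
    query: List[float],
    stored: Dict[str, List[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank stored unit vectors by dot product with a unit-length query."""
    scored = [
        (sum(q * s for q, s in zip(query, vec)), sentence_id)
        for sentence_id, vec in stored.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(sentence_id, score) for score, sentence_id in scored[:top_k]]
```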

2D. Grammar Error Detection

A dual-approach system:

Rule-based (from parse tree):

  • DET-NOUN gender/number agreement (IT, ES)
  • ADJ-NOUN case/gender agreement (DE, RU)
  • Vowel harmony violations (TR)
  • Subject-verb number/person agreement (all languages)

LLM-based (from prompt):

  • Spelling, agreement, conjugation, case, word order, preposition, and article errors
  • Each error includes the word, type, correction, and explanation
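
As an illustration of the rule-based side, the sketch below checks determiner-noun agreement on a dependency parse. The Token class is a hypothetical simplification of parser output, using Universal Dependencies feature names (Gender, Number); the error dictionary carries only a subset of the fields, since a correction cannot be derived from the parse alone.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Token:
    id: int          # 1-based position in the sentence
    text: str
    upos: str        # universal POS tag, e.g. "DET" or "NOUN"
    head: int        # id of the syntactic head, 0 for the root
    feats: Dict[str, str] = field(default_factory=dict)


def det_noun_agreement_errors(tokens: List[Token]) -> List[dict]:
    """Flag determiners whose Gender or Number differs from their head noun."""
    by_id = {tok.id: tok for tok in tokens}
    errors = []
    for tok in tokens:
        head = by_id.get(tok.head)
        if tok.upos != "DET" or head is None or head.upos != "NOUN":
            continue
        for feat in ("Gender", "Number"):
            det_value = tok.feats.get(feat)
            noun_value = head.feats.get(feat)
            # Only compare when both sides carry the feature.
            if det_value and noun_value and det_value != noun_value:
                errors.append({
                    "word": tok.text,
                    "type": "agreement",
                    "explanation": (
                        f"{feat} of '{tok.text}' ({det_value}) does not match "
                        f"'{head.text}' ({noun_value})"
                    ),
                })
    return errors
```

For the Italian fragment "la gatto", a feminine determiner attached to a masculine noun, the function returns one Gender error; for "il gatto" it returns none.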

4. Phase 3: Data Engineering -- Spaced Repetition System

SM-2 Algorithm

A TypeScript implementation of the SuperMemo SM-2 spaced repetition algorithm (the algorithm from which Anki's original scheduler is derived):

  • Quality ratings 0-5
  • Adaptive ease factor (minimum 1.3)
  • Interval scheduling: 1 day, then 6 days, then interval multiplied by easeFactor
  • Mastery score: weighted combination of repetitions, interval length, and ease factor
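
The scheduling rules above follow the published SM-2 update. The sketch below is written in Python, the backend's language, rather than the TypeScript of the actual implementation. The starting ease factor of 2.5 is the conventional SM-2 default, not a value confirmed by this document; the CardState name is hypothetical, and the mastery score is omitted because its weights are not given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardState:
    repetitions: int = 0       # consecutive successful reviews
    interval_days: int = 0     # days until the next review
    ease_factor: float = 2.5   # conventional SM-2 starting value (assumption)


def review(state: CardState, quality: int) -> CardState:
    """Apply one SM-2 review with a quality rating from 0 to 5."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality >= 3:
        # Successful recall: 1 day, then 6 days, then interval * ease factor.
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval_days * state.ease_factor)
        repetitions = state.repetitions + 1
    else:
        # Failed recall: restart the repetition sequence.
        repetitions = 0
        interval = 1

    # Standard SM-2 ease adjustment, floored at 1.3.
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    ease = max(1.3, state.ease_factor + delta)
    return CardState(repetitions, interval, ease)
```

Three perfect reviews in a row move a new card through intervals of 1, 6, and 16 days, while a failed review resets the interval to 1 day and lowers the ease factor.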

Vocabulary Review UI

A full flashcard review interface with:

  • Stats bar (Total Words, Due Today, Mastered)
  • Flashcard with show/hide answer
  • Three-button rating (Wrong/Hard/Good) plus advanced 0-5 ratings
  • Progress bar and session summary

[Figure: Spaced Repetition Cycle (SM-2)]


5. Phase 4: Observability and ML Ops

Enhanced Health Endpoint

The /health endpoint reports:

  • Service status (LLM, Redis, Embeddings)
  • Engine info (spaCy/Stanza loaded models)
  • Feature flags
  • Memory usage (RSS/VMS in MB)

Structured Performance Logging

Every service logs execution time in milliseconds:

NLP pipeline (spacy) completed in 45ms for lang=it
LLM completed in 2300ms: translation=The cat eats..., concepts=3, tips=2, errors=0
Encoded sentence in 52ms (dim=384)
Cache HIT for key=grammario:analysis:a3b2c1d4e5f6
Total analysis completed in 2410ms (parallel)
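
Log lines of this shape can be produced with a small timing helper. The context manager below is a sketch under assumptions: the logger name grammario.perf and the helper name timed are hypothetical, and the real services may format their messages differently.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("grammario.perf")  # hypothetical logger name


@contextmanager
def timed(label: str, **fields):
    """Log '<label> completed in <N>ms for k=v ...' when the block exits."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = int((time.perf_counter() - start) * 1000)
        extra = " ".join(f"{key}={value}" for key, value in fields.items())
        suffix = f" for {extra}" if extra else ""
        logger.info("%s completed in %dms%s", label, elapsed_ms, suffix)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with timed("NLP pipeline (spacy)", lang="it"):
        time.sleep(0.045)  # stands in for the real pipeline call
```

Logging in the finally clause means the duration is recorded even when the wrapped call raises.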

6. Phase 5: Admin Dashboard

A standalone admin console at /admin with:

  • Overview: KPI cards, language breakdown, recent activity
  • Users: Full management table with inline edit, delete, search, pagination
  • Requests & Data: Every analysis with full raw JSON viewer, copy-to-clipboard
  • Vocabulary: All saved vocabulary across all users
  • Backend: Live health monitoring with service status, engine info, memory usage

Admin access is restricted to the hardcoded admin user ID.


7. Schema and Type Updates

New Pydantic Models (Backend)

  • LLMGrammarError: Grammar error detected by the LLM
  • RuleBasedError: Grammar error from parse-tree heuristics
  • DifficultyInfo: CEFR assessment with score and linguistic features

Extended Models

  • TokenNode gained frequency_band
  • PedagogicalData gained errors
  • SentenceAnalysis gained difficulty, grammar_errors, embedding

Database Schema

  • pgvector extension for embedding storage
  • difficulty_level, difficulty_score, embedding columns on analyses
  • IVFFlat index for similarity search
  • match_analyses() PostgreSQL function

8. Infrastructure Changes

Docker Compose

  • Redis service with allkeys-lru eviction and persistent volume
  • Backend depends on Redis health
  • CPU-only PyTorch build for smaller image size

Production Dockerfile

  • Multi-stage build with spaCy, Stanza, and sentence-transformers models pre-downloaded
  • Frequency data bundled in image

New Dependencies

  • spacy, sentence-transformers, scikit-learn, joblib, numpy, redis, psutil

9. Complete File Manifest

22 new files and 20 modified files across backend services, frontend components, API routes, database schema, Docker configuration, and deployment scripts.

Key New Services

Service              Purpose
Redis Cache          Analysis result caching with 24h TTL
spaCy Manager        Fast CPU NLP engine (10-50x faster than Stanza)
Difficulty Scorer    CEFR level classification via linguistic features
Frequency Service    Word frequency band lookups
Embedding Service    Sentence vectors for similarity search
Error Detector       Rule-based grammar error detection
SM-2 Algorithm       Spaced repetition for vocabulary review
