Transparency

AI Manifesto

BlogIA is fully transparent about how content is produced. No black box — here is exactly how our autonomous neural newsroom works, what safeguards are in place, and why you can trust the output.

Why Full Transparency?

AI-generated content raises legitimate questions about accuracy, bias, and trust. We believe the answer is not to hide behind vague disclaimers, but to open the entire pipeline for scrutiny.

Every article on BlogIA is machine-generated — and we are proud of that. But "machine-generated" does not mean "unverified." Our pipeline includes multiple layers of source verification, adversarial scoring, hallucination detection, and quality validation that many human-staffed newsrooms lack.

This manifesto documents each step of our content pipeline so you can judge for yourself.

The Content Pipeline — Step by Step

1. Data Ingestion

Every day, automated collectors ingest content from 50+ curated RSS feeds, ArXiv pre-prints, HuggingFace model cards, GitHub trending repos, and PyPI releases. Data is stored in a local SQLite database and a ChromaDB vector store for RAG retrieval.
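The SQLite side of this step can be sketched as follows. This is a minimal illustration, not our production code: the table schema and field names are assumptions, and feed parsing is left out.

```python
import sqlite3

def init_db(conn):
    # One row per article URL; the PRIMARY KEY makes re-ingestion idempotent.
    conn.execute("""CREATE TABLE IF NOT EXISTS articles (
        url TEXT PRIMARY KEY,
        title TEXT,
        source TEXT,
        fetched_at TEXT)""")

def ingest(conn, entries):
    """Insert feed entries, silently skipping URLs already collected."""
    inserted = 0
    for e in entries:
        cur = conn.execute(
            "INSERT OR IGNORE INTO articles (url, title, source, fetched_at) "
            "VALUES (?, ?, ?, ?)",
            (e["url"], e["title"], e["source"], e["fetched_at"]))
        inserted += cur.rowcount
    conn.commit()
    return inserted
```

Because the URL is the primary key, re-running a collector never duplicates articles — the same batch inserted twice adds rows only once.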

2. Multi-Source Enrichment

Before any article is generated, our Multi-Source Enricher cross-references the topic across the entire RSS database (6,700+ articles) to find coverage from 2–4 independent outlets. Single-source topics are flagged and deprioritized.
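In spirit, the enrichment check works like the sketch below: count how many distinct sources mention the topic, and flag anything covered by fewer than two. The keyword-matching approach and field names are illustrative assumptions, not the actual implementation.

```python
def independent_coverage(topic_keywords, articles):
    """Return the distinct sources whose title/summary mentions the topic."""
    kws = [k.lower() for k in topic_keywords]
    hits = set()
    for a in articles:
        text = (a["title"] + " " + a.get("summary", "")).lower()
        if any(k in text for k in kws):
            hits.add(a["source"])
    return hits

def prioritize(topic_keywords, articles, min_sources=2):
    """Flag topics that only a single outlet has covered."""
    sources = independent_coverage(topic_keywords, articles)
    return {"sources": sorted(sources),
            "flagged_single_source": len(sources) < min_sources}
```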

3. Local LLM Generation

Content is generated by large language models running locally on our own hardware via Ollama — not cloud APIs. This ensures full data sovereignty, zero third-party dependencies, and predictable costs. Models used include Qwen 2.5 14B and Llama 3.1.

4. Anti-Hallucination Pipeline

Every generated article passes through six strict anti-hallucination rules embedded in the system prompt, including: no invented quotes, no fabricated statistics, no unverifiable claims. The AI must cite its sources and flag uncertainty explicitly.
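Prompt rules alone are not enforceable, so a rule like "no invented quotes" can also be backed by a post-generation check. The heuristic below is an illustrative assumption — the rule wording, the `[source: ...]` marker convention, and the regex are ours for this example, not the production checker.

```python
import re

# Illustrative subset of the rules embedded in the system prompt.
RULES = [
    "Never invent or paraphrase quotes; only quote text present in the sources.",
    "Never state a number that does not appear in the ingested sources.",
    "Explicitly flag any claim you cannot verify.",
]

def unattributed_quotes(text):
    """Flag direct quotes that are not immediately followed by a [source: ...] marker."""
    return [m.group(1)
            for m in re.finditer(r'"([^"]+)"(?!\s*\[source:)', text)]
```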

5. Adversarial Scoring

For reviews and comparisons, an Adversarial Scorer pits an AI 'Advocate' against an AI 'Prosecutor' to debate product merits. A separate 'Judge' model assigns the final score. This prevents one-sided evaluations.
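The debate structure reduces to three prompts. A minimal sketch, assuming the three roles are plain prompt prefixes and the Judge replies with a bare number — the real prompts are certainly richer:

```python
def adversarial_score(product_brief, llm):
    """Run the Advocate/Prosecutor debate and let a Judge assign 0-100.
    `llm` is any callable mapping a prompt string to a completion string."""
    case_for = llm(f"You are the Advocate. Argue the strongest case FOR: {product_brief}")
    case_against = llm(f"You are the Prosecutor. Argue the strongest case AGAINST: {product_brief}")
    verdict = llm(
        "You are the Judge. Weigh both arguments and reply with a 0-100 score only.\n"
        f"FOR: {case_for}\nAGAINST: {case_against}")
    return int(verdict.strip())
```

Injecting the model as a callable also makes the orchestration testable with a stub instead of a live LLM.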

6. Quality Validation

An automated Article Validator scores every piece on 10 criteria, including source diversity (multi-source coverage is required), factual density, SEO optimization, structure, readability, and hallucination patterns. Articles scoring below 70/100 are rejected and sent to drafts.
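The pass/fail gate can be sketched as below. The 70/100 threshold comes from the text; the equal weighting and the 0–10 sub-score scale are assumptions for illustration.

```python
REJECT_THRESHOLD = 70  # articles below this overall score go back to drafts

def validate_article(criterion_scores, threshold=REJECT_THRESHOLD):
    """criterion_scores: dict of criterion -> 0-10.
    Overall score is the mean sub-score scaled to /100."""
    overall = sum(criterion_scores.values()) / len(criterion_scores) * 10
    status = "published" if overall >= threshold else "draft"
    return round(overall, 1), status
```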

7. SEO Scoring

A dedicated SEO Scorer evaluates title length, meta description, heading structure, internal linking, keyword density, and content length. Average score across all published articles: 80+/100.
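A checklist scorer over those signals might look like this. The specific thresholds (title 30–60 characters, meta description 120–160, and so on) are common SEO rules of thumb, not our exact production values:

```python
def seo_score(article):
    """article: dict with title, meta, h2_count, word_count, internal_links.
    Returns a 0-100 score and the list of failed checks."""
    checks = {
        "title_length": 30 <= len(article["title"]) <= 60,
        "meta_description_length": 120 <= len(article["meta"]) <= 160,
        "heading_structure": article["h2_count"] >= 2,
        "content_length": article["word_count"] >= 800,
        "internal_linking": article["internal_links"] >= 2,
    }
    score = round(100 * sum(checks.values()) / len(checks))
    return score, [name for name, ok in checks.items() if not ok]
```

Returning the failed checks alongside the score is what lets the pipeline report why an article underperforms, not just that it does.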

8. Build & Deploy

The Next.js frontend reads validated content and generates a fully static site (4,000+ pages). The site is deployed via rsync to our hosting provider every 2 hours, with IndexNow notifications sent to search engines automatically.
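The IndexNow notification is a single POST to the shared endpoint, with the JSON body defined by the public IndexNow protocol. The host and key below are placeholders; only the payload shape follows the spec.

```python
import json
import urllib.request

def indexnow_payload(host, key, urls):
    """Assemble the JSON body defined by the IndexNow protocol."""
    return {"host": host, "key": key,
            "keyLocation": f"https://{host}/{key}.txt",
            "urlList": list(urls)}

def notify_search_engines(host, key, urls):
    """Submit freshly deployed URLs to the shared IndexNow endpoint."""
    body = json.dumps(indexnow_payload(host, key, urls)).encode()
    req = urllib.request.Request(
        "https://api.indexnow.org/indexnow", data=body,
        headers={"Content-Type": "application/json; charset=utf-8"})
    with urllib.request.urlopen(req) as resp:
        return resp.status  # 200 or 202 means the submission was accepted
```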

Anti-Hallucination Safeguards

LLMs can produce plausible-sounding but factually incorrect content. Here are the specific safeguards we have built into every stage of the pipeline:

No Invented Quotes

The AI never fabricates quotes from real people. All attributed statements must come from verifiable source material.

No Fabricated Statistics

Numbers, percentages, and data points must originate from ingested sources. The AI cannot generate plausible-sounding but fake statistics.

Consensus Engine

Facts are triangulated from multiple independent sources before being included. Contradictory claims are flagged and presented as such.
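The triangulation logic amounts to grouping each claimed fact by value and source. A minimal sketch, assuming facts arrive as (fact_id, value, source) triples — the real engine matches claims far less literally:

```python
from collections import defaultdict

def triangulate(claims):
    """claims: list of (fact_id, value, source) triples.
    A fact reaches consensus when 2+ sources agree on one value;
    facts with competing values are returned as conflicts."""
    by_fact = defaultdict(lambda: defaultdict(set))
    for fact, value, source in claims:
        by_fact[fact][value].add(source)
    consensus, conflicts = {}, {}
    for fact, values in by_fact.items():
        if len(values) == 1:
            (value, sources), = values.items()
            if len(sources) >= 2:
                consensus[fact] = value
        else:
            conflicts[fact] = {v: sorted(s) for v, s in values.items()}
    return consensus, conflicts
```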

Citation-First Generation

Tutorials and guides use a Fact Sheet Generator that builds a verified context of citations before any prose is written.
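The output of that step is essentially a sourced context block prepended to the generation prompt. The exact format below (claim plus `[source: ...]` marker) is an illustrative assumption:

```python
def build_fact_sheet(facts):
    """facts: list of {'claim': ..., 'source': ...} dicts verified upstream.
    The resulting block is prepended to the generation prompt so the model
    can only cite statements that already carry a source."""
    lines = [f"- {f['claim']} [source: {f['source']}]" for f in facts]
    return "VERIFIED FACTS (cite only these):\n" + "\n".join(lines)
```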

Density Validation

A Density Validator measures information-to-fluff ratio. Articles with low information density are rejected.
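One crude but useful proxy for information density is the share of sentences carrying a concrete figure. This heuristic and its 0.3 threshold are illustrative assumptions, not the production metric:

```python
import re

def information_density(text, min_ratio=0.3):
    """Share of sentences containing a digit, as a fact-vs-fluff proxy.
    Returns the ratio and whether the article clears the bar."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0, False
    factual = sum(1 for s in sentences if re.search(r"\d", s))
    ratio = factual / len(sentences)
    return ratio, ratio >= min_ratio
```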

Local Processing Only

All LLM inference runs on our own hardware. No content passes through third-party cloud APIs. Full data sovereignty.

Technology Stack

Frontend: Next.js 16, React, Tailwind CSS, shadcn/ui
LLM Runtime: Ollama (local inference)
Models: Qwen 2.5 14B, Llama 3.1
Vector Store: ChromaDB (RAG retrieval)
Database: SQLite (articles, analytics)
Image Gen: Stable Diffusion (local)
Hosting: Static export, rsync to Hostinger
Orchestration: cron + bash master script

Known Limitations

No first-hand experience. Our AI has not physically used the products it reviews. Reviews are based on documented benchmarks, official specifications, and aggregated user reports from multiple sources.

Potential for subtle inaccuracies. Despite multi-source verification and anti-hallucination rules, edge cases exist. If you spot an error, please contact us and we will correct it promptly.

No editorial opinion. BlogIA does not express subjective opinions. Analysis sections synthesize documented viewpoints from the AI research community rather than advocating a personal stance.

Questions about our pipeline? Spotted an inaccuracy?

Get in touch