⚠️ AI Washing Case Study - This demonstrates detection of "Mechanical Turk" AI washing: a company marketing human labor as "AI-generated" output. Based on documented patterns in the industry. Sample data for educational purposes.

Demo AI Platform - Technical Due Diligence

Audit Date: 12/8/2025 | CodeRogue Audit System v1.0.0

Executive Summary

This audit reveals critical misrepresentation of AI capabilities. The platform markets an "AI assistant" named ARIA that supposedly generates 80% of application code automatically. Technical analysis shows ARIA is a requirements-gathering chatbot that forwards specifications to a task queue processed by human developers. No code generation AI exists. The architecture is a sophisticated task management and workflow coordination system, not an AI platform.

Key Insights

  • ARIA chat interface collects requirements and routes them to human developers via internal task queue
  • Found internal admin dashboards for developer assignment and workload management
  • Repository contains commits from 54 individual developers, not automated code generation
  • AI libraries installed but used only for cosmetic features, not core functionality

Top Risks

  • CRITICAL: "AI-generated code" claim is false - code is written by human developers
  • CRITICAL: Marketing materials claim proprietary ML models that do not exist
  • Task queue and developer assignment system proves human-powered workflow
  • Competitive positioning based on AI capabilities that are not real

Recommendations

  • DO NOT PROCEED with investment at current valuation - core technology claims are false
  • If considering investment, demand a complete revaluation based on the actual architecture (a task management platform, not AI)
  • Assess reputational risk from association with company making false AI claims
  • Consider that actual value proposition is pre-built templates + development team coordination, not proprietary AI technology

Claims vs Reality

Implemented: 2 | Partial: 2 | Missing: 6 | Stale: 0

AI Washing Risk: critical

CRITICAL AI WASHING. The company claims its AI assistant "ARIA" autonomously generates 80% of application code. Technical audit reveals ARIA is a requirements-gathering chatbot using scripted dialogues. Actual code is written by human developers through an internal task management system. No generative AI, no ML models, no automated code generation exists. This represents material misrepresentation of core technology capabilities.

Detailed Results

🤖
MISSING (99% confidence)

"ARIA AI assistant autonomously generates 80% of production-ready application code"

No code generation capability exists. ARIA is implemented as a scripted chatbot driven by a decision tree (src/aria/DialogueEngine.ts). User inputs are parsed into structured requirements and inserted into a PostgreSQL task queue (src/tasks/TaskQueue.ts). Tasks are then assigned to developers via an internal dashboard. Git history shows all code commits come from individual developer accounts, not automated systems.

Evidence:

src/aria/DialogueEngine.ts - Scripted dialogue system with 2,400 pre-written responses. No NLP or generative AI.
src/aria/responses/dialogue-trees.json - 4,200-line JSON file with hard-coded conversation flows and response templates
src/tasks/TaskQueue.ts - PostgreSQL-backed task queue that stores parsed requirements for developer assignment

⚠️ AI-Washing Indicators:

[critical] Work marketed as "AI-generated" is performed by human developers through task queue system
[critical] No code generation AI exists. ARIA is a scripted chatbot forwarding work to humans.
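To make the finding concrete, here is a minimal sketch of the pattern the evidence points to, assuming a hard-coded dialogue tree that terminates in a queue INSERT. The DialogueNode shape, table name, and column names are hypothetical, not taken from the audited code:

```typescript
// Hypothetical sketch of a scripted decision-tree "assistant"; node shape,
// table, and columns are illustrative, not from the audited repository.
import { Pool } from "pg";

interface DialogueNode {
  prompt: string;                          // one of the ~2,400 pre-written responses
  children: Record<string, DialogueNode>;  // keyword -> next branch in the tree
}

// Walking the tree is substring matching, not language understanding.
function nextNode(node: DialogueNode, input: string): DialogueNode | undefined {
  const key = Object.keys(node.children).find((k) =>
    input.toLowerCase().includes(k)
  );
  return key ? node.children[key] : undefined;
}

const pool = new Pool();

// Where "AI generation" would sit, there is an INSERT: parsed requirements
// land in a queue that human developers pick up from an admin dashboard.
async function enqueueRequirements(projectId: string, requirements: unknown): Promise<void> {
  await pool.query(
    "INSERT INTO task_queue (project_id, requirements, status) VALUES ($1, $2, 'unassigned')",
    [projectId, JSON.stringify(requirements)]
  );
}
```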
🤖
MISSING (98% confidence)

"Proprietary machine learning models trained on 500,000+ successful applications"

No ML models found in the codebase: no training data, no model artifacts, no inference code. The only "learning" is a PostgreSQL database storing past project templates. The claim of 500K applications cannot be verified; the database shows 3,400 completed projects in total.

Evidence:

src/templates/TemplateRepository.ts - PostgreSQL repository storing 3,400 project templates. No ML involved.
database/migrations/003_templates.sql - Database schema for templates - simple relational storage, no vector DB or ML features

⚠️ AI-Washing Indicators:

[critical] No ML models exist. "Training data" is a simple template database.
[critical] Claimed 500K applications but database shows only 3,400 completed projects
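A minimal sketch contrasting the claim with the evidence above: the stored "training data" is ordinary relational rows. Table and column names are hypothetical, not taken from 003_templates.sql:

```typescript
// Retrieving the platform's "learned" knowledge is a SELECT over ~3,400
// rows, not model inference; names below are illustrative assumptions.
import { Pool } from "pg";

const pool = new Pool();

async function findTemplates(category: string): Promise<Array<{ id: number; name: string }>> {
  const { rows } = await pool.query(
    "SELECT id, name FROM project_templates WHERE category = $1 ORDER BY created_at DESC LIMIT 20",
    [category]
  );
  return rows;
}
```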
🤖
MISSING (95% confidence)

"Natural language understanding converts requirements to code specifications automatically"

No NLU implementation found. User input is processed through regex pattern matching and keyword extraction. Complex requirements trigger escalation to human "solution architects" via Slack notification. The output is a structured JSON form, not code specifications.

Evidence:

src/aria/RequirementParser.ts - Regex-based keyword extraction with 180 patterns. Falls back to Slack escalation.
src/aria/SlackEscalation.ts - Slack webhook integration to notify human architects when parsing fails
src/aria/keywords.ts - 180 keyword patterns for basic requirement categorization

⚠️ AI-Washing Indicators:

[high] Regex pattern matching marketed as "natural language understanding"
[critical] Complex inputs silently escalated to human architects via Slack
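A hedged sketch of what the "NLU" reduces to under this finding, assuming a keyword table like src/aria/keywords.ts and a Slack incoming webhook for escalation. Pattern names, the tag set, and the payload are illustrative; the webhook POST itself is a documented Slack API (Node 18+ global fetch assumed):

```typescript
// Regex "NLU" with silent human escalation, per the evidence above.
const KEYWORD_PATTERNS: Record<string, RegExp> = {
  auth: /\b(login|sign[- ]?up|password|oauth)\b/i,
  payments: /\b(stripe|checkout|invoice|billing)\b/i,
  // ...the audited keywords.ts holds roughly 180 such patterns
};

function categorize(message: string): string[] {
  return Object.entries(KEYWORD_PATTERNS)
    .filter(([, re]) => re.test(message))
    .map(([tag]) => tag);
}

// If nothing matches, a human "solution architect" is pinged; the client
// never learns that a person, not a model, handled the hard case.
async function parseOrEscalate(message: string): Promise<string[]> {
  const tags = categorize(message);
  if (tags.length === 0) {
    await fetch(process.env.SLACK_WEBHOOK_URL!, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: `Unparsed requirement needs an architect: ${message}` }),
    });
  }
  return tags;
}
```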
🤖
PARTIAL (80% confidence)

"AI delivers complete applications in 2 weeks vs 6 months traditional development"

Fast delivery is real but achieved through pre-built templates and parallel human development teams, not AI. Admin dashboard shows project metrics tracking team assignments and delivery times. Speed comes from reusable component library and parallel developer assignment.

Evidence:

src/admin/ProjectMetrics.tsx - Admin dashboard with project metrics and team assignment tracking
src/templates/ComponentLibrary.ts - 890 pre-built components enabling rapid assembly

⚠️ AI-Washing Indicators:

[high] Speed attributed to AI but actually from human teams and pre-built templates
🤖
MISSING (96% confidence)

"Continuous AI improvement - system learns from every project to improve future outputs"

No learning system exists. "Improvement" is developers manually adding new templates to the library. Found internal Jira board for "template requests" where developers propose new reusable components. No automated learning, feedback loops, or model retraining.

Evidence:

src/admin/TemplateSubmission.tsx - Form for developers to submit new templates for manual review
docs/internal/template-contribution-guide.md - Internal guide for developers on how to propose new templates

⚠️ AI-Washing Indicators:

[high] "AI learning" is actually developers manually adding templates
🤖
MISSING (94% confidence)

"AI-powered quality assurance automatically detects and fixes bugs"

No AI-powered QA found. Quality assurance is performed by human QA team using standard testing tools. Found internal QA assignment system similar to developer task queue. Bug fixes are manually coded by developers.

Evidence:

src/qa/QATaskAssignment.ts - Task queue for assigning QA work to human testers
src/admin/QADashboard.tsx - Dashboard for QA managers to track human tester workload

⚠️ AI-Washing Indicators:

[high] Human QA team work marketed as "AI-powered quality assurance"
🤖
PARTIAL (75% confidence)

"Intelligent project estimation using historical data analysis"

Estimation exists but uses simple lookup tables, not AI. System matches project type to historical averages stored in config file. Senior project managers manually adjust estimates based on complexity.

Evidence:

src/estimation/ProjectEstimator.ts - Lookup table matching project types to baseline estimates
src/config/estimation-baselines.json - 45 project type categories with hard-coded time/cost estimates

⚠️ AI-Washing Indicators:

[medium] Simple lookup table marketed as "intelligent estimation"
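A sketch of the lookup-table estimator described above. The row shape mirrors what the audit attributes to estimation-baselines.json; field names and the sample rows are hypothetical:

```typescript
// "Intelligent estimation" reduces to a key match against hard-coded rows;
// a senior project manager then adjusts the number by hand.
interface Baseline {
  projectType: string;
  weeks: number;
  costUsd: number;
}

// The audited config holds ~45 rows like these (values invented here).
const BASELINES: Baseline[] = [
  { projectType: "ecommerce", weeks: 3, costUsd: 24_000 },
  { projectType: "booking", weeks: 2, costUsd: 16_000 },
];

function estimate(projectType: string): Baseline | undefined {
  return BASELINES.find((b) => b.projectType === projectType);
}
```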
🔒
MISSING (90% confidence)

"Enterprise-grade security with AI-powered threat detection"

No AI-powered security found. Security consists of standard AWS WAF rules and basic rate limiting. No threat detection, anomaly detection, or security AI implementation.

Evidence:

infrastructure/waf-rules.tf - Standard AWS WAF configuration with predefined rule sets

⚠️ AI-Washing Indicators:

[medium] Standard AWS WAF rules described as "AI-powered threat detection"
IMPLEMENTED (88% confidence)

"Beautiful, customizable user interface for client communication"

The chat interface is well-designed and functional, and the ARIA conversation UI is polished. This is genuine: the chatbot frontend works well; it simply has no AI behind it.

Evidence:

src/components/AriaChat/ - Well-implemented React chat component with animations and theming
IMPLEMENTED (90% confidence)

"Integrations with popular business tools (Slack, Jira, GitHub)"

Integrations are real and functional. Ironic finding: Slack and Jira integrations are primarily used for internal developer coordination, not client-facing features. GitHub integration works for code delivery.

Evidence:

src/integrations/slack/ - Slack integration for internal team notifications and task assignment
src/integrations/jira/ - Jira integration for developer task tracking
src/integrations/github/ - GitHub integration for code repository creation and deployment

AI & Machine Learning Assessment

AI Washing Risk: critical
Implementation Quality: 5/100

AI Libraries & Tools

openai v4.20.0 (llm)
third-party | GPT-3.5-turbo

Used only for generating marketing copy and email templates in the admin panel. NOT used for code generation or the ARIA chatbot. Rate-limited to 50 calls/day across the entire platform.

Input: Project name, category, brief description
Output: Marketing-style project descriptions for client dashboard
Processing: Simple GPT-3.5 completion, results cached indefinitely
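For scale, a sketch of what this usage plausibly looks like with the openai v4 Node SDK; the in-memory cache and function name are assumptions, not code from the repository:

```typescript
// Rate-limited marketing-copy generation, per the audit's description.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment
const cache = new Map<string, string>(); // results "cached indefinitely"

export async function describeProject(name: string, category: string): Promise<string> {
  const key = `${name}:${category}`;
  const cached = cache.get(key);
  if (cached) return cached;

  const completion = await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "user", content: `Write a one-paragraph marketing description for "${name}", a ${category} project.` },
    ],
  });
  const text = completion.choices[0].message.content ?? "";
  cache.set(key, text);
  return text;
}
```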
@tensorflow/tfjs v4.10.0 (ml)
none | No model loaded

Listed in package.json but zero imports found in source code. Appears to be installed for due diligence optics.

Input: None - library is not used
Output: None - library is not used
Processing: No processing - library is never imported
compromise v14.10.0 (nlp)
rule-based | compromise default

Basic NLP library used only for extracting nouns from user input to populate search tags. Not used for understanding or code generation.

Input: User chat messages
Output: Array of extracted nouns for database tagging
Processing: Simple noun extraction, no semantic understanding
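A short sketch of this noun-extraction step using compromise's documented nouns() API; the wrapper function is hypothetical:

```typescript
// Lexical noun extraction for database tags, not semantic understanding.
import nlp from "compromise";

function extractTags(message: string): string[] {
  return nlp(message).nouns().out("array");
}

// extractTags("I need a booking app with Stripe payments")
//   -> noun phrases such as ["a booking app", "Stripe payments"]
```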

Architecture Overview

Technology Stack

Languages

TypeScript: 72.5%
JavaScript: 8%
Python: 12%
SQL: 4.5%
CSS: 3%

Frameworks

Next.js, React, Express.js, FastAPI, Socket.io

Databases

PostgreSQL, Redis, MongoDB

Dependencies: 287 | Files: 312 | Directories: 68

External Services

Slack API (critical)

CRITICAL FINDING: Primary communication channel for internal developer coordination. Tasks are escalated to human architects. Work assignments sent to developer channels.

Data: Task details, developer assignments, escalation alerts, project updates. This is core to the human workflow that powers the "AI".

Jira API (critical)

CRITICAL FINDING: Internal task tracking for developer team. Each "AI-generated" project becomes a Jira epic with human-assigned tasks.

Data: Project requirements, task breakdowns, developer assignments, time tracking.

OpenAI API (optional)

Used only for generating marketing descriptions. NOT used for ARIA chatbot or code generation despite marketing claims.

Data: Project names and categories for description generation. 50 calls/day limit.

GitHub API (important)

Repository creation and code delivery. Developers push code here; clients receive access.

Data: Source code repositories, deployment artifacts.

Stripe (critical)

Payment processing for project fees.

Data: Payment information, subscription data.

Code Quality & Health

Files: 312 | Lines of Code: 47,800 | Avg Complexity: 7.8 | Code Smells: 4

Complexity: Concerning

Average complexity of 7.8 is slightly above the industry standard (5-7 for well-maintained codebases), indicating moderate maintainability challenges. The primary issue is the extremely high complexity of DialogueEngine (45), which reflects a massive decision-tree implementation: hard-coded conversation branching logic that should be replaced with a proper state machine or an LLM-based approach (see the sketch after the risk list below).

Risks:
  • DialogueEngine complexity of 45 makes the chatbot nearly impossible to maintain or extend
  • Adding new conversation paths requires modifying deeply nested conditionals
  • High bug risk in conversation logic due to complex state management
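A minimal sketch of the state-machine direction suggested above; the states and transitions are invented for illustration, not extracted from DialogueEngine.ts:

```typescript
// Each transition becomes an explicit, testable edge instead of a branch
// buried in nested conditionals.
type StateId = "greeting" | "collect_requirements" | "confirm" | "done";

interface State {
  prompt: string;
  transitions: Array<{ when: (input: string) => boolean; to: StateId }>;
}

const machine: Record<StateId, State> = {
  greeting: {
    prompt: "What would you like to build?",
    transitions: [{ when: (i) => i.trim().length > 0, to: "collect_requirements" }],
  },
  collect_requirements: {
    prompt: "Any specific integrations or constraints?",
    transitions: [{ when: (i) => /\b(no|done|that's all)\b/i.test(i), to: "confirm" }],
  },
  confirm: {
    prompt: "Shall I submit this specification?",
    transitions: [{ when: (i) => /\byes\b/i.test(i), to: "done" }],
  },
  done: { prompt: "Submitted.", transitions: [] },
};

function step(current: StateId, input: string): StateId {
  const hit = machine[current].transitions.find((t) => t.when(input));
  return hit ? hit.to : current; // unmatched input keeps the conversation in place
}
```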

Technical Debt Analysis

High complexity concentrated in "AI" components that are actually massive decision trees. Code quality issues stem from trying to simulate AI behavior with traditional programming.

high-complexity (2) - high priority

DialogueEngine and TaskQueue are overly complex because they simulate AI behavior with hard-coded logic. Extremely difficult to maintain.

✗ Needs fixing | Est. fix: 3-4 weeks to refactor
long-file (1)medium priority

DeveloperAssignment.tsx at 680 lines handles too many responsibilities.

✓ Can live with | Est. fix: 3-4 days

Test Coverage

Below Average (58%)
  • DialogueEngine has only 35% coverage - critical conversation paths untested
  • TaskQueue at 52% - core workflow logic insufficiently tested
  • Admin panels largely untested (20%)

Security & Compliance

Critical: 2 | High: 12 | Medium: 28 | Low: 45 | Total: 87

Security Assessment

Two critical vulnerabilities expose internal operations. The admin API leaks the developer workflow, and a Slack webhook shipped in the client bundle could allow external manipulation of internal channels.

Priority Actions:
  • Remove Slack webhook from client bundle immediately (CRITICAL)
  • Implement authorization on admin API endpoints (CRITICAL)
  • Audit all internal API endpoints for proper access control

Critical Vulnerabilities

CRITICAL | internal | (N/A)

Insecure Direct Object Reference in admin API

Impact: Any authenticated user can view internal developer assignments, task details, and workflow information. Exposes the human-powered nature of the platform.

Implement proper RBAC on admin API endpoints immediately
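A hedged sketch of the recommended fix, assuming the admin API is served by the Express layer noted in the stack; the route path, role name, and user shape are assumptions:

```typescript
// Role check plus per-resource authorization; the route prefix alone is
// not enough to close an IDOR.
import express, { Request, Response, NextFunction } from "express";

function requireRole(role: string) {
  return (req: Request, res: Response, next: NextFunction) => {
    // Assumes upstream auth middleware attaches the caller's roles.
    const roles: string[] = (req as any).user?.roles ?? [];
    if (!roles.includes(role)) return res.status(403).json({ error: "forbidden" });
    next();
  };
}

const app = express();

app.get("/admin/assignments/:id", requireRole("admin"), (req, res) => {
  // Handler must also scope the query to resources this admin may view.
  res.json({ id: req.params.id /* ...fetch with ownership checks... */ });
});
```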

CRITICAL | internal | (N/A)

Slack webhook URL exposed in client bundle

Impact: Attackers can send arbitrary messages to internal Slack channels, potentially disrupting operations or extracting information about human workflow.

Move Slack integration to server-side only. Remove webhook from client bundle.
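A sketch of that remediation under the same assumptions (Express server, Node 18+ global fetch); the endpoint path is hypothetical:

```typescript
// The client posts a constrained payload to an authenticated endpoint;
// only the server ever sees SLACK_WEBHOOK_URL, so the secret never ships
// in the client bundle.
import express from "express";

const app = express();
app.use(express.json());

// Assumes the app's normal auth middleware runs before this route.
app.post("/api/notify", async (req, res) => {
  const text = String(req.body?.text ?? "").slice(0, 500); // constrain input
  await fetch(process.env.SLACK_WEBHOOK_URL!, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  res.sendStatus(204);
});
```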

License Compliance

Compliant: 260 | Non-Compliant: 5 | Unknown: 22

Delivery & Team

Team Metrics

Contributors: 54 | Commits: 8,450 | Branches: 28

CI/CD Status

CI: Yes | CD: Yes
GitHub Actions, AWS CodePipeline

CI/CD Quality Assessment

Score: 72/100

Standard CI/CD setup. Notable finding: separate pipelines for client-facing app and internal admin tools. Admin tools have less rigorous testing.

Strengths

  • Automated testing and deployment
  • Separate staging environment
  • PR reviews required

Gaps

  • Admin tool pipeline has minimal testing
  • No security scanning
  • Internal tools deployed with less oversight

Best Practices

Automated Testing: basic
Code Review: good
Deployment Automation: good
Environment Parity: good
Rollback Capability: good