⚠️ AI Washing Case Study - This demonstrates detection of "Mechanical Turk" AI washing: a company marketing human labor as "AI-generated" output. Based on documented patterns in the industry. Sample data for educational purposes.
Demo AI Platform - Technical Due Diligence
Audit Date: 12/8/2025 | CodeRogue Audit System v1.0.0
Executive Summary
This audit reveals critical misrepresentation of AI capabilities. The platform markets an "AI assistant" named ARIA that supposedly generates 80% of application code automatically. Technical analysis shows ARIA is a requirements-gathering chatbot that forwards specifications to a task queue processed by human developers. No code generation AI exists. The architecture is a sophisticated task management and workflow coordination system, not an AI platform.
Key Insights
- ARIA chat interface collects requirements and routes them to human developers via an internal task queue
- Found internal admin dashboards for developer assignment and workload management
- Repository contains commits from 54 individual developers, not automated code generation
- AI libraries installed but used only for cosmetic features, not core functionality
Top Risks
- CRITICAL: "AI-generated code" claim is false - code is written by human developers
- CRITICAL: Marketing materials claim proprietary ML models that do not exist
- Task queue and developer assignment system proves human-powered workflow
- Competitive positioning based on AI capabilities that are not real
Recommendations
- DO NOT PROCEED with investment at the current valuation - core technology claims are false
- If considering investment, demand a complete revaluation based on the actual architecture (a task-management platform, not AI)
- Assess reputational risk from association with a company making false AI claims
- Consider that the actual value proposition is pre-built templates plus development-team coordination, not proprietary AI technology
Claims vs Reality
CRITICAL AI WASHING. The company claims its AI assistant "ARIA" autonomously generates 80% of application code. Technical audit reveals ARIA is a requirements-gathering chatbot using scripted dialogues. Actual code is written by human developers through an internal task management system. No generative AI, no ML models, no automated code generation exists. This represents material misrepresentation of core technology capabilities.
Detailed Results
"ARIA AI assistant autonomously generates 80% of production-ready application code"
No code-generation capability exists. ARIA is implemented as a scripted chatbot driven by a decision tree (src/aria/DialogueEngine.ts). User inputs are parsed into structured requirements and inserted into a PostgreSQL task queue (src/tasks/TaskQueue.ts); tasks are then assigned to developers via an internal dashboard. Git history shows that all code commits come from individual developer accounts, not from automated systems.
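The pattern is easy to illustrate. Below is a minimal TypeScript sketch of a scripted decision-tree dialogue feeding a task queue; the node layout, field names, and record shape are illustrative assumptions, not the vendor's actual code.

```typescript
// Hypothetical reconstruction of the pattern found in src/aria/DialogueEngine.ts
// and src/tasks/TaskQueue.ts. All shapes below are illustrative assumptions.

interface DialogueNode {
  prompt: string;
  next?: Record<string, DialogueNode>; // matched answer -> next node
  emits?: string; // requirement recorded when this node is reached
}

// A scripted decision tree: every "AI conversation" is a fixed walk of this map.
const TREE: DialogueNode = {
  prompt: "What kind of application do you need?",
  next: {
    "web app": {
      prompt: "Do you need user accounts?",
      next: {
        yes: { prompt: "Noted.", emits: "auth:required" },
        no: { prompt: "Noted.", emits: "auth:none" },
      },
    },
  },
};

interface TaskRecord {
  projectType: string;
  requirements: string[];
  status: "queued"; // later picked up by a human developer, not a model
}

function runDialogue(answers: string[]): TaskRecord {
  const requirements: string[] = [];
  let node: DialogueNode | undefined = TREE;
  for (const answer of answers) {
    node = node?.next?.[answer.toLowerCase()];
    if (!node) break;
    if (node.emits) requirements.push(node.emits);
  }
  // In the audited system this record is INSERTed into a PostgreSQL
  // tasks table and surfaced on the developer-assignment dashboard.
  return { projectType: answers[0] ?? "unknown", requirements, status: "queued" };
}

console.log(runDialogue(["web app", "yes"]));
```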
"Proprietary machine learning models trained on 500,000+ successful applications"
No ML models were found in the codebase: no training data, no model artifacts, no inference code. The only "learning" is a PostgreSQL database storing past project templates. The claim of 500,000+ applications cannot be verified; the database shows 3,400 completed projects in total.
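For context, the entire "learning" layer reduces to a query of this kind; the table and column names below are assumptions, not the vendor's schema.

```typescript
import { Client } from "pg";

// Minimal sketch of the "ML model": a ranked template lookup in PostgreSQL.
// Table and column names are illustrative assumptions.
async function suggestTemplates(category: string): Promise<string[]> {
  const db = new Client(); // connection settings taken from PG* env vars
  await db.connect();
  const { rows } = await db.query(
    `SELECT name
       FROM project_templates
      WHERE category = $1
      ORDER BY times_reused DESC
      LIMIT 5`,
    [category]
  );
  await db.end();
  // "Learning from 500,000+ applications" reduces to ranking rows by a
  // reuse counter that developers increment by hand.
  return rows.map((r) => r.name);
}
```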
"Natural language understanding converts requirements to code specifications automatically"
No NLU implementation found. User input is processed through regex pattern matching and keyword extraction. Complex requirements trigger escalation to human "solution architects" via Slack notification. The output is a structured JSON form, not code specifications.
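The shape of that pipeline is roughly the following; the patterns, stopword list, and field names are illustrative assumptions rather than the audited code.

```typescript
// Hypothetical sketch of the "NLU" pipeline described above: regex matching
// plus keyword extraction emitting a structured JSON form. Patterns, the
// stopword list, and field names are illustrative assumptions.

const PATTERNS: Record<string, RegExp> = {
  needsAuth: /\b(log ?in|sign ?up|account|password)\b/i,
  needsPayments: /\b(payment|checkout|subscription|billing)\b/i,
  needsAdmin: /\b(admin|dashboard|back ?office)\b/i,
};

interface RequirementForm {
  features: string[];
  keywords: string[];
  escalate: boolean; // true => a human "solution architect" is pinged on Slack
}

function parseRequirement(input: string): RequirementForm {
  const features = Object.entries(PATTERNS)
    .filter(([, re]) => re.test(input))
    .map(([name]) => name);

  // "Keyword extraction" is a stopword-filtered word list, nothing more.
  const STOPWORDS = new Set(["the", "and", "with", "need", "want", "for"]);
  const keywords = input
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 2 && !STOPWORDS.has(w));

  // Anything the patterns cannot classify is routed to a human architect.
  return { features, keywords, escalate: features.length === 0 };
}

console.log(parseRequirement("I need a shop with checkout and an admin dashboard"));
```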
"AI delivers complete applications in 2 weeks vs 6 months traditional development"
Fast delivery is real but achieved through pre-built templates and parallel human development teams, not AI. Admin dashboard shows project metrics tracking team assignments and delivery times. Speed comes from reusable component library and parallel developer assignment.
"Continuous AI improvement - system learns from every project to improve future outputs"
No learning system exists. "Improvement" is developers manually adding new templates to the library. Found internal Jira board for "template requests" where developers propose new reusable components. No automated learning, feedback loops, or model retraining.
"AI-powered quality assurance automatically detects and fixes bugs"
No AI-powered QA found. Quality assurance is performed by human QA team using standard testing tools. Found internal QA assignment system similar to developer task queue. Bug fixes are manually coded by developers.
"Intelligent project estimation using historical data analysis"
Estimation exists but uses simple lookup tables, not AI. The system matches project type to historical averages stored in a config file; senior project managers manually adjust estimates based on complexity.
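In outline, the estimator looks like the sketch below; the figures and key names are illustrative, not values from the audited config file.

```typescript
// Minimal sketch of the lookup-table estimation described above. The
// averages and keys are illustrative assumptions, not audited values.

const HISTORICAL_AVERAGES: Record<string, { days: number; devs: number }> = {
  "web-app": { days: 10, devs: 3 },
  "mobile-app": { days: 14, devs: 4 },
  "internal-tool": { days: 7, devs: 2 },
};

function estimate(projectType: string, complexityMultiplier = 1.0) {
  const base = HISTORICAL_AVERAGES[projectType];
  if (!base) return null; // unknown types go straight to a project manager
  // The multiplier is applied manually by senior PMs, not derived by a model.
  return { days: Math.ceil(base.days * complexityMultiplier), devs: base.devs };
}

console.log(estimate("web-app", 1.5)); // { days: 15, devs: 3 }
```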
"Enterprise-grade security with AI-powered threat detection"
No AI-powered security found. Security consists of standard AWS WAF rules and basic rate limiting. No threat detection, anomaly detection, or security AI implementation.
"Beautiful, customizable user interface for client communication"
The chat interface is well designed and functional, and the ARIA conversation UI is polished. This is genuine: the chatbot frontend works well; it simply has no AI behind it.
"Integrations with popular business tools (Slack, Jira, GitHub)"
Integrations are real and functional. Ironic finding: Slack and Jira integrations are primarily used for internal developer coordination, not client-facing features. GitHub integration works for code delivery.
AI & Machine Learning Assessment
AI Libraries & Tools
- A generative AI API used only for generating marketing copy and email templates in the admin panel; NOT used for code generation or the ARIA chatbot. Rate-limited to 50 calls/day total across the entire platform.
- An ML package listed in package.json with zero imports anywhere in the source code; it appears to be installed for due-diligence optics.
- A basic NLP library used only to extract nouns from user input to populate search tags; not used for understanding or code generation.
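The 50-calls/day ceiling is worth dwelling on: it is consistent with generating a handful of marketing blurbs, not with producing 80% of production code. A minimal sketch of such a platform-wide daily cap (an in-memory counter shown for brevity; the audited implementation was not captured here):

```typescript
// Illustrative sketch of a platform-wide daily quota like the 50-calls/day
// cap found in the audit. A real deployment would persist the counter
// (e.g. in PostgreSQL or Redis) rather than hold it in memory.
let callsToday = 0;
let windowDay = new Date().toDateString();

function underDailyQuota(limit = 50): boolean {
  const today = new Date().toDateString();
  if (today !== windowDay) {
    windowDay = today; // new day: reset the counter
    callsToday = 0;
  }
  return callsToday++ < limit;
}

// Hypothetical stub standing in for the real external API call.
function callExternalAiApi(projectName: string): string {
  return `Generated blurb for ${projectName}`;
}

// Every outbound AI call is gated behind the quota check.
function generateMarketingCopy(projectName: string): string {
  if (!underDailyQuota()) return `Introducing ${projectName}.`; // static fallback
  return callExternalAiApi(projectName);
}
```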
Architecture Overview
Technology Stack
Languages: TypeScript (evidenced by src/aria/DialogueEngine.ts, src/tasks/TaskQueue.ts, DeveloperAssignment.tsx)
Frameworks: React on the frontend (DeveloperAssignment.tsx)
Databases: PostgreSQL (task queue and template library)
External Services
Slack
CRITICAL FINDING: Primary communication channel for internal developer coordination. Tasks are escalated to human architects; work assignments are sent to developer channels.
Data: Task details, developer assignments, escalation alerts, project updates. This is core to the human workflow that powers the "AI".
Jira
CRITICAL FINDING: Internal task tracking for the developer team. Each "AI-generated" project becomes a Jira epic with human-assigned tasks.
Data: Project requirements, task breakdowns, developer assignments, time tracking.
Third-party generative AI API
Used only for generating marketing descriptions. NOT used for the ARIA chatbot or code generation, despite marketing claims.
Data: Project names and categories for description generation. 50 calls/day limit.
GitHub
Repository creation and code delivery. Developers push code here; clients receive access.
Data: Source code repositories, deployment artifacts.
Payment processor
Payment processing for project fees.
Data: Payment information, subscription data.
Code Quality & Health
Complexity: Concerning
Average complexity of 7.8 is slightly above the industry standard (5-7 for well-maintained codebases), indicating moderate maintainability challenges. The primary issue is the extremely high complexity of DialogueEngine (45), which reflects a massive decision-tree implementation: hard-coded conversation branching logic that should be replaced with a proper state machine or an LLM-based approach (a minimal sketch follows the list below).
- DialogueEngine complexity of 45 makes the chatbot nearly impossible to maintain or extend
- Adding new conversation paths requires modifying deeply nested conditionals
- High bug risk in conversation logic due to complex state management
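As one possible shape for that refactor, transitions become data instead of nested conditionals; the state names, prompts, and answers below are illustrative assumptions, not the vendor's dialogue content.

```typescript
// Sketch of a table-driven state machine replacing nested if/else branching.
// State names, prompts, and transitions are illustrative assumptions.

type StateId = "askType" | "askAuth" | "done";

interface DialogueState {
  prompt: string;
  transitions: Record<string, StateId>; // matched answer -> next state
}

const MACHINE: Record<StateId, DialogueState> = {
  askType: {
    prompt: "What are we building?",
    transitions: { "web app": "askAuth" },
  },
  askAuth: {
    prompt: "Do you need user accounts?",
    transitions: { yes: "done", no: "done" },
  },
  done: { prompt: "Thanks - queuing your project.", transitions: {} },
};

// Adding a conversation path becomes a data change, not another nested branch.
function step(current: StateId, answer: string): StateId {
  return MACHINE[current].transitions[answer.toLowerCase()] ?? current;
}

console.log(step("askType", "Web App")); // "askAuth"
```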
Technical Debt Analysis
High complexity is concentrated in the "AI" components, which are actually massive decision trees; code quality issues stem from trying to simulate AI behavior with traditional programming.
- DialogueEngine and TaskQueue are overly complex because they simulate AI behavior with hard-coded logic; both are extremely difficult to maintain.
- DeveloperAssignment.tsx, at 680 lines, handles too many responsibilities.
Test Coverage
Below Average (58%)
- DialogueEngine has only 35% coverage - critical conversation paths are untested
- TaskQueue is at 52% - core workflow logic is insufficiently tested
- Admin panels are largely untested (20%)
Security & Compliance
Security Assessment
Two critical vulnerabilities expose internal operations: the admin API leaks the developer workflow, and a Slack webhook embedded in the client bundle could allow external manipulation of internal channels.
- Remove Slack webhook from client bundle immediately (CRITICAL)
- Implement authorization on admin API endpoints (CRITICAL)
- Audit all internal API endpoints for proper access control
Critical Vulnerabilities
Insecure Direct Object Reference in admin API
Impact: Any authenticated user can view internal developer assignments, task details, and workflow information. Exposes the human-powered nature of the platform.
→ Implement proper RBAC on admin API endpoints immediately
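A sketch of the missing guard, shown as Express-style middleware; the route, role model, and req.user shape are assumptions for illustration, not the platform's actual API:

```typescript
import express from "express";

const app = express();

// Hypothetical role check; without it, any authenticated user can walk
// /admin/assignments/:id by incrementing the id - the IDOR flagged above.
function requireRole(role: string) {
  return (req: any, res: any, next: () => void) => {
    if (req.user?.role !== role) {
      return res.status(403).json({ error: "forbidden" });
    }
    next();
  };
}

app.get("/admin/assignments/:id", requireRole("admin"), (req, res) => {
  // Assignment lookup elided; only the authorization pattern matters here.
  res.json({ id: req.params.id });
});
```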
Slack webhook URL exposed in client bundle
Impact: Attackers can send arbitrary messages to internal Slack channels, potentially disrupting operations or extracting information about human workflow.
→ Move Slack integration to server-side only. Remove webhook from client bundle.
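A minimal sketch of that server-side relay, assuming a Node/Express backend; the endpoint name and payload validation are illustrative:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// The webhook URL lives only in a server-side environment variable and is
// never shipped to the browser.
app.post("/api/notify", async (req, res) => {
  const webhook = process.env.SLACK_WEBHOOK_URL;
  if (!webhook) return res.status(500).json({ error: "webhook not configured" });

  // Bound and whitelist what clients may relay before it reaches Slack.
  const text = String(req.body?.message ?? "").slice(0, 500);
  await fetch(webhook, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  res.status(204).end();
});
```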
License Compliance
Delivery & Team
Team Metrics
CI/CD Status
CI/CD Quality Assessment
Score: 72/100
Standard CI/CD setup. Notable finding: separate pipelines exist for the client-facing app and the internal admin tools, and the admin pipeline has less rigorous testing.
Strengths
- ✓ Automated testing and deployment
- ✓ Separate staging environment
- ✓ PR reviews required
Gaps
- ✗ Admin tool pipeline has minimal testing
- ✗ No security scanning
- ✗ Internal tools deployed with less oversight