Building an AI Agent from Scratch: My 3-Month Warehouse Automation Journey
Before last Singles' Day, my warehouse nearly collapsed under a flood of returns. I gritted my teeth, taught myself how to build AI agents, and spent 3 months building an automated decision system that handles everything from return sorting to inventory alerts. Today, I'll share the pitfalls I hit and how SMEs can build an AI agent system from scratch.

Last year, on the eve of Singles' Day, my warehouse was buried under a mountain of return packages. Three temp workers were frantically unpacking, inspecting, and sorting, but they couldn't keep up. On the security monitor, I watched one of them run to the wrong zone and toss a down jacket into the scrap pile, even though its only defect was a missing tag. That night, I did the math: return delays had caused a 40% spike in customer complaints and cost nearly 20,000 RMB in refunds. I thought, 'No more. I need automation.'
TL;DR: I spent 3 months teaching myself how to build AI agents and built an automated return sorting system from scratch. I hit pitfalls like dirty data, dumb models, and employee resistance, but finally got it running with a rule engine + lightweight model. Today, I'll share how to make AI work with minimal cost.
When Returns Piled Up, I Decided to Let AI Handle It
Three days after Singles' Day, returns peaked: 800 packages a day. Each needed manual inspection—check condition, decide whether to restock or scrap. I stood in the sorting area and watched Old Zhang toss a barely-worn sweater into the donation bin. I nearly fainted. That night, staring at a messy Excel sheet, I realized the problem wasn't people—it was the process.
Instead of hiring more people, let AI learn to judge. I decided to build an AI Agent for return sorting.
Step 1: Data Cleaning Nearly Made Me Quit
First pitfall: data. I dug out a year's return records—fields missing, categories messy, notes full of vague words like 'customer said' and 'maybe.' I spent a week with three interns cleaning and standardizing 3,000 records.
| Raw Data | Cleaned Data |
|---|---|
| Customer said shirt too small | Size too small, reason: size |
| Maybe has stain | Has stain, reason: quality |
| Probably didn't like it | Customer preference, reason: no reason |
I almost gave up, because cleaning the data was more work than sorting the returns by hand. But once I pushed through, model training became smooth. Anyone who's been there knows: dirty data makes AI useless.
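To make the table above concrete, here is a minimal sketch of that kind of normalization, assuming a simple keyword lookup. The names `HEDGES`, `KEYWORD_MAP`, and `normalize_note` are illustrative rather than taken from the original project, and the mapping covers only the three example rows.

```python
import re

# Hedging phrases that carry no information (illustrative list).
HEDGES = ["customer said", "maybe", "probably"]

# Keyword -> (standardized issue, reason code). Hypothetical mapping that
# covers only the three example rows in the table.
KEYWORD_MAP = [
    ("too small", ("Size too small", "size")),
    ("stain", ("Has stain", "quality")),
    ("didn't like", ("Customer preference", "no reason")),
]


def normalize_note(raw: str) -> dict:
    """Strip hedging words, then map the note to a standard issue and reason."""
    text = raw.lower().strip()
    for hedge in HEDGES:
        text = re.sub(r"\b" + re.escape(hedge) + r"\b", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    for keyword, (issue, reason) in KEYWORD_MAP:
        if keyword in text:
            return {"issue": issue, "reason": reason, "needs_review": False}
    # Anything unrecognized is flagged for a human instead of guessed at.
    return {"issue": text, "reason": "unknown", "needs_review": True}
```

Flagging unrecognized notes for review, rather than forcing them into a category, keeps the cleaned set trustworthy even when the keyword list is incomplete.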
Model Selection: Don't Be Fooled by Big Models
Step two: model selection. At first, I jumped at using a large language model (LLM) for full automation. A week later, it fell apart: the model classified 'minor scratch' as 'severe damage,' so restockable items were being written off. According to Gartner's supply chain tech report[1], many companies overestimate AI capabilities.
I went pragmatic: rule engine + lightweight classification model. For 80% of common cases (size issues, no-reason returns), use predefined rules. For the remaining 20% fuzzy cases (stain severity, missing accessories), use a fine-tuned small model.
Rule Engine: Simple but Effective
I encoded the decision tree in a simple Python rule engine. For example:
- If return reason = 'size too small' and item is new → restock
- If return reason = 'stain' and stain area < 5% → clean then restock
- If return reason = 'missing accessory' → manual review
The engine ran for a month at 85% accuracy and took about 10 seconds per item, versus roughly 3 minutes by hand.
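The three example rules above can be sketched as a small Python function. This is an illustration under stated assumptions, not the production engine: the `ReturnItem` fields and the `apply_rules` name are hypothetical, and a real rule set would be much longer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReturnItem:
    reason: str                             # standardized return reason
    is_new: bool = False                    # item unused, tags attached
    stain_area_pct: Optional[float] = None  # stained share of surface, if measured


def apply_rules(item: ReturnItem) -> Optional[str]:
    """Return a decision, or None when no rule covers the item."""
    if item.reason == "size too small" and item.is_new:
        return "restock"
    if (
        item.reason == "stain"
        and item.stain_area_pct is not None
        and item.stain_area_pct < 5
    ):
        return "clean then restock"
    if item.reason == "missing accessory":
        return "manual review"
    return None  # not covered: hand off to the model or a human
```

Returning `None` for uncovered items, instead of a default decision, makes it explicit which returns still need the model or a person.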
Lightweight Model: Handling Fuzzy Cases
For cases the rules didn't cover, I fine-tuned an open-source BERT model on only 500 records. Surprisingly, it reached 92% accuracy at distinguishing 'minor wear' from 'severe wear.' Comparison:
| Method | Accuracy | Time per item | Cost |
|---|---|---|---|
| Pure manual | 95% | 3 min | High |
| Rule engine | 85% | 10 sec | Very low |
| Rules + model | 92% | 15 sec | Low |
Final solution: rule engine handles 80% simple returns, model handles 20% complex returns, humans only do final review. This balances accuracy and cost.
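One way to wire the three layers together is sketched below. The confidence cutoff is an assumption added for illustration (the post does not say how items reach human review), and `rule_decision`, `route`, and `fake_model` are hypothetical stand-ins for the real rule engine and the fine-tuned classifier.

```python
from typing import Callable, Optional, Tuple

# Hypothetical confidence cutoff below which the model's answer is not trusted.
CONFIDENCE_THRESHOLD = 0.8


def rule_decision(item: dict) -> Optional[str]:
    """Stand-in rule engine: covers only one simple, clear-cut case."""
    if item.get("reason") == "size too small" and item.get("is_new"):
        return "restock"
    return None


def route(item: dict, model: Callable[[dict], Tuple[str, float]]) -> dict:
    """Try rules first, then the model; low-confidence results go to a human."""
    decision = rule_decision(item)
    if decision is not None:
        return {"decision": decision, "source": "rules"}
    label, confidence = model(item)
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"decision": label, "source": "model"}
    return {"decision": "manual review", "source": "human"}


def fake_model(item: dict) -> Tuple[str, float]:
    """Placeholder for the fine-tuned classifier; returns (label, confidence)."""
    if "minor" in item.get("note", ""):
        return "restock", 0.93
    return "scrap", 0.55
```

Recording the `source` of each decision also makes it easy to check later whether the 80/20 split between rules and model still holds.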
Employee Resistance: Harder Than Tech
On launch day, Old Zhang balked on the spot: 'Can a computer judge better than my ten years of experience?' He refused to use the system and manually overrode its results. I argued with him, but later realized he wasn't lazy. He was afraid of being replaced.
I spent two weeks doing three things:
- Held all-hands training, using real cases to prove AI accuracy
- Set up a 'human-machine review' process: AI suggestions must be confirmed by team leads
- Shared the time savings as higher pay: daily throughput rose from 200 to 300 items per person, and the extra items earned piece-rate pay
A month later, Old Zhang became the system's biggest advocate. He found that the AI took over about 80% of his repetitive work, leaving him only the cases that truly need judgment.
Continuous Improvement: AI Needs Constant Feeding
After three months, accuracy dropped from 92% to 88%. Investigation revealed a shift in return categories—winter arrived, down jacket returns increased, and the model lacked training on down jacket features.
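A per-category breakdown is one simple way to spot this kind of drift. The sketch below assumes each reviewed return is logged as a `(category, predicted, actual)` tuple; that log format and the function name `accuracy_by_category` are assumptions for illustration.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Compute accuracy per product category.

    records: iterable of (category, predicted, actual) tuples.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, predicted, actual in records:
        totals[category] += 1
        if predicted == actual:
            correct[category] += 1
    return {cat: correct[cat] / totals[cat] for cat in totals}


# Made-up sample: sweaters are fine, down jackets are where the errors are.
sample = [
    ("sweater", "restock", "restock"),
    ("sweater", "restock", "restock"),
    ("down jacket", "scrap", "restock"),
    ("down jacket", "restock", "restock"),
]
print(accuracy_by_category(sample))  # {'sweater': 1.0, 'down jacket': 0.5}
```

An overall accuracy number hides this pattern; splitting by category shows which product type needs new training data.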
I built a continuous feedback loop:
- Weekly export of misclassifications, manually annotated, added to training set
- Monthly model fine-tuning
- Quarterly rule engine updates (e.g., new rule for down jacket 'feather leakage')
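The weekly step of the loop above can be sketched as follows. The log fields (`text`, `predicted`, `human_label`) and both function names are hypothetical, and the fine-tuning run itself is left out because it depends on the training setup.

```python
def collect_misclassified(log):
    """Keep only reviewed records where the human label disagrees with the model."""
    return [
        {"text": r["text"], "label": r["human_label"]}
        for r in log
        if r.get("human_label") is not None and r["human_label"] != r["predicted"]
    ]


def merge_into_training_set(training_set, new_examples):
    """Append new examples, skipping texts that are already in the training set."""
    seen = {ex["text"] for ex in training_set}
    merged = list(training_set)
    for ex in new_examples:
        if ex["text"] not in seen:
            merged.append(ex)
            seen.add(ex["text"])
    return merged


# Made-up weekly log: one error, one correct prediction, one unreviewed item.
weekly_log = [
    {"text": "down jacket leaking feathers", "predicted": "restock", "human_label": "scrap"},
    {"text": "sweater lightly worn", "predicted": "restock", "human_label": "restock"},
    {"text": "shirt missing button", "predicted": "scrap", "human_label": None},
]
training_set = [{"text": "sweater heavy pilling", "label": "scrap"}]
training_set = merge_into_training_set(training_set, collect_misclassified(weekly_log))
```

Only human-corrected records are added, so the training set grows with exactly the cases the model got wrong.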
Per McKinsey's operations insights[2], continuous learning is key for AI deployment. Now the system has run stably for six months, reducing return processing time by 70% and customer complaints by 50%.
Summary
Looking back, the hardest part of building an AI agent from scratch wasn't the tech. It was deciding what to let AI do and what to leave to humans. My takeaways:
- Don't be greedy: Solve one pain point first (like return sorting), then expand
- Data first: Spend 70% of time cleaning data; model training is the easy part[3]
- Human-AI collaboration: Let AI handle the 80% of work that is repetitive and humans the 20% that needs judgment; this split was the most efficient for us
- Iterate constantly: AI isn't a one-time project; keep feeding it new data
If you're considering building an AI agent, don't be intimidated by big companies' full automation. Start small, use rules + simple models, and you can get it running in three months. Trust me, when you watch the system handle a day's returns while you just sip tea and review, the feeling is better than a Singles' Day blowout.
References
- [1] Gartner Supply Chain Technology Report. Referenced for the trend of overestimating AI capabilities.
- [2] McKinsey Operations Insights: Continuous Learning in AI. Referenced for the importance of continuous learning in AI deployment.
- [3] Fortune Business Insights WMS Market Report. Referenced for the share of time spent on data preparation in AI projects.