Building an AI Agent from Scratch: A Warehouse Veteran's Journey

One sultry autumn afternoon last year, I was staring at the return area in frustration—over twenty returns piled up, each needing manual entry into the system, inspection for damage, and re-shelving. I hadn't gotten anything done all afternoon. Suddenly, my phone buzzed with a push notification—a demo video of an AI Agent: type "process returns" into a chat box, and the system automatically pulled up the order, generated a quality inspection form, and scheduled re-shelving. The whole process took less than five minutes.
I was stunned. This was exactly the stuff I worry about every day in the warehouse: picking path optimization, inventory alerts, order exception handling. If an AI Agent could handle these for me, why should I work like a dog every day?
But here's the thing: I'm just Old Wang, who's been in the warehouse business for ten years, not a programmer. I don't even know Python. How could I build an AI Agent from scratch?
TL;DR: Last year, a demo video of an AI Agent stunned me, so this old warehouse guy who doesn't even know Python set out with his team to build an AI Agent system from scratch. Below are the pitfalls we stepped in, the real lessons we learned, and practical tips you can actually apply.
Step One: Don't Be Intimidated by Tech Jargon—First Figure Out What You Want
Honestly, at first I couldn't even explain what an AI Agent was. I searched online and found terms like "LLM", "RAG", "tool calling"—my head was spinning.
Then I realized: To me, an AI Agent is just a little assistant that understands what I say and helps me get work done. I don't need it to write poetry or draw pictures; I need it to check inventory, create documents, and send alerts.
So the first step wasn't learning technology; it was listing requirements. I gathered the warehouse supervisor, pickers, and customer service reps and asked them: which repetitive tasks do you hate the most, and which take up the most time?
| Role | Most Annoying Task | Frequency |
|---|---|---|
| Picker | Finding locations (especially for new items) | Dozens of times daily |
| Customer Service | Checking order status | Hundreds of times daily |
| Warehouse Supervisor | Inventory reconciliation | Weekly |
| Quality Inspector | Return/Exchange entry | Dozens of times daily |
We prioritized these requirements: check order status > find location > handle returns > assist with inventory reconciliation.
Then I took this list to the development team. They laughed and said, "Old Wang, this isn't an AI Agent—it's just a 'natural language interface.'" I said, "Call it whatever you want, as long as it works!"
Choosing the Tech Stack: From "I Want Everything" to "Good Enough"
After defining requirements, I started researching tech options. There were so many solutions—from LangChain to AutoGPT, from OpenAI API to local models.
My first mistake was wanting "everything"—low cost, high accuracy, and local deployment for data security. After two weeks of tinkering, I hadn't even built a working demo.
Then an AI-savvy friend advised me: "First, get one scenario working in the simplest way possible, even if that means running a 7B model locally with Ollama, as long as it can call APIs."
So we chose a compromise:
- Model: Start with the cloud-based GPT-4o API (fast, good results), and consider local deployment later
- Framework: LangChain (active community, good documentation)
- Tools: Connect to our existing WMS APIs (specifically the Flash Warehouse APIs I discussed last year in my article "Flash Warehouse MCP Server Officially Launched"[1])
The First Agent: Order Query Assistant
We spent three days building a simple Agent with LangChain. Its workflow was:
- User inputs "Check status of order ORD-2024-0001"
- Agent calls the WMS order query API
- Agent formats the results and returns them to the user
The first time it worked, I shouted at the screen: "Check all orders shipped to Shanghai yesterday." The system paused for two seconds, then spit out a table. At that moment, I felt I was seeing the future.
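The three-step workflow above can be sketched in plain Python. This is only an illustration under stated assumptions, not the LangChain code the team wrote: `FAKE_WMS_ORDERS`, `query_order`, and the `status` and `destination` fields are hypothetical stand-ins for the real WMS order query API, and a regular expression stands in for the model's job of pulling the order number out of the sentence.

```python
import re
from typing import Callable, Optional

# Hypothetical in-memory stand-in for the WMS order query API.
FAKE_WMS_ORDERS = {
    "ORD-2024-0001": {"status": "shipped", "destination": "Shanghai"},
}

# Order numbers are assumed to look like the example in the text.
ORDER_ID_PATTERN = re.compile(r"ORD-\d{4}-\d{4}")


def query_order(order_id: str) -> Optional[dict]:
    """Stand-in for the real WMS call; returns None for unknown orders."""
    return FAKE_WMS_ORDERS.get(order_id)


def handle_order_query(
    user_input: str,
    lookup: Callable[[str], Optional[dict]] = query_order,
) -> str:
    # Step 1: find the order number in the user's sentence.
    match = ORDER_ID_PATTERN.search(user_input)
    if match is None:
        return "Please provide an order number such as ORD-2024-0001."
    order_id = match.group(0)

    # Step 2: call the (stand-in) order query API.
    order = lookup(order_id)
    if order is None:
        return f"Order {order_id} was not found."

    # Step 3: format the result for the user.
    return f"Order {order_id}: {order['status']}, destination {order['destination']}"


print(handle_order_query("Check status of order ORD-2024-0001"))
```

Passing `lookup` as a parameter keeps the sketch testable without a live WMS; a real deployment would swap in the actual API client.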
Step Two: From Single Point to Process—Agent Starts Really "Working"
After the order query assistant succeeded, our morale soared. But I soon realized that single-point functions are just appetizers—the real value lies in chaining multiple steps into an automated workflow.
Return Processing Agent: From 5 Steps to 1
Return processing was our warehouse's biggest headache. The old process was:
- Customer service receives return notification, looks up order in system
- Prints return form, hands to quality inspector
- Inspector checks items, fills in inspection results
- Warehouse supervisor decides whether to restock or discard
- Operator updates inventory in system
Every step was manual, the whole process averaged 20 minutes per return, and errors were common. For example, an inspector entering the wrong SKU would cause inventory discrepancies.
We designed a return processing Agent that turned the process into:
- Customer service enters "Process return RET-2024-001"
- Agent automatically looks up order, generates inspection task
- Inspector scans item, fills in results; Agent automatically updates inventory and generates restock or discard order
- Entire process drops from 20 minutes to 3 minutes
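A minimal sketch of that streamlined flow, under loud assumptions: `RETURNS` and `INVENTORY` are hypothetical in-memory stand-ins for the WMS, the quantities are made up, and the restock-or-discard decision is reduced to a simple pass/fail inspection result. The real rules are not described here, so treat this as one possible shape rather than the actual implementation.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-ins for WMS data; the values are illustrative only.
RETURNS = {"RET-2024-001": {"sku": "SKU-001", "quantity": 2}}
INVENTORY = {"SKU-001": 120}


@dataclass
class ReturnCase:
    return_id: str
    sku: str
    quantity: int
    log: List[str] = field(default_factory=list)


def open_return(return_id: str) -> ReturnCase:
    """Look up the return and create the inspection task."""
    record = RETURNS[return_id]  # raises KeyError for an unknown return
    case = ReturnCase(return_id, record["sku"], record["quantity"])
    case.log.append("inspection task created")
    return case


def record_inspection(case: ReturnCase, passed: bool) -> str:
    """Apply the inspection result: restock good items, discard the rest."""
    if passed:
        INVENTORY[case.sku] = INVENTORY.get(case.sku, 0) + case.quantity
        decision = "restock"
    else:
        decision = "discard"
    case.log.append(f"{decision} order generated")
    return decision


case = open_return("RET-2024-001")
decision = record_inspection(case, passed=True)
print(decision, INVENTORY["SKU-001"], case.log)
```

Because the SKU comes from the return record instead of being typed in by the inspector, the wrong-SKU mistake from the manual process has no place to occur in this sketch.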
Comparison:
| Metric | Manual Processing | AI Agent Processing | Change |
|---|---|---|---|
| Average processing time per return | 20 minutes | 3 minutes | 85% reduction |
| Error rate | 12% | 2% | 83% reduction |
| Customer service interventions | Every time | Only for exceptions | 80% reduction |
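The percentages in the last column for processing time and error rate are relative reductions, which can be checked with a few lines. The helper name `relative_reduction` is just for illustration.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage by which a value shrank relative to its starting point."""
    return (before - after) / before * 100


time_saved = round(relative_reduction(20, 3))   # minutes per return
errors_cut = round(relative_reduction(12, 2))   # error rate in percent
print(time_saved, errors_cut)
```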
When these numbers came out, even the warehouse supervisor—who was most opposed to AI—was convinced. He said, "Old Wang, if this thing had come earlier, we wouldn't have had that inventory discrepancy from last year's returns."
Picking Path Optimization: The "Duet" of Agent and WMS
Picking path optimization has always been a tough nut to crack. A traditional WMS assigns picking locations by fixed rules (like FIFO), but real-world scenarios are more complex: crowded shelves, varying item sizes, urgent order insertions.
We had the Agent analyze order data in real time, combined with the warehouse layout, to dynamically generate optimized paths. Specifically:
- Agent calls WMS real-time inventory API to get item locations
- Considers current picker's position and workload
- Outputs a sequenced picking list
The results were clear: pickers' average walking distance decreased by 30%, and picking efficiency increased by 25%.
But there was a trap: the Agent's decisions depend on accurate inventory data. If the WMS inventory data is wrong, the Agent's path will be wrong. So data quality is the lifeline of AI Agents—something I've emphasized in previous articles[2].
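To make the sequencing step concrete, here is a small sketch using a nearest-neighbor heuristic. This is a swapped-in, simple technique chosen for illustration, not necessarily what the Agent does, and it does not guarantee the shortest route. The grid coordinates and the Manhattan-distance aisle model are assumptions, and picker workload is left out.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    """Walking distance on a grid of aisles (an assumed layout model)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def sequence_picks(start: Point, locations: Dict[str, Point]) -> List[str]:
    """Order SKUs by repeatedly walking to the closest remaining location."""
    remaining = dict(locations)
    position = start
    route: List[str] = []
    while remaining:
        # Ties are broken by SKU name so the result is deterministic.
        nearest = min(
            remaining,
            key=lambda sku: (manhattan(position, remaining[sku]), sku),
        )
        route.append(nearest)
        position = remaining.pop(nearest)
    return route


# Hypothetical locations, as they might come back from the inventory API.
locations = {"SKU-001": (5, 5), "SKU-002": (1, 0), "SKU-003": (2, 3)}
route = sequence_picks((0, 0), locations)
print(route)
```

The sketch also shows why data quality matters: if one coordinate in `locations` is wrong, every later stop in the route is computed from a wrong position.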
Step Three: Teach the Agent to "Talk"—Evolution of Natural Language Interaction
The Agent could do the work, but the interaction was primitive: either fixed command templates or simple keyword matching. What we wanted was to command the Agent in natural language, like chatting with a colleague.
From Keywords to Intent Recognition
Initially, we made the Agent understand commands by writing a bunch of regex patterns. For example, if the user said "check order," the Agent matched keywords like "check" and "order." But problems arose:
- "Help me see if order 123 has arrived" → match failed
- "Did yesterday's shipment go out?" → match failed
- "Check inventory" → match succeeded, but user actually wanted to check orders
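A tiny reconstruction of why keyword rules break down. The patterns below are hypothetical examples, not the team's actual regexes; they only show that phrasing outside the expected keywords falls through.

```python
import re
from typing import Optional

# Hypothetical keyword rules: each pattern maps to one intent.
KEYWORD_RULES = [
    (re.compile(r"\bcheck\b.*\border\b", re.IGNORECASE), "query_order_status"),
    (re.compile(r"\bcheck\b.*\binventory\b", re.IGNORECASE), "query_inventory"),
]


def match_intent(text: str) -> Optional[str]:
    """Return the first intent whose pattern matches, or None."""
    for pattern, intent in KEYWORD_RULES:
        if pattern.search(text):
            return intent
    return None


for text in [
    "check order 123",
    "Help me see if order 123 has arrived",
    "Did yesterday's shipment go out?",
]:
    print(f"{text!r} -> {match_intent(text)}")
```

Only the first sentence matches. The other two express the same intent without the word "check", so the rules return `None`, and adding more patterns only postpones the next miss.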
Later, we switched to LLM-based intent recognition. We fed the user input to a large language model, letting it determine what the user wanted to do, then call the corresponding tool.
For example:
- User: "Help me see if order 123 has arrived" → Model identifies as "query order status" → calls order query API
- User: "Did yesterday's shipment go out?" → Model identifies as "query order status" → calls order query API (with date filter)
Intent recognition accuracy jumped from 60% to over 95%.
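The routing idea can be sketched without committing to any particular model API. Everything named here is an assumption for illustration: the intent labels, the prompt wording, and especially `fake_llm`, which is a crude stub standing in for the real model call so the example runs offline.

```python
from typing import Callable, Dict

INTENTS = ("query_order_status", "query_inventory", "unknown")


def build_prompt(user_input: str) -> str:
    """Ask the model to pick exactly one label for the request."""
    return (
        "Classify the warehouse request into exactly one label from "
        f"{', '.join(INTENTS)}.\nRequest: {user_input}\nLabel:"
    )


def route(
    user_input: str,
    classify: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
) -> str:
    """Classify the request, then call the tool registered for that label."""
    label = classify(build_prompt(user_input)).strip()
    if label not in tools:
        return "Sorry, I did not understand that request."
    return tools[label](user_input)


def fake_llm(prompt: str) -> str:
    # Stub only: a real deployment would send the prompt to the model.
    request = prompt.split("Request: ", 1)[1].lower()
    if "order" in request or "shipment" in request:
        return "query_order_status"
    return "unknown"


tools = {"query_order_status": lambda text: "calling order query API"}
print(route("Did yesterday's shipment go out?", fake_llm, tools))
```

Checking the label against the registered tools before calling anything is a cheap guard against the model returning something outside the expected set.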
Multi-turn Dialogue: The Agent Starts "Understanding" You
More advanced was multi-turn dialogue. For example:
- User: "Check inventory"
- Agent: "Which item would you like to check?"
- User: "SKU-001"
- Agent: "Current inventory is 120 units, distributed on A1 shelf (80 units) and B2 shelf (40 units). Do you need to arrange replenishment?"
This sounds simple, but implementing it requires maintaining conversation context—the Agent must remember what was said in the previous turn and what the user's goal is. We used LangChain's ConversationBufferMemory, which worked okay.
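To show what "maintaining conversation context" means at its simplest, here is a plain-Python sketch that deliberately does not use LangChain. The class `InventoryDialogue` and the `SHELF_STOCK` data are hypothetical; the only state kept between turns is which question the Agent is still waiting to have answered.

```python
from typing import Optional

# Hypothetical shelf data matching the example conversation.
SHELF_STOCK = {"SKU-001": {"A1": 80, "B2": 40}}


class InventoryDialogue:
    """Remembers a pending intent so the next turn can complete it."""

    def __init__(self) -> None:
        self.pending_intent: Optional[str] = None

    def reply(self, user_input: str) -> str:
        text = user_input.strip()
        # If the previous turn asked for an item, treat this turn as the answer.
        if self.pending_intent == "query_inventory":
            self.pending_intent = None
            return self._answer(text)
        if text.lower() == "check inventory":
            self.pending_intent = "query_inventory"
            return "Which item would you like to check?"
        return "Sorry, I did not understand that."

    def _answer(self, sku: str) -> str:
        shelves = SHELF_STOCK.get(sku)
        if shelves is None:
            return f"I could not find {sku}."
        total = sum(shelves.values())
        detail = ", ".join(f"{shelf}: {units}" for shelf, units in shelves.items())
        return f"{sku} has {total} units ({detail})."


bot = InventoryDialogue()
first = bot.reply("Check inventory")
second = bot.reply("SKU-001")
print(first)
print(second)
```

Without `pending_intent`, the second turn ("SKU-001") would arrive with no indication that it answers the earlier question, which is exactly the gap a memory component fills.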
But there were failures. Once a user said, "Put that red cup in the promotion area." The Agent couldn't figure out which SKU "red cup" was because the system didn't have a color attribute. Later we added a color field to product information, and the Agent could match accurately.
Step Four: Agent Evolution—From Reactive to Proactive Alerts
When the Agent could handle daily tasks proficiently, I started thinking: Can we make the Agent proactively detect problems instead of waiting for me to ask?
Anomaly Detection Agent: Detecting Inventory Issues Earlier Than Humans
We built an anomaly detection Agent that scanned system data every 15 minutes, checking for:
- Inventory below safety stock
- Abnormal order backlogs
- Sudden spikes in return rates
- Abnormal picking durations
Once an anomaly was detected, the Agent would send a message via DingTalk bot to the responsible person, along with an analysis report.
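One of these checks, inventory below safety stock, can be sketched as follows. The `StockLevel` fields, the sample figures, and the `notify` callback are assumptions for illustration; the 15-minute scheduling and the actual DingTalk delivery are left out, with `notify` marking where a message sender would plug in.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StockLevel:
    sku: str
    on_hand: int
    safety_stock: int


def find_low_stock(levels: List[StockLevel]) -> List[str]:
    """Return one alert message per SKU below its safety stock."""
    return [
        f"{level.sku} inventory is below safety stock "
        f"({level.on_hand} on hand, safety stock {level.safety_stock})"
        for level in levels
        if level.on_hand < level.safety_stock
    ]


def run_scan(levels: List[StockLevel], notify: Callable[[str], None]) -> int:
    """Run one scan and hand each alert to the notifier; returns alert count."""
    alerts = find_low_stock(levels)
    for message in alerts:
        notify(message)
    return len(alerts)


# Hypothetical snapshot; a list stands in for the chat bot.
sent: List[str] = []
levels = [StockLevel("SKU-001", 120, 60), StockLevel("SKU-005", 50, 100)]
alert_count = run_scan(levels, sent.append)
print(alert_count, sent)
```

The other checks (order backlog, return-rate spikes, picking duration) would follow the same pattern: a pure function that turns a data snapshot into messages, kept separate from scheduling and delivery.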
A real case: A week before last year's Singles' Day, the anomaly detection Agent suddenly alerted: "SKU-005 inventory is below safety stock. Currently only 50 units remain, with daily sales of 30 units. Expected stockout in 2 days."
The warehouse supervisor was shocked—that SKU was a hot seller. If it went out of stock, they would lose at least 50,000 yuan in sales on Singles' Day itself. He immediately contacted procurement for urgent replenishment, and the goods arrived just before the big day.
After this incident, no one questioned the value of the AI Agent anymore.
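The stockout estimate in that alert is simple arithmetic: units on hand divided by daily sales. A minimal sketch, assuming sales stay flat at the current daily rate (the function name is illustrative):

```python
def days_until_stockout(on_hand: int, daily_sales: float) -> float:
    """Days of cover at the current sales rate; infinite if nothing sells."""
    if daily_sales <= 0:
        return float("inf")
    return on_hand / daily_sales


days_left = days_until_stockout(50, 30)
print(round(days_left, 2))
```

With 50 units and 30 sold per day, stock lasts between one and two days, which matches the alert's warning that the SKU would run out within 2 days.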
Predictive Agent: Turning Decisions from "After the Fact" to "Before the Fact"
Going further, we tried using the Agent for predictions. Based on historical sales data, seasonal factors, and promotions, it predicted sales for the coming week and generated replenishment suggestions.
| Prediction Method | Accuracy | Response Time | Suitable Scenarios |
|---|---|---|---|
| Human experience | 60-70% | 1-2 hours | Few SKUs |
| Simple statistical model | 75-85% | 10 minutes | Stable categories |
| AI Agent prediction | 85-95% | Real-time | All categories, especially volatile items |
This predictive Agent improved our inventory turnover rate by 20% and reduced slow-moving inventory by 15%.
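For a feel of how a forecast turns into a replenishment suggestion, here is a deliberately simple baseline: a 7-day moving average scaled by a promotion uplift. This is a stand-in technique for illustration only and is not the Agent's prediction method; the sales history, the `uplift` factor, and the safety stock figure are all made up.

```python
import math
from statistics import mean
from typing import Sequence


def forecast_next_week(daily_sales: Sequence[float], uplift: float = 1.0) -> float:
    """Average of the last 7 days, projected over 7 days, times an uplift."""
    recent = daily_sales[-7:]
    return mean(recent) * 7 * uplift


def replenishment_suggestion(forecast: float, on_hand: int, safety_stock: int) -> int:
    """Units to order so stock covers the forecast plus the safety buffer."""
    return max(0, math.ceil(forecast + safety_stock - on_hand))


# Hypothetical history: about 30 units a day, with a 1.5x promotion expected.
history = [30, 28, 32, 31, 29, 30, 30]
forecast = forecast_next_week(history, uplift=1.5)
suggestion = replenishment_suggestion(forecast, on_hand=50, safety_stock=60)
print(forecast, suggestion)
```

A real predictive Agent would draw on seasonality and promotion calendars rather than a single uplift number, but the final step, converting a demand estimate into an order quantity, looks much like `replenishment_suggestion`.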
Summary
Building an AI Agent system from scratch was, honestly, harder than I imagined, but also more interesting. After this past year, my biggest takeaway is:
An AI Agent isn't here to replace you; it's here to amplify your capabilities. It handles repetitive tasks, freeing you up to focus on things that truly require judgment—like supplier negotiations, customer relationships, and team management.
A few heartfelt lessons from the trenches:
- Start with the smallest scenario: Get one simple Agent running first, then expand gradually. Don't try to do everything at once.
- Data is the foundation: Without clean, accurate data, no matter how powerful the Agent is, it's a castle in the air.
- Involve users: Let the warehouse supervisor, pickers, and customer service reps try it out. Their feedback is the direction for product iteration.
- Be pragmatic in tech selection: Don't chase the latest and greatest. Choose what's sufficient, stable, and easy to maintain.
- Embrace change: AI technology evolves fast. Keep a learning mindset, but don't be consumed by anxiety.
Now, in my warehouse, the AI Agent handles hundreds of queries and dozens of returns a day, and monitors inventory health in real time. And I finally don't have to squat in the return area worrying anymore. I have more time to think about the future of the warehouse and to have dinner with my family.
If you're also considering introducing an AI Agent, don't be afraid. Start with a small scenario. Remember, every complex system starts with a single line of code, a single simple instruction.
References
- [1] Flash Warehouse MCP Server Officially Launched: The First WMS in China Supporting Model Context Protocol. Referenced for the Flash Warehouse MCP Server and WMS API integration.
- [2] Warehouse in a Mess? My Ten Years of Pitfall Experience Tells You What to Do. Referenced for the importance of data quality in warehouse management.