We Gave ChatGPT 50,000 Tickets – Here’s What Happened

AI is transforming how Managed Service Providers (MSPs) manage IT service tickets, but how well does it actually perform at ticket categorization? To put it to the test, we ran an experiment: we fed 50,000 real-world tickets into ChatGPT to see how accurately it could classify them.

What we found was both fascinating and frustrating. 

Why We Did This 

AI is already playing a role in IT service desks—summarizing ticket details, assessing sentiment, and streamlining workflows—with varied results. But one of the most critical (and challenging) tasks for help desk efficiency is ticket classification. Getting this right is key to:

  • Faster, more efficient ticket routing 
  • Automating workflows based on ticket type 
  • Improving reporting and overall service desk performance 

We’ve been using a machine learning model for ticket classification since 2020, but we wanted to see if ChatGPT could do it better. 

The Experiment: 50,000 Tickets vs. ChatGPT 

We processed 50,000 service desk tickets—roughly 20 million words—through ChatGPT and asked it to classify them. The goal? To summarize each ticket with a concise category like “User Onboarding Request.” 

Here’s what happened. 

Problem #1: Too Many Categories 

To start, we let ChatGPT generate its own categories. The result? A staggering number of unique categories for just 50,000 tickets. That’s not useful. To fix this, we gave ChatGPT a predefined list of categories. That helped, but errors still crept in. 
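To make the constraint concrete, here is a minimal Python sketch of how a fixed category list might be embedded in the prompt. The category names and the chat-message shape are illustrative assumptions, not our production setup.

```python
# Sketch: constrain the model by embedding an explicit category list in the
# prompt. Category names below are illustrative, not our actual taxonomy.
ALLOWED_CATEGORIES = [
    "User Onboarding Request",
    "User Offboarding Request",
    "Password Reset",
    "Printer Issue",
    "Other",
]

def build_classification_prompt(ticket_text: str) -> list[dict]:
    """Build a chat-style message list asking for exactly one known category."""
    category_list = "\n".join(f"- {c}" for c in ALLOWED_CATEGORIES)
    system = (
        "You classify IT service desk tickets. "
        "Reply with exactly one category from this list and nothing else:\n"
        f"{category_list}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": ticket_text},
    ]

messages = build_classification_prompt("Please reset the password for jsmith.")
```

In practice you would pass `messages` to whichever chat API you use; the point is that the allowed labels live in the prompt itself rather than in the model's imagination.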

Problem #2: AI Hallucinations 

Even when limited to a fixed list, ChatGPT still invented new categories. This is a well-known issue with generative AI—it “fills in the blanks” even when it shouldn’t. 
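One pragmatic guard is to validate every reply against the allowed list and fall back when the model invents something. The sketch below shows the idea; the fuzzy-match cutoff and the "Other" fallback are assumptions, not tuned production values.

```python
import difflib

# Illustrative category list; hallucination guard works against any fixed set.
ALLOWED_CATEGORIES = [
    "User Onboarding Request",
    "User Offboarding Request",
    "Password Reset",
    "Printer Issue",
    "Other",
]

def sanitize_category(model_output: str, fallback: str = "Other") -> str:
    """Map a model reply onto the allowed list; reject invented categories."""
    label = model_output.strip().strip('."')
    if label in ALLOWED_CATEGORIES:
        return label
    # Tolerate minor wording drift (case, punctuation) with a fuzzy match,
    # but send anything genuinely off-list to the fallback bucket.
    close = difflib.get_close_matches(label, ALLOWED_CATEGORIES, n=1, cutoff=0.8)
    return close[0] if close else fallback
```

Routing rejected replies to a fallback bucket for human review is one way to keep hallucinated categories out of reporting.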

Problem #3: Multi-Request Tickets 

A single ticket often contained multiple requests: 

“Mandy starts Monday. Can you onboard her? Also, Bob left last week—please disable his account. Oh, and can you reset my printer?” 

ChatGPT struggled with these. Forced to pick just one classification, it often ignored other important details. 
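One workaround is to ask the model for a JSON array of categories instead of a single label, then validate each entry before trusting it. This is a hedged sketch with illustrative category names, not our deployed pipeline.

```python
import json

# Illustrative allowed set; a multi-request ticket may legitimately map
# to several of these at once.
ALLOWED_CATEGORIES = {
    "User Onboarding Request",
    "User Offboarding Request",
    "Printer Issue",
}

def parse_multi_labels(model_reply: str) -> list[str]:
    """Parse a JSON array of categories, silently dropping anything off-list."""
    try:
        labels = json.loads(model_reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(labels, list):
        return []
    return [label for label in labels if label in ALLOWED_CATEGORIES]

# The Mandy/Bob/printer ticket above would ideally come back as three labels:
reply = '["User Onboarding Request", "User Offboarding Request", "Printer Issue"]'
```

Each recognized label can then spawn its own workflow, so the onboarding request no longer buries the offboarding one.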

Problem #4: Handling Structured Text 

IT service desks deal with both human-written tickets and structured, system-generated alerts. 

ChatGPT performed well with human-written requests but misclassified structured emails, highlighting its limitations when dealing with templated or machine-generated content. 
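A simple pre-filter can catch obviously templated messages before they ever reach the LLM and route them through deterministic rules instead. The patterns below are illustrative guesses, not rules from our actual alert tooling.

```python
import re

# Heuristic markers of templated, system-generated alerts. These patterns
# are illustrative and would need tuning against your monitoring tools.
ALERT_PATTERNS = [
    re.compile(r"^(ALERT|WARNING|CRITICAL):", re.IGNORECASE),
    re.compile(r"\bhostname:\s*\S+", re.IGNORECASE),
    re.compile(r"do not reply to this (email|message)", re.IGNORECASE),
]

def looks_machine_generated(ticket_text: str) -> bool:
    """Route templated alerts to rule-based handling instead of the LLM."""
    return any(p.search(ticket_text) for p in ALERT_PATTERNS)
```

Tickets flagged here go to a rules engine that already knows the template; only free-form, human-written text goes on to the language model.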

Key Takeaways 

What did we learn? AI isn’t a plug-and-play solution. To get useful results, you need structured prompts, well-defined categories, and continuous fine-tuning. 

  1. AI Has Context Limitations: Current AI models can only process a limited amount of text at once, which makes working with large ticket datasets challenging. 

  2. Confidence Doesn’t Equal Accuracy: ChatGPT doesn’t provide confidence scores for its classifications, which makes it risky to automate ticket routing without human oversight. 

  3. AI Works Best in a Hybrid Model: Our own machine learning model outperformed ChatGPT in accuracy, but it was limited to a fixed set of categories. A blended approach—using machine learning for known categories and AI for unknown ones—could be the most effective path forward. 
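The context limitation in the first takeaway can be made concrete with a small batching sketch: packing tickets greedily under a rough character budget as a crude stand-in for token limits. The 8,000-character default is an arbitrary placeholder, not a real model limit.

```python
def batch_tickets(tickets: list[str], max_chars: int = 8000) -> list[list[str]]:
    """Greedily pack tickets into batches that fit a rough context budget.

    Real token limits vary by model; raw character counts are only a crude
    stand-in used here to illustrate the chunking idea.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    size = 0
    for ticket in tickets:
        # Start a new batch when adding this ticket would blow the budget.
        if current and size + len(ticket) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(ticket)
        size += len(ticket)
    if current:
        batches.append(current)
    return batches
```

Each batch is then small enough to classify in one call, at the cost of more round trips.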

The Takeaway for MSPs 

If you’re looking to integrate AI into your service desk, here’s what to keep in mind: 

  • Be strategic: AI isn’t a magic fix. Consider how it fits into your existing workflows. 

  • Test before you trust: AI can (and will) make mistakes. Have a plan for catching them. 

  • Stay curious: AI is evolving rapidly. What doesn’t work today might work perfectly tomorrow. 

Want more insights on how AI is shaping IT service management? Stay tuned. And if you’ve experimented with AI for ticket classification, we’d love to hear about your experience—share your thoughts in the comments! 
