Last updated June 2026
One AI tool invented a credit card bonus that doesn’t exist — complete with a specific date, a specific point total, and enough confidence to convince anyone it was real. That single result tells you more about how to use these tools than any feature comparison. We ran 20 identical travel queries through ChatGPT, Perplexity, and Gemini across five categories. Perplexity won. Here’s the full breakdown.
- Perplexity scored 40/50 across five dimensions — the only tool that consistently flagged its own uncertainty instead of papering over gaps with false confidence.
- Gemini produced the most detailed, specific itinerary and logistics answers — and the most dangerous hallucination: a fabricated credit card bonus with an invented refresh date.
- ChatGPT was the most balanced across all categories, but gave genuinely wrong points-and-miles advice on the Maldives question — the kind of error that costs real money.
- For anything time-sensitive — visa rules, card bonuses, insurance, points valuations — use Perplexity. For itinerary depth, use Gemini and verify the specifics.
- No single tool does everything well. Operators use a layered stack.
How Did We Run the Test?
Twenty identical travel queries. Three AI tools. One 48-hour window. Each query was entered in a fresh session — no prior context, no conversation history — so each tool was working from a clean slate every time.
The 20 queries span five categories: destination selection, itinerary building, budget optimization, points and miles, and logistics. Each category represents a different kind of travel decision — from “where should I go” to “how do I pay for it” to “what happens at the border.”
Results were scored using The 5-Dimension AI Tool Score: Accuracy, Specificity, Current-Year Data, Actionability, and Hallucination Rate. Each dimension is scored out of 10. Hallucination Rate is inverted — a score of 10 means it never stated something confidently that turned out to be wrong.
The maximum total is 50. This framework is also applied in the full Journo AI travel tools guide.
One definition before we get into results: AI confidence and AI accuracy are not the same thing. Every tool we tested sounds certain. The difference between them is what happens when that certainty is tested against reality. That gap is what this test was designed to expose.
Category 1: Destination Selection — Who Picks the Right Place?
Four queries. Gemini won two outright, Perplexity won one, and ChatGPT and Perplexity tied on the fourth. The category revealed a meaningful difference in how these tools handle budget constraints and traveler-specific nuance.
| Query | Gemini | ChatGPT | Perplexity | Winner |
|---|---|---|---|---|
| Solo female traveler, SEA, $3K, March | Named specific hotels and airlines, flagged smoke season | 3 tiered options with specific hotels, flagged Bali wet season | 3 solid picks with clear reasoning, no specific hotels | Gemini |
| Warm in January, not touristy | Koh Kood and Oman, named specific resorts and transfer routes | Oman as top pick, Sri Lanka and Mozambique as runners-up | Madeira as top pick — most original and unexpected answer | Perplexity |
| Bali vs Thailand vs Vietnam honeymoon | Named luxury stays for all 3, gave clear recommendation | Built comparison table with scores, declared Thailand winner | Built comparison table with budgets, declared Thailand winner | Tie: ChatGPT / Perplexity |
| Family of 4, kids 6 and 9, wildlife and adventure | Costa Rica, named specific guide companies, exact resorts, rental car company | Costa Rica with 3-region route, named hotels and budget range | Costa Rica, explained why ages 6 and 9 specifically work well there | Gemini |
What the destination category revealed
Gemini’s strength is specificity at the ground level — hotel names, transfer logistics, seasonal flags that most travelers would miss. Perplexity’s Madeira pick for the “warm in January, not touristy” query was genuinely the most original answer of the entire test. Most travelers would never surface it on their own. That one result showed what a well-calibrated tool looks like: it doesn’t just confirm what you expect, it shows you something you hadn’t considered.
ChatGPT performed reliably across all four queries without producing any notable failures here. Its structured tables on the honeymoon comparison were clean and easy to act on.
Category 2: Itinerary Building — Who Plans the Better Trip?
Gemini's detail stood out most in this category, but the wins split evenly in the table below: Gemini took the Japan and Rome queries, and Perplexity took the Patagonia and East Africa safari queries. On the safari question, Perplexity gave the most realistic answer, flagging that $8,000 all-in from New York realistically means Kenya only, not a multi-country circuit.
| Query | Gemini | ChatGPT | Perplexity | Winner |
|---|---|---|---|---|
| 10-day Japan, April, off-beaten-path | Kanazawa, specific train times and costs, warned about booking 6+ months ahead | Kanazawa as off-path stop, day-by-day breakdown, included budget range | Kanazawa, clean day structure, noted cherry blossom pricing spike | Gemini |
| 3-week Patagonia road trip, Santiago loop | Fly-and-drive loop, detailed week-by-week plan, named specific lodges | Named route and rental companies, medium-high confidence | Cleanest logical route, warned about cross-border vehicle logistics | Perplexity |
| 5-day Rome, avoid crowds, hit highlights | Named exact entry times, flagged Doria Pamphilj as Vatican alternative, e-bike tip | Solid day-by-day, included The Hoxton as hotel pick | Good day structure, less specific on timing tricks | Gemini |
| East Africa safari, $8K total from NYC | Named specific camps and operators, gave per-person cost breakdown | Named operators, gave detailed budget breakdown per line item | Most realistic — flagged that $8K realistically means Kenya only | Perplexity |
What the itinerary category revealed
Gemini’s Rome answer was the single most impressive result across all 20 queries. It didn’t just name tourist sites and flag crowded areas — it gave specific entry times, named a lesser-known gallery as an alternative to the Vatican, and included a practical e-bike suggestion that most travel writers would consider insider knowledge. That level of on-the-ground specificity is where Gemini separates itself.
Perplexity’s Patagonia cross-border vehicle warning matters more than it sounds. Rental cars in Chile typically can’t cross into Argentina without advance authorization — and many travelers only discover this at the border. Flagging it unprompted is the kind of reliability that has real consequences.
Category 3: Budget Optimization — Who Saves You the Most?
Gemini dominated this category. Three of four wins, with one going to Perplexity for the Southeast Asia budget airlines question. The standout result was Gemini’s Portugal cost comparison — it included cleaning fees, service fees, and true hidden costs in a way that no other tool approached.
| Query | Gemini | ChatGPT | Perplexity | Winner |
|---|---|---|---|---|
| Cheapest month for Greece, daily budget | Named October and April as the sweet spot, detailed euro breakdown | Named two best-value months, gave clear daily breakdown | Said March is often best mix — slightly less specific | Gemini |
| 2-week Europe under $2,500 from US East Coast | Budapest, Prague, Bratislava route — more budget-realistic for the constraint | Lisbon, Porto, Valencia, Seville route with cost breakdown | Gave a flexible formula approach, didn’t commit to a route | Gemini |
| Hotels vs Airbnb vs hostels in Portugal, 10 days | Most detailed — included cleaning fees, service fees, true hidden costs | Clean comparison table with 10-day totals | Noted Airbnb fee trap clearly, recommended by traveler type | Gemini |
| Budget airlines in SEA, hidden fees | Named 4 airlines, gave specific dollar amounts per hidden fee | Named 4 airlines, listed fee types clearly | Named 5 airlines, noted schedule change risk — most complete list | Perplexity |
What the budget category revealed
The $2,500 Europe answer is a useful window into how each tool handles hard constraints. ChatGPT’s Lisbon-to-Seville route is a compelling trip — and will blow a $2,500 budget in most scenarios. Gemini’s Eastern Europe route is less glamorous and significantly more realistic. Operators care about the answer that works, not the answer that sounds good. Gemini gave the right one here.
Category 4: Points and Miles — Where the Stakes Are Highest
This category produced the most consequential results of the test — in both directions. Gemini gave the best answer on hotel versus flight redemptions in the Maldives. It also fabricated a credit card bonus that doesn’t exist. ChatGPT gave genuinely wrong advice on the same Maldives question. Perplexity was the only tool that consistently flagged uncertainty and recommended verification.
| Query | Gemini | ChatGPT | Perplexity | Winner |
|---|---|---|---|---|
| Best credit card signup bonus for Japan trip | Claimed 100,000 UR points from a “June 15, 2026 card refresh” — FABRICATED | Named Chase Sapphire Preferred, added caveat to verify before applying | Named 4 card options, honest medium confidence, flagged time-sensitivity | Perplexity |
| Chase UR points needed for NYC–London business class | Gave partner-by-partner breakdown with cash surcharge amounts | Gave ranges per partner, safe planning number of 140K–180K round-trip | Gave 60K–100K one-way range, noted dynamic pricing caveat | Gemini |
| Points for flights or hotels in the Maldives? | Said hotels — correct, gave strong rationale with 5th Night Free logic | Said flights — contradicts well-established points strategy | Said flights first, but acknowledged hotel sweet spots — most nuanced | Gemini |
| Best Amex partners for Australia flights | Named Aeroplan, Avios and Qatar, ANA — gave specific point costs per route | Named Aeroplan, ANA, Singapore KrisFlyer, Cathay | Named Qantas, Virgin Australia, Singapore, Aeroplan — most Australia-relevant list | Perplexity |
What the points and miles category revealed
The Maldives question has a known correct answer in the points community: redeem on hotels, not flights. Cash flights to the Maldives regularly go on sale for $800–$1,100. Hotel rates at luxury resorts run $1,500–$2,500 per night — which means award nights are worth dramatically more. ChatGPT got this wrong. Gemini got it right. That gap represents real money for anyone who followed the wrong advice.
The fabricated card bonus is covered in detail in the Failures section below. It is the most important result from the entire test.
Category 5: Logistics — Who Gets the Details Right?
Perplexity won this category clearly. The logistics questions — visas, insurance, money handling, transport — are exactly the kind of time-sensitive, high-consequence queries where hallucinated specificity causes real harm. Perplexity handled them better than either alternative.
| Query | Gemini | ChatGPT | Perplexity | Winner |
|---|---|---|---|---|
| Vietnam visa for US citizens | Stated exact prices ($25/$50) with high confidence — specific but unverified | Correct, advised checking official site | Correct, added passport validity and blank page requirements | Perplexity |
| Travel insurance for South America backpacking | Named World Nomads and SafetyWing, gave specific price ranges | Named World Nomads, SafetyWing, Allianz with coverage targets | Named same 3, specifically flagged altitude trekking exclusions | Perplexity |
| Handle money in Europe, avoid FX fees | Named Schwab debit, explained dynamic currency conversion clearly | Named Schwab, Capital One, Chase — solid DCC warning | Named Schwab plus backup card strategy, clean explanation | Tie: Gemini / Perplexity |
| Tokyo to Kyoto — options, costs, which is best | Named exact train, cost in yen and USD, explained why flying loses | Correct Shinkansen info, noted cherry blossom reservation tip | Named Nozomi, gave yen and USD range, added Hikari as cheaper option | Gemini |
What the logistics category revealed
Gemini’s altitude trekking gap on the insurance question is a meaningful failure — not a hallucination, but an omission on a question where completeness has safety implications. Many travel insurance policies exclude altitude-related incidents above a specific threshold. Perplexity flagged this unprompted. Neither of the other tools did.
For the Tokyo to Kyoto question, Gemini produced the most useful practical answer of the category — specific train names, costs in both currencies, and a clear explanation of why flying looks cheaper on paper but almost never is in practice when you factor in airport transit time and check-in.
The 3 Failures That Tell You Everything
Results tables show you who won. Failure examples show you the risk. These three outputs represent what happens when AI tools go wrong — and how differently they fail.
Failure 1: Gemini Hallucinated a Credit Card Bonus (Hallucinated Specificity)
What Gemini said: It claimed that “as of a major refresh on June 15, 2026, the card is offering a limited-time welcome bonus of 100,000 Ultimate Rewards points after spending $5,000 in the first 3 months.”
Why it matters: Gemini didn’t just state a wrong number. It invented a specific event — a named card refresh with an exact date and a specific bonus amount — that cannot be verified anywhere. The answer sounds authoritative, current, and precise. Someone applying for that card expecting 100,000 points based on a fabricated announcement could be seriously misled. This is the most dangerous type of AI failure: not obviously wrong, just confidently invented.
Failure 2: ChatGPT Gave Wrong Points-and-Miles Logic (Factually Incorrect Recommendation)
What ChatGPT said: It recommended using points for flights to the Maldives.
Why it matters: This is the opposite of the correct answer, and it contradicts well-established points-and-miles strategy. Cash flights to the Maldives regularly go on sale for $800–$1,100. Luxury resorts there run $1,500–$2,500 per night in cash, which makes award nights far more valuable. Gemini got this right. Perplexity also led with flights but acknowledged the hotel sweet spots. ChatGPT simply said flights. For a traveler who followed this advice and burned points on flights instead of hotel nights, the real cost of that error could easily exceed $3,000.
Failure 3: Gemini Quoted Government Visa Fees as Settled Fact (Overconfident Specificity)
What Gemini said: “US citizens can apply for a 90-day visa available in both single-entry ($25) and multiple-entry ($50) variants.”
Why it matters: The 90-day e-visa information is directionally correct, but visa fees are subject to frequent government revision without notice. Neither ChatGPT nor Perplexity quoted specific prices — both correctly deferred to the official immigration portal. Quoting government fees as settled fact on a high-consequence logistics question — where being wrong can mean a denied boarding or a border problem — is a different category of error than getting a hotel recommendation wrong.
The 5-Dimension AI Tool Score: Final Results
| Dimension | Gemini | ChatGPT | Perplexity |
|---|---|---|---|
| Accuracy | 6/10 | 7/10 | 9/10 |
| Specificity | 9/10 | 7/10 | 7/10 |
| Current-Year Data | 5/10 | 7/10 | 8/10 |
| Actionability | 8/10 | 7/10 | 7/10 |
| Hallucination Rate | 5/10 | 7/10 | 9/10 |
| TOTAL | 33/50 | 35/50 | 40/50 ✓ Winner |
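The totals in the table can be reproduced by adding up the five dimension scores for each tool. The sketch below is a minimal illustration in Python; the `SCORES` dictionary and the `total_score` helper are names introduced here for the example and are not part of the framework itself. Because Hallucination Rate is already inverted (10 = never hallucinated), it is summed like any other dimension.

```python
# Dimension scores copied from the final results table (each out of 10).
# Hallucination Rate is already inverted: 10 means the tool never hallucinated.
SCORES = {
    "Gemini": {
        "Accuracy": 6, "Specificity": 9, "Current-Year Data": 5,
        "Actionability": 8, "Hallucination Rate": 5,
    },
    "ChatGPT": {
        "Accuracy": 7, "Specificity": 7, "Current-Year Data": 7,
        "Actionability": 7, "Hallucination Rate": 7,
    },
    "Perplexity": {
        "Accuracy": 9, "Specificity": 7, "Current-Year Data": 8,
        "Actionability": 7, "Hallucination Rate": 9,
    },
}


def total_score(dimensions: dict[str, int]) -> int:
    """Sum the five dimension scores; the maximum possible total is 50."""
    return sum(dimensions.values())


totals = {tool: total_score(dims) for tool, dims in SCORES.items()}
winner = max(totals, key=totals.get)  # tool with the highest total

for tool, total in totals.items():
    print(f"{tool}: {total}/50")
print("Winner:", winner)
```

Running it gives 33, 35, and 40 for Gemini, ChatGPT, and Perplexity respectively, matching the TOTAL row.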
Final Verdict: Which Tool Should You Actually Use?
Gemini
The more confident Gemini sounds, the more carefully you should verify it.
Best for: Itinerary depth, granular logistics, and on-the-ground specificity. No other tool matched Gemini’s level of detail on routes, exact costs, timing tricks, and practical travel mechanics. Its Rome itinerary answer was the single best response across all 20 queries. Its critical weakness is overconfidence: it invented a specific card refresh date and bonus amount, and quoted government visa fees as settled fact. The more specific Gemini sounds, the more carefully you should verify the claim before acting on it.
ChatGPT
Best for: Structured comparisons, balanced coverage, and clear presentation across multiple categories. It built clean comparison tables, gave reasonable caveats, and avoided the hallucination traps that hurt Gemini. Its meaningful weakness was the Maldives points-and-miles error, a factually incorrect recommendation that could cost a traveler real money. Use ChatGPT for destination research and itinerary structure. Be cautious on financial strategy questions.
Perplexity
Best for: Anything time-sensitive, including visa rules, insurance options, card bonuses, points valuations, and budget airline fees. Perplexity was the only tool that consistently flagged its own uncertainty rather than filling gaps with invented specificity. It sacrifices some depth and itinerary detail for reliability. For travel planning, where a wrong answer can mean denied boarding, a missed points window, or a budget that collapses at the border, that tradeoff is the right one.
How to use all three together: Use Perplexity first for any time-sensitive logistics — visa rules, card bonuses, insurance options, points valuations. Use Gemini for itinerary depth and on-the-ground specifics, and verify any specific claim it makes with an authoritative source. Use ChatGPT for structured comparisons between destinations, accommodation types, or travel approaches. Operators use a layered stack. So does AI research.
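The layered approach above amounts to a small routing rule: decide what kind of question you are asking, then pick the tool. The Python sketch below shows one way to write that down. The topic labels and the `route_query` function are hypothetical names chosen for illustration; only the tool assignments come from the verdicts in this article.

```python
# Hypothetical topic labels. The tool assigned to each group follows the
# verdicts in this article; nothing here calls any real AI tool or API.
TIME_SENSITIVE = {"visa rules", "card bonuses", "insurance options", "points valuations"}
ITINERARY = {"itinerary depth", "on-the-ground specifics"}
COMPARISON = {"destination comparison", "accommodation comparison", "travel approach comparison"}


def route_query(topic: str) -> str:
    """Return the tool this article suggests for a given topic label."""
    key = topic.strip().lower()
    if key in TIME_SENSITIVE:
        return "Perplexity"
    if key in ITINERARY:
        return "Gemini"  # verify any specific claim with an authoritative source
    if key in COMPARISON:
        return "ChatGPT"
    raise ValueError(f"no routing rule for topic: {topic!r}")


for topic in ("Visa rules", "itinerary depth", "destination comparison"):
    print(f"{topic} -> {route_query(topic)}")
```

Unknown topics raise an error rather than falling back to a default tool, which keeps the sketch from implying a recommendation the article does not make.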
How Do Operators Approach AI Travel Research?
What none of the three tools in this test does — what no AI tool currently does — is tell you what to do with the information once you have it.
Knowing that Perplexity flagged its uncertainty on a card bonus is useful. Knowing which transfer partners are actually worth using for a Japan trip, how to position points before you need them, and which redemption patterns give 5× or 10× cash value — that is a different layer of knowledge entirely. A business class seat to Tokyo costs $4,200 in cash. Operators book it for 87,000 points and $150 in taxes. That is what the Travel Optimization System is built around.
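The business class example can be put into numbers with a cents-per-point calculation, a common rule of thumb among points enthusiasts: subtract the taxes you still pay from the cash price, then divide by the points spent. The sketch below is illustrative only. The `cents_per_point` function name is an assumption introduced here, and the formula is a general convention, not something defined by the Travel Optimization System.

```python
def cents_per_point(cash_price: float, taxes: float, points: int) -> float:
    """Value received per point, in US cents.

    cash_price: what the ticket would cost in cash (USD)
    taxes: cash still paid on the award booking (USD)
    points: points spent on the award booking
    """
    if points <= 0:
        raise ValueError("points must be positive")
    return (cash_price - taxes) / points * 100


# Figures from the paragraph above: a $4,200 seat booked for
# 87,000 points plus $150 in taxes.
value = cents_per_point(cash_price=4200, taxes=150, points=87_000)
print(f"{value:.2f} cents per point")  # prints "4.66 cents per point"
```

Under those figures, each point covers roughly 4.7 cents of airfare.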
For a deeper look at how these tools compare on a broader set of queries — and how to build a research workflow that uses them correctly — see the full Journo AI travel tools guide. And for the specific categories where no AI tool currently performs reliably, see The 9 Things AI Can’t Do For Your Trip (Yet).
How to Start Using AI Tools Correctly for Travel Research
Match the tool to the query type. Before running any travel query through an AI tool, identify whether it is time-sensitive. Visa rules, card bonuses, airline fees, and insurance terms change frequently. Route those queries to Perplexity first. Itinerary and destination research can go to Gemini or ChatGPT.
Verify any specific claim before you act on it. AI confidence is not AI accuracy. When a tool gives you a specific number — a visa fee, a card bonus amount, a points valuation — treat it as a starting point for verification, not a final answer. Government fee pages, bank offer pages, and airline award charts are the authoritative sources. They take 60 seconds to check.
Use AI for research, not strategy. The tools in this test can tell you what a business class seat to Tokyo costs. They cannot tell you how to book it for 87,000 points plus $150 in taxes instead of $4,200 in cash — not reliably, not consistently, and not with the systematic framework that makes the difference repeatable. AI surfaces the information. Strategy is what you do with it.
Your next move: Take the Journo Insider trial. The Syndicate course builds the complete Travel Optimization Stack — the framework that turns AI research into actual results: fewer points wasted, better redemptions, and business class trips that most travelers assume are out of reach.
Perplexity scored 40/50 in our 20-query test across five travel categories, winning on accuracy, current-year data, and hallucination rate. Gemini scored 33/50 — best for itinerary depth and specificity, but produced the most dangerous failure of the test: a fabricated credit card bonus with an invented date. ChatGPT scored 35/50, performing consistently across all categories but giving wrong advice on the Maldives points question. For time-sensitive queries, use Perplexity. For itinerary research, use Gemini — and verify everything specific it tells you.
Frequently Asked Questions
Which AI tool is best for travel planning?
Perplexity scored highest at 40/50 in our 20-query test, primarily because it flags its own uncertainty on time-sensitive information instead of filling gaps with invented specificity. For overall reliability — especially on visa rules, card bonuses, and insurance — Perplexity is the most trustworthy. For itinerary depth, Gemini performs better, but requires more verification.
Can you trust AI tools for points and miles advice?
Use them cautiously. Both Gemini and ChatGPT made significant errors in the points and miles category — Gemini fabricated a card bonus, and ChatGPT gave the wrong recommendation on Maldives hotel versus flight redemptions. Perplexity was the most reliable, consistently flagging time-sensitive information and recommending verification. Always check card offers directly on the issuer’s site before applying.
What is the 5-Dimension AI Tool Score?
It is Journo’s framework for evaluating AI travel tools across five dimensions: Accuracy, Specificity, Current-Year Data, Actionability, and Hallucination Rate. Each dimension is scored out of 10. Hallucination Rate is inverted — 10 means the tool never stated something confidently that turned out to be wrong. Maximum total is 50. It was applied to all three tools across 20 identical queries in this test.
How does ChatGPT compare to Perplexity for travel research?
ChatGPT performs well on structured destination comparisons and itinerary building, building clean tables and giving balanced coverage across most categories. Perplexity pulls live web data and cites sources directly, making it stronger on time-sensitive queries. In our test, ChatGPT scored 35/50 and Perplexity scored 40/50, with the gap driven primarily by hallucination rate and accuracy on current-year data.
Is Gemini good for travel planning?
Gemini is genuinely excellent for itinerary building and logistics detail. It produced the most specific, actionable answers across multiple categories, winning two of the four itinerary queries and three of the four budget queries. Its critical weakness is overconfidence: it invented a credit card bonus with a specific date and amount, and quoted government visa fees as settled fact. Gemini is most useful when you treat its specific claims as starting points for verification, not final answers.
When should you use Perplexity for travel research?
Use Perplexity for any time-sensitive travel information: visa requirements, travel insurance options, current card signup bonuses, airline fee structures, and points valuations. These are categories where information changes frequently and where being wrong has real consequences. Perplexity’s habit of citing sources and flagging uncertainty makes it the safest choice for these query types.
Can you rely on AI tools for visa and entry requirements?
Use AI tools as a starting point, not a final answer. In our test, Gemini quoted specific Vietnam visa fees with high confidence despite those fees being subject to government revision. Perplexity handled the same query correctly by deferring to the official immigration portal. For any visa or entry requirement, always verify against the official government immigration website or the destination country’s embassy page before booking.
What can AI travel tools not do?
AI tools accelerate research: destination comparisons, itinerary structures, logistics questions, and budget estimates all benefit from fast AI input. What they do not provide is a systematic framework for extracting maximum value from points, building the right transfer currency stack, or knowing which redemption patterns give 5× to 10× cash value. That layer of strategy requires a different kind of knowledge — which is what the Travel Optimization System is built around.