cross-posted from: https://libretechni.ca/post/302171

The websites for trains, planes, buses, and ride shares have become bot-hostile and also Tor-hostile. This forces us into a manual, labor-intensive effort of pointing and clicking through shitty proprietary GUIs. We cannot simply query for the cheapest trip over a span of time using parameters of our choice. Typically we are also limited to searching one day per query.

Suppose I want to go to Paris, Lyon, Lille, or Marseilles, and I can leave any morning in the next 2 weeks. Finding the cheapest ticket requires 56 manual web queries (4 destinations × 14 days). And that’s for just one carrier. If I want to query both Flixbus and BlaBlaCar, we’re talking 112 queries. Then I have to keep notes - a shortlist of prospective tickets. Fuck me. Why do people tolerate this? (They probably just search less and take a suboptimal deal).
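To make the arithmetic concrete, here is a rough sketch that just enumerates the searches a person would have to click through. The start date is made up for illustration; the destinations and carriers are the ones from the example above.

```python
from datetime import date, timedelta
from itertools import product

destinations = ["Paris", "Lyon", "Lille", "Marseilles"]
carriers = ["Flixbus", "BlaBlaCar"]

# Hypothetical first travel day; any 14-day window gives the same count.
start = date(2024, 1, 1)
days = [start + timedelta(days=i) for i in range(14)]

# One manual web query per (carrier, destination, day) combination.
searches = list(product(carriers, destinations, days))
per_carrier = len(destinations) * len(days)

print(per_carrier)    # 56
print(len(searches))  # 112
```

Every extra carrier or destination multiplies the clicking, which is why doing this by hand scales so badly.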

If we write web scraping software, the websites bogart their inventory with anti-bot protectionist mechanisms that would blacklist your IP address. Thereafter, we would not even be able to do manual searches. So of course a bot would have to run over Tor or a VPN. But those IPs are generally blocked outright anyway.

The solution: MitM software

We need some browser-independent middleware that collects the data and shares it. Ideally it would work like a special-purpose socat command. It would have to do the TLS handshake with the travel site and offer a local unencrypted port for the GUI browser to connect to. That would be a generic tool comparable to Wireshark (or perhaps #Wireshark can even serve this purpose?). Then a site-specific program could monitor the traffic, parse it, and populate a local SQLite DB. A third tool could sync the local DB with a centralised cloud DB. A fourth tool could provide a UI to the DB that gives us the queries we need.
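As a rough idea of what the site-specific piece might look like, here is a minimal sketch that takes one captured response body and stores its fares in SQLite. Everything about the payload is an assumption: the JSON layout (`trips`, `from`, `to`, `departure`, `price`, `currency`), the table schema, and the sample values are invented for illustration, and a real carrier's responses would need their own parser.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per fare quote, stamped with when it was seen.
SCHEMA = """
CREATE TABLE IF NOT EXISTS fares (
    carrier     TEXT NOT NULL,
    origin      TEXT NOT NULL,
    destination TEXT NOT NULL,
    departure   TEXT NOT NULL,  -- ISO 8601 departure time
    price       REAL NOT NULL,
    currency    TEXT NOT NULL,
    observed_at TEXT NOT NULL   -- ISO 8601 time the quote was captured
)
"""

def store_fares(conn, carrier, body, observed_at=None):
    """Parse one captured JSON response body and insert its fares.

    Assumes a made-up payload shape: {"trips": [{"from": ..., "to": ...,
    "departure": ..., "price": ..., "currency": ...}, ...]}.
    Returns the number of rows inserted.
    """
    observed_at = observed_at or datetime.now(timezone.utc).isoformat()
    trips = json.loads(body)["trips"]
    rows = [
        (carrier, t["from"], t["to"], t["departure"],
         t["price"], t["currency"], observed_at)
        for t in trips
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO fares VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")  # a real tool would use a file on disk
conn.execute(SCHEMA)

# Invented sample standing in for traffic captured by the middleware.
sample = json.dumps({"trips": [
    {"from": "Brussels", "to": "Paris", "departure": "2024-01-10T07:30:00",
     "price": 10.0, "currency": "EUR"},
]})
stored = store_fares(conn, "Flixbus", sample)
print(stored)
```

Keeping the parser separate from the capture tool means each carrier only needs its own small `store_fares`-style function, while the DB, sync, and UI pieces stay shared.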

A browser extension that monitors and shares would be an alternative solution – but not as good. It would impose a particular browser. And it would be impossible to make the connection to the central DB over Tor while making the browser connection over a different network.

Fares often change daily, so the DB would of course timestamp fares. Perhaps an AI mechanism could approximate the price based on past pricing trends for a particular route. A Flixbus fare might start at 10 and climb to 40 by the day of travel. Stale price quotes would obviously be inexact, but when the DB shows an interesting price and you search for it manually, the local and central DBs would be updated with the fresh quote. The route and schedule info would of course be quite useful (and unlikely to be stale).
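Here is a hedged sketch of how timestamped quotes could be queried: keep every observation, then show only the most recent quote per trip along with its age, so the user can judge how stale it is. The table layout, dates, and prices are all invented for illustration.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fares (carrier TEXT, destination TEXT, departure TEXT,"
    " price REAL, observed_at TEXT)"
)
# Made-up observations: the Paris fare was seen twice, a week apart.
conn.executemany(
    "INSERT INTO fares VALUES (?, ?, ?, ?, ?)",
    [
        ("Flixbus", "Paris", "2024-01-10", 10.0, "2024-01-01T09:00:00+00:00"),
        ("Flixbus", "Paris", "2024-01-10", 25.0, "2024-01-08T09:00:00+00:00"),
        ("BlaBlaCar", "Lyon", "2024-01-11", 18.0, "2024-01-05T09:00:00+00:00"),
    ],
)

# Latest quote per (carrier, destination, departure), cheapest first.
# Comparing timestamps as strings works only because they all share the
# same ISO 8601 format and UTC offset.
LATEST = """
SELECT f.carrier, f.destination, f.departure, f.price, f.observed_at
FROM fares AS f
WHERE f.observed_at = (
    SELECT MAX(g.observed_at) FROM fares AS g
    WHERE g.carrier = f.carrier
      AND g.destination = f.destination
      AND g.departure = f.departure
)
ORDER BY f.price
"""

def latest_quotes(conn, now):
    """Return (carrier, destination, departure, price, age_in_days) tuples."""
    out = []
    for carrier, dest, dep, price, seen in conn.execute(LATEST):
        age_days = (now - datetime.fromisoformat(seen)).days
        out.append((carrier, dest, dep, price, age_days))
    return out

now = datetime.fromisoformat("2024-01-09T09:00:00+00:00")  # pretend "today"
quotes = latest_quotes(conn, now)
for q in quotes:
    print(q)
```

The older 10.0 quote for Paris stays in the table as history, which is the raw material any price-trend estimate would need, but only the fresher 25.0 quote is shown to the user.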

The end result would be an Amadeus DB of sorts, but with the inclusion of environmentally sound ground transport. It could give a direct comparison and perhaps even cause air travelers to switch to ground travel. It could even give us a UI/query tool like ITA Software's Matrix, but broader in scope.