Why MISP Tells You Microsoft Teams Is a VPN (And How We Fixed It)

I'm Jose, the solo founder building Reput.io. This is a build-in-public engineering post about a false positive a user hit this week, what it revealed about how community VPN blocklists fail, and how we fixed it without sacrificing detection signal. If you build threat intel pipelines, you've probably stepped on this exact landmine.

The bug report

A test lookup of 150.171.109.145 came back like this (trimmed):

{
  "indicator": "150.171.109.145",
  "provider": { "name": "Commercial VPN", "type": "vpn_service" },
  "risk_context": "VPN Service",
  "categories": ["Anonymity Network", "Cloud Provider", "VPN Service", "Proxy Service", ...],
  "verdict": "investigate",
  "reasons": [
    "IPv4 ranges used by known commercial VPN service providers.",
    "Known datacenter IP range commonly used for proxies and VPNs."
  ],
  "geo": { "asn": 8075, "asn_org": "Microsoft Corporation", "country": "South Africa" }
}

The geo block already tells you what's wrong. AS8075 is Microsoft. The /18 containing this IP is very likely Teams/Skype/Office 365 infrastructure running in Azure South Africa. Labelling it "Commercial VPN" is straightforwardly false.

If you ship this to a SOC analyst, one of two bad things happens. Either they waste ten minutes figuring out that "VPN from Cape Town" is actually their users' Teams calls, or — worse — they stop trusting your tool and start ignoring its verdicts.

Tracing the lie

Querying our database for the CIDR that matched:

SELECT value, reasons, categories, source_type
FROM iocs_cidr
WHERE '150.171.109.145'::inet <<= value;

-- Result:
-- value:      150.171.64.0/18
-- reasons:    [b36aecba, f5ceb89d]
-- categories: [Anonymity Network, Cloud Provider, Datacenter,
--              Network Infrastructure, Privacy Service, Proxy Service, VPN Service]

The two reason IDs point to two upstream feeds we ingest:

MISP vpn-ipv4 warninglist — github.com/MISP/misp-warninglists/.../vpn-ipv4/list.json
X4BNet lists_vpn — datacenter IPv4 — github.com/X4BNet/lists_vpn/.../datacenter/ipv4.txt

Both are widely-used community lists. I pulled both live to verify, and both contain 150.171.64.0/18 verbatim:

$ curl -s https://raw.githubusercontent.com/MISP/misp-warninglists/main/lists/vpn-ipv4/list.json \
    | jq -r '.list[]' | grep -F '150.171.'
150.171.64.0/18

So this is an upstream data-quality issue, not a bug in our parsers. The community feed authors presumably inferred "large datacenter block → probably used by VPN providers", which is sometimes true but happens to be false here — this block is Microsoft first-party infrastructure.
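The same containment check can be reproduced offline with Python's standard ipaddress module, which is handy when auditing a feed before it ever reaches the database. This is a minimal sketch, not our ingestor: the function name matching_cidrs is made up for illustration, and every entry in the sample feed other than 150.171.64.0/18 is a documentation range I added as filler.

```python
import ipaddress


def matching_cidrs(ip, cidrs):
    """Return every network in `cidrs` that contains `ip`, most specific first."""
    addr = ipaddress.ip_address(ip)
    hits = []
    for cidr in cidrs:
        # strict=False tolerates feed entries that have host bits set.
        net = ipaddress.ip_network(cidr, strict=False)
        if addr.version == net.version and addr in net:
            hits.append(net)
    return sorted(hits, key=lambda n: n.prefixlen, reverse=True)


# Hypothetical feed excerpt; only 150.171.64.0/18 comes from the real lists.
feed = ["150.171.64.0/18", "198.51.100.0/24", "2001:db8::/32"]

print(matching_cidrs("150.171.109.145", feed))
```

Sorting by prefix length is a design choice for the sketch: when several feed entries overlap, the most specific one is usually the most informative to show first.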

Dropping these feeds isn't the answer. They catch real commercial VPNs we need to flag. Patching them upstream isn't the answer either — they're curated on human timescales and the blast radius of waiting is real.

The fix: ASN as override, but asymmetric

The authoritative signal for "who owns this IP" is the ASN from the regional internet registries, not a GitHub list. We geolocate every IP via DB-IP ASN Lite at lookup time, so the ASN is already sitting in the response. The fix is to let a trusted ASN override the categorical provider detection.

Here's the naive version of that idea and why it's wrong:

# NAIVE (don't do this):
HYPERSCALERS = {8075: "Microsoft", 16509: "AWS", 15169: "Google", 13335: "Cloudflare"}
if asn in HYPERSCALERS:
    return {"provider": HYPERSCALERS[asn], "verdict": "likely_benign"}

This gets rid of the "Commercial VPN" false positive. It also creates a far worse problem: a false negative on any real C2 traffic hosted inside AWS, Azure, GCP, or Cloudflare Workers. An attacker can rent an Azure VM in five minutes. Labelling every AS8075 IP as "likely benign" means the moment someone runs a phishing panel or a Cobalt Strike team server in Azure, your tool is actively telling the analyst to ignore it.

This is the asymmetric-trust problem at the core of whitelist intelligence: "who owns the infrastructure" is a stronger signal than "a community list says it's a VPN", but it is not the same as "the traffic is safe". The hosting brand tells you nothing about what the customer is doing with their VM.

The fix has to treat "Microsoft" differently from "Apple". Apple's AS714 and AS6185 do not sell general-purpose compute to third parties — traffic there is near-certainly first-party Apple service traffic, and likely_benign is defensible. Microsoft's AS8075 covers both Teams and every Azure customer — investigate is the correct verdict even when we're confident about the provider name.

Here's the shape of what we actually shipped (simplified from provider_detection.py in the repo):

ASN_PROVIDER_MAP = {
    # Customer-controlled cloud — provider name corrected, verdict stays "investigate"
    8075:  "azure",          # Microsoft (Teams AND Azure customers)
    16509: "aws",            # Amazon
    15169: "gcp",            # Google (Search AND GCP customers)
    13335: "cloudflare",     # Cloudflare (CDN AND Workers/Pages/Tunnels)
    14061: "digitalocean",

    # First-party brands (no general hosting on these ASNs) — likely_benign OK
    714:   "apple_services",
    6185:  "apple_services",
    32934: "meta_services",
}

def detect_provider(categories, source_names=None, reason_texts=None, asn=None):
    if asn is not None and asn in ASN_PROVIDER_MAP:
        return PROVIDER_PROFILES[ASN_PROVIDER_MAP[asn]]
    # ...fall through to category-based detection

And the azure profile deliberately keeps verdict: "investigate":

"azure": {
    "provider_name": "Microsoft Azure",
    "base_trust": 50,
    "verdict": "investigate",
    "fp_likelihood": "medium",
    "rationale": (
        "Microsoft Azure cloud infrastructure. Azure resources are customer-"
        "controlled and can host both legitimate enterprise services and "
        "malicious infrastructure. Investigate the subscription owner."
    ),
    "investigation_hint": "Azure VMs often have predictable DNS patterns — check reverse DNS.",
}
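
To see the two tiers behave differently end to end, here is a runnable sketch that fills in the parts the snippets above leave out. It is an illustration under assumptions, not the contents of provider_detection.py: the apple_services, commercial_vpn, and unknown profiles are invented stand-ins, the signature is trimmed to two parameters, and the category fallback is a placeholder for the real category-based detection.

```python
ASN_PROVIDER_MAP = {
    8075: "azure",           # customer-controlled cloud
    714: "apple_services",   # first-party only
}

PROVIDER_PROFILES = {
    "azure": {"provider_name": "Microsoft Azure", "verdict": "investigate"},
    # The three profiles below are assumed contents, for illustration only.
    "apple_services": {"provider_name": "Apple Services", "verdict": "likely_benign"},
    "commercial_vpn": {"provider_name": "Commercial VPN", "verdict": "investigate"},
    "unknown": {"provider_name": "Unknown", "verdict": "investigate"},
}


def detect_provider(categories, asn=None):
    # A trusted ASN corrects the provider label before any feed category is read.
    profile_key = ASN_PROVIDER_MAP.get(asn)
    if profile_key is not None:
        return PROVIDER_PROFILES[profile_key]
    # Hypothetical fallback standing in for the category-based detection.
    if "VPN Service" in categories:
        return PROVIDER_PROFILES["commercial_vpn"]
    return PROVIDER_PROFILES["unknown"]


feed_categories = ["Anonymity Network", "Cloud Provider", "VPN Service"]
print(detect_provider(feed_categories, asn=8075))   # label corrected, still "investigate"
print(detect_provider(feed_categories, asn=714))    # first-party, "likely_benign"
print(detect_provider(feed_categories))             # no ASN, feed label wins
```

The point of the sketch is that the ASN decides the label, while the profile decides the verdict. The map alone never upgrades anything to safe.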

What the response looks like now

Same lookup, after the fix:

{
  "indicator": "150.171.109.145",
  "provider": {
    "name": "Microsoft Azure",
    "type": "cloud_provider",
    "services": ["Virtual Machines", "Azure Functions", "AKS", "Storage"]
  },
  "verdict": "investigate",
  "confidence_score": 30,
  "risk_description": "Microsoft Azure cloud infrastructure. Azure resources are customer-controlled and can host both legitimate enterprise services and malicious infrastructure. Investigate the subscription owner.",
  "recommendation": {
    "action": "investigate",
    "false_positive_likelihood": "medium",
    "investigation_hint": "Azure VMs often have predictable DNS patterns — check reverse DNS."
  },
  "categories": ["Anonymity Network", "Cloud Provider", "Datacenter", "Network Infrastructure", "Privacy Service", "Proxy Service", "VPN Service"],
  "reasons": [
    "IPv4 ranges used by known commercial VPN service providers.",
    "Known datacenter IP range commonly used for proxies and VPNs."
  ],
  "geo": { "asn": 8075, "asn_org": "Microsoft Corporation", ... }
}

Three things are deliberate about this:

The provider label is correct. The analyst no longer sees "Commercial VPN" on Teams traffic.
The verdict stays investigate. We do not claim this is safe. A real Azure VM running C2 lights up identically to Teams at the network layer, and a SOC tool should never pretend otherwise.
The reasons and categories from the upstream feeds are preserved. Nothing is hidden. If the analyst wants to know why their tool is unsure, they can see that MISP and X4BNet both flagged this block, and make their own judgement.
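
Those three properties can be sketched as a small response builder. The field names below are taken from the JSON shown earlier, but the function build_response and the way it is wired up are assumptions for illustration, not the production code.

```python
def build_response(indicator, profile, feed_categories, feed_reasons, geo):
    """Override the provider label but pass feed evidence through untouched."""
    return {
        "indicator": indicator,
        "provider": {"name": profile["provider_name"]},
        "verdict": profile["verdict"],
        "risk_description": profile["rationale"],
        "recommendation": {
            "action": profile["verdict"],
            "false_positive_likelihood": profile["fp_likelihood"],
            "investigation_hint": profile["investigation_hint"],
        },
        # Raw upstream evidence is copied as-is, never filtered by the override.
        "categories": list(feed_categories),
        "reasons": list(feed_reasons),
        "geo": dict(geo),
    }


azure_profile = {
    "provider_name": "Microsoft Azure",
    "verdict": "investigate",
    "fp_likelihood": "medium",
    "rationale": "Microsoft Azure cloud infrastructure.",  # shortened for the sketch
    "investigation_hint": "Azure VMs often have predictable DNS patterns; check reverse DNS.",
}

response = build_response(
    "150.171.109.145",
    azure_profile,
    ["Cloud Provider", "VPN Service"],
    ["IPv4 ranges used by known commercial VPN service providers."],
    {"asn": 8075, "asn_org": "Microsoft Corporation"},
)
print(response["provider"]["name"], response["verdict"], response["reasons"])
```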

Cloudflare deserves its own note

You'll notice Cloudflare (AS13335) is in the customer-controlled bucket, not the first-party bucket. A lot of tools reflexively mark Cloudflare as likely_benign — it's a CDN giant, you can't block it without breaking the internet, so just trust it. This is wrong.

Cloudflare Workers and Pages run arbitrary customer code, R2 hosts arbitrary customer content, and Zero Trust tunnels front arbitrary customer origins. Attackers proxy phishing kits through Cloudflare daily precisely to hide origin IPs and benefit from the reputation halo. "Cloudflare" tells you the proxy, not the site. The correct move is to surface that the IP is Cloudflare infrastructure (useful context), keep verdict: investigate, and tell the analyst to check the Host header or SNI for the actual destination.

We ship this as base_trust: 60 and an investigation hint that explicitly calls out the Workers/Pages risk. Less noise than calling it VPN. No blank check.

The product principle behind all of this

I wrote this up internally as the product ideology, because it comes up in every design decision:

Inform, don't decide. The goal is to reduce noise for the analyst, never to hide signal that could mask a real threat. When a provider is ambiguous, tell the analyst what's known, what's likely benign, what's plausibly malicious, and what specifically to check. Asymmetric trust: authoritative signals may correct misleading labels, but they do not upgrade customer-controlled infrastructure to "safe". Preserve raw feed evidence in the response so the analyst can overrule us when they have context we don't.

Every whitelist intelligence tool has to meet the same two objectives at once: reduce SOC noise and never cause a false negative. They pull in opposite directions. The honest answer is asymmetric trust rules that treat "I know who owns this" as different from "I know this is safe".
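
One way to keep that rule from eroding over time is to encode it as a check that runs in CI. This is a sketch under assumptions: the name trust_violations, the CUSTOMER_CONTROLLED set, and the idea of running it against the profile table are all illustrative, not something the repo necessarily contains.

```python
# Profiles whose ASNs also carry third-party customer workloads.
CUSTOMER_CONTROLLED = {"azure", "aws", "gcp", "cloudflare", "digitalocean"}


def trust_violations(profiles):
    """Return names of customer-controlled profiles whose verdict claims safety."""
    return sorted(
        name
        for name, profile in profiles.items()
        if name in CUSTOMER_CONTROLLED and profile["verdict"] == "likely_benign"
    )


good = {
    "azure": {"verdict": "investigate"},
    "apple_services": {"verdict": "likely_benign"},
}
# Someone "helpfully" marks Cloudflare as safe; the check should catch it.
bad = dict(good, cloudflare={"verdict": "likely_benign"})

print(trust_violations(good))
print(trust_violations(bad))
```

Knowing the owner is allowed to change the label. The check fails whenever it also changes the verdict for infrastructure that customers control.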

Try it yourself

The fix is live in production right now. You can reproduce the "after" response from this post with:

curl -X POST https://reput.io/lookup \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"indicators": ["150.171.109.145"]}'
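
If you would rather call it from a script, the same request can be built with Python's standard library. The endpoint, header names, and body come from the curl command above. The function names and the timeout are my own choices for the sketch, and I am not assuming anything about the response beyond it being JSON.

```python
import json
import urllib.request

LOOKUP_URL = "https://reput.io/lookup"


def build_lookup_request(api_key, indicators):
    """Build the same POST request as the curl command, without sending it."""
    body = json.dumps({"indicators": list(indicators)}).encode("utf-8")
    return urllib.request.Request(
        LOOKUP_URL,
        data=body,
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def lookup(api_key, indicators, timeout=10):
    """Send the request and decode the JSON response body."""
    request = build_lookup_request(api_key, indicators)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Usage (needs a real key and network access):
# print(lookup("YOUR_API_KEY", ["150.171.109.145"]))
```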

Free tier gives you 500 queries/day, which is plenty to kick the tires on your own SOC's real alerts. Grab a key here.

If you hit a false positive, or you disagree with any of this — especially the Cloudflare take — I want to hear about it. hello@reput.io.

Jose Martin is the solo founder of Reput.io. Background in backend engineering. Currently writing code, running the ingestor, answering support email, and questioning every design decision in public.