Or “How I found out about – Tenant Attribution – and what the heck it is…”
I’ve been working with a client of mine on a fairly sizable Office 365 migration, and one of the workloads we are migrating is (you guessed it!) their Microsoft Exchange environment. And like all projects, it has its ups, its downs, and its roadblocks. I’ve been doing this for a while now, so I’ve seen quite a few of them… But this one I had never encountered before.
The Situation
Our lovely customer decided that they wanted to host their hybrid on existing Exchange servers (which is fine, these servers weren’t actively hosting mailboxes anyway), wanted all of their inbound and outbound mail to route through their on-premises smart hosts, and wanted to apply whitelisting to the Office 365 URLs. The mail gateways have certificates for TLS (but do not sit in between EXO and the hybrid servers). So far, so good. No weird requests for now…
All this leaves their environment to look a bit like this:

Now pictures might speak a thousand words, but let me sum this up for you:
- The Exchange Online Tenant
- The Exchange Online Protection Tenant linked to the Exchange Online Tenant. Yes, even if you don’t use EOP, you still get EOP.
- An external mail system that sends email to (or receives email from) the customer’s environment…
- The edge/DMZ network mail gateways
- The hybrid servers
- The main Exchange infrastructure
The red lines show outgoing SMTP traffic (coincidentally, this is also the path for incoming email).
The blue(ish) line represents mail flow between the main infrastructure and the hybrid servers.
Centralized mail transport
Centralized mail transport is this nifty little option hidden away in the Hybrid Configuration Wizard (HCW) that allows you to specify that “All in- and outbound email should traverse the on-premises mail systems”. This is usually very handy for customers that have some sort of compliance reason to have mail flow that way, and it doesn’t require a lot of special configuration… Now this customer had been really good and set their SPF records properly. All email flow worked perfectly fine until the day we ran the Hybrid Configuration Wizard.
The issue at hand
So just now I said everything was fine until we ran the HCW. Remember? Well, after we ran the HCW, we started getting reports of outbound email being rejected due to SPF failures. This was odd, since nothing had changed in the way that outbound email was sent. We were a bit dumbstruck. Yet it gets better… This was not happening to all outbound email. Oh-No… That would have been too easy! It was happening to seemingly random domains!
At that point I did what any one of us in this situation would do. I picked myself up from the floor and asked for logging (specifically, email headers for the failed mails), then attempted to recreate the problem and analyze the heck out of it!
Narrowing it down
After some extensive testing (sending mails to Gmail can be considered extensive, no?) we determined that there was a pattern to these SPF failures: They all happened to recipients that were hosted on Exchange Online. The really curious piece is that this was not happening to all recipients hosted in Exchange Online. Only a subset of them.
When analyzing the headers of those emails that hard failed on SPF, we noticed a similarity: They all had the customer’s tenant in the header. Mails that got successfully delivered did not have the customer’s tenant in the header.
Now this is particularly odd?! Why the heck is the customer’s tenant being placed in the header, when centralized mail transport is enabled?
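The header comparison described above can be sketched in a few lines of Python. This is only a rough illustration of the kind of check we did by hand: the sample message, the `X-Example-Tenant` header name, the tenant name `contoso.onmicrosoft.com` and the IP addresses are all made up for the example, and the real headers you get back from a failed mail will look different.

```python
import re
from email import message_from_string
from typing import List, Optional


def headers_mentioning_tenant(raw_message: str, tenant_marker: str) -> List[str]:
    """Return the names of all headers whose value contains tenant_marker."""
    msg = message_from_string(raw_message)
    marker = tenant_marker.lower()
    return [name for name, value in msg.items() if marker in str(value).lower()]


def spf_result(raw_message: str) -> Optional[str]:
    """Return the spf=... verdict from Authentication-Results, if there is one."""
    msg = message_from_string(raw_message)
    for value in msg.get_all("Authentication-Results", []):
        match = re.search(r"\bspf=(\w+)", str(value))
        if match:
            return match.group(1).lower()
    return None


# Hypothetical sample: header names, tenant name and IPs are assumptions.
SAMPLE = (
    "Received: from mail.contoso.com (203.0.113.10) by example.invalid\n"
    "Authentication-Results: spf=fail (sender IP is 203.0.113.20) "
    "smtp.mailfrom=contoso.com\n"
    "X-Example-Tenant: contoso.onmicrosoft.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

print(spf_result(SAMPLE))
print(headers_mentioning_tenant(SAMPLE, "contoso.onmicrosoft.com"))
```

Running this over a batch of failed and delivered mails makes the pattern easy to spot: the failures are the ones where the tenant shows up.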
Expected behavior
The routing behavior we expected to see (and were seeing on most Exchange Online hosted recipients!) is illustrated below:

- An email gets sent from our on-premises environment (or comes to our on-premises via the hybrid servers…). Since we’re using mail gateways, Exchange routes this message to the mail gateways living in the DMZ
- The mail gateways do their DNS lookup, see that the mail should be routed to EOP, and establish a connection to deliver the mail (TLS secured, since this is enabled on the mail gateways!)
- EOP receives the message, goes through its rule list and delivers the message to Exchange Online
- Exchange Online delivers the message to the intended recipient…

The routing behavior we were actually seeing for the failing recipients:
- An email gets sent from our on-premises environment (or comes to our on-premises via the hybrid servers…). Since we’re using mail gateways, Exchange routes this message to the mail gateways living in the DMZ
- The mail gateways do their DNS lookup, see that the mail should be routed to EOP, and establish a connection to deliver the mail (TLS secured, since this is enabled on the mail gateways!)
- EOP receives the message, goes through its rule list and, by some black magic, decides this email should be sent to the customer’s EOP tenant????
- The customer’s EOP tenant decides this email is not meant for it (it doesn’t match any of its inbound connectors), so it routes the mail to the appropriate tenant.
What the heck?!
Still no explanation as to why this is randomly happening to some tenants in Exchange Online, but not all of them. At least we’re tightening the noose, right?
- EOP tenants are hosted in different “forests”
- The first (or one of the first) rules to be processed is whether or not this is an email to on-premises (determined by the name on the TLS certificate)
- These rules exist on the forest level, not the EOP tenant level
- What we are experiencing is an “Undocumented Feature” called Tenant Attribution
- We’re using centralized mail transport
- We did not update our SPF record to include spf.protection.outlook.com (because we’re not using EOP to send emails…)
- Our mail gateways have a certificate with *.contoso.com as the subject name.
- Our inbound connector from EXO to on-premises is listed as *.contoso.com
- EOP always sits before Exchange Online, even for the on-premises hybrid connector
- EOP does not apply fancy magic to mails destined to on-premises
- EOP determines a mail is for on-premises by the TLS cert subject name?
- EOP does not process further rules if the on-premises rule is matched
- Rules live on the forest level in EOP
- There are many forests…
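To make the points above a bit more tangible, here is a toy model in Python of how I picture Tenant Attribution. It is emphatically not Microsoft's implementation: the forest names, the `InboundConnector` class and the `attribute_tenant` function are all invented for illustration, and the wildcard matching is just my assumption of how a certificate name gets compared to a connector.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Dict, List, Optional


@dataclass
class InboundConnector:
    tenant: str        # tenant that owns the connector (hypothetical name)
    cert_pattern: str  # certificate name the connector was stamped with


def attribute_tenant(cert_subject: str,
                     forest_connectors: List[InboundConnector]) -> Optional[str]:
    """Return the tenant whose connector matches the TLS certificate name.

    Only connectors living in the receiving forest are consulted, which is
    why the behavior differs from one recipient to the next.
    """
    for connector in forest_connectors:
        if fnmatchcase(cert_subject.lower(), connector.cert_pattern.lower()):
            return connector.tenant
    return None


# Hypothetical layout: our tenant's connector only exists in forest-a.
FORESTS: Dict[str, List[InboundConnector]] = {
    "forest-a": [InboundConnector("contoso.onmicrosoft.com", "*.contoso.com")],
    "forest-b": [],
}

for forest, connectors in FORESTS.items():
    print(forest, attribute_tenant("mail.contoso.com", connectors))
```

In this model, a recipient living in the same forest as our tenant gets the mail attributed to our tenant (and relayed onwards from EOP, whose addresses are not in our SPF record), while a recipient in any other forest sees an ordinary inbound mail from our gateways.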
What are my options?
- Do nothing: Maybe you don’t care about the recipients receiving those emails. Maybe you’re planning to move to EOP in the near future and you just can’t be bothered… Personally, I couldn’t do it, but kudos to you if you could just sit back and let these poor messages be slaughtered!
- Disable the inbound connector: Disabling it causes the rule to be no longer applied, and mail will flow correctly again. Unfortunately this will also break mail flow between Exchange Online and on-premises…
- Add include:spf.protection.outlook.com to your SPF record: Possibly the easiest solution you have. Adding this does not mean that someone in Office 365 would be able to spoof your domain. There are checks and balances built in to the service to prevent anyone who does not have the domain added to their tenant from sending email as that domain. So it’s a low impact solution… But some security teams will balk at this, and why wouldn’t they? After all, this is a fix that should be unnecessary from a logical point of view.
- Add hybrid.contoso.com as a subdomain to O365 and rerun the Hybrid Configuration Wizard: Theoretically, this should resolve the issue as the inbound connector would no longer get stamped with *.contoso.com, but with hybrid.contoso.com. That way, the EOP rule would not trigger since the mail gateways do not have a certificate with hybrid.contoso.com. (If they do, the problem will not go away…). Unfortunately, I have never tested this, so I don’t know the long-term repercussions of this method.
- Wait for Microsoft to fix this: I don’t think they will, considering there’s a low impact solution. But you can always try holding your breath :)!
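For the SPF option, the change itself is tiny. Below is a small Python sketch of what "adding the include" means for the TXT record: the helper name `add_spf_include` and the example record (with a documentation IP address) are made up, and you would of course make the actual change at your DNS provider rather than in Python.

```python
def add_spf_include(record: str, include_domain: str) -> str:
    """Return the SPF record with include:<include_domain> added before 'all'."""
    terms = record.split()
    include_term = f"include:{include_domain}"
    # Nothing to do if the include is already there.
    if include_term.lower() in (term.lower() for term in terms):
        return record
    for index, term in enumerate(terms):
        # The 'all' mechanism may carry a qualifier: -all, ~all, ?all or +all.
        if term.lower().lstrip("+-~?") == "all":
            terms.insert(index, include_term)
            break
    else:
        # No 'all' mechanism found, so simply append the include.
        terms.append(include_term)
    return " ".join(terms)


# Hypothetical starting record: only the on-premises gateway is allowed.
before = "v=spf1 ip4:203.0.113.10 -all"
after = add_spf_include(before, "spf.protection.outlook.com")
print(after)  # v=spf1 ip4:203.0.113.10 include:spf.protection.outlook.com -all
```

The include goes in front of the "all" mechanism because SPF mechanisms are evaluated left to right, and anything after "all" would never be reached.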
Conclusion
In the end we figured out what caused our problem, discovered an undocumented feature, and had some major fun doing so! At least I did… I live for these weird fringe cases that let me activate my little grey cells…
I’m very likely going to advise all future customers to add the “include” option to their SPF record, and now I can explain why this is needed!