
A Stupid Salesforce Story for Engineers:

(small) Prime Factorization as a Validation Rule

Published

Mar 18, 2025

Topic

Math & Programming


This is how prime numbers saved me from writing 576 billion Salesforce reports. It started when my CTO was off at a conference on the other side of the country, and the CEO casually asked me to "just run a quick query." Little did we know, that innocent request would uncover an architectural flaw and spark a combinatorial nightmare.

My first real job was doing RevOps (being a glorified Salesforce admin) at a legal tech startup, Broughton Partners. It was a great place to cut your teeth, as the leadership team was composed of excellent managers who had solid business fundamentals. Within a year, thanks to their guidance, I went from basically useless to pretty decent.

Broughton Partners' business model involved finding potential plaintiffs who had been injured by products from negligent companies, then matching those plaintiffs with law firms to litigate their cases (very proud of the work I did there, BP is/was able to give restitution to thousands of Americans in the form of billions of dollars). Ensuring that no plaintiff was sent to multiple law firms was critically important. With hundreds of thousands of clients and hundreds of law firms—each with their unique criteria—this was a complex sorting challenge.

You basically have to run a sorting algorithm that sends cases to opportunities in order, from the most restrictive to the least restrictive, and find the optimal arrangement of cases.

Our entire system ran on Salesforce, and case delivery was structured around two objects, Client and Opportunity, connected by a relationship record. This structure is standard in databases generally, but here comes the stupid part: Salesforce's query language, SOQL, requires you to specify the exact number of child records and the precise values you’re looking for to return any matching records, so dynamic querying was nearly impossible.

The problem began when my boss asked me to audit our system. Some automations had inadvertently created invalid child records with problematic status combinations, like "active" paired with "active" or "sent" with "active." Initially, I was asked to run 10 reports, then another 15, then another 35. Eventually, I did the math for my boss and realized that, with 15 statuses and up to 10 child records per client, I'd potentially need to generate a maximum of 576,650,390,625 reports (15^10) to do 1 system audit. That meant Salesforce would have to perform approximately 172,995,117,187,500,000 operations to look at 300k clients (lmao, not going to happen).

There were several million potential error combinations where the exact constellation of statuses on child records should not exist, and in Salesforce these child records are siloed from each other. I was told to work on a script that would query Salesforce to a depth of 4 child records (50,625 reports) indefinitely while we figured out a way to re-architect the system.
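The report counts above can be reproduced with a few lines of arithmetic. This is only a sketch: the constants come from the figures quoted in the text, and the function name `report_count` is made up for illustration.

```python
# Figures quoted in the text: 15 statuses, up to 10 child records, 300k clients.
STATUSES = 15
MAX_CHILDREN = 10
CLIENTS = 300_000


def report_count(statuses: int, depth: int) -> int:
    """Number of ordered status combinations across `depth` child records."""
    return statuses ** depth


full_audit = report_count(STATUSES, MAX_CHILDREN)
print(f"{full_audit:,}")                  # 576,650,390,625
print(f"{full_audit * CLIENTS:,}")        # 172,995,117,187,500,000
print(f"{report_count(STATUSES, 4):,}")   # 50,625
```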

The Constraint

I didn't want to do that and I was granted the time to figure out a better solution, which was very cool of them since this was business critical; sending the same case to two different law firms probably would have been the end of us.

Since I couldn’t directly query the child records, I needed to figure out a way to do the validation operation on the Client object. The issue is that the parent is mostly ignorant of the values inside the fields of its children: it just has a list of pointers to those child records instead of real data. However, you could do exactly 1 roll-up operation from the children to the parent if it was a field with a number. In the roll-up, you could then sum that list of numbers so that it would output a single value for that field. The developers of Salesforce must have put that in there for inventory management or something mundane, but I now had to commit a war crime on this formula field and shove a lot of information through this pipe.

We needed a system to compress the state of the child record into a single number, and then for that number to still mean something after it was added to the numbers from other records. I started by just having the numbers 1, 2, 3, 4 … 15 represent each status on each record and then ordering them in such a way that, after adding them up and dividing by the number of records, we could probabilistically guess whether something was wrong. This wasn’t really going to work, since different combinations of statuses can add up to the same total.
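To see why plain integer codes fail under a sum, here is a small sketch that groups every pair of status codes by its total. The codes 1 to 15 follow the text; the variable names are just for illustration.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

# Naive encoding from the text: statuses are simply numbered 1..15.
codes = range(1, 16)

# Group every unordered pair of status codes by the sum a roll-up would see.
by_sum = defaultdict(list)
for combo in combinations_with_replacement(codes, 2):
    by_sum[sum(combo)].append(combo)

# Any sum reachable by more than one pair is ambiguous on the parent record.
collisions = {total: pairs for total, pairs in by_sum.items() if len(pairs) > 1}

print(by_sum[5])  # [(1, 4), (2, 3)]
```

Two different pairs of statuses land on the same total, so the parent cannot tell them apart from the sum alone.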

A Lesson From Cryptographic Accumulators

On day 3 I was basically just clicking at random through different math articles until I came across “Cryptographic Accumulators.” I then realized that prime numbers had precisely the property I needed. In a world of only prime numbers, you can create unique combinations through multiplication. Specifically, a semiprime number (one formed by multiplying two prime numbers) is guaranteed to be unique to those two primes, and if you multiply in another prime number the same property holds true. Uniqueness is super important for this problem since we need to ensure we are getting exactly the correct set of child statuses.

This meant that each combination of child record statuses could correspond uniquely to a specific product of prime numbers. No two different sets of statuses could produce the same product. For example, if one status was assigned the prime 2, and another status was assigned the prime 3, their combination (2 × 3 = 6) would never be confused with any other prime combinations. This property is fundamental to prime factorization:

Each integer greater than 1 has exactly one prime factorization.
Thus, the product of prime numbers represents a unique identifier for every distinct combination of statuses.
So if you had a list of child records with the statuses 2, 3, 5, 7, 11, 11, 17, 23 and 31, it is impossible to find a 43 or a 53 inside this product. This ensures the combination is uniquely identifiable.
Prime numbers also allow for multisets to exist in a way that normal composite numbers do not:
- In a (normal) set, each distinct element can appear at most once. No duplicates allowed.
- In a multiset, elements can be repeated, and the count (multiplicity) is tracked.
  - Example with composites: the product of {6,6,6,6,6} could be factorized in many ambiguous ways, e.g., {3,4,6,6,6,3}, making it impossible to reconstruct the original multiset of factors.
  - Example with primes: {5,7,7} is always {5,7,7} when restricted to prime numbers.
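As a rough illustration of recovering the multiset from a product, the sketch below divides out each known prime and counts how many times it appears. The three-status mapping reuses the Active/Sent/Closed primes from the example later in this post; the function name `decode` is hypothetical.

```python
from collections import Counter

# Illustrative mapping only; a real system would list every status here.
PRIME_TO_STATUS = {2: "Active", 3: "Sent", 5: "Closed"}


def decode(product: int) -> Counter:
    """Recover the multiset of statuses encoded in a product of primes."""
    if product < 1:
        raise ValueError("product must be a positive integer")
    counts = Counter()
    for prime, status in PRIME_TO_STATUS.items():
        # Divide out this prime as many times as it appears.
        while product % prime == 0:
            product //= prime
            counts[status] += 1
    if product != 1:
        raise ValueError("product contains a factor outside the status mapping")
    return counts


# 2 * 3 * 3 * 5 = 90 encodes one Active, two Sent, and one Closed child.
print(decode(2 * 3 * 3 * 5))
```

Because the factorization is unique, the counts come back the same no matter what order the child records were multiplied in.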

By associating each status with a prime number we could derive the set of numbers that comprises the product. What this also means is we can establish error combinations that could work for an infinite number of child records.

Working backwards, if the prime product of the child records is divisible by an error code's prime product, without remainder, then we know that a set of statuses in error is present. So if you had a list of error code prime products, you could divide the parent record’s product by each of them, and if there were any hits you could add that parent record to an audit report. Just to be clear, these error codes would work for any number of child records.
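A minimal sketch of that check, using the two bad combinations mentioned earlier ("active" with "active", and "sent" with "active"). The prime assignments, the `ERROR_CODES` table, and the function name `find_errors` are all assumptions made for the example.

```python
# Assumed prime assignments for three statuses.
ACTIVE, SENT, CLOSED = 2, 3, 5

# Each error code is the product of the primes that must not appear together.
ERROR_CODES = {
    ACTIVE * ACTIVE: "two Active children",
    SENT * ACTIVE: "Sent alongside Active",
}


def find_errors(product: int) -> list[str]:
    """Return the label of every error code that divides the product evenly."""
    return [label for code, label in ERROR_CODES.items() if product % code == 0]


print(find_errors(ACTIVE * ACTIVE * CLOSED))         # ['two Active children']
print(find_errors(SENT * CLOSED))                    # []
print(find_errors(ACTIVE * SENT * CLOSED * CLOSED))  # ['Sent alongside Active']
```

Extra child records only multiply more primes into the product, so an error code that divided it before still divides it afterwards.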

Great, it is mathematically feasible to do this operation. However, we still cannot pass the child records’ numbers to the parent record, since the parent record can only add numbers together …

Evil Addition

Now we need to figure out a way to shove all of that information through a summation rollup without losing any data. Thank god Salesforce allows users to create custom functions with natural logs. We can actually compress the prime numbers into their natural logs on the child record and then add them all on the parent record, turning multiplication into addition.

ln(a) + ln(b) = ln(a × b)

This trick meant that Salesforce’s "sum-only" roll-up feature could do multiplication. Instead of multiplying primes—which Salesforce couldn't do—I stored the natural log of each prime number on each child record. The parent record then summed these logs, effectively encoding the entire status combination into a single numeric field.

Here’s a quick example to illustrate how beautifully this worked. Imagine a client had child records with statuses mapped to primes as follows:

"Active" (2) = LN(2) ≈ 0.693147
"Sent" (3) = LN(3) ≈ 1.098612
"Closed" (5) = LN(5) ≈ 1.609438

On the parent record, the summed field would simply be:

0.693147 + 1.098612 + 1.609438 ≈ 3.401197

Exponentiating this sum using a formula on the parent record returns the product of all child records:

EXP(3.401197) ≈ 30

With this single numerical field, we effectively encoded all the child record statuses.
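The same round trip can be sketched in Python. This is not the Salesforce formula itself: the function names are hypothetical, and the rounding step is my own assumption about how to absorb floating-point error, which the post does not describe.

```python
import math

# Status-to-prime mapping from the example above.
STATUS_PRIMES = {"Active": 2, "Sent": 3, "Closed": 5}


def child_value(status: str) -> float:
    """Value stored on a child record: the natural log of its status prime."""
    return math.log(STATUS_PRIMES[status])


def rollup(children: list[str]) -> float:
    """What a sum-only roll-up on the parent would hold."""
    return sum(child_value(status) for status in children)


def product_from_rollup(log_sum: float) -> int:
    """Recover the integer product; rounding absorbs floating-point error."""
    return round(math.exp(log_sum))


log_sum = rollup(["Active", "Sent", "Closed"])
print(round(log_sum, 6))             # 3.401197
print(product_from_rollup(log_sum))  # 30
```

Rounding only recovers the exact product while it stays well within floating-point precision, so very large products would need more care.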

Divisibility Test: The presence of any status combination can easily be validated by checking divisibility. For example, if we never wanted duplicate "Active" statuses (prime 2) on the same client, a quick modulo check MOD(Product, 2*2) would instantly confirm this invalid state if the result was zero.

Any clients that returned an error would be put into a report. What would have required billions upon billions of queries, hundreds of thousands of separate reports, or some insane logic puzzle became a simple numeric field on the parent record that encoded all needed validation information. Maintenance was minimal: adding or removing statuses just required updating a single prime-status mapping table.

It turns out we only had like 40 clients in an error state and none of those were catastrophic. However, this was an extremely robust system that performed for nearly 5 years without tweaking.

Moral of the story: if you are a programmer (or do any IT work), you should learn and use a bit of math. Programming a workaround to accomplish the same goal in Salesforce would have been an absolute nightmare. A surprising number of the problems you encounter are greatly simplified by figuring out the correct mathematical tools for the job rather than brute forcing a solution.

Any other use cases?

Not really! Databases have way better ways of doing validation and filtering of this type so this isn’t really applicable outside of Salesforce or some other system with arbitrary restrictions (but this is actually dumb simple to implement so at the cost of compute you can do a lot of checking quickly, please try this prime trick for quick development and nothing else!). The only other application I can think of is publicly exposing information via the prime product so a large number of people can validate the state of information specific to them.

*Frankly cryptocurrencies via Merkle trees and hashes straight up do this better. But this is a fun thought experiment.

Imagine you have a ledger that you want to expose to the world, but you don’t want everyone to know everyone else’s business. For each ‘block’ of work done you can have 1 massive prime product. Let’s say 100 people have information in this block that they want to extract: if they know their unique prime number in that block, they can derive their information purely from division.

You can represent metadata as smaller prime numbers and then validate the status of information inside the block via division.

If at the end you have no remainders then you can assume that the state of your information on the block is valid and what you were expecting.

Extra: Math and Proof

I had to give a proof before implementing this at Broughton Partners. I’ll just copy and paste parts of the original doc here. Since we went over the layman’s explanation already, grokking this shouldn’t be too terrible.

** I cannot be bothered to make my website render LaTeX, so the math is written out in plain text:

Fundamental Theorem of Arithmetic

Theorem (Uniqueness of Prime Factorization). Every integer n greater than 1 can be expressed as a product of prime numbers in exactly one way (up to the order of the factors):

n = p_1^(e_1) × p_2^(e_2) × … × p_k^(e_k)

where each p_i is prime and each e_i is a positive integer.

Proof Sketch.

Existence: Every integer greater than 1 can be divided by prime numbers repeatedly until only 1 remains.
Uniqueness: Assume there are two distinct prime factorizations of n. By comparing prime factors and using the fact that a prime dividing a product must divide at least one of its factors (Euclid's lemma), it can be shown that both factorizations must contain exactly the same primes to the same powers.

Hence, the factorization is unique.

Logarithmic Encoding

To encode the product of primes p_1 × p_2 × … × p_n in a system restricted to addition operations, we use the property:

ln(a × b) = ln(a) + ln(b)

Then, if each child record stores ln(p_i), a parent record can sum these values:

ln(p_1) + ln(p_2) + … + ln(p_n) = ln(p_1 × p_2 × … × p_n)

Exponentiating the result:

exp(ln(p_1) + ln(p_2) + … + ln(p_n)) = p_1 × p_2 × … × p_n

Thus, the entire combination of child statuses is captured by a single roll-up field holding the sum of logs.

Divisibility Test

If we want to check whether a product of primes P = p_1 × p_2 × … × p_n (representing some status combination) contains a particular “error product” E = p_a × p_b × … (itself a product of primes), we verify divisibility:

P mod E = 0

Equivalently in Salesforce:

MOD(Product, E) = 0

Thanks Euclid, couldn’t do it without you.