Cantor Expansion
Before introducing the Cantor Expansion, let me show you a problem.
Given a group of numbers, such as nums = [1, 2, 3]
, there will be 3! = 6
different permutations. And we sort them in lexicographical order, then each of these permutations will have an unique index. That is:
idx sequence
0 : [1, 2, 3]
1 : [1, 3, 2]
2 : [2, 1, 3]
3 : [2, 3, 1]
4 : [3, 1, 2]
5 : [3, 2, 1]
Now, we have 2 problems:
- Given
n
, and a sequencenums[0, ..., n - 1]
, and1 <= nums[i] <= n
, output the index ofnums
among all the permutations. - Given
n
and an indexidx
(which is in range of[0, n!)
), output the corresponding sequencenums
.
And this is what Cantor Expansion wants to solve.
Please note that each element in nums
is distinct here. We will consider the case that elements are not unique in the "Follow Up" chapter.
Cantor Expansion
You can read Baidu baike, not good article, but good examples.
- Given
n
, and a sequencenums[0, ..., n - 1]
, and1 <= nums[i] <= n
. - Let \(a_i\) denote the number of "Reverse-Order-Pairs" of
nums[i]
.- A "Reverse-Order-Pair" is a pair
<i, j>
that satisfiesi < j && nums[i] > nums[j]
.
- A "Reverse-Order-Pair" is a pair
Then there is a bijection ("双射" in Chinese) between indice and permutations.
That is to say, the index of nums
is:
It seems quite simple and naive 😄. Let us take a look why this formula work!
Permutation to Index
Description - Given n
, and a sequence nums[0, ..., n - 1]
, and 1 <= nums[i] <= n
, output the index of nums
.
Our algorithm:
- Init returned value
index = 0
. - For
nums[i]
, we find the number of "Reverse-Order-Pairs" of it, which is denoted byrevCnt
.- A "Reverse-Order-Pair" is a pair
<i, j>
that satisfiesi < j && nums[i] > nums[j]
. - This is similar to "LCOF 51. 数组中的逆序对".
- Let
index += f(n - i - 1) * revCnt
, wheref(x)
denote the valuex!
, andn - i - 1
denote the remained length.
- A "Reverse-Order-Pair" is a pair
Let me explain this algorithm by an exmaple.
input : [4, 3, 1, 2]
expected: 22
revCnt : [3, 2, 0, 0]
offset : [18,4, 0, 0]
---------
When i = 0, nums[i] = 4, revCnt = 3
- Before "4XXX", there should be permutations in patterns like "3XXX", "2XXX", "1XXX".
- The remained length is `n - i - 1 = 3` (i.e. the length of "XXX").
- The total number of permutations in those patterns is 3 * 3! = 18.
---------
When i = 1, nums[i] = 3, revCnt = 2
- Before "43XX", there should be permutations in patterns like "41XX", "42XX".
- The remained length is 2.
- The total number of permutations in those patterns is 2 * 2! = 4.
---------
When i = 2, nums[i] = 1, revCnt = 0
- Since there is no `nums[j]` less than `1`, hence there is no permutation before "431X"
---------
When i = 3, nums[i] = 2, revCnt = 0
- Same as `i = 2`
The revCnt
means that "there are revCnt
numbers are smaller than me on my right side".
Here is my code, including tests.
class PermIndex {
private:
unordered_map<int, vector<int>> bench;
unordered_map<int, vector<int>> benchmark(vector<int> vec) {
unordered_map<int, vector<int>> res;
sort(vec.begin(), vec.end());
int idx = 0;
do {
res[idx++] = vec;
} while (next_permutation(vec.begin(), vec.end()));
return res;
}
public:
PermIndex(vector<int> &nums) { bench = benchmark(nums); }
int permToIndex(vector<int> &nums) {
int n = nums.size();
// fact[k] = k!
vector<int> fact(n, 1);
for (int i = 2; i < n; ++i) fact[i] = fact[i - 1] * i;
int index = 0;
for (int i = 0; i < n; ++i) {
int revCnt = 0;
for (int j = i + 1; j < n; ++j)
revCnt += (nums[j] < nums[i]);
index += fact[n - i - 1] * revCnt;
}
assert(nums == bench[index]);
return index;
}
};
int main() {
vector<int> vec = {1, 2, 3, 4, 5, 6, 7};
PermIndex permIndex(vec);
do {
cout << permIndex.permToIndex(vec) << endl;
} while (next_permutation(vec.begin(), vec.end()));
}
Obviously, the time complexity is \(O(n^2)\).
Index to Permutation
Description - Given n
and an index idx
(which is in range of [0, n!)
), output the corresponding sequence nums
.
The problem here is same as "60. Permutation Sequence".
class Solution {
public:
string getPermutation(int n, int k) {
string res;
vector<int> fact(n + 1, 1);
vector<char> nums(n, '0');
for (int i = 1; i <= n; ++i) {
fact[i] = fact[i - 1] * i;
nums[i - 1] += i;
}
k -= 1; // our index starts with 0
for (int i = 0; i < n; ++i) {
int cnt = k / fact[n - i - 1];
res.push_back(nums[cnt]);
nums.erase(nums.begin() + cnt);
k -= cnt * fact[n - i - 1]; // or k %= fact[n - i - 1]
}
return res;
}
};
This algorithm is actually an inverse-process of above "Permutation To Index".
- Remind that we let
index += fact[n - i - 1] * revCnt
above, whereindex
is denoted byk
here. - Here, we let
cnt = k / fact[n - i - 1]
, wherecnt
means that, "there arecnt
numbers smaller than me". Hence we putnums[cnt]
into the result string, sincenums[0, ... cnt - 1]
is smaller thannums[cnt]
. - Let
k = k % fact[n - i - 1]
ork -= cnt * fact[n - i - 1]
. - Continue the above steps until
k
become zero (it is inevitable sincefact[1] = fact[0] = 1
), or untilnums
is empty.
Same as above, let me explain this algorithm by an example.
input : n = 4, k = 22
expected : [4, 3, 1, 2]
nums : [1, 2, 3, 4] (index starts with 0)
factorial: [1, 1, 2, 6, 24]
--------
When i = 1, k = 22:
- cnt = 22 / fact[3] = 3, among the `nums`, there are 3 numbers smaller than me on the right side
- hence current number should be 4, and the result should be '4XXX'
- k = 22 % 6 = 4
When i = 2, k = 4:
- cnt = 4 / fact[2] = 2, among the `nums`, there are 2 numbers smaller than me on the right side
- hence the result should be '43XX'
- k = 4 % 2 = 0
When i = 3, k = 0:
- cnt = 0, among the `nums`, there is no number smaller than me
- hence the result should be '431X'
When i = 4, k = 0:
- cnt = 0, the result is '4312'
Follow Up
Given a certain permutation of n literals (e.g. "BCDA" is a permutation of literals "ABCD"). Find out its index of this permutation in the full permutation of these n (n <= 1000)
literals when sorted in alphabetical order.
Please note that:
- The letters in
str
are not unique, there maybe exist duplicate letters, such ass = "AABBSDFS"
. - The numbers of permutations is less than \(n!\) .
Actually, the total number of permutations for the given str
is:
where \(n\) is the length of str
, \(k\) is the number of unique characters of str
, and \(p_i\) is the number of occurrences of set(str)[i]
.
Since there are duplicate characters in the input string
, the key point of this problem is to remove the duplicate permutations.
Still, let me show you an example.
input : "A A B C B C C"
revCnt : [0 0 0 1 0 0 0]
expected: 1
--------
Count occurrences of input: {{A:2}, {B:2}, {C:3}}
When i = 0, 1, 2:
- The number of Reverse-Order-Pairs of `nums[i]` is zero, i.e. revCnt = 0
When i = 3, we are at the status of "AABC???"
- revCnt = 1, that means, before "AABC???", there are permutations in a pattern "AABB???"
- The remained length is 3.
- The number of such permutation is (3!) / (3!) = 1
- index += 1
When i = 4, 5, 6:
- revCnt = 0, do nothing
Let me make the case i = 3
more clear.
- Obviously, there are 3 empty position in the remained length
"???"
in"AABB???"
, hence we can have3!
permutations. - But there exists duplicate permutations, since the remained letters we can use is
{C : 3}
. See the formula at the begin of this section.
Here is the code 😛.
class PermIndex {
private:
unordered_map<int, string> bench;
unordered_map<int, string> benchmark(string str) {
unordered_map<int, string> res;
sort(str.begin(), str.end());
int idx = 0;
do {
res[idx++] = str;
} while (next_permutation(str.begin(), str.end()));
return res;
}
public:
PermIndex(string &str) { bench = benchmark(str); }
/**
* compute(cnt, len) aim to solve such a problem, using the chars in the
* `cnt` (maybe have duplicate characters, but total numbers of chars is
* `len`) to combine a permutation of length `len`, how many permutations
* can we get?
*/
int compute(unordered_map<char, int> &cnt, vector<int> &fact, int len) {
int res = fact[len];
for (auto [ch, val] : cnt)
res /= fact[val];
return res;
}
int permToIndex(const string &str) {
int n = str.length();
vector<int> fact(n, 1);
for (int i = 2; i < n; ++i)
fact[i] = fact[i - 1] * i;
unordered_map<char, int> cnt;
for (char x : str)
cnt[x]++;
int index = 0;
for (int i = 0; i < n; ++i) {
unordered_set<char> smaller;
for (int j = i + 1; j < n; ++j)
if (str[j] < str[i])
smaller.emplace(str[j]);
for (char x : smaller) {
cnt[x] -= 1;
index += compute(cnt, fact, n - i - 1);
cnt[x] += 1;
}
cnt[str[i]] -= 1;
}
assert(str == bench[index]);
return index;
}
};
int main() {
// string str = "AABBSDABFS";
string str = "BBSAASDABFS";
PermIndex permIndex(str);
int idx = 0;
do {
printf("%d : %s \n", idx++, str.c_str());
permIndex.permToIndex(str);
} while (next_permutation(str.begin(), str.end()));
}
And it's quite challenging to solve problem "Index To Permutation" when the input has duplicate characters.
But it's still the inverse-process of the algorithm above, we just need to pay attention to the details.
To be done.