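Huawei Sweden Hackathon 2022: My Experience as a Participant

I was lucky to take part in Huawei Sweden Hackathon 2022 at the World Trade Center next to T-Centralen in Stockholm. The venue was colorful and beautifully decorated, with a strong winter atmosphere.

Even though we were only one of the weaker teams to make it to the final, we still gained valuable experience.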


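The event itself was quite pleasant: a buffet dinner, breakfast, and pizza for lunch were provided, along with self-serve tea, fruit, and so on. However, because it ran for a long time (one night plus one morning), and you make no progress at all unless you can optimise and solve the problem effectively, the process was still quite grueling.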

Downlink channel allocation problem

Qualification Contest

Background Description

Physical Background

  • Many users connected to a base station need to send and receive data in real-time.
  • As we have limited bandwidth, efficient allocation of data channel resources to the users has a significant impact on the quality of experience of the users.
  • Here, we focus on the simplified version of the downlink data channel allocation problem as a key building block of mobile networks.

Problem statement

Modeling

  • 5G massive MIMO resources can be divided into M*N grids where in each grid only a single user can be placed.
  • We can have multiple instances of each user based on the size of its data. A user with a larger data size may need more instances (more grids) to send its data. For example, a placement may contain 5 instances of user U1, 3 instances of U2, and 2 instances of U3.
  • We cannot place more than one instance of the same user in the same column, for example, two U1s cannot be placed in the same column. But we can have multiple instances of the same user in the same row.

\(M\): Number of rows

\(N\): Number of Columns

\(U\): Set of users, {U1, U2, …}

\(|U|\): Number of users

Analysis

Each user is denoted by a tuple including: {initial speed, data size, factor}

  • Initial speed: the speed of sending the user’s data when there is no conflict with other users
  • Data size: the total amount of data that the user wants to send.
  • Factor: the factor by which the data speed of the user reduces as a result of collocating with other users in the same column.
  • For example, we can have three users as follows:
    U1 = {20, 5000, 0.3}: user U1 wants to send 5000 bytes of data, and if there is no conflict with other users, the data is sent at speed 20.
    U2 = {15, 3500, 0.25},
    U3 = {26, 4200, 0.6}.
    The number of instances of each user depends on both its data size and its speed: when a user's data cannot be sent entirely in a single grid, it needs multiple grids.

Users placed in the same column negatively affect each other in terms of data speed, while users on the same row have no effect on each other.

Each user's speed equals its initial speed when it is not collocated with other users; when it is collocated with other users in the same column, its speed is reduced to

\[Speed_i=initial\_{Speed}_i \times \left( 1- collocated\_{factor}_i \right) \]

Where:

\[ collocated\_{factor}_i = factor_i \times \sum_{\forall j\neq i} factor_j \]

where the sum runs over all users \(j\) assigned to the same column as user \(i\).

  • Note 1: The speed cannot be negative. Thus, if \(Speed_i\) would be negative (i.e., the collocated factor exceeds 1), the speed is taken to be zero.
    For a single column and three users, there are four collocation scenarios (each of the three pairs, and all three users together), and each user's speed in each scenario is calculated with the formula above.
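The speed rule can be sketched in Python as follows. This is a minimal illustration, not contest code: the function name `column_speeds` and the dictionary layout of `users` are hypothetical. It computes each user's speed in one column, clamped at zero as in Note 1, for the three example users in the four collocation scenarios.

```python
def column_speeds(users, column):
    """Return {user: speed} for the users sharing one column.

    users:  hypothetical layout, name -> (initial_speed, data_size, factor)
    column: names of the users placed in this column (one instance each)
    """
    speeds = {}
    for i in column:
        v0, _, k_i = users[i]
        # collocated_factor_i = factor_i * (sum of the other users' factors)
        collocated = k_i * sum(users[j][2] for j in column if j != i)
        # Note 1: a negative speed is clamped to zero
        speeds[i] = max(0.0, v0 * (1.0 - collocated))
    return speeds


users = {"U1": (20, 5000, 0.3), "U2": (15, 3500, 0.25), "U3": (26, 4200, 0.6)}
for scenario in (["U1", "U2"], ["U1", "U3"], ["U2", "U3"], ["U1", "U2", "U3"]):
    print(scenario, column_speeds(users, scenario))
```

For instance, when all three users share a column, U1's collocated factor is 0.3 × (0.25 + 0.6) = 0.255, so its speed drops from 20 to 20 × 0.745 = 14.9.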

Mapping speed to data transmission rate

Each speed corresponds to an amount of data that a user can send in one grid, given by a speed-to-data map table. A concrete instance of such a table is listed under Mapping Function below.

  • For example, when the speed of a user is 8, 2158 bytes of data can be sent.

  • Notes:

    1. To map a speed to data, use the given F function, which rounds the speed down while accounting for floating-point error. For example, if the speed of U1 collocated with U2 is 10.65, its data is 2696.
    2. The F function is not exactly the usual rounding function: to avoid the effect of floating-point error, it rounds a number that is extremely close to the next integer up to that integer. For example, if the speed is 12.999999, F returns 13 rather than 12.
  • For the three-user example above, the data sent by each user in each scenario is obtained by applying F to the scenario speeds and looking the result up in the map, with
    U1={20, 5000, 0.3}, U2={15, 3500, 0.25}, U3={26, 4200, 0.6}.
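A Python sketch of F and the speed-to-data lookup, applied to the four scenarios. Two assumptions, both labelled in the code: the map is taken to be the one listed under Mapping Function below (it agrees with the two examples above, 8 → 2158 and 10.65 → 2696), and the tolerance `EPS = 1e-5` is a hypothetical value, since the official F function's tolerance is not stated. The helper names are also hypothetical.

```python
import math

# Assumed map: the one listed under "Mapping Function" (index = integer speed).
SPEED_TO_DATA = [0, 290, 575, 813, 1082, 1351, 1620, 1889, 2158, 2427, 2696,
                 2965, 3234, 3503, 3772, 4041, 4310, 4579, 4848, 5117, 5386,
                 5655, 5924, 6093, 6360, 6611, 6800]
EPS = 1e-5  # hypothetical tolerance; the official F function's value is not stated


def F(speed):
    """Round down, but snap values within EPS below an integer up to that integer."""
    return math.floor(speed + EPS)


def speed_to_data(speed):
    """Bytes one grid carries at this speed, with F(speed) clamped to the table range."""
    return SPEED_TO_DATA[min(max(F(speed), 0), len(SPEED_TO_DATA) - 1)]


def scenario_data(users, column):
    """Data sent by each user sharing one column (speed rule with the zero clamp)."""
    out = {}
    for i in column:
        v0, _, k = users[i]
        speed = max(0.0, v0 * (1.0 - k * sum(users[j][2] for j in column if j != i)))
        out[i] = speed_to_data(speed)
    return out


users = {"U1": (20, 5000, 0.3), "U2": (15, 3500, 0.25), "U3": (26, 4200, 0.6)}
for column in (["U1", "U2"], ["U1", "U3"], ["U2", "U3"], ["U1", "U2", "U3"]):
    print(column, scenario_data(users, column))
```

With these numbers, when all three users share a column they send 3772, 2965 and 4579 bytes, so a single grid covers U3 (4200 bytes) but not U1 (5000) or U2 (3500).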

Objective and Constraints

Objective: Maximise the sum of the average speeds of all users, where \(Avg\_Speed_i\) is the mean speed of user \(i\) over the grids it occupies, normalised as

\[ Objective\_function= \frac{\sum_{\forall Users} Avg\_Speed_i}{BestSpeedUsers} \]

where \(BestSpeedUsers\) is the sum of the maximum speeds of all users, i.e., \(\sum_{\forall Users} Init\_{Speed}_i\).
Hence the objective function is a float between 0 and 1.
Constraint: More than one instance of the same user cannot be placed in the same column.
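A minimal Python sketch of the qualification objective. The function names are hypothetical, and so is one modelling choice: a user with no instance is assumed to contribute an average speed of 0.

```python
def average_speed(instance_speeds):
    """Avg_Speed_i: mean speed over the grids user i occupies.

    Assumption: a user with no instances contributes 0.
    """
    if not instance_speeds:
        return 0.0
    return sum(instance_speeds) / len(instance_speeds)


def objective(avg_speeds, init_speeds):
    """Sum of average speeds normalised by BestSpeedUsers (sum of initial speeds)."""
    return sum(avg_speeds) / sum(init_speeds)


# U1 occupies two grids: one alone (speed 20) and one shared with U2 (18.5);
# U2 only has the shared grid (13.875); U3 sits alone (26).
avg = [average_speed([20.0, 18.5]), average_speed([13.875]), average_speed([26.0])]
print(objective(avg, [20, 15, 26]))
```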

Penalty term: As we would like to send all the data, we apply a penalty term that measures the portion of the total data of all users that cannot be sent (\(Data\_Loss\)). The penalty term is a float in the range 0 to 1, where 0 indicates no data loss. A solution with zero data loss is called a ‘feasible solution’.

\[ Penalty\_term =\frac{\sum_{\forall Users} Data\_Loss_i}{Total\_{Data}\_{of}\_{All}\_{Users}} \]

\(Total\_{Data}\_{of}\_{All}\_{Users}\) = sum of the data sizes of all users
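The penalty term as a short Python sketch (hypothetical function name). Following the definition of data loss as the part of a user's data that cannot be sent, each user's loss is clamped at zero, so a user that sends more than it needs cannot offset another user's loss.

```python
def penalty_term(data_sizes, data_sent):
    """Share of all users' data that cannot be sent; 0 means a feasible solution."""
    loss = sum(max(0, d - s) for d, s in zip(data_sizes, data_sent))  # per-user loss >= 0
    return loss / sum(data_sizes)


# Only U1 falls short (5000 - 4848 = 152 bytes); U2 and U3 send enough.
print(penalty_term([5000, 3500, 4200], [4848, 3503, 6800]))
```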

Score Function

The Score_function combines the objective and the penalty term. It is formulated as:

\[ Score = Objective\_{function} - \alpha \times Penalty\_{term} \]

where \(\alpha\) is the data-loss penalty coefficient, set by the system designer according to how important users' data loss is. The value of \(\alpha\) is given in each test case.

The best scores are achieved when the penalty term is zero, in which case the score equals the objective function. The worst score is achieved when all the data is lost, in which case the score equals –α.

Range of score is [–α , 1]
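The score in Python, as a small sketch (hypothetical function name). The example shows why feasibility dominates: with \(\alpha = 10000\) (test cases 1 and 3 below), losing just 0.01% of the data already costs 1.0, the whole range of the objective.

```python
def score(objective, penalty, alpha):
    """Score = Objective_function - alpha * Penalty_term, which lies in [-alpha, 1]."""
    s = objective - alpha * penalty
    assert -alpha <= s <= 1.0, "objective and penalty must each lie in [0, 1]"
    return s


# A feasible solution versus the same solution losing 0.01% of the data.
print(score(0.95, 0.0, 10000), score(0.95, 1e-4, 10000))
```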

Limitations

  1. Write code in C++ or Python that finds a placement for every test case.
  2. The execution time of the code, for each test case, should be less than 1 second on your own machine.
  3. Only basic libraries can be used; optimisation libraries/tools are not allowed.

Note:
At the end of the qualification phase, to validate the code, we check its execution time on a VM with

  • an eight-core 3.0 GHz Intel processor and 16 GB RAM.

If the execution time of your code on the VM is far more than 1 second, it is counted as an invalid solution. As most of today's laptops are not more powerful than this VM, if your code runs in a few milliseconds on your laptop, you can be confident it will not take longer than allowed on the VM.

Input and Output Data

  • Input:
    1. A speed-to-data map as a CSV file, used for all test cases
    2. A set of input files, each corresponding to one test case, where each test case includes:
      1. Grid size (M, N)
      2. Number of users (|U|)
      3. Value of \(\alpha\)
      4. Users’ information (initial speed, data size, factor)
  • Submission file: a zip file including:
    1. A CSV file for each test case that includes:
      1. Grid placement
      2. Penalty_term
      3. Objective_function
      4. Score
      5. Execution time of the code
    2. A single source file with a .py, .cpp, or .cc extension

Output CSV Template

  • The first part, called the ASSIGNMENT PART, contains M rows, numbered 0 to M-1. It shows the assignment of users to grids.

  • The next part, called the COMPLEMENTARY PART, consists of the following four rows, which show the data loss, the user speeds, the goal function value, and the execution time of the algorithm, respectively.

  • The template example assumes only 6 users, U1, U2, U3, U4, U5 and U6, but in general there can be any number of users.

  • Please use a comma ',' as the separator of the elements, not other symbols, and do not put a comma at the end of a row. Please use a dash '-' to show empty grids in the assignment part; other empty parts do not need a dash.
  • As noted in the example, the number of columns of the CSV file can be larger than N: it equals Max(N, |U|+1).
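A Python sketch of the ASSIGNMENT PART only. It assumes each row holds just the N cells (no row-index column) and that cells are labelled like "U1" as in the template description; both are assumptions, and the function names are hypothetical. The four COMPLEMENTARY PART rows are not sketched because their exact layout is not reproduced in this post.

```python
def assignment_rows(grid):
    """ASSIGNMENT PART: one CSV line per grid row 0..M-1.

    grid: M lists of N cells, each a user label such as "U1" or None (empty).
    Empty grids become '-', cells are comma-separated, no trailing comma.
    """
    return [",".join(cell if cell is not None else "-" for cell in row) for row in grid]


def csv_width(N, num_users):
    """Number of CSV columns: Max(N, |U|+1)."""
    return max(N, num_users + 1)


grid = [["U1", None, "U3"],
        [None, "U2", "U1"]]
print("\n".join(assignment_rows(grid)))
```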

Evaluation in Leaderboard

  1. Teams are ranked primarily by the number of feasible submitted test cases.
    • A team with more feasible test cases gets a higher position in the leaderboard.
  2. With an equal number of feasible test cases:
    • The team with more valid submitted test cases gets a higher position.
  3. With equal numbers of submitted and feasible test cases:
    • The team with the higher total score (sum of the scores of all submitted solutions) gets a higher position in the leaderboard.
  4. With equal numbers of submitted and feasible test cases and equal total scores, the shorter sum of execution times is prioritised.

Notes:

  • A valid submission is one that meets all requirements on file name, file type, size, layout of the CSV file, and correct calculation of the score, objective, and penalty term, with a program execution time of less than 1 second. Otherwise, it is invalid.
  • An infeasible solution is a valid solution with a non-zero penalty term.
  • A feasible solution is a valid solution with a zero penalty term.
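The ranking rules amount to a lexicographic sort key. A minimal Python sketch, where the `TeamResult` record and the function names are hypothetical; it also labels a single test case following the notes.

```python
from dataclasses import dataclass


@dataclass
class TeamResult:
    name: str
    feasible: int        # valid test cases with zero penalty term
    valid: int           # test cases passing every format, calculation and time check
    total_score: float   # sum of the scores of all submitted solutions
    total_time: float    # sum of execution times in seconds


def classify(is_valid, penalty):
    """Label one submitted test case as invalid, feasible or infeasible."""
    if not is_valid:
        return "invalid"
    return "feasible" if penalty == 0 else "infeasible"


def leaderboard(teams):
    """More feasible, then more valid, then higher total score, then less total time."""
    return sorted(teams, key=lambda t: (-t.feasible, -t.valid, -t.total_score, t.total_time))


teams = [TeamResult("A", 8, 10, 5.0, 2.0), TeamResult("B", 9, 9, 3.0, 1.0),
         TeamResult("C", 8, 10, 5.0, 1.5)]
print([t.name for t in leaderboard(teams)])
```

Team B wins on feasible count despite its lower score; A and C tie on everything except time.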

Final Contest

Changes to problem

A weight is added to each user to represent the user's priority.

Each user is denoted by a tuple including: {initial speed, data size, factor, weight}

Objective and Constraints

\[ Objective\_function= \frac{\sum_{\forall Users} Avg\_Speed_i \cdot Weight_i}{BestSpeedUsers} \]

where

\[BestSpeedUsers = \sum_{\forall Users} Init\_{Speed}_i\cdot Weight_i \]
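The weighted objective in Python, as a sketch (hypothetical function name). The example halves one user's speed at a time to show how the weights change the cost of the same slowdown.

```python
def weighted_objective(avg_speeds, init_speeds, weights):
    """Final-round objective: weighted average speeds over weighted initial speeds."""
    num = sum(a * w for a, w in zip(avg_speeds, weights))
    best = sum(v * w for v, w in zip(init_speeds, weights))  # BestSpeedUsers
    return num / best


# Halving the weight-3 user's speed costs far more than halving the weight-1 user's.
print(weighted_objective([10, 15], [20, 15], [3, 1]))   # (30 + 15) / (60 + 15)
print(weighted_objective([20, 7.5], [20, 15], [3, 1]))  # (60 + 7.5) / (60 + 15)
```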

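Approach

Mathematical Formulation

Unlike ACM-style problems, this problem is NP-hard, so no known fixed algorithm finds the optimal solution in polynomial time. The usual approach is to keep searching for feasible solutions and to optimise the objective along the way, ending up with a reasonably good solution (ideally a feasible solution, i.e., one with zero data loss).

Let's try to write the optimisation objective in mathematical form:

Find \(\mathbf{e}_{U\times N}\), where \(e_{ij} \in \{0,1\}\).
That is, the problem is expressed as finding a 0-1 matrix in which \(e_{ij}\) indicates whether user \(i\) has an instance in column \(j\). Because each entry is either 0 or 1, the rule that at most one instance of a user is placed in each column is enforced automatically, and the row inside a column does not matter, since users in the same row do not affect each other.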

Goal Function

Expanding \(v_{ij}\) (defined below) and using \(e_{ij}^2 = e_{ij}\) for the binary entries gives:

\[\begin{aligned} f(\mathbf{e}_{U\times N}) &=\sum_{i=1}^{U} \left( \frac{1}{\sum_{j=1}^{N} e_{ij}}\sum_{j=1}^{N} e_{ij}\cdot v_{ij} \right) \cdot w_i\\ &= \sum_{i=1}^{U}w_i \frac{\sum_{j=1}^{N} e_{ij}\cdot {v_0}_i \left[ 1-\left(\sum_{l=1}^{U} e_{lj}\cdot k_l -e_{ij}k_i \right) \cdot k_i \right] }{\sum_{j=1}^{N} e_{ij}} \\ &= \sum_{i=1}^{U} w_i {v_{0}}_i \frac{(1+k_{i}^2)\sum_{j=1}^{N} e_{ij} - k_i \sum_{j=1}^{N} \sum_{l=1}^{U} e_{ij}e_{lj}k_l}{\sum_{j=1}^{N} e_{ij}}\\ &= \sum_{i=1}^{U} w_i {v_{0}}_i \left[ (1+k_{i}^2)- \frac{k_i\sum_{j=1}^{N} \sum_{l=1}^{U} e_{ij}e_{lj}k_l}{\sum_{j=1}^{N} e_{ij}} \right]\\ &= \sum_{i=1}^{U} w_i {v_{0}}_i\cdot (1+k_{i}^2)- g(\mathbf{e}_{U\times N}) \end{aligned} \]

where

\[ \begin{aligned} g(\mathbf{e}_{U\times N}) &= \sum_{i=1}^{U} w_i {v_{0}}_i k_i \frac{\sum_{j=1}^{N} \sum_{l=1}^{U} e_{ij}e_{lj}k_l}{\sum_{j=1}^{N} e_{ij}}\\ &= \sum_{i=1}^{U} \beta_i \frac{\sum_{j=1}^{N} \sum_{l=1}^{U} e_{ij}e_{lj}k_l}{\sum_{j=1}^{N} e_{ij}}\\ \end{aligned} \]

\[ \begin{aligned} v_{ij} &= {v_{0}}_i \left[ 1-\left(\sum_{l=1, l \neq i}^{U} e_{lj}\cdot k_l \right) \cdot k_i \right] \\ &={v_{0}}_i \left[ 1-\left(\sum_{l=1}^{U} e_{lj}\cdot k_l -e_{ij}k_i \right) \cdot k_i \right] \end{aligned} \]

subject to

Data size Limit (main):

\[ \sum_{j=1}^{N} e_{ij}\cdot D(v_{ij}) \geq d_i,\ \forall i \in \left\{ 1, \ldots , U \right\} \]

where \(d'_{ij} = D(v_{ij})\) is the amount of data user \(i\) sends in column \(j\), given by the speed-to-data mapping \(D: \mathbb{R} \to \mathbb{D}\).

User number Limit (strict):

\[ \sum_{i=1}^{U} e_{ij} \leq M,\ \forall j \in \left\{ 1, \ldots , N \right\} \]

For feasible solutions, find \(\arg \max_{\mathbf{e}} f(\mathbf{e}) = \arg \min_{\mathbf{e}} g(\mathbf{e})\).
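As a sanity check on the decomposition \(f = \sum_i w_i {v_0}_i (1+k_i^2) - g\) behind this argmax/argmin equivalence, here is a small numerical test (a sketch; function names are hypothetical). The \(k_i^2\) term carries a plus sign because \(-(\ldots - e_{ij}k_i)\,k_i\) contributes \(+e_{ij}k_i^2\). Like the derivation, both sides ignore the zero clamp on speeds and the normalisation by \(BestSpeedUsers\); users without any instance are skipped on both sides, since the derivation divides by \(\sum_j e_{ij}\).

```python
import random


def f_direct(e, v0, k, w):
    """f(e) from its definition: weighted mean speed per user, no zero clamp."""
    U, N = len(e), len(e[0])
    total = 0.0
    for i in range(U):
        n_i = sum(e[i])
        if n_i == 0:
            continue  # the derivation divides by sum_j e_ij, so skip empty users
        s = 0.0
        for j in range(N):
            if e[i][j]:
                col = sum(e[l][j] * k[l] for l in range(U))  # sum_l e_lj k_l
                s += v0[i] * (1 - (col - e[i][j] * k[i]) * k[i])  # v_ij
        total += w[i] * s / n_i
    return total


def f_decomposed(e, v0, k, w):
    """Constant part sum_i w_i v0_i (1 + k_i^2) minus g(e)."""
    U, N = len(e), len(e[0])
    const, g = 0.0, 0.0
    for i in range(U):
        n_i = sum(e[i])
        if n_i == 0:
            continue
        const += w[i] * v0[i] * (1 + k[i] ** 2)
        beta = w[i] * v0[i] * k[i]  # beta_i
        g += beta * sum(e[i][j] * e[l][j] * k[l] for j in range(N) for l in range(U)) / n_i
    return const - g


rng = random.Random(0)
U, N = 5, 4
e = [[rng.randint(0, 1) for _ in range(N)] for _ in range(U)]
v0 = [rng.uniform(5, 26) for _ in range(U)]
k = [rng.uniform(0, 0.6) for _ in range(U)]
w = [rng.uniform(0.5, 2) for _ in range(U)]
print(f_direct(e, v0, k, w), f_decomposed(e, v0, k, w))
```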

If the data-size limit cannot be fulfilled, the solution is counted as infeasible.

Penalty term (related to data loss; each user's loss is clamped at zero, since sending more than \(d_i\) does not offset other users' losses):

\[ P(\mathbf{e}_{U\times N}) = -\alpha\cdot \frac{\sum_{i=1}^{U} \max\left(0,\ d_i - \sum_{j=1}^{N} e_{ij}\cdot D(v_{ij}) \right)}{\sum_{i=1}^{U} d_i} \]

Goal Function with Penalty term:

\[ f_p(\mathbf{e}_{U\times N}) = f(\mathbf{e}_{U\times N}) + P(\mathbf{e}_{U\times N}) \]
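A Python sketch that checks both constraints for a 0-1 matrix and computes \(P(\mathbf{e})\) with its negative sign. The function names are hypothetical, \(D\) is passed in as a function, and the `toy_D` mapping (100 bytes per whole unit of speed) is a made-up stand-in to keep the example readable; in practice one would pass the table lookup from the Mapping Function below. As in the formula, the per-user loss is clamped at zero.

```python
def check_solution(e, v0, k, d, M, alpha, D):
    """Return (user-number limit ok, data-size limit ok, data sent per user, P(e))."""
    U, N = len(e), len(e[0])
    col_k = [sum(e[l][j] * k[l] for l in range(U)) for j in range(N)]  # sum_l e_lj k_l
    # For an occupied cell e_ij = 1, so (sum_l e_lj k_l - e_ij k_i) = col_k[j] - k[i].
    sent = [sum(D(v0[i] * (1 - (col_k[j] - k[i]) * k[i])) for j in range(N) if e[i][j])
            for i in range(U)]
    users_ok = all(sum(e[i][j] for i in range(U)) <= M for j in range(N))  # user number limit
    data_ok = all(sent[i] >= d[i] for i in range(U))                        # data size limit
    P = -alpha * sum(max(0, d[i] - sent[i]) for i in range(U)) / sum(d)     # penalty term
    return users_ok, data_ok, sent, P


def toy_D(v):
    """Hypothetical stand-in for D: 100 bytes per whole unit of (non-negative) speed."""
    return 100 * max(0, int(v))


e = [[1, 1],
     [0, 1]]
print(check_solution(e, v0=[10, 8], k=[0.5, 0.5], d=[1700, 700], M=2, alpha=1000, D=toy_D))
```

Here user 0 is alone in column 0 (speed 10) and shares column 1 with user 1 (speeds 7.5 and 6), so user 1 sends 600 of its 700 bytes and the placement is infeasible.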

Test Case Range

id \(M\) \(N\) \(U\) \(\alpha\)
1 16 25 80 10000
2 16 32 40 1000
3 16 100 80 10000
4 16 128 300 1000
5 16 250 400 1000
6 20 800 200 1000
7 20 1600 1000 1000
8 20 1600 50 1000
9 20 1600 2000 1000
10 16 275 440 100

Mapping Function
\(D(v_{ij}) = \tilde{d}_{\left\lfloor \tilde{v}_{ij} \right\rfloor +1}\) where \(\tilde{v}_{ij} = \min(\max(v_{ij},0), |\tilde{\mathbf{d}}|-1)\) .

\(\tilde{\mathbf{d}} = [\) 0, 290, 575, 813, 1082, 1351, 1620, 1889, 2158, 2427, 2696, 2965, 3234, 3503, 3772, 4041, 4310, 4579, 4848, 5117, 5386, 5655, 5924, 6093, 6360, 6611, 6800 \(].\)

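Optimisation Algorithms

The winning teams offered many interesting insights. First, to reach feasible solutions of the main objective, most relied on randomised algorithms (shuffle-relax) and heuristic orderings (e.g., sorting by factor).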
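The post does not give the winners' code. As a hedged illustration of the heuristic-ordering idea only, the sketch below builds an initial placement greedily: users are taken in ascending order of factor (an assumed ordering), and each one is added to the columns with the lowest factor sum until its estimated data covers its size. The estimate ignores the slowdown this causes to users placed earlier, so the result may still be infeasible and need repair, for example by the search methods listed below. Rows are not modelled, since users in the same row do not interact; the mapping table is assumed to be the one under Mapping Function, and all names are hypothetical.

```python
import math

SPEED_TO_DATA = [0, 290, 575, 813, 1082, 1351, 1620, 1889, 2158, 2427, 2696,
                 2965, 3234, 3503, 3772, 4041, 4310, 4579, 4848, 5117, 5386,
                 5655, 5924, 6093, 6360, 6611, 6800]


def D(speed):
    """Speed-to-data lookup with an assumed floating-point tolerance of 1e-5."""
    return SPEED_TO_DATA[min(max(math.floor(speed + 1e-5), 0), len(SPEED_TO_DATA) - 1)]


def greedy_place(users, M, N):
    """users: list of (initial_speed, data_size, factor); returns columns[j] = user ids."""
    columns = [[] for _ in range(N)]
    col_factor = [0.0] * N  # running sum of factors placed in each column
    for i in sorted(range(len(users)), key=lambda u: users[u][2]):  # low factor first (assumed)
        v0, size, k = users[i]
        sent = 0
        # Visit each column at most once, so a user never appears twice in a column.
        for j in sorted(range(N), key=lambda c: col_factor[c]):
            if sent >= size:
                break
            if len(columns[j]) >= M:
                continue  # no free row in this column
            sent += D(max(0.0, v0 * (1.0 - k * col_factor[j])))  # speed given current occupants
            columns[j].append(i)
            col_factor[j] += k
    return columns


users = [(20, 5000, 0.3), (15, 3500, 0.25), (26, 4200, 0.6)]
print(greedy_place(users, M=2, N=4))
```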

For optimising the objective function itself, approaches included:

  • Simulated Annealing (SA)
  • Heuristic search
  • ...
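Below is a compact simulated-annealing sketch, not the winners' code. It assumes the qualification scoring (no weights), the mapping table under Mapping Function with an assumed 1e-5 tolerance, and a simple toggle move: pick a random user and column, then add the user there (if it is absent and the column has a free row) or remove it (if present). Temperatures, step count, and names are hypothetical. The score is re-evaluated from scratch at every step, which is far too slow for the larger test cases; a practical solver would update it incrementally.

```python
import math
import random

SPEED_TO_DATA = [0, 290, 575, 813, 1082, 1351, 1620, 1889, 2158, 2427, 2696,
                 2965, 3234, 3503, 3772, 4041, 4310, 4579, 4848, 5117, 5386,
                 5655, 5924, 6093, 6360, 6611, 6800]


def D(speed):
    """Speed-to-data lookup with an assumed floating-point tolerance of 1e-5."""
    return SPEED_TO_DATA[min(max(math.floor(speed + 1e-5), 0), len(SPEED_TO_DATA) - 1)]


def evaluate(columns, users, alpha):
    """Score = objective - alpha * penalty for columns[j] = set of user ids."""
    speeds = [[] for _ in users]
    sent = [0] * len(users)
    for col in columns:
        ksum = sum(users[u][2] for u in col)
        for u in col:
            v0, _, k = users[u]
            v = max(0.0, v0 * (1.0 - k * (ksum - k)))  # collocated factor = k * others' factors
            speeds[u].append(v)
            sent[u] += D(v)
    objective = sum(sum(s) / len(s) for s in speeds if s) / sum(u[0] for u in users)
    penalty = sum(max(0, u[1] - s) for u, s in zip(users, sent)) / sum(u[1] for u in users)
    return objective - alpha * penalty


def anneal(users, M, N, alpha, steps=20000, t0=0.05, t1=1e-4, seed=0):
    rng = random.Random(seed)
    cols = [set() for _ in range(N)]
    cur = evaluate(cols, users, alpha)
    best, best_cols = cur, [set(c) for c in cols]
    for step in range(steps):
        t = t0 * (t1 / t0) ** (step / steps)  # geometric cooling from t0 to t1
        u, j = rng.randrange(len(users)), rng.randrange(N)
        if u not in cols[j] and len(cols[j]) >= M:
            continue  # column full: no valid toggle
        cols[j] ^= {u}  # add u if absent, remove it if present
        new = evaluate(cols, users, alpha)
        if new >= cur or rng.random() < math.exp((new - cur) / t):
            cur = new
            if cur > best:
                best, best_cols = cur, [set(c) for c in cols]
        else:
            cols[j] ^= {u}  # rejected: undo the toggle
    return best, best_cols


users = [(20, 5000, 0.3), (15, 3500, 0.25), (26, 4200, 0.6)]
best, placement = anneal(users, M=2, N=4, alpha=1000, steps=3000)
print(best, placement)
```

Worsening moves are accepted with probability \(e^{\Delta/T}\), so early on the search can escape local optima, while with a large \(\alpha\) any move that loses data is effectively always rejected.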

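Data Structures

Data structures affect both code clarity and execution efficiency; the design should support forward and reverse index lookups, as well as efficient insertion, deletion, query, and update.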
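One possible layout, as a hedged Python sketch: the `Placement` class and its methods are hypothetical. The grid is the forward index (cell → user), `cells` is the reverse index (user → cells), and per-column user sets and factor sums make the "one instance per column" check and the collocated factor O(1). Running sums accumulate floating-point error, so a long search may want to recompute them occasionally.

```python
class Placement:
    """M x N grid with a forward index (cell -> user) and reverse indices."""

    def __init__(self, M, N, factors):
        self.factors = factors                        # user -> factor
        self.grid = [[None] * N for _ in range(M)]    # forward index: cell -> user
        self.cells = {u: set() for u in factors}      # reverse index: user -> {(row, col)}
        self.col_users = [set() for _ in range(N)]    # users present in each column
        self.col_factor = [0.0] * N                   # running factor sum per column

    def can_place(self, user, r, c):
        return self.grid[r][c] is None and user not in self.col_users[c]

    def place(self, user, r, c):
        if not self.can_place(user, r, c):
            raise ValueError("cell occupied or user already in this column")
        self.grid[r][c] = user
        self.cells[user].add((r, c))
        self.col_users[c].add(user)
        self.col_factor[c] += self.factors[user]

    def remove(self, r, c):
        """Empty cell (r, c) and return the user that was there (or None)."""
        user = self.grid[r][c]
        if user is not None:
            self.grid[r][c] = None
            self.cells[user].discard((r, c))
            self.col_users[c].discard(user)
            self.col_factor[c] -= self.factors[user]
        return user

    def collocated_factor(self, user, c):
        """factor_i times the other factors' sum in column c (user assumed present)."""
        return self.factors[user] * (self.col_factor[c] - self.factors[user])


p = Placement(M=3, N=2, factors={"U1": 0.3, "U2": 0.25})
p.place("U1", 0, 0)
p.place("U2", 1, 0)
print(p.collocated_factor("U1", 0), p.cells["U1"])
```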

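Summary

All in all, a day of hard work taught me a lot. I hope to form a stronger team and try again next time!

Finally, here are the lights of Stockholm City Hall, which happened to be on Nobel night:

Thanks for reading.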

posted @ 2022-12-14 23:01  Chiron-zy