How to Read Tens of Millions of SQLite Rows with Limited Memory
When the SQLite file is large enough, reading it into MATLAB in one go often raises an out-of-memory (OOM) error. My first attempt at reading the data in pieces used SQL's LIMIT and OFFSET clauses, but this turned out not to reduce the peak memory usage. My guess is that when MATLAB skips rows via OFFSET, the skipped rows are still loaded into memory in some form; in other words, the further into the table a query reaches, the more rows are skipped and the higher the memory usage climbs.
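The pagination pattern I first tried looks roughly like the following sketch (written here in Python with the standard sqlite3 module rather than MATLAB; the table name is illustrative). Note that SQLite itself must also walk past every skipped row on each OFFSET query, so later pages get progressively slower even before MATLAB's memory behavior comes into play.

```python
import sqlite3

def read_with_offset(db_path, table, chunk_size=100_000):
    """Paginate with LIMIT/OFFSET -- the approach that did NOT help.

    Each successive query forces the engine to step over all rows
    before `offset`, so both time and (in MATLAB's case) memory
    grow as the offset grows.
    """
    conn = sqlite3.connect(db_path)
    try:
        offset = 0
        while True:
            rows = conn.execute(
                f"SELECT * FROM {table} LIMIT ? OFFSET ?",
                (chunk_size, offset),
            ).fetchall()
            if not rows:
                break
            yield rows
            offset += chunk_size
    finally:
        conn.close()
```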
A better approach is to query on the primary key with BETWEEN ... AND ... . Because the primary key is unique and indexed, each range query stays fast, and memory usage depends only on the number of rows fetched per chunk. This works very well.
My code is as follows:
function sqlite_data = read_large_sqlite(sqlite_path, split_num)
% Read a large SQLite table in split_num chunks keyed on the primary key.
% Assumes path_utd_id runs contiguously from 1 to the total row count.
conn = sqlite(sqlite_path, 'readonly');
len_query = 'SELECT COUNT(*) FROM path_utd';
total_len = table2array(fetch(conn, len_query));
sqlite_data = zeros(total_len, 4);  % preallocate the result matrix
segments = divide_sequence(total_len, split_num);
for i = 1:split_num
    query = sprintf('SELECT * FROM path_utd WHERE path_utd_id BETWEEN %d AND %d', ...
        segments{i}(1), segments{i}(2));
    data = fetch(conn, query);
    sqlite_data(segments{i}(1):segments{i}(2), :) = [cast(data.path_id, 'double'), ...
        cast(data.utd_instance_id, 'double'), ...
        data.received_power, data.cir_phs];
    fprintf('read finished %d/%d\n', i, split_num);
end
conn.close();
end
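The same primary-key-range technique can be sketched in Python with the standard sqlite3 module (table and column names here are illustrative, mirroring the MATLAB code; the contiguous-key assumption carries over). Only one chunk of rows is held in memory at a time, regardless of how far into the table the loop has advanced:

```python
import sqlite3

def read_in_chunks(db_path, table, key, chunk_size=100_000):
    """Yield rows in fixed-size primary-key ranges via BETWEEN.

    Assumes the integer primary key `key` runs contiguously from
    1 to COUNT(*), as read_large_sqlite above does. The indexed
    range lookup makes every chunk equally cheap, unlike OFFSET.
    """
    conn = sqlite3.connect(db_path)
    try:
        total, = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        for lo in range(1, total + 1, chunk_size):
            hi = min(lo + chunk_size - 1, total)
            yield conn.execute(
                f"SELECT * FROM {table} WHERE {key} BETWEEN ? AND ?",
                (lo, hi),
            ).fetchall()
    finally:
        conn.close()
```

If the primary key has gaps (deleted rows), the chunks become uneven but the loop still visits every row exactly once, since the ranges tile 1..COUNT(*) only under the contiguity assumption; with gaps you would iterate up to MAX(key) instead.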
The helper function divide_sequence used above is:
function segments = divide_sequence(N, split_num)
% Split the index range 1..N into split_num contiguous, near-equal segments.
N = cast(N, 'double');
indices = round(linspace(1, N + 1, split_num + 1));  % evenly spaced boundary indices
segments = cell(1, split_num);
for i = 1:split_num
    segments{i} = [indices(i), indices(i+1) - 1];  % [start, end] of segment i
end
end
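For reference, the boundary computation can be checked with a small Python equivalent of the helper (names are mine, not from the original; note that Python's round uses banker's rounding on exact halves, while MATLAB rounds half away from zero, so boundaries can differ by one in rare cases):

```python
def divide_sequence(n, num_splits):
    """Split 1..n into num_splits contiguous [start, end] ranges,
    mirroring the MATLAB linspace-and-round construction above."""
    # Evenly spaced boundaries from 1 to n + 1, rounded to integers.
    bounds = [round(1 + n * i / num_splits) for i in range(num_splits + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(num_splits)]
```

Adjacent segments are contiguous by construction (each segment ends one index before the next boundary), so the ranges tile 1..n with no overlap and no gaps.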