A Complete Approach to Downloading Very Large Files in the Front End

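This article implements large-file download in the browser from the front-end side. Interruptions caused by network failures, closing the page, and similar events are out of scope. Chunks are downloaded serially. Parallel download is more involved: each chunk needs a hash that is computed and compared, lost chunks must be retransmitted, and the chunks have to be merged in order. If transfer speed matters and bandwidth allows, parallel chunked download remains an option.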

In testing, after storing roughly 1 to 2 GB of data in IndexedDB, the browser's memory usage did grow high enough that it exited (tested with Chrome 103).

Implementation steps

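Chunked download: split the large file into smaller pieces and download them one by one. This lowers memory usage and the risk of an interrupted transfer, and avoids the performance problems of downloading the whole file in a single request.
Resumable download: if the download is interrupted midway, continue from the part that has already been downloaded instead of downloading the whole file again.
Progress display: show download progress on the page so the user can see how far the download has come. If the file is downloaded in one request, progress can be computed from the values reported by the progress event (fine-grained). With chunked download it is still downloaded size over total size, but the downloaded size grows one chunk at a time (coarser).
Cancel and pause: provide cancel and pause buttons so the user can stop or pause the download as needed.
Merge: once all chunks are downloaded, merge them into one complete file.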
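
The chunking and progress steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used later in this article: `computeChunkRanges` and `progressAfterEachChunk` are hypothetical helper names. It shows how inclusive byte ranges follow from the file size and chunk size, and why chunked progress moves in steps.

```javascript
// Hypothetical helper: split a file of `fileSize` bytes into inclusive
// byte ranges that fit an HTTP Range header ("bytes=start-end").
function computeChunkRanges(fileSize, chunkSize) {
  const ranges = [];
  for (let start = 0; start < fileSize; start += chunkSize) {
    // The last byte of the file has index fileSize - 1.
    const end = Math.min(start + chunkSize, fileSize) - 1;
    ranges.push({ start, end });
  }
  return ranges;
}

// Hypothetical helper: progress percentage after each completed chunk.
function progressAfterEachChunk(fileSize, chunkSize) {
  let downloaded = 0;
  return computeChunkRanges(fileSize, chunkSize).map(({ start, end }) => {
    downloaded += end - start + 1;
    return Math.round((downloaded / fileSize) * 100);
  });
}

// A 10-byte file in 4-byte chunks: ranges 0-3, 4-7, 8-9.
console.log(computeChunkRanges(10, 4));
console.log(progressAfterEachChunk(10, 4)); // [40, 80, 100]
```

The last chunk is usually shorter than the others, which is why the end offset is clamped to the file size.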

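Below is a basic example implementation of large-file download in the front end:

You can also inject a callback into the class to update external state. The example only shows the callback that fires when the download completes.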

class FileDownloader {
  constructor({url, fileName, chunkSize = 2 * 1024 * 1024, cb}) {
    this.url = url;
    this.fileName = fileName;
    this.chunkSize = chunkSize;
    this.fileSize = 0;
    this.totalChunks = 0;
    this.currentChunk = 0;
    this.downloadedSize = 0;
    this.chunks = [];
    this.abortController = new AbortController();
    this.paused = false;
    this.cb = cb
  }

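  async getFileSize() {
    // Use HEAD so that only the headers are fetched, not the whole file
    // (assumes the server answers HEAD requests).
    const response = await fetch(this.url, { method: "HEAD", signal: this.abortController.signal });
    const contentLength = response.headers.get("content-length");
    this.fileSize = parseInt(contentLength, 10);
    if (Number.isNaN(this.fileSize)) throw new Error("Content-Length is missing");
    this.totalChunks = Math.ceil(this.fileSize / this.chunkSize);
  }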

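  async downloadChunk(chunkIndex) {
    const start = chunkIndex * this.chunkSize;
    // Range ends are inclusive, so the last valid byte is fileSize - 1.
    const end = Math.min(this.fileSize, (chunkIndex + 1) * this.chunkSize) - 1;

    let blob;
    try {
      const response = await fetch(this.url, {
        headers: { Range: `bytes=${start}-${end}` },
        signal: this.abortController.signal
      });
      // A server that ignores Range replies 200 with the whole file.
      if (response.status !== 206) {
        throw new Error(`Expected 206 Partial Content, got ${response.status}`);
      }
      blob = await response.blob();
    } catch (err) {
      if (err.name === "AbortError") return; // cancelled by cancelDownload()
      throw err;
    }

    this.chunks[chunkIndex] = blob;
    this.downloadedSize += blob.size;
    // Advance before checking `paused`, so resumeDownload() continues with
    // the next chunk instead of fetching this one again.
    this.currentChunk = chunkIndex + 1;

    if (this.currentChunk >= this.totalChunks) {
      this.mergeChunks();
    } else if (!this.paused) {
      this.downloadChunk(this.currentChunk);
    }
  }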

  async startDownload() {
    if (this.chunks.length === 0) {
      await this.getFileSize();
    }
    this.downloadChunk(this.currentChunk);
  }

  pauseDownload() {
    this.paused = true;
  }

  resumeDownload() {
    this.paused = false;
    this.downloadChunk(this.currentChunk);
  }

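  cancelDownload() {
    this.abortController.abort();
    // An aborted controller cannot be reused; create a fresh one for later downloads.
    this.abortController = new AbortController();
    this.reset();
  }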

  async mergeChunks() {
    const blob = new Blob(this.chunks, { type: "application/octet-stream" });
    const url = window.URL.createObjectURL(blob);
    const a = document.createElement("a");
    a.href = url;
    a.download = this.fileName;
    document.body.appendChild(a);
    a.click();
    setTimeout(() => {
      this.cb && this.cb({
        downState: 1
      })
      this.reset();
      document.body.removeChild(a);
      window.URL.revokeObjectURL(url);
    }, 0);
  }
  
  reset() {
    this.chunks = [];
    this.fileName = '';
    this.fileSize = 0;
    this.totalChunks = 0;
    this.currentChunk = 0;
    this.downloadedSize = 0;
  }
}


// Usage example
const url = "https://example.com/largefile.zip";
const fileName = "largefile.zip";

// State kept for the surrounding UI
let downState = 0;

// Update state; called by the downloader when the download completes
function updateData(res) {
  downState = res.downState;
}

const downloader = new FileDownloader({ url, fileName, cb: updateData });

// Start the download
downloader.startDownload();

// Pause the download
// downloader.pauseDownload();

// Resume the download
// downloader.resumeDownload();

// Cancel the download
// downloader.cancelDownload();
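
Chunked download only works when the server honors Range requests. The following is a minimal sketch of a check, assuming the server answers a one-byte range request with 206 Partial Content and a Content-Range header. `probeRangeSupport` is a hypothetical helper, and the `fetchImpl` parameter exists only so the function can be exercised without a network.

```javascript
// Hypothetical helper: request only the first byte and inspect the reply.
// A server that honors Range answers 206 with "Content-Range: bytes 0-0/<total>".
async function probeRangeSupport(url, fetchImpl = fetch) {
  const response = await fetchImpl(url, { headers: { Range: "bytes=0-0" } });
  if (response.status !== 206) {
    // The server ignored the Range header; stop reading the full body.
    if (response.body) await response.body.cancel();
    return { supported: false, fileSize: null };
  }
  const contentRange = response.headers.get("content-range") || "";
  // The total size is the number after the slash, e.g. "bytes 0-0/1048576".
  const match = /\/(\d+)$/.exec(contentRange);
  return { supported: true, fileSize: match ? Number(match[1]) : null };
}
```

For cross-origin URLs, the Content-Range header is readable from JavaScript only if the server lists it in Access-Control-Expose-Headers, so `fileSize` may be `null` even when ranges are supported.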

How does chunked download support resuming? Where are the downloaded chunks stored?

Browser security policy prevents a web page (JS) from directly accessing and manipulating files on the user's computer.

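During chunked download, every downloaded piece (chunk) has to be cached or stored on the client. This is what makes resuming possible, and it also makes it easy to merge the chunks into the complete file later. The chunks can be kept in memory temporarily or stored in client-side storage (such as IndexedDB or LocalStorage).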

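In general, to avoid using too much memory, it is recommended to keep the chunks in client-side storage rather than in memory, so that downloading a large file does not cause performance problems through excessive memory use.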

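In the example code above, the chunks are held temporarily in an array and merged into the complete file in mergeChunks(). If you want to persist the chunks instead, modify the code to save them to IndexedDB. LocalStorage only stores strings and has a small quota, so it is a poor fit for binary chunks.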
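
To make the resume idea concrete: once chunks are persisted under their index, resuming amounts to finding the first index that is not yet stored. Below is a minimal sketch in which a `Map` stands in for IndexedDB. `firstMissingChunk` and the key layout are assumptions made for illustration.

```javascript
// Hypothetical helper: given the chunk indices already stored on the client,
// return the index of the first chunk that still has to be downloaded.
// Returns totalChunks when every chunk is present.
function firstMissingChunk(storedIndices, totalChunks) {
  const stored = new Set(storedIndices);
  for (let i = 0; i < totalChunks; i++) {
    if (!stored.has(i)) return i;
  }
  return totalChunks;
}

// A Map stands in for persistent storage here; in a browser the same
// keys would live in IndexedDB so that they survive a page reload.
const store = new Map();
store.set(0, "chunk-0");
store.set(1, "chunk-1");
store.set(3, "chunk-3"); // chunk 2 was never saved

console.log(firstMissingChunk([...store.keys()], 5)); // 2
```

A downloader could set its current chunk to this value before starting, and go straight to merging when the value equals the total number of chunks.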

IndexedDB local storage

IndexedDB documentation: IndexedDB_API

IndexedDB browser storage limits and eviction criteria

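Incognito mode is a privacy feature provided by browsers. When the user closes the browser window, it automatically clears all browsing data, including data in LocalStorage, IndexedDB, and other storage mechanisms.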

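IndexedDB data is actually stored in the browser's file system, in one of the browser's private directories. The location may differ between browsers, and ordinary users are not expected to access or delete these files by hand. You can use the deleteDatabase method to delete an entire database, or the deleteObjectStore method to delete a specific object store together with its data. deleteObjectStore can only be called during a version upgrade, inside onupgradeneeded.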
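
As an illustration of the deleteDatabase route, here is a minimal promise wrapper. It is a sketch rather than a complete solution: `deleteDatabase` as a standalone function and its `factory` parameter are hypothetical names, with `factory` defaulting to the browser's global `indexedDB`.

```javascript
// Hypothetical helper: promise wrapper around IDBFactory.deleteDatabase().
function deleteDatabase(name, factory = indexedDB) {
  return new Promise((resolve, reject) => {
    const request = factory.deleteDatabase(name);
    request.onsuccess = () => resolve();
    request.onerror = () => reject(request.error);
    // Fired while another tab or connection still holds the database open;
    // the deletion proceeds once those connections are closed.
    request.onblocked = () => {
      console.warn(`Deletion of "${name}" is blocked by an open connection`);
    };
  });
}
```

In a page this could be called as `deleteDatabase("myDatabase").then(() => console.log("deleted"))`, after closing any connection the page itself holds.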

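The native IndexedDB API is cumbersome and error-prone, so it is worth wrapping it for later reuse.

The class below wraps the common IndexedDB operations: opening the database, adding data, getting data by ID, getting all data, updating data, deleting data, and deleting the object store.

Wrapping IndexedDB in a class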

class IndexedDBWrapper {
  constructor(dbName, storeName) {
    this.dbName = dbName;
    this.storeName = storeName;
    this.db = null;
  }

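  openDatabase() {
    return new Promise((resolve, reject) => {
      // No version is passed: a new database is created at version 1,
      // an existing one is opened at its current version.
      const request = indexedDB.open(this.dbName);

      request.onerror = () => {
        console.error("Failed to open database");
        reject(request.error);
      };

      request.onsuccess = () => {
        this.db = request.result;
        resolve();
      };

      // Runs only when the database is created or its version increases,
      // so the object store is created on first open.
      request.onupgradeneeded = () => {
        this.db = request.result;

        if (!this.db.objectStoreNames.contains(this.storeName)) {
          this.db.createObjectStore(this.storeName, { keyPath: "id" });
        }
      };
    });
  }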

  addData(data) {
    return new Promise((resolve, reject) => {
      const transaction = this.db.transaction([this.storeName], "readwrite");
      const objectStore = transaction.objectStore(this.storeName);
      const request = objectStore.add(data);

      request.onsuccess = () => {
        resolve();
      };

      request.onerror = () => {
        console.error("Failed to add data");
        reject();
      };
    });
  }

  getDataById(id) {
    return new Promise((resolve, reject) => {
      const transaction = this.db.transaction([this.storeName], "readonly");
      const objectStore = transaction.objectStore(this.storeName);
      const request = objectStore.get(id);

      request.onsuccess = () => {
        resolve(request.result);
      };

      request.onerror = () => {
        console.error(`Failed to get data with id: ${id}`);
        reject();
      };
    });
  }

  getAllData() {
    return new Promise((resolve, reject) => {
      const transaction = this.db.transaction([this.storeName], "readonly");
      const objectStore = transaction.objectStore(this.storeName);
      const request = objectStore.getAll();

      request.onsuccess = () => {
        resolve(request.result);
      };

      request.onerror = () => {
        console.error("Failed to get all data");
        reject();
      };
    });
  }

  updateData(data) {
    return new Promise((resolve, reject) => {
      const transaction = this.db.transaction([this.storeName], "readwrite");
      const objectStore = transaction.objectStore(this.storeName);
      const request = objectStore.put(data);

      request.onsuccess = () => {
        resolve();
      };

      request.onerror = () => {
        console.error("Failed to update data");
        reject();
      };
    });
  }

  deleteDataById(id) {
    return new Promise((resolve, reject) => {
      const transaction = this.db.transaction([this.storeName], "readwrite");
      const objectStore = transaction.objectStore(this.storeName);
      const request = objectStore.delete(id);

      request.onsuccess = () => {
        resolve();
      };

      request.onerror = () => {
        console.error(`Failed to delete data with id: ${id}`);
        reject();
      };
    });
  }

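  deleteStore() {
    return new Promise((resolve, reject) => {
      // Object stores can only be deleted inside a version upgrade,
      // so reopen the database with a higher version number.
      const version = this.db.version + 1;
      this.db.close();

      const request = indexedDB.open(this.dbName, version);

      request.onupgradeneeded = () => {
        this.db = request.result;
        if (this.db.objectStoreNames.contains(this.storeName)) {
          this.db.deleteObjectStore(this.storeName);
        }
      };

      // Resolve only here: onsuccess fires after the upgrade transaction completes.
      request.onsuccess = () => {
        this.db = request.result;
        resolve();
      };

      request.onerror = () => {
        console.error("Failed to delete object store");
        reject(request.error);
      };
    });
  }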
}

Example of using the IndexedDB wrapper class:

const dbName = "myDatabase";
const storeName = "myStore";

const dbWrapper = new IndexedDBWrapper(dbName, storeName);

(async () => {
  await dbWrapper.openDatabase();

  const data = { id: 1, name: "John Doe", age: 30 };
  await dbWrapper.addData(data);
  console.log("Data added successfully");

  const result = await dbWrapper.getDataById(1);
  console.log("Data retrieved:", result);

  const updatedData = { id: 1, name: "Jane Smith", age: 35 };
  await dbWrapper.updateData(updatedData);
  console.log("Data updated successfully");

  const updatedResult = await dbWrapper.getDataById(1);
  console.log("Updated data retrieved:", updatedResult);

  await dbWrapper.deleteDataById(1);
  console.log("Data deleted successfully");

  const allData = await dbWrapper.getAllData();
  console.log("All data:", allData);

  await dbWrapper.deleteStore();
  console.log("Object store deleted successfully");
})();
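
Applied to the download scenario, each chunk could be stored as a record whose `id` is the chunk index. The sketch below shows one possible shape and is not the author's implementation: `toOrderedParts` and the `{ id, blob }` record layout are assumptions. It sorts records such as those returned by `getAllData()` and checks for gaps before the parts are handed to `new Blob(parts)`. Strings stand in for Blob objects in the demo call.

```javascript
// Hypothetical helper: turn stored chunk records into an ordered array of
// parts, failing loudly if any chunk index is missing.
function toOrderedParts(records, totalChunks) {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...records].sort((a, b) => a.id - b.id);
  if (sorted.length !== totalChunks) {
    throw new Error(`Expected ${totalChunks} chunks, found ${sorted.length}`);
  }
  sorted.forEach((record, index) => {
    if (record.id !== index) throw new Error(`Chunk ${index} is missing`);
  });
  return sorted.map((record) => record.blob);
}

const parts = toOrderedParts([{ id: 1, blob: "b" }, { id: 0, blob: "a" }], 2);
console.log(parts); // ["a", "b"]
```

The explicit sort is defensive. The gap check matters more, because merging an incomplete set would silently produce a corrupt file.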

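A library for IndexedDB: localforage

This library wraps several browser storage mechanisms and falls back between them automatically. Its IndexedDB support feels limited, since you cannot add indexes, but everyday operations become much more convenient.

Documentation: localforage

The following shows localForage using the IndexedDB driver together with async/await for asynchronous operations.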

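const localforage = require('localforage');

// Configure localForage
localforage.config({
  driver: localforage.INDEXEDDB, // use the IndexedDB driver
  name: 'myApp', // database name
  version: 1.0, // database version
  storeName: 'myData' // object store name
});

// Use async/await for the asynchronous operations
(async () => {
  try {
    // Store data
    await localforage.setItem('key', 'value');
    console.log('Data saved');

    // Read data
    const value = await localforage.getItem('key');
    console.log('Data read:', value);

    // Remove data
    await localforage.removeItem('key');
    console.log('Data removed');

    // No explicit close step: localForage's documented API has no close()
    // method, and the library manages its IndexedDB connection internally.
  } catch (err) {
    console.error('Operation failed', err);
  }
})();

Modern browsers manage the lifecycle of IndexedDB connections automatically, including closing them when the page is closed, so in most cases you do not need to open or close a connection explicitly.

localForage's documented API does not include a close() method. If you have special requirements or need explicit control over the connection, call the native IDBDatabase.close() on a connection you opened yourself.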

Using localForage to delete all stored data

import localforage from 'localforage';

// clear() removes every key from the current localForage store
localforage.clear()
  .then(() => {
    console.log('All data in the store has been deleted');
  })
  .catch((error) => {
    console.error('Error while deleting data:', error);
  });
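
For the download use case, chunks could be stored under one key per chunk and read back in order. This is a sketch under stated assumptions: `chunkKey`, `saveChunk`, and `loadChunks` are hypothetical helpers, and `store` is any object with localForage-style `setItem`, `getItem`, and `keys` methods (for example the `localforage` module itself).

```javascript
// Hypothetical key layout: "<fileId>:<chunkIndex>".
// Assumes fileId itself contains no ":" character.
const chunkKey = (fileId, index) => `${fileId}:${index}`;

async function saveChunk(store, fileId, index, blob) {
  await store.setItem(chunkKey(fileId, index), blob);
}

async function loadChunks(store, fileId) {
  const prefix = `${fileId}:`;
  const keys = (await store.keys()).filter((key) => key.startsWith(prefix));
  // Sort numerically by chunk index; a plain string sort would put "10" before "2".
  keys.sort((a, b) => Number(a.slice(prefix.length)) - Number(b.slice(prefix.length)));
  return Promise.all(keys.map((key) => store.getItem(key)));
}
```

With the real library, `await saveChunk(localforage, "largefile.zip", 0, blob)` would persist the first chunk, and `new Blob(await loadChunks(localforage, "largefile.zip"))` would rebuild the file.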

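High memory usage with IndexedDB

Using IndexedDB can increase the browser's memory usage for a number of reasons. Some possible ones:

Large data volume: if you store a large amount of data in IndexedDB, the browser may need more memory to manage and process it. Memory usage rises noticeably when reading or writing large amounts of data.
Unclosed connections: failing to close the database connection after using IndexedDB may cause a memory leak. Close the connection once IndexedDB is no longer needed so the memory it holds can be released.
Indexes and queries: creating many indexes or running complex queries increases memory usage, especially on large data sets.
Caching: the browser may cache IndexedDB data to speed up access. This can increase memory usage, especially after large-scale data operations.
Browser implementation: IndexedDB implementations differ between browsers, and some may use more memory than others when handling IndexedDB data.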
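
For the unclosed-connection point above, one way to make closing hard to forget is to scope the connection to a single unit of work. This is a minimal sketch: `withConnection` is a hypothetical helper, and `openDb` is assumed to resolve to an object with a `close()` method, as `IDBDatabase` has.

```javascript
// Hypothetical helper: open a connection, run `work` with it, and always
// close the connection afterwards.
async function withConnection(openDb, work) {
  const db = await openDb();
  try {
    return await work(db);
  } finally {
    db.close(); // runs whether work() resolved or threw
  }
}
```

Whether this actually lowers memory usage in a given browser is not something the sketch can show. It only guarantees that the connection is released after each batch of reads or writes.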