[Design Pattern] Upload big file - 3. Code Design - part 1

Code design

The SDK is built on a shared upload protocol and is split into three packages:

Upload Protocol: Defines the communication format between the frontend and backend.
upload-core: Protocol-based API that provides core functionality, such as creating and reading protocol fields, plus utility functions shared by the frontend and backend.
upload-client: SDK for the client (frontend).
upload-server: SDK for the BFF (backend for frontend).

upload-core: reusable functions

EventEmitter

Unify frontend and backend event handling by using a publish-subscribe pattern to provide a standardized EventEmitter class.

Frontend Events:

Upload Progress Changed Event: Triggered when the upload progress updates.

Upload Paused/Resumed Event: Triggered when the upload is paused or resumed.

Backend Events:

Chunk Write Completed Event: Triggered when a chunk is successfully written to storage.

Chunk Merge Completed Event: Triggered when all chunks are successfully merged into the final file.

export class EventEmitter<T extends string> {
  private events: Map<T, Set<Function>>;
  constructor() {
    this.events = new Map();
  }

  on(event: T, listener: Function) {
    if (!this.events.has(event)) {
      this.events.set(event, new Set());
    }
    this.events.get(event)!.add(listener);
  }

  off(event: T, listener: Function) {
    if (!this.events.has(event)) return;
    this.events.get(event)!.delete(listener);
  }

  emit(event: T, ...args: any[]) {
    if (!this.events.has(event)) return;
    this.events.get(event)!.forEach((listener) => listener(...args));
  }
}
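
As a rough usage sketch, the snippet below subscribes to a progress event, emits it twice, and then unsubscribes. The event names ("progress", "paused", "resumed") are hypothetical, since the article does not fix the exact strings, and the class is repeated in compact form only so the snippet runs on its own.

```typescript
// Compact copy of the EventEmitter shown above, repeated so the snippet is self-contained.
class EventEmitter<T extends string> {
  private events = new Map<T, Set<Function>>();

  on(event: T, listener: Function) {
    if (!this.events.has(event)) this.events.set(event, new Set());
    this.events.get(event)!.add(listener);
  }

  off(event: T, listener: Function) {
    this.events.get(event)?.delete(listener);
  }

  emit(event: T, ...args: any[]) {
    this.events.get(event)?.forEach((listener) => listener(...args));
  }
}

// Hypothetical frontend event names (an assumption, not defined by the article).
type UploadEvents = "progress" | "paused" | "resumed";

const uploader = new EventEmitter<UploadEvents>();
const seen: number[] = [];
const onProgress = (percent: number) => seen.push(percent);

uploader.on("progress", onProgress);
uploader.emit("progress", 25);
uploader.emit("progress", 50);

// After off(), the listener no longer receives events.
uploader.off("progress", onProgress);
uploader.emit("progress", 75);

console.log(seen); // [ 25, 50 ]
```

Restricting T to a union of string literals means a misspelled event name fails at compile time instead of silently never firing.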

TaskQueue

To support concurrent execution of multiple tasks for both frontend and backend, a TaskQueue class can be implemented.

1. Potential concurrent execution on the frontend: concurrent requests
2. Potential concurrent execution on the backend: concurrent chunk hash verification

import { EventEmitter } from "./EventEmitter";

export class Task {
  fn: Function;
  payload?: any;
  constructor(fn: Function, payload?: any) {
    this.fn = fn;
    this.payload = payload;
  }

  run() {
    return this.fn(this.payload);
  }
}

export class TaskQueue extends EventEmitter<"start" | "pause" | "drain"> {
  private tasks: Set<Task> = new Set();
  private currentCount = 0;
  private status: "running" | "paused" = "paused";
  // max concurrency allowed
  private concurrency: number = 4;

  constructor(concurrency: number = 4) {
    super();
    this.concurrency = concurrency;
  }

  add(...tasks: Task[]) {
    tasks.forEach((task) => this.tasks.add(task));
  }

  addAndStart(...tasks: Task[]) {
    this.add(...tasks);
    this.start();
  }

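  start() {
    if (this.status === "running") return;

    // nothing queued and nothing in flight: the queue is already drained
    if (this.tasks.size === 0 && this.currentCount === 0) {
      this.emit("drain");
      return;
    }

    this.status = "running";
    this.emit("start");
    this.runNext();
  }

  private takeHeadTask() {
    const task = this.tasks.values().next().value;
    if (task) {
      this.tasks.delete(task);
    }
    return task;
  }

  private runNext() {
    if (this.status !== "running") return;

    // fill every free slot, not just one
    while (this.currentCount < this.concurrency) {
      const task = this.takeHeadTask();
      if (!task) {
        // nothing left to start; the queue is drained once in-flight tasks finish
        if (this.currentCount === 0) {
          this.status = "paused";
          this.emit("drain");
        }
        return;
      }
      this.currentCount++;

      // run inside a promise chain so a synchronous throw still releases the slot
      Promise.resolve()
        .then(() => task.run())
        .catch(() => {
          // a failed task must not stall the queue; handle errors inside the task
        })
        .finally(() => {
          this.currentCount--;
          this.runNext();
        });
    }
  }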

  pause() {
    this.status = "paused";
    this.emit("pause");
  }
}
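
The core idea behind the queue, keeping at most N tasks in flight and starting the next one whenever a slot frees up, can be shown in isolation. The sketch below is a stripped-down function rather than the TaskQueue class itself; runWithLimit, AsyncTask and makeTask are hypothetical names used only for illustration.

```typescript
// Hypothetical helper, not part of the SDK: run tasks with at most `limit` in flight.
type AsyncTask = () => Promise<void>;

function runWithLimit(tasks: AsyncTask[], limit: number, onDrain: () => void): void {
  const pending = [...tasks];
  let running = 0;

  const fill = () => {
    // Start tasks until every slot is used or nothing is left to start.
    while (running < limit && pending.length > 0) {
      const task = pending.shift()!;
      running++;
      task()
        .catch(() => {
          // A failed task frees its slot like any other; error handling is left to the task.
        })
        .finally(() => {
          running--;
          fill();
        });
    }
    if (running === 0 && pending.length === 0) onDrain();
  };

  fill();
}

// Demo: five fake requests, two allowed at a time.
const started: number[] = [];
let active = 0;
let peak = 0;
const makeTask = (id: number): AsyncTask => async () => {
  started.push(id);
  active++;
  peak = Math.max(peak, active);
  await new Promise((resolve) => setTimeout(resolve, 10));
  active--;
};

runWithLimit([1, 2, 3, 4, 5].map(makeTask), 2, () => {
  console.log("drained, peak concurrency:", peak);
});

// Only the first two tasks have started at this point; the rest wait for a free slot.
console.log(started); // [ 1, 2 ]
```

On the frontend each task would typically wrap one chunk request, and on the backend one chunk hash verification.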

Complex issues in frontend code

The frontend involves two core issues:

1. How to split files into chunks
2. How to control requests

How to split files into chunks

First, implement the handling of chunk objects:

// chunk.ts
import SparkMD5 from "spark-md5";

export interface Chunk {
  blob: Blob;
  start: number;
  end: number;
  hash: string;
  index: number;
}

// Create a chunk with empty hash
export function createChunk(
  file: File,
  index: number,
  chunkSize: number
): Chunk {
  const start = index * chunkSize;
  const end = Math.min((index + 1) * chunkSize, file.size);
  const blob = file.slice(start, end);
  return {
    blob,
    start,
    end,
    hash: "",
    index,
  };
}

export function calcChunkHash(chunk: Chunk): Promise<string> {
  return new Promise((resolve, reject) => {
    const spark = new SparkMD5.ArrayBuffer();
    const fileReader = new FileReader();
    fileReader.onload = (e) => {
      spark.append(e.target?.result as ArrayBuffer);
      resolve(spark.end());
    };
    fileReader.onerror = reject;
    fileReader.readAsArrayBuffer(chunk.blob);
  });
}
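
To make the boundary arithmetic in createChunk concrete, here is a small sketch that applies the same start/end formula to plain numbers instead of a File. chunkRanges and ChunkRange are hypothetical names introduced for this illustration.

```typescript
interface ChunkRange {
  index: number;
  start: number;
  end: number;
}

// Same arithmetic as createChunk, applied to sizes only (no File or Blob needed).
function chunkRanges(fileSize: number, chunkSize: number): ChunkRange[] {
  const count = Math.ceil(fileSize / chunkSize);
  return Array.from({ length: count }, (_, index) => ({
    index,
    start: index * chunkSize,
    // The last chunk is clamped to the file size, so it may be shorter than chunkSize.
    end: Math.min((index + 1) * chunkSize, fileSize),
  }));
}

const MB = 1024 * 1024;
const ranges = chunkRanges(12 * MB, 5 * MB);

console.log(ranges.length); // 3
console.log(ranges[2]); // { index: 2, start: 10485760, end: 12582912 }
```

A 12 MB file with 5 MB chunks therefore yields two full chunks and a final 2 MB chunk.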

Next, the entire file needs to be chunked. There are various chunking methods, such as:

Standard chunking
Multithreaded chunking: navigator.hardwareConcurrency
Main thread time-sliced chunking: requestIdleCallback
Other chunking patterns
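
As an illustration of the time-sliced mode, the sketch below processes a list in small slices and yields back to the scheduler between slices. All names (SliceDeadline, SliceScheduler, idleScheduler, processInSlices) are hypothetical; only requestIdleCallback is a real browser API, and the 16 ms budget in the fallback is an assumption.

```typescript
interface SliceDeadline {
  timeRemaining(): number;
}
type SliceScheduler = (run: (deadline: SliceDeadline) => void) => void;

// Uses requestIdleCallback where it exists, otherwise setTimeout with an assumed 16 ms budget.
const idleScheduler: SliceScheduler = (run) => {
  const g = globalThis as any;
  if (typeof g.requestIdleCallback === "function") {
    g.requestIdleCallback(run);
    return;
  }
  setTimeout(() => {
    const sliceStart = Date.now();
    run({ timeRemaining: () => Math.max(0, 16 - (Date.now() - sliceStart)) });
  }, 0);
};

function processInSlices<T>(
  items: T[],
  handle: (item: T) => void,
  onDone: () => void,
  schedule: SliceScheduler = idleScheduler
): void {
  let next = 0;
  const runSlice = (deadline: SliceDeadline) => {
    // Always handle at least one item per slice so progress is guaranteed.
    do {
      handle(items[next++]);
    } while (next < items.length && deadline.timeRemaining() > 0);

    if (next < items.length) schedule(runSlice);
    else onDone();
  };

  if (items.length === 0) onDone();
  else schedule(runSlice);
}

// Usage: in a real splitter, `handle` would hash one chunk.
processInSlices(
  ["chunk-0", "chunk-1", "chunk-2"],
  (name) => console.log("handled", name),
  () => console.log("all chunks handled")
);
```

Passing the scheduler as a parameter keeps the slicing logic independent of the browser, which also makes it easy to drive manually in tests.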

To ensure versatility, the implementation must provide different chunking modes to the upper layer while also allowing for custom chunking patterns. For this reason, the design employs a template pattern based on an abstract class to handle the process.

Template pattern

The template pattern is a behavioral design pattern that defines the skeleton of an algorithm in a base class (abstract or concrete) and allows derived classes to override specific steps without changing the overall algorithm structure.

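The key is to define the common steps in the base class and leave the varying step abstract, for example:
abstract class Chess {
  move(x: number, y: number) {
    // Boundary checking
    // Determine the valid moves in the game
    // Check the piece-specific rule
    if (this.rule(x, y)) {
      // finish the movement
    }
  }

  abstract rule(x: number, y: number): boolean;
}

class Horse extends Chess {
  rule(x: number, y: number): boolean {....}
}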

// chunkSplitor.ts
import SparkMD5 from "spark-md5";
import { EventEmitter } from "../upload-core/EventEmitter";
import { Chunk, createChunk } from "./chunk";

export type ChunkSplitorEvents = "chunks" | "wholeHash" | "drain";

export abstract class ChunkSplitor extends EventEmitter<ChunkSplitorEvents> {
  protected file: File;
  protected chunkSize: number;
  protected chunks: Chunk[] = [];
  protected hash?: string; // hash of the whole file

  private handleChunkCount = 0; // the chunks that have been handled
  private spark = new SparkMD5();
  private hasSplitted = false; // chunked or not

  constructor(file: File, chunkSize: number = 1024 * 1024 * 5) {
    super();
    this.file = file;
    this.chunkSize = chunkSize;
    const chunkCount = Math.ceil(file.size / chunkSize);
    this.chunks = new Array(chunkCount)
      .fill(0)
      .map((_, index) => createChunk(this.file, index, this.chunkSize));
  }

  split() {
    if (this.hasSplitted) return;
    this.hasSplitted = true;
    const emitter = new EventEmitter<"chunks">();
    const chunksHandler = (chunks: Chunk[]) => {
      this.emit("chunks", chunks);
      chunks.forEach((chunk) => {
        // chunks coming back from a worker are copies, so write the hash back by index
        this.chunks[chunk.index].hash = chunk.hash;
      });
      this.handleChunkCount += chunks.length;
      if (this.handleChunkCount === this.chunks.length) {
        emitter.off("chunks", chunksHandler);
        // append in index order so the whole-file hash does not depend on arrival order
        this.chunks.forEach((chunk) => this.spark.append(chunk.hash));
        this.hash = this.spark.end();
        this.emit("wholeHash", this.hash);
        this.spark.destroy();
        this.emit("drain");
      }
    };
    emitter.on("chunks", chunksHandler);
    this.calcHash(this.chunks, emitter);
  }

  abstract calcHash(chunks: Chunk[], emitter: EventEmitter<"chunks">): void;
}
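
Because calcHash may deliver chunks in any order (for example when several workers finish at different times), the whole-file hash is only reproducible if the per-chunk hashes are combined in index order. The dependency-free sketch below illustrates that point; it uses a toy FNV-1a hash as a stand-in for SparkMD5, and fnv1a, HashedChunk and wholeFileHash are hypothetical names.

```typescript
// Toy 32-bit FNV-1a hash, a stand-in for SparkMD5 so the snippet has no dependencies.
function fnv1a(text: string, seed: number = 0x811c9dc5): number {
  let h = seed;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

interface HashedChunk {
  index: number;
  hash: string;
}

// Fold the per-chunk hashes in index order, whatever order they arrived in.
function wholeFileHash(arrived: HashedChunk[]): string {
  const ordered = [...arrived].sort((a, b) => a.index - b.index);
  let h = 0x811c9dc5;
  for (const chunk of ordered) {
    h = fnv1a(chunk.hash, h);
  }
  return h.toString(16);
}

// The same three chunks, delivered in two different orders.
const arrivalA: HashedChunk[] = [
  { index: 0, hash: "aa" },
  { index: 1, hash: "bb" },
  { index: 2, hash: "cc" },
];
const arrivalB: HashedChunk[] = [arrivalA[2], arrivalA[0], arrivalA[1]];

console.log(wholeFileHash(arrivalA) === wholeFileHash(arrivalB)); // true
```

A stable whole-file hash matters when the server uses it to recognize a file that has already been uploaded.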

Based on this abstract class, various chunking modes can be implemented. Each mode only needs to inherit from ChunkSplitor and implement the calculation of the chunk hash.

For example, a multithreaded chunking class can be implemented very simply.

// multiThreadSplitor.ts
import { EventEmitter } from "../upload-core/EventEmitter";
import { Chunk } from "./chunk";
import { ChunkSplitor } from "./chunkSplitor";

export class MultiThreadSplitor extends ChunkSplitor {
  private workers: Worker[] = new Array(navigator.hardwareConcurrency || 4)
    .fill(0)
    .map(
      () =>
        new Worker(new URL("./splitWorker.ts", import.meta.url), {
          type: "module",
        })
    );

  calcHash(chunks: Chunk[], emitter: EventEmitter<"chunks">): void {
    const workerSize = Math.ceil(chunks.length / this.workers.length);
    for (let i = 0; i < this.workers.length; i++) {
      const worker = this.workers[i];
      const start = i * workerSize;
      const end = Math.min((i + 1) * workerSize, chunks.length);
      const workerChunks = chunks.slice(start, end);
      // register the handler first, then hand the worker its slice
      worker.onmessage = (e) => {
        emitter.emit("chunks", e.data);
      };
      worker.postMessage(workerChunks);
    }
  }

  dispose() {
    this.workers.forEach((worker) => worker.terminate());
  }
}

// splitWorker.ts

import { calcChunkHash, Chunk } from "./chunk";

onmessage = (e) => {
  const chunks = e.data as Chunk[];
  chunks.forEach((chunk) => {
    calcChunkHash(chunk).then((hash) => {
      chunk.hash = hash;
      postMessage([chunk]);
    });
  });
};
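
The way calcHash divides chunks among workers can be checked with plain numbers. The sketch below reproduces the same ceil-based partitioning and, as an extra assumption not present in the class above, skips workers whose slice would be empty. partitionChunks and WorkerRange are hypothetical names.

```typescript
interface WorkerRange {
  worker: number;
  start: number;
  end: number;
}

// Same partitioning arithmetic as MultiThreadSplitor.calcHash, on counts only.
function partitionChunks(chunkCount: number, workerCount: number): WorkerRange[] {
  const perWorker = Math.ceil(chunkCount / workerCount);
  const ranges: WorkerRange[] = [];
  for (let worker = 0; worker < workerCount; worker++) {
    const start = worker * perWorker;
    const end = Math.min((worker + 1) * perWorker, chunkCount);
    // Skip workers whose slice would be empty (more workers than chunks).
    if (start < end) ranges.push({ worker, start, end });
  }
  return ranges;
}

const tenOverFour = partitionChunks(10, 4);
console.log(tenOverFour.map((r) => `[${r.start},${r.end})`).join(" "));
// [0,3) [3,6) [6,9) [9,10)

console.log(partitionChunks(2, 4).length); // 2
```

With 10 chunks and 4 workers, three workers get three chunks each and the last one gets a single chunk; with 2 chunks and 4 workers, two workers would have nothing to do.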

posted @ 2024-12-05 03:20 Zhentiw
