[Design Pattern] Uploading Large Files - 3. Code Design - Part 1
Code design
The SDK consists of a shared upload protocol and three packages:
- Upload Protocol: defines the communication format between the frontend and the backend.
- upload-core: a protocol-based API that provides core functionality, such as creating and reading protocol fields, plus common utility functions shared by the frontend and the backend.
- upload-client: the SDK for the client (browser).
- upload-server: the SDK for the BFF (Backend for Frontend) server.
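To illustrate what "creating and reading protocol fields" could mean in upload-core, here is a minimal sketch in which the client writes chunk metadata into request headers and the server reads it back with validation. The header names (upload-file-hash and so on), the ChunkMeta shape, and the helper names are all hypothetical; the actual protocol fields are not specified in this part.

```typescript
// Hypothetical protocol field names (lowercase, as Node's http module normalizes them)
const UploadHeaders = {
  FileHash: "upload-file-hash",
  ChunkHash: "upload-chunk-hash",
  ChunkStart: "upload-chunk-start",
  ChunkEnd: "upload-chunk-end",
  FileSize: "upload-file-size",
} as const;

// Hypothetical metadata carried with every chunk request
interface ChunkMeta {
  fileHash: string;
  chunkHash: string;
  start: number;
  end: number;
  fileSize: number;
}

// Client side: turn chunk metadata into request headers
function createChunkHeaders(meta: ChunkMeta): Record<string, string> {
  return {
    [UploadHeaders.FileHash]: meta.fileHash,
    [UploadHeaders.ChunkHash]: meta.chunkHash,
    [UploadHeaders.ChunkStart]: String(meta.start),
    [UploadHeaders.ChunkEnd]: String(meta.end),
    [UploadHeaders.FileSize]: String(meta.fileSize),
  };
}

// Server side: read the same fields back, rejecting missing or malformed values
function readChunkHeaders(headers: Record<string, string | undefined>): ChunkMeta {
  const get = (name: string): string => {
    const value = headers[name];
    if (value === undefined) throw new Error(`Missing protocol header: ${name}`);
    return value;
  };
  const toNumber = (name: string): number => {
    const n = Number(get(name));
    if (!Number.isInteger(n) || n < 0) throw new Error(`Invalid protocol header: ${name}`);
    return n;
  };
  return {
    fileHash: get(UploadHeaders.FileHash),
    chunkHash: get(UploadHeaders.ChunkHash),
    start: toNumber(UploadHeaders.ChunkStart),
    end: toNumber(UploadHeaders.ChunkEnd),
    fileSize: toNumber(UploadHeaders.FileSize),
  };
}
```

Keeping both directions in one shared module means the client and the server cannot drift apart on field names or formats.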
upload-core: reusable functions
EventEmitter
A standardized EventEmitter class, based on the publish-subscribe pattern, unifies event handling on the frontend and the backend.
Frontend Events:
- Upload Progress Changed Event: Triggered when the upload progress updates.
- Upload Paused/Resumed Event: Triggered when the upload is paused or resumed.
Backend Events:
- Chunk Write Completed Event: Triggered when a chunk is successfully written to storage.
- Chunk Merge Completed Event: Triggered when all chunks are successfully merged into the final file.
export class EventEmitter<T extends string> {
  private events: Map<T, Set<Function>>;
  constructor() {
    this.events = new Map();
  }
  on(event: T, listener: Function) {
    if (!this.events.has(event)) {
      this.events.set(event, new Set());
    }
    this.events.get(event)!.add(listener);
  }
  off(event: T, listener: Function) {
    if (!this.events.has(event)) return;
    this.events.get(event)!.delete(listener);
  }
  emit(event: T, ...args: any[]) {
    if (!this.events.has(event)) return;
    this.events.get(event)!.forEach((listener) => listener(...args));
  }
}
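The EventEmitter above accepts any Function as a listener, so the compiler cannot check event payloads. One possible refinement, sketched here as an assumption rather than part of the SDK, maps each event name to its argument tuple. TypedEventEmitter and the UploadClientEvents map (modeled on the frontend events listed above) are hypothetical names.

```typescript
// A listener receives exactly the argument tuple declared for its event
type Listener<Args extends any[]> = (...args: Args) => void;

class TypedEventEmitter<Events extends Record<string, any[]>> {
  private events = new Map<keyof Events, Set<Listener<any>>>();

  on<K extends keyof Events>(event: K, listener: Listener<Events[K]>): void {
    let listeners = this.events.get(event);
    if (!listeners) {
      listeners = new Set();
      this.events.set(event, listeners);
    }
    listeners.add(listener);
  }

  off<K extends keyof Events>(event: K, listener: Listener<Events[K]>): void {
    this.events.get(event)?.delete(listener);
  }

  emit<K extends keyof Events>(event: K, ...args: Events[K]): void {
    const listeners = this.events.get(event);
    if (!listeners) return;
    // Copy first so a listener can call off() on itself while we iterate
    Array.from(listeners).forEach((listener) => listener(...args));
  }
}

// Hypothetical frontend event map based on the events described above
type UploadClientEvents = {
  progress: [uploaded: number, total: number];
  pause: [];
  resume: [];
};
```

With this map, a call such as emitter.emit("progress", 5) or a listener typed (uploaded: string) => void is rejected at compile time.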
TaskQueue
To run multiple tasks concurrently on both the frontend and the backend, a TaskQueue class can be implemented. Typical cases:
1. Frontend: concurrent upload requests
2. Backend: concurrent chunk hash verification
import { EventEmitter } from "./EventEmitter";
// A unit of work: a function plus the payload it will be called with
export class Task {
  fn: Function;
  payload?: any;
  constructor(fn: Function, payload?: any) {
    this.fn = fn;
    this.payload = payload;
  }
  run() {
    return this.fn(this.payload);
  }
}
export class TaskQueue extends EventEmitter<"start" | "pause" | "drain"> {
  private tasks: Set<Task> = new Set();
  // number of tasks currently in flight
  private currentCount = 0;
  private status: "running" | "paused" = "paused";
  // max concurrency allowed
  private concurrency: number;
  constructor(concurrency: number = 4) {
    super();
    this.concurrency = concurrency;
  }
  add(...tasks: Task[]) {
    tasks.forEach((task) => this.tasks.add(task));
  }
  addAndStart(...tasks: Task[]) {
    this.add(...tasks);
    this.start();
  }
  start() {
    if (this.status === "running") return;
    if (this.tasks.size === 0 && this.currentCount === 0) {
      this.emit("drain");
      return;
    }
    this.status = "running";
    this.emit("start");
    this.runNext();
  }
  private takeHeadTask() {
    const task = this.tasks.values().next().value;
    if (task) {
      this.tasks.delete(task);
    }
    return task;
  }
  private runNext() {
    if (this.status !== "running") return;
    // Fill every free slot, not just one, so up to `concurrency` tasks run at once
    while (this.currentCount < this.concurrency) {
      const task = this.takeHeadTask();
      if (!task) {
        // Nothing left to start: report "drain" once the last in-flight task settles
        if (this.currentCount === 0) {
          this.status = "paused";
          this.emit("drain");
        }
        return;
      }
      this.currentCount++;
      Promise.resolve()
        .then(() => task.run())
        // Tasks should handle their own errors; this only keeps a failed task
        // from becoming an unhandled promise rejection
        .catch(() => {})
        .finally(() => {
          this.currentCount--;
          this.runNext();
        });
    }
  }
  pause() {
    this.status = "paused";
    this.emit("pause");
  }
}
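For comparison, the same bounded-concurrency idea can be written as a small function in which a fixed number of async loops pull tasks from a shared index. This is a swapped-in technique for illustration, not the TaskQueue design itself, and runWithConcurrency is a hypothetical name. It returns results in input order, which is convenient when, for example, upload responses must be matched back to chunk indexes.

```typescript
// Runs async task functions with at most `limit` of them in flight at once.
// Results are returned in the same order as `tasks`, regardless of completion order.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  if (limit < 1) throw new Error("limit must be at least 1");
  const results = new Array<T>(tasks.length);
  let next = 0; // index of the next task to start

  // Each loop pulls the next unstarted task until none remain.
  // JavaScript runs this code on one thread, so `next++` cannot race between loops.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Unlike TaskQueue, this has no pause/resume, and the returned promise rejects as soon as one task fails (the other loops keep running), so it suits one-off batches better than long-lived, pausable uploads.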
Complex issues in frontend code
The frontend involves two core issues:
1. How to split files into chunks
2. How to control requests
How to split files into chunks
First, define the chunk object and the helpers that create a chunk and compute its hash:
// chunk.ts
import SparkMD5 from "spark-md5";
export interface Chunk {
  blob: Blob;
  start: number; // byte offset where the chunk starts (inclusive)
  end: number; // byte offset where the chunk ends (exclusive)
  hash: string;
  index: number;
}
// Create a chunk with an empty hash; the hash is filled in later
export function createChunk(
  file: File,
  index: number,
  chunkSize: number
): Chunk {
  const start = index * chunkSize;
  const end = Math.min((index + 1) * chunkSize, file.size);
  const blob = file.slice(start, end);
  return {
    blob,
    start,
    end,
    hash: "",
    index,
  };
}
// Read the chunk's bytes and compute their MD5 hash with SparkMD5
export function calcChunkHash(chunk: Chunk): Promise<string> {
  return new Promise((resolve, reject) => {
    const spark = new SparkMD5.ArrayBuffer();
    const fileReader = new FileReader();
    fileReader.onload = (e) => {
      spark.append(e.target?.result as ArrayBuffer);
      resolve(spark.end());
    };
    fileReader.onerror = reject;
    fileReader.readAsArrayBuffer(chunk.blob);
  });
}
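createChunk treats each chunk as the half-open byte range [start, end), because Blob.slice excludes the end offset. The hypothetical chunkRanges helper below repeats the same arithmetic without needing a File, which makes the boundaries easy to check, including the shorter last chunk:

```typescript
// Hypothetical helper mirroring the arithmetic in createChunk:
// chunk i covers bytes [i * chunkSize, min((i + 1) * chunkSize, fileSize)).
function chunkRanges(fileSize: number, chunkSize: number): Array<[number, number]> {
  const count = Math.ceil(fileSize / chunkSize);
  return Array.from({ length: count }, (_, i): [number, number] => [
    i * chunkSize,
    Math.min((i + 1) * chunkSize, fileSize),
  ]);
}

const MB = 1024 * 1024;
// A 12 MB file with 5 MB chunks yields two full chunks and one 2 MB tail chunk
console.log(JSON.stringify(chunkRanges(12 * MB, 5 * MB).map(([s, e]) => [s / MB, e / MB])));
// → [[0,5],[5,10],[10,12]]
```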
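Next, the whole file needs to be split into chunks. There are several ways to do this, for example:
- Standard chunking
- Multithreaded chunking: spread the work across Web Workers, with the worker count based on navigator.hardwareConcurrency
- Main-thread time-sliced chunking: do the work during idle periods using requestIdleCallback
- Other custom chunking patterns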
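The time-sliced mode can be sketched with a helper that processes items in batches and yields to the event loop between batches. This is a minimal sketch under assumptions: timeSlicedForEach and waitForIdle are hypothetical names, the 8 ms fallback budget is arbitrary, and each handle call is expected to be a short synchronous step (for example, appending one already-read buffer to an incremental hash). Where requestIdleCallback is unavailable, as in Node, it falls back to setTimeout.

```typescript
type IdleDeadlineLike = { timeRemaining(): number };

// Resolve at the next idle period (browser) or next macrotask (fallback)
function waitForIdle(): Promise<IdleDeadlineLike> {
  const g = globalThis as any;
  if (typeof g.requestIdleCallback === "function") {
    return new Promise<IdleDeadlineLike>((resolve) => g.requestIdleCallback(resolve));
  }
  // Fallback: give each slice a fixed budget of about 8 ms
  return new Promise<IdleDeadlineLike>((resolve) =>
    setTimeout(() => {
      const sliceStart = Date.now();
      resolve({ timeRemaining: () => Math.max(0, 8 - (Date.now() - sliceStart)) });
    }, 0)
  );
}

// Process items a few at a time, yielding between slices so the UI stays responsive
async function timeSlicedForEach<T>(
  items: T[],
  handle: (item: T, index: number) => void
): Promise<void> {
  let index = 0;
  while (index < items.length) {
    const deadline = await waitForIdle();
    // Always handle at least one item per slice, then continue while time remains
    do {
      handle(items[index], index);
      index++;
    } while (index < items.length && deadline.timeRemaining() > 1);
  }
}
```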
To stay flexible, the implementation must offer several built-in chunking modes to the upper layer while still allowing custom ones. The design therefore uses the Template Method pattern, built on an abstract class, to drive the process.
Template Method pattern
The Template Method pattern is a behavioral design pattern that defines the skeleton of an algorithm in a base class (abstract or concrete) and lets derived classes override specific steps without changing the overall structure of the algorithm.
The key is to define the common steps once in the base class and leave only the varying step to subclasses. For example:
abstract class Chess {
  move(x: number, y: number) {
    // 1. Boundary checking
    // 2. Determine valid moves in the game
    // 3. Check the piece-specific rule
    if (this.rule(x, y)) {
      // 4. Finish the movement
    }
  }
  abstract rule(x: number, y: number): boolean;
}
class Horse extends Chess {
  rule(x: number, y: number): boolean {....}
}
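A runnable version of the chess example, with an 8x8 board and knight/rook rules chosen purely for illustration (the original leaves the rule body elided), shows the shape of the pattern: move is the template method that fixes the order of the steps, and each subclass supplies only rule. ChessPiece and Rook are hypothetical names.

```typescript
abstract class ChessPiece {
  protected x: number;
  protected y: number;

  constructor(x: number, y: number) {
    this.x = x;
    this.y = y;
  }

  // Template method: the fixed skeleton shared by every piece
  move(toX: number, toY: number): boolean {
    // 1. Boundary check (assumes an 8x8 board)
    if (toX < 0 || toX > 7 || toY < 0 || toY > 7) return false;
    // 2. Reject "moves" that stay in place
    if (toX === this.x && toY === this.y) return false;
    // 3. Piece-specific rule, supplied by the subclass
    if (!this.rule(toX - this.x, toY - this.y)) return false;
    // 4. Finish the movement
    this.x = toX;
    this.y = toY;
    return true;
  }

  // The only step subclasses must provide
  protected abstract rule(dx: number, dy: number): boolean;

  position(): [number, number] {
    return [this.x, this.y];
  }
}

class Horse extends ChessPiece {
  // Knight move: an L shape, two squares one way and one square the other
  protected rule(dx: number, dy: number): boolean {
    const [a, b] = [Math.abs(dx), Math.abs(dy)];
    return (a === 1 && b === 2) || (a === 2 && b === 1);
  }
}

class Rook extends ChessPiece {
  // Straight lines only (blocking pieces are ignored in this sketch)
  protected rule(dx: number, dy: number): boolean {
    return dx === 0 || dy === 0;
  }
}
```

The ChunkSplitor class that follows uses the same structure: split plays the role of move, and calcHash plays the role of rule.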
// chunkSplitor.ts
import SparkMD5 from "spark-md5";
import { EventEmitter } from "../upload-core/EventEmitter";
import { Chunk, createChunk } from "./chunk";
export type ChunkSplitorEvents = "chunks" | "wholeHash" | "drain";
export abstract class ChunkSplitor extends EventEmitter<ChunkSplitorEvents> {
  protected file: File;
  protected chunkSize: number;
  protected chunks: Chunk[] = [];
  protected hash?: string; // hash of the whole file, derived from the chunk hashes
  private handleChunkCount = 0; // number of chunks whose hash has been calculated
  private hasSplitted = false; // whether split() has already been called
  constructor(file: File, chunkSize: number = 1024 * 1024 * 5) {
    super();
    this.file = file;
    this.chunkSize = chunkSize;
    const chunkCount = Math.ceil(file.size / chunkSize);
    this.chunks = new Array(chunkCount)
      .fill(0)
      .map((_, index) => createChunk(this.file, index, this.chunkSize));
  }
  // Template method: the fixed skeleton shared by every chunking mode
  split() {
    if (this.hasSplitted) return;
    this.hasSplitted = true;
    const emitter = new EventEmitter<"chunks">();
    const chunksHandler = (chunks: Chunk[]) => {
      // Hashed chunks may be copies (e.g. sent back from a worker), so write the hashes back
      chunks.forEach((chunk) => {
        this.chunks[chunk.index].hash = chunk.hash;
      });
      this.emit("chunks", chunks);
      this.handleChunkCount += chunks.length;
      if (this.handleChunkCount === this.chunks.length) {
        emitter.off("chunks", chunksHandler);
        // Combine chunk hashes in index order, not arrival order, so the whole-file
        // hash is the same on every run even when chunks finish out of order
        const spark = new SparkMD5();
        this.chunks.forEach((chunk) => spark.append(chunk.hash));
        this.hash = spark.end();
        spark.destroy();
        this.emit("wholeHash", this.hash);
        this.emit("drain");
      }
    };
    emitter.on("chunks", chunksHandler);
    this.calcHash(this.chunks, emitter);
  }
  // The one step each chunking mode implements: hash the chunks and emit them
  // on `emitter` as they become ready
  abstract calcHash(chunks: Chunk[], emitter: EventEmitter<"chunks">): void;
}
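When chunk hashes are computed in parallel, they can arrive in any order, so any step that combines them should rely on chunk.index rather than on arrival order; otherwise the same file could produce different whole-file hashes on different runs. A minimal sketch of that idea, using a hypothetical OrderedHashCollector class that is not part of the SDK:

```typescript
// Collects per-chunk hashes that may arrive in any order and, once every chunk
// is present, returns them in index order so the combined result is deterministic.
class OrderedHashCollector {
  private hashes: Array<string | undefined>;
  private received = 0;

  constructor(chunkCount: number) {
    this.hashes = new Array<string | undefined>(chunkCount).fill(undefined);
  }

  // Returns the index-ordered hashes once all chunks have arrived, otherwise null
  add(index: number, hash: string): string[] | null {
    if (index < 0 || index >= this.hashes.length) {
      throw new RangeError(`Chunk index out of range: ${index}`);
    }
    // Count each index once, even if a hash is reported twice
    if (this.hashes[index] === undefined) this.received++;
    this.hashes[index] = hash;
    return this.received === this.hashes.length ? (this.hashes.slice() as string[]) : null;
  }
}
```

In ChunkSplitor terms, this means storing each hash at this.chunks[chunk.index] and feeding the hashes into SparkMD5 only after the last one arrives.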
Based on this abstract class, various chunking modes can be implemented. Each mode only needs to extend ChunkSplitor and implement calcHash, which calculates the chunk hashes.
For example, a multithreaded chunking class is straightforward to implement:
// multiThreadSplitor.ts
import { EventEmitter } from "../upload-core/EventEmitter";
import { Chunk } from "./chunk";
import { ChunkSplitor } from "./chunkSplitor";
export class MultiThreadSplitor extends ChunkSplitor {
  // One worker per logical CPU core (4 if the browser does not report it).
  // new URL(..., import.meta.url) is the worker syntax recognized by bundlers such as Vite and webpack 5.
  private workers: Worker[] = new Array(navigator.hardwareConcurrency || 4)
    .fill(0)
    .map(
      () =>
        new Worker(new URL("./splitWorker.ts", import.meta.url), {
          type: "module",
        })
    );
  calcHash(chunks: Chunk[], emitter: EventEmitter<"chunks">): void {
    // Give each worker a contiguous slice of roughly equal size
    const workerSize = Math.ceil(chunks.length / this.workers.length);
    for (let i = 0; i < this.workers.length; i++) {
      const worker = this.workers[i];
      const start = i * workerSize;
      const end = Math.min((i + 1) * workerSize, chunks.length);
      const workerChunks = chunks.slice(start, end);
      // Workers can get an empty slice when there are fewer chunks than workers
      if (workerChunks.length === 0) continue;
      worker.onmessage = (e) => {
        emitter.emit("chunks", e.data);
      };
      worker.postMessage(workerChunks);
    }
  }
  // Terminate all workers once hashing is finished
  dispose() {
    this.workers.forEach((worker) => worker.terminate());
  }
}
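// splitWorker.ts
import { calcChunkHash, Chunk } from "./chunk";
// Receives a slice of chunks, hashes each one, and posts every chunk back
// (with its hash filled in) as soon as it is ready
self.onmessage = (e: MessageEvent<Chunk[]>) => {
  const chunks = e.data;
  chunks.forEach((chunk) => {
    calcChunkHash(chunk).then((hash) => {
      chunk.hash = hash;
      postMessage([chunk]);
    });
  });
};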
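The slicing in MultiThreadSplitor.calcHash gives each worker ceil(chunkCount / workerCount) consecutive chunks. The hypothetical partition helper below reproduces that arithmetic so the resulting ranges can be inspected; note that trailing workers can end up with little or no work.

```typescript
// Hypothetical helper reproducing the worker split in MultiThreadSplitor.calcHash:
// worker i gets chunk indexes [start, end), clamped to the number of chunks.
function partition(chunkCount: number, workerCount: number): Array<[number, number]> {
  const size = Math.ceil(chunkCount / workerCount);
  return Array.from({ length: workerCount }, (_, i): [number, number] => [
    Math.min(i * size, chunkCount),
    Math.min((i + 1) * size, chunkCount),
  ]);
}

// 10 chunks on 4 workers: slices of 3, 3, 3 and 1
console.log(JSON.stringify(partition(10, 4)));
// → [[0,3],[3,6],[6,9],[9,10]]
```

With 9 chunks on 4 workers the last worker gets nothing, and with 5 chunks on 8 workers three workers sit idle. If that imbalance matters, an alternative (not part of the original design) is to hand the next chunk to whichever worker becomes free, in the spirit of the TaskQueue.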