[GenAI] Combine DeepSeek R1 Reasoning with GPT 3.5 Turbo for the Cheapest, Fastest, and Best AI

DeepSeek Reasoning

A CLI tool that combines DeepSeek's reasoning capabilities with GPT's summarization power. This project demonstrates how to:

  1. Use DeepSeek's reasoning model to analyze questions in detail
  2. Stream the reasoning process in real-time
  3. Use GPT to create concise, single-sentence summaries of the reasoning

🎓 What You'll Learn

This project is tied to an egghead.io lesson that teaches you how to:

  • Extract and utilize DeepSeek's reasoning phase from their R1 model
  • Stream and parse API responses in real-time
  • Optimize costs by using different models for different tasks
  • Combine multiple AI models in a single workflow
  • Build a practical CLI tool that demonstrates these concepts

Why This Matters

DeepSeek's R1 model provides detailed reasoning before summarization. By isolating this reasoning phase and using a faster model (like GPT-3.5-turbo) for summarization, you can:

  • Get high-quality reasoning from DeepSeek
  • Optimize costs by using a cheaper model for summarization
  • Improve response times in your applications
  • Create more efficient AI workflows

Features

  • Interactive CLI using Clack
  • Real-time streaming of DeepSeek's reasoning process
  • Automatic logging of all interactions
  • Clean summarization of complex reasoning
  • Cost-effective hybrid model approach

Requirements

You'll need an API key for:

  • OpenRouter API (for GPT & DeepSeek access)

Setup

  1. Clone the repo:
git clone https://github.com/johnlindquist/deepseek-reasoning.git
cd deepseek-reasoning
  2. Install dependencies:
pnpm install
  3. Create a .env file with your API key:
OPENROUTER_API_KEY=your_openrouter_api_key
  4. Run the CLI:
pnpm tsx index.ts

How It Works

  1. The CLI prompts you for a question
  2. DeepSeek's reasoning model analyzes your question, streaming its thought process in real-time
  3. The reasoning is captured and logged
  4. A cheaper summarizer model (gpt-4o-mini in the code) creates a concise, single-sentence summary of the reasoning
  5. Both the reasoning and summary are saved to timestamped log files
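The capture step (steps 2–3 above) can be illustrated with a small, self-contained helper. This is hypothetical, for illustration only: the actual CLI captures the reasoning while streaming, via the API's `stop: "</think>"` parameter and the per-chunk `reasoning` delta field, rather than by splitting a finished string.

```typescript
// Hypothetical helper (not part of the CLI): given the raw text an
// R1-style model emits, keep only the reasoning that precedes "</think>".
function extractReasoning(raw: string): string {
	const end = raw.indexOf("</think>");
	const reasoning = end === -1 ? raw : raw.slice(0, end);
	// Strip a leading <think> tag if the model emitted one.
	return reasoning.replace(/^<think>\s*/, "").trim();
}
```

For example, `extractReasoning("<think>step 1</think>The answer.")` yields just `"step 1"`, which is the portion the CLI streams and logs.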

Implementation Details

The code demonstrates several key patterns:

  • Streaming API responses with proper TypeScript types
  • Early stream termination to capture only reasoning content
  • Efficient model switching for cost optimization
  • Structured logging for debugging and analysis
  • Clean CLI interactions with proper error handling
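The structured-logging pattern mentioned above boils down to deriving a filesystem-safe filename from an ISO timestamp. A minimal sketch mirroring the logic in the full code below (`logFileName` is a name introduced here for illustration):

```typescript
// Turn an ISO timestamp into a filesystem-safe log path, e.g.
// "2025-02-03T10:30:00.000Z" -> "logs/2025-02-03-10-30-00.log"
function logFileName(date: Date): string {
	const stamp = date
		.toISOString()
		.replace("T", "-")   // separate date and time with a dash
		.replace(/:/g, "-")  // ":" is not allowed in filenames on some systems
		.split(".")[0];      // drop milliseconds and the trailing "Z"
	return `logs/${stamp}.log`;
}
```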

Tech Stack

  • TypeScript
  • OpenAI SDK (for OpenRouter)
  • Clack (for CLI interactions)
  • dotenv (for environment management)

Code:

import dotenv from "dotenv";
dotenv.config();

import { log, spinner, text } from "@clack/prompts";
import { existsSync } from "node:fs";
import { appendFile, mkdir } from "node:fs/promises";
import OpenAI from "openai";

const OPEN_ROUTER_API_URL = "https://openrouter.ai/api/v1";
const REASON_MODEL = "deepseek/deepseek-r1"; // or "google/gemini-2.0-flash-thinking-exp:free"
const SUMMARIZER_MODEL = "openai/gpt-4o-mini"; // or "gpt-3.5-turbo-0613"

const s = spinner();
const timestamp = new Date()
	.toISOString()
	.replace("T", "-")
	.replace(/:/g, "-")
	.split(".")[0];
const logFile = `logs/${timestamp}.log`;

const appendLog = async (data: unknown) => {
	if (!existsSync("logs")) {
		await mkdir("logs", { recursive: true });
	}
	await appendFile(logFile, `---\n\n${JSON.stringify(data, null, 2)}\n\n`);
};

declare global {
	namespace NodeJS {
		interface ProcessEnv {
			OPENROUTER_API_KEY: string;
		}
	}
}

const question = (await text({
	message: "How can I help?",
})) as string;

log.info("Thinking...");
const deepseek = new OpenAI({
	baseURL: OPEN_ROUTER_API_URL,
	apiKey: process.env.OPENROUTER_API_KEY,
});

const reasoningResponse = await deepseek.chat.completions.create({
	model: REASON_MODEL,
	messages: [{ role: "user", content: question }],
	stream: true,
	stop: "</think>", // for stopping right after the reasoning
	include_reasoning: true, // not in types yet
});

let reasoning = "";

for await (const chunk of reasoningResponse) {
	const reasoningContent = (
		chunk.choices?.[0]?.delta as { reasoning?: string | null }
	)?.reasoning;

	if (reasoningContent != null) {
		reasoning += reasoningContent;
		process.stdout.write(reasoningContent);
	} else {
		// Once the delta stops carrying reasoning, abort the stream
		// before the model starts emitting its own summary.
		reasoningResponse.controller.abort();
		log.success("Reasoning done!");
		break;
	}
}

await appendLog(`
REASONING:
${reasoning}
---------
`);

s.start("Summarizing...");
const openai = new OpenAI({
	baseURL: OPEN_ROUTER_API_URL,
	apiKey: process.env.OPENROUTER_API_KEY,
});

const gptResponse = await openai.chat.completions.create({
	model: SUMMARIZER_MODEL,
	messages: [
		{
			role: "system",
			content:
				"Answer the initial <QUESTION> in a single sentence based on the <REASONING>",
		},
		{
			role: "user",
			content: `
<QUESTION>
${question}
</QUESTION>

<REASONING>
${reasoning}
</REASONING>
`,
		},
	],
});

const summary = gptResponse.choices[0]?.message.content ?? "";

s.stop();
log.info(summary);

await appendLog(`
SUMMARY:
${summary}
---------
`);
posted @ Zhentiw