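A GPT-4-Based AI Mirror

I. Raspberry Pi System Setup

1. Installing the system

There are two ways to install the system. The first is to use Raspberry Pi Imager, which downloads and writes the system automatically; it is convenient but slow. The second is to download an image yourself and then write it to the SD card; this takes a few more steps, but because the image is already downloaded, the write itself is faster.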

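Method 1:

Download Raspberry Pi Imager from the official site: Raspberry Pi OS – Raspberry Pi
Run it, choose a suitable OS, choose the SD card, and wait for the write to finish.
Once the card is written, remove it, insert it into the Raspberry Pi, and power on to boot the system.

Method 2:

Download an operating system image: Operating system images – Raspberry Pi
Then write the image to the SD card with Win32 Disk Imager.
(Win32 Disk Imager can also read the system on an SD card back out to an image file, which is handy for setting up many boards in bulk: https://blog.csdn.net/qq_29373285/article/details/99629369)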
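
Before writing a manually downloaded image, it may be worth checking that the download is intact. The following is a minimal sketch, assuming you have copied the expected SHA-256 value from the download page; the function names are my own and purely illustrative.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def image_matches(path, expected_hex):
    """Compare the file's digest with the published one (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use low, which matters because OS images are typically several hundred megabytes or more.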

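2. Setting up VNC remote access

On the Raspberry Pi:

Open Preferences > Raspberry Pi Configuration > Interfaces.
Turn on the VNC option.
Reboot the Raspberry Pi.

On Windows:

Download VNC Viewer: Download VNC Viewer | VNC® Connect
Open VNC Viewer, enter the Raspberry Pi's IP address, and you can connect remotely.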
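
If the viewer cannot connect, a quick reachability check from the PC can narrow down the cause. This is a rough sketch, assuming the VNC server listens on its usual default TCP port 5900; the IP address in the comment is a placeholder, not a value from this project.

```python
import socket

def port_is_open(host, port=5900, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (replace the placeholder with your Pi's IP address):
# print(port_is_open("192.168.1.50"))
```

A False result usually means VNC is not enabled, the Pi has not been rebooted yet, or the IP address is wrong.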

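II. Miniconda Environment Setup

Note: on the Raspberry Pi, Miniconda conflicts with PyQt; PyQt cannot be set up inside a Miniconda environment.

First download a suitable installer: Index of /anaconda/miniconda/ | 清华大学开源软件镜像站 | Tsinghua Open Source Mirror
This project uses a Raspberry Pi 4 (8 GB). After trying several versions, the one that installed and ran correctly was Miniconda3-py39_4.9.2-Linux-aarch64.sh
After downloading, change to the download directory and install: sudo sh Miniconda3-py39_4.9.2-Linux-aarch64.sh
The installer asks for an install path; I used /opt/miniconda3
By default, every new terminal starts in the (base) environment, and you have to run conda deactivate to leave it. If auto_activate_base is set to false, a new terminal starts outside conda and you run conda activate when you need an environment. I set it to false: conda config --set auto_activate_base false
Once installed, Miniconda is used the same way as Anaconda, so I won't go into detail here.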

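III. VPN Configuration

The VPN is configured the same way as on any other Linux machine: install Clash, set up the proxy, then import a purchased node to get proxied internet access.

Because the Raspberry Pi is an ARM board, pick the arm64 Linux build, not the x86 one (Releases · Fndroid/clash_for_windows_pkg · GitHub). Download it and install it.
Configure the system proxy; this post is a useful reference: Linux设置网络代理_linux 代理_Dancen的博客-CSDN博客. The exact settings depend on the node you purchased, and a reboot is needed afterwards.
Create a desktop launcher; the approach in this post is much the same: linux 创建桌面快捷方式和命令启动程序_将启动软件命令放到桌面_william~的博客-CSDN博客
Import the purchased node, and proxied internet access is ready.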
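
Python scripts do not necessarily pick up the desktop proxy settings, so it can help to route requests through the local proxy explicitly. The sketch below assumes Clash exposes an HTTP proxy on 127.0.0.1:7890; that port is an assumption, so check your own Clash configuration. The helper names are illustrative.

```python
import urllib.request

def proxy_settings(host="127.0.0.1", port=7890):
    """Build a proxies mapping for a local HTTP proxy (the port is an assumption)."""
    address = "http://{}:{}".format(host, port)
    return {"http": address, "https": address}

def opener_via_proxy(proxies):
    """Return a urllib opener that routes http/https requests through the proxy."""
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))

# Example (needs the proxy to be running):
# opener = opener_via_proxy(proxy_settings())
# print(opener.open("https://www.example.com", timeout=10).status)
```

Many libraries also honor the http_proxy and https_proxy environment variables, which is an alternative to passing the proxy in code.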

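IV. Speech Recognition

Speech recognition mainly uses the SpeechRecognition library.
Install it: pip install speechrecognition (capturing from a microphone with sr.Microphone also requires PyAudio).
As an aside, if pip downloads are slow, you can use a mirror inside China, for example: pip3 install speechrecognition -i https://pypi.tuna.tsinghua.edu.cn/simple

A test routine: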

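import speech_recognition as sr

def real_time_speech_to_text():
    # Create a Recognizer object
    recognizer = sr.Recognizer()

    # Listen on the microphone and convert speech to text in a loop
    with sr.Microphone() as source:
        # Calibrate for background noise; this must run inside the with block
        recognizer.adjust_for_ambient_noise(source)
        print("Please start speaking...")
        try:
            while True:
                try:
                    # Wait at most 5 seconds for speech to start
                    audio = recognizer.listen(source, timeout=5)
                except sr.WaitTimeoutError:
                    # listen() raises on timeout rather than returning None
                    print("No speech detected, exiting.")
                    break
                try:
                    text = recognizer.recognize_google(audio, language='zh-CN')
                except sr.UnknownValueError:
                    # Unintelligible audio: keep listening instead of exiting
                    print("Could not understand the audio")
                    continue
                print("Result:", text)
                if "退出" in text:  # "退出" means "exit"
                    print("Exit command detected, exiting.")
                    break
        except KeyboardInterrupt:
            print("Stopped listening")
        except sr.RequestError as e:
            print("Could not reach the Google Speech Recognition service; {0}".format(e))

if __name__ == "__main__":
    real_time_speech_to_text()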

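V. Speech Synthesis

There are two options:
Option 1: pyttsx3. Simple and convenient, but there are few voices to choose from and they sound stiff, especially on Linux.
Option 2: a third-party API. Many more voices to choose from, but it costs money.

1. pyttsx3

Text-to-speech (TTS) can be done with Python's pyttsx3 library. pyttsx3 is an easy-to-use TTS library that converts text to speech and plays it through the system's audio output device.

The following routine uses pyttsx3 for text-to-speech: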

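import pyttsx3

def text_to_speech(text):
    # Initialize the speech engine
    engine = pyttsx3.init()

    # Queue the text to be spoken
    engine.say(text)

    # Play the speech and block until it finishes
    engine.runAndWait()

if __name__ == "__main__":
    text_to_speech("Hello, this is a text-to-speech example.")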

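In the routine above, pyttsx3.init() initializes the speech engine, engine.say(text) queues the text to be spoken, and engine.runAndWait() plays it.

After running the routine you should hear the text spoken through the computer's audio output device.

pyttsx3 supports several speech drivers, and you can choose one to suit your system. The default depends on the platform: sapi5 on Windows, nsss on macOS, and espeak on other platforms such as Linux. A specific driver can be selected with pyttsx3.init(driverName).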
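
The driver choice can be made explicit in code. Below is a small sketch that maps the OS name to the driver names listed above; the helper name default_driver is my own, and the commented usage assumes pyttsx3 and espeak are installed on the Pi.

```python
import platform

# Driver names documented by pyttsx3 for each platform.
DRIVERS = {"Windows": "sapi5", "Darwin": "nsss"}

def default_driver(system_name=None):
    """Return the pyttsx3 driver name for an OS name from platform.system()."""
    if system_name is None:
        system_name = platform.system()
    return DRIVERS.get(system_name, "espeak")

# Usage on the Raspberry Pi (requires pyttsx3 and espeak to be installed):
# import pyttsx3
# engine = pyttsx3.init(driverName=default_driver())
# engine.setProperty("rate", 150)  # speaking rate in words per minute
# engine.say("hello")
# engine.runAndWait()
```

Lowering the rate property is one of the few knobs available for making the espeak voice easier to follow.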

2. Using the Alibaba Cloud API

The relevant routines are as follows:

ALiYunConfig.py

# -*- coding: UTF-8 -*-
# Python 3: http.client replaces Python 2's httplib,
# and urllib.parse replaces Python 2's urllib.
import http.client
import urllib.parse
import json

import pygame
# Initialize pygame
pygame.init()


class ttsConfig():

    def __init__(self):
        self.appKey = 'appKey'         # copy the appkey from the Alibaba Cloud console
        self.token = 'token'           # copy the token from the Alibaba Cloud console
        self.audioSaveFile = 'syAudio0.wav'
        self.format = 'wav'
        self.sampleRate = 16000

        pygame.mixer.init()

    # The vendor sample took (appKey, token, text, audioSaveFile, format, sampleRate);
    # here everything except text comes from the instance attributes.
    def processGETRequest(self, text):
        appKey = self.appKey
        token = self.token
        audioSaveFile = self.audioSaveFile
        format = self.format
        sampleRate = self.sampleRate

        host = 'nls-gateway-cn-shanghai.aliyuncs.com'
        url = 'https://' + host + '/stream/v1/tts'
        # Set the URL request parameters (text must already be URL-encoded)
        url = url + '?appkey=' + appKey
        url = url + '&token=' + token
        url = url + '&text=' + text
        url = url + '&format=' + format
        url = url + '&sample_rate=' + str(sampleRate)
        # voice: speaker, optional, default xiaoyun.
        # url = url + '&voice=' + 'xiaoyun'
        # volume: range 0 to 100, optional, default 50.
        url = url + '&volume=' + str(100)
        # speech_rate: range -500 to 500, optional, default 0.
        # url = url + '&speech_rate=' + str(300)
        # pitch_rate: range -500 to 500, optional, default 0.
        # url = url + '&pitch_rate=' + str(0)
        # print(url)
        conn = http.client.HTTPSConnection(host)
        conn.request(method='GET', url=url)
        # Handle the server's response.
        response = conn.getresponse()
        # print('Response status and response reason:')
        # print(response.status, response.reason)
        contentType = response.getheader('Content-Type')
        # print(contentType)
        body = response.read()
        if 'audio/mpeg' == contentType:
            # Success: the body is the synthesized audio
            with open(audioSaveFile, mode='wb') as f:
                f.write(body)

            print('The GET request succeeded!')
        else:
            # Failure: the body carries the error message
            print('The GET request failed: ' + str(body))
        conn.close()
    
    def processPOSTRequest(self, appKey, token, text, audioSaveFile, format, sampleRate):
        host = 'nls-gateway-cn-shanghai.aliyuncs.com'
        url = 'https://' + host + '/stream/v1/tts'
        # Set the HTTPS headers.
        httpHeaders = {
            'Content-Type': 'application/json'
            }
        # Set the HTTPS body.
        requestBody = json.dumps({'appkey': appKey, 'token': token, 'text': text,
                                  'format': format, 'sample_rate': sampleRate})
        # print('The POST request body content: ' + requestBody)
        conn = http.client.HTTPSConnection(host)
        conn.request(method='POST', url=url, body=requestBody, headers=httpHeaders)
        # Handle the server's response.
        response = conn.getresponse()
        print('Response status and response reason:')
        print(response.status, response.reason)
        contentType = response.getheader('Content-Type')
        print("contentType: \n", contentType)
        # Read the response payload (audio bytes on success, an error message otherwise)
        body = response.read()

        if 'audio/mpeg' == contentType:
            with open(audioSaveFile, mode='wb') as f:
                f.write(body)
            print('The POST request succeeded!')
        else:
            print('The POST request failed: ' + str(body))
        conn.close()

ALiYun_module.py

import urllib.parse

import pygame

from ALiYunConfig import ttsConfig


def say(config, text):
    # URL-encode the text following RFC 3986.
    textUrlencode = urllib.parse.quote_plus(text)
    textUrlencode = textUrlencode.replace("+", "%20")
    textUrlencode = textUrlencode.replace("*", "%2A")
    textUrlencode = textUrlencode.replace("%7E", "~")

    # Send the GET request; the audio is saved to config.audioSaveFile
    config.processGETRequest(textUrlencode)

    # Load the audio (the mixer was initialized in ttsConfig.__init__)
    pygame.mixer.music.load(config.audioSaveFile)

    # Play the audio
    pygame.mixer.music.play()
    # Wait until playback finishes, sleeping instead of spinning the CPU
    while pygame.mixer.music.get_busy():
        pygame.time.wait(100)
    # pygame.quit() is not called here so that say() can be called again

# test
if __name__ == '__main__':
    myTTS = ttsConfig()
    say(myTTS, '你好，这是一个语音合成测试。')
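
The manual string concatenation and character replacement can also be handled by the standard library. The following is an alternative sketch, not the vendor's sample: build_tts_url is a hypothetical helper that takes the raw, unencoded text and assembles the whole GET URL, using the endpoint and parameter names from the routine above.

```python
from urllib.parse import quote, urlencode

TTS_ENDPOINT = "https://nls-gateway-cn-shanghai.aliyuncs.com/stream/v1/tts"

def build_tts_url(appkey, token, text, fmt="wav", sample_rate=16000, volume=100):
    """Assemble the GET URL, percent-encoding every value (spaces become %20)."""
    params = {
        "appkey": appkey,
        "token": token,
        "text": text,  # raw text; do not pre-encode it
        "format": fmt,
        "sample_rate": sample_rate,
        "volume": volume,
    }
    # quote_via=quote gives RFC 3986 style encoding instead of '+' for spaces
    return TTS_ENDPOINT + "?" + urlencode(params, quote_via=quote)
```

With this approach the text must not be encoded beforehand, otherwise the percent signs would be encoded a second time.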

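VI. Using the GPT API

First install the library: pip install openai (the routine below uses the openai.ChatCompletion interface, which was removed in openai 1.0, so it needs an older release, for example pip install "openai<1.0").

Routine (this one does not remember context):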

import os
import openai

# Load your API key from an environment variable or secret management service
openai.api_key = os.environ.get("OPENAI_API_KEY")  # or assign your API key string here

while True:
    content = input()
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # change to "gpt-4" to use GPT-4
        max_tokens=100,
        temperature=0.6,
        messages=[{"role": "user", "content": content}],
    )
    print(response.choices[0].message.content)

The programs in this repository, by contrast, do remember context: https://github.com/ChristopheZhao/ChaGPT-API-Call
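
The general idea behind context memory is to resend earlier messages with each request. Here is a minimal sketch of that idea; it is not the linked repository's implementation, and the Conversation class and its send callable are hypothetical names introduced for illustration.

```python
class Conversation:
    """Keep a rolling message history so each request carries earlier turns."""

    def __init__(self, send, system_prompt=None, max_turns=10):
        # send: callable taking a list of message dicts and returning the reply text
        self.send = send
        self.system = [{"role": "system", "content": system_prompt}] if system_prompt else []
        self.max_turns = max_turns
        self.history = []

    def ask(self, content):
        self.history.append({"role": "user", "content": content})
        reply = self.send(self.system + self.history)
        self.history.append({"role": "assistant", "content": reply})
        # Drop the oldest turns (one user + one assistant message each)
        self.history = self.history[-2 * self.max_turns:]
        return reply
```

With the pre-1.0 openai library, send could be a small function that calls openai.ChatCompletion.create(model=..., messages=msgs) and returns choices[0].message.content. Capping the history matters because every stored message counts toward the token limit and the cost of each request.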

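VII. Deploying YOLOv5

1. Deployment

For deploying YOLO with ONNX, see this repository: GitHub - hpc203/yolov5-v6.1-opencv-onnxrun, which deploys yolov5-v6.1 object detection with OpenCV and with ONNXRuntime, in both C++ and Python versions. It supports ten yolov5-v6.1 variants: yolov5s, yolov5m, yolov5l, yolov5n, yolov5x, yolov5s6, yolov5m6, yolov5l6, yolov5n6, and yolov5x6.

The main Python code from the repository, adapted, is as follows: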

import cv2
import argparse
import numpy as np

class yolov5():
    def __init__(self, modelpath, confThreshold=0.5, nmsThreshold=0.5, objThreshold=0.5):
        # class.names is expected in the current working directory;
        # a Windows-style '.\\class.names' path does not resolve on the Raspberry Pi
        with open('class.names', 'rt') as f:
            self.classes = f.read().rstrip('\n').split('\n')
        self.num_classes = len(self.classes)
        if modelpath.endswith('6.onnx'):
            self.inpHeight, self.inpWidth = 1280, 1280
            anchors = [[19, 27, 44, 40, 38, 94], [96, 68, 86, 152, 180, 137], [140, 301, 303, 264, 238, 542],
                       [436, 615, 739, 380, 925, 792]]
            self.stride = np.array([8., 16., 32., 64.])
        else:
            self.inpHeight, self.inpWidth = 640, 640
            anchors = [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]]
            self.stride = np.array([8., 16., 32.])
        self.nl = len(anchors)
        self.na = len(anchors[0]) // 2
        self.grid = [np.zeros(1)] * self.nl
        self.anchor_grid = np.asarray(anchors, dtype=np.float32).reshape(self.nl, -1, 2)
        self.net = cv2.dnn.readNet(modelpath)
        self.confThreshold = confThreshold
        self.nmsThreshold = nmsThreshold
        self.objThreshold = objThreshold
        self._inputNames = ''

    def resize_image(self, srcimg, keep_ratio=True, dynamic=False):
        top, left, newh, neww = 0, 0, self.inpWidth, self.inpHeight
        if keep_ratio and srcimg.shape[0] != srcimg.shape[1]:
            hw_scale = srcimg.shape[0] / srcimg.shape[1]
            if hw_scale > 1:
                newh, neww = self.inpHeight, int(self.inpWidth / hw_scale)
                img = cv2.resize(srcimg, (neww, newh), interpolation=cv2.INTER_AREA)
                if not dynamic:
                    left = int((self.inpWidth - neww) * 0.5)
                    img = cv2.copyMakeBorder(img, 0, 0, left, self.inpWidth - neww - left, cv2.BORDER_CONSTANT,
                                             value=(114, 114, 114))  # add border
            else:
                newh, neww = int(self.inpHeight * hw_scale), self.inpWidth
                img = cv2.resize(srcimg, (neww, newh), interpolation=cv2.INTER_AREA)
                if not dynamic:
                    top = int((self.inpHeight - newh) * 0.5)
                    img = cv2.copyMakeBorder(img, top, self.inpHeight - newh - top, 0, 0, cv2.BORDER_CONSTANT,
                                             value=(114, 114, 114))
        else:
            img = cv2.resize(srcimg, (self.inpWidth, self.inpHeight), interpolation=cv2.INTER_AREA)
        return img, newh, neww, top, left

    def _make_grid(self, nx=20, ny=20):
        xv, yv = np.meshgrid(np.arange(ny), np.arange(nx))
        return np.stack((xv, yv), 2).reshape((-1, 2)).astype(np.float32)

    def preprocess(self, img):
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = img.astype(np.float32) / 255.0
        return img

    def postprocess(self, frame, outs, padsize=None):
        frameHeight = frame.shape[0]
        frameWidth = frame.shape[1]
        newh, neww, padh, padw = padsize
        ratioh, ratiow = frameHeight / newh, frameWidth / neww
        # Scan through all the bounding boxes output from the network and keep only the
        # ones with high confidence scores. Assign the box's class label as the class with the highest score.

        confidences = []
        boxes = []
        classIds = []
        for detection in outs:
            if detection[4] > self.objThreshold:
                scores = detection[5:]
                classId = np.argmax(scores)
                confidence = scores[classId] * detection[4]
                if confidence > self.confThreshold:
                    center_x = int((detection[0] - padw) * ratiow)
                    center_y = int((detection[1] - padh) * ratioh)
                    width = int(detection[2] * ratiow)
                    height = int(detection[3] * ratioh)
                    left = int(center_x - width * 0.5)
                    top = int(center_y - height * 0.5)

                    confidences.append(float(confidence))
                    boxes.append([left, top, width, height])
                    classIds.append(classId)
        # Perform non maximum suppression to eliminate redundant overlapping boxes with
        # lower confidences.
        # NMSBoxes can return an empty tuple when no box survives, so wrap it before flatten()
        indices = np.array(cv2.dnn.NMSBoxes(boxes, confidences, self.confThreshold, self.nmsThreshold)).flatten()
        for i in indices:
            box = boxes[i]
            left = box[0]
            top = box[1]
            width = box[2]
            height = box[3]
            frame = self.drawPred(frame, classIds[i], confidences[i], left, top, left + width, top + height)
        return frame

    def drawPred(self, frame, classId, conf, left, top, right, bottom):
        # Only class 0 (the first entry in class.names) is drawn; other classes are skipped
        if classId == 0:
            # Draw a bounding box.
            cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), thickness=4)

            label = '%.2f' % conf
            label = '%s:%s' % (self.classes[classId], label)

            # Display the label at the top of the bounding box
            labelSize, baseLine = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
            top = max(top, labelSize[1])
            cv2.putText(frame, label, (left, top - 10), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), thickness=2)
        # Always return the frame so the caller never receives None
        return frame

    def detect(self, srcimg):
        img, newh, neww, padh, padw = self.resize_image(srcimg)
        blob = cv2.dnn.blobFromImage(img, scalefactor=1 / 255.0, swapRB=True)
        # blob = cv2.dnn.blobFromImage(self.preprocess(img))
        # Sets the input to the network
        self.net.setInput(blob, self._inputNames)

        # Runs the forward pass to get output of the output layers
        outs = self.net.forward(self.net.getUnconnectedOutLayersNames())[0].squeeze(axis=0)

        # inference output
        row_ind = 0
        for i in range(self.nl):
            h, w = int(self.inpHeight / self.stride[i]), int(self.inpWidth / self.stride[i])
            length = int(self.na * h * w)
            if self.grid[i].shape[2:4] != (h, w):
                self.grid[i] = self._make_grid(w, h)

            outs[row_ind:row_ind + length, 0:2] = (outs[row_ind:row_ind + length, 0:2] * 2. - 0.5 + np.tile(
                self.grid[i], (self.na, 1))) * int(self.stride[i])
            outs[row_ind:row_ind + length, 2:4] = (outs[row_ind:row_ind + length, 2:4] * 2) ** 2 * np.repeat(
                self.anchor_grid[i], h * w, axis=0)
            row_ind += length
        srcimg = self.postprocess(srcimg, outs, padsize=(newh, neww, padh, padw))
        return srcimg

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Forward slashes work on both Windows and the Raspberry Pi
    parser.add_argument('--imgpath', type=str, default='./yolov5-v6.1-opencv-onnxrun/opencv/images/bus.jpg', help="image path")
    parser.add_argument('--modelpath', type=str, default='./yolov5-v6.1-opencv-onnxrun/opencv/weights/yolov5n.onnx')
    parser.add_argument('--confThreshold', default=0.3, type=float, help='class confidence')
    parser.add_argument('--nmsThreshold', default=0.5, type=float, help='nms iou thresh')
    parser.add_argument('--objThreshold', default=0.3, type=float, help='object confidence')
    args = parser.parse_args()

    yolonet = yolov5(args.modelpath, confThreshold=args.confThreshold, nmsThreshold=args.nmsThreshold,
                     objThreshold=args.objThreshold)

    cap = cv2.VideoCapture(0)

    while True:
        ret, frame = cap.read()
        if not ret:
            # Camera unavailable or stream ended
            break
        frame = cv2.flip(frame, 1)     # mirror horizontally (flip around the y axis)

        yolonet.detect(frame)          # draws the detections onto frame in place

        winName = 'Deep learning object detection in OpenCV'
        cv2.imshow(winName, frame)
        # Press Esc to leave the loop
        if cv2.waitKey(1) & 0xff == 27:
            break

    # Release resources
    cap.release()
    cv2.destroyAllWindows()
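
The coordinate handling in resize_image and postprocess is easier to follow with numbers. The sketch below re-implements just that arithmetic in plain Python, assuming a square 640 x 640 network input; the function names are illustrative and are not part of the repository.

```python
def letterbox_params(img_h, img_w, size=640):
    """Resized height/width and top/left padding for a keep-ratio resize into size x size."""
    top, left, new_h, new_w = 0, 0, size, size
    if img_h != img_w:
        hw_scale = img_h / img_w
        if hw_scale > 1:   # portrait: pad left and right
            new_h, new_w = size, int(size / hw_scale)
            left = int((size - new_w) * 0.5)
        else:              # landscape: pad top and bottom
            new_h, new_w = int(size * hw_scale), size
            top = int((size - new_h) * 0.5)
    return new_h, new_w, top, left

def to_original(cx, cy, img_h, img_w, size=640):
    """Map a box centre from network-input pixels back to original-image pixels."""
    new_h, new_w, top, left = letterbox_params(img_h, img_w, size)
    return (cx - left) * img_w / new_w, (cy - top) * img_h / new_h
```

For a 640 x 480 camera frame the image is resized to 640 x 480 and padded with 80 rows above and below, so a detection centred at (320, 320) in the network input maps back to (320, 240) in the frame.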

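2. Parameter tuning

parser.add_argument('--confThreshold', default=0.3, type=float, help='class confidence')
parser.add_argument('--nmsThreshold', default=0.5, type=float, help='nms iou thresh')
parser.add_argument('--objThreshold', default=0.3, type=float, help='object confidence')

The configuration of these yolov5 parameters is recorded below.

Meaning:

These parameters are the thresholds of the YOLO detector; they control its sensitivity and accuracy. Each one is explained below.

--confThreshold: class confidence threshold. It filters out predictions whose class confidence is low. In the code above, the value compared against it is the best class score multiplied by the objectness score, and detections below the threshold are discarded. A smaller --confThreshold keeps more detections but may include more false positives; a larger value removes low-confidence detections but may miss some real objects.
--nmsThreshold: IoU (Intersection over Union) threshold for non-maximum suppression (NMS). NMS suppresses overlapping boxes and keeps the one with the highest confidence: when the IoU of two boxes exceeds the threshold, the lower-confidence box is suppressed. A smaller --nmsThreshold therefore suppresses more aggressively and keeps fewer overlapping boxes, while a larger value keeps more of them.
--objThreshold: object confidence (objectness) threshold. It filters out boxes whose predicted probability of containing an object is low. A smaller --objThreshold keeps more detections but may include more false positives; a larger value removes low-confidence detections.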
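
To see how the score and IoU thresholds interact, here is a plain-Python illustration of greedy NMS. It is a teaching sketch rather than OpenCV's implementation; boxes use the same [left, top, width, height] layout as the code above, and the function names are my own.

```python
def iou(a, b):
    """IoU of two boxes given as [left, top, width, height]."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, score_threshold, iou_threshold):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted((i for i, s in enumerate(scores) if s > score_threshold),
                   key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept
```

Two 10 x 10 boxes offset by one pixel have an IoU of about 0.68, so the second one is suppressed at an IoU threshold of 0.5 but survives at 0.7.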

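Tuning methods:

Changing these parameters affects the detector's performance and results. Finding the best combination usually takes repeated experiments and evaluation. Some approaches:

Initial values: start from the defaults suggested by the author, for example --confThreshold 0.3, --nmsThreshold 0.5, and --objThreshold 0.3.
Grid search: choose a range of values for each parameter, then use grid search or random search to find the best combination within it, for example --confThreshold [0.1, 0.3, 0.5], --nmsThreshold [0.3, 0.5, 0.7], --objThreshold [0.1, 0.3, 0.5].
Tuning order: it is usually better to tune --confThreshold and --objThreshold first and --nmsThreshold afterwards, because NMS only sees the boxes that pass the first two thresholds, so its effect depends on them.
Dataset evaluation: evaluate each parameter combination on a labeled validation set, compare metrics such as precision and recall, and pick the best-performing combination.
Visual debugging: run detection with different values and inspect the results on images to judge whether the output meets your needs.

Note that tuning is a fairly involved process and has to be adapted to the specific dataset and task. Consider the actual application scenario and the characteristics of the data together, and evaluate on a validation set so that the chosen parameters also perform well on unseen data.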
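
The grid search described above can be sketched in a few lines. The evaluate callable is an assumption: it stands for whatever function runs the detector on your validation set with the given thresholds and returns a single score where higher is better.

```python
from itertools import product

def grid_search(evaluate, conf_values, nms_values, obj_values):
    """Try every threshold combination and return (best_score, best_params)."""
    best_score, best_params = None, None
    for conf, nms, obj in product(conf_values, nms_values, obj_values):
        score = evaluate(conf, nms, obj)
        if best_score is None or score > best_score:
            best_score = score
            best_params = {"confThreshold": conf, "nmsThreshold": nms, "objThreshold": obj}
    return best_score, best_params
```

With three values per parameter this runs 27 evaluations, so on a Raspberry Pi it may be more practical to run the search on a desktop machine and copy the chosen thresholds over.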

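VIII. Stereo-Vision Distance Measurement

IX. PyQt Setup

Installing with pip directly did not work, so PyQt was installed as described below. PyQt installed this way cannot run inside a Miniconda virtual environment, so it is used from the system Python.

【树莓派】：pyqt5安装（附code）_树莓派安装pyqt5_时间之里的博客-CSDN博客

1. Update the package sources (optional)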

sudo apt-get update
sudo apt-get upgrade

Download and install the packages PyQt5 needs:

sudo apt-get install python3-pyqt5
sudo apt-get install pyqt5-dev-tools
sudo apt-get install qttools5-dev-tools

Verify the installation by running the following command:

python3 -c "import PyQt5"

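If an Anaconda or Miniconda environment is installed, call the system interpreter explicitly so that the conda Python is not picked up:

/usr/bin/python3 -c "import PyQt5"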
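
When it is unclear which interpreter a script is running under, a short check helps. This is a rough sketch: detecting conda by the conda-meta directory under sys.prefix is a heuristic, and the function name is illustrative.

```python
import importlib.util
import os
import sys

def environment_report():
    """Describe the running interpreter and whether it can see PyQt5."""
    return {
        "executable": sys.executable,
        # conda environments keep their metadata in a conda-meta directory
        "in_conda_env": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
        "pyqt5_available": importlib.util.find_spec("PyQt5") is not None,
    }

print(environment_report())
```

Running it once with the system interpreter and once inside the conda environment shows which of the two can import PyQt5.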

Test program

from PyQt5.QtWidgets import QApplication, QLabel

app = QApplication([])
label = QLabel('Hello PyQt5')
label.show()
app.exec_()

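X. AI Digital Human

The AI digital human is made mainly with the tools linked below:

1. Midjourney, an AI image generation tool

2. D-ID, for creating the AI digital human

D-ID Creative Reality Studio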

posted @ 2023-08-26 21:27 乞力马扎罗山的雪
