webrtc && aiortc
WebRTC_API
https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API
WebRTC (Web Real-Time Communication) is a technology that enables Web applications and sites to capture and optionally stream audio and/or video media, as well as to exchange arbitrary data between browsers without requiring an intermediary. The set of standards that comprise WebRTC makes it possible to share data and perform teleconferencing peer-to-peer, without requiring that the user install plug-ins or any other third-party software.
WebRTC consists of several interrelated APIs and protocols which work together to achieve this. The documentation you'll find here will help you understand the fundamentals of WebRTC, how to set up and use both data and media connections, and more.
WebRTC concepts and usage
WebRTC serves multiple purposes; together with the Media Capture and Streams API, they provide powerful multimedia capabilities to the Web, including support for audio and video conferencing, file exchange, screen sharing, identity management, and interfacing with legacy telephone systems including support for sending DTMF (touch-tone dialing) signals. Connections between peers can be made without requiring any special drivers or plug-ins, and can often be made without any intermediary servers.
Connections between two peers are represented by the RTCPeerConnection interface. Once a connection has been established and opened using RTCPeerConnection, media streams (MediaStreams) and/or data channels (RTCDataChannels) can be added to the connection.

Media streams can consist of any number of tracks of media information; tracks, which are represented by objects based on the MediaStreamTrack interface, may contain one of a number of types of media data, including audio, video, and text (such as subtitles or even chapter names). Most streams consist of at least one audio track and likely also a video track, and can be used to send and receive both live media and stored media information (such as a streamed movie).

You can also use the connection between two peers to exchange arbitrary binary data using the RTCDataChannel interface. This can be used for back-channel information, metadata exchange, game status packets, file transfers, or even as a primary channel for data transfer.

Interoperability
WebRTC is in general well supported in modern browsers, but some incompatibilities remain. The adapter.js library is a shim to insulate apps from these incompatibilities.
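Since the rest of these notes use aiortc, here is a minimal sketch of the same offer/answer and data-channel flow in Python (assuming `pip install aiortc`). Both peers live in one process, so "signaling" is just handing the session descriptions across directly; in a real application they would travel through a signaling server.

```python
# Minimal aiortc sketch: two in-process peers negotiate a connection and
# exchange one message over an RTCDataChannel.
import asyncio
from aiortc import RTCPeerConnection


async def main():
    pc1, pc2 = RTCPeerConnection(), RTCPeerConnection()
    done = asyncio.Event()

    # pc1 creates the channel; pc2 is notified via the "datachannel" event
    channel = pc1.createDataChannel("chat")

    @channel.on("open")
    def on_open():
        channel.send("hello from pc1")

    @pc2.on("datachannel")
    def on_datachannel(remote):
        @remote.on("message")
        def on_message(message):
            print("pc2 received:", message)
            done.set()

    # offer/answer exchange; in aiortc, setLocalDescription waits for ICE
    # gathering, so the descriptions already contain all candidates
    await pc1.setLocalDescription(await pc1.createOffer())
    await pc2.setRemoteDescription(pc1.localDescription)
    await pc2.setLocalDescription(await pc2.createAnswer())
    await pc1.setRemoteDescription(pc2.localDescription)

    await done.wait()
    await pc1.close()
    await pc2.close()


asyncio.run(main())
```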
https://www.apizee.com/what-is-webrtc.php
What is WebRTC?
WebRTC is an open-source project that enables real-time communication and data transfer across browsers and devices. It makes real-time voice, text, and—importantly—video communication possible.
One of the key components of WebRTC is that it allows browsers and clients to communicate directly without having to send information to an intermediary server. It effectively cuts out the middle step so that data is kept between the users.
This is different from a bidirectional communication protocol like WebSockets, where real-time communication goes through the server before it reaches the end client.
The World Wide Web Consortium (W3C) developed browser-based WebRTC to allow secure, direct communication between browsers and devices. Most popular browsers (e.g., Google Chrome, Apple Safari, Mozilla Firefox, and Microsoft Edge) support this technology.
https://telecom.altanai.com/category/web-realtimecomm-webrtc/webrtc-standards/
https://www.researchgate.net/figure/WebRTC-Datachannel-Establish-Flows_fig6_308671323
P2P EXAMPLE
https://webrtc.github.io/samples/src/content/peerconnection/pc1/
OTHER
https://webrtc.github.io/samples/
aiortc
https://github.com/aiortc/aiortc
aiortc is a library for Web Real-Time Communication (WebRTC) and Object Real-Time Communication (ORTC) in Python. It is built on top of asyncio, Python's standard asynchronous I/O framework.

The API closely follows its JavaScript counterpart while using Pythonic constructs:
- promises are replaced by coroutines
- events are emitted using pyee.EventEmitter

To learn more about aiortc please read the documentation.

The main WebRTC and ORTC implementations are either built into web browsers or come in the form of native code. While they are extensively battle-tested, their internals are complex and they do not provide Python bindings. Furthermore, they are tightly coupled to a media stack, making it hard to plug in audio or video processing algorithms.

In contrast, the aiortc implementation is fairly simple and readable. As such, it is a good starting point for programmers wishing to understand how WebRTC works or to tinker with its internals. It is also easy to create innovative products by leveraging the extensive modules available in the Python ecosystem. For instance, you can build a full server handling both signaling and data channels, or apply computer vision algorithms to video frames using OpenCV.

Furthermore, a lot of effort has gone into writing an extensive test suite for the aiortc code to ensure best-in-class code quality.
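As a small illustration of the coroutine style and the OpenCV integration mentioned above (a sketch; the "input.mp4" filename is an assumption), frames can be pulled from any track with await track.recv() and converted to NumPy arrays:

```python
# Sketch: read a few frames from a file with MediaPlayer and process them with
# OpenCV. "input.mp4" is a placeholder; any MediaStreamTrack works the same way.
import asyncio

import cv2
from aiortc.contrib.media import MediaPlayer


async def main():
    player = MediaPlayer("input.mp4")            # exposes .audio / .video tracks
    for _ in range(10):
        frame = await player.video.recv()        # coroutine instead of a JS promise
        img = frame.to_ndarray(format="bgr24")   # av.VideoFrame -> NumPy (BGR for OpenCV)
        edges = cv2.Canny(img, 100, 200)         # any OpenCV processing
        print(frame.pts, img.shape, float(edges.mean()))


asyncio.run(main())
```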
Chinese-language tutorial
https://zhuanlan.zhihu.com/p/387772163
DEMO (integrates with aiohttp by default)
https://github.com/aiortc/aiortc/tree/main/examples/server
REACT VERSION
https://github.com/silverbulletmdc/py_webrtc_react_video_demo
Integration with FastAPI
Ships with a docker compose deployment
https://github.com/zoetaka38/fastapi-aiortc/tree/main
Face, eye, and smile detection
https://github.com/DJWOMS/webrtc_opencv_fastapi/tree/main
```python
import asyncio
import os

import cv2
from av import VideoFrame
# from imageai.Detection import VideoObjectDetection  # imported in the original demo but unused
from aiortc import MediaStreamTrack, RTCPeerConnection, RTCSessionDescription
from aiortc.contrib.media import MediaPlayer, MediaRelay, MediaBlackhole
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from starlette.requests import Request
from starlette.responses import HTMLResponse
from starlette.templating import Jinja2Templates

from src.schemas import Offer

ROOT = os.path.dirname(__file__)

app = FastAPI()
app.mount("/static", StaticFiles(directory="static"), name="static")
templates = Jinja2Templates(directory="templates")

# active peer connections, closed on shutdown
pcs = set()

# Haar cascades used by the "cv" transform
faces = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eyes = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")
smiles = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_smile.xml")


class VideoTransformTrack(MediaStreamTrack):
    """
    A video stream track that transforms frames from another track.
    """

    kind = "video"

    def __init__(self, track, transform):
        super().__init__()
        self.track = track
        self.transform = transform

    async def recv(self):
        frame = await self.track.recv()

        if self.transform == "cartoon":
            img = frame.to_ndarray(format="bgr24")

            # prepare color
            img_color = cv2.pyrDown(cv2.pyrDown(img))
            for _ in range(6):
                img_color = cv2.bilateralFilter(img_color, 9, 9, 7)
            img_color = cv2.pyrUp(cv2.pyrUp(img_color))

            # prepare edges
            img_edges = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
            img_edges = cv2.adaptiveThreshold(
                cv2.medianBlur(img_edges, 7),
                255,
                cv2.ADAPTIVE_THRESH_MEAN_C,
                cv2.THRESH_BINARY,
                9,
                2,
            )
            img_edges = cv2.cvtColor(img_edges, cv2.COLOR_GRAY2RGB)

            # combine color and edges
            img = cv2.bitwise_and(img_color, img_edges)

            # rebuild a VideoFrame, preserving timing information
            new_frame = VideoFrame.from_ndarray(img, format="bgr24")
            new_frame.pts = frame.pts
            new_frame.time_base = frame.time_base
            return new_frame
        elif self.transform == "edges":
            # perform edge detection
            img = frame.to_ndarray(format="bgr24")
            img = cv2.cvtColor(cv2.Canny(img, 100, 200), cv2.COLOR_GRAY2BGR)

            # rebuild a VideoFrame, preserving timing information
            new_frame = VideoFrame.from_ndarray(img, format="bgr24")
            new_frame.pts = frame.pts
            new_frame.time_base = frame.time_base
            return new_frame
        elif self.transform == "rotate":
            # rotate image
            img = frame.to_ndarray(format="bgr24")
            rows, cols, _ = img.shape
            M = cv2.getRotationMatrix2D((cols / 2, rows / 2), frame.time * 45, 1)
            img = cv2.warpAffine(img, M, (cols, rows))

            # rebuild a VideoFrame, preserving timing information
            new_frame = VideoFrame.from_ndarray(img, format="bgr24")
            new_frame.pts = frame.pts
            new_frame.time_base = frame.time_base
            return new_frame
        elif self.transform == "cv":
            # draw rectangles around detected faces and eyes
            img = frame.to_ndarray(format="bgr24")
            face = faces.detectMultiScale(img, 1.1, 19)
            for (x, y, w, h) in face:
                cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
            eye = eyes.detectMultiScale(img, 1.1, 19)
            for (x, y, w, h) in eye:
                cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
            # smile = smiles.detectMultiScale(img, 1.1, 19)
            # for (x, y, w, h) in smile:
            #     cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 5), 2)

            new_frame = VideoFrame.from_ndarray(img, format="bgr24")
            new_frame.pts = frame.pts
            new_frame.time_base = frame.time_base
            return new_frame
        else:
            return frame


def create_local_tracks(play_from=None):
    if play_from:
        player = MediaPlayer(play_from)
        return player.audio, player.video
    else:
        options = {"framerate": "30", "video_size": "1920x1080"}
        # platform-specific alternatives from the original example:
        # if platform.system() == "Darwin":
        #     webcam = MediaPlayer(
        #         "default:none", format="avfoundation", options=options
        #     )
        # elif platform.system() == "Windows":
        #     webcam = MediaPlayer("video.mp4")
        # else:
        #     webcam = MediaPlayer("/dev/video0", format="v4l2", options=options)
        webcam = MediaPlayer(
            "video=FULL HD 1080P Webcam", format="dshow", options=options
        )
        # audio, video = VideoTransformTrack(webcam.video, transform="cv")
        relay = MediaRelay()
        return None, relay.subscribe(webcam.video)


@app.get("/", response_class=HTMLResponse)
async def index(request: Request):
    return templates.TemplateResponse("index.html", {"request": request})


@app.get("/cv", response_class=HTMLResponse)
async def index_cv(request: Request):
    return templates.TemplateResponse("index_cv.html", {"request": request})


@app.post("/offer")
async def offer(params: Offer):
    offer = RTCSessionDescription(sdp=params.sdp, type=params.type)

    pc = RTCPeerConnection()
    pcs.add(pc)
    recorder = MediaBlackhole()

    @pc.on("connectionstatechange")
    async def on_connectionstatechange():
        print("Connection state is %s" % pc.connectionState)
        if pc.connectionState == "failed":
            await pc.close()
            pcs.discard(pc)

    # open media source
    audio, video = create_local_tracks()

    # handle offer
    await pc.setRemoteDescription(offer)
    await recorder.start()

    # add local media before creating the answer
    for t in pc.getTransceivers():
        if t.kind == "audio" and audio:
            pc.addTrack(audio)
        elif t.kind == "video" and video:
            pc.addTrack(video)

    # send answer
    answer = await pc.createAnswer()
    await pc.setLocalDescription(answer)

    return {"sdp": pc.localDescription.sdp, "type": pc.localDescription.type}


@app.post("/offer_cv")
async def offer_cv(params: Offer):
    offer = RTCSessionDescription(sdp=params.sdp, type=params.type)

    pc = RTCPeerConnection()
    pcs.add(pc)
    recorder = MediaBlackhole()
    relay = MediaRelay()

    @pc.on("connectionstatechange")
    async def on_connectionstatechange():
        print("Connection state is %s" % pc.connectionState)
        if pc.connectionState == "failed":
            await pc.close()
            pcs.discard(pc)

    @pc.on("track")
    def on_track(track):
        # if track.kind == "audio":
        #     pc.addTrack(player.audio)
        #     recorder.addTrack(track)
        if track.kind == "video":
            # echo the incoming video back through the requested transform
            pc.addTrack(
                VideoTransformTrack(
                    relay.subscribe(track), transform=params.video_transform
                )
            )
        # if args.record_to:
        #     recorder.addTrack(relay.subscribe(track))

        @track.on("ended")
        async def on_ended():
            await recorder.stop()

    # handle offer
    await pc.setRemoteDescription(offer)
    await recorder.start()

    # send answer
    answer = await pc.createAnswer()
    await pc.setLocalDescription(answer)

    return {"sdp": pc.localDescription.sdp, "type": pc.localDescription.type}


@app.on_event("shutdown")
async def on_shutdown():
    # close peer connections
    coros = [pc.close() for pc in pcs]
    await asyncio.gather(*coros)
    pcs.clear()
```
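The demo imports Offer from src.schemas, which is not included in the snippet. Judging from the fields the handlers read (sdp, type, video_transform), it is presumably a Pydantic model along these lines (a sketch, not the repository's exact file):

```python
# Sketch of the Offer schema consumed by /offer and /offer_cv above. Field names
# are inferred from params.sdp / params.type / params.video_transform; the actual
# src/schemas.py in the repository may differ.
from typing import Optional

from pydantic import BaseModel


class Offer(BaseModel):
    sdp: str                                # SDP from the browser's createOffer()
    type: str                               # usually "offer"
    video_transform: Optional[str] = None   # "cartoon" | "edges" | "rotate" | "cv"
```

With that in place, the app can be served with, e.g., uvicorn main:app (assuming the snippet lives in main.py).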
OTHER
https://github.com/tsonglew/webrtc-stream
https://github.com/jtboing/liveness-web-demo/tree/master
https://github.com/AllanGallop/Webcam_Censoring-webRTC-YOLOv3/tree/master