Smart Camera: One-Stop Speech Recognition and Semantic Understanding (Face Detection, olami)

 

When reposting, please cite the original CSDN blog post: http://blog.csdn.net/ls0609/article/details/76546716

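The olami SDK converts recorded audio or text into a JSON string that an application can interpret, which is how it provides semantic understanding. Users can define their own grammars, so the semantics can be tailored to whatever the application needs. I have previously written two posts on speech recognition and semantic understanding, one on a voice-controlled online audiobook player and one on a voice-driven bookkeeping app. This post covers a voice-controlled smart camera.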

1. What the smart camera does

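The rear camera on a phone usually has a higher resolution than the front one. But if you point the rear camera at yourself for a selfie, you cannot see the screen, so how do you know whether you are in the frame? The smart camera built in this post solves that problem.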

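The goal is a camera app that switches cameras by voice, detects a face, and announces by voice whether the face is centered on the screen or which way it is off. Once the face is centered, the app tells the user that the photo can be taken. When the user says "拍照" (take a photo) or "茄子" (the Chinese equivalent of "cheese"), the app captures the picture automatically and saves it to the phone.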

Two screenshots of the running app:

 

 

 

 

 

 

2. Project directory structure in Eclipse

 

 

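The assets directory holds the resource files for TTS playback. 
The libs directory contains: 
libtts.so: native library required for TTS playback 
libspeex.so: native library required for speech recognition 
libolamsc.so: native library required for speech recognition

tts.jar: Java library required for TTS playback 
voicesdk_android.jar: Java library required for speech recognition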

 

3. AndroidManifest.xml

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.olami"
    android:versionCode="1"
    android:versionName="1.0" >

    <uses-sdk
        android:minSdkVersion="8"
        android:targetSdkVersion="14" />

    <uses-permission android:name="android.permission.RECORD_AUDIO"/>
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE"/>
    <uses-permission android:name="android.permission.ACCESS_WIFI_STATE"/>
    <uses-permission android:name="android.permission.READ_PHONE_STATE"/>
    <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
    <uses-permission android:name="android.permission.MOUNT_UNMOUNT_FILESYSTEMS"/>
    <uses-permission android:name="android.permission.CAMERA" />

    <application
        android:allowBackup="true"
        android:icon="@drawable/ic_launcher"
        android:label="@string/app_name"
        android:theme="@style/AppTheme" >
        <activity
            android:name=".MainActivity"
            android:label="@string/app_name" >
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />

                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>

</manifest>

 

The app needs permissions for audio recording, network access, reading and writing the SD card, and the camera.

 

4. Layout

<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <FrameLayout
        android:layout_width="match_parent"
        android:layout_height="match_parent">
        <SurfaceView android:id="@+id/sView"
            android:layout_width="match_parent"
            android:layout_height="wrap_content"/>

        <com.olami.FaceView
            android:id="@+id/faceView"
            android:layout_width="match_parent"
            android:layout_height="match_parent"/>
    </FrameLayout>

    <Button
        android:id="@+id/btn_start"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_alignParentBottom="true"
        android:layout_centerHorizontal="true"
        android:text="开始" />
</RelativeLayout>

 

 

 

 

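A custom FaceView is stacked on top of the SurfaceView inside the FrameLayout. FaceView draws the rectangle around the detected face. 
There is a button at the bottom of the screen. This version does not yet support voice wake-up (this post will be updated once it is added), so the button lets the user start listening manually whenever they want to say "拍照".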

 

5. MainActivity.java and FaceView.java

- 1.MainActivity.java

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.layout_camera);

    initHandler(); // handles messages posted by the recording-state callbacks
    initView(); // set up the UI
    initViaVoiceRecognizerListener(); // set up the olami recognition callback listener
    init(); // initialize the olami speech recognition SDK
    initTts(); // initialize TTS playback

    DisplayMetrics dm = new DisplayMetrics();
    getWindowManager().getDefaultDisplay().getMetrics(dm); // read the display metrics
    mScreenCenterx = dm.widthPixels / 2; // horizontal center of the screen (half the width)
    mScreenCentery = dm.heightPixels / 2; // vertical center of the screen (half the height)
}

 

The olami SDK is initialized as follows:

 

public void init()
{
    mOlamiVoiceRecognizer = new OlamiVoiceRecognizer(MainActivity.this);
    TelephonyManager telephonyManager =
            (TelephonyManager) getSystemService(Context.TELEPHONY_SERVICE);
    String imei = telephonyManager.getDeviceId();
    mOlamiVoiceRecognizer.init(imei); // device identifier; null is also accepted

    // listener that receives the recognition results
    mOlamiVoiceRecognizer.setListener(mOlamiVoiceRecognizerListener);

    // recognition language: Simplified Chinese
    mOlamiVoiceRecognizer.setLocalization(
            OlamiVoiceRecognizer.LANGUAGE_SIMPLIFIED_CHINESE);

    // Arguments, in order:
    //   app key: generated when the app is registered on the olami website
    //   api: use "asr", which selects speech recognition
    //   app secret: generated when the app is registered on the olami website
    //   seq: use "nli"
    mOlamiVoiceRecognizer.setAuthorization(
            "51a4bb56ba954655a4fc834bfdc46af1",
            "asr",
            "68bff251789b426896e70e888f919a6d",
            "nli");

    // trailing-silence timeout that ends a recording; 2000 ms is recommended
    mOlamiVoiceRecognizer.setVADTailTimeout(2000);
    // latitude and longitude; pass 0 if you prefer not to upload a location
    mOlamiVoiceRecognizer.setLatitudeAndLongitude(
            31.155364678184498, 121.34882432933009);
}

 

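Next, define the OlamiVoiceRecognizerListener. The code is omitted here, but its callbacks are:

onError(int errCode) // error callback; look up the code in the official documentation 
onEndOfSpeech() // recording ended 
onBeginningOfSpeech() // recording started 
onResult(String result, int type) // result is the recognition result as a JSON string 
onCancel() // recognition cancelled; no result will be returned 
onUpdateVolume(int volume) // input volume while recording, in levels from 1 to 12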

 

 

 

 

 

 

The handler's message processing, including the semantic parsing, is as follows:

 

private void initHandler()
{
    mHandler = new Handler(){
        @Override
        public void handleMessage(Message msg)
        {
            switch (msg.what){
            case MessageConst.CLIENT_ACTION_START_RECORED:
                mBtnStart.setText("录音中"); // "Recording"
                break;
            case MessageConst.CLIENT_ACTION_STOP_RECORED:
                mBtnStart.setText("识别中"); // "Recognizing"
                break;
            case MessageConst.CLIENT_ACTION_CANCEL_RECORED:
                mBtnStart.setText("开始"); // "Start"
                break;
            case MessageConst.CLIENT_ACTION_ON_ERROR:
                mBtnStart.setText("开始");
                break;
            case MessageConst.CLIENT_ACTION_UPDATA_VOLUME:
                //mTextViewVolume.setText("音量: "+msg.arg1);
                break;
            case MessageConst.SERVER_ACTION_RETURN_RESULT:
                mBtnStart.setText("开始");
                try{
                    String message = (String) msg.obj;
                    JSONObject jsonObject = new JSONObject(message);
                    JSONArray jArrayNli =
                            jsonObject.optJSONObject("data").optJSONArray("nli");
                    JSONObject jObj = jArrayNli.optJSONObject(0);
                    JSONArray jArraySemantic = null;
                    if(message.contains("semantic"))
                    {
                        jArraySemantic = jObj.getJSONArray("semantic");
                        // the first modifier of the first semantic entry names the action
                        String modifier =
                                jArraySemantic.optJSONObject(0)
                                        .optJSONArray("modifier").optString(0);
                        if("take_photo".equals(modifier))
                            capture();
                        else if("switch_camera".equals(modifier))
                            switchCamera();
                    }
                    else{
                        Log.i("ppp","result error");
                    }
                }
                catch(Exception e)
                {
                    e.printStackTrace();
                }
                break;
            case MessageConst.CLIENT_ACTION_UPDATA_FACEDECTION_DATA:
                if(mIsRecording)
                    break;
                RectF rect = (RectF) msg.obj;
                mLeft = rect.left;
                mRight = rect.right;
                mTop = rect.top;
                mBottom = rect.bottom;
                float centerx = mLeft + (mRight - mLeft)/2;
                float centery = mTop + (mBottom - mTop)/2;
                String promptString = "";
                if(centerx < mScreenCenterx && Math.abs(mScreenCenterx - centerx) > 100)
                    promptString = "位置偏左,"; // "too far left,"
                else if((centerx > mScreenCenterx) &&
                        (Math.abs(centerx - mScreenCenterx) > 100))
                    promptString = "位置偏右,"; // "too far right,"
                if((centery < mScreenCentery) &&
                        (Math.abs(mScreenCentery - centery) > 200))
                {
                    if("".equals(promptString))
                        promptString = "位置偏上"; // "too high"
                    else
                        promptString += "并且偏上"; // "and too high"
                }
                // compare against the vertical center (the original used mScreenCenterx here)
                else if((centery > mScreenCentery) &&
                        (Math.abs(centery - mScreenCentery) > 200))
                {
                    if("".equals(promptString))
                        promptString = "位置偏下"; // "too low"
                    else
                        promptString += "并且偏下"; // "and too low"
                }
                if("".equals(promptString))
                {
                    promptString = "位置已经居中,可以拍照了"; // "centered, ready to take the photo"
                    mIsCenter = true;
                }
                else
                {
                    mIsCenter = false;
                }

                ITtsListener ttsListener = new ITtsListener()
                {
                    @Override
                    public void onPlayEnd() {
                        // start listening only after the "centered" prompt has been spoken
                        if(mIsCenter)
                        {
                            if(mOlamiVoiceRecognizer != null)
                                mOlamiVoiceRecognizer.start();
                        }
                    }

                    @Override
                    public void onPlayFlagEnd(String arg0) {
                    }

                    @Override
                    public void onTTSPower(long arg0) {
                    }
                };
                TtsPlayer.playText(MainActivity.this,
                        promptString, ttsListener, Tts.TTS_SYSTEM_PRIORITY);
                break;
            }
        }
    };
}

 

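In the MessageConst.SERVER_ACTION_RETURN_RESULT case, the JSON string returned by the server is parsed to read the value of the modifier field: take_photo means take a photo, and switch_camera means switch cameras.

When the user says "拍照" or "茄子", the server returns the following JSON string: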

[
  {
    "desc_obj": {
      "status": 0
    },
    "semantic": [
      {
        "app": "camera",
        "input": "拍照",
        "slots": [],
        "modifier": [
          "take_photo"
        ],
        "customer": "58df512384ae11f0bb7b487e"
      }
    ],
    "type": "camera"
  }
]
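The dispatch on the modifier value can be kept separate from the JSON parsing. The sketch below is one way to do that in plain Java. `CameraCommand` and `fromModifier` are names made up for this illustration and are not part of the olami SDK, and the sketch assumes the modifier string has already been extracted from the response as in the handler above.

```java
/** Hypothetical helper (not part of the olami SDK): maps a modifier string to a camera action. */
enum CameraCommand {
    TAKE_PHOTO, SWITCH_CAMERA, NONE;

    /** Returns the command for a modifier value, or NONE for null or unrecognized values. */
    static CameraCommand fromModifier(String modifier) {
        if ("take_photo".equals(modifier)) {
            return TAKE_PHOTO;
        }
        if ("switch_camera".equals(modifier)) {
            return SWITCH_CAMERA;
        }
        return NONE; // missing or unknown modifier: do nothing
    }
}
```

In the handler, a `switch` over `CameraCommand.fromModifier(modifier)` could then call capture() or switchCamera(), and an unknown modifier would simply be ignored.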

 

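The grammars for "拍照", "茄子", and so on are user-defined. For details, see:

Introduction to writing grammars on the olami open platform: http://blog.csdn.net/ls0609/article/details/71624340 
Official documentation for olami open platform grammars: https://cn.olami.ai/wiki/?mp=nli&content=nli2.html

- 2.FaceView.java (face detection)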

 

 

 

 

 

public class FaceView extends View {
    private Camera.Face[] mFaces;
    private Paint mPaint;
    private Matrix matrix = new Matrix();
    private RectF mRectF = new RectF();
    private Handler mHandler;
    private long mCurrentTime;

    public void setFaces(Camera.Face[] faces) {
        mFaces = faces;
        invalidate();
    }

    public FaceView(Context context) {
        super(context);
        init(context);
    }

    public FaceView(Context context, AttributeSet attrs) {
        super(context, attrs);
        init(context);
    }

    public FaceView(Context context, AttributeSet attrs, int defStyleAttr) {
        super(context, attrs, defStyleAttr);
        init(context);
    }

    public void init(Context context) {
        mPaint = new Paint();
        mPaint.setColor(Color.RED);
        mPaint.setStrokeWidth(5f);
        mPaint.setStyle(Paint.Style.STROKE);
    }

    public void setHandler(Handler handler)
    {
        mHandler = handler;
    }

    @Override
    protected void onDraw(Canvas canvas) {
        super.onDraw(canvas);
        // nothing to draw when no face has been detected
        if (mFaces == null || mFaces.length == 0) {
            return;
        }
        // prepare the matrix that maps face coordinates to view coordinates
        MainActivity.prepareMatrix(matrix, false, 270, getWidth(), getHeight());
        canvas.save();
        matrix.postRotate(0);
        canvas.rotate(-0);
        RectF tempRectF = new RectF();
        long tempTime = System.currentTimeMillis();
        for (int i = 0; i < mFaces.length; i++) {
            mRectF.set(mFaces[i].rect); // read the face rectangle
            float temp = mRectF.top;
            mRectF.top = -mRectF.bottom;
            mRectF.bottom = -temp; // swap (and negate) top and bottom
            matrix.mapRect(mRectF);
            canvas.drawRect(mRectF, mPaint); // draw the rectangle
            tempRectF.set(mRectF);
            if (mHandler != null &&
                    ((mCurrentTime == 0) || ((tempTime - mCurrentTime)/1000) >= 4))
            { // at least 4 seconds since the last send: post the face rectangle
                mHandler.sendMessage(mHandler.obtainMessage(
                        MessageConst.CLIENT_ACTION_UPDATA_FACEDECTION_DATA, tempRectF));
                mCurrentTime = tempTime;
            }
            Log.i("ppp","mRectF.left = "+mRectF.left+"   mRectF.right = "+mRectF.right);
        }
        canvas.restore();
    }
}
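The time gate inside onDraw above can be isolated into a small class, which makes the interval easier to change and test. This is only a sketch: `SendThrottle` and `tryAcquire` are hypothetical names, and it takes the timestamp as a parameter instead of calling System.currentTimeMillis() itself.

```java
/** Hypothetical helper: lets an event through at most once per interval. */
final class SendThrottle {
    private final long intervalMillis;
    private long lastSentMillis = -1; // -1 means nothing has been sent yet

    SendThrottle(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /**
     * Returns true and records the time if this is the first call, or if at
     * least intervalMillis have passed since the last accepted call.
     */
    boolean tryAcquire(long nowMillis) {
        if (lastSentMillis < 0 || nowMillis - lastSentMillis >= intervalMillis) {
            lastSentMillis = nowMillis;
            return true;
        }
        return false;
    }
}
```

With `new SendThrottle(4000)`, a call to `tryAcquire(System.currentTimeMillis())` would play the role of the mCurrentTime comparison in the loop.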

 

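Because the custom FaceView works with a 270-degree rotation, the top and bottom values of the face rectangle have to be swapped. Otherwise the drawn rectangle fails to follow the face, either horizontally or vertically. The rectangle is sent at most once every 4 seconds. The handler in MainActivity.java receives this message and decides whether the face is centered: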

 

case MessageConst.CLIENT_ACTION_UPDATA_FACEDECTION_DATA:
    if(mIsRecording)
        break;
    RectF rect = (RectF) msg.obj;
    mLeft = rect.left;
    mRight = rect.right;
    mTop = rect.top;
    mBottom = rect.bottom; // remember the four edges of the face rectangle
    float centerx = mLeft + (mRight - mLeft)/2; // horizontal center of the rectangle
    float centery = mTop + (mBottom - mTop)/2; // vertical center of the rectangle
    String promptString = "";
    if(centerx < mScreenCenterx && Math.abs(mScreenCenterx - centerx) > 100)
        promptString = "位置偏左,"; // "too far left,"
    else if((centerx > mScreenCenterx) &&
            (Math.abs(centerx - mScreenCenterx) > 100))
        promptString = "位置偏右,"; // "too far right,"
    if((centery < mScreenCentery) &&
            (Math.abs(mScreenCentery - centery) > 200))
    {
        if("".equals(promptString))
            promptString = "位置偏上"; // "too high"
        else
            promptString += "并且偏上"; // "and too high"
    }
    // compare against the vertical center (the original used mScreenCenterx here)
    else if((centery > mScreenCentery) &&
            (Math.abs(centery - mScreenCentery) > 200))
    {
        if("".equals(promptString))
            promptString = "位置偏下"; // "too low"
        else
            promptString += "并且偏下"; // "and too low"
    }
    if("".equals(promptString))
    {
        promptString = "位置已经居中,可以拍照了"; // "centered, ready to take the photo"
        mIsCenter = true;
    }
    else
    {
        mIsCenter = false;
    }

    ITtsListener ttsListener = new ITtsListener()
    {
        @Override
        public void onPlayEnd() {
            // start listening only after the "centered" prompt has been spoken
            if(mIsCenter)
            {
                if(mOlamiVoiceRecognizer != null)
                    mOlamiVoiceRecognizer.start();
            }
        }

        @Override
        public void onPlayFlagEnd(String arg0) {
        }

        @Override
        public void onTTSPower(long arg0) {
        }
    };
    TtsPlayer.playText(MainActivity.this,
            promptString, ttsListener, Tts.TTS_SYSTEM_PRIORITY);
    break;

 

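The handler compares the center of the screen with the center of the detected face rectangle along both axes. If the horizontal centers differ by more than 100 pixels, the face is treated as off-center horizontally, and the sign of the difference tells whether it is to the left or to the right. If the vertical centers differ by more than 200 pixels, the face is treated as off-center vertically, either too high or too low. The 100 and 200 pixel thresholds can be tuned to whatever works best.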
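The same comparison can be written as a pure function with the two thresholds passed in, which makes them easy to tune. The sketch below is an illustration rather than the app's code: `FramingHint` and `describe` are hypothetical names, and it returns short English labels where the app speaks Chinese prompts.

```java
/** Hypothetical helper that mirrors the centering check with configurable tolerances. */
final class FramingHint {
    private FramingHint() {}

    /**
     * Describes where the face center (faceX, faceY) lies relative to the
     * screen center (screenX, screenY). Offsets within the tolerances count
     * as centered on that axis.
     */
    static String describe(float faceX, float faceY,
                           float screenX, float screenY,
                           float xTolerance, float yTolerance) {
        float dx = faceX - screenX; // negative: face is left of center
        float dy = faceY - screenY; // negative: face is above center
        StringBuilder hint = new StringBuilder();
        if (dx < -xTolerance) {
            hint.append("left");
        } else if (dx > xTolerance) {
            hint.append("right");
        }
        if (dy < -yTolerance) {
            hint.append(hint.length() == 0 ? "high" : " and high");
        } else if (dy > yTolerance) {
            hint.append(hint.length() == 0 ? "low" : " and low");
        }
        return hint.length() == 0 ? "centered" : hint.toString();
    }
}
```

For example, with a screen center of (540, 960) and tolerances of 100 and 200, a face centered at (700, 1200) is 160 pixels right of center and 240 pixels below it, so the result is "right and low".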

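TtsPlayer.playText speaks the prompt, and onPlayEnd() is called when playback finishes. If the face is centered, the user has just been told that the photo can be taken, so recording starts at this point. The user does not need to tap the button or use a wake word. Saying "拍照" or "茄子" is enough to take the photo.

6. Source code download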

https://pan.baidu.com/s/1qXITWs8

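7. Related links

Voice-controlled online audiobook player: http://blog.csdn.net/ls0609/article/details/71519203

Voice bookkeeping demo: http://blog.csdn.net/ls0609/article/details/72765789

Introduction to writing grammars on the olami open platform: http://blog.csdn.net/ls0609/article/details/71624340

Official documentation for olami open platform grammars: https://cn.olami.ai/wiki/?mp=nli&content=nli2.html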