计算机考研--复试英语 (Computer Science Postgraduate Entrance Exam -- Re-examination English)
Abstract 1
In recent years, machine learning has developed rapidly, especially deep learning, which has achieved remarkable results in image, speech, natural language processing, and other fields. The expressive power of machine learning algorithms has improved greatly; however, as model complexity increases, the interpretability of machine learning algorithms deteriorates. To date, the interpretability of machine learning remains a challenge. Models trained by these algorithms are regarded as black boxes, which seriously hampers the use of machine learning in certain fields, such as medicine and finance. At present, only a few works focus on the interpretability of machine learning. Therefore, this paper classifies, analyzes, and compares the existing interpretation methods: on the one hand, it expounds the definition and measurement of interpretability; on the other hand, for the different objects of interpretation, it summarizes and analyzes the interpretation techniques of machine learning from three aspects: model interpretation, prediction-result interpretation, and mimic-model interpretation. Moreover, the paper discusses the challenges and opportunities faced by interpretable machine learning methods and their possible future directions. The surveyed interpretation methods should also help put many open research questions in perspective.
摘 要 近年来,机器学习发展迅速,尤其是深度学习在图像、声音、自然语言处理等领域取得卓越成效.机器学习算法的表示能力大幅度提高,但是伴随着模型复杂度的增加,机器学习算法的可解释性越差,至今,机器学习的可解释性依旧是个难题.通过算法训练出的模型被看作成黑盒子,严重阻碍了机器学习在某些特定领域的使用,譬如医学、金融等领域.目前针对机器学习的可解释性综述性的工作极少,因此,将现有的可解释方法进行归类描述和分析比较,一方面对可解释性的定义、度量进行阐述,另一方面针对可解释对象的不同,从模型的解释、预测结果的解释和模仿者模型的解释3个方面,总结和分析各种机器学习可解释技术,并讨论了机器学习可解释方法面临的挑战和机遇以及未来的可能发展方向.
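The "mimic model" interpretation mentioned above can be illustrated with a global surrogate: a simple, interpretable model trained to imitate a black box. Below is a minimal sketch using scikit-learn; the random-forest "black box" and the synthetic dataset are assumptions for illustration, not the paper's method.

```python
# Mimic-model (global surrogate) interpretation: fit an interpretable
# decision tree to the *predictions* of an opaque model, then read the
# tree to understand what the black box has learned.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=6, random_state=0)

black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Train the surrogate on the black box's outputs, not on the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how often the surrogate agrees with the black box.
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"surrogate fidelity: {fidelity:.2%}")
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(6)]))
```

A high-fidelity shallow tree gives a human-readable approximation of the black box's decision logic, which is the intuition behind mimic-model interpretation.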
Abstract 2
Deep learning is an important field of machine learning research; thanks to its powerful feature extraction capabilities and state-of-the-art performance in many applications, it is widely used in industry. However, due to biases in training data labeling and model design, research shows that deep learning may aggravate human bias and discrimination in some applications, producing unfairness in decision-making processes and thereby causing negative impacts on both individuals and society. To improve the reliability of deep learning and promote its development in the field of fairness, we review, based on existing research, the sources of bias in deep learning, debiasing methods for different types of bias, fairness metrics for measuring the effect of debiasing, and currently popular debiasing platforms. Finally, we explore the open issues in the fairness research field and future development trends.
摘要: 深度学习是机器学习研究中的一个重要领域,它具有强大的特征提取能力,且在许多应用中表现出先进的性能,因此在工业界中被广泛应用.然而,由于训练数据标注和模型设计存在偏见,现有的研究表明深度学习在某些应用中可能会强化人类的偏见和歧视,导致决策过程中的不公平现象产生,从而对个人和社会产生潜在的负面影响.为提高深度学习的应用可靠性、推动其在公平领域的发展,针对已有的研究工作,从数据和模型2方面出发,综述了深度学习应用中的偏见来源、针对不同类型偏见的去偏方法、评估去偏效果的公平性评价指标、以及目前主流的去偏平台,最后总结现有公平性研究领域存在的开放问题以及未来的发展趋势.
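Two fairness metrics of the kind such surveys cover, demographic parity difference and equal opportunity difference, are easy to compute from predictions and a binary sensitive attribute. A minimal numpy sketch with toy random arrays (the data and group encoding are illustrative assumptions):

```python
import numpy as np

def demographic_parity_diff(y_pred, group):
    """|P(Y_hat=1 | group=0) - P(Y_hat=1 | group=1)|; 0 means parity."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equal_opportunity_diff(y_true, y_pred, group):
    """Difference in true-positive rates between the two groups."""
    tpr = lambda g: y_pred[(group == g) & (y_true == 1)].mean()
    return abs(tpr(0) - tpr(1))

# Toy data: binary predictions, labels, and a binary sensitive attribute.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
group = rng.integers(0, 2, 1000)
y_pred = rng.integers(0, 2, 1000)

print("demographic parity diff:", demographic_parity_diff(y_pred, group))
print("equal opportunity diff:", equal_opportunity_diff(y_true, y_pred, group))
```

Debiasing methods are typically judged by how far they push such metrics toward zero without sacrificing accuracy.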
Abstract 3
TensorFlow Lite (TFLite) is a lightweight, fast, cross-platform open-source machine learning framework designed specifically for mobile and IoT scenarios. It is part of TensorFlow and supports deployment on multiple platforms such as Android, iOS, embedded Linux, and MCUs. It greatly lowers the barrier for developers, accelerates the development of on-device machine learning (ODML), and helps ML run everywhere. This article introduces the trends, challenges, and typical applications of ODML; the origin and system architecture of TFLite; best practices and tool chains suitable for ML beginners; and the roadmap of TFLite.
摘要: TensorFlow Lite(TFLite)是一个轻量、快速、跨平台的专门针对移动和IoT场景的开源机器学习框架,是TensorFlow的一部分,支持安卓、iOS、嵌入式Linux以及MCU等多个平台部署.它大大降低开发者使用门槛,加速端侧机器学习的发展,推动机器学习无处不在.介绍了端侧机器学习的浪潮、挑战和典型应用;TFLite的起源和系统架构;TFLite的最佳实践,以及适合初学者的工具链;展望了未来的发展方向.
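The on-device inference flow TFLite supports can be sketched with its standard Python Interpreter API. The model file name below is a placeholder, and the input shape and dtype are read from whatever model is actually loaded:

```python
# Minimal TFLite inference sketch. "model.tflite" is a hypothetical path;
# shapes and dtypes are taken from the model's declared signature.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a random tensor matching the model's input specification.
shape = input_details[0]["shape"]
dtype = input_details[0]["dtype"]
interpreter.set_tensor(input_details[0]["index"],
                       np.random.random_sample(shape).astype(dtype))
interpreter.invoke()

print(interpreter.get_tensor(output_details[0]["index"]))
```

On Android or MCUs the same load/allocate/invoke pattern applies through the Java and C++ APIs.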
Abstract 4
The rapid development of the Internet has given rise to many new applications, including real-time multimedia services and remote cloud services. These applications have diverse quality-of-service requirements, which pose a significant challenge to today's best-effort routing algorithms. Following the recent successes of machine learning in games, computer vision, and natural language processing, many researchers have tried to design "smart" routing algorithms based on machine learning methods. In contrast to traditional model-based, decentralized routing algorithms (e.g., OSPF), machine-learning-based routing algorithms are usually data-driven, which lets them adapt to dynamically changing network environments and accommodate different service quality requirements. Data-driven routing algorithms based on machine learning have shown great potential to become an important part of the next-generation network. However, research on intelligent routing is still at a very early stage. In this paper we first survey current research on data-driven routing algorithms based on machine learning, presenting the main ideas, application scenarios, and the pros and cons of the different works. Our analysis shows that current research mainly addresses the principles of machine-learning-based routing algorithms and is still far from deployment in real scenarios. We then analyze different methods of training and deploying such routing algorithms in real scenarios and propose two reasonable approaches that achieve low overhead and high reliability. Finally, we discuss the opportunities and challenges and point out several potential future research directions for machine-learning-based routing algorithms.
摘要: 互联网的飞速发展催生了很多新型网络应用,其中包括实时多媒体流服务、远程云服务等.现有尽力而为的路由转发算法难以满足这些应用所带来的多样化的网络服务质量需求.随着近些年将机器学习方法应用于游戏、计算机视觉、自然语言处理获得了巨大的成功,很多人尝试基于机器学习方法去设计智能路由算法.相比于传统数学模型驱动的分布式路由算法而言,基于机器学习的路由算法通常是数据驱动的,这使得其能够适应动态变化的网络环境以及多样的性能评价指标优化需求.基于机器学习的数据驱动智能路由算法目前已经展示出了巨大的潜力,未来很有希望成为下一代互联网的重要组成部分.然而现有对于智能路由的研究仍然处于初步阶段.首先介绍了现有数据驱动智能路由算法的相关研究,展现了这些方法的核心思想和应用场景并分析了这些工作的优势与不足.分析表明,现有基于机器学习的智能路由算法研究主要针对算法原理,这些路由算法距离真实环境下部署仍然很遥远.因此接下来分析了不同的真实场景智能路由算法训练和部署方案并提出了2种合理的训练部署框架以使得智能路由算法能够低成本、高可靠性地在真实场景被部署.最后分析了基于机器学习的智能路由算法未来发展中所面临的机遇与挑战并给出了未来的研究方向.
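A data-driven routing algorithm of the kind surveyed can be illustrated at toy scale with tabular Q-learning on a small graph, where an agent learns next-hop choices that minimize hop count toward a destination. The topology, reward, and hyperparameters below are invented for illustration only:

```python
import random

# Toy topology: node -> neighbors. Goal: learn next hops toward node "D".
graph = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B", "D"], "D": []}
DEST, ALPHA, GAMMA, EPS = "D", 0.5, 0.9, 0.1

Q = {(n, nb): 0.0 for n, nbs in graph.items() for nb in nbs}

def next_hop(node, explore=True):
    """Epsilon-greedy choice over the learned per-link values."""
    nbs = graph[node]
    if explore and random.random() < EPS:
        return random.choice(nbs)
    return max(nbs, key=lambda nb: Q[(node, nb)])

for _ in range(5000):                            # training episodes
    node = random.choice(["A", "B", "C"])
    while node != DEST:
        nxt = next_hop(node)
        reward = 0.0 if nxt == DEST else -1.0    # each extra hop costs 1
        future = max((Q[(nxt, nb)] for nb in graph[nxt]), default=0.0)
        Q[(node, nxt)] += ALPHA * (reward + GAMMA * future - Q[(node, nxt)])
        node = nxt

print({s: next_hop(s, explore=False) for s in ["A", "B", "C"]})
```

Real proposals replace the table with neural networks and the hop-count reward with measured delay or throughput, but the train-then-deploy loop is the same, which is why the training and deployment frameworks discussed above matter.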
Abstract 5
In recent years, the rapid development of Internet technology has greatly facilitated people's daily lives, and an explosive growth of information has inevitably followed. How to obtain the required information from the Internet quickly and effectively is an urgent problem, and automatic text summarization can effectively alleviate it. As one of the most important areas in natural language processing and artificial intelligence, automatic text summarization uses computers to produce a concise, coherent summary from a long text or a collection of texts, where the summary should accurately reflect the central themes of the source. In this paper, we expound what automatic summarization entails, review the development of automatic text summarization techniques, and introduce the two main families in detail, extractive and abstractive summarization, covering feature scoring, classification methods, linear programming, submodular functions, graph-based ranking, sequence labeling, heuristic algorithms, deep learning, and so on. We also analyze the datasets and evaluation metrics commonly used in automatic summarization. Finally, we discuss the challenges ahead and predict future trends in research and application.
摘要: 近年来,互联网技术的蓬勃发展极大地便利了人类的日常生活,不可避免的是互联网中的信息呈井喷式爆发,如何从中快速有效地获取所需信息显得极为重要.自动文本摘要技术的出现可以有效缓解该问题,其作为自然语言处理和人工智能领域的重要研究内容之一,利用计算机自动地从长文本或文本集合中提炼出一段能准确反映源文中心内容的简洁连贯的短文.探讨自动文本摘要任务的内涵,回顾和分析了自动文本摘要技术的发展,针对目前主要的2种摘要产生形式(抽取式和生成式)的具体工作进行了详细介绍,包括特征评分、分类算法、线性规划、次模函数、图排序、序列标注、启发式算法、深度学习等算法.并对自动文本摘要常用的数据集以及评价指标进行了分析,最后对其面临的挑战和未来的研究趋势、应用等进行了预测.
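The simplest of the extractive techniques listed, feature scoring, fits in a few lines: score each sentence by the frequency of its content words and keep the top-scoring ones. A toy sketch with a hand-rolled tokenizer and stop list (both assumptions; real systems add position, length, and title features):

```python
import re
from collections import Counter

def extract_summary(text, n=2):
    """Frequency-based sentence scoring: a minimal extractive summarizer."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    stop = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "that"}
    freq = Counter(w for w in words if w not in stop)

    def score(sent):
        toks = re.findall(r"[a-z']+", sent.lower())
        return sum(freq[t] for t in toks if t not in stop) / max(len(toks), 1)

    top = sorted(sentences, key=score, reverse=True)[:n]
    return " ".join(s for s in sentences if s in top)  # keep original order

doc = ("Automatic summarization condenses a long text into a short one. "
       "Extractive methods select existing sentences from the text. "
       "Abstractive methods generate new sentences instead. "
       "Sentence scoring by word frequency is a classic extractive feature.")
print(extract_summary(doc))
```

Abstractive methods, by contrast, generate new sentences with sequence-to-sequence models rather than selecting existing ones.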
Abstract 6
With the rapid development of the Internet of Things, wearable devices, and mobile communication technology, large-scale data are continuously generated and converge at multiple data collectors, affecting people's lives in many ways. Meanwhile, this also causes increasingly severe privacy leakage. Traditional privacy-protection mechanisms such as differential privacy, encryption, and anonymization are not enough to cope with this situation. Moreover, data convergence leads to data monopolies, which seriously hinder the realization of big data's value. In addition, tampered data, single points of failure in data quality management, and similar problems can render data-driven decision-making untrustworthy. How to use big data correctly has become an important issue. For these reasons, we propose data transparency, aiming to provide a solution for the correct use of big data. Blockchain, which originated in digital currency, is decentralized, transparent, and immutable, offering an accountable and secure way to achieve data transparency. In this paper, we first give the definition and research dimensions of data transparency from the perspective of the big data life cycle, and we analyze and summarize the methods for realizing it. Then we summarize the research progress on blockchain-based data transparency. Finally, we analyze the challenges that may arise in realizing blockchain-based data transparency.
摘要: 物联网、穿戴设备和移动通信等技术的高速发展促使数据源源不断地产生并汇聚至多方数据收集者,由此带来更严峻的隐私泄露问题, 然而传统的差分隐私、加密和匿名等隐私保护技术还不足以应对.更进一步,数据的自主汇聚导致数据垄断问题,严重影响了大数据价值实现.此外,大数据决策过程中,数据非真实产生、被篡改和质量管理过程中的单点失败等问题导致数据决策不可信.如何使这些问题得到有效治理,使数据被正确和规范地使用是大数据发展面临的主要挑战.首先,提出数据透明化的概念和研究框架,旨在增加大数据价值实现过程的透明性,从而为上述问题提供解决方案.然后,指出数据透明化的实现需求与区块链的特性天然契合,并对目前基于区块链的数据透明化研究现状进行总结.最后,对基于区块链的数据透明化可能面临的挑战进行分析.
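The immutability property that makes blockchain a fit for data transparency reduces to hash chaining: each record commits to the hash of its predecessor, so any retroactive edit invalidates every later link. A minimal sketch (a plain hash chain, not a real blockchain; it has no consensus or distribution, and the record contents are made up):

```python
import hashlib, json, time

def make_block(data, prev_hash):
    """Append-only record that commits to its predecessor's hash."""
    block = {"time": time.time(), "data": data, "prev": prev_hash}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

def verify(chain):
    """Recompute every hash; any tampering invalidates the suffix."""
    for i, b in enumerate(chain):
        body = {k: b[k] for k in ("time", "data", "prev")}
        if b["hash"] != hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest():
            return False
        if i and b["prev"] != chain[i - 1]["hash"]:
            return False
    return True

chain = [make_block("genesis", "0" * 64)]
chain.append(make_block({"dataset": "sensor-A", "op": "collect"}, chain[-1]["hash"]))
chain.append(make_block({"dataset": "sensor-A", "op": "share"}, chain[-1]["hash"]))
print(verify(chain))                                        # True
chain[1]["data"] = {"dataset": "sensor-A", "op": "delete"}  # tamper
print(verify(chain))                                        # False
```

Logging each collection, use, and sharing event this way is the basic mechanism by which blockchain makes the big data life cycle auditable.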
Abstract 7
Blockchain is an emerging technology with the potential to revolutionize many traditional industries. Since the creation of Bitcoin, which represents blockchain 1.0, blockchain technology has been attracting extensive attention, and a great amount of user transaction data has accumulated. The birth of Ethereum, which represents blockchain 2.0, has further enriched the data types in blockchains. While the popularity of blockchain technology has brought about much technical innovation, it has also led to many new problems, such as user privacy disclosure and illegal financial activities. At the same time, the public accessibility of blockchain data provides an unprecedented opportunity for researchers to understand and resolve these problems through blockchain data analysis. It is therefore of great significance to summarize the existing research problems, the results obtained, the possible research trends, and the challenges faced in blockchain data analysis. To this end, this paper presents a comprehensive review of progress in blockchain data analysis. It begins by introducing the architecture and key techniques of blockchain technology and describing the main data types in blockchains together with the corresponding analysis methods. It then summarizes current research progress around seven problems: entity recognition, privacy disclosure risk analysis, network portrait, network visualization, market effect analysis, transaction pattern recognition, and illegal behavior detection and analysis. Finally, it explores directions, prospects, and challenges for future research based on the shortcomings of current work.
摘要: 区块链是一项具有颠覆许多传统行业的潜力的新兴技术.自以比特币为代表的区块链1.0诞生以来,区块链技术获得了广泛的关注,积累了大量的用户交易数据.而以以太坊为代表的区块链2.0的诞生,更加丰富了区块链的数据类型.区块链技术的火热,催生了大量基于区块链的技术创新的同时也带来许多新的问题,如用户隐私泄露,非法金融活动等.而区块链数据公开的特性,为研究人员通过分析区块链数据了解和解决相关问题提供了前所未有的机会.因此,总结目前区块链数据存在的研究问题、取得的分析成果、可能的研究趋势以及面临的挑战具有重要意义.为此,全面回顾和总结了当前的区块链数据分析的成果,在介绍区块链技术架构和关键技术的基础上,分析了目前区块链系统中主要的数据类型,总结了目前区块链数据的分析方法,并就实体识别、隐私泄露风险分析、网络画像、网络可视化、市场效应分析、交易模式识别、非法行为检测与分析等7个问题总结了当前区块链数据分析的研究进展.最后针对目前区块链数据分析研究中存在的不足分析和展望了未来的研究方向以及面临的挑战.
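Several of the seven problems (network portrait, entity recognition) start from the same primitive: building a transaction graph and aggregating per-address statistics. A toy sketch over fabricated transfers (addresses and amounts are invented, and the "portrait" heuristic is illustrative only):

```python
from collections import defaultdict

# Toy ledger: (sender, receiver, amount). Addresses are fabricated.
txs = [("addr1", "addr2", 5.0), ("addr2", "addr3", 3.0),
       ("addr1", "addr3", 1.5), ("addr3", "addr1", 0.5)]

out_deg, in_deg = defaultdict(int), defaultdict(int)
balance = defaultdict(float)
for src, dst, amt in txs:
    out_deg[src] += 1
    in_deg[dst] += 1
    balance[src] -= amt
    balance[dst] += amt

# A crude portrait: in real analyses, high turnover with near-zero net
# balance can flag pass-through (mixing-like) behavior for closer study.
for addr in sorted(balance):
    print(addr, "in:", in_deg[addr], "out:", out_deg[addr],
          "net:", round(balance[addr], 2))
```

Published analyses scale this idea to the full Bitcoin or Ethereum ledger and layer clustering heuristics on top to merge addresses into entities.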
Abstract 8
In recent years, as more and more large-scale scientific facilities have been built and major scientific experiments carried out, scientific research has entered an unprecedented big data era. Scientific research in this era is a process of big science, big demand, big data, big computing, and big discovery, and it is of great significance to develop a data management system covering the full life cycle of scientific big data. In this paper, we first introduce the background of the development of scientific big data management systems. We then specify the concept and three key characteristics of scientific big data. After a review of scientific data resource development projects and scientific data management systems, we propose a framework for full-life-cycle management of scientific big data. We further introduce the key technologies of this framework: data fusion, real-time analysis, long-term storage, cloud services, and data openness and sharing. Finally, we summarize the research progress in this field and look into the application prospects of scientific big data management systems.
摘要: 近年来,随着越来越多的大科学装置的建设和重大科学实验的开展,科学研究进入到一个前所未有的大数据时代.大数据时代科学研究是一个大科学、大需求、大数据、大计算、大发现的过程,研发一个支持科学大数据全生命周期的数据管理系统具有重要的意义.分析了研发科学大数据管理系统的背景,阐述了科学大数据的概念和三大特征,通过对科学数据资源发展和科学数据管理系统的研究进展进行综述分析,提出了满足科学数据管理全生命周期的科学大数据管理框架,并从数据融合、数据实时分析、长期存储、云服务体系以及数据开放共享机制5个方面分析了科学大数据管理系统中的关键技术.最后,结合科学研究领域展望了科学大数据管理系统的应用前景.
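The full-life-cycle idea can be pictured as a state machine over a dataset's stages. The stage names below loosely paraphrase the five key technologies named above and are an illustrative assumption, not the paper's actual schema:

```python
from dataclasses import dataclass, field

# Illustrative lifecycle stages; the surveyed framework may differ.
STAGES = ["fusion", "real-time analysis", "long-term storage",
          "cloud service", "open sharing"]

@dataclass
class ScientificDataset:
    name: str
    stage: int = 0
    history: list = field(default_factory=list)

    def advance(self):
        """Move to the next lifecycle stage, keeping an audit trail."""
        if self.stage >= len(STAGES):
            raise ValueError("life cycle already complete")
        self.history.append(STAGES[self.stage])
        self.stage += 1

ds = ScientificDataset("telescope-run-42")
while ds.stage < len(STAGES):
    ds.advance()
print(ds.history)
```

The point of managing stages explicitly is that each transition (ingest, analyze, archive, publish) can attach its own metadata, provenance, and access policy.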
Abstract 9
Recently, research on applying deep learning to cyberspace security has attracted increasing academic attention, and this survey analyzes the current research situation and trends in terms of classification algorithms, feature extraction, and learning performance. At present, deep learning is mainly applied to malware detection and intrusion detection, and this survey identifies the open problems in these applications together with prospective remedies: feature selection, which can be addressed by extracting features from raw data; self-adaptability, by using an early-exit strategy to update the model in real time; and interpretability, by using influence functions to obtain the correspondence between features and classification labels. The survey then summarizes the top 10 obstacles and opportunities in deep learning research and, on that basis, proposes for the first time the top 10 obstacles and opportunities of deep learning applied to cyberspace security, which fall into three categories. The first category comprises intrinsic vulnerabilities of deep learning to adversarial attacks and privacy-theft attacks. The second comprises problems related to sequence models, including program syntax analysis, program code generation, and long-term dependencies in sequence modeling. The third comprises learning performance problems, including poor interpretability and traceability, poor self-adaptability and self-learning ability, false positives, and data imbalance. The main obstacles among the top 10 and their opportunities are analyzed: applications using classification models are vulnerable to adversarial attacks, for which the most effective defense is adversarial training; collaborative deep learning applications are vulnerable to privacy-theft attacks, for which a prospective defense is the teacher-student model. Finally, future research trends of deep learning applied to cyberspace security are discussed.
摘要: 近年来,深度学习应用于网络空间安全的研究逐渐受到国内外学者的关注,从分类算法、特征提取和学习效果等方面分析了深度学习应用于网络空间安全领域的研究现状与进展.目前,深度学习主要应用于恶意软件检测和入侵检测两大方面,指出了这些应用存在的问题:特征选择问题,需从原始数据中提取更全面的特征;自适应性问题,可通过early-exit策略对模型进行实时更新;可解释性问题,可使用影响函数得到特征与分类标签之间的相关性.其次,归纳总结了深度学习发展面临的十大问题与机遇,在此基础上,首次归纳了深度学习应用于网络空间安全所面临的十大问题与机遇,并将十大问题与机遇归为3类:1)算法脆弱性问题,包括深度学习模型易受对抗攻击和隐私窃取攻击;2)序列化模型相关问题,包括程序语法分析、程序代码生成和序列建模长期依赖问题;3)算法性能问题,即可解释性和可追溯性问题、自适应性和自学习性问题、存在误报以及数据集不均衡的问题.对十大问题与机遇中主要问题及其解决方案进行了分析,指出对于分类的应用易受对抗攻击,最有效的防御方案是对抗训练;基于协作性深度学习进行分类的安全应用易受隐私窃取攻击,防御的研究方向是教师学生模型.最后,指出了深度学习应用于网络空间安全未来的研究发展趋势.
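The defensive recommendation highlighted above, adversarial training, can be sketched on a plain numpy logistic-regression model: perturb each batch in the direction of the loss gradient with respect to the input (the FGSM attack) and take the training step on the perturbed points. The Gaussian toy data, epsilon, and learning rate are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy two-class data: two Gaussian blobs in 2-D.
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

w, b, lr, eps = np.zeros(2), 0.0, 0.1, 0.3
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(300):
    # FGSM: perturb inputs along the sign of the input-gradient of the loss.
    p = sigmoid(X @ w + b)
    grad_x = (p - y)[:, None] * w          # d(loss_i)/d(x_i) for logistic loss
    X_adv = X + eps * np.sign(grad_x)

    # Standard gradient step, but taken on the adversarial batch.
    p_adv = sigmoid(X_adv @ w + b)
    w -= lr * X_adv.T @ (p_adv - y) / len(y)
    b -= lr * (p_adv - y).mean()

acc = ((sigmoid(X @ w + b) > 0.5) == y).mean()
print(f"clean accuracy after adversarial training: {acc:.2%}")
```

Training on worst-case perturbations flattens the loss around each point, which is why adversarially trained detectors are harder to evade with small input changes.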