ABSTRACT
With the rapid development of Internet technology, the network has become one of the main channels through which users receive information. Users obtain this information from text of many kinds, and together these texts form a huge distributed database that is heterogeneous and open. The spread of online information and its combination with computer technology have greatly accelerated the development of both, and the network has become one of the most important means of obtaining information today.
This paper briefly introduces the features of the Python programming language and of web crawlers, and designs a personalized news recommendation system based on a web crawler. The design adopts relatively recent and mature techniques. The main functions of the system are crawling information from news websites and storing and modifying that information. The paper analyzes the basic functions and components of the system, including requirements analysis, system architecture, functional module division and schema analysis. It focuses on the actual development and implementation of the application, which ensures the consistency and security of the data, and ensures that the application is functionally complete and meets the system requirements.
Keywords: web crawler; recommendation; Python
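1 Introduction
1.1 Research Background
With the rapid development of Internet technology, the network has become one of the main channels through which users receive information. Users obtain this information from text of many kinds, and together these texts form a huge distributed database that is heterogeneous and open. Text databases account for an especially large share of these databases, which has given rise to text mining [1]. Text mining is a process that runs from describing textual information to selecting and extracting patterns,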