The item will be made available for download at the end of the ordering process.

Automated Data Collection with R

A Practical Guide to Web Scraping and Text Mining
 E-Book
Available immediately | Delivery time: available immediately
ISBN-13:
9781118834787
Published:
2014
Binding:
E-Book
Pages:
480
Author:
Simon Munzert
eBook type:
PDF
eBook format:
Reflowable
Copy protection:
2 - DRM Adobe
Language:
English
Description:

A hands-on guide to web scraping and text mining for both beginners and experienced users of R.
* Introduces fundamental concepts of the main architecture of the web and databases and covers HTTP, HTML, XML, JSON, and SQL.
* Provides basic techniques to query web documents and data sets (XPath and regular expressions).
* An extensive set of exercises is presented to guide the reader through each technique.
* Explores both supervised and unsupervised techniques as well as advanced techniques such as data scraping and text management.
* Case studies are featured throughout along with examples for each technique presented.
* R code and solutions to exercises featured in the book are provided on a supporting website.

Table of contents:
Preface
1 Introduction
1.1 Case study: World Heritage Sites in Danger
1.2 Some remarks on web data quality
1.3 Technologies for disseminating, extracting, and storing web data
1.4 Structure of the book
Part One: A Primer on Web and Data Technologies
2 HTML
2.1 Browser presentation and source code
2.2 Syntax rules
2.3 Tags and attributes
2.4 Parsing
3 XML and JSON
3.1 A short example XML document
3.2 XML syntax rules
3.3 When is an XML document well formed or valid?
3.4 XML extensions and technologies
3.5 XML and R in practice
3.6 A short example JSON document
3.7 JSON syntax rules
3.8 JSON and R in practice
4 XPath
4.1 XPath--a query language for web documents
4.2 Identifying node sets with XPath
4.3 Extracting node elements
5 HTTP
5.1 HTTP fundamentals
5.2 Advanced features of HTTP
5.3 Protocols beyond HTTP
5.4 HTTP in action
6 AJAX
6.1 JavaScript
6.2 XHR
6.3 Exploring AJAX with Web Developer Tools
7 SQL and relational databases
7.1 Overview and terminology
7.2 Relational Databases
7.3 SQL: a language to communicate with Databases
7.4 Databases in action
8 Regular expressions and essential string functions
8.1 Regular expressions
8.2 String processing
8.3 A word on character encodings
Part Two: A Practical Toolbox for Web Scraping and Text Mining
9 Scraping the Web
9.1 Retrieval scenarios
9.2 Extraction strategies
9.3 Web scraping: Good practice
9.4 Valuable sources of inspiration
10 Statistical text processing
10.1 The running example: Classifying press releases of the British government
10.2 Processing textual data
10.3 Supervised learning techniques
10.4 Unsupervised learning techniques
11 Managing data projects
11.1 Interacting with the file system
11.2 Processing multiple documents/links
11.3 Organizing scraping procedures
11.4 Executing R scripts on a regular basis
Part Three: A Bag of Case Studies
12 Collaboration networks in the US Senate
12.1 Information on the bills
12.2 Information on the senators
12.3 Analyzing the network structure
12.4 Conclusion
13 Parsing information from semistructured documents
13.1 Downloading data from the FTP server
13.2 Parsing semistructured text data
13.3 Visualizing station and temperature data
14 Predicting the 2014 Academy Awards using Twitter
15 Mapping the geographic distribution of names
15.1 Developing a data collection strategy
15.2 Website inspection
15.3 Data retrieval and information extraction
15.4 Mapping names
15.5 Automating the process
16 Gathering data on mobile phones
16.1 Page exploration
16.2 Scraping procedure
16.3 Graphical analysis
16.4 Data storage
17 Analyzing sentiments of product reviews
17.1 Introduction
17.2 Collecting the data
17.3 Analyzing the data
17.4 Conclusion
References
General index
Package index
Function index
