This page lists the PDF e-book of Streaming Systems (流式系统) by Tyler Akidau, Slava Chernyak, and Reuven Lax, in the Southeast University Press edition.
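Basic book information | |
Title | Streaming Systems (流式系统) |
Authors | Tyler Akidau, Slava Chernyak, Reuven Lax |
List price | 128 yuan |
Publisher | Southeast University Press (东南大学出版社) |
ISBN | 9787564183677 |
Publication date | 2019-06-01 |
Word count | |
Pages | |
Edition | |
Binding | Paperback |
Format | 16mo (16开) |
Shipping weight |
Summary | |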
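In a traditional data processing pipeline, data is always collected first and then loaded into a database. When people need it, they query the database to get answers or to do further processing. This looks reasonable, but in practice the approach is very constrained, especially for certain problems in real-time search applications, where MapReduce-style offline processing does not work well. This gave rise to a new computing model: stream computing. It can analyze large volumes of continuously changing data in real time while the data is in motion, capture potentially useful information, and send the results on to the next compute node. Streaming Systems (reprint edition) explains the principles of stream computing. |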
Contents | |
Preface Or: What Are You Getting Yourself Into Here?
Part Ⅰ. The Beam Model
1. Streaming 101; Terminology: What Is Streaming?; On the Greatly Exaggerated Limitations of Streaming; Event Time Versus Processing Time; Data Processing Patterns; Bounded Data; Unbounded Data: Batch; Unbounded Data: Streaming; Summary
2. The What, Where, When, and How of Data Processing; Roadmap; Batch Foundations: What and Where; What: Transformations; Where: Windowing; Going Streaming: When and How; When: The Wonderful Thing About Triggers Is Triggers Are Wonderful Things!; When: Watermarks; When: Early/On-Time/Late Triggers FTW!; When: Allowed Lateness (i.e., Garbage Collection); How: Accumulation; Summary
3. Watermarks; Definition; Source Watermark Creation; Perfect Watermark Creation; Heuristic Watermark Creation; Watermark Propagation; Understanding Watermark Propagation; Watermark Propagation and Output Timestamps; The Tricky Case of Overlapping Windows; Percentile Watermarks; Processing-Time Watermarks; Case Studies; Case Study: Watermarks in Google Cloud Dataflow; Case Study: Watermarks in Apache Flink; Case Study: Source Watermarks for Google Cloud Pub/Sub; Summary
4. Advanced Windowing; When/Where: Processing-Time Windows; Event-Time Windowing; Processing-Time Windowing via Triggers; Processing-Time Windowing via Ingress Time; Where: Session Windows; Where: Custom Windowing; Variations on Fixed Windows; Variations on Session Windows; One Size Does Not Fit All; Summary
5. Exactly-Once and Side Effects; Why Exactly Once Matters; Accuracy Versus Completeness; Side Effects; Problem Definition; Ensuring Exactly Once in Shuffle; Addressing Determinism; Performance; Graph Optimization; Bloom Filters; Garbage Collection; Exactly Once in Sources; Exactly Once in Sinks; Use Cases; Example Source: Cloud Pub/Sub; Example Sink: Files; Example Sink: Google BigQuery; Other Systems; Apache Spark Streaming; Apache Flink; Summary
Part Ⅱ. Streams and Tables
6. Streams and Tables; Stream-and-Table Basics Or: a Special Theory of Stream and Table Relativity; Toward a General Theory of Stream and Table Relativity; Batch Processing Versus Streams and Tables; A Streams and Tables Analysis of MapReduce; Reconciling with Batch Processing; What, Where, When, and How in a Streams and Tables World; What: Transformations; Where: Windowing; When: Triggers; How: Accumulation; A Holistic View of Streams and Tables in the Beam Model; A General Theory of Stream and Table Relativity; Summary
7. The Practicalities of Persistent State; Motivation; The Inevitability of Failure; Correctness and Efficiency; Implicit State; Raw Grouping; Incremental Combining; Generalized State; Case Study: Conversion Attribution; Conversion Attribution with Apache Beam; Summary
8. Streaming SQL; What Is Streaming SQL?; Relational Algebra; Time-Varying Relations; Streams and Tables; Looking Backward: Stream and Table Biases; The Beam Model: A Stream-Biased Approach; The SQL Model: A Table-Biased Approach; Looking Forward: Toward Robust Streaming SQL; Stream and Table Selection; Temporal Operators; Summary
9. Streaming Joins; All Your Joins Are Belong to Streaming; Unwindowed Joins; FULL OUTER; LEFT OUTER; RIGHT OUTER; INNER; ANTI; SEMI; Windowed Joins; Fixed Windows; Temporal Validity; Summary
10. The Evolution of Large-Scale Data Processing; MapReduce; Hadoop; Flume; Storm; Spark; MillWheel; Kafka; Cloud Dataflow; Flink; Beam; Summary
Index |
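About the authors | |
Tyler Akidau is a senior software engineer at Google, where he serves as technical lead of the Data Processing Languages & Systems group. He is also a founding member of the Apache Beam PMC.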
|
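Editor's recommendation | |
Streaming data is a big deal in big data these days. As more and more businesses seek to tame the massive unbounded data sets found around the world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way.
Expanded from Tyler Akidau's popular blog posts "Streaming 101" and "Streaming 102", this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You will also dive deep into watermarks and exactly-once processing with co-authors Slava Chernyak and Reuven Lax.
You will learn:
How streaming and batch data processing patterns compare
The core principles and concepts behind robust out-of-order data processing
How watermarks track progress and completeness in unbounded data sets
How exactly-once data processing techniques ensure correctness
How the concepts of streams and tables form the foundations of both batch and streaming data processing
The practical motivations behind a powerful persistent state mechanism, demonstrated with a real-world example
How time-varying relations link stream processing to the familiar world of SQL and relational algebra |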