ApacheCon Asia 2022

Free continuing

20,083 followers

Date 2022-07-29 08:30 ~ 08-31 23:55

Location Virtual

The event is organized by ApacheCon


CONFERENCE LIVE

THE OFFICIAL LIVE WEBSITE

2022-06-21

The conference is now live

ABOUT APACHECON

ApacheCon is the official global conference series of The Apache Software Foundation (ASF). Since 1998 – before the ASF’s incorporation – ApacheCon has been drawing participants at all levels to explore “Tomorrow’s Technology Today” across 300+ Apache projects and their diverse communities. ApacheCon showcases the latest developments in Apache projects and emerging innovations through hands-on sessions, keynotes, real-world case studies, training, hackathons, and more.

ApacheCon showcases the latest breakthroughs from ubiquitous Apache projects and upcoming innovations in the Apache Incubator, as well as open-source development and leading community-driven projects the Apache way. Attendees learn about core open source technologies independent of business interests, corporate biases, or sales pitches.

The ApacheCon program is dynamic, evolving at each event with content directly driven by select Apache project developers and user communities. ApacheCon delivers state-of-the-art content that features the latest open source advances in big data, cloud, community development, FinTech, IoT, machine learning, messaging, programming, search, security, servers, streaming, web frameworks, and more in a collaborative, vendor-neutral environment.

TRACK CHAIR

谭中意

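AI / Machine Learning

Apache Member and Apache brpc PPMC member with more than 20 years of open source experience at Sun, Baidu, Tencent and other companies. He now leads AI open source at 4Paradigm and is the initiator of 星策, an open source community for enterprise intelligent transformation.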

琚致远

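API / Microservices

琚致远 joined the Apache APISIX PPMC in October 2019 and, in March 2022, became an Apache Member and was named to the 中国开源码力榜 ranking. He leads the global team at API7.ai, a commercial open source software company, and has been an ApacheCon speaker, an OSPP/GSoC mentor and one of the core organizers of freeCodeCamp China.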

代立冬

Big Data

代立冬 is co-founder of 白鲸开源, an Apache Member, Apache DolphinScheduler PMC Chair, Apache SeaTunnel PPMC member and Apache Incubator mentor.

李岗

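Big Data

李岗 is an open source enthusiast, an initial committer and PMC member of Apache DolphinScheduler and a member of Apache Local Community (ALC) Beijing. He currently works as a data architecture optimization engineer at Lenovo Group.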

王晔倞

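Community

王晔倞 is an open source enthusiast and Apache APISIX committer, currently VP at 支流科技, responsible for community operations and serving as Head of Customer Success. 王晔倞 has worked at several listed companies in China and has 22 years of experience in the technology industry, including technical management and architecture design.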

李建盛

Open Source Culture

适兕 is an author, the lead creator of 「开源之道」 and a full member of Apache Local Community (ALC) Beijing.

姜宁

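Integration

姜宁 is a Member of the Apache Software Foundation (ASF) and the initiator of Apache Local Community (ALC) Beijing. He is a technical expert at Huawei's Open Source Management Center and a former principal software engineer at Red Hat, with more than fifteen years of experience developing enterprise open source middleware and extensive experience developing with and using Java. He is also a functional programming enthusiast.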

潘娟

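Incubator

潘娟 is co-founder and CTO of SphereEx, an Apache Member, an AWS Data Hero, an Apache ShardingSphere PMC member, and a mentor of Apache brpc (Incubating), Apache AGE (Incubating) and Apache HugeGraph (Incubating). 潘娟 is also a mentor of the China Mulan open source community and a Tencent TVP, and has been named a "2020 China Open Source Pioneer" and an OSCAR 尖峰开源人物.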

乔嘉林

IOT AND IIOT

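乔嘉林 holds a PhD from Tsinghua University, where he is an assistant researcher. A PMC member and founding member of Apache IoTDB, he has taken part from the very beginning in building the first Apache top-level project initiated by a Chinese university. He is a silver-level lecturer of the OpenAtom Foundation and a recipient of the first prize of the Beijing Science and Technology Progress Award. His personal WeChat official account is 「铁头乔」.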

王殿进

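Messaging

王殿进 is an ALC Beijing member and open source enthusiast who currently leads community work at StreamNative, driving the building and growth of the Apache Pulsar community, with a recent focus on the Apache community and open source operations.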

杜恒

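Messaging

杜恒 leads Alibaba Cloud's messaging hybrid cloud work and the RocketMQ community. An Apache Member, Apache RocketMQ PMC member and committer, OpenMessaging TSC member and one of the initiators of ALC Shenzhen, 杜恒 has years of experience in the architecture, design and development of distributed messaging middleware, large-scale delivery and systematic operations. Current interests include distributed middleware, K8s, microservices, IoT and Serverless.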

张亮

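Middleware

张亮 is a well-known practitioner in the database field with more than 10 years of exploration and hands-on experience. He is passionate about open source, specializes in distributed architecture and advocates elegant code. He has led architecture and database teams at several large internet companies. He is an Apache Member, Microsoft MVP, Alibaba Cloud MVP, Tencent Cloud TVP, Huawei Cloud MVP, and the founder and PMC Chair of Apache ShardingSphere. He is the author of the book 《未来架构——从服务化到云原生》 and published the paper "Apache ShardingSphere: A Holistic and Pluggable Platform for Data Sharding" at ICDE.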

冯嘉

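Middleware

冯嘉 is a senior technical expert in distributed systems, an open source pioneer in China's foundational software field, a Member of the Apache Software Foundation (ASF) and a Google Summer of Code mentor. 冯嘉 is currently chief technical expert for middleware at Huawei Cloud and head of its middleware team.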

吴晟

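Observability

吴晟 is a founding engineer at Tetrate, the founder and project VP of Apache SkyWalking, the first director from China on the Apache Software Foundation board and an Apache Software Foundation Member. He is also an AWS Container Hero and a Microsoft MVP.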

王伟冰

RPC

王伟冰 is an Apache bRPC PPMC member, a senior R&D engineer at Baidu and the technical lead for cloud-native microservices.

刘军

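RPC

刘军 is an Apache Dubbo PMC member and head of open source for Alibaba's microservice framework. A veteran of the microservices and RPC field, 刘军 has extensive experience in framework development and enterprise adoption.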

李钰

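Streaming

李钰 is an ASF Member, an Apache Flink and HBase PMC member and an Apache HugeGraph (incubating) mentor, and leads the Alibaba Cloud EMR platform technology and Flink storage engine teams as a senior technical expert.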

郭炜

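Workflow / Data Processing

郭炜 is an ASF Member and Apache Incubator mentor, founder of the ClickHouse Chinese community, an Apache DolphinScheduler PMC member and an Apache SeaTunnel (incubating) mentor. He was named one of China's 33 open source pioneers and one of China's outstanding open source figures of 2021. A graduate of Peking University, he has served as CTO of 易观, big data director at Lenovo Research and general manager of the data department at Wanda E-commerce, and has held senior big data positions at CICC, IBM and Teradata, making outstanding contributions to frontier big data research. He also takes part in several technical communities, including Presto, Alluxio and HBase, and is a leading figure in China's open source community.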

张乎兴

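Web Server

张乎兴 is an Apache Member, an Apache Tomcat and Dubbo PMC member, head of the Spring Cloud Alibaba community, founder of the OpenSergo project, an Arthas contributor and a senior technical expert at Alibaba Cloud.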

张文丽

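Data Visualization

羡辙 is an Apache Member and the PMC Chair and a core contributor of the Apache ECharts project. She is passionate about visualization and hopes data visualization can help more people understand the stories behind data.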

Schedule

2022-07-29

2022-07-30

2022-07-31

2022-07-29

09:00 -12:30

Keynote

2022-07-29

10:30-11:30

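Apache Success Stories in China

The Apache Software Foundation (ASF) is a non-profit charitable organization founded in 1999 whose mission is to "Provide software for the public good". The ASF grew out of the "Apache Group", a group of enthusiasts who developed the Apache HTTP Server. After more than twenty years of development, the ASF has become the world's largest open source foundation, stewarding more than 350 free enterprise-grade projects and more than 190 million lines of code, which form the foundation of applications used widely around the world.

What is the history between the ASF and China? What interesting stories have unfolded in the twenty-plus years since its founding? How have the ASF's success stories in China helped China's open source movement? This roundtable invites senior members of the Apache Software Foundation to share stories of the ASF's development in China.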

Speaker

姜宁

Huawei, Director of the Apache Software Foundation (ASF) for 2022

吴晟

Tetrate, Founding Engineer

刘天栋

开源社, Director

谭中意

Apache Software Foundation Member

江波

SegmentFault 思否, Operating Partner and COO

2022-07-29

11:30-12:30

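Roundtable: Startups Built on Top-Level Projects

As open source gains momentum, more and more commercial companies are building products around Apache top-level projects, and these startups have attracted considerable interest from investors. Examples include 支流科技, the company commercializing the open source microservice API gateway Apache APISIX; SphereEx, built on Apache ShardingSphere; 易观, which donated Apache DolphinScheduler; and Timecho (天谋科技), built around the time-series database Apache IoTDB. How does the title of Apache top-level project help commercialization, and how does it constrain it? How can open source and commercialization be balanced? We have invited 温铭, co-founder and CEO of 深圳支流科技; 张亮, founder and CEO of SphereEx; 代立冬, director of the big data platform at 易观 and Apache DolphinScheduler PPMC member; and 黄向东, Apache Member and Apache IoTDB PMC Chair, to share their views on this kind of company from different perspectives.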

Speaker

黄向东

Apache Member

温铭

深圳支流科技, Co-founder and CEO

代立冬

易观, Director of Big Data Platform

张亮

SphereEx, Founder and CEO

乔嘉林

IOT AND IIOT

2022-07-29

10:10-10:30

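Focusing on Open Source Foundational Software to Build a Thriving Open Source Ecosystem Together

Open source has become an important model for global technology innovation. Guided by the principle of "solving problems, creating value", Huawei continues to invest in and contribute to open source, and hopes to work with the Apache Software Foundation to advance a thriving open source ecosystem. This talk summarizes open source trends in China and abroad and shares Huawei's history and practice in open source, along with the progress of its foundational software open source projects. As an important link in the global open source software value chain, Huawei calls on open source companies, organizations, foundations and individual developers to work together to enrich the soil for open source in China and build a thriving ecosystem together.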

Speaker

任旭东

Huawei, Chief Open Source Liaison Officer

2022-07-29

09:40-10:10

InnerSource and the Apache Way: How to learn Open Source

In 1999, just after the term Open Source was coined, Sun Microsystems made a code contribution that created the Tomcat (Servlet API) project at the Apache Software Foundation (ASF). This was the beginning of my 20+ year relationship with the ASF, where I am still a Member. I was fascinated to witness the codification of how to write software using the massively peer-reviewed collaborative development method known as the Apache Way. Although not infallible, the Apache Way has helped tens of thousands of engineers (and organisations) learn how to work openly with fellow travelers and competitors in a public commons. Learn how and why I patterned the InnerSource Commons on the Apache Way and how a new wave of Open Source curious organisations in diverse fields of endeavour have allowed a practice of InnerSource to help them modernize their own engineering culture and prepare them for real Open Source engagement.

Speaker

Danese Cooper

Chair of the InnerSourceCommons.org

2022-07-29

09:10-09:40

What We All Need To Do Together To Secure The Open Source Software Supply Chain

Brian will speak about the ways in which the open source community has become vulnerable to new kinds of attacks on the software supply chain, and the efforts of many to address those challenges. Those efforts require new processes, new tools, and new initiatives to drive adoption. Heightened interest, particularly by governments of the world, has now driven the community to respond with a Mobilization Plan with specific goals. The talk ends with a specific list of things Apache projects can do to be more secure and support this global security effort.

Speaker

Brian Behlendorf

Open Source Security Foundation, General Manager

2022-07-29

09:05-09:10

Congratulations on the Opening of the ApacheCon Asia Online Conference

Speaker

陆首群

Professor

2022-07-29

13:30 -16:10

Incubator

2022-07-29

13:30-14:10

AGE's Journey to Graduation

The AGE team will tell the story of AGE from its introduction to the incubator to its graduation. The primary goal is to impart lessons learned by the PMC team on how to build a successful open source community.

Speaker

Abdisho, Eya

AgeDB, Technical Engineer

Gemignani, John

AgeDB, Lead Software Engineer

Innis, Josh

AgeDB, Senior Software Engineer

2022-07-29

14:10-14:50

How does Apache Pegasus (incubating) community develop at SensorsData

Sensors Data is one of the largest big data analytics and marketing technology service providers in China and now serves over 1500 companies globally. For distributed key-value storage, Sensors Data chose Apache Pegasus (incubating), a horizontally scalable, strongly consistent and high-performance key-value storage system. This presentation covers why Sensors Data chose Pegasus, how Pegasus solves problems in production use, and how the Sensors Data team contributes to and helps develop the community.

Speaker

YingChun, Lai

Sensors Data, Software Engineer

Dan, Wang

Sensors Data, Software Engineer

2022-07-29

14:50-15:30

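Why You Should Choose the Apache Incubator

tison is a Member of the Apache Software Foundation and a committer on several Apache projects. As a founding member, tison took part in the incubation of Apache InLong (incubating), and as a mentor helped Apache Kvrocks (incubating) enter the Incubator and build its open source community. Drawing on this experience, tison will discuss why open source participants should choose projects incubating at the Apache Incubator, and why decision makers responsible for corporate open source should choose to donate projects to the Apache Incubator.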

Speaker

陈梓立


2022-07-29

15:30-16:10

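The Past and Present of the Apache ShenYu Gateway

1. History of the ShenYu gateway 2. Features of the ShenYu gateway 3. Roadmap of the ShenYu gateway 4. Community governance of the ShenYu gateway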

Speaker

肖宇

JD Technology, Architect

2022-07-29

13:30 -15:30

Workflow / Data Processing

2022-07-29

13:30-14:10

The Practice of Apache DolphinScheduler as a unified scheduler center in Lenovo

This talk introduces the technology promotion, enterprise practice and future planning of Apache DolphinScheduler at Lenovo. Lenovo TDP selected Apache DolphinScheduler as its unified scheduler center. To empower more domains, several improvements were made on top of version 2.0.5: adding Java task plug-ins and clients, improving the HTTP task to support result transfer, and supporting Lenovo's internal authentication.

Speaker

Gang Li

Lenovo, Data Architecture Optimization Engineer

2022-07-29

14:10-14:50

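In-Depth Practice with Apache Oozie

Offline big data processing has matured in recent years, and in offline task orchestration, that is, workflows, new tools keep appearing. Oozie, Apache's first-generation workflow scheduler, is more than 10 years old, while the newer and popular DolphinScheduler and Airflow also draw a great deal of attention, which shows that workflow orchestration is still a very active field. This talk covers the in-depth use of Apache Oozie, a very old project, at iQIYI, in the following areas: an introduction to Oozie's features and architecture; its scale and use at iQIYI; support for and changes to Oozie's action plug-ins; scheduling multiple clusters from a single Oozie cluster; the roadmap; and community contributions.

Oozie features and architecture

Unlike Airflow and DolphinScheduler, Oozie has no complex master/slave architecture; it is stateless. Every node is a master and claims its own actions (that is, jobs) through ZooKeeper as coordinator. Its worker design also differs from that of the well-known DolphinScheduler: instead of maintaining a resource pool of workers, Oozie delegates the work to the Hadoop YARN cluster. Every Spark/Hive/MR/Flink action (all except the SSH action) is started by an Oozie launcher. The advantage is that there is no large worker cluster to maintain: Oozie's scalability depends on the capacity of Hadoop YARN, and Oozie's management cost drops. The drawback is just as clear: every Oozie launcher occupies 1 core / 1 GB. Before Oozie 5.0 it was 2 cores / 2 GB (the launcher was an MR job; from 5.x it is a launcher AM that Oozie implements on YARN itself). Oozie also provides many action types (comparable to DolphinScheduler's task types) and exposes generic low-level interfaces so that users can extend it. Oozie natively supports common big data actions such as hive/spark/mr/distcp, and we have added action types such as flink batch and tony. Functionally, Oozie offers scheduled start, triggered start, SLA alerts, restart from a specified node and more, which largely covers offline big data scheduling and orchestration. However, its hard-to-use UI and complex XML configuration have long put people off, so iQIYI uses Oozie only as the underlying scheduling engine and has built a data development platform on top of it with a visual drag-and-drop interface for usability. Oozie's stateless architecture has nonetheless supported the growth of iQIYI's offline big data jobs well.

Use at iQIYI

Since 2018, Oozie has supported a daily average of 20+ offline workflow tasks across more than 10 offline Hadoop clusters at iQIYI (20,000+ nodes in total), covering not only offline data development for recommendation, advertising and other businesses but also some machine learning workflow orchestration. Initially Oozie was deployed as its design intends: one Oozie cluster per Hadoop cluster (typically 2-4 physical machines in Oozie HA mode). The physical isolation of this deployment limits the scheduling load on any single Oozie cluster, which greatly benefits stability. But as the business grew and Hadoop clusters were built across multiple clouds and data centers, binding Oozie to Hadoop created a heavy operations burden: every Oozie change needed about a week of canary rollout. We therefore reworked Oozie's deployment model in depth. In addition, because all internal offline job submission was consolidated onto Oozie as the scheduling engine, requirements multiplied and many action types were added, such as the flink batch action.

Oozie action plug-in support and changes

When we first adopted Oozie and moved the Spark job submission entry point from gateway machines to Oozie, we ran into compatibility problems. Oozie's native Spark submission mode uses Hadoop's distributed cache to download the user's Spark jar into the launcher's task directory when the launcher starts. With this approach, if the user's Spark jar bundles YARN dependencies that are incompatible with the cluster, the Spark job fails outright. Deferring the download of the Spark jar until after the launcher has started solved this problem completely. Also, while migrating crontab jobs from the company's gateway machines to the platform early on, we made heavy use of the ssh action (Oozie logs in remotely to the user's gateway machine to run the task). We fixed a number of bugs and problems in the ssh action and contributed all of them back to the community. We also found some subtle bugs, such as an occasional scheduling deadlock: [OOZIE-3646] Possible dead-lock in SignalXCommand.

As the Oozie-based scheduling service matured, we also tried new things, including Flink batch on Oozie and using Oozie to schedule machine learning training jobs. The Flink batch action is an Oozie action implemented to advance unified stream and batch processing. During implementation we found that Flink did not yet support Oozie's Hadoop delegation token mechanism, so we contributed several PRs to Apache Flink (FLINK-21700, FLINK-21768, FLINK-22294, FLINK-22329 and FLINK-22534; links are listed at the end). To support machine learning workflow orchestration, we also introduced TonY (TensorFlow/PyTorch on YARN) on Oozie, implementing a tony action to connect data processing and sample generation with machine learning training.

Scheduling jobs on multiple Hadoop clusters from a single Oozie cluster

With many Hadoop clusters, both self-built and in the cloud, and with differing Hadoop versions, Oozie's original deployment model carried a high operations cost, since every Hadoop cluster needed its own Oozie cluster. We also wanted HA at the cluster scheduling level for high-priority jobs. To get there we set out a three-step plan: gain cross-cluster scheduling capability, implement fine-grained scheduling isolation, and implement intelligent scheduling. For cross-cluster scheduling, as our understanding of Oozie deepened we began modifying it in depth: by separating scheduling-cluster configuration from job-cluster configuration, a single Oozie cluster can schedule public cloud and self-built Hadoop 2.x and 3.x clusters at the same time. After this change, adding support for a new Hadoop cluster on the Oozie side takes only a few minutes. Scheduling the jobs of many clusters from one Oozie cluster inevitably causes interference between clusters, so we implemented a policy that assigns the scheduling of each cluster to specific Oozie servers, isolating scheduling load and keeping jobs stable. We are now exploring intelligent job scheduling in combination with iQIYI's internal QBFS (a virtual file system built on the Hadoop HCFS interface), hiding the underlying physical clusters from users and presenting a unified logical resource cluster.

Roadmap

1. Continue exploring intelligent job scheduling in combination with QBFS
2. Add fine-grained canary capability for Oozie code releases, down to specific workflows and specific users, to improve release stability

Community contributions to Oozie and related projects

1. https://issues.apache.org/jira/browse/OOZIE-3646
2. https://issues.apache.org/jira/browse/OOZIE-3379
3. https://issues.apache.org/jira/browse/OOZIE-3594
4. https://issues.apache.org/jira/browse/OOZIE-3393
5. https://issues.apache.org/jira/browse/OOZIE-3569
6. https://issues.apache.org/jira/browse/OOZIE-3574
7. https://issues.apache.org/jira/browse/OOZIE-3589

Adding the Flink batch action, compatibility on the Flink side:

8. https://issues.apache.org/jira/browse/FLINK-21700
9. https://issues.apache.org/jira/browse/FLINK-21768
10. https://issues.apache.org/jira/browse/FLINK-22294
11. https://issues.apache.org/jira/browse/FLINK-22329
12. https://issues.apache.org/jira/browse/FLINK-22534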

Speaker

张俊帆

爱奇艺大数据工程师

2022-07-29

14:50-15:30

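Secondary Development and Practice of Apache DolphinScheduler at China Unicom Big Data

China Unicom Big Data carried out secondary development on Apache DolphinScheduler 1.3.2 and runs and maintains it on 80+ servers deployed internally, adding multi-machine backend management, version upgrades, batch management of go-live operations, and integration with other components.

1. Contributed many features to the community, including extended global variables, definition of input and output parameters for workflow components, and definition of work groups, which appear in the 2.0.x releases.
2. Because of the large number of servers deployed, added multi-machine backend features such as master and worker version upgrades, bringing services online and offline, and batch management.
3. Added retention of task execution records for each deployed server.
4. Optimized overall execution performance, among other work.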

Speaker

刘武

China Unicom Digital Technology Co., Ltd., Data Development Engineer

2022-07-29

13:30 -17:30

Streaming A

Stream Processing in Production

2022-07-29

13:30-14:10

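Unified Stream and Batch Processing with Apache Flink at JD Logistics

Unified stream and batch processing is one of the most important features of Flink as a new-generation compute engine. This talk presents JD Logistics' thinking, exploration and practice in this direction, covering:

- big data use cases at JD Logistics;
- the necessity and feasibility of unified stream and batch processing;
- putting Flink's unified stream and batch processing into practice in the logistics business;
- future plans.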

Speaker

康琪

JD.com, Technical Expert

2022-07-29

14:10-14:50

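Real-Time Data Integration at Xiaomi with Flink SQL

In this talk we describe the challenges Xiaomi faced in real-time data integration while moving to a data lake architecture, and our thinking and practice in this area. Xiaomi began moving to a data lake architecture in 2021. As a key part of the data lake ecosystem, real-time data integration faces many challenges around unified stream and batch processing, schema evolution, integration of heterogeneous data systems and resumable transfers. The talk presents the real-time data integration engine Xiaomi is incubating on Flink SQL and Flink CDC, and our practice of real-time data integration for the data lake.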

Speaker

胡焕

Xiaomi Group, Senior Software Engineer

2022-07-29

14:50-15:30

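Stability Optimization for Apache Flink on Large-Scale Clusters

This talk systematically presents Tencent's stability optimization practice for Apache Flink on large-scale clusters from three angles: 1) reducing failures; 2) limiting their impact; and 3) detecting failures quickly. We hope it gives users a practical reference for building stability into Apache Flink.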

Speaker

邱从贤

Tencent Technology Co., Ltd., Senior Development Engineer

2022-07-29

15:30-16:10

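Practice and Adoption of an Apache Flink-Based Real-Time Dataflow Framework in JD Retail

Based on the characteristics of JD's business, JD Retail's Data and Intelligence department built a Flink-based real-time computing framework to improve developer productivity in specific scenarios, with the aim of building dataflow scenarios tailored to Flink. This talk shares technical scenarios and solutions the department has accumulated in machine learning engineering and related data analysis, including but not limited to:

1) Ranking lists: solving multi-stream join and TopN scenarios.
2) Query path analysis: using Flink Gelly to write graph analysis results into a multidimensional OLAP store and then running streaming analysis queries against it, giving data analysts capabilities such as A/B causal analysis for QP scenarios.
3) Machine learning: building a Flink-based machine learning workflow that forms a closed loop internally, including but not limited to real-time feature generation, sample joining, feature engineering, model training and model inference, with unified batch and stream processing and reusable operators across the whole pipeline.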

Speaker

张颖

JD.com, Algorithm Engineer

闫莉刚

JD.com, Senior Technical Expert

2022-07-29

16:10-16:50

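Flink in Practice at Tencent Ads: Feature Production, Training Samples and Strategy Computation

The stream processing engine behind Tencent Ads is gradually moving from Apache Spark (Spark Streaming) to Apache Flink. In this talk we describe our business scenarios in feature production, training samples and strategy computation, and some of the challenges, experience and lessons from moving from Spark Streaming to Apache Flink. This includes upgrades and optimizations we made to the Flink core for massive data (40 TB), so that it supports features such as fast recovery from failures with 40 TB of state and using streaming jobs for batch work (which is not the same as unified stream and batch processing).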

Speaker

林立伟

Tencent Technology (Beijing) Co., Ltd., Technical Lead for Feature Production, Sample Data and Strategy Framework at Tencent Ads

2022-07-29

16:50-17:30

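Kingsoft Cloud's Apache Flink-Based Real-Time Computing Platform and Its Use in Epidemic Prevention

Kingsoft Cloud's real-time computing platform dates from early 2015. It aims to be a one-stop real-time development platform that lowers the learning and operations barrier that developers and operators face when using open source engines directly, and improves development efficiency. Built on Apache Flink, the platform provides one-stop development, testing, deployment and operations for real-time applications, and currently serves a large number of enterprise customers in finance, healthcare, internet and other industries. The talk has two parts. The first presents the platform's practice and thinking at the product and architecture levels. The second describes how, in one city's big data project for epidemic prevention, Apache Flink was used to make epidemic prevention data more timely, and how the platform responded quickly to real-time computing needs for such data, providing more timely data for the prevention authorities' real-time dashboards and for prevention work in general.

Outline: a. History of Kingsoft Cloud's real-time computing platform b. Product architecture and core features c. Technical architecture d. How the core features are implemented e. Use in epidemic prevention f. Future development

Key points: a complete introduction to the platform's product form and technical architecture; an in-depth look at how core features such as Flink on K8s and online fault diagnosis are implemented; and the real-time computing needs that arose in epidemic prevention and how Apache Flink on the platform met them quickly while providing a stable real-time data outlet.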

Speaker

郑舒力

Kingsoft Cloud, R&D Expert

2022-07-29

13:30 -17:30

Middleware

2022-07-29

13:30-14:10

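Apache EventMesh: An Event-Driven Distributed Multi-Runtime

Apache EventMesh is an event-driven distributed service runtime. Through a dynamic, plug-in based cloud-native infrastructure layer, it decouples applications from the middleware layer and provides flexible, reliable and fast event distribution and processing, along with event management. It can act as the connection layer between application processes, giving enterprises the full set of inter-process communication patterns they need to reach their digital transformation goals. This talk covers the evolution of the EventMesh community and EventMesh's features.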

Speaker

薛炜明

WeBank Co., Ltd., Middleware Platform Development Engineer

2022-07-29

14:10-14:50

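An Analysis of Apache ShardingSphere Scaling

Once a business reaches a certain scale, a traditional database runs into bottlenecks such as large data volumes, slow queries and the inability to grow elastically. How can this be solved efficiently, cheaply and safely? Based on the Database Plus concept, Apache ShardingSphere provides a powerful distributed enhanced computing engine on top of traditional databases. Its sharding-based horizontal scaling solves these distributed database problems for users effectively and smoothly. The data migration and data synchronization provided by Scaling help traditional databases switch smoothly to ShardingSphere, and Scaling also provides elastic scaling of database nodes, giving users a complete distributed database solution that is high-performance, highly available and scalable.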

Speaker

钟红胜

SphereEx, Database Middleware Development Engineer

2022-07-29

14:50-15:30

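Design and Implementation of Apache Kvrocks (Incubating)

Open-sourced in 2019, Apache Kvrocks (Incubating) aims to reduce Redis's memory cost while increasing storage capacity. It is now used at scale in production at a number of companies in China and abroad. This session focuses on the evolution of Kvrocks and its design and implementation.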

Speaker

林添毅

AfterShip, Technical Manager

2022-07-29

15:30-16:10

Apache ZooKeeper and Apache Curator Meet the Dining Philosophers

A ZooKeeper walks into a pub … (actually an Outback pub), and ends up helping some Philosophers solve their fork resource contention problem. This talk is an introduction to using Apache ZooKeeper and Apache Curator to solve a new variant of the classic computer science Dining Philosophers problem. We’ll introduce ZooKeeper (a mature and widely-used de facto technology for distributed systems coordination) and the Dining Philosophers problem, and explore how we used Apache Curator (a high-level Java client for ZooKeeper) to implement the solution and show how it works. We tested the application on Instaclustr’s new managed Apache ZooKeeper cloud service, so we can also reveal performance results using a single ZooKeeper server vs. an Ensemble. Finally, we take a look at the progress to remove ZooKeeper from Apache Kafka. Even though Apache Kafka may be leaving the Zoo(keeper) soon, there are still lots of distributed applications in need of some coordination help, and it’s worth learning about Apache ZooKeeper and Curator.
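For readers unfamiliar with the Dining Philosophers problem, the following is a minimal single-process sketch. It is not the talk's solution: the talk uses Apache Curator against ZooKeeper in Java, whereas this sketch swaps in Python's in-process `threading.Lock` as a stand-in for a distributed lock, and it avoids deadlock with the textbook resource-ordering rule (an assumption here, not something the abstract states). The function name `dine` and its parameters are hypothetical.

```python
import threading
from typing import List


def dine(num_philosophers: int = 5, meals_each: int = 3) -> List[int]:
    """Run the simulation and return how many meals each philosopher ate."""
    if num_philosophers < 2:
        raise ValueError("need at least two philosophers (and two forks)")

    # One lock per fork; in the talk's setting these would be distributed locks.
    forks = [threading.Lock() for _ in range(num_philosophers)]
    meals = [0] * num_philosophers

    def philosopher(i: int) -> None:
        left, right = i, (i + 1) % num_philosophers
        # Always take the lower-numbered fork first. A single global lock
        # order rules out a circular wait, so the threads cannot deadlock.
        first, second = min(left, right), max(left, right)
        for _ in range(meals_each):
            with forks[first]:
                with forks[second]:
                    meals[i] += 1  # "eat" while holding both forks

    threads = [
        threading.Thread(target=philosopher, args=(i,))
        for i in range(num_philosophers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return meals


if __name__ == "__main__":
    print(dine())
```

Each philosopher only ever updates its own slot in `meals`, so no extra lock is needed for the counters. Without the ordering rule, every philosopher could pick up the left fork at the same time and wait forever for the right one.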

Speaker

Paul Brebner

Instaclustr, Chief Technology Evangelist

2022-07-29

16:10-16:50

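Embracing Cloud Native: Reworking ShardingSphere for the Cloud on Kubernetes

Background

As Kubernetes has grown mature and stable, SphereEx, in exploring ShardingSphere's future, has come to see it as an important milestone for ShardingSphere's next stage. Against this background, the SphereEx cloud team has made reworking ShardingSphere for the cloud a key goal. The team ran into many problems and challenges along the way, and this talk shares those problems and how the team responded.

What is the Kubernetes Operator pattern?

The Kubernetes Operator pattern lets you extend a cluster's capabilities without modifying Kubernetes' own code, by associating controllers with one or more custom resources. An Operator is a client of the Kubernetes API that acts as the controller for a user-defined custom resource.

What is a Kubernetes CRD?

A custom resource is an extension of the Kubernetes API. Users can extend Kubernetes with custom resources, and Kubernetes provides the following capabilities:

1. Clients and the CLI can update the resource directly.
2. The kubectl command also supports the resource directly.
3. Kubernetes' automation mechanisms can detect changes to the resource and then act on the objects associated with it.
4. Native Kubernetes capabilities can be used for automated handling.

Problems encountered

In moving ShardingSphere-Proxy to the cloud, the team hit several problems. One is the lack of a standard deployment process for cloud environments: at this stage ShardingSphere-Proxy supports only Docker and binary deployment. Another is that ShardingSphere-Proxy depends on ZooKeeper as its governance node. Stateful services such as ZooKeeper are not well suited to the cloud; because of the Pod lifecycle and its strong horizontal scaling, Kubernetes is better at handling stateless applications. With ZooKeeper in the picture, ShardingSphere-Proxy becomes less stateless, a thorny problem the cloud team needed to solve.

Solution

ShardingSphere's exploration of Kubernetes environments has never stopped. To improve adaptability and self-operation in Kubernetes environments, the SphereEx cloud team combined the Kubernetes Operator pattern with ShardingSphere to rework ShardingSphere-Proxy for the cloud. Combined with Kubernetes-native capabilities, this lets ShardingSphere-Proxy run on its own in cluster mode in Kubernetes without ZooKeeper.

Taking ShardingSphere-Proxy, a database governance middleware, as the example, the Operator pattern improves how ShardingSphere-Proxy behaves in Kubernetes, mainly through automated operations, automatic scaling and automatic node status monitoring. An Operator also reduces the complexity of the deployment structure and of deployment itself. ShardingSphere-Proxy already officially supports deployment with Helm, but Helm can only quickly deploy a ShardingSphere-Proxy cluster and manage its versions; subsequent automated operations, including cluster status checks and self-healing, are missing. ShardingSphere-Operator fills exactly this gap. Using declarative resource descriptions, ShardingSphere-Operator helps users reach the desired final state of a ShardingSphere-Proxy deployment simply and quickly, improving the stability and maintainability of ShardingSphere-Proxy clusters in the cloud.

Custom resources, as a Kubernetes extension, can also store the metadata that a running ShardingSphere-Proxy cluster depends on. Relying on the list/watch capability of CRDs in Kubernetes, metadata changes can be broadcast to the other nodes in the cluster, updating their runtime metadata in real time.

DistSQL (Distributed SQL) is an operating language specific to Apache ShardingSphere. It is used in exactly the same way as standard SQL and provides SQL-level operations for incremental functionality. With ZooKeeper removed, metadata changes can come in through multiple entry points, and the team offers its own approach to supporting both configuration written through DistSQL and direct user edits to CRDs: combining ShardingSphere-Sidecar with ShardingSphered lets ShardingSphere-Proxy support both entry points at once. In the future ShardingSphere-Sidecar may also support the Service Mesh xDS protocol, which would greatly increase ShardingSphere-Proxy's extensibility in the cloud.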
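The declarative idea behind the Operator pattern described in this abstract can be illustrated with a toy reconcile loop. This is only a sketch under stated assumptions: it does not call the Kubernetes API and is not ShardingSphere-Operator's code. The types `ProxyClusterSpec` and `ClusterState`, their fields, the pod naming scheme and the version strings are all hypothetical, and an in-memory dict stands in for the cluster.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProxyClusterSpec:
    """Desired state, as a user might declare it in a custom resource (hypothetical fields)."""
    replicas: int
    version: str


@dataclass
class ClusterState:
    """Observed state: pod name -> running version (stand-in for a real cluster)."""
    pods: Dict[str, str] = field(default_factory=dict)


def reconcile(spec: ProxyClusterSpec, state: ClusterState) -> List[str]:
    """Drive the observed state toward the spec and return the actions taken."""
    actions: List[str] = []

    # 1. Remove pods that run a version other than the declared one.
    for name, version in sorted(state.pods.items()):
        if version != spec.version:
            del state.pods[name]
            actions.append(f"delete {name}")

    # 2. Scale up: create pods in the first free name slots.
    index = 0
    while len(state.pods) < spec.replicas:
        name = f"proxy-{index}"
        if name not in state.pods:
            state.pods[name] = spec.version
            actions.append(f"create {name}")
        index += 1

    # 3. Scale down: remove the highest-named surplus pods.
    excess = len(state.pods) - spec.replicas
    for name in sorted(state.pods, reverse=True)[:excess]:
        del state.pods[name]
        actions.append(f"delete {name}")

    return actions


if __name__ == "__main__":
    state = ClusterState()
    print(reconcile(ProxyClusterSpec(replicas=2, version="v1"), state))
    print(reconcile(ProxyClusterSpec(replicas=2, version="v1"), state))  # no-op
```

The property worth noticing is idempotence: running `reconcile` again with an unchanged spec produces no actions, which is what lets a controller re-run safely whenever a watch event fires.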

Speaker

李卓

SphereEx, Cloud R&D Engineer

2022-07-29

16:50-17:30

Building a Cloud-Native Data Flow Platform with Apache EventMesh

The design and use cases of a cloud-native data flow platform built on Apache EventMesh.

Speaker

梁荣华

WeBank, Middleware Development Engineer

2022-07-29

17:30-17:30

Tea break

2022-07-29

13:30 -17:30

Integration

2022-07-29

14:10-14:50

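Architecture of DevLake, a Data Integration Platform for Engineering Productivity

DevLake is an open source engineering data platform that provides automated, one-stop data collection, analysis and visualization, helping engineering teams better understand their development process and find key bottlenecks and opportunities to improve efficiency. The biggest challenges in building a multi-source integration platform like DevLake are the variety of data sources and the volume of data. The original architecture had three problems: data loss, data distortion, and the need to re-request APIs after every small adjustment. This talk shares how the team overcame these challenges as the architecture evolved, and we hope it offers a reference for designing data integration and processing frameworks.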

Speaker

陈映初

思码逸 (Merico), Software Development Engineer

2022-07-29

14:50-15:30

Citizen Streaming Engineer - A How To

Democratizing the ability to build streaming data pipelines will enable everyone who needs streaming data to build them on their own. The open source streaming stack known as FLiPNS lets us achieve this. FLiPNS is a stack of Apache Flink, Apache Pulsar, Apache Spark and Apache NiFi. Apache NiFi provides a web UI for citizen streaming engineers (CSEs) to build their own data pipelines. These citizen applications can be used to ingest data and to route, transform, enrich, join and store it. As part of these applications, Pulsar will provide a streaming data hub for ML model access as well as for plugging in additional processing components.

Speaker

Spann, Timothy

StreamNative, Developer Advocate

2022-07-29

15:30-16:10

Camel K goes Quarkus Native

Cloud native development requires your microservices to have a small resource footprint and a fast startup. Classic Java (and Apache Camel) applications are not very well suited to Cloud Native. We'll learn how Camel K can leverage the Quarkus Native build to transparently transform your application into a first-class Cloud Native citizen. Thanks to the Kamelet technology and recent developments in Camel K, we'll see how this is possible and how the resulting application is a faster version of the classic one with a smaller memory footprint. Pasquale will show a demo illustrating various "operational" aspects that will help you understand how best to operate a Camel K Quarkus native application running on Kubernetes.

Speaker

Congiusti, Pasquale

Red Hat, Software Engineer

2022-07-29

16:10-16:50

Integrating systems in the age of Quarkus, serverless and Kafka

Have you ever got the task to implement an exchange of data between two systems that were not designed to communicate with each other? I bet you have, and I dare to introduce a couple of tools and approaches making the task easier to accomplish. First, I’ll speak about Apache Camel, the Swiss knife of integrating heterogeneous systems. It offers 300+ connectors out of the box, to transfer data to and from a wide variety of systems. The toolbox also brings options to route, filter and transform data based on the wildest requirements of a modern or legacy enterprise. Second, I will show what fun it is to write Camel integrations on top of Quarkus. You’ll learn about the famous Quarkus dev mode - the background compilation & live reload of the application while coding for faster dev cycles. Further, I’ll talk about dev services - an automatic provisioning of a required external service, such as Kafka broker or a database when testing or developing. Bonus: Quarkus applications start in milliseconds and consume just a few tens of megabytes of RAM. Third, I will explain how the outstanding integration capabilities of Apache Camel enrich serverless architectures based on Knative. I will touch topics like auto-scaling and scaling to zero, content based routing of cloud events, as well as streaming data between Apache Kafka and the 300+ kinds of systems supported by Apache Camel.

Speaker

Bendhiba Zineb

Red Hat, Senior Software Engineer

2022-07-29

16:50-17:30

Up & Running: Low Code Cloud-Native Integrations

Let's face it, integrations are better in the cloud. Learn how you can use open source software to design and deploy your integrations in a matter of minutes using Kaoto, a developer-friendly low code integration tool. Kaoto's lightweight architecture is based on Apache Camel and its battle-tested enterprise integration patterns.

Speaker

Yordán, Rachel

Red Hat, Principal Software Engineer

2022-07-29

13:30 -17:30

Big Data B

2022-07-29

13:30-14:10

Building a real-time analytics dashboard with Apache Kafka, Apache Pinot, and Streamlit

When you hear "decision-maker," it's natural to think of "C-suite" or "executive." But these days, we're all decision-makers. Restaurant owners, bloggers, big-box shoppers, and diners have important decisions to make and need instant actionable insights. Businesses need access to fast, fresh analytics to provide these insights to end-users like us. In this session, we will learn how to build our own real-time analytics application on top of a streaming data source using Apache Kafka, Apache Pinot, and Streamlit. Kafka is a distributed, open-source pub-sub messaging and streaming platform for real-time workloads, Pinot is an OLAP database designed for ultra-low latency analytics, and Streamlit is a Python-based tool that makes it easy to build data-driven apps. After introducing these tools, we'll stream data into Kafka using its Python client, ingest that data into a Pinot real-time table, and write basic queries using Pinot's Python SDK. Once we've done that, we'll bring everything together with an auto-refreshing Streamlit dashboard to see changes to the data as they happen. There will be lots of graphs and other visualizations! This session is aimed at application developers and data engineers who want to quickly make sense of streaming data.

Speaker

Dhanushka, Dunith

StarTree, Developer Advocate

Wolok, Karin

StarTree, Head of Developer Community and Marketing

2022-07-29

14:10-14:50

Apache Ozone: Multi-Protocol aware system handles both Files and Objects efficiently

Apache Ozone is a distributed, scalable and high-performance object store that can scale to billions of objects of varying sizes. Apache Ozone recently implemented a multi-protocol aware bucket feature, in which a single Ozone cluster offers the capabilities of both a Hadoop Compatible File System (“HCFS”) and an Object Store (like Amazon S3). In this talk we will deep dive into the unified and extensible architectural design in Ozone for representing directories, files, objects and buckets, which allows interoperability between hierarchical file system and object store protocols. This multi-protocol capability will be attractive to systems that are primarily oriented towards file-system-like workloads but would like to add some Object Store feature support. For example, a user can ingest data into Ozone using the FileSystem API, and the same data can be accessed via the Ozone S3 API (Amazon S3 implementation). This could improve the efficiency of a user platform with an on-prem Object Store. Furthermore, data stored in Ozone can be shared across various use cases, eliminating the need for data duplication, which in turn reduces risk and optimises resource utilisation. Finally, we will also talk about the roadmap to leverage this new design to introduce a hash-based locking mechanism that allows more concurrent metadata namespace operations (a mixture of writes and reads) by replacing the global bucket-level lock.
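The roadmap item mentioned at the end, replacing one global bucket-level lock with hash-based locking, can be pictured with a generic lock-striping sketch. The abstract gives no design details, so everything below is an assumption for illustration: the class name `StripedLock`, the use of CRC32 as the hash, and the key format are hypothetical and not Ozone's implementation.

```python
import threading
import zlib


class StripedLock:
    """Map each key to one of a fixed number of locks by hashing the key.

    Operations on keys that hash to different stripes can proceed in
    parallel, unlike a single global lock that serializes everything.
    """

    def __init__(self, stripes: int = 16) -> None:
        if stripes < 1:
            raise ValueError("stripes must be at least 1")
        self._locks = [threading.Lock() for _ in range(stripes)]

    def stripe_for(self, key: str) -> int:
        # CRC32 is deterministic across runs, unlike Python's built-in hash().
        return zlib.crc32(key.encode("utf-8")) % len(self._locks)

    def lock_for(self, key: str) -> threading.Lock:
        return self._locks[self.stripe_for(key)]


if __name__ == "__main__":
    locks = StripedLock(stripes=8)
    with locks.lock_for("/volume/bucket/dir/file-a"):
        # Only operations whose keys land on the same stripe wait here.
        print("holding stripe", locks.stripe_for("/volume/bucket/dir/file-a"))
```

The trade-off is that two unrelated keys may still collide on one stripe; more stripes reduce such false contention at the cost of more lock objects.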

Speaker

Radhakrishnan, Rakesh

Cloudera Private Limited, Staff Software Engineer

Singh, Mukul Kumar

Cloudera Private Limited, Senior Engineering Manager

2022-07-29

14:50-15:30

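Building a Unified & Serverless Spark Gateway at eBay with Apache Kyuubi (Incubating)

Apache Kyuubi (Incubating) is a big data gateway with multi-tenancy and distributed features that serves enterprise big data scenarios such as ETL and BI reporting. Its current main direction is to build, on its own architecture and around mainstream compute engines, a Serverless SQL on Lakehouse service; supported engines currently include Spark, Flink and Trino. eBay software engineer 王斐 will present the architecture and main features of Apache Kyuubi, its many use cases, how eBay built a Unified & Serverless Spark Gateway on Apache Kyuubi, and the latest features and roadmap of the Apache Kyuubi community.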

Speaker

王斐

eBay, Staff Engineer, Apache Kyuubi PPMC Member

2022-07-29

15:30-16:10

Optimization and practice of Apache InLong in Tencent Cloud

1. Introduction to Apache InLong 2. InLong Sort overall architecture 3. InLong Sort implementation details 4. Practice case

Speaker

Yunqing Mo

Tencent Cloud, Senior Big Data Engineer

2022-07-29

16:10-16:50

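Cloud-Native Flink/Spark with Zeppelin in Practice

This talk describes how to build a job development and management platform on Kubernetes with Zeppelin, and our practice of running and managing Flink/Spark on native K8s.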

Speaker

陶克路

ByteDance, Infrastructure R&D Engineer

王正

ByteDance, Engineer

2022-07-29

16:50-17:30

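Lineage-Based Data Discovery for Offline Data Warehouses

In the context of data middle platforms, users of offline data warehouses often need to answer questions such as:

- Which Hive tables contain data from business A?
- Do copies of a Hive table's core columns cause permission amplification?
- Does a Hive table contain confidential data that must be cleaned up?

These can all be classified as data discovery problems. ByteDance runs SQL analysis on offline data warehouse jobs to build lineage between Hive tables, and uses a label propagation algorithm to solve data discovery in an automated, engineered way, avoiding the long cycles, high cost, low accuracy and inflexibility of manual labeling. Data discovery includes, but is not limited to, business classification and grading of Hive tables and columns and identification of confidential fields. Example use cases:

1. Which Hive tables contain tracking data from business A: analyze the execution logic of the ETL SQL together with its filter conditions to decide whether a downstream table will contain data where app='a'.
2. An ODS table stores confidential data in a complex structure: SQL analysis of downstream tables identifies whether they contain part of that structure, for example the data for one key of a map, and thus detects permission amplification.
3. Whether a confidential field has already been masked.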
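As a rough illustration of propagating labels along table lineage, here is a minimal sketch. It is an assumption-laden simplification, not ByteDance's algorithm: labels are pushed unconditionally from every upstream table to its downstream tables, whereas the abstract describes also analyzing SQL filter conditions and column-level structure. The function name `propagate_labels` and the table names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def propagate_labels(
    lineage: Dict[str, List[str]],
    seed_labels: Dict[str, Iterable[str]],
) -> Dict[str, Set[str]]:
    """Push labels downstream through a lineage graph.

    lineage maps an upstream table to its downstream tables; seed_labels
    maps a table to the labels known up front (for example from manual
    tagging). Returns the labels of every table reached.
    """
    labels: Dict[str, Set[str]] = {t: set(ls) for t, ls in seed_labels.items()}
    queue = deque(labels)
    while queue:
        table = queue.popleft()
        for child in lineage.get(table, []):
            child_labels = labels.setdefault(child, set())
            before = len(child_labels)
            child_labels |= labels[table]
            if len(child_labels) > before:
                # The child gained labels, so its own downstream tables
                # must be revisited. Labels only grow, so this terminates
                # even if the lineage graph contains cycles.
                queue.append(child)
    return labels


if __name__ == "__main__":
    lineage = {
        "ods.events": ["dwd.clicks", "dwd.orders"],
        "dwd.clicks": ["ads.report"],
    }
    print(propagate_labels(lineage, {"ods.events": {"business_a"}}))
```

A production version would attach a predicate to each edge (for example, whether the ETL filter can ever pass rows with app='a') and propagate a label only when that predicate holds.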

Speaker

韩帅

ByteDance, Senior R&D Engineer

孙科

ByteDance, Senior R&D Engineer

2022-07-29

13:30 -17:30

Big Data A

2022-07-29

13:30-14:10

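New Features of the Apache Doris 1.x High-Speed Release and Future Plans for the Cloud-Native Era

Apache Doris is a high-performance, enterprise-grade MPP database for all scenarios that Baidu contributed to the Apache community. It provides near-real-time analysis of massive data, supports large-scale data loading, and delivers sub-second real-time analysis and query responses under very high concurrency. In this talk, Apache Doris PPMC member 杨政国 first briefly introduces the history of the Doris community, then focuses on the new features in the eagerly awaited Doris 1.0 high-speed release, including the vectorized execution engine and fine-grained memory management, and finally covers new features the community is currently developing and Doris's plans for the cloud-native era.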

Speaker

杨政国

Baidu, Apache Doris PPMC Member and Senior R&D Engineer

2022-07-29

14:10-14:50

An extension of Apache Atlas’ data model and an alternative open source user interface.

Apache Atlas provides data governance functionality and is part of the Hadoop eco-system, however, it is not limited to it. The underlying data model is very generic and can be extended. This makes Apache Atlas very flexible, however, has consequences for the usability of the user interface. In this talk an extension of the data model and an alternative open source user interface is presented, which can be used more intuitively by non-technical users, especially business users. In more detail, in this presentation the underlying extension of the data model is motivated and explained. Further, it is motivated which derived information is required by the business user to increase the usability. Next the open source backend functionality is explained and the underlying technologies are motivated. Finally, a short tutorial is provided on how to setup your own system with a related open source helm chart and explains how to get started. The motivation for this talk is to promote this open source project and find support and interest in the community. Some related numbers: - we are adding 6 additional base types - we have a further extension of these 6 types for elastic, kafka and kubernetes - the frontend is in Angular - the backend uses Apache Atlas, HBase, Apache Kafka, Apache Flink, Keycloak, Apache Httpd, elasticsearch and elastic enterprise search

Speaker

Wombacher, Andreas

Aurelius Enterprise B.V., CTO

2022-07-29

14:50-15:30

Apache Druid cloud native architecture evolution

Introduce the background of the Apache Druid cloud native architecture, the process of landing in many SHOPEE businesses, the opportunities and challenges during the period, our main technology evolution route, and future plans and prospects.

Speaker

金嘉怡

SHOPEE, Expert Engineer

2022-07-29

15:30-16:10

Scaling Open Source Big Data Cloud Applications is Easy/Hard

In the last decade, the development of modern horizontally scalable open-source Big Data technologies such as Apache Cassandra (for data storage), and Apache Kafka (for data streaming) enabled cost-effective, highly scalable, reliable, low-latency applications, and made these technologies increasingly ubiquitous. To enable reliable horizontal scalability, both Cassandra and Kafka utilize partitioning (for concurrency) and replication (for reliability and availability) across clustered servers. But building scalable applications isn’t as easy as just throwing more servers at the clusters, and unexpected speed humps are common. Consequently, you also need to understand the performance impact of new server types and partitions, replication, consumers, connections, etc.; monitor the correct metrics to have an end-to-end view of applications and clusters; conduct careful benchmarking; and scale and tune iteratively to take into account performance insights and optimizations. In this presentation, Paul will explore some of the performance goals, challenges, solutions, and insights he discovered over the last 5 years of building multiple realistic demonstration applications. The examples include benchmarking and diagnosing a performance problem we encountered before releasing our managed Apache Kafka offering on AWS’s Graviton2 (ARM) instances, trade-offs and automation of elastic Cassandra auto-scaling, scaling a Cassandra and Kafka anomaly detection application to 19 Billion checks per day, understanding and mitigating the impact of Kafka partitions and replication on cluster throughput, and building low-latency streaming data pipelines using Kafka Connect.

Speaker

Paul Brebner

Instaclustr, Chief Technology Evangelist

2022-07-29

16:10-16:50

Fine grained authorization to Cloud stores using Apache Ranger

Apache Ranger has been widely used as an authorization service for most products in the Apache Hadoop ecosystem, such as HDFS, Hive, and HBase, on on-premise clusters. In this talk, we discuss various use cases showing how Ranger policies can be leveraged to perform fine-grained authorization for objects stored in S3 public cloud storage, along with the capability to view access audits.

Speaker

Mukund

Cloudera, Staff Software Engineer

2022-07-29

16:50-17:30

The Evolution of Huolala's Big Data Infrastructure

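In planning the evolution of its big data architecture, Huolala focuses on five goals: business support, stability, security, cost control, and efficiency. The aim is to build a solid, capable big data infrastructure and platform services that support data value and data empowerment and help the company's business grow with high quality. This talk shares the ongoing work of Huolala's big data infrastructure team toward these goals, including the Hadoop-based, hybrid-cloud big data infrastructure and its optimization, stability assurance, security assurance, and cloud-native big data. Outline: 1. Background: a. About Huolala; b. About Huolala's big data. 2. Infrastructure 1.0: a. Big data security; b. Stability; c. Platform capabilities; d. Big data SRE architecture; e. Compute engine architecture; f. Storage architecture evolution. 3. Infrastructure 2.0: a. Cost reduction and efficiency gains; b. Big data security. 4. Future outlook. What you will gain: an understanding of the components of a hybrid-cloud architecture; typical methods for building big data infrastructure; and practical experience in storage and compute, stability, and data security.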

Speaker

张伟伟

Shenzhen Yishi Huolala Technology Co., Ltd., Head of Big Data SRE

2022-07-29

14:00 -18:00

Messaging A

2022-07-29

14:00-14:40

Xiaomi's Practice of Building a Highly Available Online Messaging Platform on RocketMQ

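Message queues are the pipelines that connect and decouple heterogeneous data systems. Building on the RocketMQ community version and in-house development, Xiaomi has built a highly available, high-performance online messaging platform for integration, industrial, and commercial scenarios, which greatly reduces users' onboarding costs. This talk first shares common application scenarios and data scale, then introduces best practices for building a messaging platform, including high availability, performance optimization, disaster recovery, observability, resource optimization, DevOps, heterogeneous data source integration, and Schema. It closes with a brief look at future plans, including storage-compute separation, tiered storage, self-managed metadata, and storage-layer optimization.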

Speaker

王帆

Xiaomi, Senior Software R&D Engineer

2022-07-29

14:40-15:20

Full-Link Business Gray Release Based on RocketMQ

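Under a cloud-native distributed microservice architecture, this talk shows how middleware such as RocketMQ can give the business full-link gray-release control with essentially no intrusion into business code.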

Speaker

黄展鹏

Zhengcaiyun Co., Ltd., Senior Architecture Expert at Zhengcaiyun

2022-07-29

15:20-16:00

Unified Processing of Events and Data Streams with RocketMQ in the Big Data Ecosystem

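This talk mainly introduces how the RocketMQ ecosystem handles events and data streams in a unified way, and introduces the supporting projects.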

Speaker

李伟

Tencent, Senior Development Engineer

2022-07-29

16:00-16:40

A Comparison of RocketMQ and Kafka, and a Comparison of How a Painting Technique Is Applied in Chinese and Western Cultures

1. A comparison of RocketMQ and Kafka. 2. Similarities and differences in how a painting technique is applied in Chinese and Western cultures.

Speaker

彭龙

Midea Group, Software Engineer

2022-07-29

16:40-17:20

Smooth Large-Scale Migration from ActiveMQ to RocketMQ

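For historical reasons, all business lines at 闪送 (Shansong) used ActiveMQ as their messaging system. As the business grew, ActiveMQ's limitations in performance, scalability, and availability became increasingly apparent and turned into the most important open issue for system stability (a company-level priority project). In this talk, I will share the approach we used to migrate to RocketMQ smoothly and quickly without affecting the iteration of the business teams.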

Speaker

高向阳

Beijing Zhuanzhuan Jingshen Technology Co., Ltd., Senior R&D Engineer

2022-07-29

17:20-18:00

Cloud-Native Practice and Application of the RocketMQ Message Queue on China Mobile Cloud

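RocketMQ, as a message queue component, is widely used and deployed in many business systems inside China Mobile (e-commerce platforms, trading platforms, and portal management platforms). Whenever a business system needs to adopt RocketMQ, it files a request or ticket with the RocketMQ middleware R&D team and SRE, and colleagues from SRE or the delivery team use batch deployment tools to deploy the required RocketMQ version on servers. As RocketMQ has been deployed and used at large scale, the RocketMQ middleware team faces many challenges in elastic cluster scaling, version management, and improving server resource utilization. As the speaker for this topic, 胡宗棠 will introduce the cloud-native practice and application of RocketMQ on China Mobile Cloud from the following angles: (1) the definition of a cloud-native message queue; (2) an introduction to RocketMQ on China Mobile Cloud; (3) the elastic design and practice of cloud-native RocketMQ; (4) the technical evolution and future outlook of cloud-native RocketMQ.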

Speaker

胡宗棠

China Mobile Cloud Capability Center, Technical Expert

2022-07-29

14:20 -16:50

Data Visualization

2022-07-29

14:20-14:40

Accessibility Design in Apache ECharts

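Data visualization is not only meant to be "seen"; helping more people understand the information a chart conveys is also an important aspect of visualization. In this talk, I will introduce Apache ECharts' accessibility efforts in recent years, including design details such as automatically generated chart descriptions, decal patterns for distinguishing data, and enlarged response areas on touch devices.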

Speaker

羡辙

Apache ECharts, PMC Chair

2022-07-29

14:40-15:20

Introducing advanced drill-down functionality in Apache Superset using Apache ECharts

Currently Apache Superset features dashboard native filters, which make it possible to add interactive filters to dashboards that control what data is being shown on charts. However, until now, only select chart types have offered cross filtering functionality for emitting filters to other charts, limiting user interaction mostly to native filters. Currently the project is working on introducing a rich set of interactive features to the main visualization types, making it possible to use charts to drill down (e.g. clicking on a country to expose the distribution of cities), drill through (drill into any dimension in the dataset, irrespective of hierarchies), drill through (show row level data on the fly) and drill to dashboard (open a new dashboard with the selected dimensions pre-populated as native filters). This means that users will be able to interact with charts instead of being limited to the current set of native filter types. In the talk we'll be exploring the details of the feature, new context menus, hooks for extending the functionality and using the feature when building custom visualization plugins and timelines. We'll also show how the feature integrates with Apache ECharts (the main visualization library used by Superset), and plans for future development.

Speaker

Brofeldt, Ville

Apple, Software Engineer

2022-07-29

15:20-16:00

How we use ECharts in SkyWalking

The background of data visualization. Implement the SkyWalking Dashboards with ECharts. Usage scenario analysis in SkyWalking. Summary, benefits of using ECharts in SkyWalking.

Speaker

Fan Qiuxia

Apache SkyWalking PMC, Apache Committer; Software Engineer, Tetrate.io

2022-07-29

16:00-16:40

Exploration and Practice of a Low-Code Platform for Big Data Visualization

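This talk describes the current state of data visualization and the pain points it exposes, introduces the low-code concept as a way to address those pain points, and presents the design and thinking behind the low-code engine. It then introduces the features and capabilities of Flyfish, a data visualization orchestration platform, and how it addresses industry pain points and improves efficiency. Finally, it shows how to combine it with Apache ECharts to build better and more powerful visualization components and to produce large-screen data visualizations efficiently.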

Speaker

王海虎

Cloudwise (Beijing) Technology Co., Ltd., R&D Manager

2022-07-29

14:00 -17:30

Culture

2022-07-29

14:00-14:40

When the Differential Mode of Association (差序格局) Meets Open Source Culture

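Introduction: the "public" versus "private" dilemma in building open source communities. The differential mode of association (差序格局): introducing the concept with examples. Chinese and Western differences: comparing the group-self relationship under the "differential mode of association" and the "organizational mode of association" (团体格局). Looking back at open source: the affinity between mainstream open source culture and the organizational mode of association. Typological analysis: differences in notions of public and private and in incentive methods under the two modes. Recommendations: how to take root locally and promote open source culture.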

Speaker

姜宁

Huawei, Technical Expert

李圳虎

Peking University, Master's student

2022-07-29

14:40-15:20

How to Attract Developers with Asian Cultural Backgrounds to Open Source

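The ASF has many open source projects that originated in China, such as Apache APISIX and SkyWalking. Because of language and cultural differences, developers with Asian cultural backgrounds (this talk uses Chinese developers as the example) sometimes find it hard to acclimatize when contributing to open source projects or joining the Apache community. In this talk, 张晋涛 will use Apache APISIX and other projects and communities as examples, together with his own experience, to share how to attract developers with Asian cultural backgrounds to open source, how to help developers embrace open source community culture, and how to integrate better into the Apache community.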

Speaker

张晋涛

API7.ai, Cloud-Native Technology Expert

2022-07-29

15:20-16:00

How does SegmentFault build community? In an open-source way

The Apache Way is a culture and spirit which guides not only how to make good open-source software, but how to run a good developer community and user group as well. SegmentFault is the leading developer community in China, where tens of millions of developers share their development experiences every month. In this talk, as the Co-Founder of SegmentFault, I will share with you: What do we understand a community to be? How do we govern the content-based developer community in an open-source way? How has that given us a unique advantage and high growth?

Speaker

Bo Jiang

SegmentFault, KAIYUANSHE, CCF Open-Source Development Committee; COO of SegmentFault; Board Member of KAIYUANSHE; Executive Member of CCF Open-Source Development Committee

2022-07-29

16:00-16:40

UPSTREAM FIRST: AN ETHNOGRAPHIC STUDY OF OPEN SOURCE SOFTWARE COMMUNITY

This presentation is based on in-depth research that aims to trace the formation and development of community-based innovation. Based on a 14-month ethnographic study in 10 open source communities, I systematically summarized the working mechanisms of open source software development. I identified four fundamental factors that underpin open source communities: economy of community-based maintenance, iterative generation of value, incentives of professional networking, and entrepreneurship of core teams. A comprehensive framework incorporating these four factors is then constructed, which provides a dynamic process-based snowball model to explain the heterogeneity of open source communities. In contrast to previous literature, my field research suggests that there is convergence rather than divergence between community-based innovation and intra-firm R&D, thus opening up many possibilities for future research, such as comparing open source with closed source and exploring hybrid models of organizational innovation.

Speaker

Zhou, Jesse (周禹任)

Peking University, Student

2022-07-29

16:40-17:20

The Past and Future of Open Source and Commercial Friendliness

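What does "commercially friendly" mean? Is it a betrayal of open source, or an excessive compromise with business? Can it go far? We have invited people from law, venture capital, evangelism, and entrepreneurship to talk about open source and commercial friendliness: 1. The license is the product: the role of software licenses. 2. What is commercial friendliness, and how should it be interpreted? 3. How far can Open Core go? 4. What are the advantages of open source projects under the Apache Software Foundation? 5. Identifying the risks of commercializing open source projects.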

Speaker

适兕

Lead creator of 「开源之道」 (The Way of Open Source)

2022-07-29

14:00 -18:00

AI Track

谭中意

2022-07-29

14:00-14:40

Apache Submarine: A Cloud-Native Machine Learning Platform

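Apache Submarine is a one-stop platform that covers the full machine learning workflow, and it runs cloud-natively on Kubernetes and the cloud. Submarine provides complete platform deployment, plus YAML files and Docker images for machine learning frameworks such as TensorFlow and PyTorch, which makes the whole system very simple to deploy and use: a single Helm command is enough to run the Submarine machine learning platform on Kubernetes or the cloud. Submarine provides a Workbench that supports multiple users, so data scientists and algorithm engineers can do data processing, algorithm development, job scheduling, model training, and model serving entirely from a browser. Submarine provides standard Docker images for machine learning frameworks such as TensorFlow, PyTorch, Python, and XGBoost, and you can customize and extend them yourself. Docker gives machine learning jobs a fully isolated runtime environment, and with the resource management capabilities of Kubernetes and the cloud, Submarine supports scheduling and running large numbers of machine learning jobs.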

Speaker

刘勋

DiDi, Senior Technical Expert

2022-07-29

14:40-15:20

OpenMLDB: An Enterprise-Grade Feature Platform Built Upon Spark

OpenMLDB is an enterprise-grade feature platform that tackles the challenges of feature engineering for machine learning. It enables consistent features for offline training and online inference, and highlights the efficiency of real-time feature extraction. In this talk, we will first introduce the design methodology of OpenMLDB, which is built based on separate batch and real-time SQL engines. Then we will focus on the detailed architecture, especially (1) the optimization techniques for Spark to improve the efficiency of batch feature processing; and (2) the unified execution plan generator to inherently ensure the consistency between the batch and real-time SQL engines. Finally, we will demonstrate a few use cases for real-world machine learning applications based on OpenMLDB.

Speaker

LU MIAN

OpenMLDB Community; 4Paradigm. OpenMLDB PMC core member; Tech lead of HPC and database teams in 4Paradigm

2022-07-29

15:20-16:00

Pegasus and Flink in Practice on Xiaomi's Machine Learning Platform

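As Xiaomi's advertising business grows and the number of algorithm experiments keeps increasing, improving the experiment efficiency of our algorithm engineers has become a problem we must face. Against this background, we built a machine learning platform that helps us share feature data across the algorithm workflow and resolve a series of issues such as inconsistent results between offline and online experiments. In this talk, we will share how Xiaomi uses Apache components such as Pegasus and Flink to build its machine learning platform.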

Speaker

黄飞

Xiaomi, Head of Commercial Platform Technology, Internet Business Department

2022-07-29

16:00-16:40

Catching the P99 Tail -- Performance Tuning for Machine Learning Inference

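As machine learning applications go into production, many companies face performance requirements when deploying models. Whether online or offline, there are many hard problems such as machine selection and parameter configuration. For example, how do you optimize P90/P99 inference performance? How do you scale offline inference? In this session, we will introduce several common high-performance machine learning systems and share some challenges encountered in real deployments. We will also show how to troubleshoot machine learning inference problems faster and how to ultimately improve CPU/GPU utilization. These machine learning architectures are built on popular open source frameworks such as Apache Spark, DeepJavaLibrary, and Java Spring.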

Speaker

兰青

Amazon Web Services, Software Development Engineer

2022-07-29

18:00-18:00

Tea break

2022-07-29

13:30 -17:30

Messaging B

2022-07-29

13:30-14:10

The Evolution of Huawei Device's Message Queue Based on Apache Pulsar

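In the cloud-native era, Huawei Device's message queue, as infrastructure, faces many challenges: many kinds of MQ that are hard to maintain, disaster recovery capabilities that are difficult to build, persistently high machine costs, and so on. This talk shares some of Huawei Device's experience in evolving its message queue and how it addresses these problems.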

Speaker

林琳

Huawei, SDE Expert

王小童

Huawei, Senior Engineer

2022-07-29

14:10-14:50

Exploration and Practice of Apache Pulsar at vivo

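vivo Mobile Internet aims to provide excellent internet products and services to 350 million vivo smartphone users worldwide. It has built a complete mobile internet ecosystem around vivo's big data operations, offering a full range of services including apps, games, news, branding, e-commerce, content, finance, and search. Over the past few years, we have used multiple Kafka clusters to support internet businesses such as ETL, recommendation, push, and monitoring at a data scale of trillions per day. We have now chosen Apache Pulsar as our next-generation messaging middleware to meet the challenge of even larger data volumes. In this talk, I will share why we chose Pulsar, our experience taking Pulsar from zero to one and how we approached the problems we met, a seamless migration plan from Kafka to Pulsar, and our future plans.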

Speaker

全利民

vivo Mobile Communication (Shenzhen) Co., Ltd., Big Data Engineer

2022-07-29

14:50-15:30

Optimization and Practice of KoP at Sina Weibo

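This talk introduces the motivation and background for adopting KoP and Pulsar at Sina Weibo, and shares the challenges encountered while using them along with the optimizations and solutions.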

Speaker

沈文兵

Sina Weibo, Data Platform Development Engineer

2022-07-29

15:30-16:10

Stability Optimization Practice for Apache Pulsar at Tencent Cloud

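Apache Pulsar is already used in production at large scale at Tencent Cloud. Over the past year it has served different usage scenarios across many industries. In production, we have done a series of performance optimizations and stability work on top of the community version to keep the system running stably and efficiently for users in different scenarios. This talk focuses on Tencent Cloud's work over the past year on Pulsar stability and performance.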

Speaker

冉小龙

Tencent Cloud, Senior R&D Engineer

2022-07-29

16:10-16:50

BIGO's Performance Optimization Practice for High-Throughput Catch-Up Reads on Pulsar

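BIGO currently has two major video products and services, BIGO Live and the short-video app Likee. BIGO Live's live-streaming business covers more than 150 countries and regions, and Likee has more than 100 million users; the products are widely popular among Generation Z. In its previous architecture, BIGO used open source Kafka clusters to support real-time data computation and analysis and short-video recommendation. As the business kept growing rapidly, that architecture ran into major challenges. Apache Pulsar's layered architecture, low latency, durable storage, and easy horizontal scaling helped us solve many problems in our production systems. This talk introduces the performance optimizations BIGO made on Pulsar for catch-up read scenarios in a high-throughput read/write environment. The main topics are: a discussion of the performance cost of catch-up reads, the main time-consuming stages of reading messages from disk, BIGO's new asynchronous read-ahead optimization, and how the new strategy performs in production. We will share BIGO's thinking and lessons learned in optimizing catch-up reads in a way that stays as close to practice as possible.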

Speaker

吴展鹏

BIGO, Staff Engineer

2022-07-29

16:50-17:30

Design and Implementation of a Log Processing DSL Based on Pulsar Functions

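In this talk, 王嘉凌 from the China Mobile Cloud Capability Center introduces how the Pulsar message queue is applied in China Mobile Cloud's intelligent operations platform, including how Pulsar is used to collect and deliver log data and how Pulsar Functions are used to implement log processing with DSL support.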

Speaker

王嘉凌

China Mobile Cloud Capability Center, Software Development Engineer

2022-07-29

14:00 -18:00

Observability

2022-07-29

14:00-14:40

Apache SkyWalking: An open source holistic application performance monitoring and observability tool

Apache SkyWalking is an application performance monitoring and observability tool for distributed systems, especially designed for microservices, cloud native and container-based (Kubernetes) architectures, with or without a service mesh. In this talk, Marc will go through SkyWalking’s approach to handling metrics, logs and tracing holistically and correlating them all in a low-latency, extensible and performant way that allows it to have 100% client sampling. You will learn how easily and quickly you can get Apache SkyWalking up and running to pinpoint where the problems in your system are, thanks to SkyWalking’s functionalities for: - Monitoring the health of your services with distributed tracing collected with low payload. - Observing your distributed system, on or off service mesh. - Automating source code change and instrumentation: multiple language agents provided. - Advanced visualization: used in traces, metrics, and topology maps. - On demand logs. - eBPF-based interactive profiling for C, C++, Golang, and Rust.

Speaker

Navarro Sonnenfeld, Marc

Tetrate, Software Engineer

2022-07-29

14:40-15:20

Apache SkyWalking with Native eBPF Agent

The topic of eBPF has become hotter and hotter in recent years. It turns the Linux system into a programmable kernel, which makes it easier for us to obtain various data in Linux. As an observability platform, Apache SkyWalking can use eBPF technology to aggregate more data into the platform for analysis. So, we created the native eBPF agent. In this session, I will introduce what eBPF can bring to Apache SkyWalking.

Speaker

Han, Liu

Tetrate, Engineer

2022-07-29

15:20-16:00

Approaching Robust Anomaly Alerting Capabilities at Apache SkyWalking with AIOps

With drastic improvements in machine learning capabilities over recent years, practical AI solutions have been deployed at scale in production-ready scenarios. In the landscape of observability, major commercial platforms have provided their users with various AI-enabled functionalities, mostly anomaly detection, since 2015. In the Apache SkyWalking ecosystem, open-source developers and young researchers are working on a community-driven AIOps solution to lower the bar for curious practitioners. As a young open-source initiative, focusing on reliable reactive anomaly detection is the key to building a concrete basis for later phases. Therefore, the project scope in phase one is defined as "To provide a pluggable AIOps anomaly alert engine with out-of-box integrations to popular observability platforms like Apache SkyWalking (Full-Stack APM) and Prometheus (Metrics Systems)." This presentation will introduce the basic concepts of AIOps for observability, and the definition, justification and design of the project.

Speaker

Chen Yihao

Queen's University, Master's Student, Apache SkyWalking Committer

2022-07-29

16:00-16:40

Observability Solution for Apache Http Server

The observability solution of Apache Http Server is based on OpenTelemetry (source code at https://github.com/open-telemetry/opentelemetry-cpp-contrib/tree/main/instrumentation/otel-webserver-module). It enables tracing of incoming requests to the server by injecting instrumentation into the Apache Http Server at runtime. It also has capabilities to capture the response time of many modules (including mod_proxy) involved in an incoming request, thereby including the hierarchical time consumption by each module. Monitoring individual modules is crucial to the instrumentation of Apache Http Server. As the HTTP request flows through individual modules, delay in execution or errors might occur at any of the modules involved in the request. To identify the root cause of any delay or errors in request processing, module-wise information (such as response time of individual modules) would enhance the debuggability of the Apache Http Server. Some of the modules monitored by this solution are: mod_sso, mod_php, mod_dav, mod_proxy and mod_proxy_balancer. There is a list of excluded modules that are not monitored because these modules are responsible for internal working of Apache Http Server and are not directly involved in application request processing. This list of excluded modules can be found at https://github.com/open-telemetry/opentelemetry-cpp-contrib/blob/main/instrumentation/otel-webserver-module/src/apache/ExcludedModules.cpp. All modules other than the excluded ones are monitored out of the box. To monitor any of the excluded modules, users can remove them from the excluded list. The observability solution is provided as a Module of Apache Http Server and should be loaded after all other modules are loaded, so that the module level details are captured. Since this solution is based on OpenTelemetry, it creates spans and traces as specified in the OpenTelemetry Specifications.
The telemetry data can be viewed and analyzed in any backend such as Zipkin, Jaeger, AppDynamics etc. On every request received by Apache Http Server, a span is created at the start of the request and ended at the end of the request. Apart from this, spans are also created and ended at the start and end of individual modules involved in request processing. The spans are then emitted periodically to the configured backend for processing and to derive valuable insights. The observability solution is being actively managed by a team in Cisco (AppDynamics) based out of Bangalore, India.
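The span lifecycle described above (one span per request, with child spans opened and closed around each module) can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the OpenTelemetry SDK or the otel-webserver-module, and `Tracer`, `span`, and `handle_request` are hypothetical names invented for this sketch.

```python
import time
from contextlib import contextmanager


class Tracer:
    """Toy tracer (hypothetical, not OpenTelemetry): records finished spans."""

    def __init__(self):
        self.finished = []   # spans in the order they end
        self._stack = []     # spans that are currently open

    @contextmanager
    def span(self, name):
        # The innermost open span, if any, becomes the parent.
        parent = self._stack[-1]["name"] if self._stack else None
        record = {"name": name, "parent": parent, "start": time.perf_counter()}
        self._stack.append(record)
        try:
            yield record
        finally:
            record["duration"] = time.perf_counter() - record["start"]
            self._stack.pop()
            self.finished.append(record)


def handle_request(tracer, modules):
    """Open a request span, then one child span around each module's work."""
    with tracer.span("request"):
        for module in modules:
            with tracer.span(module):
                pass  # the module's real work would run here


tracer = Tracer()
handle_request(tracer, ["mod_proxy", "mod_proxy_balancer"])
```

Because each module span closes before the request span does, the recorded durations give the per-module breakdown of one request's total time, which is the kind of hierarchical view the abstract describes.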

Speaker

Nagariya, Ajay

Cisco, Director of Engineering at AppDynamics (Cisco)

Das, Debajit

Cisco, Senior Software Engineer at AppDynamics (Cisco)

Pratyush, Kumar

Cisco, Software Engineer at AppDynamics (Cisco)

2022-07-29

16:40-17:20

How SkyWalking Uses BanyanDB

BanyanDB, as an observability database, aims to ingest, analyze and store Metrics, Tracing, and Logging data. It's designed to handle observability data generated by Apache SkyWalking.

Speaker

Hongtao Gao

tetrate.io, Founding Engineer

2022-07-29

17:20-18:00

Apache SkyWalking MAL practice -- VMs and Kubernetes Monitoring

Apache SkyWalking supports custom metrics and integration with the mainstream metrics ecosystem, and provides a meter system for metrics analysis and calculation. MAL (Meter Analysis Language) is the language that sets these rules for analysis. Users can use it to perform analytical calculations on these source metrics. This talk will share: what MAL is and which scenarios call for it; MAL's basic data flow and syntax format; how to analyze metrics with MAL; VM monitoring practices with MAL; Kubernetes monitoring practices with MAL.

Speaker

Kai Wan

Tetrate, Engineer

2022-07-30

09:00 -11:50

Keynote

2022-07-30

10:50-11:50

How China Can Develop a Global Open Source Governance Strategy

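As an important part of the digital economy, digital governance, and digital sovereignty, open source governance has become an important strategic choice for the digital development of a country, region, enterprise, or organization. A variety of best-practice models have emerged, such as Apache's approach to open source governance and the TODO Group's OSPOlogy methodology, which offer excellent references for China in developing its global open source governance efforts. This panel brings together some core members of KAIYUANSHE's "Open Source Strategy Research Group" (ONES Group). From the perspectives of enterprise open source governance, foundation open source governance, and the governance of open source communities themselves, they will discuss the experience China has accumulated in open source governance and offer constructive suggestions on national open source governance standards, building enterprise OSPO organizations, digital tools for open source governance, and training open source governance talent, in the hope of inspiring the audience.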

Speaker

王伟

East China Normal University, Researcher

王永雷

DevSecOps Professional

边思康

Enterprise OSPO Professional

赵生宇

PhD Student

2022-07-30

10:20-10:50

From the Apache Doris Open Source Project to a Product: Exploring the Path to Commercialization

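Apache Doris is a high-performance, real-time analytical database based on the MPP architecture. SelectDB is an open source technology company. Built on Doris, its new-generation cloud-native real-time data warehouse SelectDB (Powered by Apache Doris) runs on multiple clouds and gives users and customers out-of-the-box product capabilities. This talk describes how a core open source technology becomes a mature commercial product in the cloud-native era, systematically covers the core insights, philosophy, commercial product planning, and delivery of customer value on the path from Apache Doris to SelectDB, and discusses how open source and commercialization can complement each other and develop healthily.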

Speaker

连林江

SelectDB, CEO

2022-07-30

09:55-10:20

OpenChain Project Introduction

Open source, the supply chain, and compliance are central to modern trade. License compliance has been joined by security compliance as central to trade. In this keynote, Shane Coughlan, GM of the ISO standard for open source license compliance, will explain what his community and other adjacent communities are doing to make things quicker, safer and more efficient.

Speaker

Shane Coughlan

expert

2022-07-30

09:20-09:55

Brand Management - An Introduction

This session is aimed primarily at committers of Apache projects. It will start with an overview of branding and trademarks and discuss how these relate to the project community and how they contribute to the overall health of the project. The bulk of the session will cover the services that the Brand Management team provides to projects. These services cost the ASF between $60k and $80k each year and the discussion of services will include typical costs.

Registration will look at the benefits of registering the project’s marks, the registration process and how we choose which countries we apply to for registration. It will also look at the process used by incoming projects to transfer any existing marks they may have to the ASF.

Using our marks will explain how we grant permission to use our marks as well as what we - generally - allow and don’t allow. This may be of help to companies looking to build products based on Apache projects so that they can ensure that they use our marks correctly.

Managing potential infringements will describe the key things to do and not to do when addressing a potential infringement of a project mark so that the project has the best chance of resolving the issue with the minimum of hassle for everyone concerned.

Speaker

Mark Thomas

member of the ASF

2022-07-30

09:00-09:20

Practical Steps to Encourage Community Growth

Building an enthusiastic contributor community is perhaps the hardest part of any open source project. But without it, your project will struggle to sustain itself long term. Although community growth is more art than science, there are some practical things that you can do to make it more likely that people will want to join in. Rich will share some lessons from 20+ years of community shepherding.

Speaker

Rich Bowen

member of the Apache Software Foundation

2022-07-30

13:30 -16:50

Workflow/DataProcessing

2022-07-30

15:10-15:50

DolphinScheduler in the One-Stop Platform at T3出行 (T3 Chuxing)

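Apache DolphinScheduler is an excellent distributed workflow task scheduling system for data, but it lacks CI/CD modules. T3 Chuxing integrated it into a one-stop development platform and connected DolphinScheduler to the Apache Kyuubi and Apache Linkis compute middleware, forming a complete development process covering development and debugging, release, and version control. This has greatly improved development efficiency, standardized the data development process, and connected code development and business launch with the scheduling system. It also centralizes CI/CD management for big data development, lowers the barrier for business departments to launch big-data-related requirements, and reduces the pressure on data developers, bringing us a step closer to our goal of a one-stop development platform.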

Speaker

李心恺

T3 Chuxing, Big Data Platform R&D Engineer

赵玉威

T3 Chuxing, Big Data Platform R&D Engineer

2022-07-30

13:10-13:50

Apache DolphinScheduler with Kubernetes for Big Data Processing

To onboard its internal business and make Apache DolphinScheduler a more powerful big data platform, the Cisco Webex big data platform team integrated various data processing jobs that run on a Kubernetes backend, such as Flink and Spark jobs, into Apache DolphinScheduler.

Speaker

Liu Dingzheng

Cisco, Leader, Engineering

2022-07-30

13:50-14:30

OPPO's Data Processing Platform Based on Apache SeaTunnel

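This talk introduces the features and advantages of Apache SeaTunnel, along with our experience using it in production. On top of SeaTunnel, we built a drag-and-drop, configuration-driven data processing platform that mainly covers logic validation, task development, and plugin management.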

Speaker

范未太

OPPO Mobile Communications Co., Ltd., Senior Backend Development Engineer

2022-07-30

14:30-15:10

The improvement and application of DolphinScheduler in BIGO

BIGO uses DolphinScheduler to unify its workflow scheduling systems. The team made many improvements to DolphinScheduler and built automated tools for migrating existing workflows from Airflow and Oozie to DolphinScheduler. DolphinScheduler is now the only workflow scheduling system at BIGO.

Speaker

XU SHUAI

BIGO, Principal Engineer

2022-07-30

13:30 -17:30

Community

2022-07-30

13:30-14:10

The Story of a Post-95 Apache Member and the Foundation

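琚致远 was born in 1997. After getting into web development he learned about the Apache Web Server and was drawn to its feather logo. At the time he never imagined he would get involved and have a story with the Apache Software Foundation. After graduating from university in 2019, he noticed the APISIX project and joined it, and the project was then donated to the Apache Software Foundation. In 2020, the incubating Apache APISIX project graduated from the foundation's Incubator with remarkable speed and quality, and he became a member of the project management committee (Apache APISIX PMC). Over the past 3 years he has devoted his energy and passion to the project and community, and this year he was honored to be elected an Apache Member. The story of 琚致远, the foundation, and the Apache APISIX community should offer inspiration to community enthusiasts and those still watching from the sidelines.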

Speaker

琚致远

API7.ai, Head of Global

2022-07-30

14:10-14:50

How to enable and foster open source collaboration with leading corporations listed on the stock market

How to grow an Open Source community from just a handful of contributors, to one of the leading communities in China and fastest growing Apache projects in the world? There's no secret sauce, it's all about true openness and inclusiveness, the willingness to both mentor students by collaborating with initiatives such as Google Summer of Code or AnitaB.org’s Grace Hopper Celebration, while at the same time fostering a technically sound community whose solution has been adopted by over 170 corporations listed on the stock exchange. In this talk you will learn the best practices for open source project leadership.

Speaker

Yacine Si Tayeb, PhD

SphereEx & Apache ShardingSphere, Head of International Operations

2022-07-30

14:50-15:30

How to Collaborate Remotely and Amicably with Open Source Developers

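李广远 did the i18n Chinese translation of the entire official website at https://apisix.apache.org/zh, which involved many discussions with APISIX members about the project and about how to use the tooling libraries. If you have your own ideas about an implementation inside an open source framework, you will need to talk to the community's maintainers. How to communicate with them, how to establish effective communication quickly, how to ask your question in a well-structured way, and how to make your code easy for others to review: I will share all of this in this talk with every developer who loves open source. If you do not yet know how to file an issue or PR against an open source project, please watch this talk to the end; I believe it will help you.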

Speaker

李广远

Shenzhen Bello Intelligent Technology Co., Ltd., Senior Front-End Development Engineer

2022-07-30

15:30-16:10

openGauss Community Governance in Practice: From Open Source to Open Governance

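openGauss is a leading enterprise-grade open source relational database. After two years of growth, the openGauss open source community was recently rated by the China Academy of Information and Communications Technology (CAICT) as an open source community at the advanced maturity level on every indicator. This talk introduces the practical experience of governing the openGauss open source community from zero to one, including putting several governance methods from The Apache Way into practice.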

Speaker

梅相如

Huawei Technologies Co., Ltd., openGauss Community Operations Manager

2022-07-30

16:10-16:50

Apache Doris Community Operations: Practice and Reflections

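Apache Doris is a high-performance, easy-to-use analytical database with real-time support. It has a very large user base in China, and the community has gathered a large group of developers. This talk mainly covers the journey of Apache Doris from being open-sourced to being contributed to the Apache Software Foundation, the operational practices during the community's growth, and the speaker's personal reflections on running an open source community.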

Speaker

鲁志敬

Apache Doris community, Head of Community Operations

2022-07-30

17:30-17:30

Tea break

2022-07-30

13:30 -17:30

Big Data B

2022-07-30

13:30-14:10

Apache Ozone behind Simulation and AI industries

This talk introduces two types of workloads over Apache Ozone, in self-hosted supercomputer systems of Preferred Networks. They're the result of our effort connecting the AI & HPC ecosystem (PyTorch, GPU) to the BigData ecosystem (JVM, commodity hardware). One is distributed training of deep learning models. It is essentially a repeated random-read workload of about a few million images for a single round of training. Because of the random-read nature of the workload, prefetching the data cannot hide the latency enough to optimize the throughput. We introduce how we mitigated the issue by introducing a client-side cache system. Another is dataset generation from scientific simulations (first-principles calculation), and image rendering from 3D models. The amount of data grows constantly, in proportion to the amount of computational resources (e.g. GPUs). We introduce a new form of the small files problem in Apache Ozone: the metadata overhead of a single object cannot be zero. We also discuss how we aggregate multiple (hundreds to millions of) files as a dataset file, under the constraints in the current situation of the client-side.
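To illustrate why a client-side cache helps a repeated random-read workload like the one described above, here is a minimal sketch. It is not Preferred Networks' cache system or an Apache Ozone API: `CachedReader` and `fake_remote_fetch` are hypothetical names, and the in-memory LRU policy is an assumption chosen for brevity.

```python
import random
from collections import OrderedDict


class CachedReader:
    """LRU cache in front of a remote fetch function (hypothetical sketch)."""

    def __init__(self, fetch, capacity):
        self._fetch = fetch
        self._capacity = capacity
        self._cache = OrderedDict()
        self.remote_reads = 0

    def read(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)      # mark as most recently used
            return self._cache[key]
        self.remote_reads += 1                # cache miss: go to the store
        value = self._fetch(key)
        self._cache[key] = value
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)   # evict least recently used
        return value


def fake_remote_fetch(key):
    """Stand-in for reading one object from the object store."""
    return f"image-bytes-{key}".encode("utf-8")


keys = list(range(100))
reader = CachedReader(fake_remote_fetch, capacity=len(keys))
rng = random.Random(0)
for _epoch in range(3):
    rng.shuffle(keys)                         # random read order every epoch
    for k in keys:
        reader.read(k)
```

In this sketch the cache is sized to hold the whole dataset, so only the first epoch touches the remote store. With a random access order, a cache much smaller than the dataset would see far fewer hits, which is one reason cache sizing matters for this kind of workload.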

Speaker

Kota Uenishi

Preferred Networks, Inc., Engineer

2022-07-30

14:10-14:50

What's new in Apache Impala 4.x

Apache Impala is a distributed, massively parallel analytic query engine. This presentation will provide a short description of how Impala works and the basic concepts in Impala. Then the talk will introduce some notable features and improvements in Impala 4.x, and finally the roadmap of Impala.

Speaker

黄权隆

Cloudera, Staff Software Engineer

2022-07-30

14:50-15:30

Improvements and Practice of HBase at Meituan

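This talk mainly covers some of the problems HBase encountered in Meituan's large-scale application scenarios and the improvements made, along with some of Meituan's practical experience using HBase.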

Speaker

哈晓琳

Beijing Sankuai Online Technology Co., Ltd., R&D Engineer

2022-07-30

15:30-16:10

Support Customized Kubernetes Schedulers: Better Scheduling Capabilities for Spark on Kubernetes

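Spark on Kubernetes is attracting more and more attention and use. Because Kubernetes lacks support for batch scheduling, scheduling in big data scenarios often runs into resource deadlocks, and advanced capabilities such as queues, priorities, resource reservation, and scheduling across diverse compute resources are missing. This talk introduces the latest progress and best practices of the Apache Spark community's Support Customized Kubernetes Schedulers work. - The current state and challenges of Spark on Kubernetes - Latest progress in the Spark community: Support Customized Kubernetes Schedulers - Spark on Kubernetes scheduling capabilities shown with Volcano - Demo: the overall functionality of Spark + Volcano, including queues, priorities, resource reservation, and scheduling across diverse compute resources.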

Speaker

姜逸坤

Huawei, Senior Software Engineer

2022-07-30

16:10-16:50

Cloud Shuffle Service in Practice for Spark Workloads at ByteDance

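ByteDance mainly uses Spark for offline big data processing, with several hundred thousand Spark jobs running in production every day. Internal business users have clear SLA requirements, and missing them has a significant impact on the business. Shuffle is an important operation in the Spark engine. For large-scale jobs, the open source ExternalShuffleService (ESS) implementation tends to cause disk IOPS bottlenecks from large numbers of random reads, backlogs of fetch requests, and similar problems. These in turn lead to frequent stage recomputation or even job failures during execution, which triggers a vicious cycle of resource usage and seriously affects SLAs. In addition, in ByteDance's colocation of online and offline workloads, online machines have limited disk capacity and related resources, so full disks are a frequent problem at runtime. Against this background, the ByteDance Spark team on the one hand made many optimizations to ESS, including tuning shuffle-related parameters (to reduce random read requests) and adding shuffle rate limiting, which greatly improved ESS stability on SSD clusters. On the other hand, for clusters with HDD disks or online/offline colocation, we proposed Cloud Shuffle Service (CSS) as the solution: map tasks push the data of the same partition to the same CSS worker node, and reduce tasks can then read sequentially from the corresponding node, which greatly improves read performance and shuffle stability and effectively protects SLAs. Spark, Flink, and MapReduce in production at ByteDance have all been integrated with CSS.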
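The push-based idea in the abstract above (map tasks push all records of one partition to the same worker, so a reduce task reads one contiguous stream) can be sketched as follows. This is a toy in-memory model, not ByteDance's CSS implementation; `ShuffleWorker`, `partition_of`, and the worker-assignment rule are assumptions made for illustration.

```python
from collections import defaultdict


def partition_of(key, num_partitions):
    """Deterministic partitioner for string keys (illustrative choice)."""
    return sum(key.encode("utf-8")) % num_partitions


class ShuffleWorker:
    """Hypothetical worker that appends pushed records per partition."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def push(self, partition, record):
        self.partitions[partition].append(record)

    def read(self, partition):
        # The reducer scans one contiguous list: a sequential read.
        return list(self.partitions[partition])


def run_map_tasks(map_outputs, workers, num_partitions):
    """Each map task pushes every record to the worker owning its partition."""
    for records in map_outputs:
        for key, value in records:
            p = partition_of(key, num_partitions)
            workers[p % len(workers)].push(p, (key, value))


def run_reduce_task(partition, workers):
    """Sum values per key after reading the partition from a single worker."""
    totals = defaultdict(int)
    for key, value in workers[partition % len(workers)].read(partition):
        totals[key] += value
    return dict(totals)


workers = [ShuffleWorker(), ShuffleWorker()]
map_outputs = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
run_map_tasks(map_outputs, workers, num_partitions=4)
results = {}
for p in range(4):
    results.update(run_reduce_task(p, workers))
```

The contrast with a pull-based shuffle is that a reducer here contacts one worker per partition instead of fetching a small block from every map task's output, which is where the random reads in the pull model come from.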

Speaker

魏中佳

ByteDance Infrastructure, Big Data Development Engineer

2022-07-30

16:50-17:30

Interactive data engineering workload execution using Livy session on Kubernetes cluster

In recent times, the demand for time-critical short-lived interactive Spark workloads has grown tremendously. One common use case is that of data preparation wherein a pipeline of data integration steps is built through Spark-driven interactions with a subset of data (data worksheet) and visualizing the transforming data responses in real time. This recipe of interactive Spark queries is then published to be applied on petabytes of data at hyperscale. Informatica supports these use cases through deep integration with Apache Livy on a managed Kubernetes cluster. This session will cover in detail the enhancements made to the Apache Livy SDK for supporting concurrent asynchronous submission of Spark code snippets (statements). The deep dive into the framework will cover the lazy deployment of the Livy server as a Kubernetes service, the optimizations made in the client agent application to achieve sub-second query dispatch, and a new listener based on Java’s *CompletableFuture* for asynchronous query monitoring and results retrieval. In this talk, we will also discuss how the framework prioritizes and supports fail-fast quick dispatch mode over traditional batch workflow demands like job recovery, cluster state correctness, and lazy job resources availability, to name a few. Finally, we will summarize observed performance gains regarding job runtime achieved by this framework over a regular Spark job.

Speaker

Chaturvedi, Anmol

Informatica Corporation, Director of Engineering

2022-07-30

13:30 -17:30

API/Microservices

2022-07-30

16:50-17:30

Full-Link Gray Release in Apache ShenYu Based on OpenSergo

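The rise of the microservices architecture has made large-scale, highly concurrent, low-latency distributed applications possible. As the traffic entry point, the Apache ShenYu gateway combines with OpenSergo to give the business full-link gray-release capability and make it more stable. This talk shares how OpenSergo and the Apache ShenYu gateway work together to implement full-link gray release, with gray rules that can be modified dynamically and take effect in real time.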
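As a rough illustration of what full-link gray release means in general, the sketch below routes a tagged request to gray instances at every hop where one exists and falls back to the baseline otherwise. It does not use Apache ShenYu or OpenSergo APIs; `pick_instance`, `route_chain`, and the registry layout are hypothetical.

```python
def pick_instance(instances, gray_tag):
    """Prefer an instance carrying the request's gray tag; else use baseline."""
    if gray_tag is not None:
        for inst in instances:
            if inst["tag"] == gray_tag:
                return inst
    for inst in instances:
        if inst["tag"] is None:
            return inst
    raise LookupError("no baseline instance available")


def route_chain(registry, chain, gray_tag):
    """Route one request through each service in order, propagating the tag."""
    return [pick_instance(registry[service], gray_tag)["name"] for service in chain]


registry = {
    "gateway": [{"name": "gateway-base", "tag": None}],
    "order": [
        {"name": "order-base", "tag": None},
        {"name": "order-gray", "tag": "gray"},
    ],
    "stock": [
        {"name": "stock-base", "tag": None},
        {"name": "stock-gray", "tag": "gray"},
    ],
}
chain = ["gateway", "order", "stock"]
gray_path = route_chain(registry, chain, gray_tag="gray")
base_path = route_chain(registry, chain, gray_tag=None)
```

The key property is that the tag travels with the request, so a service with no gray version (the gateway here) does not break the chain for the services behind it.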

Speaker

鲁严波

Alibaba, Senior Development Engineer

2022-07-30

16:10-16:50

The API Gateway in Practice

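What is an API gateway? Which pain points has it solved for different industries and enterprises? Will it keep getting better? Come and discuss it with us.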

Speaker

展留坤

API7.ai产品经理

2022-07-30

15:30-16:10

APISIX Runtime Debugging

APISIX provides powerful runtime debugging and context snooping capabilities, but few people seem to have paid attention to this feature. Goals: introduce APISIX runtime debugging capabilities; explain the technical principles behind APISIX runtime debugging; show its practical utility; and look to the future of APISIX runtime debugging.

Speaker

ZhengSong Tu

Apache APISIX, Software Engineer

2022-07-30

13:30-14:10

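Building a K8s Gateway Based on Apache APISIX

Building a K8s gateway on APISIX: why we migrated from Nginx to APISIX, and how we deployed it quickly and integrated it with our existing container platform.

Speaker

曾强

武汉木仓科技股份有限公司, Operations Development Engineer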

2022-07-30

14:10-14:50

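Essence Securities' Cloud-Native Gateway Practice Based on Apache APISIX

This talk introduces how Apache APISIX has been adopted and put into practice at Essence Securities, covering architecture planning, the management model, and practical cases.

Speaker

卢勇辉

Essence Securities Co., Ltd., Software Engineer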

2022-07-30

14:50-15:30

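Why Modern Microservice Architectures Need a Cloud-Native API Gateway

As microservice architectures grow more popular, APIs, as infrastructure, are growing explosively in number. Access relationships between applications are becoming more complex, and each microservice is rapidly brought online and offline and scaled out and in, so new problems such as dynamic API management and security policy must be solved well. This talk analyzes where these problems come from and how a cloud-native API gateway (Apache APISIX) can address them thoroughly, so that a microservice architecture not only meets today's needs but also transitions smoothly to cloud native and paves the way for the future.

Speaker

王院生

API7.ai, Co-founder & CTO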

2022-07-30

13:30 -17:30

Streaming B

Stream Processing Ecosystem Track

2022-07-30

13:30-14:10

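Building a Real-Time Data Lake Efficiently with Flink CDC and Hudi

Business data in databases is among the most valuable data, and synchronizing it efficiently into a data lake is a highly valuable topic. CDC (Change Data Capture) is a technique for capturing changes from databases. Flink CDC is a leading open source real-time data integration framework, with technical advantages such as unified full and incremental reading, lock-free reading, concurrent reading, and a distributed architecture, and it is very popular in the open source community. Beyond real-time ingestion into lakes and warehouses, Flink CDC also supports powerful data processing: SQL can be used to join, aggregate, and widen database data in real time, and with Flink's rich downstream ecosystem the processed data can easily be written to Kafka, Hudi, Iceberg, Doris, and other sinks. In this talk, 徐榜江 will first cover the core design and implementation of Flink CDC, including its lock-free algorithm, parallel reading, resumable reads, and distributed architecture, then share applications of Flink CDC in specific business scenarios, and finally use a demo to show in detail how to build a real-time data lake efficiently with Flink CDC and Hudi.

Speaker

徐榜江

Alibaba Cloud, Senior R&D Engineer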

2022-07-30

14:10-14:50

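Developing Unified Stream and Batch Applications on Apache Flink with Apache Pulsar

Apache Pulsar already provides a connector in Apache Flink based on the DataStream API. With this connector, applications can read from and write to Pulsar and build unified batch and stream applications. In this talk, 盛宇帆 will introduce the technical details behind the connector and how Pulsar's capabilities are exposed and supported in it, and will demonstrate live how to design and develop a consistent, high-throughput stream computing application with the connector.

Speaker

盛宇帆

StreamNative, Flink Development Engineer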

2022-07-30

14:50-15:30

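Building a Streaming Incremental Data Warehouse on Data Lake Formats: CDC

With the rise and adoption of data lake formats, how to integrate them better with the existing big data ecosystem in real production environments, and how to address the difficulties of current big data and data warehouse architectures, still needs continued exploration. This topic explores how to combine Apache Hudi and Apache Spark in the classic data warehouse CDC scenario to implement a CDC solution and build a complete streaming incremental data warehouse.

Speaker

毕岩

Alibaba Cloud Intelligence, Computing Platform Division, Open Source Big Data Platform Technical Expert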

2022-07-30

15:30-16:10

Introduce the Flink SQL Connector for Pulsar and PulsarCatalog

Based on the new Flink DataStream API Pulsar connector, the StreamNative team implemented Flink's Table API Connector for Pulsar and PulsarCatalog to help users to interact with Pulsar clusters via Flink SQL easily. Yufei will go through the design of the SQL connector. He will introduce how the connector transforms Pulsar data and manages schema evolution. SQL semantics within Pulsar context will be covered as well. Finally, Yufei will introduce PulsarCatalog's two different modes of using Pulsar as a metadata store and how it makes querying from Pulsar easier for the user.

Speaker

Yufei Zhang

StreamNative, Software Engineer

2022-07-30

16:10-16:50

Making Flink K8S works as your wish

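Making Flink run better on K8S. K8S is now gradually being used to deploy production services, and as big data technology evolves we have higher requirements for Flink jobs running on K8S. For example: 1. when resources are scarce, Flink jobs need to run in order according to defined rules; 2. cluster resources should be configurable, so that the resources for a running Flink job need not be calculated and configured by hand and resources are reserved for the job at run time. Broader requirements include diverse scheduling algorithms, efficient scheduling performance, seamless integration with mainstream computing frameworks, and support for heterogeneous devices. Volcano was created to meet these needs. Volcano also inherits the design style and core concepts of the Kubernetes API. It is the first and only Kubernetes-based container batch computing platform under the CNCF, aimed mainly at high-performance computing scenarios. It provides a set of scheduling mechanisms that Kubernetes currently lacks and that are typically needed by high-performance workloads such as machine learning and big data applications, scientific computing, and visual effects rendering. The talk then expands on the CNCF PodGroup concept to show how Flink on K8S and Volcano let Flink jobs run the way we want.

Speaker

赵波

Huawei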

2022-07-30

16:50-17:30

Use Apache Pulsar Functions in a Cloud-Native way

Pulsar Function is a succinct computing abstraction that Apache Pulsar provides to express simple ETL and streaming tasks. The simplicity is twofold: a simple interface and simple deployment. As adoption grew, we realized that the ability to run natively in the cloud and to integrate multiple functions into one whole is key to user success. We developed a new feature, Function Mesh, to support these requirements. This talk provides a thorough walkthrough of Function Mesh, including its design, implementation, use cases, and examples, to help people seeking simple streaming solutions understand this powerful new tool in Apache Pulsar.

Speaker

Rui Fu

StreamNative, Software Engineer

2022-07-30

13:30 -18:50

IoT and IIoT

2022-07-30

13:30-14:10

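行家设备云: AIoT Powering the Digital Transformation of Industrial Services

The talk first introduces the current state of industrial services in China and the broad trend of digital transformation, leading into the work of 震坤行 on building 行家设备云, an AIoT platform for industrial equipment. It then uses popular Apache open source projects, including Apache NiFi, Apache Flink, Apache Hudi, and Apache IoTDB, to explain in depth the key technical foundation of 行家设备云: a data-lake-based predictive maintenance platform. Finally, a predictive maintenance case actually running on 行家设备云 shows how AIoT technology delivers one-stop, closed-loop digital services for industrial equipment.

Speaker

李知周

震坤行工业超市, IoT Expert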

2022-07-30

14:10-14:50

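Apache IoTDB: A Time-Series Database for the Industrial IoT

Apache IoTDB (Internet of Things Database) is database management system software built to manage massive data in IoT scenarios. It integrates data collection, storage, management, and analysis. Time-series data in IoT scenarios is generated quickly, is large in volume, and has high potential value. Targeting these characteristics, IoTDB uses efficient compression to help users reduce storage costs, implements efficient read and write paths to improve the data service experience, and provides rich time-series analysis capabilities to help users extract value from their data. Apache IoTDB has a lightweight architecture with high performance and rich features, and is deeply integrated with Apache Hadoop, Spark, and Flink, meeting the industrial IoT's needs for massive data storage, high-speed data reads, and complex data analysis. Guided by the idea that openness brings win-win outcomes, Apache IoTDB entered incubation in 2018 and graduated in 2020 as an Apache top-level database project.

Speaker

张金瑞

天谋科技(北京)有限公司, Software Development Engineer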

2022-07-30

14:50-15:30

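Applying Apache JMeter to IoT Testing

As the number of IoT devices grows and business logic becomes more complex, ensuring the quality of IoT applications requires verifying not only functionality but also availability and reliability when large numbers of devices are connected. The value and necessity of IoT testing are increasingly evident, while the large number of connected devices and the diversity of protocols make IoT testing challenging. In building data infrastructure for IoT, EMQ makes extensive use of Apache JMeter-based tools to test IoT protocols and applications, and this talk shares lessons learned from testing IoT applications with JMeter. On one hand, it focuses on experience testing IoT protocols, especially MQTT, with JMeter: how to use JMeter's plugin extensibility to test IoT protocols, and how to design scripts for MQTT connection scenarios and message throughput scenarios for real tests. On the other hand, it uses Apache IoTDB as an example to present some ideas on testing IoT applications with JMeter. We hope the talk gives a glimpse of what Apache JMeter can offer in IoT testing.

Speaker

殷翀元

EMQ, Project Manager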

2022-07-30

15:30-16:10

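Apache IoTDB in Practice at Huawei Cloud

王超 will describe Apache IoTDB in practice at Huawei Cloud, including Huawei Cloud's enhancements to the IoTDB kernel, and techniques and scenarios for multi-model fusion analysis of IoT time-series and non-time-series data.

Speaker

王超

Huawei Cloud Computing Technologies Co., Ltd., Apache IoTDB Committer and head of time-series database R&D for Huawei Cloud MRS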

2022-07-30

16:10-16:50

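IoTDB in Alibaba Cloud's Intelligent Manufacturing Business

This talk introduces the cloud-native architecture of Alibaba Cloud's intelligent manufacturing platform and the use of IoTDB within it, specifically: an introduction to the intelligent manufacturing platform solution, including but not limited to data collection, modeling, storage, computing, analysis, and intelligent decision-making; the architecture design and implementation of data distribution based on IoTDB triggers; and practical experience with IoTDB in traditional industries such as cement, steel, and waste incineration.

Speaker

巩宁

Alibaba, Senior Technical Expert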

2022-07-30

16:50-17:30

MQTT-on-Pulsar: How MoP Supports MQTT 5 on Pulsar

The MQTT 5.0 protocol was released several years ago and is widely used in the field of IoT. This talk will walk you through how Apache Pulsar supports MQTT 5.

Speaker

Zhao, Qiang

StreamNative, Software Engineer

2022-07-30

17:30-17:30

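IoTDB-Workbench: Low-Cost Support for Industrial Real-Time Databases

Real-time data is an important asset in industry and is widely used in many scenarios. In rail transit, real-time data is used to monitor the health of bridges (deflection, strain, vibration, bearing displacement) and reinforce them, preventing collapse and keeping travel safe. In energy management, real-time data is used for energy consumption monitoring, alarm management, and predictive optimization in chemical plants, saving energy, cutting emissions, and improving efficiency in response to the "carbon peak, carbon neutrality" goals. In intelligent manufacturing, real-time data supports equipment status monitoring and statistical process control, improving production efficiency and upgrading the manufacturing industry in response to Made in China 2025. The IoTDB real-time database provides a complete, reliable, low-cost, lightweight solution for these needs. IoTDB Workbench offers all the basic functions of the IoTDB-cli client and adds data visualization, real-time display of monitoring data, and more. IoTDB is an Apache top-level project and is currently used on more than a million terminal devices worldwide. IoTDB Workbench will serve IoTDB more effectively, with features including but not limited to a rich library of aggregate functions, efficient visual data management, and visual monitoring and operations. The monitoring and operations features help keep the database running reliably, covering JVM, CPU, memory, storage, and query metrics, and provide slow-query statement display and slow-query log download. This greatly reduces operations costs and makes visual management of IoTDB easier.

Speaker

郑强

重庆市赛迪信息技术有限公司, Engineer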

2022-07-30

18:10-18:50

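Fundamentals of Apache IoTDB UDFs and Their Practice in Industrial Data Quality

The Apache IoTDB UDF framework provides users with a simple, semantically rich Java API. With it, users can quickly build powerful time-series processing logic: raw data can be consumed point by point or through a Sliding Time Window, Sliding Count Window, or Session Window, in order to aggregate, transform, or augment time series. The community has already implemented more than a hundred functions on this framework. Combined with Apache IoTDB's nested expressions, it can greatly reduce the business burden in time-series analysis scenarios. This talk centers on the Apache IoTDB UDF framework and has two main parts: 1. fundamentals of Apache IoTDB UDFs, including the semantics of the UDF interfaces, programming examples, usage examples, and the future outlook; 2. Apache IoTDB UDFs in practice in industrial data quality, introducing functions for data profiling, data repair, data matching, anomaly detection, and sequence discovery.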
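To make the sliding count window idea concrete, here is a minimal Python sketch. It is not the IoTDB Java UDF API: `sliding_count_windows` and `windowed_mean` are hypothetical helpers, points are assumed to be `(timestamp, value)` pairs, and dropping a trailing incomplete window is an assumption of this sketch.

```python
def sliding_count_windows(points, window_size, step):
    """Yield consecutive windows of window_size points, advancing by step points."""
    for start in range(0, len(points) - window_size + 1, step):
        yield points[start:start + window_size]


def windowed_mean(points, window_size, step):
    """Aggregate each window to (first timestamp in window, mean of its values)."""
    return [
        (window[0][0], sum(value for _, value in window) / len(window))
        for window in sliding_count_windows(points, window_size, step)
    ]
```

For four points with values 1.0 through 4.0, a window of two points advancing one point at a time produces three aggregated points.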

Speaker

苏宇荣

Tsinghua University School of Software, Master's Student

贺文迪

Tsinghua University School of Software, Master's Student

2022-07-30

13:30 -18:10

Middleware

2022-07-30

13:30-14:10

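Ximalaya's Practice with Apache ShardingSphere

In Ximalaya's early days, each business unit managed its own databases. As the company grew, it gradually recognized the importance of providing a unified, customized data access platform at the company level. The infrastructure team therefore launched its own PaaS platform: a business only needs to apply for a resource ID to use a database, which brings all resource usage under systematic management. Database access is implemented on Apache ShardingSphere, and building on its powerful features the team has optimized and enhanced disaster recovery, dynamic resource changes, read/write splitting, shadow databases, multi-active deployment, distributed unique IDs, and monitoring and alerting.

Speaker

彭荣新

Shanghai Ximalaya Technology Co., Ltd., Architect

沈辉

Shanghai Ximalaya Technology Co., Ltd., Senior Engineer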

2022-07-30

14:10-14:50

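How Apache EventMesh Solves Integration Standardization for SaaS Composable Applications

In the new SaaS era, the need for business adaptability is pushing enterprises toward technical architectures that support fast, secure, and efficient application change. Composable applications, a key technology for accelerating digitalization, are among Gartner's top strategic technology trends for 2022. They are built from business-centric modular components, letting technical and business teams reuse code with more agility and efficiency. One difficulty composable applications face is the lack of integration standards between applications: an application may support only one protocol such as HTTP or TCP, and the absence of a unified communication standard makes the architecture hard to adopt. This talk introduces how Apache EventMesh solves this problem.

Speaker

罗锦荣(Alex Luo)

Huawei Technologies Co., Ltd., Principal Software Engineer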

2022-07-30

14:50-15:30

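Kafka High-Availability Architecture in Practice

This talk introduces the application of Kafka high-availability architecture in messaging services, covering physical multi-tenancy architecture, high-availability deployment, and high data reliability, to show how to build a highly available Kafka messaging service.

Speaker

刘俊洋

Huawei Cloud Computing Technologies Co., Ltd., Messaging Middleware Technical Lead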

2022-07-30

15:30-16:10

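Message-Based Distributed Transactions

Distributed transactions can be implemented with approaches such as 2PC, SAGA, and TCC. This talk introduces the architecture, advantages, and disadvantages of each approach, and how to implement them on top of message queues.

Speaker

余洲

Huawei Cloud Computing Technologies Co., Ltd., Senior Engineer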

2022-07-30

16:10-16:50

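How Cloud-Native Databases Achieve Extreme Scalability: The Amazon Aurora Approach

Besides performance and availability, database scalability is something users must consider when selecting a database or as data volumes keep growing with the business. Amazon Aurora, the relational database developed by Amazon Web Services, is a good example: it offers solid support for storage scaling, horizontal scaling of read nodes, global database scaling, and vertical scaling of a single node. For horizontal scaling of writers, Aurora currently supports up to 4 writer nodes, so users with high write concurrency need to select database middleware. This talk introduces how to combine Apache ShardingSphere to further extend Amazon Aurora's write scalability, and offers the audience a reference on what to consider when selecting database middleware. The talk covers: 1. an introduction to Amazon Aurora and ShardingSphere; 2. ShardingSphere-Proxy support for dynamic sharding on Amazon Aurora; 3. ShardingSphere-Proxy support for read/write splitting on Amazon Aurora; 4. ShardingSphere-Proxy support for table joins on Amazon Aurora; 5. ShardingSphere-Proxy support for Amazon Aurora failover; 6. combining ShardingSphere-JDBC with Amazon Aurora.

Speaker

马丽丽

Amazon Web Services, Database Specialist Architect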

2022-07-30

16:50-17:30

SpamAssassin 4.0: new features to detect new spam types

SpamAssassin 4.0 is around the corner with a lot of new features that can be used to improve how spam messages are detected. Giovanni Bechis will talk about recent spam campaigns and how new SpamAssassin features can be used to detect new spam types and new malware that can be injected into Office files.

Speaker

Bechis Giovanni

SNB S.r.l., CEO

2022-07-30

17:30-18:10

Creating cross-platform, reproducible, binary builds for Java projects

With the increasing focus on supply chain security, there is a greater demand for reproducible binary builds. This presents an opportunity for open source projects because, unlike closed source projects, reproducible binary builds enable open source projects to categorically demonstrate that the convenience binaries they provide have been built from a tagged, unaltered source tree that end users are able to audit. Over the past year, the Apache Tomcat project has been working towards cross-platform reproducible binary builds. Tomcat is now in the position where, from a given project tag, identical release distributions, including source archives, binaries, and Authenticode-signed installers for Windows, can be generated on either Windows or Linux. This session will look at the challenges the Apache Tomcat project faced in moving to reproducible builds, the techniques used to debug differences between builds, and the different solutions used to resolve them.

Speaker

Thomas, Mark

VMware, Staff Engineer

2022-07-30

13:30 -18:30

Big Data A

2022-07-30

13:30-14:10

Capturing per thread statistics for a job - Thread-level IOStatistics - HADOOP-17461

For effective reporting of the IOStatistics of individual worker threads, we need a thread-level context that IO components update (HADOOP-17461). IOStatistics is a statistics-capturing API created by Steve Loughran that includes counters, maximums, minimums, means, and gauges for filesystems and streams. These statistics give better detail about a job or operation than was previously available in the Hadoop world and can be used to identify where performance improvements are needed in a filesystem. The implementation in the S3A and ABFS streams creates an IOStatisticsSnapshot WeakReferenceTreeMap that stores a thread ID as the key and a snapshot of an IOStatistics instance as the value; statistics are aggregated only once the stream closes, and the snapshot is then set back. This makes it possible to retrieve the IOStatistics of a particular thread's work even after its streams are closed. It provides a better view into Hive and Spark jobs, where a failed job does not necessarily leave behind a set of statistics to examine when debugging. Per-thread statistics also give a closer look at such big data jobs and a good way to understand issues at the thread level.
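The per-thread aggregation described above can be sketched in a few lines of Python. This is an illustration only and not the Hadoop IOStatistics API: `ThreadStatisticsContext` and `CountingStream` are hypothetical names, and a plain dictionary stands in for the weak-reference map mentioned in the abstract.

```python
import threading
from collections import Counter


class ThreadStatisticsContext:
    """Hypothetical registry mapping a thread ID to its aggregated counters."""

    def __init__(self):
        self._lock = threading.Lock()
        self._snapshots = {}

    def aggregate(self, stream_stats):
        """Merge a closed stream's counters into the calling thread's snapshot."""
        tid = threading.get_ident()
        with self._lock:
            self._snapshots.setdefault(tid, Counter()).update(stream_stats)

    def snapshot(self, tid=None):
        """Return a copy of one thread's counters (default: the calling thread)."""
        tid = threading.get_ident() if tid is None else tid
        with self._lock:
            return dict(self._snapshots.get(tid, Counter()))


class CountingStream:
    """Toy stream that counts its reads and reports them only when closed."""

    def __init__(self, context):
        self._context = context
        self._stats = Counter()

    def read(self, nbytes):
        self._stats["bytes_read"] += nbytes
        self._stats["read_ops"] += 1

    def close(self):
        self._context.aggregate(self._stats)


def demo():
    context = ThreadStatisticsContext()
    results = {}
    barrier = threading.Barrier(2)  # keep both workers alive at once (distinct IDs)

    def worker(name, sizes):
        barrier.wait()
        stream = CountingStream(context)
        for size in sizes:
            stream.read(size)
        stream.close()
        results[name] = context.snapshot()  # still readable after the stream closed

    threads = [
        threading.Thread(target=worker, args=("a", [10, 20])),
        threading.Thread(target=worker, args=("b", [5])),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Each worker sees only the statistics of its own streams, which is the property that makes per-thread debugging of a failed job possible.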

Speaker

Singh, Mehakmeet

Cloudera, Software Engineer II

2022-07-30

14:10-14:50

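Recent Progress and Practice with Apache Ozone

Apache Ozone is a new-generation distributed storage system from the Hadoop community. Its architecture is simple and easy to scale. It is compatible with the Hadoop file system and seamlessly supports compute engines such as MR, Hive, Spark, and Impala; it is compatible with the S3 object protocol and supports access from AWS clients; and it offers rich enterprise features, including user authentication, transparent data encryption, GDPR compliance, access control, and quota management. This talk introduces Ozone's positioning and recent community progress, including why to use Ozone, its use cases, the progress and status of erasure coding (a new Ozone feature and a powerful tool for reducing storage cost), improvements to file and directory operations, and enterprise multi-tenancy support.

Speaker

Yan Liu

Cloudera, Senior Sales Engineer

陈怡

Cloudera, Principal Engineer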

2022-07-30

14:50-15:30

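ByteDance's Data Lake Table Optimization and Management Service Based on Apache Hudi

ByteDance is currently one of the companies in China whose data lake covers the most data, at the scale of hundreds of petabytes. As the number of jobs grows, the cost of managing them rises sharply, and the strategies offered by Hudi's own table services, such as compaction and clustering, are fairly basic. Against this background, ByteDance built a data lake table optimization and management service that manages and optimizes Hudi tables in a unified way and fully hosts adaptively generated Hudi table optimization jobs. ByteDance plans to contribute it to the Hudi community in the near future.

Speaker

喻兆靖

ByteDance, Senior Development Engineer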

2022-07-30

15:30-16:10

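How to Simplify Data Synchronization with Apache SeaTunnel

This talk introduces how to use Apache SeaTunnel on Flink or Spark to synchronize data between different data sources, along with SeaTunnel's core implementation and the community's future direction. It closes with how to get involved in the open source community.

Speaker

陶克路

ByteDance, Infrastructure R&D Engineer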

2022-07-30

16:10-16:50

Hadoop Vectored IO: your data just got faster!

Since 2006 the world of big data has moved from terabytes to hundreds of petabytes, from local clusters to remote cloud storage, yet the original Apache Hadoop POSIX-based file APIs have barely changed. It is wonderful that these APIs have worked so well, but we can do a lot better with remote object stores by providing new operations which suit them better, targeted at columnar data libraries such as ORC and Spark. Only a few libraries need to migrate to these APIs for significant speedups of all big data applications. This talk introduces a new Hadoop Filesystem API called "vectored read", coming in Hadoop 3.4. An extension of the classic FSDataInputStream, it is automatically offered by all filesystem clients. The S3A connector is the first object store to provide a custom implementation, reading different blocks of data in parallel. In Apache Hive benchmarks with a modified ORC library, we saw a 2x speedup compared to using the classic S3A connector through the POSIX APIs. We will introduce the API spec, the S3A implementation and the benchmarks, and show how to use it in your own applications. We will also cover our ongoing work on providing similar speedups with other object stores, and use of the API in other applications.
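The idea of reading several ranges of one file in parallel can be sketched as follows. This is a Python illustration of the general technique, not the Hadoop API: `merge_ranges`, `read_range`, and `read_vectored` are hypothetical helpers, a local temporary file stands in for an object store, and coalescing nearby ranges with a `max_gap` threshold is an assumption of this sketch.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def merge_ranges(ranges, max_gap):
    """Coalesce (offset, length) ranges whose gaps are at most max_gap bytes."""
    merged = []
    for offset, length in sorted(ranges):
        if merged and offset - (merged[-1][0] + merged[-1][1]) <= max_gap:
            start, prev_len = merged[-1]
            merged[-1] = (start, max(prev_len, offset + length - start))
        else:
            merged.append((offset, length))
    return merged


def read_range(path, offset, length):
    """Read one contiguous block; each call opens its own file handle."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_vectored(path, ranges, max_gap=4):
    """Fetch the merged blocks in parallel, then slice out each requested range."""
    merged = merge_ranges(ranges, max_gap)
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(read_range, path, off, ln) for off, ln in merged]
        blocks = [(off, fut.result()) for (off, _), fut in zip(merged, futures)]
    out = {}
    for offset, length in ranges:
        for start, data in blocks:
            if start <= offset and offset + length <= start + len(data):
                out[(offset, length)] = data[offset - start:offset - start + length]
                break
    return out


def demo():
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(bytes(range(200)))
        path = f.name
    try:
        return read_vectored(path, [(100, 10), (0, 4), (6, 4)])
    finally:
        os.remove(path)
```

A columnar reader would typically pass the byte ranges of the column chunks it needs, which is why this access pattern suits formats such as ORC.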

Speaker

Thakur, Mukund

Cloudera, Staff Software Engineer

2022-07-30

16:50-17:30

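New Features in Apache Hive 4.0

The Apache Hive 4.0 release plan has been on hold for a long time, but the community has not stopped development and improvement, and the number of resolved patches tagged for Apache Hive 4.0 has grown very large. In China, however, not all Apache Hive users know what Apache Hive is capable of, and many are still at the Hive on MapReduce stage. To change this, I believe it is necessary to reintroduce Apache Hive and the many new features of Apache Hive 4.0, changing how people perceive Apache Hive and increasing its user base and community interaction.

Speaker

Yan Liu

Cloudera, Solution Engineering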

2022-07-30

13:30 -17:30

Messaging A

2022-07-30

13:30-14:10

FLiPN Awesome Streaming with Open Source

In this talk I will walk through how to build different types of streaming applications utilizing Apache NiFi, Apache Flink, Apache Spark and Apache Pulsar together. We will ingest various data and REST feeds to enrich and send to Apache Pulsar. We will build applications on top of this live streaming data with WebSocket dashboards, Apache Spark SQL ETL and Apache Flink continuous SQL. We will also ingest IoT sensor data from various devices and stream it directly to Apache Pulsar via MQTT and the native Pulsar protocol. By the end of this talk, developers will walk away with the what, how and why of building these apps.

Speaker

Spann, Timothy

StreamNative, Developer Advocate

2022-07-30

14:10-14:50

Introducing TableView: Pulsar's database table abstraction

In many use cases, applications use Pulsar consumers or readers to fetch all the updates from a topic and construct a map with the latest value for each key from the messages received. The new TableView consumer supports this access pattern directly in the Pulsar client API and encapsulates the complexity of constructing such a local cache manually. In this talk, we will demonstrate the new TableView consumer with a simple application and discuss best practices and patterns for using it.
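The access pattern this abstract refers to, folding a stream of keyed updates into a map of latest values, can be sketched without any Pulsar dependency. `LatestValueView` below is a hypothetical class, not the Pulsar client API, and treating a `None` value as a deletion is an assumption of this sketch.

```python
class LatestValueView:
    """Hypothetical local cache holding the most recent value seen for each key."""

    def __init__(self):
        self._data = {}
        self._listeners = []

    def listen(self, callback):
        """Register a callback invoked with (key, value) for every update."""
        self._listeners.append(callback)

    def apply(self, key, value):
        """Apply one message from the topic; a None value removes the key."""
        if value is None:
            self._data.pop(key, None)
        else:
            self._data[key] = value
        for callback in self._listeners:
            callback(key, value)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def items(self):
        return dict(self._data)
```

An application would feed every message it receives into `apply` and read current state through `get`, instead of maintaining its own map by hand.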

Speaker

Kjerrumgaard, David

StreamNative, Developer Advocate

2022-07-30

14:50-15:30

Get rid of topic metadata limitation for infinite data retention of Apache Pulsar

The data of a Pulsar topic is not limited by the storage resources of a single node, even if the topic is non-partitioned or is a partitioned topic with only a single partition. The fundamental reason is that Pulsar uses a logical storage model: topic data is distributed across multiple bookie nodes, and a partition or topic is not bound 1:1 to any storage node. But the logical storage model also means that metadata must be maintained in the metadata store for each data segment, namely a ledger list indicating which ledgers hold the data. This metadata storage becomes a bottleneck if you want infinite topic data retention. Reducing the frequency of ledger rollover can postpone the bottleneck, but topic unloading also causes ledger rollover, and unloading can be triggered by the load manager, by a broker restart, or manually. The topic metadata limitation has two main parts: 1. the large znode size needed to maintain a topic's metadata; 2. the memory consumed by caching all of a topic's ledgers. This talk will share how Pulsar removes this limitation to support infinite data retention for a topic.

Speaker

Penghui Li

StreamNative, Tech lead & Apache Pulsar PMC member/committer

2022-07-30

15:30-16:10

How Kafka-on-Pulsar Integrates with Pulsar Schema to Leverage Pulsar for Kafka Users

KoP (Kafka-on-Pulsar) is an open source protocol plugin for Apache Pulsar that supports producing and consuming via Apache Kafka clients. Pulsar has a built-in schema registry service that manages the schema definitions of topics, while Kafka needs a third-party schema registry service such as the popular Confluent Schema Registry. This talk is mainly about how KoP makes use of Pulsar's schema registry service.

Speaker

Yunze Xu

StreamNative, Software Engineer

2022-07-30

16:10-16:50

Deep Dive into Message Chunking in Pulsar

Apache Pulsar, like all messaging systems, imposes a size limit on each message sent to the broker. It prevents the payload of each message from exceeding the max message size set in the pulsar broker. However, many users need the Pulsar client to send large messages to the broker for use cases such as image processing and audio processing. Therefore, instead of increasing the configuration of max message size, Pulsar provides a message chunking feature to enable sending large messages. With message chunking, the producer can split a large message into multiple chunks based on the max message size configuration and send each chunk to the broker as an ordinary message. The consumer then combines the chunks back to the original message. In this talk, Zike Yang will explain the concept of message chunking, deep dive into its implementation, and share best practices for this feature.
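The split-and-reassemble flow described above can be sketched in plain Python. This is an illustration of the technique rather than Pulsar's wire format: the field names `uuid`, `chunk_id`, and `num_chunks`, and the `ChunkAssembler` class, are hypothetical.

```python
def split_into_chunks(message_id, payload, max_size):
    """Producer side: split payload into chunks of at most max_size bytes."""
    total = max(1, -(-len(payload) // max_size))  # ceiling division
    return [
        {
            "uuid": message_id,
            "chunk_id": index,
            "num_chunks": total,
            "data": payload[index * max_size:(index + 1) * max_size],
        }
        for index in range(total)
    ]


class ChunkAssembler:
    """Consumer side: buffer chunks per message until all of them have arrived."""

    def __init__(self):
        self._pending = {}

    def receive(self, chunk):
        """Return the full payload once complete, otherwise None."""
        parts = self._pending.setdefault(chunk["uuid"], {})
        parts[chunk["chunk_id"]] = chunk["data"]
        if len(parts) < chunk["num_chunks"]:
            return None
        del self._pending[chunk["uuid"]]
        return b"".join(parts[index] for index in range(chunk["num_chunks"]))
```

Keying the buffer by message identifier lets chunks from different large messages interleave on the same topic without being mixed up.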

Speaker

Zike Yang

StreamNative, Software Engineer

2022-07-30

16:50-17:30

Deep Dive into Apache Pulsar: How Two-Phase Deletion Protocol works between Storage and Metadata

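Apache Pulsar currently deletes data in two steps: step one deletes the metadata, and step two deletes the actual stored data. Because the two steps are separate, there is no guarantee that step two succeeds, so the metadata may be deleted while the stored data remains. Pulsar users have hit this problem in production, ending up with large amounts of dirty data that cannot be deleted. We therefore introduced a two-phase deletion protocol to handle this case. This talk explains in detail how two-phase data deletion works, helping Apache Pulsar users and developers better understand the principles and mechanics behind the feature.

Speaker

赵延

StreamNative, Software Engineer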

2022-07-30

14:00 -17:00

WebServer/Tomcat

2022-07-30

14:00-14:40

State of the Cat

A review of the past year or so for Apache Tomcat and a look forward to what is expected in the coming 12 months.

Speaker

Thomas, Mark

VMware, Staff Engineer

2022-07-30

14:40-15:20

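Tomcat Graceful Shutdown: Design and Application Practice

This talk introduces the design principles behind graceful shutdown in Tomcat and how Alibaba Cloud's Microservices Engine (MSE) uses these features to help its many external enterprise users take services offline gracefully.

Speaker

饶子昊

Alibaba Cloud, Alibaba Cloud Intelligence R&D Engineer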

2022-07-30

15:20-16:00

Extending Valves in Tomcat

Valves act as a request pre-processing mechanism in Tomcat. Custom Valves can be developed by extending the base Valve class to add capabilities for the requests that the Tomcat server accepts for processing. Request rate limiting, enhanced mTLS security, and debugging mechanisms are some examples of where custom Tomcat Valves can extend Tomcat's capabilities.

A custom Tomcat Valve can implement enhanced mTLS security for use cases in which a Tomcat application server works as an API gateway and accepts and processes requests only from trusted clients. A Tomcat connector can secure inbound traffic with a two-way TLS handshake using the attributes in the SSLHostConfig section. However, this approach alone cannot be considered very secure: Tomcat accepts requests from any client that presents a valid certificate during the two-way TLS handshake, as long as that certificate can be trusted against the CA trust chain configured in Tomcat. A custom Valve can check client certificate parameters such as the certificate Distinguished Name (DN) or serial number to ensure that the request really comes from a client presenting the correct certificate. The Distinguished Names or serial numbers can be stored securely in Tomcat as a whitelisted set of client certificates, and only clients that present one of those certificates are accepted for request processing.

Rate limiting is another area where custom Tomcat Valves can be used. Rate limiting restricts the number of requests that can be processed by the application deployed on Tomcat. It can be achieved with a custom Tomcat Valve that integrates the rate limiting capabilities of the Google Guava libraries. Google Guava RateLimiter offers traffic throttling based on a token bucket algorithm implementation. Every incoming request is validated against token availability so that the configured traffic rate is maintained. If the inbound request rate exceeds the rate set in the Valve, requests are rejected with a custom response code. This capability can be further enhanced with dynamically set rate limits, which avoid restarting the Tomcat application server when a new rate is set. This can be achieved with controller requests that are processed by the same Valve and set the new rate limits dynamically once the controller requests have been authenticated. This is very useful when dynamic request throttling is required.

Debugging is another useful application of custom Tomcat Valves. Tomcat already ships with Valves such as the Header Dumper Valve, which can dump header values. Custom Tomcat Valves, however, can capture and dump the request content, session details, and certificate parameters including expiry dates (only if client authentication is enabled), in addition to the header details. This is very useful when troubleshooting requires viewing request details.

Valves are a powerful mechanism that builds on Tomcat's request processing pipeline. Custom Valves can extend them to meet different use cases such as those explained above.
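The rate limiting idea in this abstract can be sketched independently of Tomcat and Guava. The Python below is a minimal token bucket with a rate that can be changed at run time; `TokenBucket` and `handle` are hypothetical names, the clock is passed in explicitly to keep the sketch deterministic, and HTTP 429 is only an assumed choice for the custom response code.

```python
class TokenBucket:
    """Minimal token bucket: `rate` tokens per second, at most `capacity` stored."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self._tokens = float(capacity)
        self._last = now

    def _refill(self, now):
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now

    def set_rate(self, rate, now):
        """Change the rate at run time, settling tokens earned at the old rate first."""
        self._refill(now)
        self.rate = rate

    def try_acquire(self, now):
        """Take one token if available; return whether the request may proceed."""
        self._refill(now)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


def handle(bucket, now):
    """Return an HTTP-style status: 200 if admitted, 429 (assumed) if throttled."""
    return 200 if bucket.try_acquire(now) else 429
```

In a Valve, the equivalent of `handle` would run before the request is passed down the pipeline, and an authenticated controller request would call the equivalent of `set_rate`.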

Speaker

Jacob, Dennis

VISA, Senior Consultant

2022-07-30

16:00-16:40

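Tomcat in Practice at Kuaishou

宋洋 of Kuaishou's business architecture department will share how the Tomcat container is used at Kuaishou, including Tomcat deployment modes, modifications, and service tuning techniques.

Speaker

宋洋

Kuaishou (北京达佳互联科技有限公司), Backend R&D Engineer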

2022-07-30

14:00 -17:30

Culture

2022-07-30

14:00-14:40

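Open Source Software and Software Licenses

As an essential part of open source software, open source licenses are an important component of the copyleft movement and play an important role in today's open source ecosystem. From the famous dispute between Google and Oracle over Java source code to the recent SFC v. Vizio case, open source licenses matter more and more in the software supply chain. Against this backdrop, SPDX (Software Package Data Exchange), after ten years of development, has had version 2.2.1 submitted as an ISO standard. Every software engineer now needs to understand open source licenses, how to choose a license for their own open source projects, and what to watch for when using other people's projects.

Speaker

Rui Chen

Meetup, Senior Staff Software Engineer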

2022-07-30

14:40-15:20

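「开源之史」(A History of Open Source): Introduction and Reflections

Open source is a product of human culture, and culture is not an independent variable; it is a process that people choose or pursue. 适兕, author of 《开源之迷》, examines the history of the open source world over the past 20-plus years, especially its cultural shifts, in an attempt to identify the important role culture has played in the development of open source.

Speaker

适兕

Author of 「开源之道」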

2022-07-30

15:20-16:00

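Two Hundred Days of Getting to Know Open Source

When people think of open source, a rich vocabulary may come to mind: openness, sharing, source code, community, Apache projects, GitHub, contributors, Community over Code, and so on. If these are the "fruit", then the soil that grows them must be "open source culture". So what is open source culture like? How does a newcomer go from getting to know open source to contributing to it? This talk hopes to guide more people to understand and contribute to open source.

Speaker

展留坤

API7.ai, Product Manager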

2022-07-30

16:00-16:40

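When Open Source and Openness Converge: The Core Value of Open and Public Recognition

Open source is one of the many branches of Open Knowledge, and the most widely recognized one with the highest participation, but the world of Open also includes branches such as open data, open access, and open science. So what is the core value of Open? How can the open source community and other Open communities join forces to better promote public recognition of, and participation in, the value of Open? What must we do to embed Open as a norm of behavior in how our society runs? This talk offers some personal reflections and seeks feedback and collaboration from the community.

Speaker

高丰

Open Data China, Executive Director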

2022-07-30

16:40-17:20

The art of technical writing

This presentation shows how to turn software documentation into an art form so that it becomes natural and helps you write better code. It will also help encourage a documentation culture in your organization that reaps rewards such as better communication between engineering, business, finance, and operations.

Speaker

Sacks, Matthew

StoneFish Defense Inc., CEO

2022-07-30

14:00 -18:00

RPC

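RPC (Remote Procedure Call) is a technical solution for cross-process and cross-host communication in distributed system architectures, and it is the core technology behind the distributed systems in wide use today. As cloud-native microservice architectures evolve, RPC is playing an increasingly important role.

This forum focuses on high-performance, easily extensible RPC frameworks and case studies of their use.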

2022-07-30

14:00-14:40

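bRPC: The Evolution of RPC Frameworks in the Cloud-Native Era

RPC frameworks emerged alongside distributed systems and have many years of history. What new directions do they have in the cloud-native era? The talk covers the following angles: communication protocols in the cloud-native era (bRPC's multi-protocol support); service governance (bRPC's load balancing, circuit breaking, and rate limiting mechanisms); service observability (bRPC and Prometheus); multi-language communication (bRPC and Service Mesh); and combined software and hardware performance optimization (bRPC with RDMA and user-space protocol stacks).

Speaker

王伟冰

Baidu, Senior R&D Engineer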

2022-07-30

14:40-15:20

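Best Practices for High Performance with bRPC

bRPC is the most widely used industrial-grade RPC framework inside Baidu. It offers high performance, ease of use, and convenient troubleshooting, and thousands of services inside Baidu already use it. bRPC was developed early, and for years its performance met the needs of most business scenarios. But with the continued development of cloud computing and ongoing innovation in hardware, higher demands are being placed on the throughput and latency of bRPC's network communication. bRPC therefore cannot settle for the high performance of years ago; it should further improve throughput and latency and push current hardware to its limits to meet today's business needs. Using cloud storage as the example, this talk details Baidu's work to improve bRPC performance when adapting it to cloud storage scenarios. It has three parts: first, reworking bRPC's thread model while using user-space TCP and DPDK to accelerate data transmission and reception; second, trimming and optimizing bRPC code and the modules it uses, mainly atomic variables, IOBuf, and serialization; third, redesigning bRPC's use of RDMA to make full use of RDMA's performance advantages. These changes have been integrated with the CDS cloud disk team of Baidu cloud storage and are running in production at scale, giving the front-end IOPS performance of the CDS product a sizable boost.

Speaker

周末

百度时代网络技术（北京）有限公司, Senior R&D Engineer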

2022-07-30

15:20-16:00

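Building a High-Performance Service Mesh Data Plane on bRPC

bRPC is an excellent industrial-grade RPC framework, and Service Mesh is the dominant microservice architecture of the cloud-native era. What business value results from combining the two? This talk introduces the technical implementation of Baidu's self-developed, bRPC-based high-performance service mesh data plane, along with experience from large-scale adoption and production use in core business lines.

Speaker

刘帅

Baidu, Senior R&D Engineer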

2022-07-30

16:00-16:40

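A Smooth Evolution from Dubbo and Spring Cloud to Service Mesh

Over the past few years, traditional microservice frameworks such as Dubbo and Spring Cloud have been used by enterprises at large scale. With the arrival of the cloud-native wave and the rise of service mesh concepts and products such as Istio, the microservice landscape has changed. As a leading service mesh solution, Istio is becoming very popular and is widely used for cloud-native applications. By offloading complexity from application code to a separate infrastructure layer, Istio helps customers build a highly resilient, secure, observable, and scalable microservice architecture. In this talk, inspired by the cloud-native solutions of typical customers inside and outside Alibaba Cloud, 曾宇星 draws on the productization of Alibaba Cloud Service Mesh (ASM) to share best practices for migrating Dubbo and Spring Cloud services to Istio. He will cover the similarities and differences among the Dubbo, Spring Cloud, and Istio frameworks, including their mechanisms and working scenarios, and focus on best practices for production rollout that let Dubbo and Spring Cloud application services run under Istio with zero code changes and use the traffic management and governance capabilities Istio provides.

Speaker

曾宇星

Alibaba Cloud, Technical Expert and Cloud-Native Architect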

2022-07-30

16:40-17:20

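Service Mesh in Practice at Xiaomi

Xiaomi is a heavy user of Dubbo. It has built its microservice system on Dubbo and collaborates closely with the Dubbo3 and Dubbo-Go communities. This talk mainly shares Xiaomi's Service Mesh practice with Dubbo, as well as the microservice governance system Xiaomi has built around Dubbo.

Speaker

张志勇

Xiaomi, Senior Development Engineer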

2022-07-30

17:20-18:00

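How the Apache ShenYu Gateway Proxies Dubbo Services

Apache ShenYu is an asynchronous, high-performance, cross-language, reactive API gateway. It is compatible with mainstream framework ecosystems, supports hot plugging, and allows custom development, meeting users' current and future needs across scenarios, and it has been hardened in large-scale deployments. Apache Dubbo is a microservice development framework that provides two key capabilities: RPC communication and microservice governance. This talk introduces how to proxy Dubbo services through the Apache ShenYu gateway.

Speaker

刘良

Apache ShenYu Community, Apache ShenYu PPMC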

2022-07-30

18:00-18:00

Tea break

2022-07-30

14:00 -18:00

Messaging B

2022-07-30

14:00-14:40

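RocketMQ Streams: Best Practices for Lightweight Stream Computing in Cloud Security and Edge Computing

Big data is increasingly important, and more and more products depend on big data computing. If your product is deployed in a customer's data center, at the edge, or in a private cloud, shipping it means deploying a series of big data clusters, such as a collection cluster, a stream computing cluster, and a message queue cluster. Each cluster needs at least 3 machines to be reliable, so even with co-location the resources shipped are substantial. Stream computing in particular requires resources to be allocated to jobs in advance, and each job has a high resource overhead, so resources soon run short and capacity must be expanded (procurement, approval, deployment). RocketMQ Streams integrates collection, message queuing, and stream computing in one. It can be deployed with 1 core and 1 GB of memory and scaled horizontally by adding instances, like a web application. It is heavily optimized for high-filtering scenarios so that resource usage does not grow linearly with the number of rules; its CPU resource usage is stated as 40% of that of comparable big data computing, with CPU at 4%. In this session, attendees can learn how rocketmq-streams is implemented and, drawing on Alibaba's experience applying it, judge whether their own business could use a lighter solution to reduce product cost and improve competitiveness.

Speaker

袁小栋

Alibaba Cloud, Senior Technical Expert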

2022-07-30

14:40-15:20

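Message Canary Release on the 鲁班 RocketMQ Platform

RocketMQ (MQ below) is widely used as message middleware for transaction management, asynchronous decoupling, peak shaving, data synchronization, and similar scenarios. When a business system performs a canary release, Dubbo and HTTP calls can be handled with industry-standard canary approaches on vivo's microservice governance and gateway platforms, but none of the known canary approaches for MQ fully solves the problems of message isolation and switchover continuity. The vivo technical architecture team therefore added an MQ canary extension to the 鲁班 MQ platform (which already includes extensions for root cause analysis, resource management, subscription validation, latency optimization, and more). The solution is built mainly on splitting queues by target usage, with deep extension and encapsulation on top.

Speaker

区二立

vivo, Technical Director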

2022-07-30

15:20-16:00

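Building a New Data Flow Processing Platform with RocketMQ Connect

As the first choice for business messaging, Apache RocketMQ plays an important role in business systems. At the same time, more and more data flows through RocketMQ and often needs to connect to various data systems. RocketMQ Connect, an important component for RocketMQ data integration, makes that integration more convenient. With RocketMQ Connect, data from heterogeneous systems can be used to build end-to-end data pipelines and ETL, implement CDC, and build data lakes.

Speaker

周波

Alibaba, Senior Development Engineer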

2022-07-30

16:00-16:40

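Apache RocketMQ 5.0: Evolution of a High-Availability Architecture Unifying Messages, Events, and Streams

Since becoming an Apache TLP in 2017, RocketMQ has become the first choice for business messaging at many vendors, thanks to its high-performance, low-latency, and highly reliable publish/subscribe service. In 2021, Apache RocketMQ was upgraded into a cloud-native platform that unifies the processing of messages, events, and streams. Its high-availability architecture evolved in step, becoming a new-generation unified architecture that handles messages, events, and streams well. This talk introduces the historical evolution of Apache RocketMQ's high-availability architecture and unpacks the new unified high-availability architecture of RocketMQ 5.0, mainly the upgraded Master-Slave architecture for messaging scenarios and the new pluggable switchover architecture for scenarios such as real-time computing and ordered messages.

Speaker

金融通

Alibaba, Apache RocketMQ PMC Member/Committer, Senior R&D Engineer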

2022-07-30

16:40-17:20

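Message-Driven Lightweight Computing in Logistics

Logistics is a labor-intensive industry that spans collection, warehousing, customs, line-haul transport, last-mile delivery, and more. Operational chains are long, and IoT devices collect large volumes of data about people, goods, and sites. Meeting both timeliness and accuracy requirements for this data at low technical cost is a major challenge. This talk presents an approach different from traditional computing engines such as Hadoop, Flink, and Spark: MQ-based lightweight computing that supports data-driven logistics operations and decisions at low technical cost.

Speaker

王鑫

Cainiao Network, Senior Technical Expert and head of export logistics data intelligence; Apache Storm, RocketMQ, and IoTDB Committer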

2022-07-30

17:20-18:00

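Apache RocketMQ 5.0: Practice and Exploration in Stream Storage

In online business scenarios, RocketMQ was first validated under the extreme load of many years of Alibaba's Double 11 and then widely adopted in the core production systems of most of China's top internet and financial companies, honing its performance, usability, and stability. But RocketMQ has not stopped there. In the latest 5.0 release, its storage layer adds a series of new features, including Batch, LogicQueue, and CompactTopic, which distill RocketMQ's experience in stream storage. 刘振东, a PMC Member of the Apache RocketMQ community, will share the thinking, judgment, and path behind them.

Speaker

刘振东

Apache RocketMQ Community, Apache RocketMQ PMC Member/Committer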

2022-07-30

14:00 -18:00

AI Track

2022-07-30

14:00-14:40

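PAI-ODL: Real-Time Deep Learning Training

DeepRec (PAI-TF) is Alibaba Group's unified training and inference engine for large-scale sparse models. It is widely used at Taobao, Tmall, Alimama, Amap, Taote, AliExpress, Lazada, and others, supports core businesses such as Taobao search, recommendation, and advertising, and underpins ultra-large-scale sparse training with hundreds of billions of features and trillions of samples. Online Deep Learning (ODL), built on DeepRec, Flink, Kafka, and Flink-AIFlow, combines online learning with offline training into a unified framework on a cloud-native architecture, giving users a complete solution from offline to online. This talk introduces a series of key technologies for ODL scenarios, including training and inference for very large sparse models, model hot updates within seconds, real-time training model correction, model rollback and sample replay, sample repair, and elastic resource scheduling for real-time training.

Speaker

刘童璇

Alibaba Cloud Intelligence, Computing Platform Division, PAI Senior Technical Expert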

2022-07-30

14:40-15:20

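Spark + ONNX + CANN: How to Improve the Performance and Experience of Distributed Inference?

Data processing platforms and deep learning frameworks keep evolving in their own domains. Typically, Apache Spark is used for offline data processing, and various deep learning frameworks are then used for inference. A simplified API for DL inference is important as a bridge between them. What does an ideal data and deep learning inference pipeline look like? We will discuss how to build your AI application with Spark and ONNX, the current status and initial ideas in the Spark community for improving this pipeline, and how to make full use of the Ascend Hardware Platform. The session includes: how to connect the big data and AI inference workflows; what ONNX is and how it helps enable hardware acceleration; the background of "SPIP: Simplified API for DL Inferencing" in the Spark community; and a simple demo showing how it works. This topic will help you understand the latest progress on integrating the Ascend Hardware Platform into ONNX and the Spark community's initial ideas for improving the inference pipeline.

Speaker

王玺源

Huawei, Senior Software Engineer

姜逸坤

Huawei, Senior Software Engineer

黄之鹏

Huawei, Director of Ascend Open Source Ecosystem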

2022-07-30

15:20-16:00

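Flink ML: Real-Time Machine Learning on Apache Flink

In this talk, Dr. 高赟 and Dr. 张智鹏 introduce the work completed so far in the Apache Flink machine learning library (Flink ML), its near-term plans, and its long-term vision. We designed algorithm interfaces with native support for real-time machine learning, making it easier for users to configure, compose, and deploy online inference and online learning algorithms. The interfaces support multiple inputs and outputs and allow algorithm modules to be composed as a directed graph. We designed and implemented an iteration engine based on DataStream to replace the DataSet-based one. To meet the needs of different algorithms, we designed iteration interfaces that are easier to use and give algorithm developers richer choices for optimizing performance. On this foundation we implemented more than 15 efficient, easy-to-use offline and online machine learning algorithms. We plan to adapt Alink, the algorithm library Alibaba Cloud has developed over many years, to the new interfaces and iteration engine and contribute it to Flink ML. By bringing together Apache Flink's strong community ecosystem, the technically advanced Alink library, and the newly designed interfaces, we hope to combine their strengths and help Flink ML become the easiest-to-use, most comprehensive, and most widely applied unified stream and batch machine learning library.

Speaker

高赟

Alibaba, Technical Expert

张智鹏

Alibaba, Senior Algorithm Engineer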

2022-07-30

16:00-16:40

BladeDISC: 支持动态Shape的深度学习编译器实践

随着深度学习的不断发展，模型结构在不断演进的同时，底层计算硬件技术更是层出不穷。对于广大开发者来说，不仅要考虑如何在复杂多变的场景下有效的将算力发挥出来，还要应对计算框架的持续迭代。深度编译器是业界在实践中探索出的解决以上问题的技术方向，它让开发者仅需专注于上层模型开发，降低手工优化的人力成本，并进一步压榨硬件性能。业内已经出现了以TensorFlow XLA、Apache TVM为代表的一批深度学习编译器，但是随着深度学习的发展，也来越来越多的模型呈现出了动态Tensor Shape的特性，例如，1）可以支持多种尺寸图片输入的CNN模型；2）以Bert为代表了NLP模型在BatchSize和SequenceLength维度均有动态性。然而前述编译器最初是面向静态Shape场景设计，需要输入及中间Tensor每个维度上具有固定的尺寸，因此并不能很好地适配动态Shape的场景。面临业务中的强需求，深度学习社区也出现了TVM Relay VM的工作。本演讲主要介绍阿里云PAI团队以BladeDISC为中心，在动态Shape编译器上的工作，主要包括如下内容： - BladeDISC的主要架构：为何以及如何基于MLIR框架搭建支持动态Shape的编译器，为什么BladeDISC选择MHLO作为接入层Dialect，以及BladeDISC在为MHLO Dialect做了哪些改造。相较于Apache TVM，在技术路径上有哪些差异以及背后的设计考量。 - 动态Shape带来的挑战：许多静态Shape语义下比较确定性的问题，例如指令层的向量化，codegen模版选择，是否需要implicit broadcast等等在动态shape场景下都会面临更大的复杂性。部分在编译时期的决策需要移动到运行时进行。 - 大粒度算子融合：在动态Shape的语义下，如何实现通过低访存开销的硬件（Shared memory in GPU, Memroy Cache in CPU）实现更大尺度的算子融合。以及如何在未知Shape的情况下，推到Tensor之间在Shape上的约束关系进一步实现性能提升。 - 计算密集型算子：如何利用厂商库、Apache TVM算子生成、手写算子等手段提升计算密集型算子效率，以及非编译生成的算子如何在架构上实现与编译器的统一与互补。 - 如何支持多种深度学习框架，降低终端用户的使用门槛，以及BladeDISC在阿里云业务中的应用。

Speaker

邱侠斐

Alibaba Cloud Computing Co., Ltd., Senior Technical Expert

2022-07-30

18:00-18:00

Tea break

2022-07-31

09:00 -12:00

Keynote

2022-07-31

10:15-10:30

POWER TUNA

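TUNA is the technical team that maintains the Tsinghua University open source software mirror site and one of the largest university open source communities in China. TUNA has long worked with open source communities in China and abroad, co-hosting public events such as Software Freedom Day and release parties, and is committed to promoting open source ideas, spreading basic knowledge, and contributing to open source in China. This talk looks back at TUNA's history and development and describes what TUNA has tried in building a university open source community.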

Speaker

刘一芃

TUNA, President

2022-07-31

11:00-12:00

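New Open Source Talent from Generation Z

开源之夏 (Open Source Promotion Plan, OSPP) builds a bridge between a large number of open source projects and university students. Guided by mentors from the projects, students get a good opportunity to learn about and take part in open source communities through hands-on summer projects. By participating, students gain practical project experience and sharpen their development skills, and they also get to talk in depth with the projects' developers. It is one of the most direct ways to understand and engage with open source, offering direction for further studies and building connections for a future career. Along the way, it also promotes open source and open source projects among university students. This is the program's third year, and hundreds of project tasks have produced valuable results through the joint efforts of students and mentors. Today we have invited OSPP students to share their stories of taking part, in the hope of inspiring Generation Z developers and encouraging more young developers to contribute to open source projects.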

Speaker

李梦

Institute of Software, Chinese Academy of Sciences, Operations

陈意昊

Student

谢其骏

Student

张俊杰

Student

黄章衡

Student

2022-07-31

10:30-11:00

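Apache Doris Real-Time Data Warehouse: History and Technical Deep Dive

Apache Doris is an open source real-time data warehouse project that has emerged in recent years. It offers high performance, broad scenario support, ease of use, and easy operations. This talk gives an overview of the project's history inside Baidu and after it was open sourced, and takes a close look at its core features and their implementation, in the hope of helping more people get to know, use, and contribute to the project.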

Speaker

张东进

Baidu, Architect

2022-07-31

10:00-10:15

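Advancing Nonprofit Education and Youth Development in Open Communities

The speaker will share how freeCodeCamp, a nonprofit open source community, makes programming education accessible to more people; how it works with universities, companies, charity schools, and communities to promote free programming education in China; and how, starting from an open source technology community, it connects people from technology, the humanities, the arts, environmental protection, and other fields to build the open community "开源市集" (Open Source Bazaar), where deep exchange and co-creation bring interesting people together and create value for society.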

Speaker

刘于瑜

freeCodeCamp.org, Chinese Community Ambassador

2022-07-31

09:45-10:00

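The Open Source Community Is the Best University of Life

University students are a user group the open source world cannot ignore, and the group most hoped to become contributors. Modern open source development offers large-scale collaboration and social coding, but before stepping in, students often face barriers such as language, time zones, and the unfamiliarity of it all. Drawing on my own experience and reflections from participating in open source communities during graduate school, this talk aims to encourage, help, and inspire students who want to join, or are already part of, open source.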

Speaker

夏小雅

Ph.D.

2022-07-31

09:30-09:45

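Understanding Open Source Through Data

Open source brings not only open code collaboration but also a vast amount of open data. This talk takes a close look at open source big data and at how to use it to better support open source collaboration and help projects grow quickly.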

Speaker

赵生宇

Ph.D. Student

2022-07-31

09:00-09:30

Learning the Apache Way through the ASF Incubator

What is the Apache Software Foundation, and what is the Apache Way? In this talk, I’ll introduce the ASF and the Apache Way and explain how the Apache Incubator teaches it to new projects. If you are new to the ASF, you may think that some of the ways we do things are strange or unnecessary. Understanding the Apache Way helps you comprehend why certain things happen the way they do in ASF projects and how you create a long-lived community around a project.

Speaker

Justin Mclean

Instaclustr, Vice President

2022-07-31

13:30 -15:30

Workflow/DataProcessing

2022-07-31

13:30-14:10

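How to Let More People in a Bank Benefit from Data Processing

1. Background: As the business grows and deepens, the number of data users keeps increasing, from senior management and analysts to front-line staff. Getting data analysis results quickly matters more and more to the business, so providing simplified data processing and analysis workflows is critical for front-line business analysts. Optimized data processing tools and workflows help non-technical users obtain analysis data and embed it automatically into the data processing pipeline, while configuration-driven setup lowers the cost of using data. 2. Purpose: Let business users implement part of the data processing logic with almost no code. 3. Method: Wrap existing tools and tool combinations appropriately, keeping only the logic related to the core workflow and letting system presets handle the other technical steps, which simplifies the user's workflow. The wrapper builds on the existing scheduling system and integrates the environment, data source management, and task scheduling functions that the scheduling system provides. 4. Design and implementation: Of the three data processing steps (collection, processing, and distribution), processing is the most complex, while collection and distribution can both be simplified through configuration. With sound data model design, data is processed to a suitable granularity, and automated code generation tools then handle the work that follows dimensional modeling. This simplifies the processing workflow and narrows the gap between the data model and reports or visual analytics. Because users take part in this process, especially from the model to the data application layer, the workflow system must support configuration-driven job creation and embed those jobs automatically into existing workflows for routine scheduling. 5. Benefits and impact: Data engineers focus more on the core data processing workflow and models; once those are complete, they deliver model data to users with the help of modeling tools. Users can then build self-service analysis and data distribution on top of the model data. 6. Q&A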

Speaker

陈卫

Sichuan XW Bank, Big Data Architect

2022-07-31

14:10-14:50

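Apache Linkis Data Processing in Practice

Linkis background: Linkis builds a layer of computation middleware between upper-layer applications and underlying engines. Through the standard interfaces that Linkis provides, such as REST, WebSocket, and JDBC, upper-layer applications can easily connect to underlying engines such as Spark, Presto, and Flink, while gaining cross-engine context sharing and unified governance and orchestration of computation tasks and engines. I will share two aspects of Linkis data processing in practice: metadata and computation tasks. Metadata: Metadata falls into three categories: data dictionary, data lineage, and data characteristics. Linkis provides metadata management for data assets through two services, Linkis DataSource and Apache Atlas. DataSource scope: many open source tools in the WeDataSphere community (Scriptis, Visualis, Exchangis, Streamis) use data sources but lack unified management, and users have to configure the same data source repeatedly in different products. We want a unified data source management service so that a data source is configured once and used everywhere. Atlas is a set of scalable and extensible core foundational governance services. Linkis EngineConn (the engine connector) integrates with Atlas through Atlas Hook, so that the data information, characteristics, and lineage involved in a computation are collected into Atlas for upstream data assets to use. Computation tasks: DolphinScheduler launches Linkis computation tasks; the DolphinScheduler Shell task type configures the relevant parameters through LinkisDolphinSchedulerClient and launches the tasks. Summary: this completes the end-to-end Linkis data processing chain, covering metadata and scheduled tasks in a closed loop.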

Speaker

李孟

上海仙翁科技, Data Architect

2022-07-31

13:30 -18:10

Messaging B

2022-07-31

13:30-14:10

When failure is not an option.

Developing a highly-available application requires more than just utilizing fault-tolerant services such as Apache Pulsar in your software stack. It also requires immediate failure detection and resolution including built-in failover when there are data center outages. Up until now, Pulsar clients could only interact with a single Pulsar cluster and were unable to detect and respond to a cluster-level failure event. In the event of a complete cluster failure, these clients cannot reroute their messages to a secondary/standby cluster automatically. With the release of Pulsar 2.10, this much-needed automated cluster failover capability has been added to the Pulsar client libraries. In this talk, I will walk you through the changes you need to make inside your application code to take advantage of this new capability.
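The cluster-level failover described above can be pictured with a small sketch. This is a plain Python illustration of the idea only (probe the primary, switch to a standby after repeated failures, switch back when the primary recovers); it is not the Pulsar client API, and the class name, the `probe` callback, the `failover_after` parameter, and the URLs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FailoverUrlProvider:
    """Decides which cluster URL a client should use (illustrative only)."""
    primary: str
    secondaries: List[str]
    probe: Callable[[str], bool]   # hypothetical health check: True if reachable
    failover_after: int = 3        # consecutive primary failures before switching
    _failures: int = field(default=0, init=False)
    _current: str = field(default="", init=False)

    def __post_init__(self) -> None:
        self._current = self.primary

    def check(self) -> str:
        """Run one health check and return the URL to use afterwards."""
        # Prefer the primary whenever it is healthy (this is the switch-back).
        if self.probe(self.primary):
            self._failures = 0
            self._current = self.primary
            return self._current
        # Tolerate a few failures before leaving the primary.
        if self._current == self.primary:
            self._failures += 1
            if self._failures < self.failover_after:
                return self._current
        # Fail over to the first standby cluster that answers.
        for url in self.secondaries:
            if self.probe(url):
                self._current = url
                break
        return self._current


down = {"pulsar://primary:6650"}          # pretend the primary is unreachable
provider = FailoverUrlProvider(
    primary="pulsar://primary:6650",
    secondaries=["pulsar://standby:6650"],
    probe=lambda url: url not in down,
    failover_after=2,
)
first = provider.check()    # first failure: stay on the primary
second = provider.check()   # second failure: switch to the standby
down.clear()                # the primary comes back
third = provider.check()    # switch back to the primary
```

The delay before failing over avoids flapping on a single missed probe; a real client would also need to reconnect its producers and consumers after the URL changes, which this sketch leaves out.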

Speaker

Kjerrumgaard, David

StreamNative, Developer Advocate

2022-07-31

14:10-14:50

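Building a Blazing-Fast Pulsar with Intel Optane Persistent Memory

Intel Optane persistent memory is a revolutionary memory product that combines high performance, large capacity, and persistence, and it has been well received across the industry. Its strengths show most clearly where performance requirements are extreme. Pulsar is a popular message queue used widely in many fields. In this talk, we show how Optane persistent memory can further raise Pulsar's throughput and lower its latency to handle performance-critical scenarios.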

Speaker

胡风华

Intel Corporation, Optane Product Division, Cloud Software Architect

2022-07-31

14:50-15:30

How Key_Shared subscription works

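The Key_Shared subscription mode has been supported since Pulsar 2.4 and, after iterating over several releases, is now stable. This talk goes from design to implementation for an in-depth look at how Key_Shared works.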
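As a rough picture of what a key-shared dispatch has to guarantee (every message with the same key reaches the same consumer), here is a minimal Python sketch. It is a simplification under stated assumptions, not Pulsar's implementation: the even split of the hash space, the use of `zlib.crc32` as the hash function, the slot count, and the consumer names are all chosen for illustration.

```python
import zlib
from typing import Dict, List

HASH_SLOTS = 65536  # size of the hash space assumed in this sketch


def assign_ranges(consumers: List[str]) -> Dict[str, range]:
    """Split the hash space into one contiguous range per consumer."""
    step = HASH_SLOTS // len(consumers)
    ranges: Dict[str, range] = {}
    for i, name in enumerate(consumers):
        # The last consumer takes whatever remains so the space is fully covered.
        end = HASH_SLOTS if i == len(consumers) - 1 else (i + 1) * step
        ranges[name] = range(i * step, end)
    return ranges


def route(key: str, ranges: Dict[str, range]) -> str:
    """Return the consumer whose range contains the key's hash slot."""
    slot = zlib.crc32(key.encode("utf-8")) % HASH_SLOTS
    for name, slots in ranges.items():
        if slot in slots:
            return name
    raise RuntimeError("hash space is not fully covered")


ranges = assign_ranges(["consumer-a", "consumer-b", "consumer-c"])
keys = ["order-1", "order-2", "order-1", "order-3", "order-1"]
deliveries = [(key, route(key, ranges)) for key in keys]
```

Because routing depends only on the key's hash, per-key ordering is preserved while different keys can still be processed in parallel. The hard part, which this sketch skips, is what happens to in-flight messages when consumers join or leave and the ranges change.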

Speaker

郭吉伟

StreamNative, Software Engineer

2022-07-31

15:30-16:10

Deep Dive into Apache Pulsar Transaction - How it works and notes for it.

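Apache Pulsar transactions are an important feature supported since Apache Pulsar 2.9.0. After many iterations, Pulsar transactions are now largely stable and have been adopted in production by many companies. To help more people understand and use them, this talk briefly introduces the basic concepts and implementation principles of transactions, then explains in detail how to use them and what to watch out for. After the concepts and principles, the audience should have a general understanding of transactions. The speaker then covers how to enable and configure transactions, including what each related configuration option does, and how to send and acknowledge messages within a transaction, along with points to note. Next come optimizations that may be added in the future, to give a clear picture of where transactions are heading. Finally, the speaker shows how to use operational tools to inspect transaction state and how to test transaction performance.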

Speaker

孟祥迎

StreamNative, Platform Engineer

2022-07-31

16:10-16:50

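How the Load Manager Works in Apache Pulsar, and Its Use in Practice

Apache Pulsar is a horizontally scalable messaging system, so the load on the brokers in a cluster must be kept as balanced as possible. The Load Manager is the component responsible for this. This talk introduces how the Load Manager works and how to use and manage it in Pulsar.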

Speaker

王锴

StreamNative, Software Engineer

2022-07-31

16:50-17:30

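China Mobile Cloud's RabbitMQ Message Queue in Ultra-Large-Scale OpenStack Deployments

As message queue middleware, RabbitMQ is widely used in many of China Mobile's internal business systems (e-commerce, trading, and portal management platforms), especially in OpenStack, which underpins cloud computing infrastructure. RabbitMQ is an indispensable messaging component of the OpenStack platform, providing message flow and asynchronous decoupling for its Nova, Cinder, and Neutron components. As OpenStack deployments in China Mobile's public cloud keep growing (more than 5,000 compute nodes in a single availability zone), open source RabbitMQ faces many challenges, including split-brain failures, message backlog, elastic cluster scaling, and server resource utilization. The speaker, 胡宗棠, will cover China Mobile Cloud's self-developed RabbitMQ message queue and its use in the China Mobile Cloud OpenStack platform from the following angles: (1) problems with connecting OpenStack to open source RabbitMQ; (2) an introduction to the China Mobile Cloud RabbitMQ message queue; (3) the design and practice of the self-developed RabbitMQ; (4) its technical evolution and outlook.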

Speaker

胡宗棠

China Mobile Cloud Capability Center, Technical Expert

2022-07-31

13:30 -17:30

Community

2022-07-31

13:30-14:10

From Zero to 1.5k Stars: Launching a New Open Source Project

In this session, Maxim Wheatley (Head of Business and Marketing) will share lessons learned and practical tips from the open source launch of DevLake, an open source DevOps platform that grew to over 1.5K stars on GitHub in under two months! In this talk, attendees will learn about all the key details, from crafting an effective GitHub page and ReadMe, to strategic marketing and community efforts to bring massive attention to the project.

Speaker

Wheatley, Maxim

Merico, Head of Business and Marketing

2022-07-31

14:10-14:50

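From Operations to Tech Writer: Understanding Why Documentation Matters for Open Source Software

In this talk, 韩飞 describes how an open source newcomer made the move from operations to tech writer. Drawing on his own experience, he explains why he took up technical writing in an open source community and why documentation matters for open source software, and he offers a professional view of how non-technical people can contribute to open source documentation. Moving from operations to technical writing makes it easier to see why open source software needs excellent documentation: day-to-day operations work involves a large amount of open source software, and getting up to speed on it quickly is essential. So what made 韩飞 want to write technical documentation? Is documentation for open source software unimportant? Why not just read the source code to find out what a feature does? Many developers probably think this way, so how does 韩飞 see the question from the standpoint of both operations and technical writing? Many people have heard of open source, and many have taken part, yet a common misconception is that only developers can participate in open source communities. This misconception keeps many people away. In this talk, 韩飞 focuses on documentation as a way into open source communities and on contributing in ways other than code.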

Speaker

韩飞

Apache APISIX, Technical Writer

2022-07-31

14:50-15:30

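Offense and Defense in Information Architecture for Open Source Websites

In recent years, a new round of global technological revolution and industrial transformation has deepened, bringing new opportunities for open source. With open source now written into China's 14th Five-Year Plan, the field is entering its springtime. Unlike traditional industries, open source pools collective wisdom and innovates and iterates quickly, which raises users' expectations for the information experience. An open source website is the main window onto that experience. To improve the content experience, how do you design a website's information architecture from a product perspective? To ensure content quality, how do you measure the quality of that architecture? When developing it, which fixed ways of thinking should be avoided? What lessons can be drawn on? Which best practices are worth following? What personal reflections are worth discussing? Using Apache Pulsar, a top-level project of the Apache Software Foundation, as an example, this talk answers these questions.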

Speaker

Yu Liu

StreamNative, Information Architect

2022-07-31

15:30-16:10

Building Super-Contributors in Alluxio Open Source Community

The lack of community engagement is one of the most significant barriers to the survival of open source projects. In the Alluxio (alluxio.io) open source community, we experimented with different approaches to nurture the community. In this talk, Bin Fan will share the story and our findings from engaging the Alluxio community over the past six years. For example, rather than simply scoring points and giving badges, introducing gamification turns out to be very effective for understanding and influencing human behavior. There is a delicate balance of triggers, ability, and motivation to find the “happy path” for contributors: the right amount of challenge and competition to keep them interested while preventing boredom. He will also discuss other pillars of community building (e.g., localization) and how to bring the different pillars together to build an everlasting and vibrant community. With innovative techniques, open source projects can create deeper engagement and turn ordinary community members into super-contributors.

Speaker

Wang, Jasmine

Alluxio, Inc., Community Manager

2022-07-31

16:10-16:50

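Community Operations Practices at Apache APISIX

How to help a community extend its influence and build a virtuous-cycle ecosystem is a question everyone keeps thinking about. For Jing and Reese, who came to open source as newcomers and non-code contributors, what they could do for Apache APISIX was the biggest puzzle, and an unavoidable question, on their path. While helping with Apache APISIX community operations, Jing and Reese ran into plenty of pitfalls. For example: Where is the line between the open source community and commercialization? How do you handle conflicts of interest with third parties? What unfamiliar problems came up when organizing Summits and Meetups? How can joint events between open source communities benefit both sides? How long does free goodwill last, and what can you do once it runs out? In building the APISIX community ecosystem, how do you communicate and cooperate with other open source communities so that they are more receptive? In contrast to KPI-driven open source operations, here is our approach: keep operations staff from getting overly involved in the project's collaboration, and especially avoid "over-polishing". Every event is another chance to explore and move forward.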

Speaker

Jing Li

Apache APISIX Community, Apache APISIX Contributor / API7.ai User Growth Team

Reese Liu

Apache APISIX Community, Apache APISIX Contributor / API7.ai User Growth Team

2022-07-31

17:30-17:30

Tea break

2022-07-31

13:30 -17:30

Big Data B

2022-07-31

13:30-14:10

BIGTOP 3.0 with the upgraded Mpack: New era of BigData Distribution

Apache Bigtop is a top-level Apache project for infrastructure engineers and data scientists who want to build their own big data distribution stack from the ground up. Bigtop 3.0 is the first release based on Hadoop 3, and its supported Linux distributions, including Ubuntu, Debian, CentOS, and Fedora, have also been updated. The upgraded Mpack integrated in Bigtop 3.0 makes big data deployment and management simpler and more flexible with the click of a button. This introductory session provides an overview of Bigtop 3.0. The speaker will give an in-depth look at the upgraded Mpack architecture and will also talk about lessons learned so far, such as compatibility issues during migration and dependency conflicts in the ecosystem.

Speaker

Yuqi Gu（顾煜祺）

Arm China, Senior Software Engineer

2022-07-31

14:10-14:50

Large scale migration to Parquet in Uber

Parquet is the core file format in Uber's big data stack. It is a prerequisite for many key initiatives in Uber's data org, such as column-level encryption and column pruning. However, nearly 20,000 existing Hive tables still use other formats. It is inefficient and error-prone to migrate them using the traditional SELECT-INSERT method. We tackled the challenge with a three-pillar solution: - High-throughput rewriter - Mixed-format partition support in the Hive and Spark query engines - Reliable ETL pipeline conversion The solution can be a reference for anyone who needs to migrate massive numbers of Hive tables and data files to the Parquet format.

Speaker

Huicheng Song

Uber, Senior Software Engineer

2022-07-31

14:50-15:30

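Spark in Practice at Xiaomi

1. The technical evolution of Spark at Xiaomi 2. Spark Multiple Catalog in practice 3. Collecting Spark data lineage in practice 4. Building Spark observability metrics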

Speaker

王准

Xiaomi, R&D Engineer

2022-07-31

15:30-16:10

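How Linkis Provides Computation Governance for Diverse Big Data Compute and Storage Engines

Apache Linkis (incubating) builds a layer of computation middleware between upper-layer applications and underlying engines, providing computation governance capabilities such as connectivity, control, extension, and orchestration. In this talk, 邸帅 explains how Linkis makes it easier to connect diverse compute and storage engines (such as Python, Spark, and Flink) more efficiently and at lower cost, and how it addresses computation governance problems such as multi-tenant isolation, high concurrency, adapting new engines, and controlling high-risk operations. He will also show some of the exciting work under way in the Linkis community and the roadmap ahead.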

Speaker

邸帅

WeBank, Apache Linkis (incubating) PPMC; Head of Big Data Platform at WeBank; ALC Shenzhen member

2022-07-31

16:10-16:50

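Multi-Engine Virtual Columns Based on Apache Calcite

In data analytics, the same business metric may require different SQL for different engines. Whoever writes it has to deal with the syntax differences between engines, so the cost of learning and use is high. When a metric's definition changes, every team that uses the metric has to be notified to make the corresponding change, so the communication cost is high too. ByteDance designed and implemented multi-engine virtual columns based on Apache Calcite to solve this problem. ByteDance's deeply customized build of Apache Calcite is a project called ByteQuery. ByteQuery designs and implements ANSI SQL 2011 standard syntax on top of Calcite and provides a unified cross-engine OLAP query service. ByteQuery extends Calcite with DDL syntax for virtual columns, such as add/drop virtual columns, and for each OLAP engine it automatically rewrites the virtual column logic and connects to the engine, lowering the cost for users. With virtual columns, business teams only need to save frequently used metrics as virtual columns and can then use them like ordinary columns. This approach has the following advantages: - Virtual columns take no extra storage, which lowers storage cost - Users do not need to know the syntax differences between engines; ByteQuery adapts to multiple engines automatically, which lowers the cost of use - There is no need to notify the users of a metric, so metrics can be managed in one place, which lowers maintenance cost - Compared with a View, the experience is consistent with a physical Hive table - It can be applied to scenarios such as data encryption and masking; for example, a plaintext field can be given a high permission level while a masked virtual column is added with a low permission level, which allows flexible usage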
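To make the virtual column idea concrete, here is a toy Python sketch of the rewrite step: a metric is stored once as a named expression per engine, and a query that references the name is expanded before it is sent to that engine. This is only an illustration and not ByteQuery or Calcite code; the `pay_rate` metric, the table and column names, and the token-level regex substitution are assumptions made for the example (a real system would rewrite the parsed SQL tree rather than raw text).

```python
import re
from typing import Dict

# Hypothetical virtual column definitions: name -> engine -> expression.
VIRTUAL_COLUMNS: Dict[str, Dict[str, str]] = {
    "pay_rate": {
        "hive": "paid_orders / CAST(total_orders AS DOUBLE)",
        "clickhouse": "paid_orders / toFloat64(total_orders)",
    },
}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def rewrite(sql: str, engine: str) -> str:
    """Replace each virtual column reference with the engine's expression."""
    def expand(match):
        name = match.group(0)
        by_engine = VIRTUAL_COLUMNS.get(name)
        if by_engine is None:
            return name  # ordinary column or keyword: leave untouched
        return "(" + by_engine[engine] + ")"
    return IDENTIFIER.sub(expand, sql)


query = "SELECT shop_id, pay_rate FROM orders"
hive_sql = rewrite(query, "hive")
clickhouse_sql = rewrite(query, "clickhouse")
```

Changing the metric then means editing one definition; every query that names `pay_rate` picks up the new expression on its next rewrite, and nothing extra is stored in the tables.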

Speaker

谢佳君

ByteDance, Senior R&D Engineer

2022-07-31

16:50-17:30

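The Big Data Python Ecosystem at 传智教育: Practice and Reflections

Python's share keeps growing in big data and artificial intelligence. The Spark 3.0 release noted that Python is now one of the most widely used languages on Spark, and Python's concise, easy-to-learn syntax gives companies strong support for solving big data problems quickly. The big data Python ecosystem has two main goals: (1) expose big data computing power to Python users by providing Python APIs for big data components, such as PySpark and PyFlink, so that users familiar with Python can develop big data jobs; (2) make the Python ecosystem distributed on top of big data storage and compute, keeping the Python library APIs while using a big data engine underneath, as in TensorFlow On Flink, SparkTorch, and Flink-onnx-Pytorch. In this talk, we discuss 传智教育's best practices and reflections on the big data Python ecosystem for its real-time recommendation business. The project reads data from Pulsar in real time. It uses the Alink machine learning library to build offline features, train and update models with online state, and serve recall and ranking through the corresponding recommendation algorithms; because Alink has good Python support, we also chose PyFlink, with its unified stream and batch architecture, for data processing and statistical analysis to build a user profiling platform. We also discuss how the real-time recommendation module learns online from production feedback and adjusts models quickly, forming a closed loop. Finally, we discuss a complete recommendation system solution in the big data Python ecosystem that puts all of the above into practice.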

Speaker

张敬存

江苏传智教育科技有限公司, Senior Researcher

赵晨杰

江苏传智教育科技有限公司, Senior Researcher

2022-07-31

13:30 -18:10

Big Data A

2022-07-31

17:30-18:10

Apache Doris (incubating) Flink Connector 2.0

How Apache Doris (incubating) Flink Connector 2.0 is implemented, how it supports exactly-once semantics (EOS), and how it reduces Flink memory usage.

Speaker

杨勇强

SelectDB, VP of Product

2022-07-31

13:30-14:10

Disaster Recovery in Apache Ozone

Apache Ozone is a distributed, scalable, high-performance object store. It provides object store semantics (like Amazon S3), has a Hadoop-compatible filesystem layer implemented on top of it, and can handle billions of objects. Clusters of such large-scale distributed systems are prone to data accidents caused by hardware failure, human error, or natural disasters. This talk takes a deep dive into the solutions implemented in Apache Ozone, such as container replication, high availability, and the trash feature, and into how a user can replicate data across clusters. The talk also discusses the future roadmap and the enhancements under way to improve data replication across Ozone clusters.

Speaker

Sadanand Shenoy

Cloudera, Software Engineer II

Rakesh Radhakrishnan

Cloudera, Staff Software Engineer

2022-07-31

14:10-14:50

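An Easier-to-Use, More Powerful Big Data Analytics Platform: An Overview of the Kylin 5.0 Community Roadmap

In September 2021, the Apache Kylin community released Kylin 4.0, the new generation of Kylin, which replaces HBase with Parquet to simplify the architecture and separate storage from compute. Many community users have already upgraded and report a good experience. Meanwhile, the Kylin community is preparing the next generation of Kylin. On one hand, it optimizes Kylin's precomputation engine and model metadata, greatly increasing the flexibility of model design; on the other, it embraces current technology trends by redesigning and reimplementing Kylin's compute engine and deployment architecture around a native engine and microservices. In this talk, 俞霄翔, a PMC member of the Kylin community, gives a forward-looking introduction to and demo of next-generation Kylin. Agenda: 1. Introduction to Kylin 2. Roadmap for next-generation Kylin - Flexible metadata design - Native engine - Microservices and cloud native 3. Demo of the technical preview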

Speaker

俞霄翔

Kyligence, Senior R&D Engineer

2022-07-31

14:50-15:30

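Flink Table Store: Streaming Data Warehouse Architecture and Use Cases

Flink Table Store is unified stream and batch storage built for streaming data warehouses. It is used to create dynamic tables for stream and batch processing in Flink and supports real-time stream consumption and real-time OLAP queries. Flink Table Store has shipped its first preview release, which still lacks much work on ecosystem and stability. We have started developing the second release and hope it will be more production-ready. In this talk you will learn which areas we are strengthening to improve the reliability and ecosystem of the storage. I will also share the follow-on architecture, the Service version: how it achieves unified storage for the streaming warehouse and which scenarios it unlocks.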

Speaker

李劲松

Alibaba, Technical Expert

2022-07-31

15:30-16:10

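An Open Source Big Data Studio: DolphinScheduler + Notebook

For big data engineers, developing and scheduling jobs usually happen in different environments. A job is developed and debugged in an IDE, and the code is then copied and pasted or packaged into a scheduling tool. This hurts development efficiency, and differences between environments can cause hard-to-predict problems at scheduling time. This talk introduces and demonstrates how to combine the open source scheduler Apache DolphinScheduler with two notebooks, Apache Zeppelin and Jupyter, into a big data development studio. Once the data platform team has adapted the environments, big data and AI engineers can develop and debug interactively online and schedule with one click, without spending time on problems caused by inconsistent environments, which greatly improves the efficiency and experience of developing big data jobs. The integration code between the components covered in the talk is fully open source; you are welcome to download it and try it out.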

Speaker

高楚枫

Alibaba Cloud EMR Data Development Team, Platform Development Engineer

2022-07-31

16:10-16:50

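Huya's Graph-Based Application Metadata Platform in Practice

The company's existing platforms have several pain points: each platform holds only basic, siloed metadata with no relationships established between them; the relationships between metadata are not effectively analyzed or used; and there is no personalized, per-user tagging capability, so metadata such as applications, resources, and domain names cannot be tagged flexibly. The company urgently needed a metadata platform that could extract and analyze all kinds of relationship data. Against this background, we chose HugeGraph, the graph database open sourced by Baidu, integrated it deeply, and extended it to create the Huya application metadata platform. The platform builds a dynamic relationship network of all Huya applications and resources and, on top of that, combines AI and big data for intelligent search and analysis. Horizontally it links the metering relationships between applications and resources; vertically it provides intelligent analysis such as application architecture soundness and application tags, all in a one-stop visual platform for metadata management and analysis. This talk focuses on how the HugeGraph graph database is used in Huya's application metadata project, with case studies. Topics include the background and approach for application metadata, the technology selection for the platform, the extension and deep integration of the graph database, and business practice with the graph database in the project.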

Speaker

邹磊

Huya Live, Senior Engineer

2022-07-31

16:50-17:30

Tales at Scale: Analytics at 1000 QPS and Beyond

How do you build and operate systems that can ingest millions of events per second, store petabytes of historical data, and run thousands of queries per second, all at subsecond response times? It’s not easy, but it has been accomplished using the right mix of compute-storage design, scatter/gather query engines, and cluster management. Gian Merlino, Apache Druid® committer and co-founder of Imply will share tales of scale, showing how high-performance systems for interactive data conversations with high concurrency and low latency combining stream and batch data are built and used today. - How to build and operate systems that can ingest millions of events, store petabytes of historical data, and run thousands of queries - What’s the right mix of compute-storage design, scatter/gather query engines, and cluster management - How Apache Druid delivers high-performance for interactive data conversations with high concurrency and low latency, for both streaming and batch data

Speaker

Merlino, Gian

Imply, Co-Founder and Chief Technology Officer

2022-07-31

14:00 -17:30

API/Microservice

2022-07-31

14:10-14:50

Foreseeing API development trends from the cloud-native gateway Apache APISIX

An API gateway provides synchronized, unified management of the API service capabilities exposed by all microservices. The gateway can then implement general requirements for APIs such as Load Balancing, Dynamic Upstream, Canary Release, Circuit Breaking, Authentication, and Observability. As a technology product born with cloud-native features, it is closely related to APIs but a little different. By analyzing these two concepts, we may be able to foresee future API development trends.

Speaker

Jiang Chenwei

Apache APISIX, Server-side Development Engineer

2022-07-31

15:30-16:10

Starting Apache APISIX on the right foot

It has been said that data are the new oil. The de facto standard is HTTP APIs to connect oil pipelines between heterogeneous systems, whether plain, RESTful or REST. But no company just exposes their information system to the outside. You need a central unified entry-point to handle cross-cutting concerns: classic ones like authentication and IP blacklisting and API-related ones like rate limiting and canary release. In this workshop, we will use Apache APISIX on Docker to show a couple of nifty features that can help your information system cope with the challenges introduced by APIs: - Routing your calls to the correct upstream - Available abstractions: Route, Upstream, Service - The Apache APISIX dashboard - Configuring APISIX with the dashboard - Configuring APISIX with the command-line - Monitoring APISIX - Low-code Plugin Orchestration

Speaker

Umurzokov, Bobur

Apache APISIX, Developer Advocate

Das, Ayush

Apache APISIX, Contributor to Apache APISIX

2022-07-31

16:10-16:50

Protecting your APIs with Apache APISIX

Protecting your APIs is no easy task. As the number of your APIs increases, so does the overhead that comes with adding security to each of your endpoints. A solution is to set up an API gateway like Apache APISIX, which accepts all the requests to your APIs and in turn calls the various APIs required to fulfil the request and return the appropriate response. APISIX can also protect your APIs by being the central system that handles authentication, authorization, rate limiting, logging, and monitoring. In this talk, Navendu introduces Apache APISIX and how development teams can leverage it to easily build secure APIs.

Speaker

Navendu, Pottekkat

API7.ai, Developer Advocate

2022-07-31

16:50-17:30

Chopping the monolith

Micro services are ubiquitous. However, most companies that implement micro services do not reap their full benefits - at best. At worst, it’s an epic failure. There are reasons for micro services: independent deployment of business capabilities. However, the unspoken assumption is that you need to deploy all capabilities all the time. My experience has shown me that it’s plain wrong. Some capabilities need frequent deployment, while some are much more stable. In “the past”, we used Rule Engines to allow updating business rules without deployment. While it solved the problem, this approach had issues. Between introducing a Rule Engine and migrating your complete system to micro services, I believe that there’s a middle path, and that this path is Function-as-a-Service. In this talk, I’ll detail every point I’ve made above, and show how one can use Serverless to pragmatically design a system that allows deploying as often as you need.

Speaker

Fränkel Nicolas

Apache APISIX, Head of Developer Advocacy

2022-07-31

14:00 -16:00

WebServer/Tomcat

2022-07-31

14:00-14:40

Jakarta EE

Apache Tomcat implements the Jakarta Servlet, Jakarta Pages, Jakarta Expression Language, Jakarta WebSocket and Jakarta Authentication specifications. Jakarta EE 8 and Jakarta EE 9 were transition releases that did not provide new features to end users. The recently released Jakarta EE 10 is the first release to include new features and other changes. This session will look at the changes in Jakarta EE 10 for the specifications that Apache Tomcat implements and what these changes mean for developers looking to deploy their applications on Apache Tomcat 10.1. Planning for Jakarta EE 11 is now starting, so the second part of the session will look at the changes planned for Jakarta EE 11 and provide an opportunity for attendees to provide input into that planning process.

Speaker

Thomas, Mark

VMware, Staff Engineer

2022-07-31

14:40-15:20

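Dubbo Mesh Based on dubbo-go-pixiu

This talk shares Dubbo-go's exploration of proxyless service mesh. Similar to the sidecar model, the proxyless approach builds xDS protocol support into the dubbo-go framework so that it interacts directly with the Istiod control plane, providing service registration and discovery, traffic governance, and more in an Istio mesh. The proxyless mesh approach has clear advantages where resource efficiency in large service meshes or proxy-layer latency for time-sensitive applications is a concern. For external traffic, Dubbo-go-pixiu acts as a Dubbo gateway that converts HTTP to Dubbo; once connected to Istio it can serve as an Ingress gateway, making Dubbo services in the mesh reachable from outside.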

Speaker

麻志辉

dubbo-go-pixiu Community, dubbo-go-pixiu Committer

2022-07-31

15:20-16:00

How we use and optimize Tomcat at Alibaba

Alibaba runs one of the largest numbers of applications that use Tomcat as a web server. In this talk, I will share how we use Tomcat in a production-ready environment and how Alibaba optimizes Tomcat for additional security and performance.

Speaker

Huxing Zhang

Alibaba Cloud, Staff Engineer

2022-07-31

13:30 -18:10

Messaging A

2022-07-31

14:10-14:50

Make Apache Pulsar as Lakehouse: Introduction Lakehouse Tiered Storage Integration for Pulsar

Apache Pulsar is a message bus that caches data and decouples different systems. To support long-term storage of topic data, we introduced tiered storage to offload cold data to backends such as GCS, S3, and HDFS. However, the offloaded data is currently organized by Pulsar in a raw, non-open format, so only Pulsar can access it. This makes it hard to integrate with other big data components such as Presto, Flink SQL, and Spark SQL. To solve these problems, we introduced a Lakehouse storage library to manage offloaded data and integrated it with the existing cold data offload mechanism for topics. With the Lakehouse storage library, we can use all the features that Lakehouse provides, such as transaction support, schema enforcement, governance, and BI support. For streaming reads, we read data from BookKeeper or tiered storage depending on where the data is located. Because Lakehouse uses an open storage format, every ecosystem that Lakehouse supports can read the data. To support streaming offload and make the offloading mechanism more scalable, we introduce an offload-by-reader mechanism that reads data from the topic and writes it into tiered storage. Moreover, we can also provide a compaction service backed by the offloader and treat the topic as a table: each update operation for a key is transformed into an upsert action on the table.
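The last point, treating a topic as a table where each keyed update becomes an upsert, can be sketched in a few lines of Python. This is an illustration of the idea only, not the Pulsar offloader or any Lakehouse library; the message shape and the convention that a `None` value deletes the row are assumptions made for the example.

```python
from typing import Dict, Iterable, Optional, Tuple

# (key, value); a value of None is treated here as a delete marker (assumption).
Message = Tuple[str, Optional[str]]


def compact(messages: Iterable[Message]) -> Dict[str, str]:
    """Fold a keyed message stream into a table holding the latest value per key."""
    table: Dict[str, str] = {}
    for key, value in messages:
        if value is None:
            table.pop(key, None)   # delete marker: remove the row if present
        else:
            table[key] = value     # upsert: insert a new row or overwrite it
    return table


topic = [
    ("user-1", "alice"),
    ("user-2", "bob"),
    ("user-1", "alice-v2"),   # later update for the same key wins
    ("user-2", None),         # user-2 is removed
]
table = compact(topic)
```

Replaying the topic in order always yields the same table, which is what lets a table format sit behind the topic without losing the stream's semantics.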

Speaker

陈航

StreamNative, Engineer

2022-07-31

13:30-14:10

How Tencent Applies Apache Pulsar to Apache InLong

1. What is InLong? 2. How is Pulsar applied in InLong? 3. What optimizations have been made for Pulsar? 4. A practical case

Speaker

LinChen

Tencent, Big Data Development Engineer

2022-07-31

14:50-15:30

Another Way for Integrating Pulsar and Lakehouse: Connector for Sinking Pulsar Topic Data into the Lakehouse Storage

Pulsar mainly focuses on stream processing, but streaming data is also used for analysis and other purposes. Because different systems handle different cases, data storage becomes a problem in many situations. This talk introduces the lakehouse connector, which sinks Pulsar topic data into lakehouse storage to unify storage.

Speaker

Yong Zhang

StreamNative, Software Engineer

2022-07-31

15:30-16:10

Build one Real-time Data Warehouse Based on Apache Pulsar

This talk introduces Apache Pulsar's architecture that separates compute from storage, its tiered storage mode, and ideas for building a real-time data warehouse.

Speaker

Dezhi Liu

StreamNative, Solution Expert

2022-07-31

16:10-16:50

Apache Pulsar's Authentication and Authorization Practices for Clusters and Cloud

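Data security has become a key competitive strength for enterprises. Apache Pulsar supports a large number of authentication and authorization plugins, and different companies and organizations also develop their own to meet their needs. What front-line practices are available for clusters and cloud environments? This talk looks at Apache Pulsar's authentication and authorization implementation from two angles, clusters and cloud: how an enterprise can build a secure Apache Pulsar cluster and create authentication or authorization plugins that meet its specific requirements, and how to offer users a unified authentication and authorization framework in the cloud that meets different needs and provides a good experience.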

Speaker

傅腾

StreamNative, Technical Support Engineer

2022-07-31

16:50-17:30

Integrating Apache Pulsar with BigQuery to Build Data Pipeline

This talk introduces BigQuery's features and use cases and shows how to use the Pulsar BigQuery Connector to write data from Pulsar into BigQuery in real time for analysis.

Speaker

石宝迪

StreamNative, Platform Engineer

2022-07-31

17:30-18:10

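Apache Pulsar in open-telemetry-collector

The OpenTelemetry ecosystem is now fairly mature among observability solutions and has built up a sizable user base. The community has integrated Apache Pulsar into the OpenTelemetry ecosystem as part of the underlying support for observability. In real observability scenarios, the biggest threat to data completeness is often the gap between the very high rate at which collectors produce data and the relatively slow rate at which it is consumed. This talk shares how to use the Apache Pulsar receiver/exporter in open-telemetry-collector as a buffer to solve this problem.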

Speaker

Tao Jiuming

StreamNative, Software Engineer

2022-07-31

14:00 -18:00

RPC

This track focuses on high-performance, easily extensible RPC frameworks and case studies of their use in practice.

2022-07-31

14:00-14:40

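The Transition from the Dubbo Protocol to the Dubbo3 (Triple) Protocol

Apache Dubbo 3.x is now the main maintained line of Dubbo, and it introduces Dubbo's next-generation RPC protocol, Triple. Triple fills the gaps in the Dubbo protocol and moves Dubbo toward its vision of being cloud native and cross-platform. This talk briefly reviews the Dubbo protocol, describes the obstacles Dubbo faced in moving toward that vision and how Triple removes them, explains Triple's advantages and how far the protocol has come, and covers best practices for migrating from the Dubbo protocol to Triple as well as the future of Triple.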

Speaker

华钟明

Hangzhou Youzan Technology Co., Ltd., Middleware Technical Expert; Apache Dubbo PMC; Dapr member; Apache Tomcat Contributor

2022-07-31

14:40-15:20

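Dubbo 3.0 Application-Level Service Discovery in Depth

With the arrival of the cloud-native era, infrastructure capabilities keep moving up the stack, and platforms such as Kubernetes have built-in abstractions for microservices. The interface-level service discovery model in Dubbo 2.x is entirely different from the application-level model that these platforms use, so Dubbo users cannot plug natively into governance systems such as Kubernetes. The application-level service discovery model in Dubbo 3.0 was created to fit this architecture: once an application migrates to it, the application can naturally interoperate with the service discovery models of other ecosystems. This talk breaks down the design ideas and principles behind application-level service discovery in Dubbo 3.0.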
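The difference between the two discovery models can be shown with a toy registry in Python. This sketch only contrasts how the registry data is keyed; it does not reflect Dubbo's actual registry format, and the application name, interface names, and addresses are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical deployment: two instances of one application, each exposing two interfaces.
INSTANCES = [
    {"app": "order-app", "address": "10.0.0.1:20880",
     "interfaces": ["OrderService", "RefundService"]},
    {"app": "order-app", "address": "10.0.0.2:20880",
     "interfaces": ["OrderService", "RefundService"]},
]


def interface_level(instances) -> Dict[str, List[str]]:
    """Registry keyed by interface: one record per (interface, instance) pair."""
    registry = defaultdict(list)
    for inst in instances:
        for iface in inst["interfaces"]:
            registry[iface].append(inst["address"])
    return dict(registry)


def application_level(instances) -> Dict[str, List[str]]:
    """Registry keyed by application: one record per instance."""
    registry = defaultdict(list)
    for inst in instances:
        registry[inst["app"]].append(inst["address"])
    return dict(registry)


by_interface = interface_level(INSTANCES)
by_application = application_level(INSTANCES)
interface_records = sum(len(v) for v in by_interface.values())
application_records = sum(len(v) for v in by_application.values())
```

With the application as the key, the number of registry records grows with the number of instances rather than instances times interfaces, and the key lines up with what platforms such as Kubernetes already track. A consumer still needs some mapping from interface to application, which this sketch omits.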

Speaker

江河清

Alibaba Cloud, Java R&D Engineer

2022-07-31

15:20-16:00

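Dubbo3 in Practice at Alibaba Local Services

This talk describes the deployment architecture of Alibaba Local Services and the problems it ran into, how Dubbo 3.0 solved those pain points, and how Dubbo 3.0 was finally rolled out in Local Services.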

Speaker

孙刚

Alibaba Local Services, R&D Engineer

2022-07-31

16:00-16:40

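Best Practices for Dubbo3 in DingTalk Docs' Move to Cloud Native

Dubbo is a microservice development framework that provides two key capabilities: RPC communication and microservice governance. Dubbo3 evolved from Dubbo2; while keeping the original core features, it brings comprehensive upgrades in ease of use, very-large-scale microservice practice, and adaptation to cloud-native infrastructure. DingTalk Docs is DingTalk's self-developed online collaborative document suite. As the group's move to the cloud progressed, DingTalk Docs did a great deal of cloud-native work. In this talk, 董建凯 shows how Dubbo3 helped solve problems met along the way, such as delivering one codebase to multiple environments, connecting on-cloud and off-cloud deployments, and unitized routing.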

Speaker

董建凯

DingTalk Docs, Senior R&D Engineer

2022-07-31

16:40-17:20

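dubbo-go Configuration Design

Dubbo Go has been adopted in practice at many companies and is an important part of Dubbo's multi-language ecosystem. This talk explains the design principles of Dubbo Go configuration and how to use it. It helps new users get started with Dubbo Go development quickly, covering local file configuration, remotely hosted configuration, and so on; it also explains the configuration loading process and how it works in depth, to help developers understand the design philosophy behind it.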

Speaker

赵云兴

dubbo-go, Software Engineer

2022-07-31

17:20-18:00

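Adopting Dubbo-go in Production: Rollout and Practice

Many users in China build microservices with Dubbo and Spring Cloud. As businesses grow and cloud native takes deeper root, a multi-language microservice architecture inside the enterprise is inevitable. This talk describes how to use Dubbo-go to build low-cost governance for heterogeneous services that stays compatible with the existing service governance system.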

Speaker

王晓伟

Dubbo-go Community, Committer

2022-07-31

18:00-18:00

Tea break

Apache Success Stories in China

姜宁

Huawei; Director of the Apache Software Foundation (ASF), 2022

吴晟

Tetrate, Founding Engineer

刘天栋

KAIYUANSHE, Board Member

谭中意

Apache Software Foundation Member

江波

SegmentFault, Operating Partner and COO

Roundtable: Startups Built on Top-Level Projects

黄向东

Apache Member

温铭

API7.ai (深圳支流科技), Co-founder & CEO

代立冬

Analysys, Director of Big Data Platform

张亮

SphereEx, Founder and CEO

乔嘉林

IOT AND IIOT

Focusing on Open Source Foundational Software to Grow the Open Source Ecosystem Together

任旭东

Huawei, Chief Open Source Liaison Officer

InnerSource and the Apache Way: How to learn Open Source

Danese Cooper

Chair of the InnerSourceCommons.org

Danese Cooper has been an outspoken Free and Open Source Software activist for more than 20 years. Over that time she has consistently worked for the health and welfare of the FOSS movement at jobs such as CTO of Wikipedia, Chief Open Source Evangelist for Sun, Senior Director of Open Source Strategy for Intel, and Board Member with the Drupal Association, the Open Hardware Foundation and the Open Source Initiative. Seven years ago while running PayPal's OSPO, Danese started thinking, talking, and writing about InnerSource as the logical next step to sustain the FOSS movement. Today Danese is Chair of the InnerSourceCommons.org, a US 501(c)3 non-profit. She still consults on Open Source (and InnerSource) via DaneseWorks, Ltd and lives in Western Ireland.

What We All Need To Do Together To Secure The Open Source Software Supply Chain

Brian Behlendorf

Open Source Security Foundation, General Manager

Congratulations on the Opening of the ApacheCon Asia Online Conference

陆首群

Professor

AGE's Journey to Graduation

Abdisho, Eya

AgeDB, Technical Engineer

Gemignani, John

AgeDB, Lead Software Engineer

Innis, Josh

AgeDB, Senior Software Engineer

How does Apache Pegasus (incubating) community develop at SensorsData

YingChun, Lai

Sensors Data, Software Engineer

Dan, Wang

Sensors Data, Software Engineer

Why You Should Choose the Apache Incubator

陈梓立

The Past and Present of the Apache ShenYu Gateway

肖宇

JD Technology, Architect

The Practice of Apache DolphinScheduler as a unified scheduler center in Lenovo

Gang Li

Lenovo, Data Architecture Optimization Engineer

Apache Oozie in Depth: Practical Experience

张俊帆

iQIYI, Big Data Engineer

Extending Apache DolphinScheduler at China Unicom Big Data: Custom Development and Practice

刘武

China Unicom Digital Technology Co., Ltd., Data Development Engineer

Unified Stream and Batch Processing with Apache Flink at JD Logistics

康琪

JD.com, Technical Expert

Real-Time Data Integration at Xiaomi with Flink SQL

胡焕

Xiaomi Group, Senior Software Engineer

Optimizing Apache Flink Stability on Large-Scale Clusters

邱从贤

Tencent Technology Co., Ltd., Senior Development Engineer

An Apache Flink-Based Real-Time Data Stream Framework in JD Retail: Practice and Rollout

张颖

JD.com, Algorithm Engineer

闫莉刚

JD.com, Senior Technical Expert

Flink at Tencent Ads: Feature Production, Training Samples, and Strategy Computation

林立伟

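Tencent Technology (Beijing) Co., Ltd., Tech Lead for Feature Production, Sample Data, and Strategy Framework at Tencent Ads

Kingsoft Cloud's Apache Flink-Based Real-Time Computing Platform and Its Use in Epidemic Prevention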

郑舒力

Kingsoft Cloud, R&D Expert

Apache EventMesh: An Event-Driven Distributed Multi-Runtime

薛炜明

WeBank Co., Ltd., Middleware Platform Development Engineer

Apache ShardingSphere Scaling Explained

钟红胜

SphereEx, Database Middleware Development Engineer

Apache Kvrocks (Incubating): Design and Implementation

林添毅

AfterShip, Technical Manager

Apache Zookeeper and Apache Curator Meet the Dining Philosophers

Paul Brebner

Instaclustr, Chief Technology Evangelist

Embracing Cloud Native: Adapting ShardingSphere to the Cloud with Kubernetes

李卓

SphereEx, Cloud R&D Engineer

Building a Cloud-Native Data Flow Platform with Apache EventMesh

梁荣华

WeBank, Middleware Development Engineer

Tea break

The Architecture of DevLake, a Data Integration Platform for Engineering Productivity

陈映初

Merico, Software Development Engineer

Citizen Streaming Engineer - A How To

Spann, Timothy

StreamNative, Developer Advocate

Camel K goes Quarkus Native

Congiusti, Pasquale

Red Hat, Software Engineer

Integrating systems in the age of Quarkus, serverless and Kafka

Bendhiba Zineb

Red Hat, Senior Software Engineer

Up & Running: Low Code Cloud-Native Integrations

Yordán, Rachel

Red Hat, Principal Software Engineer

Building a real-time analytics dashboard with Apache Kafka, Apache Pinot, and Streamlit

Dhanushka, Dunith

StarTree, Developer Advocate

Wolok, Karin

StarTree, Head of Developer Community and Marketing

Apache Ozone: Multi-Protocol aware system handles both Files and Objects efficiently

Radhakrishnan, Rakesh

Cloudera Private Limited, Staff Software Engineer

Singh, Mukul Kumar

Cloudera Private Limited, Senior Engineering Manager

Building a Unified & Serverless Spark Gateway at eBay with Apache Kyuubi (Incubating)

王斐

eBay, Staff Engineer; Apache Kyuubi PPMC Member

Optimization and practice of Apache InLong in Tencent Cloud

Yunqing Mo

Tencent CloudBig data senior engineer

Cloud-Native Flink/Spark in Practice with Zeppelin

陶克路

ByteDance, Infrastructure R&D Engineer

王正

ByteDance, Engineer

A Lineage-Based Data Discovery Method for Offline Data Warehouses

韩帅

ByteDance, Senior R&D Engineer

孙科

ByteDance, Senior R&D Engineer

New Features in the Apache Doris 1.x High-Speed Release and the Roadmap for the Cloud-Native Era

杨政国

Baidu, Apache Doris PPMC Member, Senior R&D Engineer

An extension of Apache Atlas’ data model and an alternative open source user interface.

Wombacher, Andreas

Aurelius Enterprise B.V., CTO

Apache Druid cloud native architecture evolution

金嘉怡

Shopee, Expert Engineer

Scaling Open Source Big Data Cloud Applications is Easy/Hard

Paul Brebner

Instaclustr, Chief Technology Evangelist

Fine grained authorization to Cloud stores using Apache Ranger

Mukund

Cloudera, Staff Software Engineer

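Evolution of Huolala's Big Data Infrastructure

张伟伟

Shenzhen Yishi Huolala Technology Co., Ltd., Head of Big Data SRE

Building a Highly Available Online Messaging Platform at Xiaomi with RocketMQ

王帆

Xiaomi, Senior Software R&D Engineer

End-to-End Business Canary Releases Based on RocketMQ

黄展鹏

Zhengcaiyun Co., Ltd., Senior Architecture Expert

Converged Processing of RocketMQ Events and Data Streams in the Big Data Ecosystem

李伟

Tencent, Senior Development Engineer

A Comparison of RocketMQ and Kafka, and a Comparison of How One Painting Technique Is Applied in Chinese and Western Cultures

彭龙

Midea Group, Software Engineer

Smooth Large-Scale Migration from ActiveMQ to RocketMQ

高向阳

Beijing Zhuanzhuan Spirit Technology Co., Ltd., Senior R&D Engineer

Cloud-Native Practice and Application of RocketMQ Message Queues on China Mobile Cloud

胡宗棠

China Mobile Cloud Capability Center, Technical Expert

Accessibility Design in Apache ECharts

羡辙

Apache ECharts, PMC Chair

Introducing advanced drill-down functionality in Apache Superset using Apache ECharts

Brofeldt, Ville

Apple, Software Engineer

How we use ECharts in SkyWalking

Fan Qiuxia

Apache SkyWalking PMC, Apache Committer; Software Engineer, Tetrate.io

Exploring and Building a Low-Code Platform for Big Data Visualization

王海虎

Cloudwise (Beijing) Technology Co., Ltd., R&D Manager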

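When the Differential Mode of Association Meets Open Source Culture

姜宁

Huawei, Technical Expert

李圳虎

Peking University, Master's Student

How to Attract Developers from Asian Cultural Backgrounds to Open Source

张晋涛

API7.ai, Cloud-Native Technical Expert

How Does SegmentFault Build Community? In an Open Source Way

Bo Jiang

SegmentFault, KAIYUANSHE, CCF Open-Source Development Committee; COO of SegmentFault; Board Member of KAIYUANSHE; Executive Member of CCF Open-Source Development Committee

UPSTREAM FIRST: AN ETHNOGRAPHIC STUDY OF OPEN SOURCE SOFTWARE COMMUNITY

Zhou, Jesse (周禹任)

Peking University, Student

The Past and Future of a Friendly Relationship Between Open Source and Commercialization

适兕

Lead Creator of 「开源之道」

Apache Submarine: A Cloud-Native Machine Learning Platform

刘勋

DiDi, Senior Technical Expert

OpenMLDB: An Enterprise-Grade Feature Platform Built Upon Spark

LU MIAN

OpenMLDB Community; 4Paradigm; OpenMLDB PMC core member; Tech lead of HPC and database teams in 4Paradigm

Pegasus and Flink in Xiaomi's Machine Learning Platform

黄飞

Xiaomi, Head of the Commercial Platform Technology Department, Internet Business Department

Catching the P99 Tail: Performance Tuning for Machine Learning Inference

兰青

Amazon Web Services, Software Development Engineer

Tea Break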

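The Evolution of Apache Pulsar-Based Message Queues at Huawei Device

林琳

Huawei, SDE Expert

王小童

Huawei, Senior Engineer

Exploring and Applying Apache Pulsar at vivo

全利民

vivo Mobile Communication (Shenzhen) Co., Ltd., Big Data Engineer

Optimizing and Running KoP at Sina Weibo

沈文兵

Sina Weibo, Data Platform Development Engineer

Apache Pulsar Stability Optimization at Tencent Cloud

冉小龙

Tencent Cloud, Senior R&D Engineer

Performance Optimization of Pulsar at BIGO for High-Throughput Catch-Up Reads

吴展鹏

BIGO, Staff Engineer

Design and Implementation of a Log-Processing DSL Based on Pulsar Functions

王嘉凌

China Mobile Cloud Capability Center, Software Development Engineer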

Apache SkyWalking: An open source holistic application performance monitoring and observability tool

Navarro Sonnenfeld, Marc

Tetrate, Software Engineer

Apache SkyWalking with Native eBPF Agent

Han, Liu

Tetrate, Engineer

Approaching Robust Anomaly Alerting Capabilities at Apache SkyWalking with AIOps

Chen Yihao

Queen's University, Master's Student, Apache SkyWalking Committer

Observability Solution for Apache HTTP Server

Nagariya, Ajay

Cisco, Director of Engineering at AppDynamics (Cisco)

Das, Debajit

Cisco, Senior Software Engineer at AppDynamics (Cisco)

Pratyush, Kumar

Cisco, Software Engineer at AppDynamics (Cisco)

How SkyWalking Uses BanyanDB

Hongtao Gao

tetrate.io, Founding Engineer

Apache SkyWalking MAL practice -- VMs and Kubernetes Monitoring

Kai Wan

Tetrate, Engineer

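How China Can Develop a Global Open Source Governance Strategy

王伟

East China Normal University, Researcher

王永雷

DevSecOps Professional

边思康

Enterprise OSPO Professional

赵生宇

PhD Student

Apache Doris: The Road from Open Source Project to Commercial Product

连林江

SelectDB, CEO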

OpenChain Project Introduction

Shane Coughlan

expert

Brand Management - An Introduction

Mark Thomas

member of the ASF

Practical Steps to Encourage Community Growth

Rich Bowen

member of the Apache Software Foundation

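DolphinScheduler in T3 Chuxing's One-Stop Platform

李心恺

T3 Chuxing, Big Data Platform R&D Engineer

赵玉威

T3 Chuxing, Big Data Platform R&D Engineer

Apache DolphinScheduler with Kubernetes for Big Data Processing

Liu Dingzheng

Cisco, Leader, Engineering

OPPO's Data Processing Platform Based on Apache SeaTunnel

范未太

OPPO Mobile Telecommunications Co., Ltd., Senior Backend Development Engineer

The improvement and application of DolphinScheduler in BIGO

XU SHUAI

BIGO, Principal Engineer

The Story of a Post-95 Apache Member and the Foundation

琚致远

API7.ai, Head of Global

How to enable and foster open source collaboration with leading corporations listed on the stock market

Yacine Si Tayeb, PhD

SphereEx & Apache ShardingSphere, Head of International Operations

How to Collaborate Remotely and Amicably with Open Source Developers

李广远

Shenzhen Bello Intelligent Technology Co., Ltd., Senior Front-End Development Engineer

openGauss Community Governance in Practice: From Open Source to Open Governance

梅相如

Huawei Technologies Co., Ltd., openGauss Community Operations Manager

Apache Doris Community Operations: Practice and Reflections

鲁志敬

Apache Doris Community, Head of Community Operations

Tea Break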

Apache Ozone behind Simulation and AI industries

Kota Uenishi

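Preferred Networks, Inc., Engineer

What's new in Apache Impala 4.x

黄权隆

Cloudera, Staff Software Engineer

HBase at Meituan: Improvements and Practice

哈晓琳

Beijing Sankuai Online Technology Co., Ltd., R&D Engineer

Support Customized Kubernetes Schedulers: Better Scheduling Capabilities for Spark on Kubernetes

姜逸坤

Huawei, Senior Software Engineer

Cloud Shuffle Service in ByteDance's Spark Workloads

魏中佳

ByteDance, Infrastructure Big Data Development Engineer

Interactive data engineering workload execution using Livy session on Kubernetes cluster

Chaturvedi, Anmol

Informatica Corporation, Director of Engineering

End-to-End Canary Releases in Apache ShenYu with OpenSergo

鲁严波

Alibaba, Senior Development Engineer

The Road to API Gateway Practice

展留坤

API7.ai, Product Manager

APISIX Runtime Debugging

ZhengSong Tu

Apache APISIX, Software Engineer

Building a K8s Gateway with Apache APISIX

曾强

Wuhan Mucang Technology Co., Ltd., DevOps Engineer

Cloud-Native Gateway Practice at Essence Securities with Apache APISIX

卢勇辉

Essence Securities Co., Ltd., Software Engineer

Why Modern Microservice Architectures Need a Cloud-Native API Gateway

王院生

API7.ai, Co-founder & CTO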

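Efficiently Building a Real-Time Data Lake with Flink CDC and Hudi

徐榜江

Alibaba Cloud, Senior R&D Engineer

Developing Unified Stream and Batch Applications on Apache Flink with Apache Pulsar

盛宇帆

StreamNative, Flink Development Engineer

Building a Streaming Incremental Data Warehouse on Data Lake Formats: CDC

毕岩

Alibaba Cloud Intelligence, Computing Platform Business Unit, Open Source Big Data Platform Technical Expert

Introducing the Flink SQL Connector for Pulsar and PulsarCatalog

Yufei Zhang

StreamNative, Software Engineer

Making Flink on K8s Work as You Wish

赵波

Huawei

Use Apache Pulsar Functions in a Cloud-Native way

Rui Fu

StreamNative, Software Engineer

行家 Device Cloud: AIoT for the Digital Transformation of Industrial Services

李知周

ZKH Industrial Supply, IoT Expert

Apache IoTDB: A Time-Series Database for the Industrial IoT

张金瑞

Timecho Technology (Beijing) Co., Ltd., Software Development Engineer

Apache JMeter in IoT Testing

殷翀元

EMQ, Project Manager

Apache IoTDB at Huawei Cloud

王超

Huawei Cloud Computing Technologies Co., Ltd., Apache IoTDB Committer, Head of R&D for the Huawei Cloud MRS Time-Series Database

IoTDB in Alibaba Cloud's Intelligent Manufacturing Business

巩宁

Alibaba, Senior Technical Expert

MQTT-on-Pulsar: How MoP Supports MQTT 5 on Pulsar

Zhao, Qiang

StreamNative, Software Engineer

IoTDB-Workbench: Low-Cost Support for Industrial Real-Time Databases

郑强

Chongqing CISDI Information Technology Co., Ltd., Engineer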

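Apache IoTDB UDFs: Fundamentals and Practice in Industrial Data Quality

苏宇荣

School of Software, Tsinghua University, Master's Student

贺文迪

School of Software, Tsinghua University, Master's Student

Apache ShardingSphere in Practice at Ximalaya

彭荣新

Shanghai Ximalaya Technology Co., Ltd., Architect

沈辉

Shanghai Ximalaya Technology Co., Ltd., Senior Engineer

How Apache EventMesh Solves Integration Standardization for Composable SaaS Applications

罗锦荣 (Alex Luo)

Huawei Technologies Co., Ltd., Principal Software Engineer

Kafka High-Availability Architecture in Practice

刘俊洋

Huawei Cloud Computing Technologies Co., Ltd., Tech Lead for Message Middleware

Message-Based Distributed Transactions

余洲

Huawei Cloud Computing Technologies Co., Ltd., Senior Engineer

How Cloud-Native Databases Reach Extreme Scalability: The Amazon Aurora Approach

马丽丽

Amazon Web Services, Database Specialist Architect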

SpamAssassin 4.0: new features to detect new spam types

Bechis Giovanni

SNB S.r.l., CEO

Creating cross-platform, reproducible, binary builds for Java projects

Thomas, Mark

VMware, Staff Engineer

Capturing per thread statistics for a job - Thread-level IOStatistics - HADOOP-17461

Singh, Mehakmeet

Cloudera, Software Engineer II

Apache Ozone: Recent Progress and Practice

Yan Liu

Cloudera, Senior Sales Engineer

陈怡

Cloudera, Principal Engineer

ByteDance's Table Optimization and Management Service for Data Lakes Based on Apache Hudi

喻兆靖

ByteDance, Senior Development Engineer

How to Simplify Data Synchronization with Apache SeaTunnel

陶克路

ByteDance, Infrastructure R&D Engineer

Hadoop Vectored IO: your data just got faster!

Thakur, Mukund

Cloudera, Staff Software Engineer

New Features in Apache Hive 4.0

Yan Liu

Cloudera, Solution Engineering

FLiPN Awesome Streaming with Open Source

Spann, Timothy

StreamNative, Developer Advocate

Introducing TableView: Pulsar's database table abstraction

Kjerrumgaard, David

StreamNative, Developer Advocate

Get rid of topic metadata limitation for infinite data retention of Apache Pulsar

Penghui Li

StreamNative, Tech lead & Apache Pulsar PMC member/committer

How Kafka-on-Pulsar Integrates with Pulsar Schema to Leverage Pulsar for Kafka Users

Yunze Xu

StreamNative, Software Engineer

Deep Dive into Message Chunking in Pulsar

Zike Yang

StreamNative, Software Engineer

Deep Dive into Apache Pulsar: How Two-Phase Deletion Protocol works between Storage and Metadata

赵延

StreamNative, Software Engineer

State of the Cat

Thomas, Mark

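VMware, Staff Engineer

Graceful Shutdown in Tomcat: Design and Practical Use of Related Features

饶子昊

Alibaba Cloud, Alibaba Cloud Intelligence R&D Engineer

Extending Valves in Tomcat

Jacob, Dennis

VISA, Senior Consultant

Tomcat in Practice at Kuaishou

宋洋

Kuaishou (Beijing Dajia Internet Technology Co., Ltd.), Backend R&D Engineer

Open Source Software and Software Licenses

Rui Chen

Meetup, Senior Staff Software Engineer

「开源之史」: Introduction and Reflections

适兕

Author of 「开源之道」

Two Hundred Days of Getting to Know Open Source

展留坤

API7.ai, Product Manager

When Open Source and Openness Converge: The Core Value of Open and Its Public Recognition

高丰

Open Data China, Executive Director

The art of technical writing

Sacks, Matthew

StoneFish Defense Inc., CEO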

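bRPC: The Evolution of an RPC Framework in the Cloud-Native Era

王伟冰

Baidu, Senior R&D Engineer

bRPC High-Performance Best Practices

周末

Baidu Times Network Technology (Beijing) Co., Ltd., Senior R&D Engineer

Building a High-Performance Service Mesh Data Plane with bRPC

刘帅

Baidu, Senior R&D Engineer

Smooth Evolution from Dubbo and Spring Cloud to Service Mesh in Practice

曾宇星

Alibaba Cloud, Technical Expert and Cloud-Native Architect

Service Mesh in Practice at Xiaomi

张志勇

Xiaomi, Senior Development Engineer

How the Apache ShenYu Gateway Proxies Dubbo Services

刘良

Apache ShenYu Community, Apache ShenYu PPMC

Tea Break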

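RocketMQ Streams: Best Practices for Lightweight Stream Computing in Cloud Security and Edge Computing

袁小栋

Alibaba Cloud, Senior Technical Expert

Message Canary Release Scheme on the Luban RocketMQ Platform

区二立

vivo, Technical Director

Building a New Data Flow Processing Platform with RocketMQ Connect

周波

Alibaba, Senior Development Engineer

Apache RocketMQ 5.0: Evolution of a Highly Available Architecture Converging Messages, Events, and Streams

金融通

Alibaba, Apache RocketMQ PMC Member/Committer, Senior R&D Engineer at Alibaba

Message-Driven Lightweight Computing in Logistics

王鑫

Cainiao Network, Senior Technical Expert and Head of Export Logistics Data Intelligence; Apache Storm, RocketMQ, and IoTDB Committer

Apache RocketMQ 5.0 in Stream Storage: Practice and Exploration

刘振东

Apache RocketMQ Community, Apache RocketMQ PMC Member/Committer

PAI-ODL: Real-Time Deep Learning Training

刘童璇

Alibaba Cloud Intelligence, Computing Platform Business Unit, PAI Senior Technical Expert

Spark + ONNX + CANN: How to Improve the Performance and Experience of Distributed Inference?

王玺源

Huawei, Senior Software Engineer

姜逸坤

Huawei, Senior Software Engineer

黄之鹏

Huawei, Director of Ascend Open Source Ecosystem

Flink ML: Real-Time Machine Learning on Apache Flink

高赟

Alibaba, Technical Expert

张智鹏

Alibaba, Senior Algorithm Engineer

BladeDISC: A Deep Learning Compiler with Dynamic Shape Support in Practice

邱侠斐

Alibaba Cloud Computing Co., Ltd., Senior Technical Expert

Tea Break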

POWER TUNA

刘一芃

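TUNA, President

New Open Source Forces from Generation Z

李梦

Institute of Software, Chinese Academy of Sciences, Operations

陈意昊

Student

谢其骏

Student

张俊杰

Student

黄章衡

Student

Apache Doris Real-Time Data Warehouse: History and Technical Deep Dive

张东进

Baidu, Architect

Promoting Education for Public Good and Youth Development in Open Communities

刘于瑜

freeCodeCamp.org, Chinese Community Ambassador

The Open Source Community Is the Best University of Life

夏小雅

PhD

Understanding Open Source Through Data

赵生宇

PhD Student

Learning the Apache Way through the ASF Incubator

Justin Mclean

Instaclustr, Vice President

How to Let More People in a Bank Benefit from Data Processing

陈卫

Sichuan XW Bank, Big Data Architect

Apache Linkis Data Processing in Practice

李孟

Shanghai Xianweng Technology, Data Architecture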

When failure is not an option.

Kjerrumgaard, David

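StreamNative, Developer Advocate

Building an Ultra-Fast Pulsar with Optane Persistent Memory

胡风华

Intel Corporation, Optane Products Group, Cloud Software Architect

How Key_Shared subscription works

郭吉伟

StreamNative, Software Engineer

Deep Dive into Apache Pulsar Transaction - How it works and notes for it.

孟祥迎

StreamNative, Platform Engineer

How the Load Manager Works in Apache Pulsar, with Practice

王锴

StreamNative, Software Engineer

China Mobile Cloud's RabbitMQ Message Queue in Ultra-Large-Scale OpenStack Deployments

胡宗棠

China Mobile Cloud Capability Center, Technical Expert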

From Zero to 1.5k Stars: Launching a New Open Source Project

Wheatley, Maxim

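Merico, Head of Business and Marketing

From Ops to Tech Writer: Understanding Why Documentation Matters to Open Source Software

韩飞

Apache APISIX, Technical Writer

Offense and Defense in Open Source Website Information Architecture

Yu Liu

StreamNative, Information Architect

Building Super-Contributors in Alluxio Open Source Community

Wang, Jasmine

Alluxio, Inc., Community Manager

Apache APISIX Community Operations in Practice

Jing Li

Apache APISIX Community, Apache APISIX Contributor / API7.ai User Growth Team

Reese Liu

Apache APISIX Community, Apache APISIX Contributor / API7.ai User Growth Team

A member of the API7.ai user growth team who studied public administration but has focused on executing every event well since student days, and who has contributed event plans to Apache APISIX, including Apache APISIX Summit Asia 2022.

Tea Break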

BIGTOP 3.0 with the upgraded Mpack: New era of BigData Distribution

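Yuqi Gu (顾煜祺)

Arm China, Senior Software Engineer

Large scale migration to Parquet in Uber

Huicheng Song

Uber, Senior Software Engineer

Spark in Practice at Xiaomi

王准

Xiaomi, R&D Engineer

How Linkis Provides Computation Governance for Diverse Big Data Compute and Storage Engines

邸帅

WeBank, Apache Linkis (Incubating) PPMC, Head of Big Data Platform at WeBank, ALC Shenzhen Member

Multi-Engine Virtual Column Technology Based on Apache Calcite

谢佳君

ByteDance, Senior R&D Engineer

The Big Data Python Ecosystem at Chuanzhi Education: Practice and Reflections

张敬存

Jiangsu Chuanzhi Education Technology Co., Ltd., Senior Researcher

赵晨杰

Jiangsu Chuanzhi Education Technology Co., Ltd., Senior Researcher

Apache Doris (Incubating) Flink Connector 2.0

杨勇强

SelectDB, VP of Product

Disaster Recovery in Apache Ozone

Sadanand Shenoy

Cloudera, Software Engineer II

Rakesh Radhakrishnan

Cloudera, Staff Software Engineer

An Easier-to-Use, More Powerful Big Data Analytics Platform: A Look at the Kylin 5.0 Community Roadmap

俞霄翔

Kyligence, Senior R&D Engineer

Flink Table Store: Streaming Data Warehouse Architecture and Use Cases

李劲松

Alibaba, Technical Expert

Open Source Big Data Studio: DolphinScheduler + Notebook

高楚枫

Alibaba Cloud EMR Data Development Team, Foundation Platform Development Engineer

Huya's Graph-Based Application Metadata Platform in Practice

邹磊

Huya Live Information Technology Co., Ltd., Senior Engineer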

Tales at Scale: Analytics at 1000 QPS and Beyond

Merlino, Gian

Imply, Co-Founder and Chief Technology Officer

Foreseeing API development trends from the cloud-native gateway Apache APISIX

Jiang Chenwei

Apache APISIX, Server-side Development Engineer

Starting Apache APISIX on the right foot

Umurzokov, Bobur

Apache APISIX, Developer Advocate

Das, Ayush

Apache APISIX, Contributor to Apache APISIX

Protecting your APIs with Apache APISIX

Navendu, Pottekkat

API7.ai, Developer Advocate

Chopping the monolith

Fränkel Nicolas

Apache APISIX, Head of Developer Advocacy

Jakarta EE

Thomas, Mark

VMware, Staff Engineer

Dubbo Mesh Based on dubbo-go-pixiu

麻志辉

dubbo-go-pixiu Community, dubbo-go-pixiu Committer

How we use and optimize Tomcat at Alibaba

Huxing Zhang

Alibaba Cloud, Staff Engineer

Making Apache Pulsar a Lakehouse: Introducing Lakehouse Tiered Storage Integration for Pulsar

陈航

StreamNative, Engineer

How Tencent Applies Apache Pulsar to Apache InLong

LinChen

Tencent, Big Data Development Engineer

Another Way of Integrating Pulsar and Lakehouse: Connector for Sinking Pulsar Topic Data into the Lakehouse Storage

Yong Zhang

StreamNative, Software Engineer

Building a Real-Time Data Warehouse Based on Apache Pulsar

Dezhi Liu

StreamNative, Solution Expert

Apache Pulsar's Authentication and Authorization Practices for Clusters and Cloud

傅腾

StreamNative, Technical Support Engineer

Integrating Apache Pulsar with BigQuery to Build Data Pipeline

石宝迪

StreamNative, Platform Engineer

Using Apache Pulsar with open-telemetry-collector

Tao Jiuming

StreamNative, Software Engineer

From the Dubbo Protocol to the Dubbo3 (Triple) Protocol

华钟明

Hangzhou Youzan Technology Co., Ltd., Middleware Technical Expert, Apache Dubbo PMC, Dapr Member, Apache Tomcat Contributor

Dubbo 3.0 Application-Level Service Discovery in Detail

江河清

Alibaba Cloud, Java R&D Engineer

Dubbo3 in Practice at Alibaba Local Services

孙刚

Alibaba Local Services, R&D Engineer

Best Practices for Dubbo3 in DingTalk Docs' Move to Cloud Native

董建凯

DingTalk Docs, Senior R&D Engineer

dubbo-go Configuration Design

赵云兴

dubbo-go, Software Engineer

Dubbo-go in Production: Adoption and Practice

王晓伟

Dubbo-go Community, Committer

Tea Break

Speakers

盛宇帆

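StreamNative, Flink Development Engineer

Currently at StreamNative, responsible for developing the flink-connector-pulsar module in the main Flink repository and for developing and maintaining modules related to StreamNative Cloud.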

Tao Jiuming

StreamNative Software Engineer

StreamNative Platform Engineer Apache Pulsar Contributor

张晨

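Atguigu IT Education, Project Director

Has led development of data management platform, user profiling, data warehouse, and real-time data warehouse projects at Atguigu; a technical instructor for many years; has taken part in front-line projects at State Grid Corporation of China, China CITIC Bank, and Zhongzhi Enterprise Group.

张伟伟

Shenzhen Yishi Huolala Technology Co., Ltd., Head of Big Data SRE

Head of big data SRE at Huolala, responsible for the stability of large-scale Hadoop clusters and the related ecosystem, and involved in building the big data security system, cost control, and related work; has many years of experience with cross-cloud architectures.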

周波

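Alibaba, Senior Development Engineer

Currently on the Alibaba Cloud messaging team; Apache RocketMQ Committer.

黄展鹏

Zhengcaiyun Co., Ltd., Senior Architecture Expert

Responsible for the company's cloud-native, middleware, and common services areas; built the end-to-end business canary release platforms 圭臬 and 婵娟, delivering end-to-end business canary releases and continuous application delivery.

林立伟

Tencent Technology (Beijing) Co., Ltd., Tech Lead for Feature Production, Sample Data, and Strategy Framework at Tencent Ads

林立伟 is the tech lead for feature production, sample data, and strategy framework at Tencent Ads, and leads Spark and Flink at Tencent Ads. 10 years of big data experience; active contributor to the Apache Spark / Apache Flink open source communities; author of 《Spark Streaming 源码解析系列》 (GitHub star 3k+).

郑舒力

Kingsoft Cloud, R&D Expert

More than 10 years of R&D experience in distributed systems and in big data real-time computing services and platforms. Joined Kingsoft Cloud in 2015 and has worked on real-time computing ever since; now head of the real-time computing platform team in Kingsoft Cloud's big data department, mainly responsible for the architecture and product design of Kingsoft Cloud's real-time computing and data collection platforms, core module development, and resolving difficult problems. Skilled in performance tuning and availability improvement for complex distributed systems; focused on Flink real-time computing and big data platform architecture, and keen to promote and evangelize real-time computing technologies. Speaker at DTCC (Database Technology Conference China) 2021.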

Rui Chen

Meetup Senior Staff Software Engineer

I am currently working as a Senior Staff Software Engineer at Meetup. I also help out as a maintainer for the Homebrew, Atlantis, tflint, and awesome-terraform projects. As an OSS lover, I also contribute to many other Terraform/k8s/Bazel projects.

Yordán, Rachel

Red Hat Principal Software Engineer

Rachel Yordán is a Principal Software Engineer for Red Hat, and a core contributor of Syndesis and Kaoto integration platforms. She is the director of a nonprofit that develops and advocates for the use of open source software in the fashion and textile industry. Originally a Chemistry major, Rachel spent most of her teen and college years coding for fun. She has now spent over a decade working professionally in web development. She also loves keyboards and dogs.

张俊帆

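iQIYI, Big Data Engineer

Personally contributes to open source big data projects such as Flink, Oozie, and Hadoop, and maintains github.com/tony-framework/tony. Open source profile: github.com/zuston. No previous speaking record.

谢磊

China Mobile (Suzhou) Software Technology Co., Ltd., Software Engineer

Currently at China Mobile (Suzhou) Software Technology Co., Ltd. Mainly responsible for big data cluster operations and tuning and for fixing component vulnerabilities to keep clusters running stably. Also mainly responsible for developing and maintaining the real-time computing platform, with extensive experience in real-time data processing and big data analytics, and active in the open source community.

胡宗棠

China Mobile Cloud Capability Center, Technical Expert

胡宗棠 is a cloud-native technical expert at the China Mobile Cloud Capability Center, an Apache RocketMQ Committer, SOFAJRaft Committer, Alibaba/Nacos Committer, and Linux OpenMessaging Member. Familiar with the design principles, architecture, and application scenarios of middleware such as distributed message queues, API gateways, and distributed transactions, with extensive experience in high performance, high availability, and high concurrency.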

Chaturvedi, Anmol

Informatica Corporation Director of Engineering

Anmol Chaturvedi heads up the stack that integrates the Cloud MDM, Cloud Data Engineering engine, Cloud Profiling, and the Cloud Data Quality suites in the Elastic Serverless Cloud Data Engine platform at Informatica. He has been responsible for the Data Virtualization suite, and a native distributed data processing engine engineered in house at Informatica. He has over 15 years of experience in Enterprise Software Development relating to data management, distributed computing, and databases.

卢勇辉

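Essence Securities Co., Ltd., Software Engineer

Software engineer in the Technology Platform Office at Essence Securities, mainly responsible for the cloud-native microservice gateway and cloud-native middleware; spoke at Apache APISIX Summit Asia 2022.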

Huxing Zhang

Alibaba Cloud Staff Engineer

Huxing Zhang is a Staff Engineer at Alibaba Cloud. He is a member of the Apache Software Foundation and a PMC member of Apache Tomcat and Apache Dubbo.

Chen Yihao

Queen's University Master's Student, Apache SkyWalking Committer

Yihao Chen is a master's student at the Software Analysis and Intelligence Lab (SAIL) at Queen's University and a Committer at Apache SkyWalking. He focuses on community building and emerging technology adoption.

Radhakrishnan, Rakesh

Cloudera Private Limited Staff Software Engineer

Rakesh Radhakrishnan is a committer and PMC member of the Apache Ozone, Hadoop, and ZooKeeper projects, focusing primarily on open source big data technologies. Rakesh currently works at Cloudera and actively contributes to the Apache Ozone project. He has more than 15 years of experience in the design and development of large-scale distributed software platforms. Prior to joining Cloudera, he worked as a Big Data Software Engineer at Intel Corporation.

REGISTRATION


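Type | Price (￥) | Sales End | Quantity

Free Registration (报名免费参会) | Free | 2022-08-31 23:55 | Sold Out

Register for Free | Free | 2022-08-31 23:55 | Sold Out

Individual Donation Ticket (个人捐赠票) | ￥199 | 2022-09-10 20:05 | Past Events
Limited to 200 tickets. Includes Apache community merchandise worth ￥150, the same as the speakers receive; the remaining amount will be donated to the Apache Software Foundation. The merchandise will be shipped by courier within two weeks after the conference ends.

Individual Donation | ￥199 | 2022-08-31 23:55 | Sold Out
Limited to 200 people, with an Apache community souvenir worth ￥150.

Speaker Ticket (Speaker 讲师票) | Free | 2022-08-06 14:12 | Past Events
Registration will need approval from the organizer.
This ticket is used to collect mailing information for sending conference souvenirs to the speakers.

Volunteer Contribution Ticket (志愿者贡献票), by invitation only | Free | 2022-08-31 23:55 | Sold Out
Registration will need approval from the organizer.
For volunteer contributors only; subject to approval; please do not forward.
Tickets can only be acquired by invitation code.

REFUND POLICY: No Refund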


This event is technically supported by BagEvent