CommunityOverCode (formerly ApacheCon) is the official global conference series of the Apache Software Foundation (ASF). Held since 1998, before the ASF was even founded, ApacheCon has drawn participants at all levels to explore "tomorrow's technology" across more than 300 Apache projects and their diverse communities. CommunityOverCode showcases the latest developments and emerging innovations in Apache projects through hands-on sessions, keynotes, real-world case studies, trainings, hackathons, and more.
CommunityOverCode presents the latest breakthroughs from ubiquitous Apache projects and upcoming innovations in the Apache Incubator, along with open source development and community-driven project leadership the Apache Way. Attendees learn about core open source technologies independent of commercial interests, corporate bias, or sales pitches.
The CommunityOverCode program is dynamic: the content of each event is driven directly by a curated selection of Apache project developers and user communities. CommunityOverCode delivers state-of-the-art content in a collaborative, vendor-neutral environment, showcasing the latest open source advances in big data, cloud computing, community development, fintech, IoT, machine learning, messaging, programming, search, security, servers, streaming, web frameworks, and more.
This year's conference will be held in person at the 丽亭华苑 Hotel in Beijing from August 18 to 20, 2023, with a live stream online. We look forward to welcoming you.
Since 1999 when The Foundation was established, the open source landscape has changed in many ways but the founding principles remain. The ASF continues to operate as a charity to serve the public interest. The projects, under the direction of the Project Management Committees, are the primary governing bodies, subject to oversight by the Board of Directors.
Over the past few years, both internal and external events have required changes to the way the ASF operates:
Governments have recognized that open source software presents new security challenges to the way the internet works;
Privacy concerns require changes to the ASF approach to transparency;
The ASF needs to recognize that new communications products and protocols change the way communities interact, both within and external to them.
If Apache Doris in the past mainly served high-performance online real-time analytics scenarios, version 1.2, released at the end of 2022, clearly marked an expansion of its capability boundary: more and more users now build efficient real-time data analytics services on Apache Doris, and the recently released 2.0 version comprehensively strengthens its capabilities in semi-structured data analysis, mixed workloads, and data lake federated analytics. In this talk I will unveil the major new features of Apache Doris 2.0 and, drawing on the community's thinking about R&D directions over the past few years, share the community's key future directions and a detailed release roadmap.
Alibaba began using Apache Hadoop for big data analytics in 2009, first put Apache HBase into large-scale production in product search in 2010, deployed the then-nascent Apache Flink for real-time recommendation during the Double 11 shopping festival in 2016, and in the same year launched E-MapReduce on Alibaba Cloud, a cloud product supporting mainstream open source big data technologies such as Apache Hadoop, Hive, Spark, and Kafka. In recent years the Alibaba Cloud open source big data Flink team, as the largest contributor to Apache Flink, has helped make Flink the de facto global standard for stream processing, and has donated the Apache Celeborn and Apache Paimon big data projects to the ASF. This session describes how Alibaba Cloud's big data group moved step by step from embracing and contributing to open source to leading open source communities.
In our digital world, open source software has become part of the infrastructure, like roads and bridges, and plays an ever larger role. As the open source ecosystem grows, however, we face many challenges. Open source software supply chain security, the sustainability of open source, and how to balance open source and commercial interests have become pressing problems for the open source world. In this roundtable, we will discuss these challenges and possible solutions with veterans of the Apache Software Foundation.
As computing tasks grow in complexity and data volumes increase, traditional general-purpose computing platforms can no longer meet the demands of high-performance computing. The era of heterogeneous computing is arriving rapidly, and different computing platforms have different instruction sets and architectural characteristics. A compiler stack targeting heterogeneous computing can deliver higher performance and efficiency, and supports seamless integration across different types of compute units and platforms, driving the development and innovation of computing technology.
BentoML provides tooling for packaging, deploying, and serving machine learning models at scale. Apache Spark is an open-source cluster computing framework for large-scale data processing. This talk will highlight how BentoML can unify real-time and batch inference workloads by integrating with Apache Spark. BentoML has rapidly gained popularity among its user base owing to its open standards for constructing online AI applications as distributed services through simple Python code. We present the novel integration of BentoML with Spark, which allows users to employ a Bento service, originally designed for real-time inference, within a Spark cluster for offline batch inference without altering any code. This functionality is enabled by the run_in_spark API, which automatically propagates the models and inference logic across all Spark worker nodes during batch inference. The integration offers an optimal way for teams to manage both their real-time and batch inference logic under the same standards, with version control and consistent library dependencies, eliminating concerns that real-time and batch inference logic will diverge over time. The unified approach ensures consistent model application, fostering efficient AI service development and deployment. Attendees will learn how to:
1. Package models with BentoML;
2. Deploy BentoServices to production;
3. Invoke BentoServices from Spark for batch inference at scale;
4. Leverage the same models for both real-time and batch predictions.
Typical stream processing targets the table model; streaming processing and analysis over the graph model remains hard for general-purpose stream computing to support. This talk introduces GeaFlow, Ant Group's in-house streaming graph engine, and how GeaFlow builds its streaming graph query language capabilities around Apache Calcite and Gremlin (from Apache TinkerPop). We will also share practices and applications of streaming graph computing inside Ant Group.
Stream processing is rapidly evolving to meet the high-demand, real-time requirements of today's data-driven world. As organizations seek to leverage the real-time insights offered by streaming data, the need for robust, highly concurrent analytics platforms has never been greater. This presentation introduces Apache Druid, a modern, open-source data store designed for such real-time analytical workloads. Apache Druid's key strength lies in its ability to ingest massive quantities of event data and provide sub-second queries, making it a leading choice for high concurrency streaming analytics. Our exploration will cover the architecture, its underlying principles, tuning principles, and the unique features that make it optimal for high concurrency use-cases. We'll dive into real-life applications, demonstrate how Druid addresses the challenge of immediate data visibility, and discuss its role in powering interactive, exploratory analytics on streaming data. Participants will gain an in-depth understanding of Apache Druid's value in the rapidly evolving landscape of streaming analytics and will be equipped with the knowledge to harness its power in their own data-intensive environments. Join us as we delve into the future of real-time analytics in "Shaping the Future: Unveiling High-Concurrency Streaming Analytics with Apache Druid".
The Apache Pulsar community recently released Apache Pulsar 3.0, the first LTS release of Pulsar. In this talk we will dive into why an LTS release matters for Pulsar. We will also cover the major features introduced in Pulsar 3.0, including the new load balancer, support for delayed messages at scale, and Direct IO optimizations.
Currently, Kafka relies on ZooKeeper to store its metadata, e.g. broker info, topics, partitions, etc. KRaft is a new generation of Kafka that runs without ZooKeeper. This talk will cover: 1. Why Kafka needed to develop KRaft. 2. The architectures of the old (ZooKeeper-based) Kafka and the new (ZooKeeper-less) Kafka. 3. The benefits of adopting KRaft. 4. How it works internally. 5. The monitoring metrics. 6. Tools that help troubleshoot issues in KRaft. 7. A demo of what we've achieved so far. 8. The roadmap for the Kafka community to move toward KRaft. After this talk, the audience will have a better understanding of what KRaft is, how it works, how it differs from ZooKeeper-based Kafka, and, most importantly, how to monitor and troubleshoot it.
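For orientation before the talk, a minimal single-node KRaft configuration sketch (node ID, ports, and paths are illustrative placeholders, not recommendations) looks roughly like this:

```properties
# server.properties sketch for KRaft "combined" mode (one process acts as broker + controller)
process.roles=broker,controller
node.id=1
# The Raft quorum: id@host:port of each controller
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
inter.broker.listener.name=PLAINTEXT
log.dirs=/tmp/kraft-combined-logs
```

Unlike ZooKeeper mode, the storage directory must be formatted with a cluster ID before first start, e.g. `bin/kafka-storage.sh format -t <cluster-id> -c server.properties`.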
Join us for an engaging discussion on how the Apache Way fosters community and ensures the longevity of open source projects. Explore the key principles behind successful Apache communities, including consensus-based decision-making, transparent communication, independent governance, and open development practices. Discover how embracing the Apache Way can cultivate a vibrant community, attract new contributors, and drive the sustained success of your open source project.
Many developers have thought about contributing to open source to build their technical skills and influence. But there is usually a gap between aspiration and reality: work is too busy to leave time for contributing; the barrier to entry for open source projects seems too high; or early contributions got little response from the community, making it hard to keep going. In this keynote, Li Benchao will draw on his own experience to share stories and reflections from contributing to open source communities: how to overcome these difficulties, eventually make a breakthrough in an open source community, and strike a balance between work and open source contribution.
An open source community is like a living organism. This talk explores the driving forces behind community growth from the perspectives of Dao (purpose), Fa (principles), Shu (processes), and Qi (products). "Community is people" captures the core insight that the most valuable part of a community is each person in it. We will discuss what motivates each individual to contribute and what they gain in return. Even the most excellent and successful project loses its vitality once it loses its community and the contributors invested in it. Whether a project is open sourced and maintained by a company, started by an individual, or stewarded by a foundation, it needs excellent developers who commit to it in a self-driven way over the long term, taking the long view and enjoying the sense of achievement and fulfillment that open source brings.
In this era of challenges and opportunities, cloud-native technology is driving transformation and innovation in the financial industry. This talk will explore applications and innovative case studies of cloud-native technology in the digital finance era, and its importance in the transformation toward a new distributed financial core.
Taking that transformation as an example, it will present how cloud-native technology enables a new distributed financial core, focusing on Apache EventMesh, a new-generation serverless event middleware, and discuss how to improve business efficiency and user experience, and why open source and cloud-native innovation matter.
As foundational software for message-based communication, messaging middleware is widely used across the industry's IT systems, for example in big data analytics, OpenStack-based cloud infrastructure, IoT and connected vehicles, and edge computing.
China Mobile Cloud has followed a path that blends in-house development with open source: building solid in-house capabilities while actively embracing the open source ecosystem. In recent years, as its business has kept growing rapidly, its messaging middleware cloud products built on open source technologies have become increasingly popular in the market.
Since 2018, China Mobile Cloud has actively participated in building open source communities such as Apache RocketMQ, Apache Pulsar, and Apache Kafka.
To date, the messaging middleware team has produced multiple Committers and PMC Members of Apache top-level projects.
In this talk we will present China Mobile Cloud's journey with open source messaging middleware over the past few years, our business exploration and practice, and our plans for the future.
This session describes how Industrial and Commercial Bank of China (ICBC) used engineering solutions to address challenges in distributed services during its transition to a distributed architecture, such as high-performance optimization for tens of thousands of network connections in large-scale clusters, performance tuning of the ZooKeeper-based registry, and deeply customized multi-point access. It also covers ICBC's current work and plans in the distributed services area.
Apache Doris is a high-performance, real-time analytical database built on an MPP architecture, known for being blazingly fast and easy to use. Apache Doris has become one of the most active open source projects in the big data and database fields worldwide. In this talk I will share the journey of Apache Doris from incubation to becoming a top-level project, and how we drove rapid growth of the community's users and developers.
Developing fast scalable Big Data applications has been made significantly easier over the last decade with horizontally scalable open-source databases and streaming technologies such as Apache Cassandra and Apache Kafka. Cloud-native trends have also accelerated the uptake and ease of use of these technologies, and they are available as managed services on multiple cloud platforms.
But maybe it has become too easy to embark on building complex distributed applications using multiple massively scalable open-source technologies, as there are still many performance and scalability issues to be aware of.
In this talk, I will give a high-level overview of some of the performance and scalability challenges I’ve overcome over the last six years building realistic demonstration applications using Apache Cassandra and Apache Kafka (and more), supplemented with performance insights from our operation of thousands of production clusters.
With the rapid rise of large AI models, we have to re-examine whether the classic open source governance model centered on community contribution, the spirit of "Community Over Code", can still meet the new challenges. Meanwhile, developer relations centered on developer experience is shifting to a new paradigm to serve the needs and expectations of an ever-growing developer community. This talk will explore how the different roles in open source projects can face the challenges and seize the opportunities of this fast-moving era of large AI models, and reflect on how open source business models may evolve. Through this session, we hope to discuss together how open source communities can preserve the spirit of "Community Over Code" in the face of large AI models, and find approaches and paths suited to the new era.
For those of us who already know how important open source is, it can
be challenging to persuasively make the case to management, because we
assume that everyone already knows the basics. This can work against
us, confusing our audience and making us come across as condescending
or concerned about irrelevant lofty philosophical points.
In this talk, we take it back to the basics. What does management
actually need to know about open source, why it matters, and how to
make decisions about consuming open source, contributing to open
source, and open sourcing company code?
Open source communities and technologies power progress across industries, and they also transform the culture and thinking of the organizations, communities, and individuals who explore them. Participants in the open source ecosystem play different roles, and the problems they focus on, the ways they participate, and the value they gain differ considerably. This talk focuses on the individual: how to view open source communities and technologies, and how to find one's own way to participate, so as to discover new interests, grow personally, and draw new energy and value for one's career.
Lightning talks are five minutes each; about 8-10 talks are planned.
This talk provides an overview of the Apache Software Foundation (ASF) and its incubation process. It guides projects on learning the Apache Way, ensuring compliance with licensing and intellectual property rights, and fostering community growth. The process involves creating a proposal, entering the incubator, focusing on community building and making releases, and eventually graduating as a top-level ASF project. Key aspects covered in this talk include complying with licensing, engaging in open and transparent practices, and adopting a vendor-neutral approach. This presentation offers valuable insights for those interested in joining the ASF or seeking an understanding of the incubation process.
Tomcat is the most widely used web container in the industry, and because usage scenarios vary so much, users run into all kinds of strange problems in practice. Using an APM toolchain to quickly pinpoint issues and optimize performance when a business system misbehaves is a headache for every engineer. In this talk I will share a complete set of Tomcat troubleshooting best practices distilled from years of serving Alibaba's internal businesses and public cloud users, so that when problems strike, you have a playbook in hand and can stay calm.
Federated query processing enables distributed query processing across multiple data sources, eliminating silos and improving data accessibility. It allows organizations to seamlessly query and analyze diverse databases or systems as a unified virtual database. By leveraging federated query processing, businesses gain deeper insights from distributed data sources, while data remains in its original location. This approach simplifies data integration, enhances governance, and empowers informed decision-making. In this talk, I will present how we can achieve federated cross-platform query processing with Apache Wayang. Apache Wayang (incubating) is a scalable cross-platform system that decouples applications from data processing platforms, freeing developers from writing applications for specific platforms. It provides an abstraction layer on top of existing data processing platforms, such as Apache Spark and Apache Flink, with the aim of enabling cross-platform optimization and interoperability. It automatically selects the best data processing platforms for a given task and also handles cross-platform execution. Apache Wayang achieves this with a cross-platform optimizer at its core. To enable federated SQL analytics, we have built a library on top of Wayang that provides a unified SQL interface for cross-platform SQL processing. The SQL library allows users to embed SQL queries in their cross-platform applications. I will talk about how we utilize Apache Calcite to support cross-platform SQL. The major benefit of Calcite integration in Wayang is platform independence and opportunistic cross-platform data processing. Apache Wayang with Calcite integration leads to a powerful system capable of federated data processing in a platform-agnostic way.
Apache Druid is a well-known OLAP analytics engine. Starting from version 0.1 at the end of 2012, a decade of steady refinement has culminated in the latest 26.x major releases, bringing the overall architecture and performance to unprecedented heights. In this talk I will walk through Druid's development history and the powerful capabilities the latest releases bring.
Trace data is an important source for analyzing the performance and failures of microservice systems; it records the call chain and related metrics of every request in the system. As microservice systems grow in scale and complexity, the volume of trace data grows exponentially, posing huge challenges for its storage and querying. Traditional relational or time-series databases often struggle to store trace data efficiently and query it flexibly. BanyanDB is a distributed database designed specifically for trace data, offering high scalability, high performance, high availability, and high flexibility. BanyanDB uses a time-series-based sharding strategy that partitions trace data into shards by time range; each shard can be stored, replicated, and load-balanced independently. BanyanDB also supports multi-dimensional indexes, enabling fast filtering and aggregation of trace data across different dimensions. In this talk, we will introduce BanyanDB's design philosophy, architecture, and implementation details, along with its applications and results in real-world scenarios. We will also compare BanyanDB with other databases, highlight its advantages, and present its future directions and plans.
Why do you need another API to handle external traffic when you have the stable Kubernetes Ingress API and dozens of implementations? What problems of the Ingress API does the new Gateway API solve? Does this mean the end of the Ingress API? In this short talk, Navendu will answer these questions by exploring how Gateway APIs evolved and solved the shortcomings of the Ingress API with hands-on examples using Apache APISIX. Attendees will learn about the new Gateway API and how they can implement feature-rich, extensible, vendor-neutral gateways to their Kubernetes clusters with Apache APISIX.
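To give a flavor of the hands-on examples, a minimal Gateway API sketch (resource names, class name, and the `httpbin` backend Service are illustrative assumptions) that routes a path prefix through an APISIX-backed Gateway might look like:

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: apisix-gateway
spec:
  gatewayClassName: apisix       # assumed GatewayClass registered by the APISIX controller
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: httpbin-route
spec:
  parentRefs:
    - name: apisix-gateway       # attach this route to the Gateway above
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: httpbin          # hypothetical backend Service
          port: 80
```

Because `HTTPRoute` is a portable standard, the same manifest works against any conforming Gateway implementation; only the `gatewayClassName` ties it to APISIX.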
Chief evangelist at ByteDance's open source program office; former technical expert at Huawei's Open Source Management Center; member of the Apache Software Foundation Board of Directors for 2022 and 2023; Apache Incubator mentor; former principal software engineer at Red Hat; founder of the Apache Local Community Beijing (ALC Beijing); with more than a decade of experience in enterprise open source middleware development and extensive Java development experience.
Apache Member and Incubator Mentor