Author: YBB Capital Zeke
Foreword
On February 16, OpenAI announced "Sora", its latest text-controlled video generation diffusion model, marking another milestone moment for generative AI with multiple high-quality demo videos covering a wide range of visual data types. Unlike AI video generation tools such as Pika, which are still at the stage of generating a few seconds of video from multiple images, Sora achieves scalable video generation by training in the compressed latent space of videos and images and decomposing them into spatiotemporal patches. The model also demonstrates the ability to simulate both the physical and digital worlds; it is no exaggeration to call the final 60-second demo a "universal simulator of the physical world".
In terms of construction, Sora continues the "big data-Transformer-Diffusion-emergence" technological path of previous GPT models, which means its maturation also requires computing power as an engine; and since the amount of data required for video training far exceeds that for text training, the demand for computing power will increase further. However, we have already discussed the importance of computing power in the AI era in our earlier article "Preview of Potential Track: Decentralized Computing Power Market". With the recent surge in AI's popularity, a large number of computing power projects have begun to emerge on the market, and other passively benefiting DePIN projects (storage, computing power, etc.) have also seen a surge. So beyond DePIN, what other sparks can the intersection of Web3 and AI create? What other opportunities does this track contain? The main purpose of this article is to update and complete previous articles and to think through the possibilities of Web3 in the AI era.
The three major directions of AI development history
Artificial intelligence is an emerging science and technology that aims to simulate, extend and enhance human intelligence. Since its birth in the 1950s and 1960s, artificial intelligence has experienced more than half a century of development and has now become an important technology driving change in social life and across all industries. In this process, the intertwined development of three major research directions, symbolism, connectionism and behaviorism, has become the cornerstone of the rapid development of AI today.
Symbolism
Also known as logicism or the rule-based approach, symbolism holds that human intelligence can be simulated by processing symbols. This method uses symbols to represent and manipulate objects, concepts and their interrelationships in the problem domain, and uses logical reasoning to solve problems; it has made remarkable achievements, especially in expert systems and knowledge representation. The core view of symbolism is that intelligent behavior can be achieved through the manipulation of symbols and logical reasoning, where symbols represent a high degree of abstraction from the real world;
Connectionism
Also known as the neural network method, it aims to achieve intelligence by imitating the structure and function of the human brain. This method achieves learning by building a network of many simple processing units (similar to neurons) and adjusting the strength of the connections between these units (similar to synapses). Connectionism particularly emphasizes the ability to learn and generalize from data, and is particularly suitable for pattern recognition, classification and continuous input-output mapping problems. Deep learning, as a development of connectionism, has made breakthroughs in the fields of image recognition, speech recognition and natural language processing;
Behaviorism
Behaviorism is closely related to research on bionic robotics and autonomous intelligent systems, emphasizing that intelligent agents learn through interaction with the environment. Unlike the first two, behaviorism does not focus on simulating internal representations or thought processes, but rather on achieving adaptive behavior through cycles of perception and action. Behaviorism holds that intelligence is demonstrated through dynamic interaction with and learning from the environment. This approach is particularly effective when applied to mobile robots and adaptive control systems that must act in complex and unpredictable environments.
Although there are essential differences between these three research directions, in actual AI research and applications, they can also interact and integrate to jointly promote the development of the AI field.
Overview of AIGC principles
Generative AI (Artificial Intelligence Generated Content, AIGC for short), which is currently experiencing explosive development, is an evolution and application of connectionism. AIGC is able to imitate human creativity to generate novel content. These models are trained on large datasets with deep learning algorithms to learn the underlying structures, relationships and patterns present in the data, and then generate novel, unique output based on user prompts, including images, videos, code, music, designs, translations, answers to questions, and text. Today's AIGC is essentially built on three elements: deep learning (DL), big data, and large-scale computing power.
Deep learning
Deep learning is a subfield of machine learning (ML). Deep learning algorithms are neural networks modeled after the human brain. For example, the human brain contains millions of interconnected neurons that work together to learn and process information. Likewise, deep learning neural networks (or artificial neural networks) are composed of multiple layers of artificial neurons working together inside a computer. Artificial neurons are software modules called nodes that use mathematical calculations to process data. Artificial neural networks are deep learning algorithms that use these nodes to solve complex problems.
Hierarchically, a neural network can be divided into an input layer, hidden layers, and an output layer, with parameters connecting the different layers.
Input Layer: The input layer is the first layer of the neural network and is responsible for receiving external input data. Each neuron in the input layer corresponds to a feature of the input data. For example, when processing image data, each neuron may correspond to a pixel value in the image;
Hidden Layer: The input layer processes the data and passes it on to layers deeper in the neural network. These hidden layers process information at different levels, adjusting their behavior as they receive new information. Deep learning networks can have hundreds of hidden layers and can therefore analyze a problem from many different perspectives. For example, given an image of an unknown animal that you must classify, you would compare it with animals you already know: you might judge what kind of animal it is by the shape of its ears, the number of its legs, and the size of its pupils. The hidden layers in a deep neural network work in the same way. If a deep learning algorithm attempts to classify an animal image, each of its hidden layers processes a different characteristic of the animal and attempts to classify it accurately;
Output Layer: The output layer is the last layer of the neural network and is responsible for generating the network's output. Each neuron in the output layer represents a possible output category or value. For example, in a classification problem, each output neuron may correspond to a category, while in a regression problem the output layer may have only one neuron, whose value represents the prediction result;
Parameters: In a neural network, the connections between different layers are represented by weight and bias parameters, which are optimized during training so that the network can accurately identify patterns in the data and make predictions. Increasing the number of parameters can improve the model capacity of the neural network, that is, the model's ability to learn and represent complex patterns in the data; correspondingly, however, more parameters also increase the demand for computing power.
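To make the layer-and-parameter structure above concrete, here is a minimal sketch in Python/NumPy (all names and sizes are illustrative, not taken from any particular model): an input layer of 4 features, one hidden layer of 8 units, and an output layer of 3 classes, connected by weight and bias parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters connecting the layers: the weights and biases are what training optimizes.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # input layer (4 features) -> hidden layer (8 units)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)   # hidden layer -> output layer (3 classes)

def forward(x):
    h = np.maximum(0, x @ W1 + b1)              # hidden layer with ReLU activation
    return h @ W2 + b2                          # output layer: one score per class

x = rng.normal(size=(1, 4))                     # one input sample with 4 features
print(forward(x).shape)                         # (1, 3): a score for each possible class
```

A real deep network simply stacks many more such hidden layers, which is why the parameter count, and with it the computing power requirement, grows so quickly.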
Big data
To be trained effectively, neural networks usually require large amounts of diverse, high-quality, multi-source data, which forms the basis for training and validating machine learning models. By analyzing big data, machine learning models can learn the patterns and relationships in the data and thereby make predictions or classifications.
Large-scale computing power
The multi-layered complex structure of neural networks, the large number of parameters, the demands of big-data processing, the iterative training method (during training the model must be iterated repeatedly; each iteration requires forward propagation and backpropagation through every layer, including computing activation functions, computing the loss function, computing gradients and updating the weights), the need for high-precision computation, parallel computing capabilities, optimization and regularization techniques, and the model evaluation and validation process all combine to create its need for high computing power.
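As a toy illustration of the iterative training loop just described, here is a minimal sketch (Python/NumPy, using a single linear model rather than a deep network for brevity; the data and numbers are made up): each iteration performs a forward pass, computes the loss, computes the gradient, and updates the weights. Real models repeat this over billions of parameters and huge datasets, which is where the computing-power demand comes from.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))                   # synthetic training data: 256 samples, 4 features
true_w = np.array([1.5, -2.0, 0.5, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=256)

w = np.zeros(4)                                 # model parameters to be learned
lr = 0.1                                        # learning rate

for step in range(200):                         # repeated training iterations
    pred = X @ w                                # forward propagation
    loss = np.mean((pred - y) ** 2)             # loss function (mean squared error)
    grad = 2 * X.T @ (pred - y) / len(y)        # gradient of the loss w.r.t. the parameters
    w -= lr * grad                              # weight update (gradient descent)

print(np.round(w, 2))                           # the recovered parameters approach true_w
```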
Sora
As the latest video-generation AI model released by OpenAI, Sora represents a huge advance in artificial intelligence's ability to process and understand diverse visual data. By using a video compression network and spatio-temporal patch technology, Sora is able to convert massive amounts of visual data captured by different devices around the world into a unified representation, thereby achieving efficient processing and understanding of complex visual content. Relying on a text-conditioned Diffusion model, Sora can generate videos or images that closely match text prompts, showing extremely high creativity and adaptability.
However, despite Sora's breakthroughs in video generation and in simulating real-world interactions, it still faces some limitations, including the accuracy of its physical-world simulation, the consistency of long-video generation, its understanding of complex text instructions, and its training and generation efficiency. In essence, Sora achieves a kind of brute-force aesthetics through OpenAI's monopoly on computing power and its first-mover advantage, continuing the old technological path of "big data-Transformer-Diffusion-emergence", while other AI companies still have the possibility of overtaking via technological detours.
Although Sora has little to do with blockchain, I personally believe that over the next year or two, Sora's influence will force other high-quality AI generation tools to emerge and develop rapidly, and this will radiate to many Web3 tracks such as GameFi, social networking, creation platforms, and DePIN, so a general understanding of Sora is necessary. How AI will be effectively combined with Web3 in the future may be the key question we need to think about.
Four major paths of AI x Web3
As mentioned above, we can see that generative AI really requires only three underlying foundations: algorithms, data, and computing power; on the other hand, in terms of its versatility and its output, AI is a tool that overturns the means of production. The two biggest roles of blockchain are the reconstruction of production relations and decentralization. Therefore, I personally believe the collision of the two can generate four paths:
Decentralized computing power
Since related articles have been written in the past, the main purpose of this paragraph is to update the current situation of the computing power track. When it comes to AI, computing power is always an unavoidable factor, and after the birth of Sora the scale of AI's demand for computing power has become almost unimaginable. Recently, during the 2024 World Economic Forum in Davos, Switzerland, OpenAI CEO Sam Altman bluntly stated that computing power and energy are the biggest shackles at this stage, and that in the future their importance may even be equivalent to that of currency. On February 10, Sam Altman announced on Twitter an extremely astonishing plan to raise US$7 trillion (equivalent to 40% of China's 2023 GDP) to rewrite the current global semiconductor industry landscape and build a chip empire. When I wrote earlier articles on computing power, my imagination was still limited to national blockades and giant monopolies; it is really crazy that a single company now wants to control the global semiconductor industry.
So the importance of decentralized computing power is self-evident: the characteristics of blockchain can indeed address the current problems of extreme monopoly over computing power and the prohibitive cost of acquiring dedicated GPUs. From the perspective of AI's needs, the use of computing power can be divided into two directions: inference and training. There are currently only a few projects focusing on training; from the requirement that a decentralized network be combined with neural network design, to the extremely high demands on hardware, it is destined to be a direction with a very high threshold that is extremely difficult to implement. Inference is comparatively simpler: on the one hand, the decentralized network design is not complicated, and on the other hand the hardware and bandwidth requirements are lower, making it the relatively mainstream direction at present.
The imagination space of the decentralized computing power market is huge, often linked to the keyword "trillion-level", and it is also the topic most frequently hyped in the AI era. However, judging from the large number of projects that have emerged recently, most of them are still rushing to launch to ride the hype: they always hold high the banner of decentralization while staying silent about the inefficiency of decentralized networks. Moreover, the designs are highly homogeneous, with a large number of very similar projects (one-click L2 plus mining designs), which may eventually make it difficult to carve out any share from the traditional AI track.
Algorithm and model collaboration system
Machine learning algorithms are algorithms that can learn rules and patterns from data and make predictions or decisions based on them. Algorithms are technology-intensive, because their design and optimization require deep expertise and technological innovation. Algorithms are the core of training AI models and define how data is transformed into useful insights or decisions. The more common generative AI algorithms, such as Generative Adversarial Networks (GAN), Variational Autoencoders (VAE), and Transformers, are each designed for a specific field (such as painting, speech recognition, translation, or video generation), or in other words are born for a specific purpose, and dedicated AI models are then trained with these algorithms.
With so many algorithms and models, each with its own merits, can we integrate them into an all-around model? Bittensor, which has become very popular recently, is a leader in this direction: it uses mining incentives to allow different AI models and algorithms to collaborate and learn from each other, thereby creating more efficient and versatile AI models. Other projects focusing on this direction include Commune AI (code collaboration). However, for today's AI companies, algorithms and models are their own proprietary treasures and will not be lent out at will.
So the narrative of an AI collaborative ecosystem is quite novel and interesting: the collaborative ecosystem uses the advantages of blockchain to integrate what is lost to AI algorithm silos, but whether it can create corresponding value is still unknown. After all, the closed-source algorithms and models of leading AI companies have very strong abilities to update, iterate and integrate; OpenAI, for example, has in less than two years iterated from early text-generation models to models that generate across multiple fields. Even if projects such as Bittensor make progress on models and algorithms, the fields they target may still require new approaches.
Decentralized big data
From a simple perspective, using private data to feed AI and labeling data are directions that fit very well with blockchain; one only needs to pay attention to how to prevent junk data and malicious behavior, and data storage can also benefit DePIN projects such as FIL and AR. From a more complex perspective, using blockchain data for machine learning (ML) to solve the accessibility of blockchain data is also an interesting direction (one of Giza's exploration directions).
In theory, blockchain data can be accessed at any time and reflects the state of the entire blockchain. But for those outside the blockchain ecosystem, accessing these massive amounts of data is not easy. Storing an entire blockchain requires extensive expertise and a large amount of specialized hardware resources. To overcome the challenges of accessing blockchain data, several solutions have emerged in the industry. For example, RPC providers give access to nodes through APIs, while indexing services make data extraction possible through SQL and GraphQL; both play a key role in solving the problem. However, these methods have limitations. RPC services are not suitable for high-density usage scenarios that require large volumes of data queries, and they often fail to meet demand. Meanwhile, although indexing services provide a more structured way of retrieving data, the complexity of Web3 protocols makes it extremely difficult to build efficient queries, sometimes requiring hundreds or even thousands of lines of complex code. This complexity is a huge obstacle for general data practitioners and those who do not understand Web3 in depth. The cumulative effect of these limitations highlights the need for an easier way to obtain and use blockchain data, one that can promote wider adoption and innovation in this field.
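As a rough sketch of the RPC route mentioned above (assuming an Ethereum-compatible node; the endpoint URL is a placeholder, not a real provider), a single standard JSON-RPC call to fetch the latest block might look like this, which also hints at why this approach does not scale to bulk analytical queries: each request returns only one small slice of state.

```python
import requests

RPC_URL = "https://example-rpc-provider.invalid"  # placeholder endpoint; substitute your own node or RPC provider

payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "eth_getBlockByNumber",             # standard Ethereum JSON-RPC method
    "params": ["latest", False],                  # latest block, without full transaction objects
}

resp = requests.post(RPC_URL, json=payload, timeout=10)
block = resp.json()["result"]
print(block["number"], block["hash"])             # hex-encoded block number and hash
```

Pulling, say, a year of transfer history this way would require millions of such calls, which is exactly the gap that indexing services, and potentially AI-assisted query layers, try to fill.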
Then, by combining ZKML (zero-knowledge machine learning, which reduces the burden of machine learning on the chain) with high-quality blockchain data, it may be possible to create datasets that solve the problem of blockchain accessibility. AI can significantly lower the threshold for accessing blockchain data, so over time developers, researchers and ML enthusiasts will have access to more high-quality, relevant datasets for building effective and innovative solutions.
AI-empowered Dapps
AI-empowered Dapps have been a very common direction since ChatGPT exploded in popularity in 2023. The extremely versatile generative AI can be accessed through APIs to simplify and add intelligence to data analysis platforms, trading bots, blockchain encyclopedias and other applications. It can also act as a chatbot (such as Myshell) or an AI companion (Sleepless AI), and can even create NPCs in blockchain games. However, because the technical barriers are very low, most such products amount to little more than fine-tuning after plugging into an API, and their integration with the project itself is far from perfect, so they are rarely mentioned.
But after the arrival of Sora, I personally think that AI-empowered GameFi (including the metaverse) and creative platforms will be the next focus of attention. Given the bottom-up nature of the Web3 field, it will certainly be difficult to produce products that compete with traditional games or creative companies, but the emergence of Sora is likely to break this dilemma (perhaps in only two to three years). Judging from Sora's demo, it already has the potential to compete with micro short-drama companies, and Web3's active community culture can also give birth to a large number of interesting ideas. When the only limit is imagination, the barriers between the bottom-up Web3 industry and the top-down traditional industry will be broken down.
Conclusion
As generative AI tools continue to advance, we will experience more epoch-making "iPhone moments" in the future. Although many people sneer at the combination of AI and Web3, I actually think the current directions are mostly sound; there are really only three pain points to solve: necessity, efficiency, and fit. Although the integration of the two is still at an exploratory stage, that does not prevent this track from becoming the mainstream of the next bull market.
We need to always maintain sufficient curiosity about and openness to new things. Historically, the transition from horse-drawn carriages to cars was settled in an instant; just as with inscriptions and NFTs in the past, holding too many prejudices will only lead to missed opportunities.