On Monday, Tencent, the Chinese Internet giant known for its video game empire and its chat application WeChat, revealed a new version of its open source video generation model DynamiCrafter on GitHub. It's a reminder that some of China's biggest tech companies have been quietly ramping up their efforts to make a dent in the text- and image-to-video space.
Like other generative video tools on the market, DynamiCrafter uses the diffusion method to transform text prompts and still images into videos lasting a few seconds. Inspired by the natural phenomenon of diffusion in physics, machine learning diffusion models can transform simple data into more complex and realistic data, in much the same way that particles move from an area of high concentration to one of low concentration.
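To make the analogy concrete, here is a minimal sketch of the forward "noising" process that diffusion models learn to reverse. This is a toy illustration only; the function name, step count, and noise schedule are illustrative assumptions, not DynamiCrafter's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_diffuse(x, num_steps=100, beta=0.02):
    """Gradually blend clean data with Gaussian noise, one small step at a time.

    After many steps the original signal is nearly indistinguishable from
    pure noise. A diffusion model is trained to run this process in reverse,
    turning noise back into realistic data (here, video frames).
    """
    for _ in range(num_steps):
        noise = rng.standard_normal(x.shape)
        # Each step keeps most of the signal and mixes in a little noise.
        x = np.sqrt(1 - beta) * x + np.sqrt(beta) * noise
    return x

clean = np.linspace(-1.0, 1.0, 8)   # stand-in for pixel data
noisy = forward_diffuse(clean)       # close to pure noise after 100 steps
```

In an image-to-video model like DynamiCrafter, the reverse (denoising) process is additionally conditioned on the input image and text prompt, which is what steers random noise toward a coherent animated clip.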
The second generation of DynamiCrafter produces videos at a resolution of 640 × 1024 pixels, an upgrade from its initial release in October, which generated videos at 320 × 512. An academic paper published by the team behind DynamiCrafter notes that its technology differs from that of its competitors by broadening the applicability of frame animation techniques to "more general visual content."
"The key idea is to utilize the motion prior of text-to-video diffusion models by incorporating the image into the generative process as guidance," the paper states. In comparison, "traditional" techniques "mainly focus on animating natural scenes with stochastic dynamics (e.g. clouds and fluids) or domain-specific motion (e.g. human hair or body motion)."
In a demo (see below) comparing DynamiCrafter, Stable Video Diffusion (launched in November) and the recently hyped Pika Labs, the output from the Tencent model appears slightly more animated than the others. Naturally, the chosen samples would favor DynamiCrafter, and based on my first attempts, none of the models suggests that AI will be capable of producing full-fledged films on its own anytime soon.
Still, there are high hopes for generative video as the next focal point in the AI race after the rise of generative text and images. Startups and incumbent tech players alike are expected to pour resources into the area, and China is no exception. In addition to Tencent, TikTok parent ByteDance, Baidu and Alibaba have each released their own video generation models.
ByteDance's MagicVideo and Baidu's UniVG both have demos on GitHub, although neither appears to be publicly available yet. Like Tencent, Alibaba has made its video generation model VGen open source, an increasingly popular strategy among Chinese tech companies hoping to reach the global developer community.