All posts tagged: API

OpenAI launches new voice intelligence features in its API

OpenAI said Thursday that its API will now include a number of new voice intelligence features designed to help developers create apps that can talk, transcribe, and translate conversations with users. The company’s new GPT-Realtime-2 is another voice model, built to create a realistic vocal simulation that can converse with users. Unlike its predecessor (GPT-Realtime-1.5), however, this one is built with GPT-5-class reasoning, which OpenAI says was designed to handle more complicated requests from users. The company is also launching GPT-Realtime-Translate which, just as it sounds, is designed to provide real-time translation that “keeps pace” with the user’s conversation. The feature supports more than 70 input languages (that is, the languages it can comprehend) and 13 output languages (the languages it relays to the speaker). Finally, the company has launched a new transcription capability, GPT-Realtime-Whisper, which gives users live speech-to-text that is captured as interactions occur. “Together, the models we are launching move real-time audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, …

OpenAI is shutting down Sora, its powerful AI video model, app and API

OpenAI is shuttering Sora, its stand-alone AI video generation app and social network, and ending developer access to the Sora 2 video model family through its application programming interface (API), which developers relied on for their own products and video generation pipelines. The announcement came abruptly this afternoon, with OpenAI posting a message on X and not giving an exact shutdown date for the services, instead promising “timelines for the app and API and details on preserving your work.” Sora wowed the world with its highly realistic scene-crafting when it was first previewed by OpenAI in February 2024, more than two years ago now, only to be released to mixed reception as an updated Sora Turbo model 10 months later, at which point many competing video AI model providers such as Runway, Luma, and Chinese AI companies Kling and Minimax had already shipped impressive rivals. But OpenAI seemed intent on continuing to build out the video model and enabling creators until just now, releasing a Sora 2 model over API and apps …

Top permitting-reform senators meeting as talks thaw: API chief

Senate Environment and Public Works Committee Chair Shelley Moore Capito and ranking Democrat Sheldon Whitehouse are meeting to discuss reforming the federal energy permitting process, American Petroleum Institute president Mike Sommers told CNBC. “We have both the Republican in the United States Senate … Shelley Moore Capito, and the Democrat Sheldon Whitehouse, who [are] responsible for permitting reform, finally meeting again to discuss how we get this done this year,” Sommers told CNBC. A person familiar with the negotiations, granted anonymity to discuss the details, said they “will be talking frequently this week about permitting since negotiations are back on,” but was unaware of any designated meeting time. The person noted committee staff are negotiating regularly. The regular conversations between Capito, R-W.Va., and Whitehouse, D-R.I., come after Democrats publicly announced a thaw in permitting talks. Democrats walked away from talks last year after the Trump administration ordered …

OpenAI upgrades its Responses API to support agent skills and a complete terminal shell

Until recently, the practice of building AI agents has been a bit like training a long-distance runner with a thirty-second memory. Yes, you could give your AI models tools and instructions, but after a few dozen interactions (several laps around the track, to extend our running analogy) the agent would inevitably lose context and start hallucinating. With OpenAI’s latest updates to its Responses API, the application programming interface that allows developers on OpenAI’s platform to access multiple agentic tools like web search and file search with a single call, the company is signaling that the era of the limited agent is waning. The updates announced today include Server-side Compaction, Hosted Shell Containers, and support for the new “Skills” standard for agents. With these three major updates, OpenAI is effectively handing agents a permanent desk, a terminal, and a memory that doesn’t fade, which should help agents evolve further into reliable, long-term digital workers.

Technology: overcoming ‘context amnesia’

The most significant technical hurdle for autonomous agents has always been the “clutter” of long-running tasks. …

Gemini API Web Scraper for HTML, JSON, XML, and Image Links

What if extracting data from PDFs, images, or websites could be as fast as snapping your fingers? Prompt Engineering explores how the Gemini web scraper is transforming data extraction with unparalleled speed and precision. Imagine parsing through dense financial overviews, extracting text from images, or gathering structured data from complex web pages, all in just seconds. This isn’t just a productivity boost; it’s a fantastic option for developers tired of juggling clunky, outdated methods. With its seamless integration into the Gemini ecosystem, this scraper promises to simplify workflows while delivering results that are both accurate and efficient. In this overview, we’ll break down how the Gemini web scraper works and why it’s becoming an essential part of modern development. You’ll uncover its ability to handle diverse formats, from HTML and JSON to PDFs and images, and learn how its dual approach to data retrieval balances speed with reliability. Whether you’re curious about its advanced document understanding or its compatibility with external platforms, this overview will help you see how it can elevate your projects. By …

Open Responses API Unifies Open Model Tools with One Standard

What if the fragmented world of open AI models could finally speak the same language? Sam Witteveen explores “Open Responses,” a newly introduced open inference standard initiated by OpenAI, built by the open source AI community, and backed by the Hugging Face ecosystem. Open Responses is based on the Responses API and is designed for the future of agents. For years, the lack of standardization has been a thorn in the side of AI innovation, forcing developers to wrestle with inconsistent APIs and compatibility headaches. But with features like multimodal inputs and reasoning token support, Open Responses promises to unify this chaotic ecosystem, making it easier to build, scale, and innovate with AI. In this overview, we’ll break down what makes Open Responses a strong option and why it’s already gaining traction among major players like Hugging Face and Vercel. You’ll discover how its streamlined design eliminates complexity, fosters interoperability, and enables developers to focus on creating impactful solutions rather than navigating technical barriers. Whether you’re curious …

Why “which API do I call?” is the wrong question in the LLM era

For decades, we have adapted to software. We learned shell commands, memorized HTTP method names and wired together SDKs. Each interface assumed we would speak its language. In the 1980s, we typed ‘grep’, ‘ssh’ and ‘ls’ into a shell; by the mid-2000s, we were invoking REST endpoints like GET /users; by the 2010s, we imported SDKs (client.orders.list()) so we didn’t have to think about HTTP. Underlying each of those steps was the same premise: Expose capabilities in a structured form so others can invoke them. Now we are entering the next interface paradigm. Modern LLMs are challenging the notion that a user must choose a function or remember a method signature. Instead of “Which API do I call?” the question becomes: “What outcome am I trying to achieve?” In other words, the interface is shifting from code to language. In this shift, Model Context Protocol (MCP) emerges as the abstraction that allows models to interpret human intent, discover capabilities and execute workflows, effectively exposing software functions not as programmers know them, but …
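To make the contrast in this excerpt concrete, here is a minimal sketch of the two calling styles: invoking a capability by name, and picking one from a plain-language request. Everything in it is hypothetical and for illustration only: the `Capability` class, the `REGISTRY`, both functions, and the sample data are assumptions, and simple word overlap stands in for the interpretation step a model would perform. It is not MCP and does not reflect any real SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Capability:
    """A hypothetical capability: a name, a description, and a handler."""
    name: str
    description: str
    handler: Callable[[], List[str]]


# Hypothetical in-memory data standing in for a real backend.
_USERS = ["ada", "grace"]
_ORDERS = ["order-1", "order-2"]

REGISTRY: Dict[str, Capability] = {
    "list_users": Capability("list_users", "list all users", lambda: list(_USERS)),
    "list_orders": Capability("list_orders", "list all orders", lambda: list(_ORDERS)),
}


def call_by_name(name: str) -> List[str]:
    """Classic style: the caller must already know which function to invoke."""
    return REGISTRY[name].handler()


def call_by_intent(request: str) -> List[str]:
    """Intent style: pick the capability whose description best matches the request.

    Word overlap is a toy stand-in for a model's interpretation of intent.
    """
    words = set(request.lower().split())
    best = max(
        REGISTRY.values(),
        key=lambda cap: len(words & set(cap.description.split())),
    )
    return best.handler()
```

With this sketch, `call_by_name("list_users")` requires knowing the exact name, while `call_by_intent("show me the orders placed today")` selects `list_orders` because the request shares the word "orders" with that capability's description.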

Build Google AI Studio Apps with Google Forms API & Sheets

What if you could create a fully functional app, complete with data storage, file hosting, forms, and sharing, without ever needing a backend? In this overview, Your AI Workflow explores how Google AI Studio turns this concept into reality. By harnessing browser-based technologies like IndexedDB and integrating advanced APIs, developers can build dynamic, feature-rich applications that run entirely on the front end. From crafting a task manager with local data storage to designing a multimedia platform with seamless file hosting, this approach eliminates the complexity of server-side infrastructure while delivering powerful user experiences. This feature dives into how Google AI Studio redefines app development by blending simplicity with innovation. With streamlined data collection using Google Forms API and shareable links powered by LZ-String, the platform offers practical solutions for developers of all skill levels. Whether you’re looking to simplify workflows or explore new creative possibilities, these techniques open doors to a world of backend-free development. How far can these capabilities go, and what challenges might arise? As we provide more insight into the insights shared …

Grok 4.1 Fast's compelling dev access and Agent Tools API overshadowed by Musk glazing

Elon Musk's frontier generative AI startup xAI formally opened developer access to its Grok 4.1 Fast models last night and introduced a new Agent Tools API, but the technical milestones were immediately overshadowed by a wave of public ridicule over Grok's responses on the social network X in the last few days, which praised its creator Musk as more athletic than championship-winning American football players and legendary boxer Mike Tyson, despite his having displayed no public prowess at either sport. The responses are yet another black eye for xAI's Grok following the "MechaHitler" scandal in the summer of 2025, in which an earlier version of Grok adopted a verbally antisemitic persona inspired by the late German dictator and Holocaust architect, and an incident in May 2025 in which it responded to X users' posts on unrelated subject matter by discussing unfounded claims of "white genocide" in Musk's home country of South Africa. This time, X users shared dozens of examples of Grok alleging Musk was stronger or more performant than elite athletes and a greater thinker than luminaries such as …