All posts tagged: Measuring

Stealth Assessment: Measuring Training While It Takes Place

Never got it. Never really got it. Why do we have tests at school? Why do we measure the performance of training only after the fact, by means of a summative assessment? I mean the assessments given at the end of a term to measure mastery of the content that was taught: the multiple-choice tests or short essay questions that result in a good or bad grade. Sure, the lion’s share of students dislike taking pop quizzes, tests, and exams. I was one of those students. But looking back as a former student, and now as an instructor and trainer myself, I still do not understand why we so often measure the success of training mainly through these final assessments. The primary reason is that they generally do not give feedback that is timely or specific enough to improve learning while it is happening. Ultimately, the goal of the learning process is to learn as much as is needed to meet the goals of the training. The outcome of a test …

What a rare lensed supernova could mean for measuring cosmic expansion

A burst of light in the deep sky is doing something it should not be able to do. It looks like one supernova, but it shows up as several copies at once, scattered around two foreground galaxies. The effect is not a telescope trick or a camera glitch. It is gravity, bending the path of the light so it reaches Earth along different routes, on different schedules. The object is SN 2025wny, nicknamed “SN Winny,” and it sits about 10 billion light-years away. It is also a superluminous supernova, a kind of stellar explosion so bright that it can still be detected from extreme distances. The team behind the work, from the Technical University of Munich (TUM), Ludwig Maximilian University (LMU), and the Max Planck Institutes for Astrophysics (MPA) and Extraterrestrial Physics (MPE), says the alignment is so unlikely that the odds of finding a similar event are below one in a million. That rarity is exactly why astronomers are excited. If they can measure the time gaps between the different images of the same …
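
For context, the quantity these time gaps probe is standard in time-delay cosmography, the usual framework for lensed transients, though the excerpt does not spell it out:

```latex
\Delta t_{ij} \;=\; \frac{D_{\Delta t}}{c}\,\Delta\phi_{ij},
\qquad
D_{\Delta t} \;=\; (1+z_{\mathrm{l}})\,\frac{D_{\mathrm{l}}\,D_{\mathrm{s}}}{D_{\mathrm{ls}}}\;\propto\;\frac{1}{H_0}
```

Here \Delta\phi_{ij} is the Fermat-potential difference between images i and j, fixed by the lens model; z_l is the lens redshift; and D_l, D_s, D_ls are angular-diameter distances to the lens, to the source, and between the two. Because every angular-diameter distance scales as 1/H_0, a measured delay plus a lens model constrains the Hubble constant, and hence the cosmic expansion rate, directly.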

We’re Measuring AI on the Wrong Ruler

Every debate about artificial intelligence (AI) seems to revolve around the same question: Is it smarter than we are? The subtleties of the questions might change, and the endpoints might be argued, but behind the cacophony of authoritative brilliance is a shared assumption—that intelligence lives on a single line. More of it on one end, less on the other. Humans are somewhere along that spectrum, and machines are moving toward us. But with all the discussion and debate, we rarely stop to examine the ruler itself. And the moment we ask whether AI is ahead of us, we have already accepted that we are measuring the same thing.

The Illusion of a Shared Scale

It’s understandable why we default to this handy ruler. Large language models create the very stuff of our humanity, from words to images. This output clearly looks like thinking, and it is often better than what we humans produce. But let’s be careful not to get our hand slapped by that ruler in the process. Here’s what we need to …

How can we tell if citizen participation actually works? A new framework for measuring impact – Evidence & Policy Blog

Franziska Sörgel, Nora Weinberger, Julia Hahn, Christine Milchram and Maria Maia

This blog post is based on the Evidence & Policy article, ‘Assessing the effectiveness of citizen participation: the development of an impact scheme’.

Citizen participation has become central to research policy, yet we rarely ask the crucial follow-up question: what difference does it actually make? In our recent Evidence & Policy article, we propose an impact scheme that helps to move participation from a well-intentioned ritual to a practice with measurable, meaningful effects. The last decade has seen an explosion of participatory formats designed to gather citizen and stakeholder feedback on science and innovation policy. From citizens’ assemblies to co-creation workshops, public dialogue has become the new punctuation mark in research agendas and beyond. Nevertheless, a fundamental problem persists: we lack systematic ways to measure whether these processes genuinely influence research priorities or merely provide a democratic façade with little real impact. This gap matters enormously both for research institutions that invest resources in participation and for citizens who provide their time and expertise. At the Karlsruhe Institute …

Measuring the deep tech gender gap – POLITICO

A central output of the project is the Gender Gap in Investments Dashboard, developed by Dealroom. The dashboard is a prototype repository that already presents a clear picture of the current state of the gender investment gap using Dealroom data. It brings together information on company founding teams and venture funding outcomes across Europe in a single, accessible interface. The dashboard is not an endpoint. It is designed as a foundation that can, over time, incorporate additional data sources, improve coverage, and offer a more nuanced view of how gender, sector, funding stage and geography interact. The long-term ambition is to support the development of a credible, shared European data infrastructure on gender and investment.

What the data show: Deep tech remains highly skewed

Even at this early stage, the dashboard reveals persistent imbalances. Across Europe, startups with at least one woman founder account for just 14.4 percent of all venture capital (VC) rounds and raise 12 percent of total VC funding. In deep tech, the imbalance is even starker. Around 80 percent of deep-tech companies are founded by all-male …
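
As a back-of-the-envelope sketch of the two headline metrics, here is how such shares could be computed from round-level data; the schema and field names are hypothetical, not Dealroom's actual data model:

```python
# Hypothetical round-level schema; illustrative only, not Dealroom's API.
from dataclasses import dataclass

@dataclass
class Round:
    company: str
    amount_eur: float         # size of the funding round
    has_woman_founder: bool   # at least one woman on the founding team

def gender_gap_shares(rounds: list[Round]) -> tuple[float, float]:
    """Share of rounds and share of total funding going to startups
    with at least one woman founder."""
    n_mixed = sum(1 for r in rounds if r.has_woman_founder)
    amount_mixed = sum(r.amount_eur for r in rounds if r.has_woman_founder)
    amount_total = sum(r.amount_eur for r in rounds)
    return n_mixed / len(rounds), amount_mixed / amount_total

# Toy data, not real figures:
rounds = [Round("A", 5e6, True), Round("B", 20e6, False), Round("C", 3e6, False)]
round_share, funding_share = gender_gap_shares(rounds)
print(f"rounds: {round_share:.1%}, funding: {funding_share:.1%}")  # rounds: 33.3%, funding: 17.9%
```

The dashboard's 14.4 percent versus 12 percent split is exactly this kind of pair: teams with a woman founder appear in more rounds than their share of the money, meaning their rounds are, on average, smaller.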

Why measuring methane production matters

Recent research highlights the advantages of C-Lock’s GreenFeed system for measuring methane production in dairy farming, emphasising that mass flux measurements provide more reliable data for research and decision-making than traditional concentration-based sensors.

As the livestock industry moves toward more accurate methane measurement, one distinction is becoming increasingly important: the difference between methane concentration measurements and methane production measurements. While both can offer insights, they are not interchangeable – and new research shows why measuring methane as a mass flux provides a more reliable foundation for research, genetic selection, and on-farm decision making. A recent five-month study conducted in Switzerland evaluated methane ‘sniffer’ sensors installed inside an automated milking system (AMS) and compared them with C-Lock’s GreenFeed Emission Monitoring System — a widely used on-farm tool that measures methane production in grams per day. The results reinforce a key message: concentration-based measurements are highly sensitive to sensor placement and animal behaviour, while GreenFeed’s mass-flux approach offers greater consistency and confidence.

Concentration vs production

Most AMS sniffers measure methane concentration (ppm) in the air near a …
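
To make the distinction concrete, here is a minimal sketch of why a production figure in grams per day requires an airflow term that a bare concentration reading lacks. This is a simplified ideal-gas conversion, not GreenFeed's actual algorithm, and the example numbers are made up:

```python
# Simplified concentration -> mass-flux conversion (illustrative only).
# Real systems also handle calibration, head position and visit duration.

M_CH4 = 16.04         # g/mol, molar mass of methane
V_MOLAR = 24.45       # L/mol, molar volume of an ideal gas at 25 degC, 1 atm
SECONDS_PER_DAY = 86_400

def methane_g_per_day(concentration_ppm: float, airflow_l_per_s: float) -> float:
    """Convert methane concentration in a captured airstream to a mass flux.

    Without airflow_l_per_s there is no way to turn ppm into grams per day,
    which is why concentration-only sniffers are so sensitive to where the
    sensor sits relative to the animal.
    """
    ch4_l_per_s = concentration_ppm * 1e-6 * airflow_l_per_s  # volume of CH4
    g_per_s = ch4_l_per_s * M_CH4 / V_MOLAR                   # volume -> mass
    return g_per_s * SECONDS_PER_DAY

# e.g. 300 ppm above background in a 30 L/s captured airstream:
print(f"{methane_g_per_day(300, 30):.0f} g/day")  # ~510 g/day
```

The same ppm reading taken a few centimetres further from the muzzle would imply a very different flux, while a system that meters the airflow can report grams per day regardless of where the plume is densest.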

Enterprises are measuring the wrong part of RAG

Enterprises have moved quickly to adopt RAG to ground LLMs in proprietary data. In practice, however, many organizations are discovering that retrieval is no longer a feature bolted onto model inference — it has become a foundational system dependency. Once AI systems are deployed to support decision-making, automate workflows or operate semi-autonomously, failures in retrieval propagate directly into business risk. Stale context, ungoverned access paths and poorly evaluated retrieval pipelines do not merely degrade answer quality; they undermine trust, compliance and operational reliability.

This article reframes retrieval as infrastructure rather than application logic. It introduces a system-level model for designing retrieval platforms that support freshness, governance and evaluation as first-class architectural concerns. The goal is to help enterprise architects, AI platform leaders, and data infrastructure teams reason about retrieval systems with the same rigor historically applied to compute, networking and storage.

Figure: Retrieval as infrastructure — a reference architecture illustrating how freshness, governance, and evaluation function as first-class system planes rather than embedded application logic. Conceptual diagram created by the author.

Why RAG breaks down at …
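
The excerpt's "first-class system planes" framing can be made concrete with a small sketch. The interface below is an illustrative assumption, not the author's reference architecture, and the keyword match stands in for real vector search:

```python
# Retrieval with freshness, governance and evaluation as first-class planes.
# Hypothetical interface for illustration; not any specific RAG framework.
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class Document:
    doc_id: str
    text: str
    acl: set[str]         # principals allowed to read this document
    indexed_at: datetime  # timezone-aware timestamp of indexing

@dataclass
class RetrievalPlatform:
    index: list[Document]
    max_staleness: timedelta = timedelta(days=7)        # freshness plane
    audit_log: list[str] = field(default_factory=list)  # evaluation plane

    def retrieve(self, query: str, principal: str) -> list[Document]:
        now = datetime.now(timezone.utc)
        results = [
            d for d in self.index
            if principal in d.acl                          # governance: access checked at query time
            and now - d.indexed_at <= self.max_staleness   # freshness: stale context never reaches the LLM
            and query.lower() in d.text.lower()            # stand-in for vector similarity search
        ]
        # evaluation: every retrieval is logged so quality can be audited later
        self.audit_log.append(f"{now.isoformat()} {principal!r} {query!r} -> {len(results)} hits")
        return results
```

The point of the sketch is structural: access control, staleness and logging gate the retrieval call itself, rather than being left to whichever application happens to consume the results.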

Measuring Machine Intelligence Using Turing Test 2.0

In 1950, British mathematician Alan Turing (1912–1954) proposed a simple way to test artificial intelligence. His idea, known as the Turing Test, was to see if a computer could carry on a text-based conversation so well that a human judge could not reliably tell it apart from another human. If the computer could “fool” the judge, Turing argued, it should be considered intelligent. For decades, Turing’s test shaped public understanding of AI. Yet as technology has advanced, many researchers have asked whether imitating human conversation really proves intelligence — or whether it only shows that machines can mimic certain human behaviors. Large language models like ChatGPT can already hold convincing conversations. But does that mean they understand what they are saying? In a Mind Matters podcast interview, Dr. Georgios Mappouras tells host Robert J. Marks that the answer is no. In a recent paper, The General Intelligence Threshold, Mappouras introduces what he calls Turing Test 2.0. This updated approach sets a higher bar for intelligence than simply chatting like a human. It asks whether machines …