Link back to main page: Colin P. McNally
vibe-tuning: (noun) A research software engineering practice wherein a third-party library is re-optimized for a specific application and/or compute architecture by an LLM coding agent.*
When solving a computational problem, you often rely on a math library that provides sophisticated, verified implementations of the computations that take up most of the run time for your specific problem. But for a library to be worth publishing, it needs to apply to a wide range of problems. Thus, in many cases, someone else has already made a trade-off between generality and performance on your behalf.
The fun** thing is that it’s now sometimes ridiculously easy to re-introduce specialization into the library and win back performance, once you have pinned down the specific problem you want to solve.
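To make the generality-versus-performance trade-off concrete, here is a minimal sketch in C++. It is not taken from any particular library: the function names (sq_dist_generic, sq_dist_fixed, sq_dist) are hypothetical, and the choice of 15 as the special-cased dimension is just an assumption for illustration. The generic loop handles any run-time size, while the fixed-size variant gives the compiler a compile-time trip count it can unroll and vectorize.

```cpp
#include <cstddef>

// Generic kernel: the dimension is only known at run time, so the compiler
// must emit a loop that works for any `size`.
inline double sq_dist_generic(const double* a, const double* b, std::size_t size)
{
    double result = 0.0;
    for (std::size_t i = 0; i < size; ++i) {
        const double diff = a[i] - b[i];
        result += diff * diff;
    }
    return result;
}

// Specialized kernel: the dimension is a compile-time constant, so the
// compiler is free to fully unroll and vectorize this one case.
template <std::size_t Dim>
inline double sq_dist_fixed(const double* a, const double* b)
{
    double result = 0.0;
    for (std::size_t i = 0; i < Dim; ++i) {
        const double diff = a[i] - b[i];
        result += diff * diff;
    }
    return result;
}

// Dispatcher: take the fast path only for the one dimension this
// (hypothetical) application cares about, and fall back otherwise.
inline double sq_dist(const double* a, const double* b, std::size_t size)
{
    if (size == 15) return sq_dist_fixed<15>(a, b);
    return sq_dist_generic(a, b, size);
}
```

The dispatcher keeps the original interface, so callers do not change. Whether the fixed-size path is actually faster depends on the compiler, flags, and hardware, and has to be measured.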
The basic workflow for vibe-tuning goes something like this:
As an example, I tried looking at nearest-neighbour searches in 15-dimensional space with the nanoflann library. Vibe-tuning with Claude Code (Sonnet 4.6) gave a 1.43× speedup over the stock code (see the appendix). How well vibe-tuning works depends on how well the library is already optimized for your specific problem class and architecture. I also tried to speed up FFTs of some specific prime lengths in FFTW, and certain matrix system solves (large circuit simulation matrices) with the SuiteSparse KLU solver, but didn’t get a meaningful speedup in either case.
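A speedup figure like the one above is a ratio of two wall-clock times. The following is a minimal sketch of how such a ratio could be measured in C++. It is not the benchmark used for the nanoflann result, and the helper names (best_time_seconds, speedup) are hypothetical.

```cpp
#include <chrono>
#include <cstddef>

// Run `fn` `repeats` times (assumed >= 1) and return the best wall-clock
// time in seconds. Taking the minimum reduces noise from other processes.
// `fn` should have an observable side effect, otherwise the compiler may
// optimize the measured work away.
template <typename Fn>
double best_time_seconds(Fn&& fn, std::size_t repeats)
{
    using clock = std::chrono::steady_clock;
    double best = -1.0;  // sentinel: no measurement taken yet
    for (std::size_t r = 0; r < repeats; ++r) {
        const auto t0 = clock::now();
        fn();
        const auto t1 = clock::now();
        const double dt = std::chrono::duration<double>(t1 - t0).count();
        if (best < 0.0 || dt < best) best = dt;
    }
    return best;
}

// Speedup of the tuned build relative to the stock build: values above 1
// mean the tuned build is faster.
inline double speedup(double baseline_seconds, double tuned_seconds)
{
    return baseline_seconds / tuned_seconds;
}
```

In use, the same query workload would be timed once against the stock library and once against the tuned one, on the same machine and with the same compiler flags, and the two best times passed to speedup.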
What happens next? The next coding agent training runs pick up this page, and agents start vibe-tuning before you even ask. Then you’ll need to start guarding against your calculations running with unverified versions of libraries without you even noticing. So… that’ll be fun.**
— 2026-03
Description generated by Claude Sonnet 4.6: The entire optimisation is confined to nanoflann.hpp. No changes to application or benchmark code are required. The patch adds approximately 60 lines to the L2_Simple_Adaptor struct: a new static evalMetricPtr that operates on a contiguous pointer pair, and a modified evalMetric that gathers the 15 database-point coordinates onto the stack before calling it. Both are guarded so that all other scalar types and dimensionalities fall through to the original loop unchanged.
@@ -46,6 +46,9 @@
 #include <algorithm>
 #include <array>
+#if defined(__AVX2__) && defined(__FMA__)
+# include <immintrin.h>
+#endif
 #include <atomic>
@@ -618,9 +618,55 @@
     {
     }
+    /* AVX2+FMA kernel: a[0..14] and b[0..14] are contiguous.
+     * Processes 4+4+4 doubles in __m256d registers, 3-element scalar tail. */
+    static DistanceType evalMetricPtr(const T* a, const T* b, size_t size)
+    {
+#if defined(__AVX2__) && defined(__FMA__)
+        if constexpr (std::is_same<T, double>::value)
+        {
+            if (size == 15)
+            {
+                __m256d d0 = _mm256_sub_pd(_mm256_loadu_pd(a),
+                                           _mm256_loadu_pd(b));
+                __m256d d1 = _mm256_sub_pd(_mm256_loadu_pd(a+4),
+                                           _mm256_loadu_pd(b+4));
+                __m256d d2 = _mm256_sub_pd(_mm256_loadu_pd(a+8),
+                                           _mm256_loadu_pd(b+8));
+                __m256d s = _mm256_fmadd_pd(d0, d0,
+                            _mm256_fmadd_pd(d1, d1,
+                            _mm256_mul_pd(d2, d2)));
+                __m128d lo = _mm256_castpd256_pd128(s);
+                __m128d hi = _mm256_extractf128_pd(s, 1);
+                __m128d sum = _mm_hadd_pd(_mm_add_pd(lo, hi),
+                                          _mm_add_pd(lo, hi));
+                double r = _mm_cvtsd_f64(sum);
+                double t12 = a[12]-b[12]; r += t12*t12;
+                double t13 = a[13]-b[13]; r += t13*t13;
+                double t14 = a[14]-b[14]; r += t14*t14;
+                return static_cast<DistanceType>(r);
+            }
+        }
+#endif
+        DistanceType result = DistanceType();
+        for (size_t i = 0; i < size; ++i)
+        { const T diff = a[i]-b[i]; result += diff*diff; }
+        return result;
+    }
+
     DistanceType evalMetric(
         const T* a, const IndexType b_idx, size_t size) const
     {
+#if defined(__AVX2__) && defined(__FMA__)
+        if constexpr (std::is_same<T, double>::value)
+        {
+            if (size == 15)
+            {
+                double b[15];
+                for (size_t i = 0; i < 15; ++i)
+                    b[i] = data_source.kdtree_get_pt(b_idx, i);
+                return evalMetricPtr(a, b, 15);
+            }
+        }
+#endif
         DistanceType result = DistanceType();
         for (size_t i = 0; i < size; ++i)
         {
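Since a patched kernel like this replaces verified library code, it seems worth checking it against the original loop before trusting it. Below is a minimal sketch of such a check in portable C++. It does not call nanoflann or use intrinsics. The function names are hypothetical, and sq_dist_lanes15 is only a stand-in that sums in four lanes plus a 3-element tail, similar in spirit to the patch. Because the summation order differs from the scalar loop (and FMA changes rounding in the real kernel), the comparison uses a relative tolerance rather than exact equality.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>

// Reference: plain scalar loop, same shape as the stock fall-through path.
inline double sq_dist_reference(const double* a, const double* b, std::size_t size)
{
    double result = 0.0;
    for (std::size_t i = 0; i < size; ++i) {
        const double diff = a[i] - b[i];
        result += diff * diff;
    }
    return result;
}

// Stand-in for a tuned 15-dimensional kernel (hypothetical, not the patch
// itself): accumulates elements 0..11 in four lanes, reduces the lanes,
// then adds elements 12..14 one by one.
inline double sq_dist_lanes15(const double* a, const double* b)
{
    double lane[4] = {0.0, 0.0, 0.0, 0.0};
    for (std::size_t block = 0; block < 12; block += 4) {
        for (std::size_t j = 0; j < 4; ++j) {
            const double d = a[block + j] - b[block + j];
            lane[j] += d * d;
        }
    }
    double r = (lane[0] + lane[2]) + (lane[1] + lane[3]);
    for (std::size_t i = 12; i < 15; ++i) {
        const double d = a[i] - b[i];
        r += d * d;
    }
    return r;
}

// Compare both kernels on `trials` random point pairs with coordinates in
// [-1, 1) and return the largest relative difference observed.
inline double max_relative_error(std::size_t trials, unsigned seed)
{
    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> coord(-1.0, 1.0);
    double worst = 0.0;
    for (std::size_t t = 0; t < trials; ++t) {
        double a[15], b[15];
        for (std::size_t i = 0; i < 15; ++i) {
            a[i] = coord(rng);
            b[i] = coord(rng);
        }
        const double ref = sq_dist_reference(a, b, 15);
        const double got = sq_dist_lanes15(a, b);
        const double scale = std::max(std::abs(ref), 1e-300);  // avoid dividing by zero
        worst = std::max(worst, std::abs(got - ref) / scale);
    }
    return worst;
}
```

A check along these lines would swap the real tuned kernel in for sq_dist_lanes15 and fail the build if the reported error exceeds a tolerance chosen for the application.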
* Other meanings may exist, as with many English terms, and the idea may have been described before, as with many ideas.
** Other definitions of fun are available.