
This is in the works :) In the past, and still today, OpenAI has much better and easier function-calling support, something we rely on.

We currently run through LiteLLM, so while undocumented, other LLMs could in theory work (in my experience they don't). I'm working on updating and fixing this.
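For context, OpenAI-style function calling means passing tool schemas alongside the conversation; LiteLLM mirrors this interface across providers. Below is a hedged sketch of what such a schema might look like for a debugger tool. The `info` function name and its parameters are illustrative assumptions, not the project's actual API.

```python
# Hypothetical sketch of an OpenAI-style tool schema, as would be passed via
# litellm.completion(model=..., messages=..., tools=[INFO_TOOL]).
# The tool name and parameters here are illustrative, not the real project API.
INFO_TOOL = {
    "type": "function",
    "function": {
        "name": "info",
        "description": (
            "Look up the value, type, and documentation of a variable "
            "or function visible in the current frame."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "name": {
                    "type": "string",
                    "description": "Identifier to inspect.",
                },
            },
            "required": ["name"],
        },
    },
}
```

Providers that implement this schema reliably (OpenAI) will emit well-formed tool calls; in my experience others routed through LiteLLM often don't, which is the compatibility gap mentioned above.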


Beyond saving tokens, this greatly improved the quality and speed of answers: the language server (most notably used to find the declaration/definition of an identifier) gives the LLM:

1. a shorter path to relevant information, by querying for specific variables or functions rather than a longer investigation of the source code. LLMs are typically trained/instructed to keep their answers within a range of tokens, so keeping conversations shorter when possible extends the search space the LLM will be "willing" to explore before outputting a final answer.

2. a good starting point in some cases by immediately inspecting suspicious variables or function calls. In my experience this happens a lot in our Python implementation, where the first function calls are typically `info` calls to gather background on the variables and functions in frame.
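The `info`-call pattern above can be sketched in a few lines: resolve an identifier in a frame and return a compact summary as background for the model. This is a minimal illustration of the idea, assuming a hypothetical `info` helper; it is not the project's actual implementation.

```python
import inspect

def info(name: str, frame) -> str:
    # Hypothetical 'info' tool: resolve an identifier in the given frame and
    # return a short summary (type, value, first doc line) the LLM can use
    # as background before investigating further.
    if name in frame.f_locals:
        value = frame.f_locals[name]
    elif name in frame.f_globals:
        value = frame.f_globals[name]
    else:
        return f"{name}: not found in frame"
    summary = f"{name}: type={type(value).__name__}, value={value!r}"
    doc = inspect.getdoc(value)
    if doc:
        summary += f" -- {doc.splitlines()[0]}"
    return summary

def example():
    # Simulate inspecting a suspicious local variable in the current frame.
    counts = {"mandelbrot": 21, "n-body": 2}
    return info("counts", inspect.currentframe())
```

A first round of calls like this gives the model concrete values and types for the names in frame, which is the "good starting point" described above.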


> Quality of Benchmark Implementations

Correct. "Selection of Benchmark Implementations" is a better name here. We'll update this in the next iteration. The point of this subsection is indeed that the selection is not adequate for comparison. And this is not the only issue: even an adequate selection of perfectly idiomatic and identical implementations would not have resulted in an accurate comparison.

> C/C++ Outlier

Correct, Section 4.5.2 details this. It is 8.9x for us.

> JS/TS Outlier

The main outlier on our machine is mandelbrot, 21x (Section 4.5.1). Our second outlier is n-body (not discussed).


> would not have resulted in accurate comparison

Because? Is the reasoning for that spelled out somewhere in the paper?

> Section 4.5.2

> Section 4.5.1

After the paper had discussed “Pereira et al.” I repeatedly confused discussion of your new measurements with discussion of the old “Pereira et al.” measurements.

> "forcing benchmarks to run on a single core" p2&3

> "we eliminate the effect of varying concurrency in different benchmark implementations by limiting benchmarks to execute on a single core" p6

> "the JavaScript version uses 28 cores on average" p14

fwiw I am now very confused.


We only pin to one core for the one experiment described in Section 4.3. All the remaining experiments are run with full access to all cores.


Thank you.

I'm concerned that section 2.2.1 is a misreading of Pereira et al.

[29] "… the performance of a language is influenced by the quality of its compiler, virtual machine, garbage collector, available libraries, etc."

In that context it seems plain that "language" must be understood as a shortening of "language implementation."

> "For instance, Pereira et al. treat Ruby and JRuby as different languages, while they are in fact two separate implementations of the same Ruby language."

It seems to me that Pereira et al. treat Ruby and JRuby as different "language implementations" and compare each one independently against the other language implementations.

(In the "corpus of small benchmark implementations" it was simply convenient to keep separate programs for Ruby and JRuby.)


Those papers say "language" over and over again, in the titles, in the body of the text. That work confounds languages and their implementations, and makes it sound like there is a one-to-one connection between the two (of course, there is not necessarily such a correspondence).

With respect to Ruby vs. JRuby: my student just checked and verified that some but not all of the benchmarks are implemented differently (k-nucleotide, mandelbrot, pidigits, spectral-norm).


> Those papers say "language" over and over again, in the titles, in the body of the text.

Yes they do! And over and over again in-context we sensibly read that to mean what you wish to term more precisely "language implementation".

> Fig. 4 Fig. 5 "We are in fact comparing implementations of programming languages, not the languages themselves."

They know. They just prefer shorter names.

Here's their short-name precise-name lookup table:

https://sites.google.com/view/energy-efficiency-languages/se...

