<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Kevin&#39;s Homepage</title>
<link>https://kjablonka.com/</link>
<atom:link href="https://kjablonka.com/index.xml" rel="self" type="application/rss+xml"/>
<description>Kevin Maik Jablonka&#39;s personal homepage</description>
<image>
<url>https://kjablonka.com/quarto.png</url>
<title>Kevin&#39;s Homepage</title>
<link>https://kjablonka.com/</link>
</image>
<generator>quarto-1.8.24</generator>
<lastBuildDate>Tue, 24 Mar 2026 23:00:00 GMT</lastBuildDate>
<item>
  <title>Science is Too Successful</title>
  <link>https://kjablonka.com/blog/posts/scientific_freedom/</link>
  <description><![CDATA[ 




<p>I am, again, in a personal crisis about science, my career, and stuff like this. I should probably be doing something about that. Instead, I am writing a blog post.</p>
<p>This is, I recognize, a form of procrastination. But it is also, maybe, a form of thinking — and as we all know, <a href="https://www.nature.com/articles/s44222-025-00323-4">writing is thinking</a>. Since the crisis is partly about not having enough time to think — because the time goes to administration, to procurement, to review rebuttals, to forms — perhaps writing about it counts as a small act of resistance. Or perhaps I’m rationalizing. Either way, here we are.</p>
<section id="the-money-i-cant-spend" class="level2">
<h2 class="anchored" data-anchor-id="the-money-i-cant-spend">The money I can’t spend</h2>
<p>I should start with something that will sound strange to most scientists: I have too much money. Not personally — personally, as I’ve written <a href="https://kjablonka.com/blog/posts/finances/">elsewhere</a>, I pay for my research tools out of my own pocket because I deprioritize bureaucracy and administration. But my group has funding that sits unspent. Not because there’s nothing to do with it, but because spending it responsibly would mean hiring more people, and hiring more people would most likely mean growing my group past the size where I can still understand what everyone is working on.</p>
<p>At the moment, I keep my group small on purpose, and even so it’s already hard. We just came back from a group retreat where I told everyone that I’m struggling to keep all our projects in my head, and that this means we’re not leveraging my strengths in the way we should. If I doubled the group, I’d gain output (more publications) but lose whatever grasp I still have. I’d become even more a manager, not a scientist. The funding system would be delighted — a larger group, more publications, more “impact.” But I would understand less about my own research, which is a trade I’m not willing to make.</p>
<p>I would happily give the money back if it meant removing the administrative apparatus that comes with having it. The reporting, the oversight, the hierarchy, the coordination overhead — all of it exists because the money exists. The money was supposed to enable the science. Instead, a growing share of it pays for the system that manages the money. I am sometimes not sure who is working for whom.</p>
</section>
<section id="how-we-got-here" class="level2">
<h2 class="anchored" data-anchor-id="how-we-got-here">How we got here</h2>
<p>None of this was built by villains. Science succeeded. It worked. Governments funded it, and the funding produced results, and the results justified more funding. The system grew — more researchers, more institutions, more publications, more money. Science is great, and we should all be grateful for it.</p>
<p><a href="https://press.stripe.com/scientific-freedom">But growth created coordination problems.</a> When you have ten researchers at an institute, you can talk to each of them over lunch. <a href="https://www.experimental-history.com/p/the-rise-and-fall-of-peer-review">When you have a thousand, you need an organizational chart, a procurement office, a grants administration team, an HR department, a compliance unit.</a> Each layer solves a real problem. Each layer also costs money, takes time, and adds friction. And each layer, once established, develops its own logic, its own incentives, its own instinct for self-preservation.</p>
<p>The funding also required a story. To justify public investment at this scale, science needed to promise something large. The story it told was: we produce truth. Give us money, and we will deliver facts about the world that are reliable, verified, and objective. This is my speculation — I haven’t dug into the historical evidence for this reading, and I would be curious to see it tested (reach out to me!). Metrics also come from a different pressure — science consumes resources that could go to other things society values and needs, and there is an urge to justify that expense. So you build metrics, and metrics need oversight, and oversight needs administrators. And so the system grows — not to produce more understanding, but to demonstrate that it’s delivering on its promises.</p>
<p>Science doesn’t produce truth. It produces better working hypotheses. That’s a different thing, and it might imply a different kind of institution. A hypothesis is provisional. It’s meant to be revised. It’s useful precisely because it might be wrong. An institution built around truth needs to verify and control. An institution built around hypotheses needs to explore and tolerate uncertainty. We built the first kind and are now surprised that it doesn’t feel like the second.</p>
</section>
<section id="we-also-just-got-icml-reviews-back" class="level2">
<h2 class="anchored" data-anchor-id="we-also-just-got-icml-reviews-back">We also just got ICML reviews back</h2>
<p>I am writing this shortly after getting reviews back on our ICML papers, so I should be transparent that the timing is not a coincidence. Many of the reviews ask for more benchmarks, more models, more ablations. I don’t see where this actually stops, and I don’t see how adding this form of “more” leads to more insight.</p>
<p>But I notice the same thing in myself when I review. The form has a field for strengths and a field for weaknesses. The weaknesses field is empty, and it’s waiting for me to fill it. I have noticed that this field, by existing, changes what I do. Even when a paper is good — when I read it and think, yes, this is interesting, this advances understanding — I find myself (subconsciously) scanning for something to put in the weaknesses box. There is always another benchmark the authors didn’t run, another model they didn’t compare against, another dataset they didn’t test on. These are technically valid criticisms. They are also, in most cases, beside the point. The paper taught me something. The missing benchmark would not have taught me more.</p>
<p>The form asks for weaknesses, so I supply them. The authors write a rebuttal, addressing each weakness with more experiments, more comparisons, more pages. The paper gets longer. I’m not sure it gets better. The cycle consumes weeks of work on both sides, and the decision at the end — accept or reject — is, as multiple studies have shown, <a href="https://arxiv.org/pdf/2109.09774">sometimes barely more reliable than random assignment</a>. We all know this. We do it anyway.</p>
<p>The request for “more” is the easiest criticism to make because it’s unfalsifiable. There will always be another experiment you didn’t run. If we require proof that something works on everything, we will never be done with anything, and we can stop the business of doing research in the first place.</p>
<p>And now we have AI reviews. Researchers submit papers, and on the other end, an LLM fills in the same form — strengths, weaknesses, questions for the authors — producing the shape of evaluation without necessarily having any understanding behind it. <a href="https://www.argmin.net/p/information-transit-got-the-wrong">Whatever relationship LLM-generated text has to semantic content and truth is always accidental or incidental.</a> The conferences respond with detection policies, new rules, more oversight. The system treats the symptom by growing the apparatus. <a href="https://artificialbureaucracy.substack.com/p/context-widows">Kevin Baker put it well: “Systems can persist in dysfunction indefinitely, and absurdity is not self-correcting.”</a></p>
<p>I sometimes think about what peer review was before it scaled. When a field was small enough, a program committee sat in a room and discussed each paper. They argued, they disagreed, they changed their minds. The process was biased, clubby, and imperfect. But it happened at a human scale, which meant that understanding was at least possible. At the current scale, it’s not. No committee can discuss thousands of submissions. So we distribute the work to anonymous individuals, give them forms, aggregate their scores, and pretend the numbers mean something. We scaled the process and lost the thing that made it work.</p>
</section>
<section id="science-is-human" class="level2">
<h2 class="anchored" data-anchor-id="science-is-human">Science is human</h2>
<p>I think the thing I keep circling around is that science is a human activity. Understanding — the actual thing science is for — happens inside a human mind. A person reads a paper, thinks about it, connects it to what they already know, and updates their picture of how something works. That process doesn’t scale. You can’t make it faster by adding metrics. You can’t outsource it to an AI and call the output “understanding.” You can produce more papers, more data, more benchmarks, but understanding is still bounded by the pace at which a person can absorb and integrate ideas.</p>
<p>Hartmut Rosa has written about this as a dissonance between the pace of production and the pace of comprehension. We publish more than anyone can read — and perhaps I am also producing blog posts at a higher pace than anyone cares to read. We produce data faster than anyone can analyze. We run experiments faster than anyone can think about what they mean. The institution optimizes for production. Understanding happens on its own schedule, and that schedule has not gotten faster. I honestly don’t see how it can, unless we upgrade the human hardware — and the evidence from recent studies suggests that measured intelligence <a href="https://www.bbc.com/future/article/20190709-has-humanity-reached-peak-intelligence">is maybe even declining</a>.</p>
<p>And the questions we choose to ask in science are, in the end, value judgments. What’s worth studying? What matters? These are human decisions, rooted in human experience and human priorities. I think, on a societal level, we shouldn’t leave these to an AI scientist or an optimization algorithm or a metric that rewards citation counts. The selection of questions is where human judgment matters most, and it’s exactly the part of the process that gets squeezed when the system demands more output.</p>
<p>If we’re honest about the fact that science is irreducibly human, then we can start questioning the institutions we’ve built. Because many of them are, in a meaningful sense, inhuman. They operate at scales where no individual can comprehend the whole. They substitute measurement for judgment and volume for depth. They were built to manage a system too large for humans to manage, and in doing so they created an environment that is increasingly difficult for humans to do good work inside.</p>
</section>
<section id="a-german-asking-for-austerity" class="level2">
<h2 class="anchored" data-anchor-id="a-german-asking-for-austerity">A German asking for austerity</h2>
<p>And now I am here: the German asking for austerity. But I think the answer, for at least parts of the system, is to want less. Not because cutting budgets is virtuous — I have lived through enough austerity discourse to know it isn’t — but because some things that matter in science only survive at a scale where humans can still be fully present.</p>
<p>Smaller groups where a PI understands every project. Peer review at a scale where people can discuss papers rather than fill in forms. Less pressure to produce more, and more room to understand what’s already been produced. Fewer metrics, more trust. And I mean trust in the full human sense — I crave it, almost, from my institution. The feeling that they trust me to spend wisely the money I brought in through third-party funding. That they trust my judgment about whom I talk to, what I work on, how I run my group. The word “enough” as a legitimate answer to “how much?”</p>
<p>Is more even better? More papers, more citations, more funding, more people, more benchmarks, more administration to manage all of the above? The system says yes. My experience says: past a certain point, more produces more of everything except the thing science is actually for.</p>
<p>I would give back money to have less overhead. I would accept fewer publications to have deeper ones. In my group, we have already started saying no — to new projects, to quick workshop papers, to collaborations that would spread us thinner. At the retreat, we discussed doing even more of that. These are not popular positions in a system that equates growth with success. But I think they might be correct. And I also realize that this cannot be the universal prescription. We have important questions to answer and we need science to answer them. And I also recognize that a lot of great science happens in large groups and that humanity benefitted a lot from the growth of science. But I think we can also recognize that there are some things that only happen in small groups, and that we should be careful not to lose those things in the rush to produce more.</p>
</section>
<section id="a-caveat" class="level2">
<h2 class="anchored" data-anchor-id="a-caveat">A caveat</h2>
<p>There is frustration in all of this, and frustration is not always a reliable guide to systemic problems. Maybe some of what bothers me is just the normal friction of working inside any institution. Maybe I could solve some of it by being more patient, more organized, better at administration. Maybe I am a person with a higher desire for freedom and trust than is usual, and maybe I am coupling some personal frustrations into what I’m presenting as a systemic critique. I don’t claim to have perfectly separated the two. And I don’t claim that any of what I describe is universally true or can serve as an answer to systemic problems.</p>
<p>This blog post is itself a symptom. I am a researcher, procrastinating on my research, writing about why the system makes it hard to do research. In a system that worked, the frustration wouldn’t accumulate to the point where it demands an essay. It should just be smoother from the start.</p>
<p>Where, if not in academia, should we be willing to dream about what the thing could be? Where, if not among people whose job is to ask hard questions, should we ask whether our own institutions are the right ones? Maybe this is utopian. But we are in the business of ideas that sound unrealistic until someone tests them. The least we can do is extend that courtesy to ourselves.</p>


<!-- -->

</section>

 ]]></description>
  <category>metascience</category>
  <category>peer-review</category>
  <category>scientific-freedom</category>
  <guid>https://kjablonka.com/blog/posts/scientific_freedom/</guid>
  <pubDate>Tue, 24 Mar 2026 23:00:00 GMT</pubDate>
</item>
<item>
  <title>How to Get an Interest-Free Loan</title>
  <link>https://kjablonka.com/blog/posts/finances/</link>
  <description><![CDATA[ 




<p>If you are a university, this is straightforward. Hire researchers who care about their work more than their money. Make the procurement system slow enough that they cannot use it for the tools they need. They will pay out of their own pockets and forget to ask for the money back. It is an excellent deal. I should know — I am one of those researchers.</p>
<p>Recently, my partner and I had a fight about money. She told me she doesn’t want to share bank accounts with me. I was surprised this had become a topic at all. I am Swabian — from the part of Germany with a reputation for being careful with money that borders on clinical. I cook at home, I don’t buy things I don’t need, I try to save where I can. The problem is that I use my personal credit card to pay for my research. Cloud compute, API access, dataset subscriptions, development tools — I pay for all of it myself and then <em>try</em> to get reimbursed by my university. The reimbursement forms require invoices that match the charges, converted to the right currency, filed through the right cost center. I am sitting on several thousand euros in unfiled claims right now, expenses I either didn’t find time to submit or gave up on when the paperwork didn’t match. If you can make a Swabian lose track of money he is owed, your system has a serious problem.</p>
<p>From my partner’s perspective, I am funneling household money into my job. She is right. From where I sit, several things are true at once. I am giving my university an interest-free loan. I am losing money outright, because some of those claims I will never file and some I have already forgotten. But more than either of those: I just want to do research. My field moves fast, and paying out of pocket is the only way I can keep up. When a new model or cloud service becomes relevant to my work, I cannot wait eight weeks for a purchase order. So I pull out my credit card. The alternative — spending my already overwhelming days navigating procurement forms, chasing approvals, matching receipts to charges in the wrong currency — would cost me something I cannot get back, which is time I should be spending on science.</p>
<p>I suspect this is not uncommon among computational researchers, though few talk about it.</p>
<section id="the-gap" class="level2">
<h2 class="anchored" data-anchor-id="the-gap">The gap</h2>
<p>My university has a straightforward mechanism for paying research costs. I hand in an invoice, and if it is under €7,000, the finance office processes it. This works for lab equipment, conference fees, even computers — anything sold by a vendor who issues a proper invoice.</p>
<p>Cloud providers and AI companies do not issue invoices to individual researchers. They charge credit cards. My university can process an invoice. OpenAI can charge a credit card. Between those two facts lies an administrative gap that no one is responsible for closing.</p>
<p>Earlier this year, I sent my partner a photo of myself carrying a stack of new MacBooks for my research group. Tens of thousands of euros worth of hardware, purchased by the university without issue, because the third-party reseller sends invoices. She saw that photo. Weeks later, we had the fight about money, because the code running on those MacBooks makes API requests that cost fractions of a cent each, and those come out of my personal credit card. The hardware cost a hundred times more than the software. The expensive purchase was easy. The cheap one was impossible. Not because anyone decided that MacBooks matter more than API calls, but because one product comes with an invoice and the other requires a credit card.</p>
<p>This mismatch was invisible ten years ago. Most research costs were physical goods sold through conventional procurement channels, and research itself may also have moved more slowly. But the tools that computational researchers depend on are now overwhelmingly digital, subscription-based, and billed to credit cards. The gap between what researchers need to buy and what universities can pay for widens every month.</p>
<p>The result is a hidden tax on digital research, and it gets paid in three currencies: your own money, your own time, or the research you quietly decide not to attempt because the overhead isn’t worth it. That last one is the real cost. You don’t spin up GPU instances when you are rushing for a deadline. You don’t try the new model. You stick with whatever is already approved, even when better options exist. The procurement system shapes which research gets attempted.</p>
<p>And it fails on its own terms. The purpose of all this bureaucracy is financial oversight — knowing what was spent, on what, by whom. But when researchers pay personally and file reimbursements months later (or never), the university has no accurate record of research costs. My several thousand euros of unfiled claims are not in anyone’s books. The system designed for accountability produces worse accounting than the alternative.</p>
</section>
<section id="why-nobody-fixes-it" class="level2">
<h2 class="anchored" data-anchor-id="why-nobody-fixes-it">Why nobody fixes it</h2>
<p>The bottleneck persists because it sits in a crack between institutions. Cloud providers sell to consumers and enterprises. Universities buy from vendors through invoices. Individual researchers fit neither category. Some providers do offer invoicing — I asked Anthropic, and the mechanism exists — but only for customers large enough to justify the setup. A single researcher spending a few hundred euros a month does not qualify. The providers won’t adapt their billing for one lab. The universities won’t adapt their procurement for one new category of expense. Researchers lack the leverage to change either side. So the gap persists, and researchers absorb the cost.</p>
<p>Underneath this lies a design error worth naming. University procurement optimizes for a world where the optimal amount of misuse is zero — every transaction verified, pre-approved, matched to documentation. But in any system, the optimal amount of fraud is non-zero. The cost of preventing the last fraction of misuse exceeds the cost of the misuse itself. Universities spend hundreds of euros in administrative overhead to control the possibility of someone misspending twenty euros on API tokens. They turn publicly funded researchers into part-time bookkeepers. The control system costs more than what it controls.</p>
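<p>The trade-off can be made concrete with a back-of-envelope calculation. Every number below is an illustrative assumption, not a measured figure:</p>

```python
# Back-of-envelope sketch of the control-cost argument; every number is
# an illustrative assumption, not a measured figure.
admin_cost = 200.0        # EUR of overhead to pre-approve one small purchase
potential_misuse = 20.0   # EUR at stake if that purchase were misspent
misuse_rate = 0.05        # assumed share of unchecked purchases misspent

# Expected loss if the purchase were simply trusted rather than controlled.
expected_loss_if_trusted = misuse_rate * potential_misuse
print(admin_cost / expected_loss_if_trusted)  # 200.0: control costs 200x the expected loss
```

<p>Even if every assumed number here is off by an order of magnitude, the conclusion survives: the control system costs more than what it controls.</p>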
<p>My university already accepts bounded trust for invoice-based purchases — hand in an invoice under €7,000, no (or at least few) questions asked. It just hasn’t extended that trust to the new category of costs that researchers actually face.</p>
</section>
<section id="the-proposal" class="level2">
<h2 class="anchored" data-anchor-id="the-proposal">The proposal</h2>
<p>Several colleagues have independently given me the same advice: just start a nonprofit to handle your invoicing. When founding a legal entity feels like the path of least resistance for paying a monthly software bill, something has gone wrong at the systems level. But the advice points toward a real solution — not one nonprofit for me, but one that serves researchers like me across institutions.</p>
<p>The idea is a thin intermediary that does one thing: it pays the credit card, and it sends the university the invoice.</p>
<p>The intermediary holds accounts with major cloud and AI providers. Each participating researcher gets a virtual account with spending limits tied to their grant or institutional budget. The intermediary tracks usage, handles currency conversion, and generates one consolidated quarterly invoice per researcher, formatted for each university’s procurement system. From the university’s side, it looks like any other vendor. From the researcher’s side, the credit card disappears.</p>
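<p>In code, the bookkeeping described above could look something like the following sketch. Provider names, conversion rates, and limits are all hypothetical:</p>

```python
from collections import defaultdict

# Hypothetical sketch of the intermediary's bookkeeping. Provider names,
# conversion rates, and budget figures are illustrative assumptions.
EUR_RATES = {"USD": 0.92, "EUR": 1.0}   # assumed conversion rates

class VirtualAccount:
    def __init__(self, researcher, budget_eur):
        self.researcher = researcher
        self.budget_eur = budget_eur     # spending limit tied to a grant
        self.charges = []                # (provider, amount in EUR)

    def record(self, provider, amount, currency):
        eur = round(amount * EUR_RATES[currency], 2)  # currency conversion
        if self.spent() + eur > self.budget_eur:
            raise ValueError("over budget")           # enforce the grant limit
        self.charges.append((provider, eur))

    def spent(self):
        return sum(eur for _, eur in self.charges)

    def quarterly_invoice(self):
        # One consolidated line per provider, in EUR, for the university.
        totals = defaultdict(float)
        for provider, eur in self.charges:
            totals[provider] += eur
        return dict(totals)

acct = VirtualAccount("researcher-001", budget_eur=500.0)
acct.record("openai", 100.0, "USD")
acct.record("anthropic", 50.0, "EUR")
print(acct.quarterly_invoice())   # {'openai': 92.0, 'anthropic': 50.0}
```

<p>The point of the sketch is how thin the layer is: usage tracking, a spending cap, currency conversion, and one consolidated report — nothing a university finance office hasn’t already seen from any other vendor.</p>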
<p>This works because the invoicing infrastructure already exists on the provider side — it is gated by scale, not by technical barriers. An organization that aggregates demand from dozens of researchers becomes the kind of customer that qualifies for invoiced billing. The bridge can be built. No individual researcher is heavy enough to justify building it for themselves alone.</p>
<p>Universities would change nothing. Their finance offices process the intermediary’s invoice the way they process every other vendor invoice. They would, in fact, gain better visibility into research spending than they have now, because one clean quarterly report replaces the current mess of scattered personal reimbursements and unfiled claims.</p>
</section>
<section id="the-experiment" class="level2">
<h2 class="anchored" data-anchor-id="the-experiment">The experiment</h2>
<p>A credible pilot could look like the following: 15 to 30 researchers across three to five institutions, six months, umbrella accounts with a handful of major providers. The intermediary generates quarterly invoices and submits them through each university’s standard procurement process.</p>
<p>Four things to measure. Do the invoices get processed without friction — will universities accept the intermediary as a normal vendor? Do researchers increase their use of cloud and digital tools? Do researchers attempt work they previously avoided? And the financial picture: administrative time saved, reimbursement backlogs cleared, savings from volume pricing.</p>
<p>Setup costs are small. The intermediary needs a legal entity, a bookkeeping system, and credit card accounts with a few providers. If the pilot fails — if universities reject the invoices or researchers don’t change their behavior — that is informative. It tells us the barrier is not administrative plumbing but something deeper, something no intermediary can fix. I doubt it, but the experiment would show it.</p>
<p>If it works, it scales without institutional reform. Each new university adds one vendor to its approved list. Each new provider adds one billing relationship. The thing grows by making a previously painful resource accessible through a shared layer of infrastructure.</p>
<p>I am aware that I have spent a considerable number of words on invoicing. That is, in a way, the point. The bottleneck holding back a growing share of digital research is not a scientific problem, not a technical limitation, not a lack of ideas. It is a missing invoice. The fix is comically mundane, which is perhaps why no one has built it. It falls below the threshold of what feels like a problem worth solving. But for researchers paying out of their own pockets, fighting with their partners about shared finances, and quietly deciding which experiments are not worth the administrative pain — it is the problem.</p>


<!-- -->

</section>

 ]]></description>
  <category>metascience</category>
  <category>research-infrastructure</category>
  <guid>https://kjablonka.com/blog/posts/finances/</guid>
  <pubDate>Thu, 19 Mar 2026 23:00:00 GMT</pubDate>
</item>
<item>
  <title>OpenClaw Did Not Save Me</title>
  <link>https://kjablonka.com/blog/posts/openclaw/</link>
  <description><![CDATA[ 




<p>I, too, installed OpenClaw. At some point, I felt I had to. How can I be credible in developing benchmarks for LLM-based agents if I do not try the most popular LLM-based agent tool myself?</p>
<p>I had the illusion that it might help me reduce some of the mental load. I have never managed to use a todo list. After one day, there are always so many new tasks that todo lists become utterly useless. In practice, this means there is probably always some mental load spent worrying that I might miss something — or linked to the awareness that I have, in Oliver Burkeman’s lingo, already lost, and that there is no way to achieve everything I’d like to achieve.</p>
<p>I gave in to the illusion that I could get with OpenClaw what my institution does not give me: a personal assistant that can take some admin load off me.</p>
<section id="the-setup" class="level2">
<h2 class="anchored" data-anchor-id="the-setup">The Setup</h2>
<p>Still, I am a bit conservative. I set everything up on a somewhat hardened Hetzner VPS — the cheapest one possible. I did not give it full access to my accounts. OpenClaw runs as a non-root account. I was inspired by the advice <a href="https://x.com/JordanLyall/status/2019594755370545168">here</a>.</p>
<ol type="1">
<li>Create cheapest possible Hetzner VPS, add SSH key</li>
<li>Install OpenClaw via the <a href="https://github.com/openclaw/openclaw-ansible/tree/main">Ansible playbook</a></li>
<li><code>sudo su - openclaw</code>, then <code>openclaw onboard --install-daemon</code></li>
<li>Manual onboarding, local gateway with loopback</li>
<li>OpenAI as model provider (Peter Steinberger seems to like it more)</li>
<li>Telegram as chat connection with pairing</li>
</ol>
<p>I was naïve and did some of this on the train. Not in tmux.</p>
</section>
<section id="giving-it-a-soul" class="level2">
<h2 class="anchored" data-anchor-id="giving-it-a-soul">Giving It a Soul</h2>
<p>I started giving it a soul by complaining about what I always portray as the source of my problems: that my employer does not fund a personal assistant. I prompted it to be a critical personal board of advisors, taking in viewpoints from people I specifically mentioned — with diverse but valuable perspectives. I gave it instructions about my long-term high-level research direction, and a dump of the tasks that currently came to mind — the same tasks for which I never managed to consistently stick to any task management system.</p>
</section>
<section id="what-happened" class="level2">
<h2 class="anchored" data-anchor-id="what-happened">What Happened</h2>
<p>In my first attempt, I hardened the thing so aggressively that I locked myself out. The todo list was not even transferred completely.</p>
<p>I told it I would like to finish every day with reflection and start every day with an agenda. But right now, I said, I just want a five-minute thing to tick something off. It responded:</p>
<blockquote class="blockquote">
<p>Create a note titled “Corral — Coherent Plan v1” and write exactly 3 bullets:</p>
<ol type="1">
<li>Core question: what decision this project must answer</li>
<li>Current evidence: what results already support/contradict</li>
<li>Next experiment: single highest-value next step</li>
</ol>
</blockquote>
<p>I cannot do that in five minutes. I cannot do deep thinking in five minutes. That is the whole reason I came to this tool — I am drowning in small tasks and cannot find the space to think. I asked for the smallest possible win. And it told me to produce a coherent strategic plan.</p>
<p>My attempt at moving fast gave me something that wanted me to be even more superhuman than I already cannot be.</p>
</section>
<section id="back-to-basics" class="level2">
<h2 class="anchored" data-anchor-id="back-to-basics">Back to Basics</h2>
<p>I am now back to using Things 3. Perhaps I will even go back to paper.</p>
<p>I still have the struggle that I strive for freedom to think deeply and creatively, but that I also have just too many small admin things to do. I still did not find a way to balance them. OpenClaw certainly did not help me. Perhaps I did not invest enough. Perhaps I am also back to admitting that I have no chance — that I reached for yet another productivity trick, hoping it would finally make the difference. Perhaps I have to learn the lesson I always tell others: there is no shortcut.</p>


<!-- -->

</section>

 ]]></description>
  <category>productivity</category>
  <category>ai-agents</category>
  <guid>https://kjablonka.com/blog/posts/openclaw/</guid>
  <pubDate>Tue, 03 Mar 2026 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Deep Networks as Elastic Origami</title>
  <link>https://kjablonka.com/blog/posts/deep_learning_mechanisms_lecture/</link>
  <description><![CDATA[ 




<p>My first recorded lecture (20 min) covers the manifold hypothesis and how we can view deep networks as elastic origami.</p>
<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/JPH0qrRefBo" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<p>The lecture explores how deep networks fold and transform data through high-dimensional space, much like origami shapes paper into new forms.</p>



 ]]></description>
  <category>teaching</category>
  <category>deep learning</category>
  <category>manifolds</category>
  <guid>https://kjablonka.com/blog/posts/deep_learning_mechanisms_lecture/</guid>
  <pubDate>Mon, 26 Jan 2026 23:00:00 GMT</pubDate>
  <media:content url="https://kjablonka.com/blog/posts/deep_learning_mechanisms_lecture" medium="image"/>
</item>
<item>
  <title>The Geometry of Not Enough Data</title>
  <link>https://kjablonka.com/blog/posts/manifold/geometry_blog.html</link>
  <description><![CDATA[ 




<style>
.key-point {
    background: #f8f6f1;
    border-left: 3px solid #b8860b;
    padding: 1.2em 1.5em;
    margin: 1.8em 0;
    border-radius: 0 4px 4px 0;
}
.aside-box {
    background: #f0f0f0;
    padding: 1em 1.2em;
    margin: 1.5em 0;
    border-radius: 4px;
    font-size: 0.95em;
}
.aside-title {
    font-weight: 600;
    margin-bottom: 0.3em;
    color: #555;
}
</style>
<section id="the-puzzle-of-too-many-parameters" class="level2">
<h2 class="anchored" data-anchor-id="the-puzzle-of-too-many-parameters">The Puzzle of Too Many Parameters</h2>
<p>Monthly airline passengers from 1949 to 1960. You want to forecast the next few years. A linear model has 2 parameters. A cubic polynomial has 4. A degree-20 polynomial has 21.<sup>1</sup></p>
<p>Classical statistics warns against the high-parameter option. More parameters, more opportunities to chase noise instead of signal. The cubic should suffice.</p>
<p>Yet modern deep learning uses models with billions of parameters trained on datasets that cover a vanishing fraction of possible inputs. By classical logic, these models should memorize their training data and fail on anything new. Still, they seem to ace very difficult benchmarks.</p>
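<p>The classical warning is easy to reproduce on a synthetic stand-in for the airline series (trend plus seasonality plus noise; the real data is not bundled here, and the seed and sizes are arbitrary choices). The high-degree fit drives the training error down while its extrapolation typically runs away:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(144.0)                                   # 12 years, monthly
y = 100 + 2 * t + 20 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, t.size)

results = {}
for deg in (1, 3, 20):
    # Polynomial.fit maps t into [-1, 1] internally, keeping the
    # least-squares problem well conditioned even at degree 20.
    p = np.polynomial.Polynomial.fit(t, y, deg)
    rmse = np.sqrt(np.mean((p(t) - y) ** 2))
    results[deg] = (rmse, p(168.0))                    # forecast 2 years out
    print(f"degree {deg:2d}: train RMSE {rmse:7.2f}, forecast {p(168.0):.0f}")
```

Lower training error at degree 20, wilder forecasts: exactly the trade-off the classical account predicts.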
</section>
<section id="the-arithmetic-of-impossibility" class="level2">
<h2 class="anchored" data-anchor-id="the-arithmetic-of-impossibility">The Arithmetic of Impossibility</h2>
<p>A language model predicts the next token given all previous tokens:</p>
<p><img src="https://latex.codecogs.com/png.latex?p(w_%7B1001%7D%20%5Cmid%20w_%7B1000%7D,%20w_%7B999%7D,%20%5Cldots,%20w_1)"></p>
<p>With vocabulary <img src="https://latex.codecogs.com/png.latex?V%20%5Capprox%20100%7B,%7D000"> and context length <img src="https://latex.codecogs.com/png.latex?L%20%5Capprox%208%7B,%7D000">, the input space contains:</p>
<p><img src="https://latex.codecogs.com/png.latex?V%5EL%20=%20100%7B,%7D000%5E%7B8%7B,%7D000%7D%20=%2010%5E%7B40%7B,%7D000%7D"></p>
<p>possible sequences. Training corpora contain perhaps <img src="https://latex.codecogs.com/png.latex?10%5E%7B14%7D"> tokens. The ratio of observed to possible:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cfrac%7B10%5E%7B14%7D%7D%7B10%5E%7B40%7B,%7D000%7D%7D%20=%2010%5E%7B-39%7B,%7D986%7D"></p>
<p>If every atom in the observable universe were a training example, we’d still have seen essentially nothing. The model encounters novel inputs on every forward pass, yet produces sensible outputs.</p>
<p>The arithmetic says learning is impossible. The models work anyway. One of these must be wrong.</p>
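<p>The ratio is quick to check in log space, since the raw numbers overflow any float (a small sketch using the round figures assumed above):</p>

```python
import math

V = 100_000          # vocabulary size (round figure from the text)
L = 8_000            # context length
tokens_seen = 1e14   # rough size of a training corpus

# Work in log10: V**L itself is far beyond floating point.
log_possible = L * math.log10(V)                    # log10 of possible sequences
log_ratio = math.log10(tokens_seen) - log_possible  # log10 of observed/possible

print(f"possible sequences ~ 10^{log_possible:.0f}")   # 10^40000
print(f"observed/possible  ~ 10^{log_ratio:.0f}")      # 10^-39986
```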
</section>
<section id="real-data-is-not-random-data" class="level2">
<h2 class="anchored" data-anchor-id="real-data-is-not-random-data">Real Data Is Not Random Data</h2>
<p>Sample an image uniformly at random from the space of all 256×256 RGB images. You get noise. Sample a million. Still noise. The space of natural images—photographs, paintings, anything a human would recognize—occupies a measure-zero subset of pixel space.</p>
<p>A striking way to see this: take two random points in pixel space and walk linearly between them. Every step is noise. Now take two real images and walk between them along the data manifold (as a generative model learns to do). The intermediate points look like images—blurry perhaps, but recognizably structured.</p>
<div id="cell-fig-random-walk" class="cell" data-execution_count="1">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb1-3"></span>
<span id="cb1-4">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">42</span>)</span>
<span id="cb1-5"></span>
<span id="cb1-6">fig, axes <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.subplots(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>, figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.5</span>))</span>
<span id="cb1-7"></span>
<span id="cb1-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Top row: random walk in pixel space</span></span>
<span id="cb1-9">start_random <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.rand(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb1-10">end_random <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.rand(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb1-11"></span>
<span id="cb1-12"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i, alpha <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(np.linspace(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>)):</span>
<span id="cb1-13">    interp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> alpha) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> start_random <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> alpha <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> end_random</span>
<span id="cb1-14">    axes[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, i].imshow(interp)</span>
<span id="cb1-15">    axes[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, i].axis(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'off'</span>)</span>
<span id="cb1-16">    axes[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, i].set_title(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'α=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>alpha<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>, fontsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb1-17"></span>
<span id="cb1-18">axes[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].set_ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Random</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">interpolation'</span>, fontsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span>, rotation<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, ha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'right'</span>, va<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'center'</span>)</span>
<span id="cb1-19"></span>
<span id="cb1-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Bottom row: "manifold" interpolation (simulated with smooth structure)</span></span>
<span id="cb1-21"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i, alpha <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(np.linspace(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>)):</span>
<span id="cb1-22">    freq_blend <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> alpha</span>
<span id="cb1-23">    x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.linspace(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>)</span>
<span id="cb1-24">    y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.linspace(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>)</span>
<span id="cb1-25">    X, Y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.meshgrid(x, y)</span>
<span id="cb1-26">    </span>
<span id="cb1-27">    img <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros((<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span>
<span id="cb1-28">    base <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.25</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> np.sin(freq_blend <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> alpha <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> np.cos(freq_blend <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> Y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> alpha)</span>
<span id="cb1-29">    img[:,:,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> base <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.15</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> np.sin(X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> alpha <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb1-30">    img[:,:,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> base <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> np.cos(Y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> alpha <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb1-31">    img[:,:,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> base <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.12</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> np.sin((X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> Y) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> alpha)</span>
<span id="cb1-32">    img <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.clip(img, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb1-33">    </span>
<span id="cb1-34">    axes[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, i].imshow(img)</span>
<span id="cb1-35">    axes[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, i].axis(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'off'</span>)</span>
<span id="cb1-36"></span>
<span id="cb1-37">axes[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].set_ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Manifold</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">interpolation'</span>, fontsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span>, rotation<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, ha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'right'</span>, va<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'center'</span>)</span>
<span id="cb1-38"></span>
<span id="cb1-39">plt.tight_layout()</span>
<span id="cb1-40">plt.show()</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-display">
<div id="fig-random-walk" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-random-walk-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://kjablonka.com/blog/posts/manifold/geometry_blog_files/figure-html/fig-random-walk-output-1.png" width="1142" height="406" class="figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-random-walk-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Walking through image space. Top: linear interpolation between random points yields noise throughout. Bottom: interpolation along a learned manifold produces structured images at every step.
</figcaption>
</figure>
</div>
</div>
</div>
<p>The same holds for text. Random token sequences are gibberish. Coherent sentences, paragraphs, arguments—these live on a thin subspace of token-sequence space, constrained by grammar, semantics, and the structure of ideas worth expressing.</p>
<p>This resolves the arithmetic paradox. The impossibility proof assumes you need to cover the full input space. But the data we care about concentrates on a low-dimensional structure, and coverage of that structure is tractable.</p>
</section>
<section id="the-manifold-hypothesis" class="level2">
<h2 class="anchored" data-anchor-id="the-manifold-hypothesis">The Manifold Hypothesis</h2>
<p>The standard formalization: data lies on or near a <img src="https://latex.codecogs.com/png.latex?d">-dimensional manifold <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BM%7D"> embedded in the ambient space <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5ED">, where <img src="https://latex.codecogs.com/png.latex?d%20%5Cll%20D">.</p>
<p>Covering a <img src="https://latex.codecogs.com/png.latex?d">-dimensional manifold requires samples scaling as <img src="https://latex.codecogs.com/png.latex?%5Cepsilon%5E%7B-d%7D">, not <img src="https://latex.codecogs.com/png.latex?%5Cepsilon%5E%7B-D%7D">. If images live on a manifold of dimension 1,000 rather than filling a space of dimension 200,000, the sample complexity drops from astronomical to merely large.</p>
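<p>Plugging in a resolution makes the gap concrete (a toy calculation; the choice of ε is arbitrary, and both counts remain huge — the point is the size of the gap between them):</p>

```python
import math

eps = 0.5           # covering resolution (illustrative choice)
d = 1_000           # assumed intrinsic dimension
D = 200_000         # ambient dimension, roughly 256*256*3 pixels

# Covering numbers scale as eps**(-dim); compare exponents in log10.
log_intrinsic = -d * math.log10(eps)
log_ambient = -D * math.log10(eps)

print(f"cover the manifold: ~10^{log_intrinsic:.0f} samples")
print(f"cover ambient space: ~10^{log_ambient:.0f} samples")
```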
<p><span class="citation" data-cites="pope2021intrinsic">Pope et al. (2021)</span> measured intrinsic dimensions of standard image datasets and found values in the hundreds to low thousands—far below ambient dimension, though not trivially small.</p>
<div class="aside-box">
<div class="aside-title">
<p>A circularity to notice</p>
</div>
<p>The manifold hypothesis is typically invoked after observing that deep learning works. Rarely does anyone estimate the intrinsic dimension before training and verify it’s small enough for the available data. The hypothesis is plausible, but its use is often post-hoc rationalization.</p>
</div>
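<p>Estimating the intrinsic dimension up front is not actually hard. A minimal sketch of the Two-NN estimator (the ratio of second- to first-nearest-neighbour distances follows a Pareto law with exponent equal to the intrinsic dimension), run on synthetic 2-D data embedded in 10 ambient dimensions; the sample size and seed are arbitrary choices:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a 2-D sheet embedded linearly in a 10-D ambient space.
Z = rng.uniform(size=(1000, 2))
A = rng.standard_normal((2, 10))
X = Z @ A

# Pairwise squared distances without building an (N, N, 10) tensor.
sq = (X ** 2).sum(axis=1)
d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
np.fill_diagonal(d2, np.inf)

# Two-NN: mu = r2/r1 is Pareto with exponent d; the MLE is N / sum(log mu).
r = np.sqrt(np.sort(d2, axis=1)[:, :2])
mu = r[:, 1] / r[:, 0]
d_hat = len(mu) / np.log(mu).sum()
print(f"estimated intrinsic dimension: {d_hat:.2f}")
```

On this toy data the estimate comes out close to 2, the true manifold dimension, even though the ambient dimension is 10.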
</section>
<section id="the-geometry-of-piecewise-linear-functions" class="level2">
<h2 class="anchored" data-anchor-id="the-geometry-of-piecewise-linear-functions">The Geometry of Piecewise Linear Functions</h2>
<p>ReLU networks compute piecewise linear functions. Not approximately linear. Exactly linear, within each piece.</p>
<p>The ReLU activation <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BReLU%7D(x)%20=%20%5Cmax(0,%20x)"> is piecewise linear with two pieces. Compositions of affine transformations and coordinate-wise ReLUs yield piecewise linear functions on convex polytopes. Within each polytope, the network computes <img src="https://latex.codecogs.com/png.latex?f(x)%20=%20Wx%20+%20b"> for some <img src="https://latex.codecogs.com/png.latex?W"> and <img src="https://latex.codecogs.com/png.latex?b"> specific to that region.</p>
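<p>The exactness is easy to verify: the activation pattern at a point selects the polytope, and reading off which units are active gives the region's affine map directly (a minimal sketch with a random one-hidden-layer net; the weights and probe point are arbitrary):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((8, 2)), rng.standard_normal(8)
w2, b2 = rng.standard_normal(8), rng.standard_normal()

def f(x):
    """One hidden layer of ReLUs followed by a linear readout."""
    return np.maximum(0.0, W1 @ x + b1) @ w2 + b2

# Inside a polytope the ReLUs act as the identity on active units and
# zero elsewhere, so the network collapses to a single affine map.
x0 = np.array([0.3, -0.7])
active = (W1 @ x0 + b1) > 0
W_eff = (w2 * active) @ W1          # the region-specific W
b_eff = (w2 * active) @ b1 + b2     # the region-specific b
print(np.isclose(f(x0), W_eff @ x0 + b_eff))   # True
```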
<div id="cell-fig-relu-regions" class="cell" data-execution_count="2">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb2-3"></span>
<span id="cb2-4">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">42</span>)</span>
<span id="cb2-5">n_units <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span></span>
<span id="cb2-6"></span>
<span id="cb2-7">w1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randn(n_units, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb2-8">b1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randn(n_units) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span></span>
<span id="cb2-9">w2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randn(n_units) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> np.sqrt(n_units)</span>
<span id="cb2-10">b2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randn() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span></span>
<span id="cb2-11"></span>
<span id="cb2-12"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> forward(x, y):</span>
<span id="cb2-13">    inp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.stack([x.flatten(), y.flatten()], axis<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb2-14">    hidden <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.maximum(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, inp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> w1.T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> b1)</span>
<span id="cb2-15">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> (hidden <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">@</span> w2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> b2).reshape(x.shape)</span>
<span id="cb2-16"></span>
<span id="cb2-17">g <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span></span>
<span id="cb2-18">xs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.linspace(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, g)</span>
<span id="cb2-19">ys <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.linspace(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, g)</span>
<span id="cb2-20">X, Y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.meshgrid(xs, ys)</span>
<span id="cb2-21">Z <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> forward(X, Y)</span>
<span id="cb2-22"></span>
<span id="cb2-23">fig, ax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.subplots(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>))</span>
<span id="cb2-24">contour <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ax.contourf(X, Y, Z, levels<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>, cmap<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'RdBu'</span>)</span>
<span id="cb2-25">plt.colorbar(contour, ax<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>ax, label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'f(x)'</span>)</span>
<span id="cb2-26"></span>
<span id="cb2-27"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n_units):</span>
<span id="cb2-28">    a, b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> w1[i]</span>
<span id="cb2-29">    c <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> b1[i]</span>
<span id="cb2-30">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(b) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>:</span>
<span id="cb2-31">        line_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.array([<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>])</span>
<span id="cb2-32">        line_y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>(a <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> line_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> c) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> b</span>
<span id="cb2-33">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">any</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(line_y) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.5</span>):</span>
<span id="cb2-34">            ax.plot(line_x, np.clip(line_y, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'k-'</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb2-35">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">elif</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(a) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>:</span>
<span id="cb2-36">        xi <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>c <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> a</span>
<span id="cb2-37">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(xi) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>:</span>
<span id="cb2-38">            ax.axvline(xi, color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'k'</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb2-39"></span>
<span id="cb2-40">ax.set_xlim(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb2-41">ax.set_ylim(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb2-42">ax.set_xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'$x_1$'</span>)</span>
<span id="cb2-43">ax.set_ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'$x_2$'</span>)</span>
<span id="cb2-44">ax.set_aspect(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'equal'</span>)</span>
<span id="cb2-45">plt.tight_layout()</span>
<span id="cb2-46">plt.show()</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-display">
<div id="fig-relu-regions" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-relu-regions-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://kjablonka.com/blog/posts/manifold/geometry_blog_files/figure-html/fig-relu-regions-output-2.png" width="658" height="548" class="figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-relu-regions-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: A ReLU network partitions the input space into convex polytopes. Within each region, the function is exactly affine.
</figcaption>
</figure>
</div>
</div>
</div>
<p>There’s a topological way to think about this. A neural network progressively transforms the input space, stretching and folding it until the data becomes linearly separable.<sup>2</sup> The network learns a coordinate system in which the problem is simple.</p>
<p>The number of linear regions grows exponentially with depth. A network can have far more regions than training points. Most regions contain no data at all.</p>
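<p>The region count is easy to check numerically. Below is a small sketch (all sizes are illustrative): one random ReLU layer, a dense grid over the input square, and a count of the distinct on/off activation patterns the grid lands in. Each pattern labels one linear region.</p>

```python
import numpy as np

# Rough numerical check of the region-counting claim. One random ReLU
# layer; each input's on/off pattern across the units labels the convex
# region it sits in. All sizes here are illustrative.
rng = np.random.default_rng(0)
n_units = 20
W = rng.normal(size=(n_units, 2))   # random first-layer weights
b = rng.normal(size=n_units)        # random first-layer biases

# Dense grid over the input square [-2, 2]^2
xs = np.linspace(-2, 2, 200)
g1, g2 = np.meshgrid(xs, xs)
grid = np.column_stack([g1.ravel(), g2.ravel()])

# Activation pattern = which units fire; one pattern per linear region
patterns = grid @ W.T + b > 0
n_regions_seen = len({tuple(p) for p in patterns})

print(n_regions_seen)  # typically a few hundred regions from just 20 units
```

<p>With a few dozen training points, most of those regions would hold no data at all; stacking layers multiplies the count further.</p>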
</section>
<section id="what-training-determines" class="level2">
<h2 class="anchored" data-anchor-id="what-training-determines">What Training Determines</h2>
<p>Consider a single linear region containing <img src="https://latex.codecogs.com/png.latex?n"> training points in <img src="https://latex.codecogs.com/png.latex?D">-dimensional space. The network computes <img src="https://latex.codecogs.com/png.latex?f(x)%20=%20Wx%20+%20b"> throughout this region. Training enforces:</p>
<p><img src="https://latex.codecogs.com/png.latex?Wx%5E%7B(i)%7D%20+%20b%20=%20y%5E%7B(i)%7D%20%5Cquad%20%5Ctext%7Bfor%20%7D%20i%20=%201,%20%5Cldots,%20n"></p>
<p>These are <img src="https://latex.codecogs.com/png.latex?n"> constraints on a gradient <img src="https://latex.codecogs.com/png.latex?W"> with <img src="https://latex.codecogs.com/png.latex?D"> components. When <img src="https://latex.codecogs.com/png.latex?n%20%3C%20D">—the typical case in high dimensions—infinitely many gradients satisfy the constraints.</p>
<p>The training points span at most an <img src="https://latex.codecogs.com/png.latex?(n-1)">-dimensional subspace. Along directions in this subspace, the gradient is pinned down. Along orthogonal directions, it’s arbitrary.</p>
<p>This connects to a classical result in learning theory. The representer theorem says that in kernel methods, the optimal solution can be written as a linear combination of kernel evaluations at the training points—only the training data matters, and only along the directions it spans. The geometry here is analogous: training constrains the function along directions spanned by the data and nowhere else.</p>
<p>If training data lies on a low-dimensional manifold, the constrained directions align with the manifold’s tangent space. Normal directions—perpendicular to the manifold—remain free. Training pins down the function on the manifold but leaves it underdetermined elsewhere.</p>
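<p>The counting argument is concrete enough to run (dimensions chosen purely for illustration): with <code>n = 5</code> points in <code>D = 50</code> dimensions, <code>lstsq</code> returns one exact interpolant, and adding any null-space vector of the data matrix yields a second gradient that fits training just as perfectly.</p>

```python
import numpy as np

# n = 5 training points in D = 50 dimensions: the gradient W is pinned
# down along the data but free along the remaining 45 orthogonal
# directions. Sizes are illustrative.
rng = np.random.default_rng(1)
n, D = 5, 50
X = rng.normal(size=(n, D))   # training inputs
y = rng.normal(size=n)        # training targets (bias folded in)

# One exact interpolant: the minimum-norm solution
W_min, *_ = np.linalg.lstsq(X, y, rcond=None)

# Any null-space direction of X can be added without hurting the fit
null_basis = np.linalg.svd(X)[2][n:]   # (D - n) directions orthogonal to the data
W_alt = W_min + 10.0 * null_basis[0]

print(np.allclose(X @ W_min, y), np.allclose(X @ W_alt, y))  # both interpolate exactly
print(np.linalg.norm(W_alt - W_min))  # yet the two gradients differ by 10
```

<p>Both models agree on every training point and disagree arbitrarily off the data subspace; training alone cannot tell them apart.</p>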
<div class="key-point">
<p>Training determines the function along directions spanned by the data. Orthogonal directions are unconstrained. The function off the data manifold is not learned; it’s an artifact of the initialization and of whatever solution the optimization happened to find <span class="citation" data-cites="he2023side">(He, Tsai, and Ward 2023)</span>.</p>
</div>
</section>
<section id="pretraining-as-learning-the-manifold" class="level2">
<h2 class="anchored" data-anchor-id="pretraining-as-learning-the-manifold">Pretraining as Learning the Manifold</h2>
<p>Large-scale pretraining can be understood as learning the data manifold itself. When a language model predicts the next token on a massive text corpus, it learns the structure of “valid text space”—which sequences are probable, which transitions are natural, what the local geometry of text looks like.</p>
<p>This perspective is supported by work showing that deep networks learn representations capturing manifold structure <span class="citation" data-cites="bengio2013representation">(Bengio, Courville, and Vincent 2013)</span>. The hidden layers build a coordinate system aligned with the data manifold, making downstream tasks easier by providing a representation where relevant variation is explicit.</p>
<p>This explains why pretraining helps so much. A randomly initialized network must simultaneously learn manifold structure and the task-specific function. A pretrained network already knows the manifold; fine-tuning only needs to learn the function on it.</p>
<div class="aside-box">
<div class="aside-title">
<p>The role of scale</p>
</div>
<p>Larger models and more data allow learning finer manifold details. This may partly explain “emergent abilities” in large language models—capabilities appearing suddenly at scale. The model may need enough capacity and data to capture relevant structure before certain tasks become possible.</p>
</div>
</section>
<section id="why-the-underdetermined-directions-dont-usually-matter" class="level2">
<h2 class="anchored" data-anchor-id="why-the-underdetermined-directions-dont-usually-matter">Why the Underdetermined Directions Don’t (Usually) Matter</h2>
<p>Three factors make underdetermination benign in practice.</p>
<p><strong>The data manifold is where queries live.</strong> If test data comes from the same distribution as training, test points lie on or near the same manifold. The underdetermined normal directions are never queried.</p>
<p><strong>Neural networks prefer simple functions.</strong> Not all functions consistent with training data are equally likely to emerge. Networks exhibit a bias toward low-complexity functions, formalizable via Kolmogorov complexity <span class="citation" data-cites="valle2019deep goldblum2023free">(Valle-Perez, Camargo, and Louis 2019; Goldblum et al. 2023)</span>. The functions networks actually learn tend to be simple.</p>
<p><strong>Real data is generated by simple processes.</strong> Biological structures reflect evolutionary compression. Human artifacts encode low-dimensional intentions. The data we care about is often output of structured processes.</p>
<p>The match between neural networks’ simplicity bias and the simplicity of real-world data may be the deeper reason deep learning works. The manifold hypothesis is a geometric consequence of this match, not the fundamental explanation.</p>
</section>
<section id="an-unexpected-success-language-models-for-chemistry" class="level2">
<h2 class="anchored" data-anchor-id="an-unexpected-success-language-models-for-chemistry">An Unexpected Success: Language Models for Chemistry</h2>
<p>In <span class="citation" data-cites="jablonka2024leveraging">Jablonka et al. (2024)</span>, we fine-tuned large language models to predict chemical properties: bandgaps, photoswitching wavelengths, toxicity. Language models are trained on text, not molecules. The transfer seems absurd.</p>
<p>Yet it works. Fine-tuned LLMs achieve competitive performance, sometimes matching purpose-built molecular models.</p>
<p>The geometric interpretation: LLMs have learned inductive biases—preferences for compositional, hierarchical, smoothly varying functions—that transfer across domains. Chemistry and language are (sometimes?) both structured.</p>
<div class="aside-box">
<div class="aside-title">
<p>Structure in chemistry</p>
</div>
<p>In biology, evolution provides a strong prior. Existing structures have been selected for function, so <a href="https://medium.com/@jkbjablonka/the-road-to-biology-2-0-will-pass-through-black-box-data-bbd00fabf959">structure correlates with function</a> in ways that learning can exploit. Chemistry lacks this selection pressure. The space of possible molecules wasn’t shaped by any optimization process—we’re exploring territory with no guarantee of low-dimensional structure. This may make chemical property prediction fundamentally harder than biological function prediction.</p>
</div>
</section>
<section id="when-deep-learning-fails" class="level2">
<h2 class="anchored" data-anchor-id="when-deep-learning-fails">When Deep Learning Fails</h2>
<p>The geometric picture predicts specific failure modes, all involving queries that leave the training manifold.</p>
<p><strong>Distribution shift.</strong> <span class="citation" data-cites="zech2018variable">Zech et al. (2018)</span> analyzed a deep learning model for detecting pneumonia in chest X-rays. The model achieved high accuracy—but partly by detecting which hospital the X-ray came from, using equipment artifacts like metal tokens. At a new hospital with different equipment, performance collapsed. The test manifold diverged from training.</p>
<div id="cell-fig-shift" class="cell" data-execution_count="3">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb4-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb4-3"></span>
<span id="cb4-4">train_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.linspace(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">40</span>)</span>
<span id="cb4-5">train_y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.sin(train_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span></span>
<span id="cb4-6">test_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.linspace(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span>)</span>
<span id="cb4-7">test_y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.sin(test_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span></span>
<span id="cb4-8"></span>
<span id="cb4-9">model_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.linspace(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)</span>
<span id="cb4-10">slope <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (train_y[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> train_y[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>]) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (train_x[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> train_x[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>])</span>
<span id="cb4-11">model_y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.where(model_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, np.sin(model_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>, train_y[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> (model_x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> slope)</span>
<span id="cb4-12"></span>
<span id="cb4-13">fig, ax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.subplots(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb4-14">ax.axvspan(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>, color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'red'</span>)</span>
<span id="cb4-15">ax.plot(model_x, model_y, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'k-'</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Model'</span>)</span>
<span id="cb4-16">ax.scatter(train_x, train_y, c<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#3b82f6'</span>, s<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>, label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Training'</span>, zorder<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb4-17">ax.scatter(test_x, test_y, c<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#dc2626'</span>, s<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>, marker<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'x'</span>, label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Test (shifted)'</span>, zorder<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb4-18">ax.axvline(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'gray'</span>, linestyle<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">':'</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb4-19">ax.set_xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'x'</span>)</span>
<span id="cb4-20">ax.set_ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'f(x)'</span>)</span>
<span id="cb4-21">ax.set_xlim(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb4-22">ax.set_ylim(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>)</span>
<span id="cb4-23">ax.legend(loc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'upper right'</span>)</span>
<span id="cb4-24">plt.tight_layout()</span>
<span id="cb4-25">plt.show()</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-display">
<div id="fig-shift" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-shift-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://kjablonka.com/blog/posts/manifold/geometry_blog_files/figure-html/fig-shift-output-1.png" width="758" height="374" class="figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-shift-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: Distribution shift: the model extrapolates linearly into regions without training data.
</figcaption>
</figure>
</div>
</div>
</div>
<p><strong>Adversarial examples.</strong> Small perturbations orthogonal to the data manifold move inputs into regions where the function is underdetermined. The output changes dramatically because the gradient in that direction was never constrained.</p>
<p><strong>Extrapolation.</strong> ReLU networks extend linearly beyond the convex hull of training data. If the true function curves, the linear extrapolation diverges.</p>
<p><strong>High intrinsic dimension.</strong> If data doesn’t concentrate on a low-dimensional manifold, most directions are unconstrained and generalization fails.</p>
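<p>A minimal numerical version of these failure modes (the setup is illustrative): training data on a 1-D line in 2-D, two models that interpolate it exactly, and a tiny perturbation orthogonal to the line that drives their predictions apart.</p>

```python
import numpy as np

# Toy off-manifold failure: data lies on the line x2 = 2*x1 in 2-D.
# Two exact interpolants agree everywhere on the line, but a 0.01 step
# orthogonal to it opens a large gap. Everything here is illustrative.
t = np.linspace(-1, 1, 20)
X = np.column_stack([t, 2 * t])      # data manifold: a 1-D line in 2-D
y = 3 * t                            # targets vary along the manifold

W_a, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimum-norm interpolant
normal = np.array([2.0, -1.0]) / np.sqrt(5)   # unit direction off the line
W_b = W_a + 50 * normal                       # also fits training exactly

x_on = np.array([0.5, 1.0])          # a query on the manifold
x_off = x_on + 0.01 * normal         # imperceptibly perturbed off it

print(x_on @ W_a - x_on @ W_b)       # ~0: the models agree on the manifold
print(x_off @ W_a - x_off @ W_b)     # magnitude ~0.5 from a 0.01 step
```

<p>The adversarial and extrapolation failures are the same phenomenon at different scales: the query leaves the span of the training data, where nothing ever pinned the function down.</p>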
</section>
<section id="the-problem-of-missing-negative-examples" class="level2">
<h2 class="anchored" data-anchor-id="the-problem-of-missing-negative-examples">The Problem of Missing Negative Examples</h2>
<p>A subtler failure mode matters enormously in practice: the model can only learn the manifold from data it sees.</p>
<p>Consider predicting which chemical reactions succeed. Published literature overwhelmingly reports reactions that worked. Failed reactions—attempts producing no product, conditions causing decomposition—are rarely published. The model sees only one side of the decision boundary.</p>
<p>This creates a systematic blind spot. The model learns what successful reactions look like but has no information about the landscape of failures. It can’t distinguish “this will work” from “this is unlike anything I’ve seen.” Both map to the same uncertainty—off the manifold of observed successes.</p>
<p>To learn a manifold’s boundary, you need to see both sides. Without negative examples, the learned manifold may be far too permissive.</p>
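<p>A deliberately crude sketch of this blind spot (the membership test is hypothetical, just the simplest stand-in for a boundary learned from positives alone): the true valid set is a thin ring, and a region fit only to positive examples accepts the empty middle without complaint.</p>

```python
import numpy as np

# True "valid" set: a thin ring (the unit circle). We only ever observe
# positives, so the learned region (here the crudest possible one, the
# bounding box of the positives) has no reason to exclude the hole in
# the middle. Everything here is illustrative.
rng = np.random.default_rng(2)
theta = rng.uniform(0, 2 * np.pi, 200)
positives = np.column_stack([np.cos(theta), np.sin(theta)])

lo, hi = positives.min(axis=0), positives.max(axis=0)

def accepts(x):
    """Membership test learned from positives only."""
    return bool(np.all((x >= lo) & (x <= hi)))

center = np.array([0.0, 0.0])
dist_to_nearest = np.linalg.norm(positives - center, axis=1).min()

print(accepts(center))      # True: accepted despite being off the ring
print(dist_to_nearest)      # ~1.0: it is nowhere near any positive example
```

<p>With even a handful of labeled failures inside the ring, the same procedure would immediately rule out the interior.</p>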
<div class="aside-box">
<div class="aside-title">
<p>Verification versus generation</p>
</div>
<p>This asymmetry suggests a strategy: verifying whether something is on the manifold is often easier than generating valid points. A verifier trained on both positive and negative examples can check a generator’s outputs. This is the logic behind RLVR: creating systems that identify good outputs from bad is easier than creating systems that generate good outputs directly.<sup>3</sup></p>
</div>
</section>
<section id="the-limits-of-the-framework" class="level2">
<h2 class="anchored" data-anchor-id="the-limits-of-the-framework">The Limits of the Framework</h2>
<p>The “real data is simple” argument works for perception, language, games—domains with abundant data from stable distributions. Scientific discovery is different. It seeks patterns in domains where structure is unknown (or questioning the structure is the point).</p>
<p>The domains where machine learning would be most valuable—genuine discovery, not pattern-matching on known distributions—are exactly where the assumptions might not hold.</p>
</section>
<section id="conclusions" class="level2">
<h2 class="anchored" data-anchor-id="conclusions">Conclusions</h2>
<p>The counting argument was correct: learning an arbitrary function in <img src="https://latex.codecogs.com/png.latex?10%5E%7B40,000%7D"> dimensions is impossible with <img src="https://latex.codecogs.com/png.latex?10%5E%7B14%7D"> samples. But the functions we care about aren’t arbitrary. They’re generated by structured processes—physics, biology, human intention—that produce structured outputs.</p>
<p>Neural networks work because they share this preference for structure. They’re biased toward simple, compositional functions, and real data happens to be simple and compositional. The manifold hypothesis is a symptom of this alignment, not the cause.</p>
<p>Understanding the geometry clarifies both successes and failures. Successes come from fitting smooth functions to structured data on low-dimensional manifolds. Failures come from queries leaving the manifold, shifting distributions, or domains where structure assumptions don’t hold.</p>
<p>For perception and pattern-matching, the framework suggests continued progress. For scientific discovery—finding patterns where structure is unknown—it counsels caution. The assumptions making deep learning work are precisely those we can’t verify in genuinely novel domains.</p>
<hr>
<p><em>This post draws on work by Andrew Gordon Wilson on inductive bias and Bayesian deep learning, Ben Recht on what machine learning can and cannot do, and Mikhail Belkin on interpolation and generalization. The geometric intuitions build on <a href="https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/">Chris Olah’s writing on topology</a> and <a href="https://12gramsofcarbon.com/p/deep-learning-is-applied-topology">Riley Goodside’s essays</a>. Errors are my own.</em></p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-bengio2013representation" class="csl-entry">
Bengio, Yoshua, Aaron Courville, and Pascal Vincent. 2013. <span>“Representation Learning: A Review and New Perspectives.”</span> <em>IEEE Transactions on Pattern Analysis and Machine Intelligence</em> 35 (8): 1798–828.
</div>
<div id="ref-goldblum2023free" class="csl-entry">
Goldblum, Micah, Marc Finzi, Keefer Rowan, and Andrew Gordon Wilson. 2023. <span>“The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning.”</span> <em>arXiv Preprint arXiv:2304.05366</em>.
</div>
<div id="ref-he2023side" class="csl-entry">
He, Juncai, Richard Tsai, and Rachel Ward. 2023. <span>“Side Effects of Learning from Low-Dimensional Data Embedded in a Euclidean Space.”</span> <em>Research in the Mathematical Sciences</em> 10 (1): 13.
</div>
<div id="ref-jablonka2024leveraging" class="csl-entry">
Jablonka, Kevin Maik, Philippe Schwaller, Andres Ortega-Guerrero, and Berend Smit. 2024. <span>“Leveraging Large Language Models for Predictive Chemistry.”</span> <em>Nature Machine Intelligence</em> 6: 161–69.
</div>
<div id="ref-pope2021intrinsic" class="csl-entry">
Pope, Phillip, Chen Zhu, Ahmed Abdelkader, Micah Goldblum, and Tom Goldstein. 2021. <span>“The Intrinsic Dimension of Images and Its Impact on Learning.”</span> <em>arXiv Preprint arXiv:2104.08894</em>.
</div>
<div id="ref-valle2019deep" class="csl-entry">
Valle-Perez, Guillermo, Chico Q Camargo, and Ard A Louis. 2019. <span>“Deep Learning Generalizes Because the Parameter-Function Map Is Biased Towards Simple Functions.”</span> <em>arXiv Preprint arXiv:1805.08522</em>.
</div>
<div id="ref-zech2018variable" class="csl-entry">
Zech, John R, Marcus A Badgeley, Manway Liu, Anthony B Costa, Joseph J Titano, and Eric Karl Oermann. 2018. <span>“Variable Generalization Performance of a Deep Learning Model to Detect Pneumonia in Chest Radiographs: A Cross-Sectional Study.”</span> <em>PLoS Medicine</em> 15 (11): e1002683.
</div>
</div>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>This example is from Andrew Gordon Wilson’s <a href="https://www.youtube.com/watch?v=lhwk4ESlyMA">talk on the foundations of deep learning</a>.↩︎</p></li>
<li id="fn2"><p>For visualizations of this perspective, see <a href="https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/">Chris Olah’s post on neural networks and topology</a>.↩︎</p></li>
<li id="fn3"><p>As <a href="https://12gramsofcarbon.com/p/deep-learning-is-applied-topology">noted by Riley Goodside</a>: “Creating systems that can identify good reasoning from bad is a much easier task than creating systems that can reason well to begin with.”↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>ml</category>
  <category>deeplearning</category>
  <guid>https://kjablonka.com/blog/posts/manifold/geometry_blog.html</guid>
  <pubDate>Sun, 28 Dec 2025 23:00:00 GMT</pubDate>
  <media:content url="https://kjablonka.com/blog/posts/manifold" medium="image"/>
</item>
<item>
  <title>Why I Start Things Uncomfortably Early</title>
  <link>https://kjablonka.com/blog/posts/write_early/</link>
  <description><![CDATA[ 




<p>Over time, I’ve developed a habit that feels wrong: I start writing, integrating, and testing way earlier than seems reasonable. When I have just the first “signs of life” in a project, I’m already drafting paper sections. When my code modules are 70% done, I’m already trying to wire them together.</p>
<p>This goes against the instinct to wait until things are “ready.” But I think there are good reasons for doing it this way.</p>
<p>Ambitious research has a fundamental difficulty: rewards are sparse and delayed. I won’t know if my approach works until months in, when I finally have the complete system and results. But I need to make decisions now: which direction to explore, which component to prioritize, what to try next.</p>
<p>This is essentially the credit assignment problem from reinforcement learning. When the signal (the discovery of a new method, a successful validation) is very far away, how do I navigate?</p>
<section id="the-working-memory-problem" class="level2">
<h2 class="anchored" data-anchor-id="the-working-memory-problem">The Working Memory Problem</h2>
<p>I can only hold about 4 things in working memory at once. When I’m juggling 15 design decisions and vague intuitions about what might work, most of it is slipping away. I need concrete signals.</p>
<p>Writing and sketching force me to make things concrete. <a href="https://www.nature.com/articles/s44222-025-00323-4">Writing is thinking.</a> When I try to draft a results section for experiments I haven’t run yet, I immediately hit questions: “Wait, what’s the y-axis here? How would I actually measure this?” Those questions become my experimental plan.</p>
<p>Similarly, when I integrate code modules before they’re polished, the failures tell me what actually matters. Often the “polish” I was planning turns out to be unnecessary.</p>
</section>
<section id="what-i-actually-do" class="level2">
<h2 class="anchored" data-anchor-id="what-i-actually-do">What I Actually Do</h2>
<ul>
<li>I start writing paper drafts at the first signs of life. Playing with different introductions helps me figure out which story actually makes sense. Sketching figure layouts exposes which experiments I’m missing.</li>
<li>I wire together code modules when they’re partially done. The integration failures are informative.</li>
<li>I run simplified experiments first—1000 examples instead of 1M, 2 layers instead of 12. I can add complexity once I know the direction has promise. This is to maximize <a href="https://web.stanford.edu/class/cs197/lectures/cs197-05-velocity.pdf">research velocity</a>.</li>
<li>I submit to workshops before papers feel “fully baked.” The writing process and feedback catch blind spots.</li>
<li>I share half-baked ideas. A conversation can save months of heading the wrong direction.</li>
</ul>
</section>
<section id="pride-vs.-progress" class="level2">
<h2 class="anchored" data-anchor-id="pride-vs.-progress">Pride vs.&nbsp;Progress</h2>
<p>These practices aren’t technically hard. What makes them difficult is the discomfort of showing work that isn’t impressive yet, of risking visible mistakes, of accepting that my initial intuitions might be wrong.</p>
<p>But I’ve noticed that waiting for things to feel “ready” often means operating in the dark for too long. The cost of being wrong in private—spending months on the wrong path—tends to be much higher than the awkwardness of being wrong in public. And if we talk about spending public research money, <a href="https://kjablonka.com/blog/posts/on_impactful_research/">we should minimize this cost</a>.</p>
</section>
<section id="how-fast-is-too-fast" class="level2">
<h2 class="anchored" data-anchor-id="how-fast-is-too-fast">How Fast Is Too Fast?</h2>
<p>There is a limit, of course. If I’m moving so fast that I can’t reproduce last week’s experiments or I’ve lost track of what I’ve tried, I’ve gone too far. Clean experiments matter even when iterating quickly.</p>
<p>But I find I’m rarely limited by going too fast. More often, I’m limited by waiting too long—by postponing the test that would have given me signal.</p>
<p>So now when I catch myself thinking “this isn’t ready yet,” I try to ask: ready for what? Often the answer is: ready to learn something from. And that bar is much lower than I think.</p>


</section>

 ]]></description>
  <category>academia</category>
  <guid>https://kjablonka.com/blog/posts/write_early/</guid>
  <pubDate>Thu, 30 Oct 2025 23:00:00 GMT</pubDate>
  <media:content url="https://kjablonka.com/blog/posts/write_early" medium="image"/>
</item>
<item>
  <title>On Impactful Research</title>
  <link>https://kjablonka.com/blog/posts/on_impactful_research/</link>
  <description><![CDATA[ 




<p>In today’s literature seminar, the participants noticed that I perhaps have grown cynical. On most papers presented, I scribble the same verdict: not impactful. My perception—perhaps wrong, perhaps not—is that the fraction of impactful research is shrinking. Not because we’ve stopped producing good work, but because it is drowned in an avalanche of papers screaming for attention.</p>
<p>I am part of the problem. I cannot call most of my research truly impactful. A small subset may be important if I’m generous.</p>
<div class="page-columns page-full"><p>It is hard not to be part of the problem. PhD students must graduate, preferably with cumulative theses that meet minimum paper requirements. The old-fashioned monograph still exists, but it’s a hard sell when paper counts determine everything: tenure decisions, grant reviews, a graduate’s prospects on the job market. In industry, different pressures apply—internal funding mechanisms, research as PR, research as recruitment. The result is the same.</p><div class="no-row-height column-margin column-container"><span class="margin-aside">Academia has the additional challenge that it is an <a href="https://www.sam-rodriques.com/post/academia-is-an-educational-institution">educational institution</a></span></div></div>
<p>We assemble talented people. We invest public money. We do not advance research optimally. The opportunity cost is real. That funding could have expanded healthcare coverage, supported developing nations, and accelerated renewable energy. Instead, we feed the paper mill.</p>
<p>Students, confronted with this tension, asked me how to identify impactful work. Borrowing from Kierkegaard, I can only say: “Research can only be understood backwards; but it must be executed forwards.” Some kinds of impact are only “obvious” in hindsight.</p>
<p>Still, some heuristics exist. In my field, research proves impactful by providing a tool that others use, by opening a new perspective, or by crystallizing evidence that had been merely anecdotal.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://kjablonka.com/blog/posts/on_impactful_research/cover.png" class="img-fluid figure-img"></p>
<figcaption>Tree of ideas.</figcaption>
</figure>
</div>
<p>Think of research as a tree of ideas. Three paths to impact emerge:</p>
<p>First, take a branch and plant a new tree. This grows into a product or tool that supports novel growth elsewhere.</p>
<p>Second, grow one branch deeper, enabling its transition into something usable.</p>
<p>Third, grow a new branch entirely.</p>
<p>You can identify the first by checking for a usable tool. If nothing exists that I can use, it cannot support new growth.</p>
<p>You can identify the second by counting the hypotheses explored and the ablation studies conducted. The branch grows deeper and deeper.</p>
<p>You can identify the third by its complete novelty—not applying an existing technique to a new system, not adding one twist to extend a branch slightly, but doing something without precedent.</p>
<p>What does this mean for leading a research group? I want to push harder for our work to fall into one of these categories. If something can start as its own tree, give it everything: documentation, user support, tests, and community building.</p>
<p>If something can go deep, let it go deep. So deep that the next work can cut this branch—either to discard it or to plant a new tree.</p>
<p>If something is novel, go broad. Explore how far the novelty extends.</p>
<p>I worry about my career and my students’ careers. But I also think about the research money I manage. It could have funded some other public good. In a leadership seminar, I learned that “Leadership operates in areas of tension that cannot be resolved but can be balanced.” The tension in research is real. So is our failure to balance it.</p>



 ]]></description>
  <category>academia</category>
  <guid>https://kjablonka.com/blog/posts/on_impactful_research/</guid>
  <pubDate>Mon, 13 Oct 2025 22:00:00 GMT</pubDate>
  <media:content url="https://kjablonka.com/blog/posts/on_impactful_research" medium="image"/>
</item>
<item>
  <title>Lessons Learned from Writing My First ERC Proposal</title>
  <link>https://kjablonka.com/blog/posts/writing_erc/</link>
  <description><![CDATA[ 




<p>Dear younger me,</p>
<p>I’m writing this instead of revising the proposal. I know, I know—classic avoidance behavior. But maybe writing this down will help me (you? us?) make sense of what just happened. Or rather: what is happening.</p>
<p>You’re about to write your first ERC proposal, and it’s going to take a bigger emotional toll than any writing you’ve done before. Yes, even though you love writing. Especially because you love writing.</p>
<p>Let me tell you what I wish I’d known.</p>
<section id="youre-going-to-forget-which-game-youre-playing" class="level2">
<h2 class="anchored" data-anchor-id="youre-going-to-forget-which-game-youre-playing">You’re Going to Forget Which Game You’re Playing</h2>
<p>Here’s what’s going to happen: You’ll get so caught up in what James Carse calls the “finite game” <span class="citation" data-cites="carse1986finite">(Carse 1986)</span> that you’ll forget about the infinite game entirely.</p>
<p>The finite game is winning the ERC. The infinite game? That’s the research itself—the questions that genuinely excite you, the work you’d want to do regardless of whether anyone gives you money for it.</p>
<p>Carse distinguishes between games we play to win (finite) and games we play to keep playing (infinite). Your proposal will be full of research you find genuinely important, exciting, and novel. But somewhere along the way, the narrative will become “winning the ERC is the only way to do this exciting work.”</p>
<p>That’s not true. You know that’s not true. But you’re going to forget it anyway.</p>
<p>And look—I know you’re hungry for a win right now. After the losses you’ve experienced recently (and we both know what I’m talking about), you need this. I get it. But being thirsty for a win makes it almost impossible to stay in the infinite game mindset.</p>
<p>Try anyway.</p>
</section>
<section id="someone-will-tell-you-youre-late-youre-not" class="level2">
<h2 class="anchored" data-anchor-id="someone-will-tell-you-youre-late-youre-not">Someone Will Tell You You’re Late (You’re Not)</h2>
<p>An advisor is going to tell you you’re late. This will happen when you have a 17-page draft completed, 1.5 months before the deadline.</p>
<p>You’re not late.</p>
<p>But their anxiety will become your anxiety, and you’ll carry that weight through the rest of the process. I wish I could tell you how to avoid internalizing this, but I can’t. Just… know that it’s happening. Maybe that awareness will help a little.</p>
</section>
<section id="youre-writing-for-everyone-which-means-no-one" class="level2">
<h2 class="anchored" data-anchor-id="youre-writing-for-everyone-which-means-no-one">You’re Writing for Everyone, Which Means No One</h2>
<p>You’re going to write a transdisciplinary proposal that fits into multiple panels. This will feel strategic. It’s actually a trap.</p>
<p>You can’t assume background knowledge in anything. Every discipline has its own language, its own implicit assumptions, its own way of framing problems. How do you write for everyone without writing for no one? How do you balance depth and accessibility when your readers might come from entirely different fields?</p>
<p>Maybe the answer is simpler than you think: assume less in general, but write it in an interesting way. Don’t dumb it down—just explain more. Make your enthusiasm for the ideas carry the reader through the explanations they need.</p>
<p>I still don’t know if you solved this problem. I hope the reviewers will tell us.</p>
</section>
<section id="the-structure-you-think-you-have" class="level2">
<h2 class="anchored" data-anchor-id="the-structure-you-think-you-have">The Structure You Think You Have</h2>
<p>You think you have good structure. You don’t.</p>
<p>What you need: the same subheadings for each Work Package. Visual emphasis—color, even. Clear blocks for deliverables and contingency planning. Make it obvious what you’ll produce and what you’ll do when (not if) things don’t go according to plan.</p>
<p>Here’s what structure is really about: convincing reviewers that you have a plan and that you’re worth the money. Every Work Package should make it crystal clear what you’re going to deliver, when you’re going to deliver it, and what could go wrong.</p>
<p>Also, try to write it so readers don’t need to read everything linearly. Each section should stand somewhat independently. Reviewers are busy. Give them permission to jump to what matters to them. They should be able to skip around and still understand that you know what you’re doing and that funding you makes sense.</p>
<p>You won’t get this right on the first try. Or the second. Keep iterating.</p>
</section>
<section id="on-intensity-and-timing" class="level2">
<h2 class="anchored" data-anchor-id="on-intensity-and-timing">On Intensity and Timing</h2>
<p>You’re going to wonder if you should start earlier. Don’t.</p>
<p>Work in waves. One full day exclusively on the proposal, then wait for new feedback or inspiration before opening the document again. This rhythm keeps the intensity without burning you out.</p>
<p>And yes, the intensity is necessary. Some things can’t be done slowly.</p>
</section>
<section id="lead-with-the-answer" class="level2">
<h2 class="anchored" data-anchor-id="lead-with-the-answer">Lead with the Answer</h2>
<p>You’re going to slowly—too slowly—move toward using what’s called the McKinsey Pyramid Principle <span class="citation" data-cites="minto2009pyramid">(Minto 2009)</span>. Stop resisting this. Just do it from the start.</p>
<p>Here’s the deal: Instead of building up to your point (how you naturally think), start with the answer and then provide supporting arguments and evidence. Barbara Minto developed this approach at McKinsey for communicating with busy executives—which is basically what grant reviewers are.</p>
<p>Why? Because reviewers are busy. They need to grasp your core contribution immediately. Lead with the answer. Tell them what you’re going to do and why it matters. Then—and only then—show them the supporting details.</p>
<p>You’re going to resist this because it feels unnatural, like you’re giving away the ending. Do it anyway.</p>
</section>
<section id="ai-is-helpful-but-forgetful" class="level2">
<h2 class="anchored" data-anchor-id="ai-is-helpful-but-forgetful">AI Is Helpful But Forgetful</h2>
<p>You’re going to use AI extensively:</p>
<ul>
<li>Prompting it to act as reviewers from various disciplines</li>
<li>Using Deep Research and platforms like FutureHouse for literature review</li>
<li>Getting help finding effective examples</li>
</ul>
<p>But here’s what you need to know: <strong>AI forgets things across long documents.</strong></p>
<p>When you’re working on a 15-20 page proposal, edits in one section require changes elsewhere. The introduction needs updating after you rework the objectives. A change in methodology impacts your timeline. Your risk mitigation needs to align with your deliverables.</p>
<p>AI doesn’t naturally track these dependencies. You become the keeper of the narrative arc, the one who remembers what you said ten pages ago. The iterative process of checking for consistency, updating multiple sections, ensuring coherence—that’s all you.</p>
<p>Is it worth using AI? Yes. But it’s not magic. It’s a tool that requires active management.</p>
</section>
<section id="give-yourself-permission-to-write-badly" class="level2">
<h2 class="anchored" data-anchor-id="give-yourself-permission-to-write-badly">Give Yourself Permission to Write Badly</h2>
<p>Anne Lamott writes in <em>Bird by Bird</em> that perfectionism is “the voice of the oppressor” and “the main obstacle between you and a shitty first draft.” <span class="citation" data-cites="lamott1994bird">(Lamott 1994)</span></p>
<p>Listen to her.</p>
<p>Don’t start in the official ERC template. Write one long messy document first. Give yourself permission to write badly, to explore, to ramble. When you share it for feedback, you can say: “This is not yet the ERC proposal.” That distinction matters. It gives both you and your readers permission to focus on the ideas rather than the polish.</p>
<p>Then, after feedback, move it to the template. Write Part B1 first, then B2. Share both. Iterate. Rewrite. Iterate again.</p>
<p>The messiness is part of the process, not a sign that you’re doing it wrong.</p>
</section>
<section id="that-harsh-feedback-you-need-it" class="level2">
<h2 class="anchored" data-anchor-id="that-harsh-feedback-you-need-it">That Harsh Feedback? You Need It</h2>
<p>Someone is going to destroy your Part B2. Just absolutely tear it apart.</p>
<p>It’s going to hurt. And then, surprisingly, it’s going to help enormously.</p>
<p>The harsh feedback will give you the push you need to attempt another complete rewrite when you’re feeling unmotivated and lacking confidence. Sometimes the feedback that stings the most is exactly what you need to hear.</p>
<p>Try to remember that when you’re in the moment of pain.</p>
</section>
<section id="what-youd-change-and-what-you-wouldnt" class="level2">
<h2 class="anchored" data-anchor-id="what-youd-change-and-what-you-wouldnt">What You’d Change (And What You Wouldn’t)</h2>
<p>You’re going to wish you’d converged earlier on which panel to target. But the story develops in the process of writing it. How can you know which panel you’re targeting before you know what story you’re telling? Maybe this is one of those things that can’t be optimized.</p>
<p>You’re also going to almost miss formal requirements—confirming your PhD defense date, for instance. Without institutional support, you would have completely overlooked these. Next time, make a checklist of procedural matters from the start.</p>
<p>But starting earlier? No.&nbsp;The intensity of the compressed timeline is necessary, not just stressful.</p>
</section>
<section id="what-this-is-really-about" class="level2">
<h2 class="anchored" data-anchor-id="what-this-is-really-about">What This Is Really About</h2>
<p>Here’s what you need to understand: Grant writing is as much about managing yourself—your anxiety, your relationship with success and failure, your writing process—as it is about the research.</p>
<p>Are you playing the finite game or the infinite game? Are you optimizing for winning, or for continuing to play?</p>
<p>The proposal you’re about to write is valuable beyond its immediate outcome. It’s a crystallization of your thinking, a test of your communication, a forcing function for clarity. Whether you win this particular finite game or not, the infinite game—your research, your contribution to the field, the questions that genuinely excite you—continues.</p>
<p>Writing the proposal will help you crystallize ideas. It will help you find people who are good sounding boards. It will give you clear signals that your writing still isn’t as clear as you think it is.</p>
<p>These things are valuable regardless of the outcome.</p>
</section>
<section id="a-final-note" class="level2">
<h2 class="anchored" data-anchor-id="a-final-note">A Final Note</h2>
<p>I don’t know yet what the outcome will be. I’m still skeptical about whether this proposal will “fly”—some ideas may require too much shared background knowledge that reviewers won’t have.</p>
<p>But I’m excited about the ideas regardless. We’ll find ways to pursue them no matter what happens.</p>
<p>Remember which game you’re really playing.</p>
<p>And now, I suppose, back to those revisions.</p>
<p>With solidarity and hope,<br>
Your slightly-less-young self</p>



</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-carse1986finite" class="csl-entry">
Carse, James P. 1986. <em>Finite and Infinite Games</em>. New York: Free Press.
</div>
<div id="ref-lamott1994bird" class="csl-entry">
Lamott, Anne. 1994. <em>Bird by Bird: Some Instructions on Writing and Life</em>. New York: Anchor Books.
</div>
<div id="ref-minto2009pyramid" class="csl-entry">
Minto, Barbara. 2009. <em>The Pyramid Principle: Logic in Writing and Thinking</em>. 3rd ed. Harlow, England: Pearson Education.
</div>
</div></section></div> ]]></description>
  <category>academia</category>
  <category>writing</category>
  <category>grants</category>
  <guid>https://kjablonka.com/blog/posts/writing_erc/</guid>
  <pubDate>Tue, 30 Sep 2025 22:00:00 GMT</pubDate>
</item>
<item>
  <title>All of AI is Context Engineering</title>
  <link>https://kjablonka.com/blog/posts/context/</link>
  <description><![CDATA[ 




<p>Recent discussions around context engineering (for LLMs) have exploded across the AI community.</p>
<p>Our work follows this trend—we optimize and understand context to improve agent performance. But zooming out reveals a deeper truth: context has always been the way to make things work.</p>
<section id="context-shapes-everything" class="level2">
<h2 class="anchored" data-anchor-id="context-shapes-everything">Context Shapes Everything</h2>
<p>Theodore Roszak constructed a thought experiment that illustrates this perfectly <span class="citation" data-cites="roszak1992">(Roszak 1992)</span>:</p>
<p>Imagine watching a skilled psychiatrist at work. His waiting room overflows with patients suffering various emotional and mental disorders—some nearly hysterical, others plagued by suicidal thoughts, hallucinations, nightmares, or paranoid delusions about being watched by people who will hurt them.</p>
<p>The psychiatrist listens attentively and tries his best to help, but without success. His patients worsen despite heroic efforts.</p>
<p>Now Roszak asks us to consider the larger context: The psychiatrist’s office sits in a building. The building sits in a place. That place is Buchenwald, and the patients are concentration camp prisoners.</p>
<p><strong>Context changes everything.</strong></p>
<p>This principle extends beyond clinical settings. We experience this daily in human interactions—we can only meaningfully connect with others when we understand their full context: their background, experiences, current circumstances, and unspoken assumptions. Without this context, even well-intentioned communication fails.</p>
</section>
<section id="the-systems-science-perspective" class="level2">
<h2 class="anchored" data-anchor-id="the-systems-science-perspective">The Systems Science Perspective</h2>
<p>Systems science teaches us that emergent properties arise from interactions between components, not from components themselves <span class="citation" data-cites="meadows2008">(Meadows 2008)</span>. A material’s performance emerges from atomic structure <em>within</em> its manufacturing context, operating environment, and lifecycle constraints.</p>
<p>Yet current AI for science systems remain myopically focused on small subsets, failing to integrate broader contextual factors. We optimize binding energies while ignoring synthesis routes. We predict properties while ignoring cost, scalability, or environmental impact.</p>
<p>Kenneth Stanley and Joel Lehman argue in “Greatness Cannot Be Planned” that optimizing for ambitious goals fails because the stepping stones to greatness are deceptive <span class="citation" data-cites="stanley2015">(Stanley and Lehman 2015)</span>.</p>
<p>In practice, designing materials with single goals that might seem sensible (e.g., optimize CO₂ binding energy) proves deceptive. Materials exist within complex systems, and myopically optimizing one metric prevents success.</p>
<p>Consider a battery material with perfect energy density that degrades after ten cycles, costs $10,000 per gram, or requires mining rare elements from conflict zones. The system context reveals why single-metric optimization fails.</p>
</section>
<section id="the-tacit-knowledge-gap" class="level2">
<h2 class="anchored" data-anchor-id="the-tacit-knowledge-gap">The Tacit Knowledge Gap</h2>
<p>This context problem connects to missing tacit knowledge in our AI systems.</p>
<p>Philosopher-chemist Michael Polanyi famously observed: “We know more than we can tell” <span class="citation" data-cites="polanyi1966">(Polanyi 1966)</span>. Much of this unverbalized knowledge drives scientific greatness.</p>
<p>Tacit knowledge helps scientists prune search spaces and recognize when experiments or spectra “don’t look right.” This intuition emerges from years of contextual experience—understanding how synthesis conditions affect structure, how processing history influences properties, how real-world operating conditions differ from laboratory ideals.</p>
<p>Current AI systems lack this systems-level understanding, focusing instead on isolated predictions divorced from broader context.</p>
</section>
<section id="moving-forward" class="level2">
<h2 class="anchored" data-anchor-id="moving-forward">Moving Forward</h2>
<p>Building AI systems that understand context requires integrating knowledge across scales, disciplines, and domains. We need systems that consider not just atomic structure but also synthesis pathways, processing conditions, operating environments, economic constraints, and sustainability impacts. They need to be able to gather this context by experiencing it.</p>
<p>The best materials aren’t just optimized—they’re appropriate for their contexts: manufacturability, cost, environmental impact, supply chains, regulatory approval, and real-world operating conditions.</p>
<p><strong>Context engineering isn’t just another AI technique—it’s the foundation of intelligent systems that work in the real world.</strong></p>
<p><strong>Just as meaningful human connection requires understanding full context, breakthrough AI for science demands systems-level thinking that integrates the complex, interconnected reality in which materials actually exist and function.</strong></p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-meadows2008" class="csl-entry">
Meadows, Donella H. 2008. <em>Thinking in Systems: A Primer</em>. White River Junction, VT: Chelsea Green Publishing.
</div>
<div id="ref-polanyi1966" class="csl-entry">
Polanyi, Michael. 1966. <em>The Tacit Dimension</em>. Chicago: University of Chicago Press.
</div>
<div id="ref-roszak1992" class="csl-entry">
Roszak, Theodore. 1992. <em>The Voice of the Earth: An Exploration of Ecopsychology</em>. New York: Simon &amp; Schuster.
</div>
<div id="ref-stanley2015" class="csl-entry">
Stanley, Kenneth O., and Joel Lehman. 2015. <em>Why Greatness Cannot Be Planned: The Myth of the Objective</em>. New York: Springer.
</div>
</div>


</section>

 ]]></description>
  <category>ai</category>
  <category>science</category>
  <guid>https://kjablonka.com/blog/posts/context/</guid>
  <pubDate>Mon, 04 Aug 2025 22:00:00 GMT</pubDate>
  <media:content url="https://kjablonka.com/blog/posts/context" medium="image"/>
</item>
<item>
  <title>The Epistemic Risks of AI-Only Science</title>
  <link>https://kjablonka.com/blog/posts/ai_scientists/</link>
  <description><![CDATA[ 




<p>I work on machine learning for science, and I believe deeply in its transformative potential. From predicting protein structures <span class="citation" data-cites="Abramson_2024">(Abramson et al. 2024)</span> to modeling climate systems <span class="citation" data-cites="Bodnar_2025 Allen_2025">(Bodnar et al. 2025; Allen et al. 2025)</span>, AI is accelerating research in ways that seemed implausible just a few years ago. The potential is genuinely remarkable, and I’m not here to argue otherwise.</p>
<p>But there’s something troubling about the current trajectory toward fully autonomous “AI scientists”—systems designed to independently formulate hypotheses, design experiments, and draw conclusions. While these systems will certainly produce impressive results, their widespread adoption might risk something more subtle and perhaps more dangerous: the gradual erosion of epistemic diversity that makes science robust.</p>
<section id="on-value-lock-in-and-scientific-monocultures" class="level2">
<h2 class="anchored" data-anchor-id="on-value-lock-in-and-scientific-monocultures">On Value Lock-In and Scientific Monocultures</h2>
<p>Value lock-in describes what happens when systems become so entrenched that they perpetuate specific ways of thinking, making alternatives prohibitively difficult to pursue.<sup>1</sup></p>
<p>Science faces an analogous risk. Once AI scientists become the dominant research paradigm—and given their speed and cost advantages, this seems likely—they won’t simply reflect current scientific values. They’ll crystallize them. What appears today as one approach among many could become the only viable approach, not through conscious choice but through path dependence.</p>
<p>The concern isn’t that AI scientists will be poorly designed, but rather that they’ll be too successful at optimizing for current definitions of scientific progress. This creates the “lock-in effect”: the more we invest in AI-driven research infrastructure, the harder it becomes to pursue alternatives, regardless of their potential merit.</p>
</section>
<section id="the-myth-of-value-neutral-science" class="level2">
<h2 class="anchored" data-anchor-id="the-myth-of-value-neutral-science">The Myth of Value-Neutral Science</h2>
<p>Science has never been value-neutral, though we often pretend otherwise. Every research program embeds choices about what questions matter, what methods are legitimate, what constitutes adequate evidence <span class="citation" data-cites="Huff_2017 Bronowski_1961">(Huff 2017; Bronowski 1961)</span>.</p>
<p>Take computer vision’s relationship with ImageNet. This single benchmark, created by a particular research community with specific assumptions about visual recognition, shaped an entire field for over a decade. It privileged approaches that performed well on a large, static set of pre-labeled photographs <span class="citation" data-cites="dotan2019value0laden">(Dotan and Milli 2019)</span>.</p>
<p>Value-laden choices like these run along the entire chain of (AI) research <span class="citation" data-cites="dehghani2021benchmark hooker2020hardware alampara2025lessons">(Dehghani et al. 2021; Hooker 2020; Alampara, Schilling-Wilhelmi, and Jablonka 2025)</span>.</p>
</section>
<section id="concentration-and-its-discontents" class="level2">
<h2 class="anchored" data-anchor-id="concentration-and-its-discontents">Concentration and Its Discontents</h2>
<p>Contemporary AI scientists emerge from a remarkably homogeneous ecosystem: similar academic backgrounds, shared technical assumptions, a handful of dominant companies providing the underlying infrastructure. This concentration creates troubling dynamics <span class="citation" data-cites="paul2019disastrous crawford2021atlas">(Paul 2019; Crawford 2021)</span>.</p>
<p>First, epistemic monoculture becomes increasingly likely. When AI systems trained on similar data with comparable objectives dominate research, they’ll systematically favor certain types of questions. Approaches that don’t translate well into current AI paradigms—perhaps because they rely on tacit knowledge, or require forms of reasoning that resist formalization—risk being dismissed as unscientific rather than simply incompatible with our current tools.</p>
<p>Second, we face the prospect of algorithmic gatekeeping. As AI scientists become more productive, human researchers will encounter mounting pressure to adopt these tools or become irrelevant (with “current science” surviving only as a form of “art”). A small number of AI platforms could effectively determine which ideas get explored and which get ignored—not through explicit censorship, but through the subtler mechanism of making alternatives economically unviable. Power tends to concentrate, and scientific institutions aren’t immune to this tendency.</p>
</section>
<section id="normal-science-and-revolutionary-potential" class="level2">
<h2 class="anchored" data-anchor-id="normal-science-and-revolutionary-potential">Normal Science and Revolutionary Potential</h2>
<p>According to Thomas Kuhn, science alternates between periods of “normal science”—where researchers work productively within established paradigms—and revolutionary episodes that fundamentally reframe entire fields <span class="citation" data-cites="kuhn1962structure">(Kuhn 1962)</span>. Crucially, Kuhn observed that normal science “often suppresses fundamental novelties because those novelties are necessarily subversive of its basic commitments.”</p>
<p>AI scientists, optimized on existing scientific literature, might end up being just sophisticated normal science machines. They could excel at incremental advances within current paradigms but struggle with the radical reconceptualization that drives scientific revolutions. This is because current machine learning systems cannot easily question their foundational assumptions: those assumptions are embedded in their training data and optimization targets. It’s an inevitable consequence of how these systems work: they are trained to maximize the likelihood of the training data. This suggests that a science dominated by AI scientists might become extraordinarily good at certain kinds of progress while systematically failing at others.</p>
</section>
<section id="toward-epistemic-pluralism" class="level2">
<h2 class="anchored" data-anchor-id="toward-epistemic-pluralism">Toward Epistemic Pluralism</h2>
<p>None of this constitutes an argument against AI in science, which would be both futile and counterproductive. Rather, it’s a case for what we might call epistemic pluralism: the deliberate maintenance of diverse approaches to scientific inquiry.</p>
<p>We need something like a portfolio approach to scientific methodology. Some research should leverage AI scientists for their remarkable speed and scale. Other investigations should preserve space for human-led inquiry. Still other work should explore hybrid approaches that combine artificial and human intelligence in novel configurations.</p>
<p>Monocultures are efficient under stable conditions but catastrophically vulnerable to unexpected challenges. Diverse ecosystems sacrifice some efficiency for resilience. Given the stakes involved in scientific knowledge production, resilience seems worth prioritizing.</p>
<p>The robustness of scientific knowledge depends not on any single approach, however sophisticated, but on the productive tension between multiple ways of understanding the world. Preserving that tension, even as AI transforms scientific practice, may be one of the most important challenges facing the scientific community today.</p>
<hr>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-Abramson_2024" class="csl-entry">
Abramson, Josh, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, et al. 2024. <span>“Accurate Structure Prediction of Biomolecular Interactions with AlphaFold 3.”</span> <em>Nature</em> 630 (8016): 493–500. <a href="https://doi.org/10.1038/s41586-024-07487-w">https://doi.org/10.1038/s41586-024-07487-w</a>.
</div>
<div id="ref-alampara2025lessons" class="csl-entry">
Alampara, Nawaf, Mara Schilling-Wilhelmi, and Kevin Maik Jablonka. 2025. <span>“Lessons from the Trenches on Evaluating Machine-Learning Systems in Materials Science.”</span> <em>arXiv Preprint arXiv:2503.10837</em>.
</div>
<div id="ref-Allen_2025" class="csl-entry">
Allen, Anna, Stratis Markou, Will Tebbutt, James Requeima, Wessel P. Bruinsma, Tom R. Andersson, Michael Herzog, et al. 2025. <span>“End-to-End Data-Driven Weather Prediction.”</span> <em>Nature</em> 641 (8065): 1172–79. <a href="https://doi.org/10.1038/s41586-025-08897-0">https://doi.org/10.1038/s41586-025-08897-0</a>.
</div>
<div id="ref-Bodnar_2025" class="csl-entry">
Bodnar, Cristian, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Anna Allen, Johannes Brandstetter, Patrick Garvan, et al. 2025. <span>“A Foundation Model for the Earth System.”</span> <em>Nature</em> 641 (8065): 1180–87. <a href="https://doi.org/10.1038/s41586-025-09005-y">https://doi.org/10.1038/s41586-025-09005-y</a>.
</div>
<div id="ref-Bronowski_1961" class="csl-entry">
Bronowski, Jacob. 1961. <em>Science and Human Values</em>. ISSR Library. London: Hutchinson.
</div>
<div id="ref-crawford2021atlas" class="csl-entry">
Crawford, Kate. 2021. <em>Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence</em>. New Haven, CT: Yale University Press. <a href="https://yalebooks.yale.edu/book/9780300209570/atlas-of-ai/">https://yalebooks.yale.edu/book/9780300209570/atlas-of-ai/</a>.
</div>
<div id="ref-dehghani2021benchmark" class="csl-entry">
Dehghani, Mostafa, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, and Oriol Vinyals. 2021. <span>“The Benchmark Lottery.”</span> <em>arXiv Preprint arXiv: 2107.07002</em>.
</div>
<div id="ref-dotan2019value0laden" class="csl-entry">
Dotan, Ravit, and Smitha Milli. 2019. <span>“Value-Laden Disciplinary Shifts in Machine Learning.”</span> <em>FAT*</em>. <a href="https://doi.org/10.1145/3351095.3373157">https://doi.org/10.1145/3351095.3373157</a>.
</div>
<div id="ref-hooker2020hardware" class="csl-entry">
Hooker, Sara. 2020. <span>“The Hardware Lottery.”</span> <em>Communications of the ACM</em>. <a href="https://doi.org/10.1145/3467017">https://doi.org/10.1145/3467017</a>.
</div>
<div id="ref-Huff_2017" class="csl-entry">
Huff, Toby E. 2017. <em>The Rise of Early Modern Science</em>. 1st ed. Cambridge: Cambridge University Press.
</div>
<div id="ref-kuhn1962structure" class="csl-entry">
Kuhn, Thomas S. 1962. <em>The Structure of Scientific Revolutions</em>. University of Chicago Press.
</div>
<div id="ref-macaskill2022what" class="csl-entry">
MacAskill, William. 2022. <em>What We Owe the Future</em>. New York: Basic Books.
</div>
<div id="ref-paul2019disastrous" class="csl-entry">
Paul, Kari. 2019. <span>“<span>‘Disastrous’</span> Lack of Diversity in AI Industry Perpetuates Bias, Study Finds.”</span> <em>The Guardian</em>. <a href="https://www.theguardian.com/technology/2019/apr/16/artificial-intelligence-lack-diversity-new-york-university-study">https://www.theguardian.com/technology/2019/apr/16/artificial-intelligence-lack-diversity-new-york-university-study</a>.
</div>
</div>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Consider American urban development: what began as apparently rational choices about highways and suburbs eventually made walkable neighborhoods economically unfeasible to build. The infrastructure didn’t merely reflect certain values—it enforced them, long after those values might have been questioned. Similarly, the economic investments in American slavery created powerful incentives to maintain the system—once millions of dollars were invested in enslaved people as “property,” with entire industries and political systems built around protecting those investments, the institution became extremely resistant to change even as moral opposition grew. <span class="citation" data-cites="macaskill2022what">(MacAskill 2022)</span>↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>ai</category>
  <category>science</category>
  <guid>https://kjablonka.com/blog/posts/ai_scientists/</guid>
  <pubDate>Sat, 31 May 2025 22:00:00 GMT</pubDate>
  <media:content url="https://kjablonka.com/blog/posts/ai_scientists" medium="image"/>
</item>
<item>
  <title>The Optimal Amount of Wasted Research Funding Is Non-Zero</title>
  <link>https://kjablonka.com/blog/posts/ai4science_european_academia/</link>
  <description><![CDATA[ 




<p>Europe’s universities are caught between measured caution and the breakneck pace of AI-driven discovery. We prize rigorous scholarship and public service, yet solving climate crises or designing new materials demands speed, scale—and yes, daring. This means accepting that some portion of research funding will be wasted—or even misused—but the optimal amount of “waste” is non‑zero: enough risk to ignite breakthroughs, without unraveling public trust <span class="citation" data-cites="Klein2025-rf">(Klein and Thompson 2025)</span>.</p>
<section id="counting-beans-vs.-cultivating-breakthroughs" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="counting-beans-vs.-cultivating-breakthroughs">Counting Beans vs.&nbsp;Cultivating Breakthroughs</h2>
<p>Imagine a director of university research budgets aiming for zero waste. Every euro precisely tracked, every grant stringently justified. At first glance, it sounds prudent—until innovation grinds to a halt. PIs slice projects into the smallest publishable nibble, chase citations instead of ideas, and hide minor shortcuts under the rug. Edwards and Roy call this the <a href="https://blog.regehr.org/archives/632">perverse incentive</a>: metrics as targets erode integrity and stifle creativity <span class="citation" data-cites="EdwardsRoy2017">(Edwards and Roy 2017)</span>.</p>
<div class="page-columns page-full"><p>In AI4Science bureaucracy does more than nudge corners: it constrains bright minds to granular targets and routine forms. We select researchers through <a href="https://de.wikipedia.org/wiki/Artikel_33_des_Grundgesetzes_f%C3%BCr_die_Bundesrepublik_Deutschland#Prinzip_der_Bestenauslese,_Art._33_Absatz_2_GG">“Bestenauslese,”</a>  celebrating the “highest achievers”, then tie them to Excel sheets and compliance checklists. Such perverse incentives shrink ambitions and discourage moonshots. Yet true leaps require room to experiment: some ideas will flop, data pipelines will fail, and yes, a few grants will yield nothing of note. That’s not waste alone; it’s the price of possibility.</p><div class="no-row-height column-margin column-container"><span class="margin-aside">“Bestenauslese” (“selection of the best”) is Germany’s rigorous (ostensibly) merit‑based academic selection process for professors, involving multiple stages of peer review, public lectures, and politiekorale vetting to appoint only the top candidates. Ironically, those once hailed as the nation’s brightest are then often constrained by procedural minutiae that discourage bold thinking.</span></div></div>
</section>
<section id="a-fraudinspired-analogy" class="level2">
<h2 class="anchored" data-anchor-id="a-fraudinspired-analogy">A Fraud‑Inspired Analogy</h2>
<p><a href="https://www.bitsaboutmoney.com/archive/optimal-amount-of-fraud/">In credit‑card fraud, businesses calculate an acceptable fraud rate—say</a>, 0.5% of transactions—because the cost of preventing every single fraudulent swipe would choke off legitimate commerce. They bake that “waste” into budgets, balancing losses against user friction. Too little fraud tolerance, and customers face endless identity checks; too much, and bad actors thrive.</p>
<p>Similarly, Europe’s research ecosystem must decide its fraud‑rate equivalent: how many dead‑end experiments, unused datasets or stalled hires do we permit to enable the rest to flourish? The answer is not zero.</p>
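<p>To make the analogy concrete, here is a toy expected-value calculation (all payoff numbers below are invented for illustration, not empirical estimates): if stricter screening also filters out high-risk, high-payoff projects, the portfolio-maximizing waste rate is strictly positive.</p>

```python
# Toy model with invented numbers: stricter screening (a lower tolerated
# waste rate) also screens out risky "moonshot" projects, so the expected
# value of a grant portfolio peaks at a non-zero waste rate.

def portfolio_value(waste_tolerance, n_grants=100, grant_size=1.0):
    # The share of moonshots we can fund grows with the tolerated waste rate.
    moonshot_share = min(2 * waste_tolerance, 0.5)
    safe_share = 1.0 - moonshot_share
    # Safe projects pay off modestly and reliably ...
    expected_safe = safe_share * n_grants * grant_size * 1.1
    # ... moonshots rarely (5% of the time), but hugely (50x).
    expected_moonshot = moonshot_share * n_grants * grant_size * 0.05 * 50
    # Outright waste scales with the tolerance we grant.
    waste = waste_tolerance * n_grants * grant_size
    return expected_safe + expected_moonshot - waste

values = {w: portfolio_value(w) for w in (0.0, 0.1, 0.25, 0.5)}
best = max(values, key=values.get)  # an interior optimum, not zero
```

<p>Under these made-up parameters the best tolerated waste rate is neither zero (which forfeits every moonshot) nor maximal (which pays for waste without funding more moonshots); the exact optimum is an artifact of the numbers, but the non-zero shape of the curve is the point.</p>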
</section>
<section id="building-the-right-ecosystem" class="level2">
<h2 class="anchored" data-anchor-id="building-the-right-ecosystem">Building the Right Ecosystem</h2>
<ul>
<li><p><em>New Data Organizations.</em> Establish mission‑driven entities whose sole purpose is to create and share scientific data at minimal cost per replicable datapoint. Fund them to accept researcher proposals, execute experiments—robotic or computational—and retain public rights so that every dataset becomes a reusable building block. Those organizations would also be best placed to organize competitions such as CASP to measure the real-world impact of AI innovations.</p></li>
<li><p><em>Data as Public Good.</em> Complement these organizations with micro‑grants for labs and individual researchers to curate and submit annotated datasets—including negative results—to a pan‑European repository.</p></li>
<li><p><em>Engineering Partnerships and Product Teams.</em> Embed research software engineers and product managers in academic groups to build, maintain and ship AI tools, applications and data products. Treat code libraries and computational platforms as first‑class research outputs, fostering shared solutions rather than isolated prototypes.</p></li>
</ul>
</section>
<section id="catalyzing-realworld-impact" class="level2">
<h2 class="anchored" data-anchor-id="catalyzing-realworld-impact">Catalyzing Real‑World Impact</h2>
<p>Only by stepping beyond the lab walls—talking with clinicians, industry users, policy makers and citizens—can AI4Science tackle system‑level problems and rediscover its path toward truth. As Daniel Sarewitz argues <span class="citation" data-cites="Sarewitz2016">(Sarewitz 2016)</span>, science must shed its ivory‑tower aloofness, embrace accountability, and co‑create solutions with the communities it aims to serve.</p>
<p>Imagine an “Academic Free Zone” pilot in which select institutions are empowered to hire swiftly, manage their own budgets, and report not on form‑counts but on real societal outcomes. Such “bubbles of exploration”, zones where shared optimism, dedicated infrastructure, and a tolerable failure rate yield transformative breakthroughs, might reflect the best of positive bubble dynamics <span class="citation" data-cites="Sargeant2025">(Sargeant 2025)</span>.</p>
<p>Rather than endless approvals, researchers would report progress toward tangible public benefits. In this way, we treat researchers as accountable professionals and align incentives with Europe’s mission: ensuring that AI4Science delivers societal value, not just publication counts.</p>
</section>
<section id="personal-reflections" class="level2">
<h2 class="anchored" data-anchor-id="personal-reflections">Personal Reflections</h2>
<p>I often ask myself: “Where can my AI4Science efforts matter most?” I want a skill set like mine to remain in public service: I worry about centralization of power in AI <span class="citation" data-cites="Harari2018">(Harari 2018)</span>. Yet I share colleagues’ frustration at bureaucratic inertia: a promising algorithm may sit unused for years behind grant cycles and compliance checks. If Europe is serious about impact, we must dismantle these barriers.</p>
</section>
<section id="trustdriven-transformation" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="trustdriven-transformation">Trust‑Driven Transformation</h2>
<div class="page-columns page-full"><p>Europe’s strength is freedom <span class="citation" data-cites="Charlemagne2025">(Charlemagne 2025)</span>. By shifting incentives from bean‑counting to value‑driven autonomy, investing boldly in shared data and infrastructure, and tolerating a non‑zero rate of “waste,” we can lead the AI for Science revolution on our own terms.</p><div class="no-row-height column-margin column-container"><span class="margin-aside">In the words of the Economist’s Charlemange: “But in their own plodding way, Europeans have created a place where they are guaranteed rights to what others yearn for: life, liberty, and the pursuit of happiness.” <span class="citation" data-cites="Charlemagne2025">(Charlemagne 2025)</span></span></div></div>


<!-- -->


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-Charlemagne2025" class="csl-entry">
Charlemagne. 2025. <span>“The Thing about Europe: It’s the Actual Land of the Free Now.”</span> <em>The Economist</em>, April. <a href="https://www.economist.com/europe/2025/04/10/the-thing-about-europe-its-the-actual-land-of-the-free-now">https://www.economist.com/europe/2025/04/10/the-thing-about-europe-its-the-actual-land-of-the-free-now</a>.
</div>
<div id="ref-EdwardsRoy2017" class="csl-entry">
Edwards, Marc A., and Siddhartha Roy. 2017. <span>“Academic Research in the 21st Century: Maintaining Scientific Integrity in a Climate of Perverse Incentives and Hypercompetition.”</span> <em>Environmental Engineering Science</em> 34 (1): 51–61. <a href="https://doi.org/10.1089/ees.2016.0223">https://doi.org/10.1089/ees.2016.0223</a>.
</div>
<div id="ref-Harari2018" class="csl-entry">
Harari, Yuval Noah. 2018. <span>“Why Technology Favors Tyranny.”</span> <em>Foreign Affairs</em>, October.
</div>
<div id="ref-Klein2025-rf" class="csl-entry">
Klein, Ezra, and Derek Thompson. 2025. <em>Abundance</em>. Reno, NV: Simon &amp; Schuster.
</div>
<div id="ref-Sarewitz2016" class="csl-entry">
Sarewitz, Daniel. 2016. <span>“Saving Science.”</span> <em>The New Atlantis</em> Spring/Summer: 6–41.
</div>
<div id="ref-Sargeant2025" class="csl-entry">
Sargeant, Leah Libresco. 2025. <span>“Are We Under‐bubbled?”</span> <em>The New Atlantis</em> Spring: 118–22.
</div>
</div></section></div> ]]></description>
  <category>academia</category>
  <category>Europe</category>
  <guid>https://kjablonka.com/blog/posts/ai4science_european_academia/</guid>
  <pubDate>Thu, 24 Apr 2025 22:00:00 GMT</pubDate>
</item>
<item>
  <title>Autoencoders as Digital Archaeologists for Spectroscopic Data</title>
  <link>https://kjablonka.com/blog/posts/autencoder_spectroscopy/</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Picture this: An archaeologist stands at a dig site, surrounded by layers of earth that haven’t seen sunlight since dinosaurs were a hot new trend. With painstaking care, they brush away dirt and sediment, revealing pottery shards, and that one graduate student who fell asleep on the job.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://kjablonka.com/blog/posts/autencoder_spectroscopy/archeology_process.png" class="img-fluid figure-img"></p>
<figcaption>The process of archeology</figcaption>
</figure>
</div>
<p>Now imagine replacing dirt with noise, shards with molecular signatures, and the reconstructed vase with a clean spectrum. Welcome to the world of spectroscopic data analysis using autoencoders: where we excavate molecular treasures from layers of noise and complexity through what the fancy folks call “unsupervised representation learning” (which is “teaching computers to find patterns without telling them what patterns to find”).</p>
<p>Every spectroscopic measurement is like an archaeological dig, except instead of finding ancient coins, we’re finding molecular transitions between energy states.</p>
<p>But just as ancient artifacts come to us covered in dirt and damaged by time (and occasionally by that one archaeologist who thought dynamite was a good excavation tool), our spectroscopic data arrives buried under multiple layers of contamination, such as:</p>
<ul>
<li><em>noise</em> due to random fluctuations (“electrons having a dance party”)</li>
<li><em>environmental interference</em>: for example, water vapor and CO₂ absorption bands</li>
<li><em>instrumental artifacts</em>: baseline drift (“detector getting tired”)</li>
<li><em>physical degradation</em>: sample fluorescence and aging effects</li>
</ul>
<p>Traditional smoothing techniques are the equivalent of using a bulldozer to dust off a delicate vase. Sure, you’ll remove the dirt, but you might also remove, well, everything else. One use case of autoencoders is to do this in a better way.</p>
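<p>To see the difference on a toy example, here is a minimal sketch on synthetic data (a <em>linear</em> autoencoder, which reduces to PCA, with made-up peak shapes and noise levels rather than real spectra): projecting noisy spectra onto a learned low-dimensional basis removes more noise, and destroys less peak, than a wide moving average.</p>

```python
import numpy as np

# Synthetic "spectra": sharp Gaussian peaks at varying positions, plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)

def spectrum(center):
    # One sharp Gaussian "molecular" peak.
    return np.exp(-((x - center) ** 2) / (2 * 0.01**2))

centers = rng.uniform(0.2, 0.8, 500)
clean = np.stack([spectrum(c) for c in centers])
noisy = clean + rng.normal(0.0, 0.1, clean.shape)

# Linear autoencoder == PCA. Encode: project onto the top-k principal
# directions (the "excavation"); decode: project back (the "reconstruction").
mean = noisy.mean(axis=0)
_, _, Vt = np.linalg.svd(noisy - mean, full_matrices=False)
k = 48
denoised = (noisy - mean) @ Vt[:k].T @ Vt[:k] + mean

# The bulldozer: a wide moving average flattens the peak along with the noise.
kernel = np.ones(21) / 21
smoothed = np.apply_along_axis(
    lambda s: np.convolve(s, kernel, mode="same"), 1, noisy
)

def mse(a):
    return float(np.mean((a - clean) ** 2))
```

<p>On this toy dataset the learned basis keeps the sharp peaks (they are exactly what varies across the training set) while discarding directions dominated by noise, so <code>mse(denoised)</code> comes out well below both <code>mse(noisy)</code> and <code>mse(smoothed)</code>; the nonlinear encoder in the post plays the same role with a far more flexible basis.</p>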
</section>
<section id="the-digital-archaeologists-toolkit" class="level2">
<h2 class="anchored" data-anchor-id="the-digital-archaeologists-toolkit">The Digital Archaeologist’s Toolkit</h2>
<p>In our metaphor, an autoencoder is like a three-phase archaeological expedition.</p>
<section id="phase-1-the-excavation-encoding" class="level3">
<h3 class="anchored" data-anchor-id="phase-1-the-excavation-encoding">Phase 1: The Excavation (Encoding)</h3>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://kjablonka.com/blog/posts/autencoder_spectroscopy/excavation_comp.png" class="img-fluid figure-img"></p>
<figcaption>Comparison of archeological and spectroscopic excavations.</figcaption>
</figure>
</div>
<p>The first phase in our archeology mission begins with cleaning the artifacts we find: brushing away the sand and sediment, removing the unnecessary. Similarly, our spectroscopic application starts by removing the unnecessary and “compressing” the spectrum to its essentials.</p>
<p>In the simplest form this can be done using a sequence of linear layers:</p>
<div id="eef31d52" class="cell" data-execution_count="2">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> ExcavationBrush(nn.Module):</span>
<span id="cb1-2">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, spectral_channels<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>, artifact_dimensions<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>):</span>
<span id="cb1-3">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">super</span>().<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>()</span>
<span id="cb1-4"></span>
<span id="cb1-5">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.compressor <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nn.Sequential(</span>
<span id="cb1-6">            nn.Linear(spectral_channels, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>),</span>
<span id="cb1-7">            nn.ReLU(),</span>
<span id="cb1-8">            nn.BatchNorm1d(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>),</span>
<span id="cb1-9">            nn.Dropout(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>),</span>
<span id="cb1-10">            nn.Linear(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">256</span>),</span>
<span id="cb1-11">            nn.ReLU(),</span>
<span id="cb1-12">            nn.BatchNorm1d(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">256</span>),</span>
<span id="cb1-13">            nn.Linear(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">256</span>, artifact_dimensions)</span>
<span id="cb1-14">        )</span>
<span id="cb1-15"></span>
<span id="cb1-16">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> forward(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, buried_spectrum):</span>
<span id="cb1-17">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.compressor(buried_spectrum)</span></code></pre></div></div>
</div>
<p>The encoder takes our high-dimensional spectrum and compresses it into something more manageable. But this isn’t a simple compression algorithm: the model learns which features matter most, like an experienced archaeologist who can tell the difference between “priceless artifact” and “rock that looks vaguely interesting.”</p>
</section>
<section id="phase-2-the-latent-space" class="level3">
<h3 class="anchored" data-anchor-id="phase-2-the-latent-space">Phase 2: The latent space</h3>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://kjablonka.com/blog/posts/autencoder_spectroscopy/latent_space_comp.png" class="img-fluid figure-img"></p>
<figcaption>The archeological vs.&nbsp;spectroscopic latent space</figcaption>
</figure>
</div>
<p>The latent space is our archaeological museum’s storage room—not the fancy public galleries with mood lighting and gift shops, but the back room where things actually get done. Here, each spectrum becomes a neat little index card with just the essential information.</p>
<div id="e4c0cc74" class="cell" data-execution_count="3">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># In our latent space, each spectrum becomes coordinates on an ancient map</span></span>
<span id="cb2-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># It's like Google Maps, but for molecules</span></span>
<span id="cb2-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> encode_spectrum(encoder, noisy_spectrum):</span>
<span id="cb2-4">    latent_artifacts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> encoder(noisy_spectrum)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Returns a point in hyperspace</span></span>
<span id="cb2-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> latent_artifacts</span></code></pre></div></div>
</div>
<p>But this is no ordinary storage room. It’s a magical space where similar artifacts naturally cluster together, like teenagers at a high school cafeteria. Polymers hang out in their valley, ceramics claim the mountain peaks, and metal oxides spread across their plains like they own the place.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-1-contents" aria-controls="callout-1" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>The Math Behind Clustering
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-1" class="callout-1-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>The clustering emerges from what we call the <a href="http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/">manifold hypothesis</a>—the idea that high-dimensional data actually lives on a lower-dimensional surface.</p>
<p>Mathematically, our encoder learns a mapping: <img src="https://latex.codecogs.com/png.latex?%0Af_%5Cphi:%20%5Cmathcal%7BX%7D%20%5Crightarrow%20%5Cmathcal%7BZ%7D%0A"> Where <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BX%7D%20%5Csubset%20%5Cmathbb%7BR%7D%5En"> is where our data lives (the messy real world) and <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BZ%7D%20%5Csubset%20%5Cmathbb%7BR%7D%5Em"> is our nice, clean latent space. This mapping preserves important properties:</p>
<p><strong>Distance preservation</strong>: Similar inputs map to nearby points:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0Ad_%7B%5Cmathcal%7BZ%7D%7D(f_%5Cphi(x_i),%20f_%5Cphi(x_j))%20%5Capprox%20d_%7B%5Cmathcal%7BX%7D%7D(x_i,%20x_j)%0A"></p>
<p><strong>Continuity</strong>: Small changes in input create small changes in output (the Lipschitz condition):</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5C%7Cf_%5Cphi(x_1)%20-%20f_%5Cphi(x_2)%5C%7C%20%5Cleq%20L%5C%7Cx_1%20-%20x_2%5C%7C%0A"></p>
<p>So materials with similar spectra end up as neighbors in latent space, forming these natural clusters. It’s like chemical social networking!</p>
</div>
</div>
</div>
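<p>As a sanity check on the distance-preservation idea, here is a small sketch (not the trained encoder, just a random linear map on made-up data) showing that even a random projection to a low-dimensional space roughly preserves pairwise distances, à la Johnson–Lindenstrauss:</p>

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# 100 fake "spectra" with 1000 channels each (purely illustrative numbers)
X = rng.normal(size=(100, 1000))

# Random linear map to 32 dimensions, scaled so squared distances
# are preserved in expectation (Johnson-Lindenstrauss style)
W = rng.normal(size=(1000, 32)) / np.sqrt(32)
Z = X @ W

# Ratios of latent-space to input-space pairwise distances cluster around 1
ratios = pdist(Z) / pdist(X)
```

<p>A trained encoder can do better than this on the actual data manifold, since it spends its capacity on the directions that really vary.</p>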
</section>
<section id="phase-3-bringing-it-back-together-reconstruction" class="level3">
<h3 class="anchored" data-anchor-id="phase-3-bringing-it-back-together-reconstruction">Phase 3: Bringing it back together (Reconstruction)</h3>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://kjablonka.com/blog/posts/autencoder_spectroscopy/reconstruction.png" class="img-fluid figure-img"></p>
<figcaption>Archaeological vs.&nbsp;spectroscopic reconstruction.</figcaption>
</figure>
</div>
<p>Using only our compressed representation (those index cards), we attempt to reconstruct the original spectrum. It’s like trying to rebuild a dinosaur from a few bones and a lot of imagination, except our imagination is constrained by mathematics.</p>
<div id="346bbd50" class="cell" data-execution_count="4">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> Reconstructor(nn.Module):</span>
<span id="cb3-2">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, artifact_dimensions<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>, spectral_channels<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>):</span>
<span id="cb3-3">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">super</span>().<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>()</span>
<span id="cb3-4">        </span>
<span id="cb3-5">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.reconstruction_process <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nn.Sequential(</span>
<span id="cb3-6">            nn.Linear(artifact_dimensions, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">256</span>),</span>
<span id="cb3-7">            nn.ReLU(),</span>
<span id="cb3-8">            nn.BatchNorm1d(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">256</span>),</span>
<span id="cb3-9">            </span>
<span id="cb3-10">            nn.Linear(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">256</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>),</span>
<span id="cb3-11">            nn.ReLU(),</span>
<span id="cb3-12">            nn.BatchNorm1d(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>),</span>
<span id="cb3-13">            </span>
<span id="cb3-14">            nn.Linear(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>, spectral_channels),</span>
<span id="cb3-15">            nn.Sigmoid() </span>
<span id="cb3-16">        )</span>
<span id="cb3-17">    </span>
<span id="cb3-18">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> forward(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, artifact_description):</span>
<span id="cb3-19">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.reconstruction_process(artifact_description)</span></code></pre></div></div>
</div>
<p>The decoder takes our compressed representation and attempts to rebuild the original spectrum. If we’ve done our job right (and haven’t accidentally trained our network to just output pictures of cats), the reconstruction should be faithful to the original.</p>
</section>
</section>
<section id="the-mathematics-of-archaeological-documentation" class="level2">
<h2 class="anchored" data-anchor-id="the-mathematics-of-archaeological-documentation">The Mathematics of Archaeological Documentation</h2>
<p>Just as physical conservation laws govern the preservation of matter and energy (thanks, Emmy Noether!), information theory dictates how we can compress and reconstruct data without turning it into digital gibberish.</p>
<p>The fundamental equation governing our autoencoder is the reconstruction loss:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cmathcal%7BL%7D_%7B%5Ctext%7Breconstruction%7D%7D%20=%20%5C%7Cx%20-%20%5Chat%7Bx%7D%5C%7C%5E2%0A"></p>
<p>Where <img src="https://latex.codecogs.com/png.latex?x"> is our original spectrum (the truth, the whole truth, and nothing but the truth) and <img src="https://latex.codecogs.com/png.latex?%5Chat%7Bx%7D"> is our reconstruction.</p>
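<p>In code, this loss is one line of PyTorch (the shapes below are made up to match the examples above):</p>

```python
import torch
import torch.nn.functional as F

x = torch.rand(8, 1000)      # batch of original spectra
x_hat = torch.rand(8, 1000)  # batch of reconstructions

# Mean squared error, averaged over batch and spectral channels
loss = F.mse_loss(x_hat, x)
```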
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-2-contents" aria-controls="callout-2" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Why MSE Makes Statistical Sense
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-2" class="callout-2-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>Let me tell you a tale about why MSE and Gaussian noise are BFFs.</p>
<p>If we assume our noise is Gaussian with mean 0 and variance <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0Ap(x%7Cz)%20=%20%5Cmathcal%7BN%7D(x;%20f_%5Ctheta(z),%20%5Csigma%5E2I)%0A"></p>
<p>The likelihood for a single data point becomes: <img src="https://latex.codecogs.com/png.latex?%0Ap(x%7Cz)%20=%20%5Cfrac%7B1%7D%7B(2%5Cpi%5Csigma%5E2)%5E%7Bn/2%7D%7D%20%5Cexp%5Cleft(-%5Cfrac%7B%5C%7Cx%20-%20f_%5Ctheta(z)%5C%7C%5E2%7D%7B2%5Csigma%5E2%7D%5Cright)%0A"></p>
<p>Taking the negative log-likelihood:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A-%5Clog%20p(x%7Cz)%20=%20%5Cfrac%7Bn%7D%7B2%7D%5Clog(2%5Cpi%5Csigma%5E2)%20+%20%5Cfrac%7B%5C%7Cx%20-%20f_%5Ctheta(z)%5C%7C%5E2%7D%7B2%5Csigma%5E2%7D%0A"></p>
<p>Since the first term is constant w.r.t. θ (our parameters), minimizing negative log-likelihood is equivalent to minimizing:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7B%5C%7Cx%20-%20f_%5Ctheta(z)%5C%7C%5E2%7D%7B2%5Csigma%5E2%7D%0A"></p>
<p>Which is just MSE in a fancy hat! So when you use MSE loss, you’re implicitly assuming Gaussian noise.</p>
</div>
</div>
</div>
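<p>You can verify the bookkeeping numerically: the negative log-likelihood is a constant (in θ) plus the squared error scaled by 1/(2σ²). All numbers below are made up:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)          # observed spectrum
f_theta_z = rng.normal(size=n)  # decoder output f_theta(z)
sigma2 = 0.25                   # assumed noise variance

# Negative log-likelihood of x under N(f_theta(z), sigma2 * I)
nll = 0.5 * n * np.log(2 * np.pi * sigma2) \
    + ((x - f_theta_z) ** 2).sum() / (2 * sigma2)

# Summing per-channel Gaussian log-densities gives the same number
log_p = -0.5 * np.log(2 * np.pi * sigma2) - (x - f_theta_z) ** 2 / (2 * sigma2)

# The theta-dependent part is just a scaled MSE
mse = ((x - f_theta_z) ** 2).mean()
```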
<p>But we can be fancier with a composite loss function:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cmathcal%7BL%7D_%7B%5Ctext%7Btotal%7D%7D%20=%20%5Cunderbrace%7B%5C%7Cx%20-%20%5Chat%7Bx%7D%5C%7C%5E2%7D_%7B%5Ctext%7BBe%20accurate%7D%7D%20+%20%5Clambda_1%20%5Cunderbrace%7B%5C%7C%5Cnabla%20x%20-%20%5Cnabla%20%5Chat%7Bx%7D%5C%7C%5E2%7D_%7B%5Ctext%7BBe%20smooth%7D%7D%20+%20%5Clambda_2%20%5Cunderbrace%7B%5Csum_%7Bp%20%5Cin%20%5Ctext%7Bpeaks%7D%7D%20%7Cx_p%20-%20%5Chat%7Bx_p%7D%7C%7D_%7B%5Ctext%7BDon't%20mess%20up%20the%20peaks%7D%7D%20+%20%5Clambda_3%20%5Cunderbrace%7B%5Cmathcal%7BR%7D(%5Cphi,%20%5Ctheta)%7D_%7B%5Ctext%7BDon't%20go%20crazy%7D%7D%0A"></p>
<p>Each term has a job:</p>
<ul>
<li>Fidelity term: “Make it look like the original”</li>
<li>Gradient penalty: “Keep it smooth, no sudden jumps”</li>
<li>Feature preservation: “Those peaks are important, don’t lose them!”</li>
<li>Regularization: “Stay humble, don’t overfit”</li>
</ul>
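<p>A minimal PyTorch sketch of this composite loss (the function name, the peak indices, and the λ values are illustrative, not from a real training run):</p>

```python
import torch
import torch.nn as nn

def composite_loss(x, x_hat, peak_idx, model, lam1=0.1, lam2=0.1, lam3=1e-4):
    fidelity = ((x - x_hat) ** 2).mean()                        # be accurate
    smooth = ((torch.diff(x, dim=-1)
               - torch.diff(x_hat, dim=-1)) ** 2).mean()        # be smooth
    peaks = (x[:, peak_idx] - x_hat[:, peak_idx]).abs().mean()  # keep the peaks
    reg = sum((p ** 2).sum() for p in model.parameters())       # stay humble
    return fidelity + lam1 * smooth + lam2 * peaks + lam3 * reg

model = nn.Linear(1000, 1000)  # stand-in for a real decoder
x = torch.rand(4, 1000)
loss = composite_loss(x, model(x), peak_idx=[100, 420, 700], model=model)
```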
</section>
<section id="the-manifold-hypothesis-why-this-archaeological-dig-makes-sense-at-all" class="level2">
<h2 class="anchored" data-anchor-id="the-manifold-hypothesis-why-this-archaeological-dig-makes-sense-at-all">The Manifold Hypothesis: Why This Archaeological Dig Makes Sense At All</h2>
<p>Let’s address a fundamental question: why should this even work? Shouldn’t compressing our beautiful high-dimensional spectrum lose valuable information? Welcome to the manifold hypothesis, the reason dimensionality reduction isn’t just mathematical vandalism.</p>
<p>The manifold hypothesis suggests that high-dimensional data (like our spectroscopic signals) aren’t actually using all those dimensions effectively. Instead, the data lies on or near a lower-dimensional surface (a manifold) embedded in that high-dimensional space. It’s like discovering that what looks like a complex 3D sculpture is actually just a cleverly folded 2D sheet of paper.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-3-contents" aria-controls="callout-3" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Why spectroscopic data probably lives on a manifold
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-3" class="callout-3-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>Spectroscopic data is fundamentally constrained by:</p>
<ul>
<li><strong>Physics</strong>: Certain combinations of absorption bands are physically impossible due to quantum mechanical selection rules. You can’t just have arbitrary patterns of peaks!</li>
<li><strong>Chemistry</strong>: Molecular structures create specific patterns of vibrations, rotations, and electronic transitions. A carbonyl group will always give you that telltale peak around 1700 cm⁻¹ in IR spectroscopy. The space of possible chemicals is itself constrained (you cannot combine all atoms in all possible ways).</li>
<li><strong>Instrumental limitations</strong>: Your spectrometer has a specific resolution and response function, further constraining the space of possible measurements.</li>
</ul>
<p>These constraints mean that despite having thousands of wavelength points, your spectrum is likely determined by a much smaller number of underlying variables—chemical compositions, molecular structures, temperature, etc.</p>
<p>Mathematically, if your spectral data points <img src="https://latex.codecogs.com/png.latex?%5C%7Bx_1,%20x_2,%20%5Cdots,%20x_n%5C%7D%20%5Cin%20%5Cmathbb%7BR%7D%5Ed"> (where d might be thousands of wavelengths), they likely lie on or near a <img src="https://latex.codecogs.com/png.latex?k">-dimensional manifold <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BM%7D%20%5Csubset%20%5Cmathbb%7BR%7D%5Ed"> where <img src="https://latex.codecogs.com/png.latex?k%5Cll%20d">.</p>
<p>The goal of our autoencoder is to learn this manifold—the archaeological site map, if you will.</p>
</div>
</div>
</div>
<p>To visualize this, imagine our spectra are actually faces of ancient masks (stay with me here). Each mask has thousands of pixels (dimensions), but you could describe any mask with far fewer parameters: eye size, mouth width, nose shape, etc. That’s your manifold! Autoencoders discover these “facial features” of spectra automatically. (You might be familiar with <a href="https://en.wikipedia.org/wiki/Eigenface">eigenfaces</a>, which are “basis vectors” of human faces one can derive with PCA.)</p>
<div id="cell-fig-manifold" class="cell" data-execution_count="5">
<div class="cell-output cell-output-display">
<div id="fig-manifold" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-manifold-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://kjablonka.com/blog/posts/autencoder_spectroscopy/index_files/figure-html/fig-manifold-output-1.png" width="919" height="471" class="figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-manifold-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Illustration of how data might lie on a lower-dimensional manifold in a higher-dimensional space.
</figcaption>
</figure>
</div>
</div>
</div>
</section>
<section id="from-classical-to-neural-the-connection-between-pca-and-linear-autoencoders" class="level2">
<h2 class="anchored" data-anchor-id="from-classical-to-neural-the-connection-between-pca-and-linear-autoencoders">From Classical to Neural: The Connection Between PCA and Linear Autoencoders</h2>
<p>Long before neural networks were cool, archaeologists (well, statisticians) had their own dimensionality reduction technique: Principal Component Analysis, or PCA.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-4-contents" aria-controls="callout-4" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>The Mathematical Connection Between PCA and Linear Autoencoders
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-4" class="callout-4-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>Let’s consider a linear autoencoder with:</p>
<ul>
<li>Input dimension <img src="https://latex.codecogs.com/png.latex?d"></li>
<li>Latent dimension <img src="https://latex.codecogs.com/png.latex?k"> (where <img src="https://latex.codecogs.com/png.latex?k%20%3C%20d">)</li>
<li>Encoder weight matrix <img src="https://latex.codecogs.com/png.latex?W_1%20%5Cin%20%20%5Cmathbb%7BR%7D%5E%7Bk%20%5Ctimes%20d%7D"></li>
<li>Decoder weight matrix <img src="https://latex.codecogs.com/png.latex?W_2%20%5Cin%20%20%5Cmathbb%7BR%7D%5E%7Bd%20%5Ctimes%20k%7D"></li>
<li>No biases or activation functions</li>
</ul>
<p>For an input <img src="https://latex.codecogs.com/png.latex?x%20%5Cin%20%5Cmathbb%7BR%7D%5Ed">, the encoding and reconstruction process is:</p>
<ol type="1">
<li>Encode: <img src="https://latex.codecogs.com/png.latex?z%20=%20W_1%20x"> (where <img src="https://latex.codecogs.com/png.latex?z%20%5Cin%20%5Cmathbb%7BR%7D%5Ek">)</li>
<li>Decode: <img src="https://latex.codecogs.com/png.latex?%5Chat%7Bx%7D%20=%20W_2%20z%20=%20W_2W_1x"></li>
</ol>
<p>The reconstruction error we minimize is: <img src="https://latex.codecogs.com/png.latex?%0A%5Cmathcal%7BL%7D%20=%20%5C%7Cx%20-%20%5Chat%7Bx%7D%5C%7C%5E2%20=%20%5C%7Cx%20-%20W_2W_1x%5C%7C%5E2%0A"></p>
<p>Under the constraint that <img src="https://latex.codecogs.com/png.latex?W_1"> and <img src="https://latex.codecogs.com/png.latex?W_2"> minimize this reconstruction error, the optimal solution has the following properties:</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?W_2%20=%20W_1%5ET"> (the decoder is the transpose of the encoder)</li>
<li>The rows of <img src="https://latex.codecogs.com/png.latex?W_1"> are the first k principal components of the data</li>
</ul>
<p>To see why, let’s decompose our data matrix <img src="https://latex.codecogs.com/png.latex?X"> using SVD: <img src="https://latex.codecogs.com/png.latex?%0AX%20=%20U%5CSigma%20V%5ET%0A"></p>
<p>Where:</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?U"> contains the left singular vectors</li>
<li><img src="https://latex.codecogs.com/png.latex?%5CSigma"> contains the singular values on its diagonal</li>
<li><img src="https://latex.codecogs.com/png.latex?V%5ET"> contains the right singular vectors</li>
</ul>
<p>The optimal linear projection to <img src="https://latex.codecogs.com/png.latex?k"> dimensions is given by:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AW_1%20=%20U_k%5ET%0A"></p>
<p>Where <img src="https://latex.codecogs.com/png.latex?U_k"> contains the first <img src="https://latex.codecogs.com/png.latex?k"> columns of <img src="https://latex.codecogs.com/png.latex?U"> (corresponding to the <img src="https://latex.codecogs.com/png.latex?k"> largest singular values).</p>
<p>And the optimal reconstruction matrix is: <img src="https://latex.codecogs.com/png.latex?%0AW_2%20=%20U_k%0A"></p>
<p>Which is exactly <img src="https://latex.codecogs.com/png.latex?W_1%5ET">.</p>
<p>Therefore, our reconstructed data is: <img src="https://latex.codecogs.com/png.latex?%0A%5Chat%7BX%7D%20=%20W_2W_1X%20=%20U_kU_k%5ETX%0A"></p>
<p>Which is precisely the reconstruction you’d get from projecting X onto the first k principal components and back.</p>
<p>This means our linear autoencoder will learn the same subspace as PCA, just with more computational effort and the possibility of getting stuck in local minima. It’s like taking a road trip to your neighbor’s house—you’ll get there, but was the scenic route necessary?</p>
</div>
</div>
</div>
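<p>This equivalence is easy to check numerically. The sketch below uses sklearn’s rows-are-samples convention, so the top-k right singular vectors of the centered data matrix play the role of U_k from the column convention in the callout:</p>

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))   # rows are samples
Xc = X - X.mean(axis=0)          # PCA works on centered data
k = 3

# SVD route: project onto the span of the top-k singular vectors
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_svd = Xc @ Vt[:k].T @ Vt[:k]

# sklearn route: compress to k components and reconstruct
pca = PCA(n_components=k)
X_pca = pca.inverse_transform(pca.fit_transform(X)) - X.mean(axis=0)
```

<p>Both routes produce the same reconstruction, and a linear autoencoder trained to convergence should end up spanning this same subspace.</p>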
<section id="a-practical-example-finding-the-redundant-dimension" class="level3">
<h3 class="anchored" data-anchor-id="a-practical-example-finding-the-redundant-dimension">A Practical Example: Finding the Redundant Dimension</h3>
<p>Let’s make this concrete with an example. Imagine we have a spectrum where two neighboring wavelengths always vary together—perhaps due to a broad absorption band or some instrumental correlation.</p>
<div id="2cc21370" class="cell" data-execution_count="6">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a dataset with a redundant dimension</span></span>
<span id="cb4-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> create_redundant_spectrum(num_samples<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>):</span>
<span id="cb4-3">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Independent features</span></span>
<span id="cb4-4">    independent_features <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randn(num_samples, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb4-5">    </span>
<span id="cb4-6">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a 5D spectrum where dimensions 2 and 3 are correlated</span></span>
<span id="cb4-7">    spectra <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.zeros((num_samples, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb4-8">    spectra[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> independent_features[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Independent</span></span>
<span id="cb4-9">    spectra[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> independent_features[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Independent</span></span>
<span id="cb4-10">    spectra[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> independent_features[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Independent</span></span>
<span id="cb4-11">    spectra[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.95</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> independent_features[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> np.random.randn(num_samples)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Correlated with dim 2</span></span>
<span id="cb4-12">    spectra[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> independent_features[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> independent_features[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Another linear combination</span></span>
<span id="cb4-13">    </span>
<span id="cb4-14">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> spectra</span>
<span id="cb4-15"></span>
<span id="cb4-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a linear autoencoder</span></span>
<span id="cb4-17"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> LinearAutoencoder(nn.Module):</span>
<span id="cb4-18">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, input_dim<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, latent_dim<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>):</span>
<span id="cb4-19">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">super</span>().<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>()</span>
<span id="cb4-20">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.encoder <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nn.Linear(input_dim, latent_dim, bias<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># No bias</span></span>
<span id="cb4-21">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.decoder <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nn.Linear(latent_dim, input_dim, bias<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># No bias</span></span>
<span id="cb4-22">    </span>
<span id="cb4-23">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> forward(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, x):</span>
<span id="cb4-24">        latent <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.encoder(x)</span>
<span id="cb4-25">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.decoder(latent)</span>
<span id="cb4-26">    </span>
<span id="cb4-27">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> tie_weights(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>):</span>
<span id="cb4-28">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># This enforces W_2 = W_1^T </span></span>
<span id="cb4-29">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.decoder.weight.data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.encoder.weight.data.t()</span></code></pre></div></div>
</div>
<p>When we train this model, it should identify dimension 3 as redundant (it is nearly a copy of dimension 2) and dimension 4 as a linear combination of dimensions 0 and 1. A 3-dimensional latent space therefore captures essentially all the variance in the 5-dimensional input.</p>
<div id="cell-fig-pca" class="cell" data-execution_count="7">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Generate redundant spectrum</span></span>
<span id="cb5-2">spectra <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> create_redundant_spectrum()</span>
<span id="cb5-3"></span>
<span id="cb5-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Apply PCA</span></span>
<span id="cb5-5">pca <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> PCA(n_components<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get all components to see variance</span></span>
<span id="cb5-6">spectra_reduced <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pca.fit_transform(spectra)</span>
<span id="cb5-7"></span>
<span id="cb5-8">pca_three <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> PCA(n_components<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb5-9">spectra_reduced_three <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pca_three.fit_transform(spectra)</span>
<span id="cb5-10">spectra_reconstructed <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pca_three.inverse_transform(spectra_reduced_three)</span>
<span id="cb5-11"></span>
<span id="cb5-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Calculate reconstruction error</span></span>
<span id="cb5-13">reconstruction_error <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.mean((spectra <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> spectra_reconstructed) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb5-14"></span>
<span id="cb5-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Plot the explained variance</span></span>
<span id="cb5-16">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb5-17">plt.bar(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>), pca.explained_variance_ratio_)</span>
<span id="cb5-18">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Principal Component"</span>)</span>
<span id="cb5-19">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Explained Variance Ratio"</span>)</span>
<span id="cb5-20">plt.title(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"PCA Explained Variance (Reconstruction Error: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>reconstruction_error<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.6f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>)</span>
<span id="cb5-21">plt.xticks(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>))</span>
<span id="cb5-22">plt.ylim(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb5-23">plt.tight_layout()</span>
<span id="cb5-24">plt.show()</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stderr">
<pre><code>/opt/miniconda3/lib/python3.13/site-packages/sklearn/decomposition/_base.py:152: RuntimeWarning:

divide by zero encountered in matmul

/opt/miniconda3/lib/python3.13/site-packages/sklearn/decomposition/_base.py:152: RuntimeWarning:

overflow encountered in matmul

/opt/miniconda3/lib/python3.13/site-packages/sklearn/decomposition/_base.py:152: RuntimeWarning:

invalid value encountered in matmul

/opt/miniconda3/lib/python3.13/site-packages/sklearn/decomposition/_base.py:205: RuntimeWarning:

divide by zero encountered in matmul

/opt/miniconda3/lib/python3.13/site-packages/sklearn/decomposition/_base.py:205: RuntimeWarning:

overflow encountered in matmul

/opt/miniconda3/lib/python3.13/site-packages/sklearn/decomposition/_base.py:205: RuntimeWarning:

invalid value encountered in matmul
</code></pre>
</div>
<div class="cell-output cell-output-display">
<div id="fig-pca" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-pca-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://kjablonka.com/blog/posts/autencoder_spectroscopy/index_files/figure-html/fig-pca-output-2.png" width="950" height="470" class="figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-pca-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: PCA analysis of the redundant spectrum data, showing the explained variance ratio.
</figcaption>
</figure>
</div>
</div>
</div>
<p>In the real world, spectroscopic data often has many such redundancies. Neighboring wavelengths are correlated, certain patterns of peaks occur together, and baseline effects introduce further correlations. These redundancies are exactly what autoencoders exploit—the manifold structure of our data.</p>
<p>The difference is that nonlinear autoencoders can capture more complex manifolds that PCA misses. It’s like upgrading from a 2D map to a 3D hologram of your archaeological site.</p>
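To make this concrete, here is a minimal sketch (not from the original post; dataset choices and variable names are illustrative) showing that a 2-component PCA reconstructs data lying on a flat plane almost perfectly, but loses information on a curved swiss-roll manifold of the same intrinsic dimension:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA

# Curved manifold: a swiss roll (intrinsically 2D, embedded in 3D)
X_curved, _ = make_swiss_roll(n_samples=2000, random_state=0)

# Flat manifold: a 2D plane embedded in 3D
rng = np.random.default_rng(0)
X_flat = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 3))

def pca_recon_error(X, n_components=2):
    """Mean squared error after projecting to n_components and back."""
    pca = PCA(n_components=n_components)
    X_rec = pca.inverse_transform(pca.fit_transform(X))
    return np.mean((X - X_rec) ** 2)

err_flat = pca_recon_error(X_flat)      # essentially zero: PCA nails flat data
err_curved = pca_recon_error(X_curved)  # large: the curvature is lost
print(err_flat, err_curved)
```

Both datasets are intrinsically two-dimensional, yet only the flat one is recoverable with two linear components; a nonlinear encoder can "unroll" the curved one.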
</section>
</section>
<section id="beyond-linear-maps-where-neural-networks-actually-shine" class="level2">
<h2 class="anchored" data-anchor-id="beyond-linear-maps-where-neural-networks-actually-shine">Beyond Linear Maps: Where Neural Networks Actually Shine</h2>
<p>Now that we’ve seen that linear autoencoders are just PCA in disguise, let’s talk about why we still bother with neural networks.</p>
<p>The magic happens when we add nonlinearities: those lovely activation functions like ReLU, sigmoid, or tanh. These allow autoencoders to learn complex, curved manifolds that PCA could never dream of capturing.</p>
<div id="78003da1" class="cell" data-execution_count="8">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> NonlinearArchaeologist(nn.Module):</span>
<span id="cb7-2">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, input_dim<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>, latent_dim<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>):</span>
<span id="cb7-3">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">super</span>().<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>()</span>
<span id="cb7-4">        </span>
<span id="cb7-5">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Now with extra nonlinear goodness!</span></span>
<span id="cb7-6">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.encoder <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nn.Sequential(</span>
<span id="cb7-7">            nn.Linear(input_dim, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>),</span>
<span id="cb7-8">            nn.ReLU(),  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># This is where the magic happens</span></span>
<span id="cb7-9">            nn.Linear(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">256</span>),</span>
<span id="cb7-10">            nn.ReLU(),  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># More magic!</span></span>
<span id="cb7-11">            nn.Linear(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">256</span>, latent_dim)</span>
<span id="cb7-12">        )</span>
<span id="cb7-13">        </span>
<span id="cb7-14">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.decoder <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nn.Sequential(</span>
<span id="cb7-15">            nn.Linear(latent_dim, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">256</span>),</span>
<span id="cb7-16">            nn.ReLU(),</span>
<span id="cb7-17">            nn.Linear(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">256</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>),</span>
<span id="cb7-18">            nn.ReLU(),</span>
<span id="cb7-19">            nn.Linear(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>, input_dim)</span>
<span id="cb7-20">        )</span></code></pre></div></div>
</div>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-5-contents" aria-controls="callout-5" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>The Power of Nonlinearity
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-5" class="callout-5-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>Consider a simple nonlinear manifold: data points lying on a curved surface, like a <a href="https://en.wikipedia.org/wiki/Swiss_roll">swiss roll</a> or a spiral. Linear methods like PCA can only find a flat subspace that minimizes the average distance to all points.</p>
<p>But with nonlinear transformations, we can “unroll” or “straighten” the manifold.</p>
<p>For autoencoders, this means:</p>
<ul>
<li>The encoder can learn a function <img src="https://latex.codecogs.com/png.latex?f:%20%5Cmathbb%7BR%7D%5Ed%20%5Cto%20%5Cmathbb%7BR%7D%5Em"> that maps the curved manifold to a flat latent space</li>
<li>The decoder learns the inverse mapping <img src="https://latex.codecogs.com/png.latex?g:%20%5Cmathbb%7BR%7D%5Em%20%5Cto%20%5Cmathbb%7BR%7D%5Ed"> to bring it back</li>
</ul>
<p>The nonlinear functions effectively learn to “straighten” the manifold in latent space, making it more amenable to analysis and visualization.</p>
<p>It’s like being able to translate an ancient text written on a curved vase simply by “unwrapping” it digitally!</p>
</div>
</div>
</div>
<section id="the-nonlinear-archaeologists-advantage" class="level3">
<h3 class="anchored" data-anchor-id="the-nonlinear-archaeologists-advantage">The Nonlinear Archaeologist’s Advantage</h3>
<p>Imagine two archaeological sites with similar artifacts. A traditional archaeologist might classify them identically based on simple metrics. But our advanced neural archaeologist notices subtle nonlinear patterns.</p>
<p>Similarly, nonlinear autoencoders can distinguish between spectral patterns that would be indistinguishable to linear methods. They can capture:</p>
<ul>
<li><strong>Peak shifting</strong> - When peaks move slightly based on local environment</li>
<li><strong>Multiplicative interactions</strong> - When components don’t just add linearly</li>
<li><strong>Complex baselines</strong> - When background signals have complicated, nonlinear forms</li>
</ul>
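The peak-shifting point is easy to demonstrate with a toy example (illustrative, not from the original post): spectra generated by a single Gaussian peak whose center is controlled by one latent parameter still require many linear components, because a shifting peak is a nonlinear function of its position:

```python
import numpy as np
from sklearn.decomposition import PCA

# "Peak shifting" toy data: one Gaussian peak whose center moves with a
# single latent parameter -- a one-dimensional manifold in 200-dim space.
wavelengths = np.linspace(0.0, 1.0, 200)
centers = np.linspace(0.2, 0.8, 500)  # the single underlying degree of freedom
spectra = np.exp(-((wavelengths[None, :] - centers[:, None]) ** 2) / (2 * 0.02**2))

pca = PCA().fit(spectra)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_linear = int(np.searchsorted(cumulative, 0.99)) + 1  # components for 99% variance

# Despite one degree of freedom, PCA needs many components here.
print(n_linear)
```

A nonlinear encoder could, in principle, compress this dataset down to a single coordinate (the peak position), which is exactly the advantage described above.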
<p>This is why, despite the elegance and interpretability of PCA, we still train these complex nonlinear beasts for real spectroscopic data. The archaeology of molecules is rarely a linear affair!</p>
</section>
</section>
<section id="the-probabilistic-excavation-variational-autoencoders" class="level2">
<h2 class="anchored" data-anchor-id="the-probabilistic-excavation-variational-autoencoders">The Probabilistic Excavation: Variational Autoencoders</h2>
<p>What if our archaeologist isn’t completely certain about what they’ve found? Enter <a href="https://jaan.io/what-is-variational-autoencoder-vae-tutorial/">the Variational Autoencoder (VAE)</a>—the probabilistic archaeologist who deals in uncertainties rather than absolutes.</p>
<div id="069ae57d" class="cell" data-execution_count="9">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> ProbabilisticArchaeologist(nn.Module):</span>
<span id="cb8-2">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, input_dim<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>, latent_dim<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>):</span>
<span id="cb8-3">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">super</span>().<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>()</span>
<span id="cb8-4">        </span>
<span id="cb8-5">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Encoder produces distribution parameters</span></span>
<span id="cb8-6">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.encoder_base <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nn.Sequential(</span>
<span id="cb8-7">            nn.Linear(input_dim, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>),</span>
<span id="cb8-8">            nn.ReLU(),</span>
<span id="cb8-9">            nn.Linear(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">256</span>),</span>
<span id="cb8-10">            nn.ReLU()</span>
<span id="cb8-11">        )</span>
<span id="cb8-12">        </span>
<span id="cb8-13">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Two outputs: mean and log-variance</span></span>
<span id="cb8-14">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.fc_mu <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nn.Linear(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">256</span>, latent_dim)</span>
<span id="cb8-15">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.fc_logvar <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nn.Linear(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">256</span>, latent_dim)</span>
<span id="cb8-16">        </span>
<span id="cb8-17">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Decoder reconstructs from samples</span></span>
<span id="cb8-18">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.decoder <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nn.Sequential(</span>
<span id="cb8-19">            nn.Linear(latent_dim, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">256</span>),</span>
<span id="cb8-20">            nn.ReLU(),</span>
<span id="cb8-21">            nn.Linear(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">256</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>),</span>
<span id="cb8-22">            nn.ReLU(),</span>
<span id="cb8-23">            nn.Linear(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>, input_dim)</span>
<span id="cb8-24">        )</span>
<span id="cb8-25">    </span>
<span id="cb8-26">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> encode(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, x):</span>
<span id="cb8-27">        h <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.encoder_base(x)</span>
<span id="cb8-28">        mu <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.fc_mu(h)         <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "I think the artifact is here"</span></span>
<span id="cb8-29">        logvar <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.fc_logvar(h)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "But I could be wrong by this much"</span></span>
<span id="cb8-30">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> mu, logvar</span>
<span id="cb8-31">    </span>
<span id="cb8-32">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> reparameterize(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, mu, logvar):</span>
<span id="cb8-33">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># The famous reparameterization trick</span></span>
<span id="cb8-34">        std <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torch.exp(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> logvar)</span>
<span id="cb8-35">        eps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torch.randn_like(std)</span>
<span id="cb8-36">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> mu <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> eps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> std</span>
<span id="cb8-37">    </span>
<span id="cb8-38">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> forward(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, x):</span>
<span id="cb8-39">        mu, logvar <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.encode(x)</span>
<span id="cb8-40">        z <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.reparameterize(mu, logvar)</span>
<span id="cb8-41">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.decoder(z), mu, logvar</span></code></pre></div></div>
</div>
<section id="manifold-cartography-the-kl-divergence-as-map-making" class="level3">
<h3 class="anchored" data-anchor-id="manifold-cartography-the-kl-divergence-as-map-making">Manifold Cartography: The KL Divergence as Map-Making</h3>
<p>Here’s where the VAE truly shines: it doesn’t just learn the manifold, it learns a <em>probabilistic</em> manifold with a well-behaved coordinate system. The VAE loss function has two terms:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cmathcal%7BL%7D_%7B%5Ctext%7BVAE%7D%7D%20=%20%5Cunderbrace%7B%5Cmathbb%7BE%7D_%7Bq_%5Cphi(z%7Cx)%7D%5B%5Clog%20p_%5Ctheta(x%7Cz)%5D%7D_%7B%5Ctext%7BReconstruction:%20Make%20it%20look%20right%7D%7D%20-%20%5Cunderbrace%7BD_%7B%5Ctext%7BKL%7D%7D(q_%5Cphi(z%7Cx)%20%5C%7C%20p(z))%7D_%7B%5Ctext%7BKL%20divergence:%20Keep%20it%20reasonable%7D%7D%0A"></p>
<p>The first part is our familiar reconstruction loss - “make the reconstruction look like the input.”</p>
<p>The second part is the Kullback-Leibler divergence, which measures how much our learned distribution <img src="https://latex.codecogs.com/png.latex?q_%5Cphi(z%7Cx)"> differs from a prior distribution <img src="https://latex.codecogs.com/png.latex?p(z)"> (typically a standard normal distribution).</p>
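For a diagonal Gaussian posterior and a standard normal prior, this divergence has a convenient closed form, which is what the loss implementation below computes term by term:

```latex
D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\middle\|\, p(z)\right)
  = -\frac{1}{2} \sum_{j=1}^{m} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right)
```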
<div class="callout callout-style-default callout-important callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Important</span>Why the KL Term Matters for the Manifold
</div>
</div>
<div class="callout-body-container callout-body">
<p>The KL divergence term in VAEs serves multiple crucial purposes that make it perfect for learning manifolds:</p>
<ol type="1">
<li><p><strong>It creates a continuous latent space</strong>: By encouraging overlap between the distributions of similar data points, the KL term ensures that nearby points in input space map to overlapping regions in latent space. This creates a smooth manifold where interpolation makes sense.</p></li>
<li><p><strong>It regularizes the coordinate system</strong>: Without the KL term, the autoencoder could learn any arbitrary mapping that preserves information. The KL term acts like a cartographer imposing a standard coordinate system on a newly discovered land.</p></li>
<li><p><strong>It enables generative sampling</strong>: By forcing the aggregate posterior to match the prior distribution, we can sample from the prior and generate new data points that lie on the learned manifold - essentially “discovering” new artifacts that could plausibly exist.</p></li>
<li><p><strong>It prevents overfitting</strong>: The KL term acts as a complexity penalty that prevents the model from learning an overly complex mapping that might not generalize well.</p></li>
</ol>
</div>
</div>
<p>When applied to spectroscopic data, this is particularly powerful because:</p>
<ol type="1">
<li>We can generate new realistic spectra by sampling from the latent space</li>
<li>We can perform meaningful interpolation between spectra</li>
<li>We can quantify uncertainty in our representations</li>
</ol>
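Point 1 is worth a quick sketch. Once a VAE is trained, generating new spectra is just sampling the prior and decoding; the decoder here is an untrained stand-in with the same shape as the model above (a real run would use the trained model's decoder):

```python
import torch
import torch.nn as nn

# Stand-in decoder mirroring the VAE decoder above (untrained, for shape only)
decoder = nn.Sequential(
    nn.Linear(32, 256), nn.ReLU(),
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, 1000),
)

with torch.no_grad():
    z = torch.randn(5, 32)    # 5 samples from the standard normal prior p(z)
    new_spectra = decoder(z)  # decoded into 5 candidate spectra

print(new_spectra.shape)  # torch.Size([5, 1000])
```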
<div id="00cab27b" class="cell" data-execution_count="10">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> vae_loss(reconstruction, x, mu, logvar, beta<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>):</span>
<span id="cb9-2">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""Calculate the VAE loss with reconstruction and KL terms"""</span></span>
<span id="cb9-3">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Reconstruction loss (how well does the output match the input?)</span></span>
<span id="cb9-4">    recon_loss <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> F.mse_loss(reconstruction, x, reduction<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'sum'</span>)</span>
<span id="cb9-5">    </span>
<span id="cb9-6">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># KL divergence (how much does our distribution differ from the prior?)</span></span>
<span id="cb9-7">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># For the standard normal prior, this has a nice closed form</span></span>
<span id="cb9-8">    kl_loss <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> torch.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> logvar <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> mu.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">pow</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> logvar.exp())</span>
<span id="cb9-9">    </span>
<span id="cb9-10">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Total loss with β weighting</span></span>
<span id="cb9-11">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> recon_loss <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> beta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> kl_loss</span></code></pre></div></div>
</div>
<p>By adjusting the β parameter, we can control the trade-off between reconstruction quality and the “niceness” of our latent space. Higher β values force the latent space to be more like a standard normal distribution, while lower values prioritize reconstruction accuracy.</p>
<p>This gives us a powerful tool for exploring the manifold of spectroscopic data - not just finding it, but mapping it in a way that makes it useful for generation, interpolation, and understanding the underlying physical parameters.</p>
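Interpolation, for instance, amounts to walking a straight line in latent space. A minimal sketch (`z_a` and `z_b` are placeholders for the latent means the encoder would produce for two real spectra):

```python
import torch

# Stand-ins for the latent means of two encoded spectra
z_a = torch.randn(32)
z_b = torch.randn(32)

steps = torch.linspace(0.0, 1.0, 7).unsqueeze(1)  # 7 interpolation steps
z_path = (1 - steps) * z_a + steps * z_b          # straight line in latent space

# Decoding each row with a trained decoder yields a smooth morph between
# the two spectra -- something interpolating raw intensities cannot give.
print(z_path.shape)  # torch.Size([7, 32])
```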
</section>
</section>
<section id="conclusion-the-journey-continues" class="level2">
<h2 class="anchored" data-anchor-id="conclusion-the-journey-continues">Conclusion: The Journey Continues</h2>
<p>Our archaeological expedition through the world of autoencoders has revealed powerful tools for uncovering the hidden structure in spectroscopic data. We’ve seen how:</p>
<ol type="1">
<li>Linear autoencoders connect to classical methods like PCA</li>
<li>Nonlinear autoencoders can capture complex manifold structures</li>
<li>Variational autoencoders add a probabilistic perspective that enables generation and interpolation</li>
</ol>
<p>Just as archaeologists piece together ancient civilizations from fragments, we can piece together the underlying molecular and material properties from noisy, complex spectral data.</p>
<p>And just like archaeology, the field continues to evolve with new techniques and approaches. From graph neural networks to attention mechanisms to diffusion models, the tools for spectroscopic data analysis keep getting more sophisticated - allowing us to uncover ever more subtle patterns and relationships in our molecular artifacts.</p>
<p>So grab your digital trowel and start digging!</p>
<div id="cell-fig-vae" class="cell" data-execution_count="11">
<div class="cell-output cell-output-display">
<div id="fig-vae" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-vae-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://kjablonka.com/blog/posts/autencoder_spectroscopy/index_files/figure-html/fig-vae-output-1.png" width="1142" height="566" class="figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-vae-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: Visualization of latent space sampling in a VAE, showing how we can generate new spectra.
</figcaption>
</figure>
</div>
</div>
</div>
<p>You can find a short lecture on this on <a href="https://youtu.be/fibGQX3nlM0?si=cWmN3VQnBLtEu5j2">YouTube</a>. <iframe width="560" height="315" src="https://www.youtube.com/embed/fibGQX3nlM0?si=sXc7Ne5f7mBSMITn" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe></p>



</section>

 ]]></description>
  <category>machine-learning</category>
  <category>teaching</category>
  <guid>https://kjablonka.com/blog/posts/autencoder_spectroscopy/</guid>
  <pubDate>Tue, 22 Apr 2025 22:00:00 GMT</pubDate>
  <media:content url="https://kjablonka.com/blog/posts/autencoder_spectroscopy/cover.png" medium="image" type="image/png" height="96" width="144"/>
</item>
<item>
  <title>Notes on My Peer Review Process: An Invitation to Compare Practices</title>
  <link>https://kjablonka.com/blog/posts/reviewing/</link>
  <description><![CDATA[ 




<section id="how-i-approach-peer-review" class="level2">
<h2 class="anchored" data-anchor-id="how-i-approach-peer-review">How I Approach Peer Review</h2>
<p>Peer review is something most of us learn by doing, with little formal training. After stumbling through my early reviews, I’ve gradually developed practices that work for me. I’m sharing them here not as a model to follow, but to start a conversation about how we might all improve this critical part of science.</p>
<section id="impact-neutrality" class="level3">
<h3 class="anchored" data-anchor-id="impact-neutrality">Impact Neutrality</h3>
<p>I rarely suggest rejection. This comes from my attempt to be “impact neutral” in reviewing, an approach I’ve found useful after reading <a href="https://proteinsandwavefunctions.blogspot.com/2016/01/writing-impact-neutral-review.html">Jan Jensen’s thoughts</a> and seeing the <a href="https://en.wikipedia.org/wiki/PLOS_ONE#Publication_concept">PLoS ONE</a> model in action.</p>
<p>By “relatively impact neutral,” I mean I focus primarily on scientific soundness. I’ll still praise work I find particularly important and note when novelty seems lacking, but these observations inform rather than dictate my recommendations. I try to keep acceptance/rejection opinions out of my review’s main body. If there are scoring forms, I do not fill them in unless I have to: knowledge work is notoriously difficult to measure <span class="citation" data-cites="drucker1999knowledge">(Drucker 1999)</span>, and the stepping stones that lead to discoveries often cannot be anticipated <span class="citation" data-cites="stanley2015why">(Stanley and Lehman 2015)</span>.</p>
</section>
<section id="looking-for-value" class="level3">
<h3 class="anchored" data-anchor-id="looking-for-value">Looking for Value</h3>
<p>Even in papers I initially find underwhelming, I deliberately search for strengths. What makes this work add to our existing knowledge? How could the authors better emphasize these aspects?</p>
<p>This isn’t just kindness—it’s also practical. By focusing on what works, I can help authors build on their strengths and often discover value I initially missed.</p>
</section>
<section id="making-feedback-actionable" class="level3">
<h3 class="anchored" data-anchor-id="making-feedback-actionable">Making Feedback Actionable</h3>
<p>Vague criticism helps no one. I try to make every comment actionable by offering specific suggestions and quoting the relevant text.</p>
</section>
<section id="maintaining-an-objective-tone" class="level3">
<h3 class="anchored" data-anchor-id="maintaining-an-objective-tone">Maintaining an Objective Tone</h3>
<p>Throughout my reviews, I write in an objective, non-judgmental tone. Critical analysis doesn’t require harsh language. Scientific evaluation can be thorough and rigorous while remaining respectful of the authors’ efforts and expertise.</p>
</section>
</section>
<section id="my-review-structure" class="level2">
<h2 class="anchored" data-anchor-id="my-review-structure">My Review Structure</h2>
<p>My reviews typically include:</p>
<ol type="1">
<li><strong>Summary</strong>: My understanding of the work, which helps authors see if I’ve missed something crucial.</li>
<li><strong>Major Points</strong>: Critical flaws in design, analysis, or unsupported claims.</li>
<li><strong>Minor Points</strong>: Suggestions that don’t affect the core message but would strengthen the paper.</li>
<li><strong>Reproducibility</strong>: Assessment of code and data availability.</li>
<li><strong>Limitations of Expertise</strong>: Areas where my knowledge is limited, particularly important for interdisciplinary work.</li>
</ol>
</section>
<section id="my-process" class="level2">
<h2 class="anchored" data-anchor-id="my-process">My Process</h2>
<p>Good reviews take time. My typical approach:</p>
<ol type="1">
<li>Read the manuscript thoroughly first.</li>
<li>Do a quick literature check using tools like <a href="https://paperqa.app/">PaperQA</a> or <a href="https://scholarqa.allen.ai/chat">Ai2 ScholarQA</a>.</li>
<li>Take a few days away to let thoughts settle.</li>
<li>Write the review.</li>
<li>Get feedback from a local LLM to check if I’ve followed my own guidelines.</li>
</ol>
<p>I’ve found acknowledging <a href="https://implicit.harvard.edu/implicit/takeatest.html">my own biases</a> helps me compensate for them. We all bring preferences to reviews—naming them doesn’t eliminate them but makes them visible.</p>
<p>I don’t review for publishers <a href="https://www.predatoryjournals.org/news/is-mdpi-predatory">I consider predatory, such as MDPI</a>.</p>
</section>
<section id="the-case-for-kindness-and-diversity" class="level2">
<h2 class="anchored" data-anchor-id="the-case-for-kindness-and-diversity">The Case for Kindness and Diversity</h2>
<p>Our field would benefit from more kindness in the review process. The harsh, dismissive tone of some reviews doesn’t improve science—it discourages innovative thinking and disproportionately impacts early-career researchers and those from underrepresented groups.</p>
<p>Similarly, greater diversity of thought would strengthen our collective work. When reviewers from varied backgrounds, methodological traditions, and theoretical perspectives evaluate research, we catch blind spots and identify new possibilities. Homogeneous reviewing leads to homogeneous science.</p>
<p>Both kindness and diversity ultimately serve the same goal: creating an environment where the best ideas can emerge, regardless of their source or how they challenge conventional thinking.</p>
<section id="on-anonymous-reviews" class="level3">
<h3 class="anchored" data-anchor-id="on-anonymous-reviews">On Anonymous Reviews</h3>
<p>At this point in my career, I don’t sign my reviews. This is a personal choice in a complex debate. While signed reviews might promote accountability, anonymous reviews can allow early-career scientists to evaluate work honestly without fear of repercussion, particularly when reviewing senior colleagues’ work (who might write letters for my tenure case). The power dynamics in science are real, and our review systems should acknowledge them.</p>
<p>I suspect that as our community evolves better practices around constructive criticism and reduces the career consequences of scholarly disagreement, more reviewers may feel comfortable signing their reviews. But we’re not there yet.</p>
</section>
</section>
<section id="open-questions" class="level2">
<h2 class="anchored" data-anchor-id="open-questions">Open Questions</h2>
<ul>
<li>I’m also interested in how the community might evolve publication models. Could approaches like <a href="https://yoshuabengio.org/2020/02/26/time-to-rethink-the-publication-process-in-machine-learning/">Bengio’s proposal of submitting to journals and conference chairs picking “interesting articles”</a> or <a href="https://aclrollingreview.org/">rolling reviews</a> address some current frustrations?</li>
<li>In an era of information overload, might versioned, updateable articles serve science better than our current static approach?</li>
</ul>


<!-- -->


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-drucker1999knowledge" class="csl-entry">
Drucker, Peter F. 1999. <span>“Knowledge-Worker Productivity: The Biggest Challenge.”</span> <em>California Management Review</em> 41 (2): 79–94. <a href="https://doi.org/10.2307/41165987">https://doi.org/10.2307/41165987</a>.
</div>
<div id="ref-stanley2015why" class="csl-entry">
Stanley, Kenneth O., and Joel Lehman. 2015. <em>Why Greatness Cannot Be Planned: The Myth of the Objective</em>. Cham: Springer International Publishing. <a href="https://doi.org/10.1007/978-3-319-15524-1">https://doi.org/10.1007/978-3-319-15524-1</a>.
</div>
</div></section></div> ]]></description>
  <category>academia</category>
  <category>peer review</category>
  <category>scientific community</category>
  <guid>https://kjablonka.com/blog/posts/reviewing/</guid>
  <pubDate>Fri, 04 Apr 2025 22:00:00 GMT</pubDate>
</item>
<item>
  <title>Beyond the Era of Accidental Discovery</title>
  <link>https://kjablonka.com/blog/posts/materials_intelligence/</link>
  <description><![CDATA[ 




<p>The foundational challenge of materials science isn’t just creating new materials - it’s developing them systematically rather than by accident. For centuries, materials discovery has remained surprisingly artisanal despite its outsized impact on human civilization. The design of new materials is the bottleneck for solving many of society’s most pressing challenges, from sustainable energy to quantum computing.</p>
<section id="building-a-collective-scientific-intelligence" class="level2">
<h2 class="anchored" data-anchor-id="building-a-collective-scientific-intelligence">Building a Collective Scientific Intelligence</h2>
<p>One of the most tragic inefficiencies in science is how poorly we transfer experience. A PhD student spends 4-5 years developing deep experimental intuition about a specific material system or characterization technique. When they leave, most of that knowledge leaves with them.</p>
<p>A large opportunity lies in general-purpose models and alignment approaches that can:</p>
<ul>
<li>Learn from unstructured experimental data across different modalities</li>
<li>Bridge the gap between synthesis conditions and material properties</li>
<li>Surface non-obvious connections between seemingly unrelated research areas</li>
</ul>
<p>The technical breakthrough enabling this is our ability to simultaneously handle:</p>
<ol type="1">
<li>Synthesis protocols (as structured text and process graphs)</li>
<li>Characterization data (spectroscopy, microscopy, diffraction)</li>
<li>Property measurements (electronic, mechanical, optical)</li>
<li>Theoretical calculations (DFT, molecular dynamics)</li>
</ol>
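<p>To make this concrete, one record spanning these modalities might look like the following sketch (the field names and units are my own illustrative assumptions, not a schema from any real database):</p>

```python
from dataclasses import dataclass, field

# Illustrative sketch only: field names, shapes, and units are assumptions.
@dataclass
class MaterialsRecord:
    """One multimodal record tying together the four data types above."""
    synthesis_protocol: str                   # structured text / process graph
    characterization: dict[str, list[float]]  # e.g. spectra as paired value lists
    properties: dict[str, float]              # e.g. {"band_gap_eV": 1.1}
    calculations: dict[str, float] = field(default_factory=dict)  # e.g. DFT energies

record = MaterialsRecord(
    synthesis_protocol="Dissolve precursor, heat to 120 C for 12 h, wash, dry.",
    characterization={"xrd_2theta": [10.2, 20.4], "xrd_intensity": [1.0, 0.4]},
    properties={"band_gap_eV": 1.1},
)
print(record.properties["band_gap_eV"])  # → 1.1
```

<p>The point of such a unified record is that a model can learn across modalities at once, rather than treating protocols, spectra, and calculations as disconnected datasets.</p>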
</section>
<section id="expert-councils-beyond-single-models" class="level2">
<h2 class="anchored" data-anchor-id="expert-councils-beyond-single-models">Expert Councils: Beyond Single Models</h2>
<p>We do not want only an average representation of materials data - we need specialized expertise for various topics and the ability to let these experts interact. This mirrors how human experts work together, bringing different perspectives to complex problems.</p>
<p>The key is bootstrapping specialized models using:</p>
<ul>
<li>Integration with physics-based simulations</li>
<li>Iterative refinement through experimental feedback</li>
<li>Domain-specific inductive biases that constrain the solution space</li>
<li>Validation through robust tools and theoretical frameworks</li>
</ul>
<p>For example, we can:</p>
<ul>
<li>Generate feedback through simulations and experiments</li>
<li>Use iterative training approaches similar to Beyond A*</li>
<li>Constrain function spaces using inductive biases</li>
<li>Hand over specific predictive tasks to specialized architectures</li>
</ul>
<p>The specialized models can be bootstrapped with information from general-purpose models, making them more data-efficient while maintaining domain expertise.</p>
</section>
<section id="guiding-discovery-through-interestingness" class="level2">
<h2 class="anchored" data-anchor-id="guiding-discovery-through-interestingness">Guiding Discovery Through “Interestingness”</h2>
<p>Optimizations - or searches through materials space - are often compared with finding a needle in a haystack. Some try to design ML approaches as a “magnet” or “filter” to more efficiently find the needle. This could not be more misguided for two reasons:</p>
<ol type="1">
<li>We often don’t even know what we’re looking for (we often cannot define what metrics would be important before we have the solution)</li>
<li>Looking for a needle in a haystack suggests searching through an unstructured space, but materials space has rich patterns we can exploit</li>
</ol>
<p>Instead, we’re developing ways to identify scientifically promising directions through:</p>
<ul>
<li>Novelty detection that can spot meaningful deviations from known patterns</li>
<li>Uncertainty quantification that highlights areas where models disagree</li>
<li>Causal reasoning that can extract mechanistic insights</li>
</ul>
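<p>The uncertainty-quantification point can be sketched in a few lines: rank candidates by how strongly an ensemble of predictors disagrees. The “models” below are toy stand-ins I made up for illustration, not real property predictors:</p>

```python
import statistics

# Toy stand-ins for independently trained property predictors.
def model_a(x): return 2.0 * x
def model_b(x): return 2.0 * x + 0.1
def model_c(x): return 1.5 * x  # diverges from the others for large x

def disagreement(x, models):
    """Standard deviation of ensemble predictions: high values flag
    candidates where the models disagree and a measurement is informative."""
    return statistics.stdev(m(x) for m in models)

models = [model_a, model_b, model_c]
candidates = [0.1, 1.0, 10.0]
# Rank candidates by ensemble disagreement, most "interesting" first.
ranked = sorted(candidates, key=lambda x: disagreement(x, models), reverse=True)
print(ranked)  # → [10.0, 1.0, 0.1]
```

<p>In an active-learning loop, the top-ranked candidate would be the one worth synthesizing or simulating next.</p>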


<!-- -->

</section>

 ]]></description>
  <category>science</category>
  <guid>https://kjablonka.com/blog/posts/materials_intelligence/</guid>
  <pubDate>Tue, 04 Feb 2025 23:00:00 GMT</pubDate>
</item>
<item>
  <title>10 Reasons to Aim Higher. And Higher.</title>
  <link>https://kjablonka.com/blog/posts/harder/</link>
  <description><![CDATA[ 




<p>I aim high. Most often, I don’t get there, but here are ten reasons why I still try:</p>
<ol type="1">
<li><p>It is fun.</p></li>
<li><p>It gives me freedom. Focusing on “low-hanging fruits” or shorter-term goals shrinks the solution space. Aiming further, my solution space expands and easily accommodates detours.</p></li>
<li><p>It demands creativity. Most challenging problems can’t be solved with existing tools, forcing creative solutions.</p></li>
<li><p><a href="https://karpathy.github.io/2016/09/07/phd/">It often isn’t that much harder.</a></p></li>
<li><p><a href="https://paulgraham.com/wealth.html">Something that is hard for you likely is impossible for your competitor.</a></p></li>
<li><p>It lets me ignore noise. With a longer path ahead, I can tune out distractions (like most new papers) along the way.</p></li>
<li><p>It allows me to relax. Oliver Burkeman <a href="https://www.penguin.co.uk/books/456705/meditations-for-mortals-by-burkeman-oliver/9781847927613">references</a> <a href="https://en.wikipedia.org/wiki/Houn_Jiyu-Kennett">Hōun Jiyu-Kennett</a> who taught by making tasks so demanding that students stop struggling, relax, and then accomplish more.</p></li>
<li><p>Even failing might leave you with something quite impactful.</p></li>
<li><p>Growth only happens by crossing boundaries.</p></li>
<li><p>It changes your default.</p></li>
</ol>
<blockquote class="blockquote">
<p>Whatever you can do or dream you can, begin it; Boldness has genius, power, and magic in it.</p>
<p>– <a href="https://quoteinvestigator.com/2016/02/09/boldness/">Often attributed to Goethe</a></p>
</blockquote>


<!-- -->


 ]]></description>
  <category>life</category>
  <guid>https://kjablonka.com/blog/posts/harder/</guid>
  <pubDate>Sat, 01 Feb 2025 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Dear Claude: Are We Getting Too Close?</title>
  <link>https://kjablonka.com/blog/posts/ai_thinking/</link>
  <description><![CDATA[ 




<p>Lately, I have been wondering a lot about what the biggest impact of generative models on science and society can be.</p>
<p>While <a href="https://kjablonka.com/blog/posts/why_llm/">I see many upsides</a>, I am also very puzzled and concerned by things happening to myself and many around me.</p>
<p>This week, the Atlantic ran an outstanding piece by Derek Thompson on <a href="https://www.theatlantic.com/magazine/archive/2025/02/american-loneliness-personality-politics/681091/">The anti-social century</a>, and last year, Kevin Roose <a href="https://www.nytimes.com/2024/12/13/technology/claude-ai-anthropic.html">described how some people now run everything, every decision, every thought through Claude</a> and perhaps talk more to Claude than to their friends.</p>
<blockquote class="twitter-tweet blockquote">
<p lang="en" dir="ltr">
i'm starting to see differences between those who have integrated claude deeply into their lives and those who haven't. its still too early for me to put words on it… i think the ones who have feel better supported? it's been ~universally healthy so far from what i can tell
</p>
— Nick (@nickcammarata) <a href="https://twitter.com/nickcammarata/status/1862000614508777779?ref_src=twsrc%5Etfw">November 28, 2024</a>
</blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>I also talk to Claude a lot. I asked Claude to review this post critically. I ask it to do the same for most of my writing. Many of my friends and colleagues do the same. Basically all of the students I work with talk to Claude. However, I am nervous about how some of us use it.</p>
<blockquote class="bluesky-embed blockquote" data-bluesky-uri="at://did:plc:u3o6qejoryiezdwepzbl4hqm/app.bsky.feed.post/3lfiu7ewr6k23" data-bluesky-cid="bafyreie7pp4xrcaibuxgoshgdycd2ffzatqbqnezaplimji5g25hswicjy">
<p lang="en">
I had big hopes for the application of AI to education. I saw it as one of the most important problems of our time. I did not expect there could be a possibility that AI would not only keep students ignorant, but in fact make them fundamentally incapable of learning anything
</p>
— François Chollet (<a href="https://bsky.app/profile/did:plc:u3o6qejoryiezdwepzbl4hqm?ref_src=embed">@fchollet.bsky.social</a>) <a href="https://bsky.app/profile/did:plc:u3o6qejoryiezdwepzbl4hqm/post/3lfiu7ewr6k23?ref_src=embed">Jan 12, 2025 at 12:26 AM</a>
</blockquote>
<script async="" src="https://embed.bsky.app/static/embed.js" charset="utf-8"></script>
<p>We have a tool in our hands that could do so much good. We could provide everyone with a personal tutor. We could use the models to bounce off ideas, think more critically, find loopholes, and brainstorm new ideas.</p>
<p>The challenge isn’t just technological - it’s deeply human: perhaps our human nature makes it too tempting to take shortcuts <span class="citation" data-cites="Easter2021-gx">(Easter 2021)</span> - to just let Claude solve a coding or homework problem directly, or to let Claude be the best friend.</p>
<div class="page-columns page-full"><p>This is worrying because, as the models continue to saturate all our benchmarks,  <a href="https://fs.blog/why-write/">the marginal value of interesting, clear, and wise thought increases</a>. “Low hanging fruit” test solving and knowledge retrieval are being commoditized - but we still need people who can set the agenda and push thought beyond the current frontiers.</p><div class="no-row-height column-margin column-container"><span class="margin-aside">Very interestingly, in creating <a href="https://arxiv.org/abs/2404.01475">our own benchmark for chemistry</a> our limiting factor was human ingenuity and knowledge in coming up with questions that are challenging enough for the models.</span></div></div>
<p>Being able to do so requires a <a href="https://paulgraham.com/know.html">broad foundation of mental models</a> and playful curiosity <span class="citation" data-cites="Feynman1985-gy">(Feynman and Leighton 1985)</span>.</p>
<p>To me, one of the big challenges is how we can ensure most people and our students use generative AI as cointelligence <span class="citation" data-cites="Mollick2024-kq Mollick_2024">(E. Mollick 2024; E. R. Mollick and Mollick 2024)</span> and not as a replacement for their own thought. <a href="https://www.oneusefulthing.org/p/post-apocalyptic-education">As Ethan Mollick pointedly observed: Education is hard.</a> Growth is hard - but this is the point of it.</p>
<p>Perhaps we need to do a better job of showing <a href="https://paulgraham.com/hwh.html">the value of going through the grind</a> and the fun it takes. <a href="https://paulgraham.com/writes.html">And that shortcuts make us miss most of the journey.</a> Perhaps we need to emphasize process over outcomes and reward original thinking over execution.</p>
<p>Learning, thinking, and talking to others <span class="citation" data-cites="Yanai_2024">(Yanai and Lercher 2024)</span> is where the real magic happens — most of my best projects emerged from seemingly random discussions about seemingly unrelated topics (which some of the new <a href="https://www.firstthings.com/article/2020/03/secular-monks">secular monks</a> might see as a waste of time).</p>
<p>The shortcuts AI offers might save time, but they could cost us something far more valuable: our capacity for genuine intellectual and personal growth and connection.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://kjablonka.com/blog/posts/ai_thinking/claude_at_date.png" class="img-fluid figure-img"></p>
<figcaption>I am sure that some people ask Claude for help in all situations of their life, and that scenes like this exist out in the wild. Image generated with getimg.ai</figcaption>
</figure>
</div>




<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-Easter2021-gx" class="csl-entry">
Easter, Michael. 2021. <em>The Comfort Crisis</em>. Emmaus, PA: Rodale Books.
</div>
<div id="ref-Feynman1985-gy" class="csl-entry">
Feynman, Richard P., and Ralph Leighton. 1985. <em>Surely You’re Joking, <span>Mr. Feynman</span>!</em> Edited by Edward Hutchings. New York, NY: WW Norton.
</div>
<div id="ref-Mollick2024-kq" class="csl-entry">
Mollick, Ethan. 2024. <em>Co-Intelligence</em>. New York, NY: Portfolio.
</div>
<div id="ref-Mollick_2024" class="csl-entry">
Mollick, Ethan R., and Lilach Mollick. 2024. <span>“Instructors as Innovators: A Future-Focused Approach to New AI Learning Opportunities, with Prompts.”</span> <em>SSRN Electronic Journal</em>. <a href="https://doi.org/10.2139/ssrn.4802463">https://doi.org/10.2139/ssrn.4802463</a>.
</div>
<div id="ref-Yanai_2024" class="csl-entry">
Yanai, Itai, and Martin J. Lercher. 2024. <span>“It Takes Two to Think.”</span> <em>Nature Biotechnology</em> 42 (1): 18–19. <a href="https://doi.org/10.1038/s41587-023-02074-2">https://doi.org/10.1038/s41587-023-02074-2</a>.
</div>
</div></section></div> ]]></description>
  <category>llm</category>
  <category>academia</category>
  <category>society</category>
  <guid>https://kjablonka.com/blog/posts/ai_thinking/</guid>
  <pubDate>Sat, 11 Jan 2025 23:00:00 GMT</pubDate>
  <media:content url="https://kjablonka.com/blog/posts/ai_thinking" medium="image"/>
</item>
<item>
  <title>Thinking aloud about the shape of scientific data</title>
  <link>https://kjablonka.com/blog/posts/data_shape/</link>
  <description><![CDATA[ 




<section id="introduction" class="level1 page-columns page-full">
<h1>Introduction</h1>
<p>In many scientific fields, we are witnessing the emergence of “foundation models” - a term that, while widely used, often lacks precise definition. For our purposes, we consider foundation models to be those that can be readily adapted to diverse tasks within a domain, serving as a foundation for modeling various phenomena.</p>
<div class="page-columns page-full"><p>In chemistry, we observe two parallel trends. On one side, there’s growing enthusiasm for general-purpose large language models (LLMs), with some arguing that “The future of chemistry is language” <span class="citation" data-cites="White_2023">(White 2023)</span> - a perspective I largely share. Simultaneously, we see the development of specialized foundation models, such as MACE-MP <span class="citation" data-cites="batatia2024foundationmodelatomisticmaterials">(Batatia et al. 2024)</span> for molecular simulations and AlphaFold <span class="citation" data-cites="Abramson_2024">(Abramson et al. 2024)</span> for protein structure prediction.</p><div class="no-row-height column-margin column-container"><span class="margin-aside">Even though it is very interesting to ponder that <a href="https://harrisbio.substack.com/p/alphafold3-a-foundation-model-for">some equivariance features were thrown out in AF3 — in favor of scale</a>, which one might think of as the <a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">Bitter lesson</a> hitting again.</span></div></div>
<p>This duality raises a crucial question: “When should we invest in specialized architectures that incorporate domain knowledge, and when might general-purpose approaches be more effective?” The question becomes particularly relevant as we observe both specialized models achieving remarkable success and general-purpose LLMs demonstrating unexpected capabilities across scientific domains.</p>
<div class="page-columns page-full"><p>In my research group, we’ve focused on applying general-purpose LLMs to chemistry - an approach that might seem counterintuitive. Here, I attempt a systematic (though admittedly preliminary) analysis of when different modeling approaches might be most appropriate by examining the fundamental structure of scientific data spaces. </p><div class="no-row-height column-margin column-container"><span class="margin-aside">Big parts of this discussion are inspired by the excellent <a href="https://towardsdatascience.com/the-road-to-biology-2-0-will-pass-through-black-box-data-bbd00fabf959">Biology 2.0 post from Michael Bronstein and Luca Naef</a>.</span></div></div>
</section>
<section id="the-shape-of-scientific-data" class="level1 page-columns page-full">
<h1>The Shape of Scientific Data</h1>
<p>To understand why different modeling approaches succeed or fail, we need to examine the inherent structure of their data spaces. We’ll focus on four fundamental types of scientific data that represent distinct points along the spectrum of structure and complexity: molecular properties (governed by physical laws), chemical experiments (complex real-world scenarios), biological sequences (shaped by evolution), and code (human-created structure).</p>
<table class="caption-top table">
<colgroup>
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
</colgroup>
<thead>
<tr class="header">
<th style="text-align: left;">Aspect</th>
<th style="text-align: left;">Molecular Properties</th>
<th style="text-align: left;">Chemical Experiments</th>
<th style="text-align: left;">Biological Sequences</th>
<th style="text-align: left;">Code</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">State Space</td>
<td style="text-align: left;"><img src="https://latex.codecogs.com/png.latex?%5Cpsi%20%5Cin%20L%5E2(%5Cmathbb%7BR%7D%5E%7B3N%7D)"></td>
<td style="text-align: left;"><img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BR%7D(t)=%5C%7B(c_i,%20n_i,%20p_i)%5C%7D"></td>
<td style="text-align: left;"><img src="https://latex.codecogs.com/png.latex?%5C%7B0,1,%5Cldots,k%5C%7D%5En"></td>
<td style="text-align: left;">Discrete tree <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BT%7D"></td>
</tr>
<tr class="even">
<td style="text-align: left;">Governing Distribution</td>
<td style="text-align: left;"><img src="https://latex.codecogs.com/png.latex?P(%5Cpsi)%20%5Cpropto%20e%5E%7B-%5Cbeta%20E%5B%5Cpsi%5D%7D"></td>
<td style="text-align: left;">Complex, multi-modal</td>
<td style="text-align: left;"><img src="https://latex.codecogs.com/png.latex?%5Clog%20P(s)%20%5Cpropto%20f(s)"></td>
<td style="text-align: left;"><img src="https://latex.codecogs.com/png.latex?P(%5Ctext%7Bcode%7D)%20=%20P(%5Ctext%7Bsyntax%7D)%20%5Ccdot%20P(%5Ctext%7Bsemantics%7D%5C%7C%5Ctext%7Bsyntax%7D)"></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Structure-to-Noise Ratio</td>
<td style="text-align: left;">High</td>
<td style="text-align: left;">Low</td>
<td style="text-align: left;">Medium</td>
<td style="text-align: left;">Very High</td>
</tr>
<tr class="even">
<td style="text-align: left;">Reproducibility</td>
<td style="text-align: left;">Very high</td>
<td style="text-align: left;">Low</td>
<td style="text-align: left;">High</td>
<td style="text-align: left;">Perfect</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Hidden Variables</td>
<td style="text-align: left;">No/Few</td>
<td style="text-align: left;">Many</td>
<td style="text-align: left;">No/Few</td>
<td style="text-align: left;">No</td>
</tr>
<tr class="even">
<td style="text-align: left;">Validation</td>
<td style="text-align: left;">Physical laws</td>
<td style="text-align: left;">Empirical</td>
<td style="text-align: left;">Functional tests</td>
<td style="text-align: left;">Compiler</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Causality</td>
<td style="text-align: left;">Quantum mechanics</td>
<td style="text-align: left;">Partially hidden</td>
<td style="text-align: left;">Evolutionary force</td>
<td style="text-align: left;">Explicit</td>
</tr>
</tbody>
</table>
<p>Let’s examine each domain in detail to understand why they might require different modeling approaches:</p>
<section id="molecular-properties" class="level2">
<h2 class="anchored" data-anchor-id="molecular-properties">Molecular Properties</h2>
<p>The quantum mechanical description of molecular properties provides perhaps the cleanest example of a structured scientific data space. Here, the state space is described by wavefunctions <img src="https://latex.codecogs.com/png.latex?%5Cpsi%20%5Cin%20L%5E2(%5Cmathbb%7BR%7D%5E%7B3N%7D)">, representing the quantum state of N particles in three-dimensional space. Several key characteristics make this domain particularly amenable to specialized models:</p>
<ul>
<li><strong>High Structure-to-Noise Ratio</strong>: The underlying physics is well-understood and deterministic (up to quantum mechanical uncertainties)</li>
<li><strong>Clear Symmetries</strong>: Physical laws impose translational and rotational invariance, providing strong inductive biases for model design</li>
<li><strong>Few Hidden Variables</strong>: All molecular properties can, in principle, be determined from the wavefunction, requiring only atomic positions and types as input</li>
<li><strong>Very High Reproducibility</strong>: While numerical implementations introduce some noise, quantum mechanical expectation values are fully determined by the wavefunction and the corresponding operators</li>
</ul>
</section>
<section id="chemical-experiments" class="level2">
<h2 class="anchored" data-anchor-id="chemical-experiments">Chemical Experiments</h2>
<p>Chemical experiments present a striking contrast. Despite being fundamentally governed by quantum mechanics, “real world” experimental chemistry introduces numerous complexities:</p>
<ul>
<li><strong>Complex State Space</strong>: While we can represent basic parameters as <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BR%7D(t)=%5C%7B(c_i,%20n_i,%20p_i)%5C%7D"> (concentrations, stoichiometry, phase information), many crucial variables remain hidden</li>
<li><strong>Low Structure-to-Noise Ratio</strong>: Hidden features and their interactions lead to high variability in outcomes</li>
<li><strong>Hidden Variables</strong>: Critical factors often go unrecorded or unrecognized (impurities, atmospheric conditions, surface effects) and might only be implicitly captured in experimental protocols</li>
<li><strong>Limited Reproducibility</strong>: Even carefully controlled experiments may yield different results due to uncontrolled variables</li>
</ul>
</section>
<section id="biological-sequences" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="biological-sequences">Biological Sequences</h2>
<div class="page-columns page-full"><p>Biological sequences present a unique case where their distribution in sequence space (<img src="https://latex.codecogs.com/png.latex?%5C%7B0,1,%5Cldots,k%5C%7D%5En">) is shaped by evolution, creating a direct link between sequence distribution and fitness <span class="citation" data-cites="Sella_2005">(Sella and Hirsh 2005)</span>. </p><div class="no-row-height column-margin column-container"><span class="margin-aside">Notably, such a driving force does not exist in chemistry, where the space of synthetic molecules seems mostly shaped by human imagination.</span></div></div>
<ul>
<li><strong>Medium Structure-to-Noise Ratio</strong>: Evolution provides underlying structure, while neutral mutations introduce noise</li>
<li><strong>Clear Alphabet</strong>: Fixed set of building blocks (amino acids, nucleotides) constrains the possible space</li>
<li><strong>Evolutionary Causality</strong>: Natural selection provides a clear driving force for sequence distributions</li>
<li><strong>High Reproducibility</strong>: Modern sequence determination is highly reliable</li>
</ul>
</section>
<section id="code" class="level2">
<h2 class="anchored" data-anchor-id="code">Code</h2>
<p>Programming languages represent a fascinating case of highly structured but human-created information:</p>
<ul>
<li><strong>Discrete, Tree-like Structure</strong>: Abstract syntax trees provide clear organization</li>
<li><strong>Perfect Reproducibility</strong>: Same input consistently produces the same output</li>
<li><strong>Explicit Causality</strong>: Control flow and data dependencies are explicit</li>
<li><strong>Human-Created Rules</strong>: Unlike physical laws, programming language rules are human-designed and well-documented</li>
<li><strong>Rich Training Data</strong>: Vast amounts of self-documenting code examples and error messages are available</li>
</ul>
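<p>This tree-like structure is directly inspectable - a minimal illustration (my own, not part of the original argument) using Python’s standard-library <code>ast</code> module:</p>

```python
import ast

# Parse a one-line program into its abstract syntax tree.
tree = ast.parse("y = f(x) + 1")

# Every construct is an explicit node in a discrete tree:
# the assignment, the call, the addition, each name and constant.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)
```

<p>Nothing comparable exists for a wet-lab protocol: the “parse tree” of an experiment is at best implicit, which is part of why chemical experiments sit at the opposite end of the structure spectrum.</p>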
</section>
</section>
<section id="implications-for-model-choice" class="level1">
<h1>Implications for Model Choice</h1>
<p>Our analysis of data spaces reveals a nuanced framework for choosing modeling approaches, one that goes beyond simple metrics to consider the fundamental nature of structure in each domain.</p>
<section id="the-structure-to-noise-ratio-and-types-of-structure" class="level2">
<h2 class="anchored" data-anchor-id="the-structure-to-noise-ratio-and-types-of-structure">The Structure-to-Noise Ratio and Types of Structure</h2>
<p>We can (somewhat handwavily) formalize the structure-to-noise ratio as:</p>
<p><img src="https://latex.codecogs.com/png.latex?%20R%20=%20%5Cfrac%7B%5Ctext%7Bstructured%5C_information%7D%7D%7B%5Ctext%7Bunstructured%5C_variation%7D%7D%20"></p>
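<p>A toy numerical reading of this ratio (my own construction, purely illustrative): treat each observation as structured signal plus unstructured noise and compare their variances:</p>

```python
import random
import statistics

random.seed(0)

xs = [i / 10 for i in range(100)]
signal = [2.0 * x for x in xs]              # structured information
noise = [random.gauss(0, 0.5) for _ in xs]  # unstructured variation
observed = [s + n for s, n in zip(signal, noise)]  # what a model would see

# Structure-to-noise ratio R for this synthetic "domain".
R = statistics.variance(signal) / statistics.variance(noise)
print(R > 1)  # → True: a high-structure domain in the sense of the table above
```

<p>Shrink the signal or inflate the noise and R drops below one - the synthetic analogue of moving from molecular properties toward messy chemical experiments.</p>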
<p>However, this ratio alone is insufficient. We must distinguish between fundamentally different types of structure:</p>
<ol type="1">
<li><strong>Physical/Mathematical Structure</strong> (Molecular Properties):
<ul>
<li>Governed by immutable natural laws</li>
<li>Benefits from explicit architectural enforcement</li>
<li>Can be handled data-efficiently by specialized architectures (e.g., equivariant neural networks)</li>
</ul></li>
<li><strong>Human-Created Structure</strong> (Code):
<ul>
<li>Well-documented in training data</li>
<li>Can be learned statistically</li>
<li>Amenable to general-purpose models like LLMs</li>
</ul></li>
<li><strong>Mixed or Emergent Structure</strong> (Biological Sequences):
<ul>
<li>Combines physical constraints with evolutionary patterns</li>
<li>Benefits from hybrid approaches</li>
</ul></li>
</ol>
<p>This refined view explains several observed patterns in scientific machine learning:</p>
<ol type="1">
<li><strong>Domains with Physical Structure</strong> (Molecular Properties):
<ul>
<li>Specialized architectures effectively leverage conservation laws and symmetries</li>
<li>Investment in domain-specific inductive biases pays off</li>
<li>Example: Equivariant neural networks for molecular properties</li>
</ul></li>
<li><strong>Domains with Human-Created Structure</strong> (Code):
<ul>
<li>General-purpose models can learn patterns effectively</li>
<li>Benefit from large amounts of self-documenting training data</li>
<li>Example: LLMs for code generation</li>
</ul></li>
<li><strong>Low-Structure Domains</strong> (Chemical Experiments):
<ul>
<li>General-purpose models may be more effective (as we do not even know what inductive biases to design and many factors are hidden/implicit)</li>
<li>Pattern recognition and statistical approaches shine</li>
<li>Example: LLMs leveraging implicit knowledge from literature</li>
</ul></li>
<li><strong>Mixed-Structure Domains</strong> (Biological Sequences):
<ul>
<li>Hybrid approaches combining structure and statistics work well</li>
<li>Balance between specialized architectures and statistical power</li>
<li>Example: AlphaFold’s combination of structural constraints with evolutionary information</li>
</ul></li>
</ol>
</section>
<section id="the-role-of-hidden-variables" class="level2">
<h2 class="anchored" data-anchor-id="the-role-of-hidden-variables">The Role of Hidden Variables</h2>
<p>The presence and nature of hidden variables significantly impacts model choice:</p>
<ul>
<li><strong>Few Hidden Variables</strong>: Enables direct modeling with specialized architectures</li>
<li><strong>Many Unknown Hidden Variables</strong>: Benefits from models that can learn representations from data</li>
</ul>
</section>
</section>
<section id="conclusions" class="level1">
<h1>Conclusions</h1>
<p>This analysis, while admittedly preliminary, provides a framework for understanding when to apply specialized versus general-purpose models in scientific domains. The choice appears to be guided by three key factors:</p>
<ol type="1">
<li>The type of structure present (physical, human-created, or mixed)</li>
<li>The structure-to-noise ratio</li>
<li>The presence and nature of hidden variables</li>
</ol>
<p>In domains with physical structure and few hidden variables, specialized architectures can effectively leverage domain knowledge. However, in domains with human-created structure or many hidden variables, general-purpose models may be more appropriate. This explains why our group remains optimistic about applying LLMs to chemistry: the complexity and hidden variables of chemical experiments might make them particularly well suited to statistical pattern recognition.</p>



</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-Abramson_2024" class="csl-entry">
Abramson, Josh, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, et al. 2024. <span>“Accurate Structure Prediction of Biomolecular Interactions with AlphaFold 3.”</span> <em>Nature</em> 630 (8016): 493–500. <a href="https://doi.org/10.1038/s41586-024-07487-w">https://doi.org/10.1038/s41586-024-07487-w</a>.
</div>
<div id="ref-batatia2024foundationmodelatomisticmaterials" class="csl-entry">
Batatia, Ilyes, Philipp Benner, Yuan Chiang, Alin M. Elena, Dávid P. Kovács, Janosh Riebesell, Xavier R. Advincula, et al. 2024. <span>“A Foundation Model for Atomistic Materials Chemistry.”</span> <a href="https://arxiv.org/abs/2401.00096">https://arxiv.org/abs/2401.00096</a>.
</div>
<div id="ref-Sella_2005" class="csl-entry">
Sella, Guy, and Aaron E. Hirsh. 2005. <span>“The Application of Statistical Physics to Evolutionary Biology.”</span> <em>Proceedings of the National Academy of Sciences</em> 102 (27): 9541–46. <a href="https://doi.org/10.1073/pnas.0501865102">https://doi.org/10.1073/pnas.0501865102</a>.
</div>
<div id="ref-White_2023" class="csl-entry">
White, Andrew D. 2023. <span>“The Future of Chemistry Is Language.”</span> <em>Nature Reviews Chemistry</em> 7 (7): 457–58. <a href="https://doi.org/10.1038/s41570-023-00502-0">https://doi.org/10.1038/s41570-023-00502-0</a>.
</div>
</div></section></div> ]]></description>
  <category>llm</category>
  <guid>https://kjablonka.com/blog/posts/data_shape/</guid>
  <pubDate>Wed, 08 Jan 2025 23:00:00 GMT</pubDate>
  <media:content url="https://kjablonka.com/blog/posts/data_shape" medium="image"/>
</item>
<item>
  <title>A wise bird</title>
  <link>https://kjablonka.com/blog/posts/a_wise_owl/</link>
  <description><![CDATA[ 




<p>As I reflect on the past and on the upcoming year, I really enjoyed <a href="https://en.wikipedia.org/wiki/A_Wise_Old_Owl">Rockefeller’s favorite poem</a> <span class="citation" data-cites="Housel2020-tx">(Housel 2020)</span>.</p>
<blockquote class="blockquote">
<p>A wise old owl lived in an oak,</p>
<p>The more he saw, the less he spoke</p>
<p>The less he spoke, the more he heard,</p>
<p>Now, wasn’t he a wise old bird?</p>
</blockquote>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://kjablonka.com/blog/posts/a_wise_owl/owl.jpeg" class="img-fluid figure-img"></p>
<figcaption>We need more wise owls.</figcaption>
</figure>
</div>


<!-- -->



<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-Housel2020-tx" class="csl-entry">
Housel, Morgan. 2020. <em>The Psychology of Money</em>. Petersfield, England: Harriman House Publishing.
</div>
</div></section></div> ]]></description>
  <category>life</category>
  <guid>https://kjablonka.com/blog/posts/a_wise_owl/</guid>
  <pubDate>Sat, 04 Jan 2025 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Trust Me, There’s a Method to This Madness</title>
  <link>https://kjablonka.com/blog/posts/why_llm/</link>
  <description><![CDATA[ 




<p>Even though I work in a chemistry department, much of the recent work in my team has focused on Large Language Models (LLMs) - or, more generally, frontier models. This isn’t a departure from chemistry; rather, we believe these models could be crucial building blocks for solving some of the most fundamental problems in chemistry and materials science.</p>
<p><a href="https://calvinball.substack.com/p/a-preliminary-roadmap-for-ai-assisted">Sam Rodrigues</a> (as so often) put it best: Science is about doing things for the first time. What’s remarkable is that recent frontier models show sparks of an ability to perform impressive tasks they weren’t explicitly trained for. More importantly, they’re showing promising capabilities in developing what scientists have long considered crucial: good taste in choosing what is interesting. <span class="citation" data-cites="zhang2024omniopenendednessmodelshuman">(Zhang et al. 2024)</span> This intuition, traditionally developed through years of experience, can now be augmented by models that have synthesized patterns from vast amounts of scientific literature and data.</p>
<p>One of the most striking inefficiencies in academic research is how knowledge dissipates: when a PhD student leaves after four years in the lab, their accumulated experience often vanishes with them. Imagine if we could capture and share all this tacit knowledge - the failed experiments, the subtle technique adjustments, the unwritten rules - through training models on lab notes and conversations <span class="citation" data-cites="Jablonka_2022">(Jablonka, Patiny, and Smit 2022)</span>.</p>
<p>While recent research suggests that language isn’t necessarily used for reasoning <span class="citation" data-cites="reasoning">(Fedorenko, Piantadosi, and Gibson 2024)</span>, its flexibility makes it an unparalleled tool for communicating ideas, methods, and observations (just look at how synthesis protocols are reported). Yes, schemas, figures, and equations are crucial, but language remains our most versatile medium - and with multimodal approaches, we’re pushing to combine the best of all worlds <span class="citation" data-cites="alampara2024probinglimitationsmultimodallanguage">(Alampara et al. 2024)</span>. (And there are tons of things for which we will need to go beyond naively treating everything as text <span class="citation" data-cites="alampara2024mattextlanguagemodelsneed">(Alampara, Miret, and Jablonka 2024)</span>).</p>
<p>However, the practical impact is already visible: tasks that once required a PhD thesis can now be accomplished within a Master’s project. During my PhD, training a model for a novel application without existing datasets would have consumed the entire degree. Now, our team routinely collects custom datasets for new applications <span class="citation" data-cites="Schilling_Wilhelmi_2025">(Schilling-Wilhelmi et al. 2025)</span>. This scalability is crucial because science is inherently long-tailed: breakthrough innovations often emerge from unexpected corners of research, and we have so many different instruments, techniques, and questions that only a scalable approach has a shot at capturing any of it.</p>
<p>Similarly, there have been many efforts to develop ontologies, define APIs, and specify how different systems should talk to each other, and <a href="https://madices.github.io">I have been involved in those efforts</a>. But I increasingly come to the belief that we might be better off (at least for the long tail) just letting models figure out how to talk to different things and build new tools in this way. Tools are how science progresses. As Sydney Brenner noted, “Progress in science depends on new techniques, new discoveries and new ideas, probably in that order” <span class="citation" data-cites="Robertson_1980 Dyson_2012">(Robertson 1980; Dyson 2012)</span>.</p>
<p>However, working with these models daily also <a href="https://michaelnotebook.com/optimism/index.html">raises concerns</a>. While there’s <a href="https://darioamodei.com/machines-of-loving-grace#4-peace-and-governance">significant</a> <a href="https://ia.samaltman.com">potential upside</a>, we who develop these tools bear responsibility for ensuring they benefit society. Beyond immediate concerns about bio- and chemical weapons <span class="citation" data-cites="peppin2025realityaibiorisk">(Peppin et al. 2025)</span>, I worry about <a href="https://www.argmin.net/p/too-much-information">information overflow</a> and the proliferation of bullshit <span class="citation" data-cites="Frankfurt2005">(Frankfurt 2005)</span> and disinformation of all sorts <span class="citation" data-cites="Europol2023">(Europol 2023)</span>, along with the possibility of further increasing inequalities (with some dominant players accumulating nation-state-like power and an Orwellian centralization of “truth”).</p>
<p>The relative lack of investment by some governments in building AI expertise is concerning, as is the potential erosion of critical thinking skills in some quarters. “We live in a society exquisitely dependent on science and technology, in which hardly anyone knows anything about science and technology” <span class="citation" data-cites="Sagan1990">(Sagan 1990)</span>. And, clearly, the issue reaches beyond knowing things about science and technology, and perhaps even makes a general liberal arts education more valuable than ever.</p>
<blockquote class="blockquote">
<p>For progress there is no cure. Any attempt to find automatically safe channels for the present explosive variety of progress must lead to frustration. The only safety possible is relative, and it lies in an intelligent exercise of day-to-day judgement… these transformations are not a priori predictable and… most contemporary “first guesses” concerning them are wrong…</p>
<p><a href="https://sseh.uchicago.edu/doc/von_Neumann_1955.pdf">CAN WE SURVIVE TECHNOLOGY? by John von Neumann</a></p>
</blockquote>




<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-alampara2024mattextlanguagemodelsneed" class="csl-entry">
Alampara, Nawaf, Santiago Miret, and Kevin Maik Jablonka. 2024. <span>“MatText: Do Language Models Need More Than Text &amp; Scale for Materials Modeling?”</span> <a href="https://arxiv.org/abs/2406.17295">https://arxiv.org/abs/2406.17295</a>.
</div>
<div id="ref-alampara2024probinglimitationsmultimodallanguage" class="csl-entry">
Alampara, Nawaf, Mara Schilling-Wilhelmi, Martiño Ríos-García, Indrajeet Mandal, Pranav Khetarpal, Hargun Singh Grover, N. M. Anoop Krishnan, and Kevin Maik Jablonka. 2024. <span>“Probing the Limitations of Multimodal Language Models for Chemistry and Materials Research.”</span> <a href="https://arxiv.org/abs/2411.16955">https://arxiv.org/abs/2411.16955</a>.
</div>
<div id="ref-Dyson_2012" class="csl-entry">
Dyson, Freeman J. 2012. <span>“Is Science Mostly Driven by Ideas or by Tools?”</span> <em>Science</em> 338 (6113): 1426–27. <a href="https://doi.org/10.1126/science.1232773">https://doi.org/10.1126/science.1232773</a>.
</div>
<div id="ref-Europol2023" class="csl-entry">
Europol. 2023. <span>“Criminal Use of ChatGPT: A Cautionary Tale about Large Language Models.”</span> 2023. <a href="https://www.europol.europa.eu/media-press/newsroom/news/criminal-use-of-chatgpt-cautionary-tale-about-large-language-models">https://www.europol.europa.eu/media-press/newsroom/news/criminal-use-of-chatgpt-cautionary-tale-about-large-language-models</a>.
</div>
<div id="ref-reasoning" class="csl-entry">
Fedorenko, Evelina, Steven T. Piantadosi, and Edward A. F. Gibson. 2024. <span>“Language Is Primarily a Tool for Communication Rather Than Thought.”</span> <em>Nature</em> 630 (8017): 575–86. <a href="https://doi.org/10.1038/s41586-024-07522-w">https://doi.org/10.1038/s41586-024-07522-w</a>.
</div>
<div id="ref-Frankfurt2005" class="csl-entry">
Frankfurt, Harry G. 2005. <em>On Bullshit</em>. Princeton University Press.
</div>
<div id="ref-Jablonka_2022" class="csl-entry">
Jablonka, Kevin Maik, Luc Patiny, and Berend Smit. 2022. <span>“Making the Collective Knowledge of Chemistry Open and Machine Actionable.”</span> <em>Nature Chemistry</em> 14 (4): 365–76. <a href="https://doi.org/10.1038/s41557-022-00910-7">https://doi.org/10.1038/s41557-022-00910-7</a>.
</div>
<div id="ref-peppin2025realityaibiorisk" class="csl-entry">
Peppin, Aidan, Anka Reuel, Stephen Casper, Elliot Jones, Andrew Strait, Usman Anwar, Anurag Agrawal, et al. 2025. <span>“The Reality of AI and Biorisk.”</span> <a href="https://arxiv.org/abs/2412.01946">https://arxiv.org/abs/2412.01946</a>.
</div>
<div id="ref-Robertson_1980" class="csl-entry">
Robertson, Miranda. 1980. <span>“Biology in the 1980s, Plus or Minus a Decade.”</span> <em>Nature</em> 285 (5764): 358–59. <a href="https://doi.org/10.1038/285358a0">https://doi.org/10.1038/285358a0</a>.
</div>
<div id="ref-Sagan1990" class="csl-entry">
Sagan, Carl. 1990. <em>Why We Need to Understand Science</em>. Vol. 14. 3.
</div>
<div id="ref-Schilling_Wilhelmi_2025" class="csl-entry">
Schilling-Wilhelmi, Mara, Martiño Ríos-García, Sherjeel Shabih, María Victoria Gil, Santiago Miret, Christoph T. Koch, José A. Márquez, and Kevin Maik Jablonka. 2025. <span>“From Text to Insight: Large Language Models for Chemical Data Extraction.”</span> <em>Chemical Society Reviews</em>. <a href="https://doi.org/10.1039/d4cs00913d">https://doi.org/10.1039/d4cs00913d</a>.
</div>
<div id="ref-zhang2024omniopenendednessmodelshuman" class="csl-entry">
Zhang, Jenny, Joel Lehman, Kenneth Stanley, and Jeff Clune. 2024. <span>“OMNI: Open-Endedness via Models of Human Notions of Interestingness.”</span> <a href="https://arxiv.org/abs/2306.01711">https://arxiv.org/abs/2306.01711</a>.
</div>
</div></section></div> ]]></description>
  <category>academia</category>
  <category>llm</category>
  <guid>https://kjablonka.com/blog/posts/why_llm/</guid>
  <pubDate>Sat, 04 Jan 2025 23:00:00 GMT</pubDate>
  <media:content url="https://kjablonka.com/blog/posts/why_llm" medium="image"/>
</item>
<item>
  <title>Take it easy, my friend</title>
  <link>https://kjablonka.com/blog/posts/take_it_easy/</link>
  <description><![CDATA[ 




<p>It’s easy to feel caught up in the (perceived) pressure to produce results quickly. Academia seems to prioritize speed and quantity over depth and quality, which can be overwhelming. But is this really true? If it is something you must rush to publish, is there any real value in it? Is it really of value if you need to “compete”? <span class="citation" data-cites="thiel2014competition">(Thiel 2014)</span> Is the project you are rushing to publish really the question that you are best positioned to answer and that you deeply care about? Isn’t rushing things another form of cargo cult science?</p>
<p>What if, instead of frantically racing to publish, you took the time to slow down, breathe, and really dig into the research you truly care about, taking pleasure in craftsmanship? We must admit that we are not immune to the allure of “fast science.” There’s something undeniably exciting about chasing quick breakthroughs and racking up publications. Yet this isn’t the path to meaningful, impactful, sustainable research, or to happiness.</p>
<p>Great outcomes take time and persistence. Take the story of Rosalind Franklin, whose research laid the groundwork for understanding the structure of DNA. Or consider the godfathers of deep learning, who persisted through the AI winter. Galileo took 18 years to finish and write up his pendulum experiments, and Newton took four years for his initial writings about gravity. <span class="citation" data-cites="newport2024slow">(Newport 2024)</span></p>
<p>These scientists didn’t rush their work or cut it into “salami papers.” Instead, they took their time, and their persistence paid off.</p>
<p>Great outcomes also cannot easily be optimized for. This is, as Peter Drucker already realized, <span class="citation" data-cites="drucker1999knowledge">(Drucker 1999)</span> especially difficult for knowledge work. Metrics are deceiving, and the stepping stones that lead to discoveries cannot be anticipated. <span class="citation" data-cites="stanley2015why">(Stanley and Lehman 2015)</span> Thus, as a field, we are bound to be less successful in the long run if we optimize only for bibliometrics. Trust yourself that your unique point of view will lead to something exciting. Spending less time comparing Google Scholar profiles, and instead being in awe of the exciting times we are in, will also lead to more happiness. <span class="citation" data-cites="brooks2022dont shiota2007nature dambrun2017self">(Brooks 2022; Shiota, Keltner, and Mossman 2007; Dambrun 2017)</span></p>
<p>Great outcomes are also very diverse and happen on very different timescales. Even though some of our communications suggest otherwise (“Samantha is a great student because she published in Nature”), <span class="citation" data-cites="lawrence2003politics">(Lawrence 2003)</span> we do well not only if we publish in Cell, Nature, Science, or other “vanity outlets.” We also do well if we build software that is used and that powers a full line of other research (think of the impact Python, Numpy, RDKit, Pymatgen, and similar tools had on our work). We also do well if we curate datasets that enable new discoveries. <span class="citation" data-cites="abbott2020mind">(Abbott et al. 2020)</span> Ultimately, AlphaFold would not have been possible without the Protein Data Bank. While the systems with which we evaluate scientists only slowly evolve to reflect this reality, <span class="citation" data-cites="hicks2015bibliometrics">(Hicks et al. 2015)</span> it is important to remember that great work will ultimately pay off and lead to much more satisfaction. After all, those who decide on funding and career moves benefit from hindsight that editors do not have. <span class="citation" data-cites="lawrence2003politics">(Lawrence 2003)</span></p>
<p>In a world filled with one-hit wonders and short-lived trends, it’s more important than ever to focus on creating meaningful, lasting contributions to your field. You want to be known for something great, not just a fleeting moment of recognition. Moreover, slowing down to a sustainable pace can help you avoid the pitfalls of academic burnout. It’s not about the number of publications or the speed at which you produce them; it’s about the quality of your work, the depth of your understanding, and the passion you bring to your research. To our knowledge, being stressed has never helped anyone think clearly. <span class="citation" data-cites="mcewen2007physiology">(McEwen 2007)</span></p>
<p>So, if you are feeling the pressure to constantly and rapidly produce, remember that taking it slow can lead to great things. Embrace the process, pursue your ideas with curiosity and dedication, and, most importantly, take the time to enjoy the journey. This journey isn’t a sprint — it’s a marathon. It will be more rewarding if you allow yourself the freedom to explore at your own pace. So, take a deep breath, and relax. In the end, you’ll be known not for a fleeting moment of recognition but for a lasting contribution to your field—a testament to your dedication, perseverance, and passion for your work.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://kjablonka.com/blog/posts/take_it_easy/cover.png" class="img-fluid figure-img"></p>
<figcaption>DALL-E generated image for slow science</figcaption>
</figure>
</div>
<p>This text is inspired by a conversation with Alán Aspuru-Guzik. Alán also suggested the title with a reference to a song.</p>


<!-- -->



<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-abbott2020mind" class="csl-entry">
Abbott, L. F., D. D. Bock, E. M. Callaway, et al. 2020. <span>“The Mind of a Mouse.”</span> <em>Cell</em> 182 (6): 1372–76. <a href="https://doi.org/10.1016/j.cell.2020.08.010">https://doi.org/10.1016/j.cell.2020.08.010</a>.
</div>
<div id="ref-brooks2022dont" class="csl-entry">
Brooks, Arthur C. 2022. <span>“Don’t Objectify Yourself.”</span> <em>The Atlantic</em>. <a href="https://www.theatlantic.com/family/archive/2022/09/how-be-less-self-centered/671499/">https://www.theatlantic.com/family/archive/2022/09/how-be-less-self-centered/671499/</a>.
</div>
<div id="ref-dambrun2017self" class="csl-entry">
Dambrun, Michael. 2017. <span>“Self-Centeredness and Selflessness: Happiness Correlates and Mediating Psychological Processes.”</span> <em>PeerJ</em> 5: e3306. <a href="https://doi.org/10.7717/peerj.3306">https://doi.org/10.7717/peerj.3306</a>.
</div>
<div id="ref-drucker1999knowledge" class="csl-entry">
Drucker, Peter F. 1999. <span>“Knowledge-Worker Productivity: The Biggest Challenge.”</span> <em>California Management Review</em> 41 (2): 79–94. <a href="https://doi.org/10.2307/41165987">https://doi.org/10.2307/41165987</a>.
</div>
<div id="ref-hicks2015bibliometrics" class="csl-entry">
Hicks, Diana, Paul Wouters, Ludo Waltman, Sarah de Rijcke, and Ismael Rafols. 2015. <span>“Bibliometrics: The Leiden Manifesto for Research Metrics.”</span> <em>Nature</em> 520 (7548): 429–31. <a href="https://doi.org/10.1038/520429a">https://doi.org/10.1038/520429a</a>.
</div>
<div id="ref-lawrence2003politics" class="csl-entry">
Lawrence, Peter A. 2003. <span>“The Politics of Publication.”</span> <em>Nature</em> 422 (6929): 259–61. <a href="https://doi.org/10.1038/422259a">https://doi.org/10.1038/422259a</a>.
</div>
<div id="ref-mcewen2007physiology" class="csl-entry">
McEwen, Bruce S. 2007. <span>“Physiology and Neurobiology of Stress and Adaptation: Central Role of the Brain.”</span> <em>Physiological Reviews</em> 87 (3): 873–904. <a href="https://doi.org/10.1152/physrev.00041.2006">https://doi.org/10.1152/physrev.00041.2006</a>.
</div>
<div id="ref-newport2024slow" class="csl-entry">
Newport, Cal. 2024. <em>Slow Productivity: The Lost Art of Accomplishment Without Burnout</em>. Penguin Books Limited.
</div>
<div id="ref-shiota2007nature" class="csl-entry">
Shiota, Michelle N., Dacher Keltner, and Amanda Mossman. 2007. <span>“The Nature of Awe: Elicitors, Appraisals, and Effects on Self-Concept.”</span> <em>Cognition and Emotion</em> 21 (5): 944–63. <a href="https://doi.org/10.1080/02699930600923668">https://doi.org/10.1080/02699930600923668</a>.
</div>
<div id="ref-stanley2015why" class="csl-entry">
Stanley, Kenneth O., and Joel Lehman. 2015. <em>Why Greatness Cannot Be Planned: The Myth of the Objective</em>. Cham: Springer International Publishing. <a href="https://doi.org/10.1007/978-3-319-15524-1">https://doi.org/10.1007/978-3-319-15524-1</a>.
</div>
<div id="ref-thiel2014competition" class="csl-entry">
Thiel, Peter. 2014. <span>“Competition Is for Losers.”</span> <em>Wall Street Journal</em>. <a href="http://online.wsj.com/articles/peter-thiel-competition-is-for-losers-1410535536">http://online.wsj.com/articles/peter-thiel-competition-is-for-losers-1410535536</a>.
</div>
</div></section></div> ]]></description>
  <category>life</category>
  <category>academia</category>
  <guid>https://kjablonka.com/blog/posts/take_it_easy/</guid>
  <pubDate>Sun, 01 Dec 2024 23:00:00 GMT</pubDate>
</item>
</channel>
</rss>
