#100 Looking Beyond Compute Limits
Weighing in on AI Model Compute Thresholds; This Could Have Been an Email - HR in the Age of AI
Today, Rijesh Panicker comments on the increasing emphasis policymakers are putting on compute thresholds as a proxy for AI model capabilities. Rohan Pai follows that up with a short primer on how AI tools are being used in the area of Human Resources.
Also, we are hiring for the role of Research Analyst (Pakistan Studies). If you are keen on this opportunity, apply here!
Cyberpolitik 1: Weighing in on AI Model Compute Thresholds
— Rijesh Panicker
California recently passed its AI bill, SB 1047. Based broadly on the framework of the US executive order on AI, SB 1047 also targets large AI models trained with substantial computing and financial resources - specifically, those trained using more than 10^26 operations at a cost of over $100 million. This is similar to the limit imposed by the US executive order and higher than the compute threshold in the EU AI Act (10^25 operations).
On the one hand, compute thresholds act as a reasonable proxy for the risk a model poses. Training compute, the total number of operations used to train a model, is a good indicator of its capabilities. It is quantifiable, excludable (by restricting access to compute resources), verifiable through a third-party audit, and not easily circumvented (reducing the compute used generally reduces a model's capability). In addition, compute thresholds apply at the pre-training stage, well before deployment, providing time to assess risks from such models.
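To make these thresholds concrete, here is a minimal back-of-the-envelope sketch in Python, using the widely cited approximation that training compute is roughly 6 FLOPs per parameter per training token. The parameter and token counts below are illustrative assumptions, not figures for any actual model.

```python
# Rough training-compute estimate using the common ~6 * N * D approximation
# (N = parameters, D = training tokens). All model figures are hypothetical.

def training_flops(params: float, tokens: float) -> float:
    """Approximate total training compute in floating-point operations."""
    return 6 * params * tokens

SB1047_THRESHOLD = 1e26     # SB 1047 / US executive order compute threshold
EU_AI_ACT_THRESHOLD = 1e25  # EU AI Act compute threshold

# A hypothetical 400-billion-parameter model trained on 15 trillion tokens
flops = training_flops(400e9, 15e12)

print(f"Estimated training compute: {flops:.1e} FLOPs")             # ~3.6e25
print("Crosses EU AI Act threshold:", flops > EU_AI_ACT_THRESHOLD)  # True
print("Crosses SB 1047 threshold:  ", flops > SB1047_THRESHOLD)     # False
```

Under this rough estimate, such a model would already fall within the EU AI Act's scope yet escape SB 1047's, which shows how much hinges on where the line is drawn.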
Conversely, the correlation between training compute and model capabilities may not hold. The threshold limits in current AI regulations apply only to the largest models; they assume that current scaling laws will hold and seek to limit any future danger from still larger models. However, it is entirely possible that real-world harm could arise from smaller models already deployed in the world today. Better edge compute might make faster and more complex inference possible, and better fine-tuning techniques might make it easier to adapt models to new domains.
Recent examples of such diffusion include Google's RT-2 robotic transformer (see here for a detailed explanation), which combines a vision-language model with fine-tuning on a robotics dataset to understand instructions and carry out physical actions. In one demonstration, the instruction was to move a banana to Germany: the model recognised the object (the banana) and the target location (a flag of Germany) and moved the object to it, drawing simply on the knowledge embedded in the pre-trained model. Similarly, self-driving car companies like Waymo and Wayve are incorporating LLMs into their driving models. You can read more about both of these in this excellent post by Timothy Lee.
We have also seen, with the launch of OpenAI's o1 model (codenamed Strawberry), that LLM training itself is undergoing a change. As models gain the ability to reason step by step and refine their own reasoning without explicit prompting, it is quite possible that smaller models will be able to trade longer inference times for less training compute and still remain performant.
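A similarly rough, purely illustrative calculation (assuming inference costs of about 2 FLOPs per parameter per generated token) shows how a much smaller model could match a large model's per-query compute simply by generating a far longer reasoning trace, while its training compute stays well below any regulatory threshold. All numbers here are assumptions made for the sake of the example.

```python
# Back-of-the-envelope inference-compute comparison (~2 * N FLOPs per token).
# All figures are illustrative assumptions, not benchmarks of real models.

def inference_flops(params: float, tokens_generated: int) -> float:
    """Approximate compute spent answering a single query."""
    return 2 * params * tokens_generated

large_direct    = inference_flops(400e9, 500)     # big model, short answer
small_reasoning = inference_flops(8e9, 25_000)    # small model, long reasoning trace

print(f"Large model, direct answer : {large_direct:.1e} FLOPs")     # 4.0e+14
print(f"Small model, long reasoning: {small_reasoning:.1e} FLOPs")  # 4.0e+14
```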
The point of all this is simply to note that the development, use, and diffusion of AI techniques and models into different domains and use cases continue to accelerate. While compute thresholds may be a good initial step in regulating AI models, effective regulation will need tools across the AI value chain, from model development to deployment to application.
So, if compute thresholds, model auditing, and red-teaming alone are insufficient, what else could we use? One possible approach is OpenAI's model spec: a document that lists the principles and rules followed while developing a model. What a model specification essentially allows us to do is distinguish intentional from unintentional model behaviour.
Unintended behaviour, most likely the result of deliberate attempts to jailbreak a model, cannot be the developer's responsibility. In most cases, developers can only reasonably be expected to build safeguards where a model's intended behaviour may lead to misuse or harm. The advantage of this approach is that, over time, one can expect a stack of model specifications to develop, with each layer building on the one below it, creating safeguards and guidelines for each use case at the appropriate part of the value chain.
Ultimately, top-down regulation alone will never produce safe AI models. Only in combination with new approaches to building guardrails and safeguards at every level - data, model building, and deployment/inference - will we arrive at a set of regulations that keeps pace with progress in AI.
Cyberpolitik 2: This Could Have Been an Email - HR in the Age of AI
— Rohan Pai
At the 5th edition of the FICCI Innovation Summit last week, a new report was presented claiming that roughly 45% of companies in India have begun using generative AI in their HR processes. When human resources management, arguably a fundamental function in the hospitality industry, begins to outsource its human-centric responsibilities to AI, analysing the benefits and pitfalls becomes paramount.
A primary responsibility of any organisation's HR department is hiring, and this has repeatedly been shown to be an area where discrimination can creep in. Studies have found that unconscious human bias in reviewing job applicants tends to place women and minorities at an acute disadvantage. Outsourcing this process, at least in part, to an AI-based system could potentially eliminate these deeply held biases and offer a more diverse pool of applicants a seat at the table. While this may appear, at first glance, an easy fix to the problem, AI is only as unbiased as its training data. For example, research from the MIT Media Lab demonstrated that certain facial-analysis software is far less accurate at analysing the faces of dark-skinned women than those of light-skinned men.
AI has also been touted as revolutionary for the oft-monotonous task of addressing employee queries. Flipkart, for instance, employs a generative AI chatbot, similar to ChatGPT, which the company claims saves considerable time by instantly communicating personalised feedback to employees and acting as a middleman when they need to apply for leave or raise a grievance. Although possibly more efficient than the human processes they replace, such uses of AI come at the cost of depersonalising what may be very complex situations requiring empathy and emotional intelligence. A crucial element of HR, after all, is the “H”, which necessarily involves face-to-face interaction between human beings.
Another widespread issue in HR is fraudulent resumes, which saw a spike during the COVID-19 pandemic, when hiring was conducted remotely. This fraud took a number of forms, such as exaggerating past accomplishments, omitting criminal records, and even impersonating someone else without their consent. Here, AI has proven to be a useful tool for screening many resumes at a time and detecting irregularities, whether in factual claims or in the language in which a statement of purpose is phrased. And, as previously mentioned, AI-powered facial recognition and speech-pattern analysis can also help HR departments verify a candidate's authenticity during, say, a job interview.
At the end of the day, however, employers must be cognisant of the risks of excessive surveillance, such as a possible data breach. It is ultimately the organisation's responsibility to protect sensitive employee information, so it must actively put in place safeguards that evolve alongside AI tools.
If you like the newsletter, you will love to read our in-depth research and analysis at https://takshashila.org.in/high-tech-geopolitics.
Also, we are hiring! If you are passionate about working on emerging areas of contention at the intersection of technology and international relations, check out the Staff Research Analyst position with Takshashila’s High-Tech Geopolitics programme here.
What We're Reading (or Listening to)
[Article] The ultra-selfish gene, by Mathias Kirk Bonde
[Opinion] One Year On, Should India Revisit its Drone Components Ban? by Satya Sahu and Anushka Saxena
[Opinion] Perils of decentralisation with Chinese characteristics, by Pranay Kotasthane and Manoj Kewalramani