#63 Training (our sights on) Artificial Intelligence Governance
Why India should Scrutinise the US' New AI Rulebook; Whose Data is it Anyway?
Today, Rijesh Panicker dissects the US Executive Order on AI from an Indian perspective, while a guest post by Deepanker Koul tackles thorny questions about data ownership.
Course Advertisement: Admissions for the Sept 2023 cohort of Takshashila’s Graduate Certificate in Public Policy (Technology and Policy) programme are now open! Visit this link to apply. Apply by 3rd Dec 2023 to avail the early bird scholarship!
Cyberpolitik 1: Why India should Scrutinise the US' New AI Rulebook
— Rijesh Panicker
At the AI Safety Summit in the UK on the 1st of November, 29 nations - the US, China and India among them - came together to sign the Bletchley Declaration on AI safety, establishing a common understanding of the risks from frontier AI and the need for governments to work together to meet these significant challenges. The document recognizes that "... Many risks arising from AI are inherently international in nature, and so are best addressed through international cooperation." Even as it sets out an agenda for identifying AI risks of shared concern, the declaration accepts that approaches may differ across countries based on their circumstances.
President Biden's executive order (EO) on AI, issued just two days earlier, gives us a first look at the US approach to AI regulation. Covering all AI systems, not just generative AI, and carrying recommendations for all industries, from mature AI implementers to startups, the order sheds light on how regulation might play out in the US, and on possible global opportunities and concerns for regulators.
Key Highlights of the Order
The National Institute of Standards and Technology (NIST) and the Department of Commerce will develop guidelines and best practices for "... developing and deploying safe, secure and trustworthy AI systems". NIST will also establish standards and procedures for conducting AI red-teaming tests that identify potential flaws and vulnerabilities.
The order directs departments such as the Department of Energy, the Department of Homeland Security and the Department of Defense to study the risks AI models pose to critical infrastructure and to critical sectors like finance, cyber defence and healthcare, as well as how AI models themselves could be used to defend against these threats. The findings from these studies will drive further policy-making in these areas.
To promote technologies that identify, authenticate and trace AI-generated content, the order directs studies, over the next year, of existing tools and techniques for detecting AI-generated content and tracking its provenance. Guidance will eventually be developed for digital content authentication, synthetic content detection, and labelling and classification. Similarly, the executive order directs the US Patent and Trademark Office to issue recommendations and guidelines on copyright for AI-generated work and on the use of copyrighted work by AI models.
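As a rough illustration of what "digital content authentication" can mean in practice, here is a minimal sketch in which a generating service signs a hash of its output so that anyone holding the tag can later verify the content is unaltered. The key, function names and scheme are illustrative assumptions, not mechanisms the order prescribes; real proposals (cryptographic provenance manifests, statistical watermarks) are considerably richer.

```python
# Toy content-provenance sketch: the generating service signs a hash of its
# output; a verifier later checks that content and tag still match.
# SECRET_KEY and the HMAC scheme are illustrative assumptions only.
import hashlib
import hmac

SECRET_KEY = b"generator-signing-key"  # assumed key held by the AI provider

def sign_content(content: bytes) -> str:
    """Return a provenance tag binding the content to its generator."""
    digest = hashlib.sha256(content).digest()
    return hmac.new(SECRET_KEY, digest, hashlib.sha256).hexdigest()

def verify_content(content: bytes, tag: str) -> bool:
    """Check that the content still matches the tag issued at generation."""
    return hmac.compare_digest(sign_content(content), tag)

output = b"text produced by a model"
tag = sign_content(output)
print(verify_content(output, tag))         # True: provenance intact
print(verify_content(output + b"!", tag))  # False: content was altered
```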
Several recommendations, such as the standards around AI-generated content, are a positive for consumers worldwide. Given the US government's ability to impose these standards on large AI players, we can expect them to become global standards and to improve the safety and quality of AI models.
What should India worry about?
From an Indian perspective, three specific recommendations of the executive order deserve closer scrutiny and thought.
First, the order directs that regulations be proposed requiring Infrastructure as a Service (IaaS) providers to notify the Commerce Department "when a foreign person transacts with that US IaaS provider to train a large AI model with potential capabilities that could be used in malicious cyber-enabled activity". US IaaS providers will also be prohibited from allowing foreign companies to resell capacity or open accounts unless they make similar disclosures.
For any non-US player, government or private, wanting to research or build AI models, particularly dual-use models, these provisions will mean US government review and oversight of their IP. In this context, the India AI report's recommendation to build sovereign compute infrastructure may well be the prudent choice.
Second, concerning the deployment of dual-use foundation models (models trained on large datasets and applicable in a wide range of contexts), the order directs the Department of Commerce to establish requirements for companies "developing or demonstrating an intent to develop dual-use foundation models" to report information such as model weights and datasets, as well as activities like red-team testing and risk analysis, to the government. Firms will also be required to provide the federal government with information about the use of any large-scale computing cluster.
The order invokes the Defense Production Act to impose these requirements; the US government effectively gains a right of refusal over any cutting-edge AI model, with the power to prevent it from reaching the market.
Similarly, we should expect policy and regulatory recommendations for open-source foundation models (models with widely available model weights) over the next few months. Depending on how these recommendations take shape, open-source models, particularly those deemed dual-use, may no longer be widely available for research. This would hurt smaller players and developers, and hamper India's ability to catch up to the frontier of research.
The US executive order will be the cornerstone of the government's AI policy, and the next 6-9 months will bring further clarity and specificity to the standards. From an Indian perspective, constraints around dual-use models, whether closed source or open source, will hamper our ability to learn and scale; we will need to ensure access to these models, either by building our own or by becoming preferred partners to the firms that own them. In addition, requirements for cloud providers to report the usage of compute resources by foreign players raise the concern that models built by Indian players may face review by foreign governments. India's own AI policy measures will need to respond and adapt to these regulatory changes, especially from a global standard-setting perspective.
Cyberpolitik 2: Whose Data is it Anyway?
— Deepanker Koul
Do users really own their data?
Privacy advocates and governments worldwide further the idea that it is users, not tech firms, that own the data; I beg to differ.
Data ownership has become a contentious issue with the rise of the digital world. While passionate advocates argue that data belongs to users, and government policies seem to follow suit, at least in spirit, I intend to make the opposite case: "Users are not sole owners of their digital data".
This question has arisen primarily because of the intricate relationship between users and tech giants, where user data fuels the development of Machine Learning (ML) models, shaping the competitive edge of these Big Tech firms.
However, the discourse surrounding data ownership is far from straightforward; it is a complex interplay of user consent, behavioural metadata, privacy concerns, and the need for innovation.
"Users are not sole owners of their digital data".
Background
In recent years, Big Tech has been increasingly building and deploying ML models, often using user data. This grants them a competitive edge, which can stifle user choice and force users to rely more on these services, binding them into an inescapable cycle.
But before jumping to the question of ownership, we must first recognize the multifaceted nature of the data in question. Some information, such as photos, videos and texts, is explicitly and willingly shared by users on the platforms. This data exists independently of the platform, and users deserve exclusive ownership of it.
However, a lot of the edge that ML models provide to these companies comes from behavioural data that users don't generate independently of the platform; instead, it is derived from user interactions with the platform.
For instance, platforms like YouTube or Netflix can infer our preferences when we explicitly upvote or downvote certain content. But even when users don't explicitly state their preferences, these platforms can gauge their interest, sometimes better than users themselves can.
This is done through behavioural data, which comes into existence not just because of the content the service provides you, but also because you interact with the service in a specific manner, and because the platform makes a deliberate effort to capture and interpret that interaction.
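To make that concrete, here is a minimal sketch, with entirely hypothetical event names and weights, of how a platform might fold logged interactions into an inferred preference score. The point is that the signal exists only because the platform instruments and interprets the session.

```python
# Hypothetical sketch of deriving behavioural data from user interactions.
# Event names and weights are illustrative assumptions, not any real
# platform's telemetry schema.
from collections import defaultdict

EVENT_WEIGHTS = {
    "completed": 1.0,      # watched or read to the end
    "replayed": 0.8,
    "paused_long": -0.2,
    "skipped_early": -0.6,
}

def infer_preferences(events):
    """Aggregate logged (item, event) pairs into per-item preference scores."""
    scores = defaultdict(float)
    for item_id, event in events:
        scores[item_id] += EVENT_WEIGHTS.get(event, 0.0)
    return dict(scores)

session = [("video_a", "completed"), ("video_a", "replayed"),
           ("video_b", "skipped_early")]
print(infer_preferences(session))
# {'video_a': 1.8, 'video_b': -0.6} - a preference the user never stated
```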
This raises the question: can we unequivocally assign ownership of this behavioural data to users?
While these platforms rely on user interactions to generate valuable insights, it is a collaborative effort between the user and the platform. Our preferences are inferred from actions we might not have taken in the absence of the platform, and might not even be consciously aware of. This blurred line challenges the traditional notion of ownership, complicating the narrative surrounding user data.
Consequences of Misdirected Ownership
Mislabelling the ownership of behavioural data comes with profound implications. Ownership implies control, yet in the context of behavioural data, users often lack the means to effectively manage or protect this information.
Even if this data were somehow transferred to users, many would find it unmanageable, leading to a surplus of unutilized, unsecured data. Moreover, the financial burden associated with data ownership, including storage and security costs, poses a significant challenge, especially for ordinary users.
If ownership is nonetheless conferred absolutely on users in order to limit the collection of such data, we risk punishing these firms for a victimless crime: they are collecting data that they helped generate, data that users have no use for and no means of benefiting from, yet the firms get penalized for storing and utilizing it.
This insistence on punishing firms through misallocated ownership might not actually limit data collection; instead, it might create perverse incentives for firms to acquire the data through less accountable means.
Some have argued that users should get paid for their data, but that would require tracing the owners of each data point, which would open a different can of privacy-related worms.
Even if we could protect privacy and hand the data over to users, a single data point is worthless without being aggregated into a dataset. So the idea that a user can monetize their data as an act of empowerment is misdirected, and ignores the dehumanization of the individual that it implies.
In the pursuit of rectifying perceived data ownership issues, there is a risk of stifling innovation and progress, particularly in the realm of machine learning.
Restricting access to data impedes the development of sophisticated models, hindering the very technological advancements that could benefit society. Innovation thrives on data-driven insights, and limiting data availability curtails the potential for groundbreaking discoveries and solutions.
Ensuring Privacy: A Balancing Act
Amidst this complexity, finding a solution requires a delicate balancing act that respects user privacy while fostering innovation. The current discourse often pits privacy against progress, portraying tech companies as malevolent entities infringing on user privacy rights. However, a nuanced approach is necessary, one that acknowledges the diverse expectations users have regarding privacy and data usage.
Users, based on their roles and needs, value their data differently. A journalist might prioritize the confidentiality of their chat data, while a teenager seeking access to a service they cannot pay for might value their data quite differently. Recognizing these variations is essential in crafting effective policies.
Rather than a one-size-fits-all approach, tailored solutions that align with user expectations are key to ensuring privacy without impeding progress.
Tech companies, often vilified in this narrative as capitalistic bandits in a wild west of "surveillance", should instead be viewed as partners in enforcing societal and constitutional norms. Instead of succumbing to the simplistic notion that 'privacy is dead,' we must strive for a more nuanced understanding.
Governments also play a pivotal role. Firms that might want to make user privacy a business model are currently disincentivized, as governments continue to pressure tech companies to relinquish data in the name of national security, and have often fought companies that tried to preserve user privacy.
This tilts the market in favour of firms willing to undermine user privacy, firms fully aware of the state's implicit nod through its reliance on the security apparatus, thereby undermining the very principles governments aim to protect.
The Path Forward: A Transparent Dialogue
Finding the right balance between surveillance and privacy is a complex task. It demands an honest, transparent dialogue among all stakeholders. Tech companies, users, and governments must engage in open conversations, acknowledging their desires and limitations. Striking the right equilibrium requires a departure from idealistic rhetoric and an embrace of pragmatism.
In his book Tools and Weapons, Brad Smith of Microsoft emphasizes the desire of tech companies to side with users and build privacy as a differentiator in their services.
However, this promise remains hollow if governments continue to exert pressure, compromising user privacy under the pretext of national security. Achieving a harmonious coexistence between privacy and progress necessitates a re-evaluation of our approach.
Embracing the complexity of data ownership is essential for a brighter, more innovative future. Recognizing the collaborative nature of data generation, understanding the diverse value users place on their data, and fostering open dialogues among stakeholders are pivotal steps toward a solution.
As we navigate this intricate terrain, we must move beyond simplistic narratives and confront the nuanced reality of data ownership. By doing so, we can forge a path that safeguards user privacy, encourages innovation, and upholds the ethical principles that underpin our digital society. Through this nuanced understanding and collaborative effort, we can truly unlock the potential of data in the digital age.
**Deepanker Koul is an AI/ML Product Development Manager and an alumnus of the Takshashila Institution's Post Graduate Programme in Public Policy.**
What We're Reading (or Listening to)
[Newsletter] Takshashila Geospatial Bulletin (Unveiling the dragon's footprint: Geospatial intelligence of recent developments in the Doklam plateau)
[Video] Global Express | The Biden-Xi meet: China trap? Or lukewarm re-engagement? ft. Amit Kumar and Hemanth Adlakha
[Opinion] Biden-Xi Summit: Challenge is to continue US-China engagements despite tensions, by Amit Kumar and Manoj Kewalramani