#54 A suspiciously sourced Llama, a Generative AI Task Force and how plans of EU cyber-resilience fluster FOSS
If you would like to contribute to TechnoPolitik, please reach out to satya@takshashila.org.in
Course Advertisement: Admissions for the Sept 2023 cohort of Takshashila’s Graduate Certificate in Public Policy (Technology and Policy) programme are about to close! Visit this link to apply.
Cyberpolitik 1: How Open is LLaMA 2 and How Does it Matter?
— Bharath Reddy
In February 2023, Meta opened up its large language model (LLM) LLaMA for academic research, unleashing a wave of innovation. In July, the next version, LLaMA 2, was released, this time available for both research and commercial use. However, compared to other open-source software packages or LLMs, LLaMA 2 is relatively closed off, prompting questions about whether the claim that LLaMA is open source is merely a marketing ploy.
LLaMA 2 was not released under a typical open-source licence that allows users to freely use, copy, study, alter, and distribute the software. While LLaMA 2's licence does share some similarities with open-source licences, there are significant restrictions. For instance, an app or service with over 700 million monthly active users must obtain a separate licence from Meta, which effectively restricts use by Meta's largest potential competitors. Additionally, the model may not be used to train other LLMs, and a wide-ranging acceptable use policy is intended to prevent misuse of the model.
Furthermore, Meta has made the trained model accessible but has not disclosed the data used for training or the code used in the process. While many downstream applications have been built on top of the model, developers and researchers have limited ability to examine the model itself. A study by Radboud University researchers finds that the LLaMA model fares poorly in terms of its availability, documentation, and access methods.
There are several possible motivations for Meta opening up LLaMA with restrictions. As Luke Sernau from Google points out, owning the ecosystem is extremely valuable. This strategy is similar to what Google has done with Chrome and Android. He states that "by owning the platform where innovation happens, Google cements itself as a thought leader and direction-setter, earning the ability to shape the narrative on ideas that are larger than itself".
The restrictions mentioned earlier allow Meta to safeguard trade secrets related to model design and training while still enabling the models to be used for downstream applications. In addition, given the potential for misuse of AI models, restrictions that are not typically included in open-source licences may be necessary. The Open Source Initiative, for instance, answers "No" to the question of whether licences can restrict software to prevent potential misuse: the Open Source Definition specifies that open-source licences may not discriminate against persons or groups, since giving everyone freedom means giving evil people freedom, too; fortunately, other laws constrain the behaviour of evil people.
LLMs are widely used in research, and modifying the base models can make it difficult to reproduce findings. Reproducibility is essential for peer review, and even small changes to the base models can produce significantly different results. The authors of AI Snake Oil point out that frequent updates to OpenAI's models make it difficult for researchers to reproduce the results of a paper. They observe that "if a researcher fails to reproduce a paper's results when using a newer model, there's no way to know if it is because of differences between the models or flaws in the original paper". Using open-source models like BLOOM gives researchers direct access to the model itself rather than leaving them dependent on a technology company's hosted version, which avoids such reproducibility problems.
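To make the reproducibility point concrete, the sketch below shows how a researcher might pin an open model such as BLOOM to a fixed published snapshot, so that later runs exercise exactly the same weights. It assumes the Hugging Face transformers library and the openly available bigscience/bloom-560m checkpoint; the revision value is illustrative, and in practice would be the specific commit hash cited in the paper.

```python
# A minimal sketch of pinning an open model to a fixed snapshot for reproducibility.
# Assumes the Hugging Face `transformers` library; the model ID is the openly
# available bigscience/bloom-560m checkpoint, and REVISION is illustrative --
# a paper would cite the exact commit hash from the model repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "bigscience/bloom-560m"
REVISION = "main"  # replace with the exact commit hash used in the original experiments

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision=REVISION)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, revision=REVISION)

inputs = tokenizer("Reproducibility in research matters because", return_tensors="pt")
# Greedy decoding (do_sample=False) keeps the output deterministic for a given snapshot.
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```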
But for developers of downstream applications, the costs of using a proprietary model might be prohibitive. In image generation, for instance, open-source models have lowered costs and reduced hardware requirements, making high-quality models more accessible to everyone. This can help level the playing field if open models reach parity with the quality of proprietary ones. For many developers and users, that might be enough.
Cyberpolitik Explainer: EU Cyber Resilience Act - Impact on Open Source Projects
— Rijesh Panicker
The European Union's proposed Cyber Resilience Act (CRA) seeks to harmonise cybersecurity standards for products such as connected devices and software sold in the EU. It will supplement existing regulations such as NIS2 and sectoral regulations covering medical devices, aviation, etc.
Key aspects of the act include requirements for all products to incorporate cybersecurity measures in their design phase, to provide security updates throughout the product lifecycle, and to give consumers detailed information that enables them to make an informed choice. Products that meet these criteria will receive a CE marking and can be sold in the common market.
There is concern within the open-source community that the act does not clearly exempt open-source projects and developers from financial liability, since it is not clear how commercial activity is defined. The modified proposal, passed by the EU Parliament in July, states: "... Free and open-source software is understood as free software that is openly shared and freely accessible, usable, modifiable and redistributable, and which includes its source code and modified versions. Free and open-source software is developed, maintained, and distributed openly, including via online platforms. A package manager, code host or collaboration platform that facilitates the development and supply of software is only considered to be a distributor if they make this software available on the market and hence supply it for distribution or use on the Union market in the course of commercial activity. Taking account of the elements mentioned above determining the commercial nature of an activity, this Regulation should only apply to free and open-source software that is supplied in the course of a commercial activity."
This still does not assuage the concerns of the open-source community. While it clarifies the status of distribution platforms, it does not clarify the status of open-source developers themselves. Many developers on open-source projects receive grants, aid and donations that allow them to continue working on their projects. If a project becomes a commercial offering, will these developers now be considered part of the commercial activity because they receive payment? There is a clear risk that this will make it harder for developers to contribute to open-source projects, especially when a small grant or aid is involved or when contributing might create risk for their employer. Ultimately, this lowers the incentives to contribute to a project.
A second concern is the requirement to report exploited vulnerabilities to national cybersecurity agencies before countermeasures are in place, to be managed by the EU cybersecurity agency (ENISA) through a common platform. This, again, is similar to requirements in other countries; CERT-In, for example, has proposed similar requirements in India. The critical problem is that reporting unpatched vulnerabilities and hosting them centrally makes that repository a prime target for bad actors. It also goes against the usual practice today, where vulnerabilities are reported to the affected manufacturers and developers first, so they can create countermeasures before the vulnerabilities are disclosed widely.
Open-source software is now a vital part of the global digital infrastructure. Recent events, such as the OpenSSL vulnerability of 2022 and reports of malware found in spoofed Node.js and Python packages, make it clear that risks emanating from open-source components can have vast and profound security implications. Just as git has evolved into the standard for collaboration and the GNU GPL has created a template for open-source licences, perhaps one way forward is for open-source foundations like Apache, Eclipse, the EFF and others to evolve a set of standards around what constitutes good cybersecurity design for an open-source project. A widely accepted standard would make it easier for new developers to incorporate these practices into their projects and would likely form a base for future regulation.
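As one small illustration of the kind of baseline practice such a standard might codify, the sketch below checks a downloaded package archive against a published SHA-256 checksum before it is installed, a simple defence against the spoofed or typosquatted packages mentioned above. The file name and checksum are hypothetical placeholders, not values from any real project.

```python
# A minimal sketch: verify a downloaded artefact against its published SHA-256
# checksum before installing it. The artefact name and expected checksum below
# are hypothetical placeholders for illustration only.
import hashlib
import sys

ARTIFACT = "example_package-1.0.0.tar.gz"   # hypothetical downloaded archive
EXPECTED_SHA256 = "replace-with-checksum-from-the-project-release-page"

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if sha256_of(ARTIFACT) != EXPECTED_SHA256:
    sys.exit(f"Checksum mismatch for {ARTIFACT}; refusing to install.")
print(f"{ARTIFACT} matches the published checksum.")
```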
Cyberpolitik 2: Securing and Securitising Generative AI
— Anushka Saxena
In a bid to stay technologically front-footed in the military domain, the US Department of Defense announced on August 10, 2023, the creation of 'Task Force Lima', with the purpose of harnessing the power of Artificial Intelligence, specifically Generative AI tools such as Large Language Models (LLMs), across verticals within the DoD. As per the announcement, the domains currently being reviewed for responsible induction of Generative AI technologies include specific ones, such as 'warfighting' and 'health', and relatively broad and vague ones, such as 'readiness' and 'policy'.
Commenting on the announcement of the Task Force, Deputy Secretary of Defense Dr Kathleen Hicks argued that the DoD's focus remains steadfast on ensuring national security, minimising risks, and responsibly integrating these [Generative AI] technologies. She further added: "The future of defense is not just about adopting cutting-edge technologies, but doing so with foresight, responsibility, and a deep understanding of the broader implications for our nation."
The Task Force has been operationalised under the mandate of the Chief Digital and Artificial Intelligence Office (CDAO) of the DoD, which was created over a year ago to make the US military and governance architecture a leading force in the analysis and adoption of critical and emerging technologies – especially AI. Since its launch in February 2022, the CDAO has been conducting in-depth research on subjects such as the warfighter's role in using AI responsibly, how to create a "trusted" AI ecosystem, and how to undertake competitive big data analytics.
Further, the announcement has been supplemented by a 9-page Memorandum explaining the Task Force's Mission, Goals, Deliverables and Leadership structure. The deliverables are particularly interesting because they reveal that the Task Force is meant to engage both domestic and international actors to turn the goals into reality. For example, within 15 days of the announcement, the subject matter experts identified by the Task Force leadership are to be consulted on "immediate questions and concerns" regarding LLMs. Within 60 days, an interagency and an international engagement plan have to be created on the responsible adoption of Generative AI technologies.
Governments are increasingly feeling the need to securitise, regulate and control AI (we have already argued in our previous edition that doing so effectively is unlikely). With the blow-up of ChatGPT, the most popular Generative AI tool of the past year, and the evolving US-China technological contestation, it is understandable that the US DoD wants to stay ahead of the game, and the Task Force is a manifestation of that requirement. Moreover, in May 2023, testifying at a hearing of the US Congress, former Google CEO Dr Eric Schmidt argued that even though the US is "slightly ahead" of China by a few years in critical areas such as AI and quantum computing, "there's every reason to think they have more people working on strategic AI." For this reason, the Biden administration has made it a key policy priority to restrict China's access to American cutting-edge technologies and data on US-based innovations. This also explains why, just a day before the announcement of the Task Force, Biden signed an Executive Order prohibiting investment by US investors in Chinese firms that may enable the latter's advancement in chips, microelectronics, quantum information technologies, and artificial intelligence.
Overall, this is a new field for military experts to explore, and the legal implications for data protection, cybersecurity, and intellectual property will become clear as the Task Force's mandate is actually implemented. For now, industry experts and authorised government personnel have been asked to make submissions on potential use cases of LLMs in the DoD, and further deliberations will determine which ones emerge as key.
What’s on our Radar this Week?
[ Blog ] Laptop licence: Why are failed policies being revived again?, by Pranay Kotasthane
[ Article ] As US-China trade war heats up, a new era of govt-driven industrial policy is taking shape, by Keshav Padmanabhan
[ Interview ] How to read the People’s Daily: Q&A with Manoj Kewalramani and Jonathan Landreth