At a talk in Haverford’s Lutnick Library, Dave Hansen, the Executive Director of Authors Alliance, discussed why open access to research improves artificial intelligence (AI) model training, enables more efficient use of AI in research, and protects authors and the public from monopolies over information.
Hansen’s organization, Authors Alliance, is dedicated to the interests of academic authors and to open access to research, and it aims to advance that mission by sharing researchers’ creations on public domain platforms. Copyright infringement lawsuits over databases like Google Books opened discussions about open access to texts and research that could be used to train AI models, and open access has since become a divisive topic in academia. On one hand, some authors protest that scanning their work, both fiction and nonfiction, into databases to train AI models infringes on their intellectual property rights. But for many researchers and academics, scanning their work, particularly research that has beneficial applications for the public, increases its accessibility and discoverability. This is where Authors Alliance comes in, working to make open access beneficial for both researchers and users as they share and access information in a digital space rapidly altered by AI. Its goal is to expand the accessibility of information in the era of large language model (LLM) training by increasing open access to academic works, which would ultimately give AI training more reliable data sets.
For many researchers, transferring the copyright in their works to large corporations restricts open access to that work, preventing it from being incorporated into AI training. However, this form of protection against copyright infringement often detaches authors from their work, directing the money that the work earns to the corporation rather than to the original author. Authors also lose control over how their work is used and which research references it. In the discussion of intellectual property rights, Authors Alliance aims to balance researchers’ control over their work with making information accessible to the public.
AI training, the subject of heated debate across academic spheres, often uses academic works to develop models, potentially harming authors whose work goes uncredited and whose efforts become less visible as a result. Hansen argues that controlling how information is revealed, including training large language models only on reliable sources, can improve how AI functions and reduce the risk of incorrect AI results, a major source of misinformation today. Increased policing of how AI models can be trained allows researchers to better understand where their information is going and users to better understand where their information comes from. Consequently, Authors Alliance aims to improve LLMs in ways that may make AI research and information easier to access and more reliable to use.
On AI usage in undergraduate education, Hansen says, “The more that we can have quality scholarship incorporated into data sets…I think it will be as a learning tool because then it will point students to actual quality materials, versus whatever you happen to find on the Internet.” For students who want to use AI efficiently, Hansen points to organizations like the Allen Institute for AI, which created “Semantic Scholar,” an AI chat interface that is solely focused on peer-reviewed literature.