sadkowsk / native-lands-locator

Home Page: https://chat.openai.com/g/g-KP0esL5Q8-native-lands-locator-natllo

License: MIT License

Native Lands Locator (NatLLo)

Native Lands Locator (NatLLo) is a custom GPT hosted by OpenAI that matches your inputs' geographic references with nearby native and indigenous lands. After identifying geographic entities and relating them to nearby native lands, it explores the cultural and historical connections between the two. NatLLo also has implications for further development using other models and approaches.

1. Overview

1-A. Motivation

Popular references to geographic locations rarely acknowledge that these locations belong, or once belonged, to native and indigenous peoples. NatLLo enables users to explore potential relationships between their input materials and native and indigenous peoples and their lands. In doing so, it aims to counter the historical invisibility and marginalization of these peoples and to foster appreciation of their cultural and ecological significance. Ultimately, NatLLo promotes more locally inclusive considerations of place.

1-B. Contributions

  • Research: no comparable fine-tuned models are currently available on Hugging Face or elsewhere
  • Use Cases: place-based education; project research and drafting
  • Development: adaptation for other models; specification of issue areas (e.g., climate impact on indigenous lands, under-resourced languages)

1-C. Terms and Ethics

This custom GPT operates at the intersection of geographic information systems, indigenous studies, and natural language processing. The Indigenous Protocol and Artificial Intelligence Position Paper by the Indigenous Protocol and Artificial Intelligence Working Group (2020) helps to explain the complexity of the term "indigenous":

“The emerging identity of 'indigenous peoples' has been adopted as an umbrella term by Indigenous leaders in international arenas, such as the United Nations, while simultaneously opposing a rigorous definition. The use of this term reflects the need for a collective label that supersedes the boundaries of nationstates. It encompasses over 370 million Indigenous peoples from disparate geographical and political backgrounds who, despite distinct cultural differences, share common experiences resulting from the relationship between the Indigenous peoples and present-day nation states.” [1]

Please refer to the Indigenous Protocol and Artificial Intelligence Position Paper for further guidance on ethical considerations when engaging native and indigenous peoples in AI.

2. Approach

2-A. GPT Architecture

NatLLo is based on GPT-4, an autoregressive transformer model that, like GPT-2 and GPT-3, is presumed to be decoder-only. Phuong and Hutter (2022) give pseudocode for the decoder-only architecture in "Formal Algorithms for Transformers" [2], reproduced in the screenshot below:

[Screenshot: Algorithm 10, decoder-only transformer pseudocode, from Phuong and Hutter (2022)]

Note

Algorithm 10's pseudocode specifies the architecture for GPT-2. Phuong and Hutter explain that the pseudocode for GPT-3 "is identical except larger, and replaces dense attention in Line 6 by sparse attention, i.e. each token only uses a subset of the full context."
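To make the pseudocode concrete, here is a minimal NumPy sketch of a decoder-only forward pass in the spirit of Algorithm 10, with comments pointing to the line numbers discussed in this section. It is an illustration under stated assumptions, not GPT-2 itself: tokens are rows rather than the paper's columns, attention projections have no biases, weights are random, and the helper names (`d_transformer`, `init_params`, `local_causal_mask`) are hypothetical. The `local_causal_mask` option stands in for sparse attention with one simple banded pattern; GPT-3 alternates dense and locally banded sparse layers, which this sketch does not attempt.

```python
import numpy as np


def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each token vector (one row of x), then scale by gamma and shift by beta.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def gelu(x):
    # tanh approximation of GELU, the MLP activation used in GPT-2.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def causal_mask(n):
    # Dense unidirectional attention: position t may attend to every position t2 <= t.
    return np.tril(np.ones((n, n), dtype=bool))


def local_causal_mask(n, window=4):
    # One simple sparse pattern (illustrative): position t attends only to the last `window` positions.
    idx = np.arange(n)
    return causal_mask(n) & (idx[:, None] - idx[None, :] < window)


def mh_attention(x, p, n_heads, mask):
    _, d = x.shape
    dh = d // n_heads
    q, k, v = x @ p["W_q"], x @ p["W_k"], x @ p["W_v"]
    heads = []
    for h in range(n_heads):
        s = slice(h * dh, (h + 1) * dh)
        scores = q[:, s] @ k[:, s].T / np.sqrt(dh)
        scores = np.where(mask, scores, -np.inf)  # masked positions get zero weight after softmax
        heads.append(softmax(scores) @ v[:, s])
    return np.concatenate(heads, axis=-1) @ p["W_o"]


def d_transformer(token_ids, params, n_heads, mask_fn=causal_mask):
    n = len(token_ids)                                          # Line 1
    x = params["W_e"][token_ids] + params["W_p"][:n]            # Lines 2-3: token + positional embeddings
    mask = mask_fn(n)
    for p in params["layers"]:                                  # Line 4
        xn = layer_norm(x, p["g1"], p["b1"])                    # Line 5
        x = x + mh_attention(xn, p, n_heads, mask)              # Line 6
        xn = layer_norm(x, p["g2"], p["b2"])                    # Line 7
        x = x + gelu(xn @ p["W1"] + p["c1"]) @ p["W2"] + p["c2"]  # Line 8
    x = layer_norm(x, params["g"], params["b"])                 # Line 10
    return softmax(x @ params["W_u"])                           # Line 11: row t predicts token t+1


def init_params(rng, vocab, max_len, d, d_mlp, n_layers):
    def w(*shape):
        return rng.normal(0.0, 0.02, shape)

    def layer():
        return {"W_q": w(d, d), "W_k": w(d, d), "W_v": w(d, d), "W_o": w(d, d),
                "g1": np.ones(d), "b1": np.zeros(d), "g2": np.ones(d), "b2": np.zeros(d),
                "W1": w(d, d_mlp), "c1": np.zeros(d_mlp), "W2": w(d_mlp, d), "c2": np.zeros(d)}

    return {"W_e": w(vocab, d), "W_p": w(max_len, d), "layers": [layer() for _ in range(n_layers)],
            "g": np.ones(d), "b": np.zeros(d), "W_u": w(d, vocab)}


rng = np.random.default_rng(0)
params = init_params(rng, vocab=50, max_len=16, d=32, d_mlp=64, n_layers=2)
probs = d_transformer([3, 14, 15, 9], params, n_heads=4)
print(probs.shape)  # (4, 50): one next-token distribution per position
```

Because layer normalization and the MLP act on one position at a time and attention is masked, changing a later token leaves the predictions for earlier positions unchanged, which is what makes the model autoregressive.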

Interestingly, though not surprisingly, OpenAI (2023) states in its GPT-4 technical report that it does not disclose the model's architecture and other details:

"Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar." [3]

As a guess, Algorithm 10 adjusted for GPT-4 might include the following changes:

  • Line 2: updated token and positional embedding matrices
  • Line 6: more efficient or adaptive attention mechanisms, potentially beyond sparse attention
  • Lines 5, 7, 8, 10: enhanced layer normalization and MLP activation processes
  • Other: steps or parameters that address bias in the model's output, potentially impacting various lines

2-B. Task Sequence

In the first screenshot below, I instruct NatLLo to surprise me with a random geographic entity, which corresponds to the first input case in step (i). NatLLo then completes steps (ii-vi) as internally prompted.

For the exact instructions that guide NatLLo through this NLP task sequence (i-vi), see instructionsprompt.md in the repository above (view it as code).

i. Response Customization: This initial step depends on the user’s input. Responses are formatted based on whether the input includes a request for geographic entity identification, references geographic locations, or falls outside these criteria.
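NatLLo makes this routing decision through its GPT-4 instructions rather than through code. As a rough sketch only, the three-way decision can be pictured as the function below; the trigger phrases, the tiny place list, and the names `ResponseMode` and `choose_mode` are hypothetical.

```python
import re
from enum import Enum


class ResponseMode(Enum):
    """The three input cases described in step (i)."""
    IDENTIFY_REQUEST = "identify_request"  # user asks NatLLo to pick or identify a geographic entity
    HAS_LOCATIONS = "has_locations"        # input already references geographic locations
    OUT_OF_SCOPE = "out_of_scope"          # neither of the above


# Hypothetical trigger phrases and toy place list, for illustration only.
REQUEST_PATTERN = re.compile(
    r"\b(surprise me|random (place|location|geographic entity)|pick a place)\b",
    re.IGNORECASE,
)
TOY_PLACES = ("lake superior", "mississippi river", "yosemite")


def choose_mode(text: str) -> ResponseMode:
    """Route an input to one of the three response formats."""
    if REQUEST_PATTERN.search(text):
        return ResponseMode.IDENTIFY_REQUEST
    lowered = text.lower()
    if any(place in lowered for place in TOY_PLACES):
        return ResponseMode.HAS_LOCATIONS
    return ResponseMode.OUT_OF_SCOPE


print(choose_mode("Surprise me with a random geographic entity!"))  # ResponseMode.IDENTIFY_REQUEST
```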

ii. Named Entity Recognition: As suggested in (i), NatLLo identifies geographic entities mentioned in the user-provided input.

[Screenshot: steps (i) and (ii), user request and geographic entity recognition]
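A dictionary (gazetteer) lookup is one simple, non-neural way to approximate step (ii). It is a swapped-in technique for illustration, not how GPT-4 performs entity recognition; the gazetteer contents and `find_geo_entities` are hypothetical. Longest matches win, so "New York City" is not also reported as "New York".

```python
import re

# Hypothetical toy gazetteer; NatLLo itself relies on GPT-4 rather than a fixed list.
GAZETTEER = ["New York", "New York City", "Hudson River", "Manhattan"]


def find_geo_entities(text, gazetteer=GAZETTEER):
    """Return non-overlapping (start, end, surface) matches in reading order, preferring longer names."""
    candidates = []
    for name in gazetteer:
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            candidates.append((m.start(), m.end(), m.group(0)))
    # Greedy selection, longest span first, skipping any span that overlaps one already chosen.
    candidates.sort(key=lambda c: (-(c[1] - c[0]), c[0]))
    chosen = []
    for start, end, surface in candidates:
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, surface))
    return sorted(chosen)


text = "We sailed up the Hudson River from New York City to Albany."
print([surface for _, _, surface in find_geo_entities(text)])
```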

iii. Information Retrieval: NatLLo then performs an online search to find information about native/indigenous lands associated with these entities.
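NatLLo relies on an online search for this step. As a stand-in, the sketch below retrieves records whose representative point lies within a radius of a geographic entity, using the haversine great-circle distance. The `LandRecord` type, the placeholder records, and the 100 km default radius are assumptions; real land boundaries are polygons rather than points, and real records should come from an authoritative source.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class LandRecord:
    name: str   # placeholder names only; not real data
    lat: float  # representative point, a simplification of a real boundary
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_lands(lat, lon, records, radius_km=100.0):
    """Records within radius_km of (lat, lon), nearest first."""
    hits = [(haversine_km(lat, lon, r.lat, r.lon), r) for r in records]
    hits.sort(key=lambda h: h[0])
    return [r for dist, r in hits if dist <= radius_km]


records = [
    LandRecord("Placeholder Land A", 0.0, 0.5),
    LandRecord("Placeholder Land B", 0.0, 5.0),
    LandRecord("Placeholder Land C", 0.0, 0.2),
]
print([r.name for r in nearby_lands(0.0, 0.0, records)])
```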

iv. Data Formatting: The first part of the key output is a structured table containing the geographic entities identified in (ii) and the native/indigenous lands retrieved in (iii).

[Screenshot: step (iv), table of geographic entities and native/indigenous lands]
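A small sketch of how the step (iv) table could be rendered as a Markdown table of the kind ChatGPT displays. The column headers and the `format_table` helper are assumptions, and the row shown is a placeholder, not real data.

```python
def format_table(pairs, headers=("Geographic Entity", "Native/Indigenous Land")):
    """Render (entity, land) pairs as a padded Markdown table."""
    rows = [headers, *pairs]
    widths = [max(len(str(row[i])) for row in rows) for i in range(len(headers))]

    def line(cells):
        return "| " + " | ".join(str(c).ljust(w) for c, w in zip(cells, widths)) + " |"

    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([line(headers), separator, *(line(p) for p in pairs)])


print(format_table([("Hudson River", "Placeholder Land A")]))
```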

v. Text Summarization: The second part of NatLLo’s output is a set of concise descriptive statements, one for each pair of geographic entity and native land. Where a Wikipedia page exists for a native/indigenous land, NatLLo also inserts a hyperlink to it.

[Screenshot: step (v), descriptive statements for each entity and land pair]
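One way to attach the step (v) hyperlinks is sketched below. It assumes the caller already knows which land names have English Wikipedia articles (the hypothetical `known_titles` set), since confirming that a page exists would require a network request. Wikipedia article URLs replace spaces with underscores and percent-encode other special characters.

```python
from urllib.parse import quote


def wikipedia_url(title, lang="en"):
    # Article URLs use underscores for spaces; quote() percent-encodes the rest.
    return f"https://{lang}.wikipedia.org/wiki/{quote(title.replace(' ', '_'))}"


def describe_pair(entity, land, summary, known_titles):
    """One statement per (entity, land) pair, linking the land only when an article is known to exist."""
    label = f"[{land}]({wikipedia_url(land)})" if land in known_titles else land
    return f"{entity}: {label}. {summary}"


print(wikipedia_url("Lake Superior"))  # https://en.wikipedia.org/wiki/Lake_Superior
```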

vi. Question Generation: NatLLo’s third output component synthesizes three new research questions based on the user's input and the identified geographic entities and native lands.

[Screenshot: step (vi), generated research questions]
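Because the third component should contain exactly three questions, a post-processing check like the one below could catch malformed responses when the pipeline is rebuilt outside ChatGPT. The prompt wording, the numbered-list format, and both function names are assumptions, not NatLLo's actual instructions, which live in instructionsprompt.md.

```python
import re

# A line such as "1. ...?", "2) ...?" or "- ...?" counts as one question.
QUESTION_LINE = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+(.+\?)\s*$")


def build_question_prompt(user_input, pairs):
    """Hypothetical prompt asking a model for exactly three research questions."""
    listed = "; ".join(f"{entity} / {land}" for entity, land in pairs)
    return ("Based on the text below and these geographic entity / native land pairs "
            f"({listed}), write exactly three new research questions as a numbered list.\n\n"
            f"{user_input}")


def parse_questions(model_output, expected=3):
    """Extract list-formatted questions and fail loudly if the count is wrong."""
    questions = [m.group(1) for line in model_output.splitlines()
                 if (m := QUESTION_LINE.match(line))]
    if len(questions) != expected:
        raise ValueError(f"expected {expected} questions, got {len(questions)}")
    return questions
```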

3. Demonstration

Because NatLLo currently lacks a dedicated Jupyter Notebook or Hugging Face Space, please watch the NatLLo video tutorial linked below and above. You can also test NatLLo yourself using the interactive sample materials linked here:

The following screenshot displays the output NatLLo returned after receiving the poem "Ark" by Simon Armitage [6]:

[Screenshot: NatLLo's full output for the poem "Ark" by Simon Armitage]

4. Critique

4-A. Strengths

  • Advanced Language Processing: GPT-4's autoregressive decoder architecture, training, and long context length effectively support the language reasoning and generation required by the later parts of NatLLo's output (steps v and vi)
  • Continual Improvement: as a custom GPT, NatLLo inherits capability improvements whenever OpenAI updates the underlying ChatGPT models
  • Accessibility: GPT-4’s growing multimodal capabilities have positive implications for users who rely on assistive technologies

4-B. Limitations

  • Biases: risk of biases in GPT-4’s training dataset leading to unfair interpretations of indigenous peoples
  • Errors: risk of GPT-4 providing incorrect information, especially when responding to complex requests
  • Knowledge Cutoff: GPT-4's Jan. 2022 knowledge cutoff can limit the currency and breadth of the information it provides
  • Latency: slow processing of documents uploaded to the custom GPT’s knowledge base leads to time-outs
  • Model Appropriateness: GPT-4’s general-purpose nature is not conducive to returning NatLLo’s multi-part output accurately and reliably
  • Fine-Tuning: NatLLo has not been fine-tuned on specific datasets, which leaves it dependent on GPT-4’s original training and likely limits its effectiveness in certain downstream tasks
  • Evaluation: NatLLo’s downstream performance has not yet been benchmarked, which limits knowledge of how to strategically improve its efficiency, accuracy, and reliability

5. Implications

Next steps for further developing NatLLo might involve adapting it to other transformer models and rigorously testing its capabilities with reputable benchmarks, while upholding ethical practices pertaining to the portrayal of indigenous narratives and intellectual property.

5-A. Models and Evaluation

NatLLo's three-part output might be best produced by dividing the work among multiple models and methods. Below is a possible new configuration to address some of NatLLo's limitations:

  • Named Entity Recognition: DistilBERT
  • Retrieval-Augmented Generation: a data source with more comprehensive information on indigenous peoples and lands
  • Text Summarization and Question Generation: GPT-4 or a free/open-source model such as Mixtral 8x7B
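One way to keep such a configuration swappable is to treat each stage as a plain callable, as in the hypothetical `NatLLoPipeline` below. The stage signatures are assumptions; a DistilBERT tagger, a retrieval backend, or a GPT-4 or Mixtral 8x7B client would each need a thin wrapper to fit them. The usage lines plug in stubs with placeholder outputs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class NatLLoPipeline:
    """Hypothetical orchestration in which each stage is a swappable callable."""
    ner: Callable[[str], list[str]]                         # input text -> geographic entities
    retrieve: Callable[[str], list[str]]                    # entity -> associated land names
    summarize: Callable[[str, str], str]                    # (entity, land) -> one statement
    ask: Callable[[str, list[tuple[str, str]]], list[str]]  # (input, pairs) -> research questions

    def run(self, text: str) -> dict:
        # Part 1: table of (geographic entity, native/indigenous land) pairs.
        pairs = [(e, land) for e in self.ner(text) for land in self.retrieve(e)]
        return {
            "table": pairs,
            # Part 2: one concise statement per pair.
            "statements": [self.summarize(e, land) for e, land in pairs],
            # Part 3: research questions.
            "questions": self.ask(text, pairs),
        }


# Stub stages standing in for real models (placeholder outputs only).
pipeline = NatLLoPipeline(
    ner=lambda text: ["Hudson River"],
    retrieve=lambda entity: ["Placeholder Land A"],
    summarize=lambda entity, land: f"{entity} is near {land}.",
    ask=lambda text, pairs: ["Question 1?", "Question 2?", "Question 3?"],
)
result = pipeline.run("We sailed up the Hudson River.")
```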

Benchmarks and datasets for future evaluation include, among many possibilities, GLUE and CoNLL-2003, scored with metrics such as F1.
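CoNLL-2003 named entity recognition is usually scored with entity-level F1, where a predicted entity counts only if its type and span exactly match the gold annotation. A minimal sketch, assuming BIO tags (`B-LOC`, `I-LOC`, `O`); the function names are illustrative, and treating a stray `I-` tag as the start of a new entity is an assumed convention intended to mirror the common conlleval behavior.

```python
def bio_spans(tags):
    """Convert BIO tags into a set of (type, start, end) spans, end exclusive."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # entity continues
        if etype is not None:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith(("B-", "I-")):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold_tags, pred_tags):
    """Entity-level precision, recall, and F1 using exact span and type matches."""
    gold, pred = bio_spans(gold_tags), bio_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = ["B-LOC", "I-LOC", "O", "B-LOC"]
pred = ["B-LOC", "I-LOC", "O", "O"]
print(entity_f1(gold, pred))
```

Here the prediction finds one of the two gold locations exactly, giving precision 1.0, recall 0.5, and F1 of about 0.667.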

5-B. Indigenous Ownership

Important

Further adaptations and deployments of NatLLo should prioritize ensuring cultural sensitivity and accurate representation of indigenous peoples, including but not limited to respecting indigenous intellectual property and cultural heritage, and promoting inclusivity and diversity to honor indigenous perspectives.

Footnotes

  1. J. E. Lewis et al., “Indigenous Protocol and Artificial Intelligence Position Paper.” Accessed: Dec. 14, 2023. [Online]. Available: https://spectrum.library.concordia.ca/id/eprint/986506/

  2. M. Phuong and M. Hutter, “Formal Algorithms for Transformers.” arXiv, Jul. 19, 2022. Accessed: Dec. 14, 2023. [Online]. Available: http://arxiv.org/abs/2207.09238

  3. OpenAI, “GPT-4 Technical Report,” arXiv.org. Accessed: Dec. 14, 2023. [Online]. Available: https://arxiv.org/abs/2303.08774v3

  4. J. Jones, “Where’s the Snow? The East Coast Is in for Another Wet Weekend,” The New York Times, Dec. 14, 2023. Accessed: Dec. 15, 2023. [Online]. Available: https://www.nytimes.com/2023/12/14/us/rain-storm-forecast-snow.html

  5. R. Oelviani et al., “Climate Change Driving Salinity: An Overview of Vulnerabilities, Adaptations, and Challenges for Indonesian Agriculture,” Weather, Climate, and Society, vol. 16, no. 1, pp. 29–49, Dec. 2023, doi: 10.1175/WCAS-D-23-0025.1.

  6. S. Armitage, Ark. 2019. Accessed: Dec. 15, 2023. [Online]. Available: https://www.simonarmitage.com/wp-content/uploads/Amended-Ark.pdf
