A new submission story.
In this section we outline the processing which happens in the background when a user supplies new information into the CVs. As a case study we shall explore what happens when we submit a new institution in the universal repository.
Issue Submission Form
The process starts with the submission of the relevant data. For this a user must find the relevant repository of the CVs provide all the required information to create such an entry.
There are two planned routes for entering the required data: - Github issue form - Dynamic website form
GitHub form
Github issue templates can be used to guide input in the form of markdown files. More recently the option to use a yml file to create a form has been added. Using this we define an input form with relevant descriptions which once filled creates a formatted issue.
For more information on the syntax click here
Where are the issue forms?
Issue forms reside in the .workflows/ISSUE_TEMPLATE/
directory. Here we explore the structure of the add_institution.yml
template. The structure of this is described below:
Issue Labels
Exploring the source code we can assign automatic labels for when the issue is created:
labels:
- alpha
- institution
Markdown
We also add a short markdown description
- type: markdown
attributes:
value: |
####
## Adding a new Institution.
Please fill in the information below.
Dropdown
And a dropdown to define what kind of action is required by the processing script. (Note although it is possible to determine this using labels, there are cases where conflicting labels may be used to identify issues that span several areas of interest, and therefore this is a more explicit method to do this.)
- type: dropdown
id: category
attributes:
label: "Issue Type"
# description: "This is pre-set and cannot be changed."
options:
- "institution"
default: 0
validations:
required: true
As this is a required value, it is also set to auto-populate with the first (0th) item. If we had an entry which could be added or modified, there may be multiple options within the issue type category selection.
Inputs
Finally we define a set of inputs giving them a label, description in markdown, example text and whether or not the input is required for form submission. Unfortunately no further validation is possible within the github forms, and has to be carried out within the action described below.
- id: label
attributes:
label: Acronym
description: |
A short acronym from which your institution can be identified with.
Note: This name must be unique across all organisations and cannot be changed once data is published with it.
placeholder: 'e.g., UoYork, MOHC, CMIP-IPO'
validations:
required: true
type: input
Dynamic Web-Form
Although not currently available, the future plans is to replicate the web form on the mipcvs.dev server. This will constitute a fastAPI server which can access the suitability of a field as it is being entered. This will mean that any errors are flagged during entry, and do not require a review, edit, test loop. The end result from the website will still be the creation of a new issue, which upon submission triggers the same post-processing functions.
Submitting a new issue
To submit a new form we start by going to the Issues tab on github. If our template yml file is correctly formatted we should be able to select if from the list of available templates.
From here we are able open the relevant form, completing any required fields.
On submission, the pre-allocated labels are appended:
and the form content saved as the main issue.
Workflow
When an issue is created or changed, we want to be able to run a series of scripts. These are called workflows. A workflow is defined in .github/workflows
, with the components of the issue specific workflow new_issue.yml being outlined below.
Workflow content
Triggers
The on section of a workflow outlines when an issue is run. Common triggers are on push, issue opened/closed, cron and dispatch.
on:
issues:
types: [opened, edited]
This means that should a submission need to be edited, the checks will run again until the relevant sections pass or the issue is closed.
Permissions
For workflows which need to interact with the repository, we need to define what the action is allowed to do. In this case, we can change the issue, repository content and create a pull request.
permissions:
issues: write
contents: write
id-token: write
pull-requests: write
Jobs
The next section defines the scripts which run. Instead of duplicating the same processing scripts in every repository we make use of reusable actions. These are modular workflows that work like functions, and can be located in a separate repository.
steps:
- name: Run the parser on a new issue
id: new-issue-action
uses: WCRP-CMIP/CMIP-LD/actions/new-issue@main
Reusable Actions.
The reusable action is located as part of the CMIP-LD repository.
Here the action begins by installing the cmipld python library
- name: cmip-ld library
id: install-cmipld
uses: WCRP-CMIP/CMIP-LD/actions/cmipld@main
new_issue
as defined in the cmipld library.
- name: Checkout repository
uses: actions/checkout@v4
- name: Run Python script
id: run_python
env:
ISSUE_TITLE: ${{ github.event.issue.title }}
ISSUE_BODY: |
${{ github.event.issue.body }}
ISSUE_SUBMITTER: ${{ github.event.issue.user.login }}
ISSUE_NUMBER: ${{ github.event.issue.number }}
GH_TOKEN: ${{ github.token }}
GITHUB_TOKEN: ${{ github.token }} # We need this to update the issue
run: |
new_issue
CMIPLD command line scripts.
When installing a python package, we are able to define entry-points. These are python functions which can be run as command line scripts. For example the new_issue
command executes the main
function from cmipld.generate.new_issue
.
This starts by parsing the issue content:
issue = get_issue()
parsed_issue = parse_issue_body(issue['body'])
We then use the issue_type category field to determine what script to run, and load it
issue_type = parsed_issue.get('issue_type', '')
script_path = f"{path}{issue_type}.py"
spec = importlib.util.spec_from_file_location(issue_type, script_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
Finally we execute the run
function of the newly imported module and provide it with the issue content, and respective number.
module.run(parsed_issue,issue)
Issue specific script: Processing
The scripts for each issue will be placed in the .github/ISSUE_SCRIPT/
directory and will have the same name as the issue_type (see above).
import sys
from pathlib import Path
import update_ror
import json, os
from cmipld.utils import git
from cmipld.tests import jsonld as tests
from cmipld.tests.jsonld.organisation import ror
path = './src-data/organisation/'
def similarity(name1, name2):
from difflib import SequenceMatcher
matcher = SequenceMatcher(None, name1, name2)
similarity = matcher.ratio() * 100
return similarity
def run(issue, packet):
git.update_summary(f"## Issue content\n ```json\n{json.dumps(issue, indent=4)}\n```")
ror = issue['ror']
acronym = issue['acronym']
id = acronym.lower()
title = f'{issue["issue_type"].capitalize()}_{acronym}'
git.update_issue_title(title)
git.newbranch(title)
acronym_test = tests.field_test(tests.components.id.id_field)
ror_test = tests.field_test(tests.organisation.ror.ror_field)
if ror != 'pending':
tests.run_checks(acronym_test, {"id": id})
tests.run_checks(ror_test, {"ror": ror})
data = update_ror.get_institution(ror, acronym)
ranking = similarity(issue['full_name_of_the_organisation'], data['long_label'])
git.update_summary(f"## Similarity\nThe similarity between the full name ({issue['full_name_of_the_organisation']}) of the organisation and the long label ({data['long_label']}) is {ranking}%")
if ranking < 80:
git.update_issue(f"Warning: The similarity between the full name of the organisation and the long label is {ranking}%")
else:
tests.run_checks(acronym_test, {"id": id})
data = {
"id": f"{id}",
"type": ['wcrp:organisation', f'wcrp:{issue["issue_type"]}', 'universal'],
"label": acronym,
}
git.update_summary(f"## Data content\n ```json\n{json.dumps(data, indent=4)}\n```")
outfile = path + id + '.json'
json.dump(data, open(outfile, 'w'), indent=4)
if 'submitter' in issue:
os.environ['OVERRIDE_AUTHOR'] = issue['submitter']
author = os.environ.get('OVERRIDE_AUTHOR')
git.commit_one(outfile, author, comment=f'New entry {acronym} in {issue["issue_type"]} files.', branch=title)
git.newpull(title, author, json.dumps(issue, indent=4), title, os.environ['ISSUE_NUMBER'])
The pull request is also linked to the issue, allowing it to be closed on merge.
Action output.
It is possible to view the status of the action under the actions tab in GitHub. Here we can select the relevant action, view what has been run, and the console output from this (should an error occur).
Additionally when opening a specific action, the action summary has been updated to provide an update of what has been done. An example of this can be seen below.
Quick summary of the submission process
graph TB
A[Run Script] --> B[Update Summary: Issue Content]
B --> C[Update Issue Title and Create Branch]
C --> D[Run Checks]
D --> E[Update Summary: Data Content]
E --> F[Write Data to File]
F --> G[Commit Changes]
G --> H[Create Pull Request]