docs/OSEE/ReqAndDesign.md - osee/org.eclipse.osee - Git at Google

 ## Fast, highly relevant search

 ### Requirements

   - support regular expression searches
   - support case-insensitive and case-sensitive (use second pass)
   - history on name and url changes
   - history on search terms changes
   - access control

 ### Design

   - Query Parser Steps

 <!-- end list -->

 1.  make search text lowercase
 2.  split search text into search terms by white space, allow quotes to
     enclose a search term that includes white space
 3.  strip leading and trailing characters that are neither letters or
     digits
 4.  remove trailing 's (possessive case)
 5.  discard single char terms that are not digits (only implement if
     proven useful)
 6.  transform words into singular form
 7.  discard duplicate terms
 8.  discard stop words. see
     <http://www.gobloggingtips.com/wp-content/uploads/2014/08/Google-stopwords.txt>
 9.  truncate term if longer than 128 characters (max token length)
 10. sort terms
 11. compute query_hash with Arrays.deepHashCode() on the sorted terms
     from previous step

 | source text    | options | matching query | non-matching query |
 | -------------- | ------- | -------------- | ------------------ |
 | C. S. Lewis    | tbd     | C S Lewis      | CS Lewis           |
 | 75             | tbd     | 7              | 0.7                |
 | 75%            | tbd     | 75             | %                  |
 | nav_power_on | tbd     | power_on      | power on           |
 | Books          | tbd     | booK           | bookes             |
 | source text    | tbd     | matching query | non-matching query |

 Test Cases

   - Query options

 ;

 :\* Case sensitive

 :\* Match word order (no effect with 1 search term)

 :\* Include leading punctuation

 :\* Include trailing punctuation

 :\* Whole content match

 :\* Whole word match

   - Regular expression match- a list of regular Expressions that are
     individually matched against terms in the first pass, and a single
     regular expression that is applied in the second pass or blank to
     skip second pass (no other option should be combined with this one)

 #### Data Model

 `osee_search_url(BIGINT url_id, Varchar(128) title, Varchar(2048) url, valid_test_time) PRIMARY KEY (url_id) ORGANIZATION INDEX`

 :\* title - search terms will be extracted from the title using the
 query parser steps and stored with rank name

 :\* url uniqueness is enforced via atomic insert command

 :: INSERT INTO osee_search_url m1 (match_id, name, url,
 valid_test_time) SELECT ?,?,?,? FROM dual WHERE NOT EXISTS (SELECT 1
 From osee_search_url m2 WHERE m2.url = ?)

 :\* maintenance operation identifies dead links. valid_test_time is
 updated with current date and time when the link is tested as valid

 `osee_search_term(Varchar(128) text, SMALLINT match_type_id, BIGINT app_id, SMALLINT rank_type_id, BIGINT term_group_id, SMALLINT term_group_size, BIGINT match_id) PRIMARY KEY (text, match_type, term_group_id, match_id) ORGANIZATION INDEX COMPRESS 2`

 :\* text is normalized search term (see query parser steps)

 :\* match type

 :: artifact, attribute, relation (for rationale), url type

 :\* app_id

 :\* match_id can be a osee_search_url.url_id or an attribute or
 relation gamma_id (or possibly extended to any other id)

 :\* the rank allows early exclusion of unwanted term types such as
 content with the sql criteria that only includes rank \> 20

 :\* Results are returned in ascending rank order of the sum of all ranks
 for a match_id

 :\* rank_type_id:

   -

       -
         Tuple3Type{RankTypeId, Long, String} RankType =
         Tuple3Type.valueOf(SearchFamily, 12L); // rank_type, rank
         (larger is higher), rank
           -
             1 \[220\] unique id - (employee id, charge number, USPS
             Tracking number, etc.)
             2 \[200\] keyword - terms that when searched together should
             represent a top result
             3 \[ 40\] tag (meta) - terms applied by human intelligence
             to classify and categorize
             4 \[ 20\] content - terms extracted from the content body
             using the query parser steps
             5 \[ 10\] team - relevant to a given team

 type name

 :\* term_group_id - 1 when the term_group_size is 1; otherwise
 randomly generated id used to group search terms to apply a rank only
 when all terms in the group match

 `osee_search_query(BIGINT query_hash, Varchar(500) query_text, SMALLINT status, BIGINT top_match)`

 :\* CREATE INDEX OSEE_SEARCH_QUERY_TEXT_IDX ON OSEE_SEARCH_QUERY
 (QUERY_TEXT);

 :\* query_hash is computed using the Query Parser Steps

 :\* query_text exactly as entered

 :\* status

 ::\* no results

 ::\* unverified results

 ::\* verified results

 :\* top_match is the top match returned the last time this query was
 run. Useful for sharing links/ searches. consider supporting more than 1
 top matches

 :\* search prediction based on query_text with past queries of user
 sorted first

 `osee_search_share(BIGINT to_user, BIGINT from_user, BIGINT query_hash, timestamp time) PRIMARY KEY (to_user, from_user, query_hash) ORGANIZATION INDEX COMPRESS 2`

 -----

 `osee_search_history(BIGINT item_id, BIGINT user_id, timestamp) PRIMARY KEY (item_id, user_id, timestamp) ORGANIZATION INDEX COMPRESS 2`

 #### Rest Interface

 :\* Allow request to specify max number of links or all (default is 25).
 Top ranked n links are returned as json {id, name, url, search terms,
 rank}

 :\* in order to set access control on a link you have to have the access
 to the group you are setting it to, you can add anyone else

 #### User Interface

 When entering a new URL the name is initially populated from the web
 page title

 `Search results will include a why link that provides the rank information and allows users to improve this information`

 ## Fast, versioned tuple service

 ### Requirements

   - Creation and deletion of tuples shall have transactional history
   - tuple operations shall be near constant time
   - shall efficiently support tuples of numbers and strings of length 2,
     3, and 4

 #### Uses Cases

   - multiple tags per transaction with with history

 ` osee_tuple3(tx_key_type, branch_id, tx_id, attr_id, gamma_id)`

   - cross branch linking

 ` osee_tuple2(relation_type, a_art_id, b_art_id, gamma_id)`

   - user defined tables

 ` osee_tuple2(BIGINT my_table_type, BIGINT row_id, String row_data, gamma_id)`

 :\* the row_id allows the rows to be returned in order, also allows
 showing history of a given row

 :\* the row_data stores the json representing the row data

 ### Design

 #### Data Model

   - Tuples provide a high performance general purpose storage that
     leverages index-organized tables. Traditional database tables are
     unorganized heaps that require indexes for fast access. If the size
     of individual rows can be kept small, then index-organized tables
     can be used and remove the overhead and duplication of also having
     the unorganized heap. Additionally, these tables support key
     compression which further improves performance and reduces storage
     requirements.

 `osee_tuple2 (tuple_type, element1, element2, gamma_id) PRIMARY KEY (tuple_type, element1, element2) ORGANIZATION INDEX COMPRESS 2;`
 `osee_tuple3 (tuple_type, element1, element2, element3, gamma_id) PRIMARY KEY (tuple_type, element1, element2, element3) ORGANIZATION INDEX COMPRESS 2;`
 `osee_tuple4 (tuple_type, element1, element2, element3, element4, gamma_id) PRIMARY KEY (tuple_type, element1, element2, element3, element4) ORGANIZATION INDEX COMPRESS 3;`
 `unique index: gamma_id`

   - Each tuple has a type. See
     org.eclipse.osee.framework.core.enums.CoreTupleTypes
   - gamma_id provides for branching and history. Tuples can be created
     and deleted on a branch using transactions
   - elements can be of type number or string. A string is stored as a
     key to the osee_key_value table
   - In the table each tuple element is stored as a BIGINT and if it
     represents a String then this number is a key into the
     OSEE_KEY_VALUE table where the string is stored. This approach
     keeps each row in a tuple table to be of a small fixed size.

 ## Product Line Engineering

 ### Requirements

   - User can quickly select branch-view without requiring knowledge of
     corresponding version of product line. Select by branch-view name
     (i.e. Country X San) which is short hand for selecting My Product
     Line 2.0 branch with Country X view.
   - View applicability has history and can vary independently by branch.
   - External tools can join on view applicability in their sql queries
   - Need a list of branch-views to show in branch selection dialogs

 ### Definitions

   - Applicability_Id - just an id
       - part of osee_txs table, and thus versioned.
       - applicability id per artifact, attribute and relation.
       - applicability can be on sub attribute level using grammar within
         Word or other content
   - Feature - decomposed item of product or configuration
       - is an Applicability_Id
       - represented as unique NAME, ID
       - can be a single Feature or some mathematical combination of
         Features. eg: ROBOT_ARM_LIGHT, ROBOT_SPEAKER, ROBOT_ARM_LIGHT & ROBOT_SPEAKER
   - Variant - combination of Features that make up a version of a
     product
       - is an Applicability_Id
       - represented as unique NAME, ID
   - View - cross-section of a Branch that shows items applicable to that
     view
       - either a Configuration, Feature or grouping
       - read-only. Editing can only be done on parent Branch
       - what is shown is computed from selected Configs, Features

 ### Design

 #### Data Model

   - item applicability is fully transactional on all branches

 `osee_txs (branch_id, transaction_id, gamma_id, tx_current, mod_type, app_id)`

 Default value for app_id in osee_txs is 1 which means in all views
 (base). View artifacts are on the product line branches.

   - view applicability

 `osee_tuple2 (CoreTupleTypes.ViewApplicability, ArtifactId view, String applicabilityText, gammaId)`

 The view applicability tuples and view artifacts are on the product line
 branches thus maintaining full transactional history. The applicability
 text is the feature applicability text or the name of the of the view
 for non product line engineering use of views. Each view will be mapped
 to applicability id 1 (Base) so that base will be included for all
 configs by simply joining on the view_id. The featureApplicabilityId is
 the randomly generated key to applicabilityText in osee_key_value.

   - named branch-views

 `osee_tuple3 (CoreTupleTypes.BranchView, BranchId branch, ArtifactId view, String name, BIGINT gammaId)`

 Views are stored as artifacts of type "Branch View" on the branch their
 are views of. Named branch-views are stored as tuples on the common
 branch. These are used to populate the list of branches faster than
 searching and loading every "Branch View" on every branch. The branch
 view is accessed via BranchId.getView()

   - feature applicability definition

 `osee_tuple4 (app_feature_tuple_type, app_id, feature_id, value_attr_id, and_or_flag, gamma_id)`

 The Feature artifact has an attribute for each possible value. It also
 has a name, description, abbreviation, and type (single or
 multi-valued).

 `osee_tuple3 (compound_app_tuple_type, app_id, app_id, and_or_flag, gamma_id)`

   - feature applicability involving more than one feature is stored as a
     list of feature applicability (recursive to any depth).

 <!-- end list -->

 ``` sql
 -- return all items (referenced by gamma_id) currently in a view of a branch
 select * from osee_txs txs, osee_tuple2 app where branch_id = ? and tx_current = 1 and txs.app_id = app.e2 and e1 = ?;
 -- applicability clause in SQL is only applied when using a branch view
 -- select applicability for a given view (e1)
 select e2, value from osee_txs txs, osee_tuple2 app, osee_key_value where tuple_type = 2 and e1 = ? and app.gamma_id = txs.gamma_id and branch_id = ? and tx_current = 1 and e2 = key;
 ```

 ## Activity Logging and Monitoring

 ### Requirements

   - shall handle creation/update of fine-grained log entries for at
     least 500 concurrent users
   - shall support logging by OSEE and other applications
   - the web of log entries related to an individual instance of a user
     request shall be able to be hierarcically related
   - log entries shall be quickly accessible based on any combination of
     source, user, timestamp, log type, duration, status
   - log entries shall be accessible (especially) when an application
     server is unresponsive
   - log entries shall be available until they are deleted by an admin or
     admin policy (applied by server automatically)
   - at run-time logging shall be enabled/disabled based on any
     combination of user, source, log level, and type
   - access control shall be applied at the log entry type basis

 ### Design

 #### Data Model

   - osee_activity db table

 :\* Log entry in Java: BIGINT entryId, BIGINT parentId, BIGINT typeId,
 BIGINT startTime, BIGINT duration, BIGINT agentId, BIGINT status, String
 msgArgs

 :\* entry_id - random long returned for log method call

 :\* parent_id - id of entry used for grouping this and related entries.
 For root entries, it is the negative of session id of the client or the
 server id. Ranges are used to group by client/server kind (IDE client,
 app server, rest client).

 :\* type_id - foreign key to type_id in osee_log_type table

 :\* start_time - long with ms since epoch

 :\* duration - starts at -1 and is never updated if duration does not
 apply, otherwise updates when the associated job ends with duration in
 ms

 :\* account_id - long account id (the account_id returned from account
 management services

 :\* status:

 ` 0     initial value`
 ` 1-99  percent complete`
 ` 100   completed normally`
 ` 101   completed abnormally`

 :\* msg_args newline separated list of strings used with
 String.format(msg_format, msg_args);

   - Each new log entry's parent_id, agent_id is mapped to the thread
     that created it (only the most recent mapping per thread is
     maintained)
   - When an exception is thrown, it is logged as a child of the parent
     corresponding to the current thread. If no mapping is found in
     ConcurrentHashMap{Thread, Pair{Long, Long}}()

 <!-- end list -->

   - Log entry type in DB: type_id, log_level, software_unit,
     message_format
       - type_id - a fine-grained application defined type, random id,
         defined as tokens and stored in the db for cross application
         support
       - log_level - as defined by java.util.logging.Level
       - module - application defined name of the software unit that uses
         this log entry type
       - msg_format - format defined by
         [java.util.Formatter](http://docs.oracle.com/javase/6/docs/api/java/util/Formatter.html)
         or if blank the raw message details are used directly

 #### High Performance

   - 2 ConcurrentHashMap are allocated with an initial configurable size:
     newLogEntires, updatedEntries
   - newly created log entries are added to newLogEntires using the
     entry_id as the key and the array of sql insert parameters as the
     value
   - updated log entries are checked for in newLogEntires and updated if
     they exist, otherwise the update map is checked and updated if
     exists, else added to updatedEntries
   - A timer tasks runs at a configurable (short) periodic rate and batch
     inserts the log entries in the insert map and then runs the updates.
     This means that any update to a log entry that occurs in less than
     this configured time will not require a database update (i.e.
     writing the duration of a short operation). This also means only one
     thread writes to the log table per JVM.
   - new DrainingIterator(newLogEntires.values().iterator()) is used to
     iterate through the values and remove them one at a time during the
     batch insert
   - upon server shutdown must flush log
   - IDE client will directly use the same service that is used on the
     server
   - [data structure
     options](http://stackoverflow.com/questions/8203864/the-best-concurrency-list-in-java)
   - [Optimize JDBC
     Performance](http://www.precisejava.com/javaperf/j2ee/JDBC.htm)

 #### Java API

 ``` java
  Long createThreadEntry(long userId, Long typeId);

  Long createThreadEntry(long userId, Long typeId, long parentId);

  Long createEntry(Long typeId, Object... messageArgs);

  Long createEntry(Long typeId, Long parentId, Object... messageArgs);

  void updateEntry(Long entryId, Long status);

  Long createExceptionEntry(Throwable throwable);
 ```

   - The first interface to the logging data can be the basic REST
     navigation

 ## Exception Handling

 ### Requirements

   - avoid unnecessary wrapping of exceptions

 ### Design

 [Checked exceptions I love you, but you have to
 go](http://misko.hevery.com/2009/09/16/checked-exceptions-i-love-you-but-you-have-to-go/)
 [Why should you use Unchecked exceptions over Checked
 exceptions](http://jyops.blogspot.com/2012/03/why-should-you-use-unchecked-exceptions.html)
 [Clean Code by Example: Checked versus unchecked
 exceptions](http://convales.blogspot.com/2012/09/clean-code-by-example-checked-versus.html)

   - Use application specific exceptions that extend RuntimeException -
     application specific allows for setting exception breakpoints in the
     debugger
   - Do not declare any run-time exceptions in any method signatures

 ## Artifact Query and Loading

 ### Requirements

   - As a developer and user, I would like quick searching/loading in as
     few sql statements provided from the server in a single call
       - Allow search rest call on server to load artifacts and just send
         data back necessary to construct artifacts/relations instead of
         client making more database calls to load
       - Consolidate the searching/loading from 8 queries to 1 or 2
   - As a developer, I would like to specify which relations and
     attributes and artifact data to load for the results of my query.
       - Provide selective loading
   - As a developer, I would like to search for artifacts and then load
     related artifact to a certain level. Example: ATS loads 3 Team
     Workflows, then needs to bulk load related actions, tasks and
     reviews.
       - Add ability to bulk load to search criteria. Include load level
         and list of relations to follow.
   - As a developer, I would like to be able to load artifacts with only
     artifact data (ids, types, etc) and a few attributes like Name.
     Example: Artifact Explorer shouldn't need to load the entire
     artifact when only the Name, Type and whether it has children is
     needed. Another example is Quick Search results, where all you need
     is the basic data from the results until the user is interested in
     selecting a resulting item.
       - Allow ability to specify artifact data and minimal attrs or
         relations.
       - Also provide case where want to "Show Relations" which would
         require knowing what relations exist and their types.
   - fast, scalable

 ### Design

 ## Dynamic Web App

 ### Requirements

   - As a developer and user, I would like to be able to specify the
     content of a web page by providing a JSON description.
       - Allow a group of controls to be displayed, and to specify REST
         calls for obtaining content and changing content.

 ### Design

 We have adopted JSON Forms to supply this capability, but have written
 our own Angular JS controls to support the capability to have each
 control have its own REST calls. All of the controls in the box below
 additionally provide links for getting and setting data. This allows the
 page to update and check data one control at a time.

 `Text box control: Similar to the JSON Forms text control.`
 `Checkbox control: Similar to the JSON Forms checkbox.`
 `Multi-line text box control: Provides a resizeable multi line text box.`
 `Dropdown control: Similar to the JSON Forms dropdown control.`
 `Multi-select Dropdown control: Wraps popular dotansimha Angular JS Multiselect Dropdown.`
 `Program-version control: a special case of the multi select dropdown to support selection of OSEE ATS program and version.`
 `Table control: wraps Angular UiGrid control to allow for table data.`

 ## Distributed OSEE Repositories

 ### Level 0 capability

 Branches are private to their own OSEE repository, but distributed IDs
 are used to avoid future ID collisions between OSEE repositories.
 Implementation Note: IDs are randomly generated, except transactions
 which are numbered sequentially on a per branch basis (avoids
 transaction collisions since transaction ids can't be random due to max
 transaction calculations)

 ### Level 1 capability

 A branch may be cloned from the source OSEE repository into a
 destination repository as a read-only branch that will automatically
 update when network connectivity exists between the two repositories.
 Note: any repository may request to clone a branch from another
 repository. If the request is granted the source repository marks its
 branch as source and then allows the clone.

 ### Level 2 capability

 The branch updates for a cloned branch will also be able to be performed
 via an export of the source updates to the file system followed by an
 import in the OSEE that has the cloned branch.

 ### Level 3 capability

 A branch may be bidirectionally shared between OSEE repositories. The
 OSEE repository of the source branch tracks the branch lock. Other OSEE
 repositories start with a clone of the shared branch. In order for any
 OSEE to commit a transaction to the shared branch, it will first acquire
 the branch lock from the source repository. All transactions from the
 source are copied to the other repositories.

 ## Fast, Robust Attribute Storage

 ### Requirements

   - System can store and provide fast access to small attributes in bulk
   - System can store and provide fast access at least 1 MB attributes
   - System can store at least 1GB attributes

 ### Design

 #### Data Model

   - each osee attribute type explicitly lists supported search types:
     exact match, case insensitive exact match, partial match, or none

 `osee_search_hash (`
 ` match_type`
 ` app_id`
 ` term_hash`
 ` art_id`
 ` gamma_id)`

   - osee_attribute.value stores boolean, numeric, enumeration, date,
     strings, and binary

 `osee_attribute (art_id, type_id, item_id, order, gamma_id, value)`

 Binary data is rare and often large and is never searched directly
 (instead it is tokenized and indexed) so Store it in an iot with the
 gamma_id as key

 ``` sql
 -- return all attributes for a given artifact currently in a view of a branch
 select * from osee_txs txs, osee_tuple2 app where branch_id = ? and tx_current = 1 and txs.app_id = app.e2 and e1 = ?;
 -- applicability clause in SQL is only applied when using a branch view
 -- select applicability for a given view (e1)
 select e2, value from osee_txs txs, osee_tuple2 app, osee_key_value where tuple_type = 2 and e1 = ? and app.gamma_id = txs.gamma_id and branch_id = ? and tx_current = 1 and e2 = key;
 ```

 ## REST

 ### Design

   - The path component contains hierarchical data that, along with data
     in the non-hierarchical query component serves to identify a
     resource [RFC3986 3.3 first
     paragraph](https://tools.ietf.org/html/rfc3986#section-3.3)
   - name of JSON keys should be camelCase just like Java
   - each segment in the REST URL has a singular name even the name of a
     collection of resources
   - GET/HEAD/PUT/DELETE shall not use a request message body according
     to originator of REST [Roy T.
     Fielding](https://lists.w3.org/Archives/Public/ietf-http-wg/2020JanMar/0123.html)
   - URL all lowercase with words separated with hyphens
   - MUST use POST when the action is not idempotent. GET, PUT and DELETE
     methods are required to be idempotent.
   - [REST APIs must be
     hypertext-driven](http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven)